
Data – Notorious for Being Dirty 

Last week, I had the pleasure of speaking at the Halloran Speaker Series about “embarking on your data literacy journey,” a discussion that built on two recent perspectives on data quality that I penned (parts one and two). As a proud but recovering statistician, I find it mind-boggling how many discrepancies still exist in the way this industry gathers, analyzes and reports data. With artificial intelligence and machine learning gaining momentum in life sciences, the need for quality data is paramount. And as we continue to automate data-driven processes, errors caused by poor data quality will be magnified, often embarrassingly so.

Our industry relies on data to excel and survive, yet there is still so much we need to get right. For instance, did you know that only 3% of companies’ data meets basic quality standards?[1] This could be for several reasons: a lack of skill sets (analysis and interpretation), biases and hidden data factories (the unofficial, often invisible work people do to find and correct bad data). In preparing for this discussion, I pulled a public safety dataset, the San Diego Police calls for service for 2020, to see how the data was captured and stored. The screenshot of the spreadsheet below highlights every area where data was entered into the database incorrectly, including dates, priority levels, call types and addresses.
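
Many of the entry errors described above can be caught with simple rule-based checks at the point of capture. The Python sketch below is illustrative only: the field names (`date`, `priority`, `call_type`, `address`), the date format, and the allowed priority and call-type values are hypothetical placeholders, not the actual schema of the San Diego dataset. A real check would take these rules from the organization's own data dictionary.

```python
from datetime import datetime

# Hypothetical controlled vocabularies; in practice these would come from
# the organization's data glossary / data dictionary.
VALID_PRIORITIES = {"1", "2", "3", "4"}
VALID_CALL_TYPES = {"ALARM", "DISTURBANCE", "TRAFFIC STOP"}
PLACEHOLDER_ADDRESSES = {"UNKNOWN", "N/A", "-"}


def find_problems(record: dict) -> list[str]:
    """Return the names of fields in one call record that fail a basic check."""
    problems = []

    # Dates must parse in one agreed format and fall in the reporting year (2020).
    try:
        when = datetime.strptime(str(record.get("date") or ""), "%Y-%m-%d %H:%M:%S")
        if when.year != 2020:
            problems.append("date")
    except ValueError:
        problems.append("date")

    # Priority and call type must come from a controlled list of values.
    if str(record.get("priority") or "").strip() not in VALID_PRIORITIES:
        problems.append("priority")
    if str(record.get("call_type") or "").strip().upper() not in VALID_CALL_TYPES:
        problems.append("call_type")

    # An address must be present and must not be a placeholder value.
    address = str(record.get("address") or "").strip()
    if not address or address.upper() in PLACEHOLDER_ADDRESSES:
        problems.append("address")

    return problems


# Two made-up records, for illustration only.
records = [
    {"date": "2020-03-14 09:26:00", "priority": "2", "call_type": "alarm", "address": "100 MAIN ST"},
    {"date": "14/03/2020", "priority": "9", "call_type": "ALARM", "address": ""},
]
for record in records:
    print(find_problems(record))
# []
# ['date', 'priority', 'address']
```

Checks like these only catch values that break a stated rule. A well-formed but wrong address still needs a human eye, which is why a manual review is a useful complement to automated validation.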

To navigate these challenges and set yourselves up for success, I encourage your organization to build a culture of quality with an eye on the following:

  1. Data quality – use the Friday Afternoon Measurement (FAM) method to determine the current state of your company’s data quality and how high the error rate in your data really is. Self-awareness is key.
  2. Data integrity – prioritize the creation of a Master Data Management (MDM) strategy, including a data glossary with clear definitions that guide how information is collected, analyzed and communicated company-wide.
  3. Data governance – identify unnecessary activities that could be costing your company money, and simplify the data collection process. It’s in your best interest to do so.
  4. Digital implementation – focusing only on the technology will result in a solution that falls flat. Develop a plan and educate all stakeholders on the roles and processes that will be needed or affected, so that the implementation is seamless and ultimately benefits the people, the processes and the technology infrastructure.
  5. Data visualization – do not burden the data consumer with ineffective visualizations; the goal is to tell a story with your data efficiently. Adopt visualization practices that communicate your data clearly and effectively without obscuring (e.g., pie charts) or overwhelming (e.g., problematic colors) the key takeaways.
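
In its commonly described form, the Friday Afternoon Measurement works roughly like this: take the last 100 records your team created, focus on 10 to 15 critical attributes, have a small group go through them record by record marking obvious errors, then count the records with no errors at all. Out of 100 records, that count is your data quality score. The review itself is manual; the Python sketch below only automates the tally, assuming the flagged attribute names for each record have already been collected. The function name `fam_score` and the per-attribute breakdown are illustrative additions, not part of the method as published.

```python
from collections import Counter


def fam_score(flags_per_record: list[set[str]]) -> tuple[float, Counter]:
    """Tally a Friday Afternoon Measurement review.

    flags_per_record has one entry per reviewed record: the set of attribute
    names the reviewers marked as wrong (an empty set means error-free).
    Returns the percentage of error-free records and a count of errors per
    attribute.
    """
    if not flags_per_record:
        raise ValueError("no records were reviewed")
    error_free = sum(1 for flags in flags_per_record if not flags)
    score = 100.0 * error_free / len(flags_per_record)
    errors_by_attribute = Counter(attr for flags in flags_per_record for attr in flags)
    return score, errors_by_attribute


# A made-up review of 100 records, for illustration only.
review = (
    [set() for _ in range(48)]
    + [{"address"} for _ in range(30)]
    + [{"date", "priority"} for _ in range(12)]
    + [{"date"} for _ in range(10)]
)
score, errors_by_attribute = fam_score(review)
print(score)                              # 48.0
print(errors_by_attribute.most_common())  # [('address', 30), ('date', 22), ('priority', 12)]
```

The per-attribute counts are optional, but they point to where improvement effort is likely to pay off first.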

So, what are you waiting for? Give it a go and see what appetite your organization may have for making some changes. Need some help? Drop us a line.


[1] “Only 3% of Companies’ Data Meets Basic Quality Standards,” Harvard Business Review, September 11, 2017.