My New Year’s Data Resolutions
I am always a bit pensive this time of year – wondering if I will again invest a significant amount of time thinking about all the different resolutions I can make for myself… but never really change anything. And then actually thinking that the next year will be any different. So understanding that Halloran’s blog is not all about me, I decided to share some New Year’s resolutions about data – how we can better care for it and make it easier to consume.
As an industry, we produce data. Lots of it. It is inherent to what we do and necessary for bringing innovative therapies to patients. While much of our time is dedicated to ensuring quality for the important safety and efficacy data in our submissions to regulatory agencies, the other data we indirectly produce – our operational data – can at times flounder (un-queried for quality) in our clinical systems (or ugh, myriads of spreadsheets). While “Big Data” is a great buzzword in management circles, small data (with big problems) never gets much attention.
Our operational data is extremely important. It is a reflection of what we do and how well we do it. We can use it to evaluate our processes and initiate discussions for improvement. We can use it to forecast and extrapolate trends. We can use it to predict what will happen next and optimize expected outcomes. We can analyze it for hidden patterns that can help direct future decisions. So what is preventing us from using this rich resource as other industries have? Major sports organizations have invested in analytics to better understand which players will succeed or which plays work best in a given situation, but how can we translate this capability to our industry to resolve challenges with unproductive clinical sites? Apart from the pharmacokinetic/pharmacodynamic (PK/PD) modeling and simulation that our industry currently utilizes, we fail to prioritize operational data to make better decisions. The question is: why? While there are likely many different reasons for this (e.g., a lack of data culture and/or strategy, technology limitations, or inadequate skill sets), I would like to focus on a very basic, but important requirement: data quality.
All the interesting things we can use our operational data for rely squarely on its quality. While analyzing data is exciting, cleaning it unfortunately takes a great deal of time. A New York Times article reported that data scientists spend up to 80% of their time on data preparation [1]. We may be very familiar with the effort required to lock a clinical database, but we continue to ignore that our other data requires the same type of rigor before it can be used for measurement and decision making.
So as we look back on 2019 and look forward to the New Year, we would like to share some data resolutions that we will continue to evangelize throughout 2020 and beyond.
- Exercise more. This resolution is usually at the top of everyone’s list so let’s start here. Exercise is all about measurement and at the end of a long week, there is nothing more rewarding than practicing the Friday Afternoon Measurement (FAM) method. This is an exercise designed by the “Data Doc” and fellow statistician Tom Redman that provides a glimpse into how truly high the error rate is in your data [2]. The FAM method is quite simple and can be used on any spreadsheet (or validated data extract). The exercise is to assemble 10-15 critical data fields from the most recent 100 data records – and then simply count the number of records that are error-free. For example, your 100 recent records could be investigative sites and the critical data could include planned and actual visit dates (e.g., qualification, activation, monitoring, closeout) and subject parameters (e.g., screened, consented, randomized, lost to follow-up). Even before completing the FAM method, how many error-free records do you estimate you have? In the study Tom co-authored, only 3% of the measured data quality scores met even a loose standard of acceptability (at least 97 error-free records out of 100), and on average 47% of newly created records contained at least one critical error. And note, a record counts against you only once, no matter how many errors (including missing data) it contains. Virtually every dataset is dirty to some extent and requires cleaning. Are you out of breath from all the exercises yet?
- Tell the Truth. What better way to be more honest than to establish a single, trusted source of truth? Yes, it is easier than it sounds, and it is increasingly important as companies inherit new clinical systems. Master Data Management (MDM) initiatives are not at the top of every leader’s priority list. But honestly, they need to be if leaders are serious about creating a data culture. How many different answers have you received when you simply ask, “when was first patient in (FPI)?”, “how many patients are enrolled?” or “how many sites are currently in the trial?” MDM programs not only define each of these milestones consistently (e.g., whether FPI means first patient screened or first patient randomized), but they also define exactly where this data comes from. Think data dictionary and data standards. No, MDM is not the sexiest initiative to focus on, but it is incredibly important and should be initiated before any reporting initiative.
- Save More Money. Automate, automate, automate. Being human, I have no problem trying to eliminate my fellow humans from the data collection, data aggregation, and data reporting process. In a nutshell, we are not very good at any of this. We not only lack the requisite focus and speed, but we insist on working in manual spreadsheets. I am hereby creating Halloran’s Friday Night Measurement (FNM) method – create a “Metric and Reporting” task code in your time management system and then weep over how much time your employees spend manually creating dashboards/reports/trackers. The total is truly incredible and makes the perfect ROI case for an automated solution. Stay tuned for much more on this in future blogs – from eliminating data silos and “hidden data factories” to incorporating technology solutions.
- Control Portion Size. Digest only what is meaningful. We all love metrics and reports. But are all of them relevant and meaningful? To everyone? Building governance around the data that is consumed at your company will enable you to begin eliminating the noise in your dashboards/reports and focus on the relevant signal. Understand there are different consumers of data, with different appetites. Question lengthy existing metric lists when a metric has no business reason behind it or no action expected from it. Interestingly, regulators continue to push the industry to focus on only what is important for patient safety and data integrity – ICH E6(R2) drives companies to adopt a risk-based approach to quality management. This is a good thing. Not only does this put the patient first, but it also makes good data sense. From vendor oversight metrics and Key Performance Indicators to Key Risk Indicators and Quality Tolerance Limits – we seem to overindulge in what we measure, inevitably leaving unconsumed data on our plates.
- Read More. Become more data literate. Understand that there are different skill sets required to build an analytical capability. Spending time developing critical and analytical thinking skills is an important and necessary step in building a data culture. Data literacy is now required from everyone – not just the statisticians or data scientists who work with data routinely. Understand when to use a different measure of central tendency (median vs. mean). Remind yourself what a standard deviation is. Understand that there is natural variation in your data and that this is acceptable. Otherwise we end up designing our dashboards or conditionally formatting our spreadsheets (ugh) with narrow thresholds, resulting in a blinding mix of greens and reds – which, by the way, is excruciating for the roughly 8% of men (and about 0.5% of women) who are colorblind. Develop internal standards of performance. Collect. Clean. Access. Analyze. Benchmark. Question.
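For anyone who wants to try the Friday Afternoon Measurement from the first resolution on a data extract rather than by hand, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the field names, the two validation rules, and the function names `record_errors` and `fam_score` are hypothetical and do not come from Tom Redman's method description or from any clinical system. A real exercise would use 10-15 critical fields and rules agreed with the people who know the data.

```python
from datetime import date

# Hypothetical critical fields for an investigative-site record.
# A real FAM exercise would list 10-15 fields chosen by the data owners.
CRITICAL_FIELDS = [
    "site_id",
    "planned_activation",
    "actual_activation",
    "screened",
    "randomized",
]


def record_errors(record):
    """Return the problems found in one record (an empty list means error-free)."""
    errors = []
    # Rule 1 (assumed): every critical field must be populated.
    for field in CRITICAL_FIELDS:
        if record.get(field) in (None, ""):
            errors.append(f"{field}: missing")
    # Rule 2 (assumed): a site cannot randomize more subjects than it screened.
    screened = record.get("screened")
    randomized = record.get("randomized")
    if isinstance(screened, int) and isinstance(randomized, int):
        if randomized > screened:
            errors.append("randomized exceeds screened")
    return errors


def fam_score(records, sample_size=100):
    """Count error-free records among the most recent `sample_size` records.

    Assumes `records` is ordered oldest to newest. A record with several
    errors still counts only once against the score.
    """
    sample = records[-sample_size:]
    return sum(1 for record in sample if not record_errors(record))


# Made-up example data: four site records, two of them with problems.
sites = [
    {"site_id": "S01", "planned_activation": date(2019, 3, 1),
     "actual_activation": date(2019, 3, 15), "screened": 12, "randomized": 9},
    {"site_id": "S02", "planned_activation": date(2019, 4, 1),
     "actual_activation": None, "screened": 5, "randomized": 4},
    {"site_id": "S03", "planned_activation": date(2019, 4, 10),
     "actual_activation": date(2019, 5, 2), "screened": 3, "randomized": 7},
    {"site_id": "S04", "planned_activation": date(2019, 6, 1),
     "actual_activation": date(2019, 6, 20), "screened": 0, "randomized": 0},
]

print(f"{fam_score(sites)} of {len(sites)} records are error-free")
```

The sketch reports a count rather than a percentage on purpose: with a sample of 100 records the count reads directly as a score out of 100, which is how the exercise is meant to be discussed on a Friday afternoon.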
These resolutions are in no particular order as they are all important. A critical underlying component is having a culture that understands the importance of data and fosters supportive behaviors around it. Just like having a partner to exercise with increases the likelihood that you will go to the gym, having a leadership team (equipped with a data strategy) that evangelizes data as an important asset is the most important element for analytical success.
Next year does not have to be the same as previous years when it comes to instilling a new appreciation and understanding of data. We can do this together and we can do it now.
1. For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights, S Lohr, The New York Times, Aug 17, 2014.
2. Only 3% of Companies’ Data Meets Basic Quality Standards, T Nagle, T Redman, D Sammon, Harvard Business Review, Sep 11, 2017.