Under the Hood #2: Automated Data Check
This post dives into how Move Analytical automates the assessment of mass spec data quality before metabolite quantification. In most labs, this step still looks the same: open a few raw data files and decide whether things “look good enough” to move on. We explore the limitations of this approach and why automation is preferable.
Motivation
Mass spectrometrists are skeptics by nature. If you’ve been doing mass spectrometry long enough, you’ve probably lived through an era when mass specs were a lot less dependable. Even today, many of us walk into the lab each morning hoping everything ran overnight…while mentally preparing to troubleshoot. The overall motivation for a reliable “Data Check” before moving on to the data analysis phase is to save time by not having to go back and recollect data after the lab and instrument have already moved on to other projects. It is common practice, either during or after data collection, to inspect (“check”) the raw data files. This typically involves opening a subset of runs from the sequence in the mass spec vendor software and just “taking a look” to see whether the data is generally of good quality. What mass spectrometrists look at in this data varies, but it generally revolves around a few specific items:
- Intensity – Is there sufficient signal for one or more analytes to indicate that the sample was prepared and run correctly?
- Chromatography – Do the peaks generally have good shape?
This practice makes sense; the last thing you want is to turn the instrument over to the next project or the next person in line and proceed with further data analysis, only then to realize there was a problem with the data collected in your experiment. However, there are real drawbacks to an ad hoc approach to the data check.
- Time-consuming. Looking through lots of raw data files is slow.
- Incomplete. Most users do not check every run (who has time for that?), so what happens if one particular sample analysis is missed or of low quality?
- Subjective. The decision on quality can vary by user and by day (how much coffee have you had that morning, and did your kids keep you up?).
- Less reliable for non-experts. A non-expert or new user may not be able to confidently make a decision on data quality.
The cost of missing a data quality issue isn’t just bad data—it’s lost instrument time, delayed decisions, and the need to revisit experiments long after the lab has moved on.
How Can It Be Done Differently?
As we have previously discussed for system suitability testing, we believe it is desirable to use the fewest metrics that reliably answer a simple question: Did the sample preparation and data collection generally go as planned?
To improve upon the status quo, we have identified several key themes during our discussions with customers:
- Automation. We’d like the process to run automatically so it’s available to everyone without any direct actions from the user.
- Every file. We’d like a result for every file to lessen the chance of a one-off missed injection or sample prep error slipping through the cracks during the check.
- Dashboard. A heads-up dashboard display during the experiment is nice for ‘checking in’ on the running sequence before you go home, or even remotely.
- Actionable. The results of the metrics should yield an intuitive result such that the user generally knows what to do if a sample fails the data check.
- Permanent Record. The results of the data check should remain as part of the experimental record of the project, in case questions arise later.
To accomplish this task, we favor the application of hard cutoffs for each assay metric, designed to meet the needs of a particular analysis. For instance, if a critical readout of the assay is the independent quantification of isomers that need to be chromatographically resolved, assessing critical pair resolution for every run is a good data check metric. This is something that is difficult to do with traditional manual checks (usually no better than an ‘eyeball assessment’ by an expert) but can readily be automated and placed on a dashboard.
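As a concrete illustration, here is a minimal Python sketch of applying a hard cutoff to a critical pair resolution value for a single run. The Peak structure, the 1.5 cutoff, and the example numbers are hypothetical and not our actual implementation; the calculation is the standard resolution formula Rs = 2(t2 − t1)/(w1 + w2) using baseline peak widths.

```python
from dataclasses import dataclass

@dataclass
class Peak:
    rt: float          # apex retention (or migration) time, minutes
    width_base: float  # baseline peak width, minutes

def critical_pair_resolution(first: Peak, second: Peak) -> float:
    """Standard resolution: Rs = 2 * (t2 - t1) / (w1 + w2), using baseline widths."""
    return 2.0 * (second.rt - first.rt) / (first.width_base + second.width_base)

# Illustrative hard cutoff; a real assay would set this from validation data.
MIN_RESOLUTION = 1.5

def check_critical_pair(first: Peak, second: Peak) -> bool:
    return critical_pair_resolution(first, second) >= MIN_RESOLUTION

# Example: hypothetical peaks for a critical isomer pair in one run
peak_a = Peak(rt=9.42, width_base=0.12)
peak_b = Peak(rt=9.61, width_base=0.13)
print(check_critical_pair(peak_a, peak_b))  # True if Rs >= 1.5 (here Rs ~ 1.52)
```

In practice the cutoff itself would come from assay validation data, as described above, and the check would run automatically for every file in the sequence.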
Practical Application
In our metabolomics kits and software, we have taken user input (the themes above) and specific assay validation data to simplify the data check as much as possible. Currently we have implemented the following tests across all assays (see the image below for an example of the implementation, and the sketch after this list for how such checks might be evaluated). Each metric appears in real time on a dashboard, within seconds of the raw data file completing acquisition. This allows the operator to act immediately…whether that means stopping a sequence, flagging a sample, or simply gaining confidence that things are running as expected.
- Intensity. We use summed extracted ion signal from standards spanning multiple classes and retention times. In practice, this gives us a fast sanity check that the sample was prepared correctly and that the instrument is responding as expected with minimal drift.
- Retention Index. To perform reliable compound peak assignment and quantification, the relative elution order and position of compounds in the chromatogram are key. Therefore, we calculate a retention index (relative retention time) for at least two compounds of different classes, to ensure that there are no changes in separation or sample preparation conditions that would cause poor data quality.
- Critical Pair Resolution. Resolution between a critical analyte pair in the separation is one of the best real-time metrics for separation quality. We establish the expectation for acceptable resolution for at least one analyte critical pair in each kit. In the case of MoveKit CE, we use Isoleucine/Leucine.
- FAIR principles. The data check assessment itself should align with FAIR data principles. We do this in two ways: (1) the raw data analyses used for assessment are saved as a Skyline file in the project folder, and (2) the assessment results are stored as part of the permanent MoveApp project record.
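To show how these pieces could fit together, below is a simplified, hypothetical sketch of a per-file data check that evaluates a few metrics against hard cutoffs and writes the result to the project record. The metric names, cutoff values, and file layout are illustrative assumptions; in the actual workflow the metrics are extracted upstream (e.g., from the Skyline analyses mentioned above) and the results appear on the MoveApp dashboard.

```python
import json
from pathlib import Path

# Illustrative hard cutoffs; real values would come from assay validation data.
CHECKS = {
    "intensity_sum":   lambda v: v >= 1.0e6,          # summed extracted ion signal of standards
    "retention_index": lambda v: 0.95 <= v <= 1.05,   # relative retention time vs. a reference run
    "resolution":      lambda v: v >= 1.5,            # critical pair resolution (e.g., Ile/Leu)
}

def data_check(run_name: str, metrics: dict, record_dir: Path) -> dict:
    """Evaluate one raw file's metrics against hard cutoffs and persist the result."""
    detail = {name: bool(CHECKS[name](value)) for name, value in metrics.items()}
    record = {"run": run_name, "metrics": metrics, "detail": detail, "passed": all(detail.values())}
    # Keep the assessment as part of the permanent project record.
    record_dir.mkdir(parents=True, exist_ok=True)
    (record_dir / f"{run_name}.datacheck.json").write_text(json.dumps(record, indent=2))
    return record

# Example: metric values would come from an upstream extraction of the raw file.
result = data_check(
    "sample_042",
    {"intensity_sum": 2.3e6, "retention_index": 1.01, "resolution": 1.62},
    Path("project/data_check"),
)
print(result["passed"])  # True -> all checks pass; False -> flag the run on the dashboard
```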
What Metrics Are Left Out?
When we review our data check workflow with technical experts, common questions emerge about metrics that we notably do not evaluate at this step. A few words on those here.
- Mass Accuracy. MoveApp workflows explicitly check mass accuracy during the system suitability test (SST) step. Mass accuracy failure during a batch is uncommon.
- Metabolite identifications. Less common in metabolomics but very common in proteomics is the evaluation of the number of identifications on a per-run basis. Since our analysis uses a more traditional targeted workflow, we have not implemented this kind of metric.
- Sample loading. To allow the same assessment to be used across all kit sample types (blanks, QCs, calibrators, and unknown samples), we defer quantitative assessment of sample loading, based on specific analytes, to the data analysis step.
Of course, we are always looking for ways to improve our product, so if we find that we currently exclude a measurement that is critical for spotting a sample prep or instrument performance failure, we will update the data check…and we fully expect to do so as time goes on! We remain dedicated to listening to, and working with, our customers to continually improve our products.
Closing Thoughts
Ultimately, a good data check doesn’t replace expertise. An automated workflow encodes it, making high-quality decisions repeatable, scalable, and available to every user in the lab. In a perfect world, we hope this workflow helps labs gain confidence in their raw data very soon after data collection. In a less perfect world (or on a less perfect day), we believe a good “Data Check” helps labs realize that intervention is required sooner rather than later. We hope you will give it a try and let us know what you think.
About the Authors
All Move Analytical LLC cofounders contributed to the content of this blog post and implementation of our Data Check. J. Will Thompson serves as Operations Lead, J. Scott Mellors as Science Lead, James Campbell as Software and AI Lead, and J. Michael Ramsey as Strategic Lead.