August 29, 2025
Duncan Holbrook-Smith, PhD
Perhaps the most fundamental challenge in mass spectrometry-based metabolomics is converting the signals from the mass spectrometer into an identified metabolite, generally referred to as “metabolite annotation,” which I will shorten to simply “annotation.” Accurate, high-quality annotation is essential for the correct interpretation of a metabolomics experiment. It can mean the difference between biological insight and confusion. Some types of signal patterns from the mass spectrometer can be associated with specific metabolites by reference to basic chemical properties that are generally known without regard to mass spectrometry, e.g. the accurate mass of a metabolite can be known from its chemical formula. Other types must be empirically connected to a metabolite, e.g. the fragmented mass signal pattern generated when whole metabolites break apart through collisions within the mass spectrometer. Fortunately, over the years, both types of information have been well catalogued in databases, and here in 2025, we can annotate features by matching observed mass spectrometry data to multiple reference libraries of mass spectrometry properties of metabolites.
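To make the "accurate mass from a chemical formula" idea concrete, here is a minimal Python sketch. It assumes simple formulas without parentheses or charges, a small hand-typed table of monoisotopic masses, and a singly protonated ion ([M+H]+). The function names are my own, chosen for illustration, and are not part of any particular software package.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of a few common elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON = 1.00727647  # mass of a proton (Da), added for an [M+H]+ ion


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C6H13NO2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def mz_protonated(formula):
    """Theoretical m/z of the singly protonated ion [M+H]+."""
    return monoisotopic_mass(formula) + PROTON


def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed m/z in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


leucine_mz = mz_protonated("C6H13NO2")  # leucine and isoleucine share this formula
print(round(leucine_mz, 4), round(ppm_error(132.1021, leucine_mz), 2))
```

Because leucine and isoleucine share the formula C6H13NO2, this calculation gives exactly the same m/z for both, which is why accurate mass alone cannot tell them apart.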
The details of the annotation process and its outcome can vary in significant ways depending on the assumptions that go into the annotation workflow, the instrumentation acquiring the data, and the analysis mode that was used for the actual measurements. These differences can have huge effects on what data you get and how you can interpret it. In this post, I want to arm you with enough information to understand the key points of how annotation is done and how that affects your research.
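I’m going to focus on the three main properties of an ion that are typically detected by a liquid chromatography-mass spectrometry (LC-MS) system: the mass-to-charge ratio (m/z) of the intact ion (MS1), the fragmentation spectrum of that ion (MS2), and its chromatographic retention time (RT). My apologies in advance to ion mobility fans.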
These different properties are used together (or in some cases alone – more about that later) to map a feature detected by mass spectrometry onto the set of small molecules that could have potentially given rise to it. This list of approaches is not meant to be exhaustive, but should cover 95% of what you can expect from normal mass spectrometry-based untargeted metabolomics.
| MS1 | MS2 | RT | Description |
| --- | --- | --- | --- |
| X | | X | Ions are not targeted for fragmentation, so annotation is performed by matching the known retention times for compounds with the m/z generated when they ionize (based on MS1). The number of annotations is limited by the size of the RT library against which the annotation is performed; the accuracy of annotations is dependent on the resolution and mass accuracy of the spectrometer. The weakness of this approach is that compounds can share the same RT and chemical formula (e.g. leucine and isoleucine, or ribose-phosphate and ribulose-phosphate). If only one of them is included in the RT library, a peak composed of a mixture of ionized species can be misannotated as a single species. |
| X | X | | Ions are targeted for fragmentation, and the m/z of the unfragmented ion as well as the resulting MS2 spectrum of the fragment ions are matched against a library of MS1/MS2 spectra. Despite MS2 being an empirically determined property, MS2 patterns are consistent enough that MS1/MS2 libraries can be shared between many instruments, and they are not directly affected by chromatography. This form of annotation can be performed easily on newly developed methods without going through the long process of generating an in-house RT library for the specific chromatographic system being used. Indeed, this approach is often used in experiments where a sample is continuously perfused into the mass spectrometer in the absence of any chromatography. Some compounds have the same molecular formula and generate similar fragments despite being separable by chromatography, so this approach can also lead to misannotations based on biases in the MS1/MS2 library. |
| X | X | X | In this approach, an annotation is only ascribed to a feature when it has an RT match from the library, as well as an acceptable MS1/MS2 match. This approach to annotation will, all other things being equal, provide the highest annotation confidence available. However, even these annotations can be faulty. If a compound has the same RT and mass spec properties as a compound which is not included in the library, the feature can still be misannotated. Taking this approach to annotation will also generally generate the shortest list of annotations compared to the other methods listed. |
| X | | | Annotation based on MS1 alone offers less specificity in annotation compared to other approaches, but this downside can be outweighed by other considerations. When MS1-only annotation is performed in a system with very high mass accuracy and resolution (as is the case in GMet’s ultra high-throughput metabolomics system), a unique chemical formula will match to that ion in over 90% of cases. In many contexts, knowing the molecular formula is enough to formulate a useful hypothesis. Moreover, MS1-only annotation approaches can have very high coverage and sensitivity. Further, not needing a retention time for annotation eliminates the requirement for chromatography, which can allow for a very high sample throughput. This annotation mode is most appropriate for screening and hypothesis generation, and can be very powerful and cost-efficient when combined with more extensive annotation approaches for hit validation. |
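
The four schemes in the table can be thought of as the same matching routine with different filters switched on. The Python sketch below is only an illustration under stated assumptions: the library entries, retention times, fragment intensities, tolerances, and the `annotate` and `cosine` helpers are all hypothetical placeholders that I made up for this example, not measured values or a real annotation pipeline.

```python
from math import sqrt

# Hypothetical reference library. The m/z values are theoretical [M+H]+ masses;
# the RT (minutes) and MS2 peaks are invented placeholders, not measurements.
LIBRARY = [
    {"name": "leucine", "mz": 132.1019, "rt": 5.2, "ms2": {86.10: 100.0, 44.05: 20.0}},
    {"name": "isoleucine", "mz": 132.1019, "rt": 4.9, "ms2": {86.10: 100.0, 69.07: 30.0}},
    {"name": "proline", "mz": 116.0706, "rt": 3.1, "ms2": {70.07: 100.0}},
]


def cosine(spec_a, spec_b, decimals=2):
    """Cosine similarity between two {m/z: intensity} spectra after rounding m/z."""
    a = {round(mz, decimals): inten for mz, inten in spec_a.items()}
    b = {round(mz, decimals): inten for mz, inten in spec_b.items()}
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def annotate(feature, library, ppm_tol=5.0, rt_tol=None, min_cosine=None):
    """Return library names matching on MS1, plus RT and/or MS2 when enabled."""
    hits = []
    for ref in library:
        ppm = abs(feature["mz"] - ref["mz"]) / ref["mz"] * 1e6
        if ppm > ppm_tol:
            continue  # MS1 accurate mass does not match
        if rt_tol is not None and abs(feature["rt"] - ref["rt"]) > rt_tol:
            continue  # retention time does not match
        if min_cosine is not None and cosine(feature["ms2"], ref["ms2"]) < min_cosine:
            continue  # fragmentation pattern does not match
        hits.append(ref["name"])
    return hits


# A made-up observed feature.
feature = {"mz": 132.1021, "rt": 5.15, "ms2": {86.10: 100.0, 44.05: 18.0}}

print(annotate(feature, LIBRARY))                              # MS1 only
print(annotate(feature, LIBRARY, rt_tol=0.1))                  # MS1 + RT
print(annotate(feature, LIBRARY, min_cosine=0.9))              # MS1 + MS2
print(annotate(feature, LIBRARY, rt_tol=0.1, min_cosine=0.9))  # MS1 + MS2 + RT
```

With these placeholder numbers, the MS1-only and MS1/MS2 modes both return leucine and isoleucine, while the modes that include RT return leucine alone. If isoleucine were missing from the library, every mode would report leucine only, which is the library-bias failure described in the table.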
As you can see, all annotation schemes come with advantages as well as drawbacks. Which type of annotation you will want to use in your study depends on what your biological questions are, as well as the nature of the biological samples and the design of the study.
One final thought to consider: the overall research process of finding and developing biomarkers is rarely limited by the power of mass spectrometry and analytical chemistry. Modern LC-MS/MS methods that include a range of chromatography and fragmentation strategies, combined with the broad ability to access an authentic example of almost any compound, enable the (eventual) identification of the vast majority of potential molecular biomarkers. The dominant challenges almost always come down to the biological reproducibility of the marker’s behavior, the signal-to-noise it exhibits in reasonably sized studies, and the statistical discrimination power that it provides. So, if it makes practical sense in your study to start with simple MS1 accurate mass annotation when you are first establishing hypotheses, you will still be in a good position to find high quality markers that can be validated downstream in what is always a multi-step process.
If you have any questions about which approaches are appropriate for your study, don’t hesitate to contact GMet. We would be happy to talk with you.