Metabolite annotation – from m/z to ID

August 29, 2025

Duncan Holbrook-Smith, PhD

Perhaps the most fundamental challenge in mass spectrometry-based metabolomics is converting the signals from the mass spectrometer into an identified metabolite, a process generally referred to as “metabolite annotation,” which I will shorten to simply “annotation.” Accurate, high-quality annotation is essential for the correct interpretation of a metabolomics experiment. It can mean the difference between biological insight and confusion. Some types of signal patterns from the mass spectrometer can be associated with specific metabolites by reference to basic chemical properties that are known independently of mass spectrometry, e.g. the accurate mass of a metabolite can be calculated from its chemical formula. Other types must be empirically connected to a metabolite, e.g. the fragment mass pattern generated when whole metabolites break apart through collisions within the mass spectrometer. Fortunately, over the years, both types of information have been well catalogued in databases, and here in 2025, we can annotate features by matching observed mass spectrometry data to multiple reference libraries of mass spectrometry properties of metabolites.

The details of the annotation process and its outcome can vary in significant ways depending on the assumptions that go into the annotation workflow, the instrumentation acquiring the data, and the analysis mode that was used for the actual measurements. These differences can have huge effects on what data you get and how you can interpret it. In this post, I want to arm you with enough information to understand the key points of how annotation is done and how that affects your research.

Fig. 1: Schematic representation of features used for annotation.

 

I’m going to focus on the three main properties of an ion that are typically detected by a liquid chromatography-mass spectrometry (LC-MS) system – with apologies in advance to ion mobility fans.

  1. MS1 – This is the mass spectrum detected by the instrument in the absence of intentionally fragmenting any of the ions. In this mode, the contents of a sample are ionized such that the intact, charged compound is drawn into the mass spectrometer. Based on their molecular formulae (e.g. C5H10N2O3 for the amino acid glutamine), ionized metabolites will give rise to signals with distinct mass-to-charge ratios (m/z) that follow predictable patterns. The mass spectrometer will then measure the ion intensities associated with a range of mass-to-charge ratios. With sufficient resolution and mass accuracy, collecting MS1 data can tell you the molecular formulae of the ions detected by the mass spectrometer. What’s so nice about MS1 is that the expected MS1 m/z of a compound can be calculated from its chemical formula – it is a fundamental chemical description, independent of any mass spectrometer. Since we know the chemical formulae of millions of compounds, we can draw on a huge library of expected m/z values, and thus annotate quite broadly on that basis. (Note: the scale and construction of such libraries can also have an impact on annotation and is a topic all on its own, which will be addressed in a future write-up.)
  2. MS2 – This is the mass spectrum that is detected after an ion or a collection of ions is deliberately fragmented by the mass spectrometer. The mass spectrometer can then detect the m/z values and intensities for those fragments. The set of fragments that are generated when an ion breaks apart depends in large part on its molecular structure, so this approach can often be used to distinguish compounds with the same molecular formula but a different chemical structure. MS2 therefore provides more structural information when added to MS1 data, but which MS2 ions a particular compound generates on fragmentation is usually an empirical question (I won’t cover exceptions here). This means that the library of MS2 spectra you can compare your data against is going to be much smaller in scale than for MS1, and more specific to your instrument and its settings. Therefore, if you want to annotate using MS2, you will have to leave a lot of ions unannotated. Of course, this reduction in the number of annotations comes with increased specificity in the annotation.
  3. Retention time – This is a feature of the chromatographic separation and does not depend on the mass spectrometer. It refers to the time into a chromatographic separation process (either liquid chromatography or gas chromatography) at which a particular feature is detected. A chromatography method can be chosen to separate molecules based on any of a variety of properties of the compounds being analyzed. For example, retention of compounds in liquid chromatography on a reversed-phase column will tend to be based on their hydrophobicity; alternatively, separation can be based on molecular charge or molecular size – the list of options is long. Retention time is a nice property to incorporate for annotation because it’s orthogonal to mass spectrometry. Some metabolites have the same molecular formula and generate very similar fragments, which means that it’s impossible to distinguish between them by their MS1 or MS2 spectra. Some of them, though, can be resolved fairly easily by chromatography. However, the retention time of a compound depends on the chromatography system used to separate the analytes, in terms of both the stationary and mobile phases. It is not an inherent property of a molecule, and it can even shift over time or over the course of a single study! This means that retention time libraries are generally going to be highly empirical, smaller than MS1 libraries, and specific to a particular chromatography system. As a result, while an accurate retention time library will augment mass information and increase confidence in annotations, requiring retention time in your annotation workflow will also reduce the number of annotations you can get.
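
To make the MS1 point concrete, here is a minimal Python sketch of how an expected m/z can be computed from a chemical formula alone. It is an illustration under stated assumptions, not a description of any particular software: it assumes a singly charged protonated or deprotonated ion, handles only simple formulae without parentheses or isotope labels, and includes only a handful of elements. The function names (`monoisotopic_mass`, `expected_mz`, `ppm_error`) are made up for this example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
# Only a few elements common in metabolites are included in this sketch.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
}
PROTON_MASS = 1.007276  # Da


def monoisotopic_mass(formula: str) -> float:
    """Sum element masses for a simple formula such as 'C5H10N2O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC_MASS[element] * (int(count) if count else 1)
    return total


def expected_mz(formula: str, adduct: str = "[M+H]+") -> float:
    """Expected m/z of a singly charged protonated or deprotonated ion."""
    mass = monoisotopic_mass(formula)
    if adduct == "[M+H]+":
        return mass + PROTON_MASS
    if adduct == "[M-H]-":
        return mass - PROTON_MASS
    raise ValueError(f"adduct not covered by this sketch: {adduct}")


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


glutamine = "C5H10N2O3"
print(f"{monoisotopic_mass(glutamine):.4f}")  # 146.0691
print(f"{expected_mz(glutamine):.4f}")        # 147.0764
```

Because the calculation depends only on the formula, leucine and isoleucine (both C6H13NO2) produce exactly the same expected m/z, which is why MS1 alone cannot tell such isomers apart.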

These different properties are used together (or in some cases alone – more about that later) to map a feature detected by mass spectrometry onto the set of small molecules that could have potentially given rise to it. This list of approaches is not meant to be exhaustive, but should cover 95% of what you can expect from normal mass spectrometry-based untargeted metabolomics.

| MS1 | MS2 | RT | Description |
|-----|-----|----|-------------|
| X |  | X | Ions are not targeted for fragmentation, so annotation is performed by matching the known retention times for compounds with the m/z generated when they ionize (based on MS1). The number of annotations is limited by the size of the RT library against which the annotation is performed; the accuracy of annotations is dependent on the resolution and mass accuracy of the spectrometer. The weakness of this approach is that it’s possible for compounds to have the same RT and chemical formula (e.g. leucine and isoleucine, or ribose-phosphate and ribulose-phosphate); if only one of them is included in the RT library, a peak composed of a mixture of ionized species can be misannotated as only one species. |
| X | X |  | Ions are targeted for fragmentation, and the m/z of the unfragmented ion as well as the resulting MS2 spectrum of the fragment ions are matched against a library of MS1/MS2 spectra. Despite MS2 being an empirically determined property, MS2 patterns are consistent enough that MS1/MS2 libraries can be shared between many instruments, and they are not directly affected by chromatography. This form of annotation can be performed easily on newly developed methods without going through the long process of generating an in-house RT library for the specific chromatographic system being used. Indeed, this approach is often used in experiments where a sample is continuously perfused into the mass spectrometer in the absence of any chromatography. Some compounds have the same molecular formulae and generate similar fragments despite being separable by chromatography, so this approach can also lead to misannotations based on biases in the MS1/MS2 library. |
| X | X | X | In this approach, an annotation is only ascribed to a feature when it has an RT match from the library, as well as an acceptable MS1/MS2 match. This approach to annotation will, all other things being equal, provide the highest annotation confidence available. However, even these annotations can be faulty. If a compound has the same RT and mass spec properties as a compound which is not included in the library, the feature can still be misannotated. Taking this approach to annotation will also generally generate the shortest list of annotations compared to the other methods listed. |
| X |  |  | Annotation based on MS1 alone offers less specificity in annotation compared to other approaches, but this downside can be outweighed by other considerations. When MS1-only annotation is performed in a system with very high mass accuracy and resolution (as is the case in GMet’s ultra high-throughput metabolomics system), a unique chemical formula will match to that ion in over 90% of cases. In many contexts, knowing the molecular formula is enough to formulate a useful hypothesis. Moreover, MS1-only annotation approaches can have very high coverage and sensitivity. Further, not needing a retention time for annotation eliminates the requirement for chromatography, which can allow for a very high sample throughput. This annotation mode is most appropriate for screening and hypothesis generation, and can be very powerful and cost-efficient when combined with more extensive annotation approaches for hit validation. |
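
The trade-offs between these annotation modes can be sketched in a few lines of Python. This is a toy illustration, not GMet’s workflow or any real library’s API: the `LibraryEntry`, `Feature`, `cosine`, and `annotate` names are hypothetical, the retention times and MS2 spectra are invented for the example rather than measured, and the tolerances (5 ppm, 0.2 min, cosine of at least 0.7) are arbitrary assumptions.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Optional


@dataclass
class LibraryEntry:
    name: str
    mz: float                    # expected precursor m/z
    rt: Optional[float] = None   # retention time (min), if in the RT library
    ms2: Optional[dict] = None   # fragment m/z -> intensity, if in the MS2 library


@dataclass
class Feature:
    mz: float
    rt: Optional[float] = None
    ms2: Optional[dict] = None


def cosine(spec_a: dict, spec_b: dict, tol: float = 0.01) -> float:
    """Cosine similarity, pairing each fragment with its closest unused match within tol Da."""
    used, dot = set(), 0.0
    for mz_a, int_a in spec_a.items():
        candidates = [m for m in spec_b if abs(m - mz_a) <= tol and m not in used]
        if candidates:
            best = min(candidates, key=lambda m: abs(m - mz_a))
            used.add(best)
            dot += int_a * spec_b[best]
    norm_a = sqrt(sum(i * i for i in spec_a.values()))
    norm_b = sqrt(sum(i * i for i in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def annotate(feature, library, use_rt=False, use_ms2=False,
             ppm_tol=5.0, rt_tol=0.2, min_cosine=0.7):
    """Return names of library entries that match on every requested property."""
    hits = []
    for entry in library:
        if abs(feature.mz - entry.mz) / entry.mz * 1e6 > ppm_tol:
            continue  # MS1 match is always required
        if use_rt and (entry.rt is None or feature.rt is None
                       or abs(feature.rt - entry.rt) > rt_tol):
            continue  # entries without a library RT cannot be annotated
        if use_ms2 and (entry.ms2 is None or feature.ms2 is None
                        or cosine(feature.ms2, entry.ms2) < min_cosine):
            continue  # entries without a library spectrum cannot be annotated
        hits.append(entry.name)
    return hits


# Three C6H13NO2 isomers share one [M+H]+ m/z. RTs and spectra are invented.
LIBRARY = [
    LibraryEntry("leucine", 132.1019, rt=5.4, ms2={86.0964: 100, 44.0495: 10}),
    LibraryEntry("isoleucine", 132.1019, rt=5.1, ms2={86.0964: 100, 69.0699: 15}),
    LibraryEntry("norleucine", 132.1019),  # formula known, no RT or MS2 on file
]
feature = Feature(mz=132.1020, rt=5.45, ms2={86.0965: 100, 44.0497: 8})

print(annotate(feature, LIBRARY))                # ['leucine', 'isoleucine', 'norleucine']
print(annotate(feature, LIBRARY, use_ms2=True))  # ['leucine', 'isoleucine']
print(annotate(feature, LIBRARY, use_rt=True))   # ['leucine']
```

In this toy case, MS1 alone returns every isomer in the library, adding MS2 drops the compound with no reference spectrum but cannot separate two isomers with similar fragments, and adding RT narrows the list to one name. That single name is only as trustworthy as the library is complete, which is the caveat noted for the combined approach above.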

 

As you can see, all annotation schemes come with advantages as well as drawbacks. Which type of annotation you will want to use in your study depends on what your biological questions are, as well as the nature of the biological samples and the design of the study.

One final thought to consider: the overall research process of finding and developing biomarkers is rarely limited by the power of mass spectrometry and analytical chemistry. Modern LC-MS/MS methods that include a range of chromatography and fragmentation strategies, combined with the broad ability to access an authentic example of almost any compound, enable the (eventual) identification of the vast majority of potential molecular biomarkers. The dominant challenges almost always come down to the biological reproducibility of the marker’s behavior, the signal-to-noise it exhibits in reasonably sized studies, and the statistical discrimination power that it provides. So, if it makes practical sense in your study to start with simple MS1 accurate mass annotation when you are first establishing hypotheses, you will still be in a good position to find high-quality markers that can be validated downstream in what is always a multi-step process.

If you have any questions about what approaches are appropriate for your study, don’t hesitate to contact GMet. We would be happy to talk with you.
