1 Introduction

The year 2020 saw an enormous rise in applications of ion mobility mass spectrometry (IMS) and data-independent acquisition (DIA) methods in both metabolomics and lipidomics. In terms of application, mass spectrometry as a technology promises advanced care for cancer patients in clinical and intraoperative use (J. Zhang, Ge, et al., 2020; Zhang, Sans, et al., 2020), and its reported uses ranged from imaging mass spectrometry (MSI)-based natural products (NPs) discovery (Spraker et al. 2020), nanoscale secondary ion mass spectrometry (nanoSIMS) for subcellular MS imaging and quantitative analysis in organelles (Thomen et al. 2020), and capturing urban sources of contamination with high resolution mass spectrometry (HRMS) (Bowen et al., 2020), to detection of COVID-19 disease signatures (Mahmud & Garrett, 2020).

From an analytical method development standpoint, notable developments included a plasma pseudotargeted metabolomics method using ultra-high-performance liquid chromatography–mass spectrometry (UHPLC-MS) (Zheng et al. 2020) and a case for the combined use of nuclear magnetic resonance spectroscopy and mass spectrometry approaches in metabolomics (Letertre et al. 2020). For volume-limited samples, solutions for sub-nanoliter metabolomics via LC–MS/MS, such as the pulsed MS ion generation method known as triboelectric nanogenerator inductive nanoelectrospray ionization (TENGi nanoESI) MS (Li et al. 2020), were introduced. Flow-injection Orbitrap mass spectrometry (FI-MS) enabled reproducible detection of ~ 9,000 and ~ 10,000 m/z features in metabolomics and lipidomics analysis of serum samples, respectively, with a sample scan time of ~ 15 s and duty time of ~ 30 s; a ~ 50% increase versus current spectral-stitching FI-MS methods (Sarvin et al. 2020). A spatial metabolomics pipeline (metaFISH) that combines fluorescence in situ hybridization (FISH) microscopy and high-resolution atmospheric-pressure matrix-assisted laser desorption/ionization mass spectrometry to image host–microbe symbioses and their metabolic interactions was also reported (Geier et al. 2020). Another study compared the full-scan, data-dependent acquisition (DDA), and data-independent acquisition (DIA) methods in HR LC–MS/MS based metabolomics and revealed that spectral quality is better in DDA, with an average dot product score 83.1% higher than in DIA, whereas the number of MS2 spectra (spectra quantity) is larger in DIA (Guo & Huan, 2020a). Furthermore, it was shown that DDA mode consistently generated fewer uniquely found significant features than full-scan and DIA modes (Guo & Huan, 2020b).

Raman spectroscopy, followed by stimulated Raman scattering (SRS) microscopy and Raman-guided subcellular pharmaco-metabolomics in metastatic melanoma cells, revealed intracellular lipid droplets that helped identify a previously unknown susceptibility of lipid mono-unsaturation within de-differentiated mesenchymal cells with innate resistance to BRAF inhibition (Du et al. 2020). Application of 31P NMR was shown to hold potential for expanding the coverage of the metabolome by detecting phosphorus-containing metabolites (Bhinderwala et al. 2020).

The effectiveness of the flow injection analysis-continuous accumulation of selected ions Fourier transform ion cyclotron resonance mass spectrometry (FIA-CASI-FTMS) workflow utilizing isotopic fine structure (IFS) for molecular formula assignment was demonstrated for metabolomics applications (Thompson et al. 2020). A buffer modification workflow (BMW) was also demonstrated, in which the same sample is run by LC–MS both in liquid chromatography solvent with 14NH3–acetate buffer and in solvent with the buffer modified with 15NH3–formate; this results in characteristic mass and signal intensity changes for adduct peaks, facilitating their annotation (Lu et al. 2020). In other innovative applications, the use of short columns and direct solvent switches allowed for fast screening (3 min per polarity), where a total of 50 commonly reported diagnostic or explorative biomarkers were validated with a limit of quantification comparable to that of conventional LC–MS/MS (van der Laan et al. 2020).

From the standpoint of data analysis, metabolomics as a field is starting to benefit from machine learning (ML) (Liebal et al. 2020) and deep learning (DL) (Pomyen et al. 2020; Sen et al. 2020) approaches that address diverse challenges from data preprocessing to biological interpretation. In the context of systems and personalized medicine, LIONESS (Linear Interpolation to Obtain Network Estimates for Single Samples) and ssPCC (single sample network based on Pearson correlation) were evaluated and compared for metabolite–metabolite association networks (Jahagirdar & Saccenti, 2020). For the annotation of low resolution GC–MS data, a deep learning ranking model for small molecule identification outperformed other approaches and reduced the fraction of wrong answers (at rank-1) by 9–23%, depending on the data set used (Matyushin et al. 2020). In the age of artificial intelligence, spatial metabolomics and IMS promise to revolutionize biology and healthcare (Alexandrov, 2020). Approaches such as an integrated strategy of fusing features and removing redundancy based on graph density (FRRGD) were proposed that greatly enhanced the detection coverage of low-abundance metabolites (Ju et al. 2020).

For a software survey of other mass spectrometry-derived omics tools, packages, resources, software and databases, readers can consult other reviews on metaproteomics (Sajulga et al. 2020), data‐independent acquisition mass spectrometry‐based proteomics (F. Zhang, Ge, et al., 2020; Zhang, Sans, et al., 2020), and single cell and single cell-type metabolomics (B. B. Misra, 2020a), among others.

Diverse online resources such as OMICtools (http://omictools.com/) (Henry et al. 2014), the Fiehn laboratory pages (http://fiehnlab.ucdavis.edu/ and http://metabolomics.ucdavis.edu/Downloads), the International Metabolomics Society’s resource pages, and software repositories such as the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/web/packages/available_packages_by_name.html), Bioconductor (https://www.bioconductor.org/), the Python Package Index (PyPI) (https://www.pypi.org), GitLab (https://www.gitlab.com), and GitHub (https://www.github.com/) are excellent places to obtain software tools, databases and resources for metabolomics research. The Metabolomics Tools Wiki, which was intended as a continuously updated resource for metabolomics tools, databases and software resources, has not been updated since 2017 (Spicer et al. 2017). Whilst there exists a plethora of programming languages, modern interpreted scripting languages such as R, Python, Raku, Ruby, and MATLAB are evidently popular in metabolomics.

Building on the review structure established in previous overviews of major tools and resources in metabolomics, spanning 2015–2019 (B. Misra & van der Hooft, 2015; O’Shea & Misra, 2020), this review is organized into the following sections: (1) Platform-specific tools, (2) Preprocessing and QC tools, (3) Annotation tools, (4) Multifunctional tools, (5) Tools for statistical analysis and visualization, (6) Databases, and (7) Other specialized tools.

Table 1 provides a summary of all reviewed resources and their availability. Table 2 highlights tools found in the CRAN and PyPI software repositories that are deemed useful for the metabolomics research community but are not associated with a published scholarly article.

Table 1 The entire list of reviewed tools, organized by important analytical steps in metabolomics data analysis, with details regarding their platform dependency, implementation (e.g., programming language such as R, Python, Java, C/C++, or web browser based) and availability
Table 2 List of useful R/Bioconductor packages that surfaced/were improved in 2020

2 Platform-specific tools

Metabolomics as a discipline depends on mass spectrometry and spectroscopy analytical platforms to generate high-throughput, omics-scale data. These include, but are not limited to, liquid chromatography-mass spectrometry (LC–MS), gas chromatography-mass spectrometry (GC–MS), capillary electrophoresis-mass spectrometry (CE-MS), and spectroscopic methods such as 1H-NMR, 13C-NMR, Raman, and Fourier transform infrared (FTIR), among others. In this section, I discuss all the tools that appeared in 2020 for analyses of datasets that are specific to a metabolomics platform or technology, i.e., LC–MS, GC–MS, and NMR.

Automated spectraL processing system for NMR (AlpsNMR) is an R-package that provides automated signal processing for untargeted NMR metabolomics datasets by performing region exclusion, spectra loading, metadata handling, automated outlier detection, spectra alignment and peak-picking, integration and normalization (Madrid-Gambin et al. 2020). The tool can load Bruker and JDX samples and preprocess them for downstream statistical analysis.

Signature mapping (SigMa) was developed as a standalone tool with MATLAB dependencies for processing raw urine 1H-NMR spectra into a metabolite table (Khakimov et al. 2020). SigMa relies on the division of the urine NMR spectra into Signature Signals (SS), Signals of Unknown spin Systems (SUS) and bins of complex unresolved regions (BINS), thus allowing simultaneous detection of urinary metabolites in large-scale NMR metabolomics studies using a SigMa chemical shift library and a new automatic peak picking algorithm.

NMR filter is a stand-alone interactive software for high-confidence NMR compound identification. It runs NMR chemical shift predictions and matches them with the experimental data, then defines the identity of compounds using a list of matching rates and correlating parameters of accuracy, together with figures for visual validation (Kuhn et al. 2020).

MSHub/electron ionisation (EI)-Global Natural Product Social (GNPS) Molecular Networking analysis is a platform that enables users to store, process, share, annotate, compare and perform molecular networking of both unit/low resolution and GC–HRMS data (Aksenov et al. 2020). GNPS-MassIVE is a public data repository for untargeted MS2 and EI-MS data, with sample information (metadata) and annotated MS2 spectra (Aron et al. 2020). MSHub performs auto-deconvolution of compound fragmentation patterns via unsupervised non-negative matrix factorization and quantifies the reproducibility of fragmentation patterns across samples, followed by GNPS molecular networking analyses.

RGCxGC toolbox is an R-package that aids in the analysis of two dimensional gas chromatography-mass spectrometry (2D GC–MS) data by offering pre-processing algorithms for signal enhancement, such as baseline correction based on asymmetric least squares, smoothing based on the Whittaker smoother, and peak alignment based on 2D Correlation Optimized Warping, as well as multiway principal component analysis (Quiroz-Moreno et al. 2020).

3 Preprocessing and quality control (QC) tools

Untargeted metabolomics workflows that use LC–MS/MS, GC–MS or NMR depend heavily on pre-processing of the acquired raw datasets prior to statistical analyses and interpretation. Preprocessing typically involves tools that detect masses (as m/z values) from mass spectra (i.e., feature detection), construct and display extracted ion chromatograms, detect chromatographic peaks, and perform deconvolution, peak alignment, data matrix curation steps ranging from batch and blank corrections to filtration and normalization, and quality assessments. Although decade-old popular preprocessing tools are available to the community in the form of xcms (Tautenhahn et al. 2008), MZmine 2 (MZmine Development Team 2015), and MS-DIAL (Tsugawa et al. 2015), there is a consistent effort to improve the workflows, from reducing computational time, to developing graphical user interfaces (GUIs) that make them user friendly, to addressing challenges associated with interpretation of data from advanced platforms such as HRMS, IMS, and MSI. In fact, a recent comparison of the software packages MZmine 2, enviMass, Compound Discoverer™, and XCMS Online demonstrated low coherence between the four processing tools: the overlap of features between all four programs was only about 10%, and for each software between 40 and 55% of features did not match with any other program (Hohrenk et al. 2020). Moreover, quality control (QC) tools are important for handling systematic and random variations/errors induced during experimental and analytical workflows. Batch effects can pose many challenges, i.e., the introduction of experimental artifacts that can interfere with the measurement of phenotype‐related metabolome changes in metabolomics data (Han & Li, 2020), and the data normalization strategies, tools, and software solutions available to circumvent some of these challenges have been reviewed (B. B. Misra, 2020b).
In this section, I cover the preprocessing and the QC tools that appeared in 2020.

Correlation-based removal Of multiPlicities (CROP), implemented as an R-package, is a visual post-processing tool that removes redundant features from LC–MS/MS based untargeted metabolomic data sets (Kouřil et al. 2020). It groups highly correlated features within a defined retention time (RT) window without requiring a specific m/z difference between them, which makes it a second-tier strategy for reducing multiplicities. The output is a graphical representation of the correlation network that shows the composition of the clusters and can aid in further parameter tuning.
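The grouping idea can be illustrated with a minimal Python sketch. This is not the CROP implementation (CROP is an R-package); the function names, the correlation threshold of 0.75, the RT window of 0.02 min, the choice of the most intense feature as cluster representative, and the toy intensities are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def group_multiplicities(features, min_corr=0.75, rt_window=0.02):
    """Cluster features that co-elute and are highly correlated.

    features: dict mapping feature name -> (rt, [intensity per sample]).
    No m/z difference is inspected; only RT proximity and correlation.
    Clusters are merged transitively, so a chain of features can span
    more than one RT window in total.
    """
    names = sorted(features, key=lambda k: features[k][0])
    parent = {name: name for name in names}

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, a in enumerate(names):
        rt_a, profile_a = features[a]
        for b in names[i + 1:]:
            rt_b, profile_b = features[b]
            if rt_b - rt_a > rt_window:
                break  # names are RT-sorted, later features are even further away
            if pearson(profile_a, profile_b) >= min_corr:
                parent[find(b)] = find(a)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return list(clusters.values())


def representatives(features, clusters):
    """Keep the feature with the highest mean intensity from each cluster."""
    def mean_intensity(name):
        profile = features[name][1]
        return sum(profile) / len(profile)
    return [max(cluster, key=mean_intensity) for cluster in clusters]


# Toy data (invented): three co-eluting, correlated ions, one co-eluting
# but uncorrelated feature, and one correlated feature eluting much later.
features = {
    "M+H": (5.00, [100, 200, 300, 400]),
    "M+Na": (5.01, [10, 21, 29, 41]),
    "isotope": (5.01, [11, 22, 33, 44]),
    "other": (5.01, [400, 100, 300, 200]),
    "late": (7.50, [100, 200, 300, 400]),
}
clusters = group_multiplicities(features)
kept = representatives(features, clusters)
print(clusters)
print(kept)
```

In this toy example the adduct and isotope collapse into the cluster of the protonated ion, while the uncorrelated and the late-eluting features stay separate. A real workflow would also plot the correlation network, as the tool does, before settling on thresholds.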

neighbor-wise compound-specific Graphical Time Warping (ncGTW) is an integrated reference-free profile alignment method, implemented as an R-package and available as a plugin for xcms, that aids in detecting and fixing bad alignments (misaligned feature groups) in LC–MS data to enable accurate grouping and peak-filling (Wu et al. 2020).

TidyMS is a Python package for preprocessing of untargeted LC–MS/MS derived metabolomics data. It reads raw data from the .mzML file format, generates spectra and total ion chromatograms (TICs), allows peak picking and feature detection, reads processed data from xcms and MZmine 2, among others, and offers functionalities for data matrix curation, normalization, imputation, scaling, quality metrics, QC-based batch corrections and interactive visualization of results (Riquelme et al. 2020).

AutoTuner, available as an R-package, is a parameter optimization algorithm that obtains parameter estimates from raw data in a single, data-specific step, as opposed to many iterations, to generate robust features from untargeted LC–MS/MS runs (McLean & Kujawinski, 2020). As input, AutoTuner requires at least 3 samples of raw data converted from proprietary instrument formats to open formats (e.g., .mzML, .mzXML, or .CDF).

remove unwanted variation in a hierarchical structure (hRUV) is an R-package (also available as a Shiny app) that aids in the removal of unwanted variation from large scale LC–MS metabolomics studies, which it accomplishes by progressively merging the adjustments in neighboring batches (Kim et al. 2020). The package uses sample replicates to integrate data from several batches for removal of intra-batch signal drift and inter-batch unwanted variation, and is reported to outperform existing tools while retaining biological variation. To assess the results, a user can visualize them as three kinds of diagnostic plots, i.e., principal component analysis (PCA) plots, relative log expression (RLE) plots, and metabolite run plots.
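As a rough illustration of the RLE diagnostic mentioned above, the following Python sketch computes relative log expression values in the usual way: each metabolite's median log-intensity across samples is subtracted from the log-intensity in every sample. It is not taken from hRUV (an R-package); the function name and the toy intensities are assumptions.

```python
from math import log2
from statistics import median


def relative_log_expression(intensities):
    """Compute RLE values per sample.

    intensities: dict mapping sample name -> list of positive intensities,
    with metabolites in the same order for every sample.
    Returns a dict mapping sample name -> list of RLE values (log2 scale).
    """
    samples = list(intensities)
    n_metabolites = len(intensities[samples[0]])
    logged = {s: [log2(v) for v in intensities[s]] for s in samples}
    # Median log-intensity of each metabolite across all samples.
    metabolite_medians = [
        median(logged[s][j] for s in samples) for j in range(n_metabolites)
    ]
    return {
        s: [logged[s][j] - metabolite_medians[j] for j in range(n_metabolites)]
        for s in samples
    }


# Toy data (invented): run3 carries a two-fold signal drift on all metabolites.
runs = {
    "run1": [100, 200, 400],
    "run2": [100, 200, 400],
    "run3": [200, 400, 800],
}
rle = relative_log_expression(runs)
sample_medians = {s: median(values) for s, values in rle.items()}
print(sample_medians)
```

In an RLE plot, each sample is drawn as a boxplot of its RLE values; samples whose boxes are centred away from zero, like `run3` here, point to unwanted variation that a correction method should remove.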

MetumpX is an Ubuntu-based R-package that facilitates easy download and installation of 103 tools spread across the standard untargeted MS-based metabolomics pipeline (Wajid et al. 2020). The package automates the installation of software pipelines, shortening the learning curve for building software workstations.

MeTaQuaC is an R-package that implements QC concepts and methods for Biocrates kits used in targeted LC–MS metabolomics workflows. It creates a QC report containing visualizations and informative scores, and provides summary statistics and unsupervised multivariate analysis methods, among others (Kuhring et al. 2020).

dbnorm is an R-package that allows visualization and removal of technical heterogeneity from large scale metabolomics datasets, after allowing inspection at both the macroscopic and microscopic scales, i.e., at the sample batch and metabolic feature levels, respectively (Bararpour et al., 2020). dbnorm includes several statistical models that are already in use for metabolomics data normalization, such as the ComBat model (parametric and non-parametric) from the sva package and the ber function.

MetaClean, available as an R-package, uses 11 peak quality metrics and 8 diverse ML algorithms to build a classifier for the automatic assessment of the integration quality of peaks from untargeted metabolomics datasets (Chetnik et al. 2020). The AdaBoost algorithm combined with the set of 11 peak quality metrics gave the best-performing classifier, and applying this framework to peaks retained after filtering by 30% relative standard deviation (RSD) across pooled QC samples further distinguished poorly integrated peaks that were not removed by filtering alone.
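The RSD pre-filter referred to above (not MetaClean's classifier itself) can be sketched in a few lines of Python. The function names and the QC intensities are assumptions for illustration; only the 30% cutoff comes from the text.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (sample SD divided by mean) in percent."""
    m = mean(values)
    if m == 0:
        return float("inf")  # undefined RSD, treat as failing any cutoff
    return 100.0 * stdev(values) / m


def filter_by_qc_rsd(qc_intensities, max_rsd=30.0):
    """Return the features whose RSD across pooled QC injections is <= max_rsd.

    qc_intensities: dict mapping feature name -> list of intensities measured
    in the repeated pooled QC injections.
    """
    return [
        feature
        for feature, values in qc_intensities.items()
        if rsd_percent(values) <= max_rsd
    ]


# Toy QC data (invented): F2 is measured far less reproducibly than F1 and F3.
qc = {
    "F1": [100, 105, 95, 102],
    "F2": [100, 30, 180, 60],
    "F3": [50, 52, 49, 51],
}
retained = filter_by_qc_rsd(qc)
print(retained)
```

A filter of this kind only looks at the reproducibility of the integrated areas, which is why a peak-shape-based classifier can still flag poorly integrated peaks among the retained features.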

NeatMS is a Python package for untargeted LC–MS signal labelling and filtering, which enables automated filtering of false positive MS1 peaks reported by routine LC–MS data processing pipelines. It relies on neural network-based classification and can process outputs from MZmine 2 and xcms analyses.

4 Annotation tools

Metabolite annotation remains a critical step that defines the success or failure of untargeted metabolomics efforts. With newer technologies such as collision cross section (CCS) data for ion mobility, high resolution mass spectra from Orbitrap, direct injection data, data independent acquisition (DIA)/all ion fragmentation (AIF), imaging MS and multi-dimensional chromatography, annotation has gained additional impetus in compound identification, but these methods have also posed new challenges for tool development. With regard to false discovery rates (FDRs) of annotations, low FDRs yield a small number of reliable annotations, whereas higher FDRs yield a larger number of annotations of poorer quality. Though metabolite annotation efforts can benefit from RT as orthogonal information, efforts to combine RT predictions with MS/MS data are currently lacking (Witting & Böcker, 2020). Clearly, reference spectra and spectral DBs/libraries are not enough, as they allow annotation of only roughly 5–30% of the total features captured (depending on the environmental/biological matrices in question) in a given mass spectrometry-based metabolomics dataset. Though experimentally obtained MS/MS data and NMR data on pure standards are precious and aid in the development of computational solutions for compound identification, they do not suffice at their current numbers, accessibility, and availability. Moreover, in 2020, the Metabolite Identification Task Group of the International Metabolomics Society assessed and proposed a set of revised reporting standards for metabolite annotation/identification and requested community feedback on levels A–G, ranging from a defined enantiomer or chiral metabolite (level A) to an unknown molecular formula with specific spectral features (level G). Once formalized, these would positively affect and improve reporting standards in studies and the publication landscape in metabolomics research. Figs. 1, 2 and 3 show the software interfaces and analysis outputs for some of the annotation tools discussed in the following sections.

Fig. 1 Snapshots of a subset of tools and resources discussed in this review. a SMART 2.0 outputs for a demo metabolite swinholide A, b Outputs from MESSAR for corosolic acid, c Demo analysis on FOBI

Fig. 2 Snapshots of a subset of tools and resources discussed in this review. a Outputs on demo data on NRPro, b REDU analysis results for a demo data, c MetENPWeb analysis results

Fig. 3 Snapshots of a subset of tools and resources discussed in this review. Web interfaces and snapshot of outputs for demo data on a VIIME, b MetaboliteAutoPlotter, and c SUMMER

MEtabolite SubStructure Auto-Recommender (MESSAR) is a web-based tool that provides an automated method for substructure recommendation guided by association rule mining. It captures potential relationships between spectral features and substructures, as learned from public spectral libraries, to suggest substructures for any unknown mass spectrum (Y. Liu, Mrzic, et al., 2020; Liu, Nellis, et al., 2020). Though the interface does not currently perform batch processing, it provides an open-source approach to annotate substructures.

Small Molecule Accurate Recognition Technology (SMART 2.0) is an artificial intelligence (AI)-based ML tool for mixture analysis in NMR data analysis workflows that aids in the accelerated discovery and characterization of new NPs. SMART 2.0 generates structure hypotheses from two dimensional NMR data [1H-13C Hetero‐nuclear Single Quantum Coherence (HSQC) spectra] by comparing a query HSQC spectrum against a library of > 100,000 NPs, and provides outputs as simplified molecular-input line-entry system (SMILES) strings, structures, cosine similarities, and molecular weights for a given compound of interest.
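The cosine similarity reported for each library hit can be illustrated with a short Python sketch. How SMART 2.0 derives the vectors it compares is not described here, so the three-dimensional vectors, the candidate names and the function names below are purely hypothetical; only the ranking by cosine similarity is the point.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # similarity to an all-zero vector is treated as zero
    return dot / (norm_u * norm_v)


def rank_library(query, library, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical spectral representations (invented for illustration).
query_vector = [1.0, 0.0, 1.0]
library_vectors = {
    "candidate_A": [2.0, 0.0, 2.0],
    "candidate_B": [1.0, 1.0, 0.0],
    "candidate_C": [0.0, 1.0, 0.0],
}
hits = rank_library(query_vector, library_vectors)
for name, score in hits:
    print(name, round(score, 3))
```

Because cosine similarity ignores vector length, `candidate_A` scores highest here even though its entries are twice as large as those of the query.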

MetFID is a tool that uses an artificial neural network (ANN) trained to predict molecular fingerprints from experimental MS/MS data (Fan et al. 2020). MetFID retrieves candidates from metabolite databases using the molecular formula or m/z value of the precursor ion of the analyte, and the candidate whose fingerprint is most similar to the predicted fingerprint is used for metabolite annotation. However, no code or accessible tools/repositories are provided with the published scholarly article.

CPVA is a web-based tool aimed at the analysis of untargeted LC–MS/MS generated metabolomics data for visualization and annotation of LC peaks. The tool annotates adducts, isotopes and contaminants, and allows visualization of peak morphology metrics (Luan et al. 2020). Further, the tool aids in capturing potential noise and contaminants encountered in chromatographic peak lists generated from LC–MS/MS data, thus reducing false positive peak calls and improving data quality and downstream data processing.

NRPro is a web-based application dedicated to dereplication and characterization of peptidic natural products (PNPs) from LC–MS/MS datasets that performs automatic peak annotation through a statistically validated scoring system (Ricart et al. 2020). An example NRPro dereplication effort revealed that the software was able to identify 169 PNPs in a dataset of 352 spectra with an FDR of 3.55.

MetENP/MetENPWeb is available as an R-package on the Metabolomics Workbench repository and is also deployed as a web-based application. It extends metabolomics data enrichment analysis to include Kyoto Encyclopedia of Genes and Genomes (KEGG)-based species-specific pathway analysis, pathway enrichment scores, gene-enzyme data, and enzymatic activities of the significantly altered metabolites for any study/dataset submitted to Metabolomics Workbench (Choudhary et al. 2020). Various plots and visualizations, such as volcano plots and bar graphs, are available to the user after the analyses.

Class Assignment aNd Ontology Prediction Using mass Spectrometry (CANOPUS), available as part of the SIRIUS (Dührkop et al. 2019) software suite, is a computational tool for systematic compound class annotation from fragmentation spectra (Dührkop et al. 2020). CANOPUS uses a deep neural network to predict 2,497 compound classes from fragmentation spectra, including all biologically relevant classes, and explicitly targets compounds for which neither spectral nor structural reference data are available, in addition to predicting compound classes lacking MS/MS training data. Recently, CANOPUS was made available for analysis of MS/MS spectra obtained from both positive and negative mode ionization datasets.

molDiscovery is a mass spectral database search method that improves both the efficiency and accuracy of small molecule identification by (i) using an efficient algorithm to generate mass spectrometry fragmentations, and (ii) learning a probabilistic model to match small molecules with their mass spectra (Mohimani et al. 2020). A search of over 8 million spectra from the GNPS molecular networking infrastructure demonstrated that this probabilistic model can correctly identify nearly six times more unique compounds than previously reported methods.

MetIDfyR was developed as an R-package that aids in in silico prediction of phase I/II drug biotransformations and in mass-spectrometric data mining from untargeted LC-HRMS/MS datasets to help with feature annotation (Delcourt et al. 2020). With the ability to predict drug metabolism products from in vitro and in vivo studies, this tool holds potential in annotation workflows in drug discovery programs.

Qemistree is a cheminformatics tool, available as an advanced analysis workflow on the GNPS infrastructure, that allows mass spectrometry data to be represented in the context of sample metadata and chemical ontologies (Tripathi et al. 2020). This tree-guided data exploration tool allows comparison of metabolomics samples across different experimental conditions such as chromatographic shifts. The Qemistree software pipeline is also freely available to the microbiome and metabolomics communities in the form of a QIIME2 plugin.

Ion identity molecular networking (IIMN) is a workflow available within the GNPS ecosystem that complements feature based molecular networking (FBMN) by annotating and connecting related ion species in feature-based molecular networks (Schmid et al. 2020). Though MS1-based ion identity networks (IIN) are well known, IIMN helps to integrate IIN into MS2-based molecular networks in the GNPS environment, thus adding MS/MS information on top of the MS1 characteristics of ions.

The Food-Biomarker Ontology (FOBI) tool, developed in the R language, is a web-based analysis and visualization package focused on interactive visualization of the FOBI structure (Castellano-Escuder et al. 2020). FOBI is a new ontology that describes foods and their associated metabolite entities and is composed of two interconnected sub-ontologies: the ‘Food Ontology’, consisting of raw foods and ‘multi-component foods’, and the ‘Biomarker Ontology’, containing food intake biomarkers classified by their chemical classes. These two sub-ontologies are conceptually independent but interconnected by different properties. Functionalities of the tool include static and dynamic network visualization, downloadable tables, compound ID conversions, and classical and food enrichment analyses.

BioDendro is a Python package for feature analysis of LC–MS/MS metabolomics data. Its workflow enables users to flexibly cluster and interrogate thousands of MS/MS spectra and quickly identify the core fragment patterns causing the groupings, leading to identification of the core chemical backbones of a larger class, even when the individual metabolite of interest is not found in public databases (Rawlinson et al. 2020).

AllCCS is a freely accessible database/CCS atlas that covers vast chemical structures with > 5000 experimental CCS records and ~ 12 million calculated CCS values for > 1.6 million small molecules, with median relative errors of 0.5–2% for a broad spectrum of small molecules (Zhou et al. 2020). The tool offers several modules to perform PCA, differential expression analysis, pathway analysis, and network analysis.
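For readers unfamiliar with the error metric quoted for calculated CCS values, the following minimal Python sketch shows how a median relative error is typically computed from paired calculated and experimental values. The function name and the CCS values are invented for illustration and do not come from AllCCS.

```python
from statistics import median


def median_relative_error(calculated, experimental):
    """Median of |calculated - experimental| / experimental, in percent."""
    errors = [
        abs(c - e) / e * 100.0
        for c, e in zip(calculated, experimental)
    ]
    return median(errors)


# Invented CCS values in square angstroms, for illustration only.
experimental_ccs = [150.0, 200.0, 250.0]
calculated_ccs = [151.5, 198.0, 255.0]
mre = median_relative_error(calculated_ccs, experimental_ccs)
print(round(mre, 2))
```

The median is preferred over the mean here because a few badly predicted molecules would otherwise dominate the summary.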

metPropagate is a network-based approach that uses untargeted metabolomics data from a single patient and a group of controls to prioritize candidate genes in patients with suspected inborn errors of metabolism (IEMs) (Graham Linck et al. 2020). The approach tests whether metabolomic evidence can be used to prioritize the causative gene from a patient’s list of candidate genes: each gene in the list is ranked using a per-gene metabolomic score termed the “metPropagate score”, which represents the likely metabolic relevance of a particular gene to each patient.

9 Summary of current tools

In this section, I summarize the observed trends for the tools reported in 2020, which are:

a. The largest share of the software tools and packages focuses on ‘annotations’, i.e., almost 35% of the total 72 tools reported for the year deal with untargeted metabolomics data annotation.

b. 82% of the total tools reported are concerned with data analysis challenges with “LC–MS/MS”, mostly untargeted LC–HRMS/MS efforts.

c. Most tools are implemented as R packages (28 tools), Python packages (11 tools), in Java (5 tools), or as web servers/web-based tools (23 tools).

d. 48% of the reported tools are ‘easy to use’ (click to start, web-based, or plug-and-play type tools) from a user standpoint for the community of biologists and chemists who are not computationally savvy.

e. Of the total tools reported here, 57% have a GitHub repository associated with them.

f. A couple of tools are improved versions, suggesting that these are active tools that are being developed/maintained.

g. Many of the tools reported in the year deal with specialized applications, ranging from data integration (i.e., metabolomics data with proteomics/transcriptomics data) and epidemiological metabolomics data to lipidomics and MSI data.

10 Concluding remarks

In summary, numerous tools were either developed from scratch or evolved from their previous versions in 2020 alone. Some tools and approaches found new applications, such as GNPS in the domain of GC–MS-based metabolomics (Aksenov et al. 2020), or were released as a beta/advanced version, e.g., MS-DIAL for lipidomics workflows (Tsugawa et al. 2020). Only the coming years will show which of these 2020 tools remain useful, maintained and available, get improved, and get adopted by the metabolomics research community. Irrespective, all these tools help in understanding metabolomics data from diverse standpoints and are welcome additions to the community going forward into the big data-driven precision medicine era. In general, the trend is to develop fast, computationally less intensive, robust, open-source, user-friendly tools that adhere to the findable, accessible, interoperable, and reusable (FAIR) guidelines. Undoubtedly, the metabolomics research community needs more of these improved tools, and in the coming years the tools, resources, and databases will keep coming and getting better.