Enhanced efficiency of MS/MS all-ion fragmentation for non-targeted analysis of trace contaminants in surface water using multivariate curve resolution and data fusion

Vosough, Maryam; Salemi, Amir; Rockel, Sarah; Schmidt, Torsten C.

doi:10.1007/s00216-023-05102-x

Research Paper
Open access
Published: 11 January 2024

Volume 416, pages 1165–1177, (2024)

Analytical and Bioanalytical Chemistry

Abstract

Data-independent acquisition–all-ion fragmentation (DIA-AIF) mode of mass spectrometry can facilitate wide-scope non-target analysis of contaminants in surface water due to comprehensive spectral identification. However, because of the complexity of the resulting MS² AIF spectra, identifying unknown pollutants remains a significant challenge, with a significant bottleneck in translating non-targeted chemical signatures into environmental impacts. The present study proposes to process fused MS¹ and MS² data sets obtained from LC-HRMS/MS measurements in non-targeted AIF workflows on surface water samples using multivariate curve resolution-alternating least squares (MCR-ALS). This enables straightforward assignment between precursor ions obtained from resolved MS¹ spectra and their corresponding MS² spectra. The method was evaluated for two sets of tap water and surface water contaminated with 14 target chemicals as a proof of concept. The data set of surface water samples consisting of 3506 MS¹ and 2170 MS² AIF mass spectral features was reduced to 81 components via a fused MS¹-MS² MCR model that describes at least 98.8% of the data. Each component summarizes the distinct chromatographic elution of components together with their corresponding MS¹ and MS² spectra. MS² spectral similarity of more than 82% was obtained for most target chemicals. This highlights the potential of this method for unraveling the composition of MS/MS complex data in a water environment. Ultimately, the developed approach was applied to the retrospective non-target analysis of an independent set of surface water samples.

Introduction

In recent years, the use of liquid chromatography coupled with high-resolution mass spectrometry (LC-HRMS) to analyze organic micropollutants, their transformation products, and human metabolites in environmental water samples has increased substantially. In the framework of non-target analysis (NTA), modern hybrid mass spectrometry instruments allow sensitive and comprehensive detection of hundreds of chemical compounds in environmental samples [1,2,3]. As an added benefit, full-scan and tandem mass spectrometry (MS/MS) data can be stored without the need for rerunning samples, enabling retrospective analysis to uncover contaminants that have never been detected before. However, while LC-HRMS/MS has provided new opportunities for NTA of organic pollutants, several challenges and knowledge gaps have emerged in the environmental big data era, from both the measurement and the data processing points of view. At the end of a non-targeted workflow, for instance, identifying the preferred pollutants is an important and challenging task. When searching large compound databases for possible structures, numerous hits are usually generated, which must then be sorted by MS/MS data, retention time plausibility, and the metadata provided. In fact, developing identification protocols for prioritized pollutants becomes crucial at this stage, especially with limited reference substances available. In HRMS, MS/MS data improve the selectivity of annotation compared with accurate mass alone. However, the search for unknowns in MS/MS or in-source fragment ion libraries is limited to the recorded spectra of reference standards, which is not sufficient for a real unknown screening and suffers from limited comparability among instruments [4]. 
Therefore, an in silico strategy for determining unknown chemical structures by matching measured with quantitative structure–activity relationship (QSAR)-based predicted patterns from chemical databases is an alternative approach [5]. A successful MS/MS acquisition strategy should yield high-quality spectra for as many of the ions in the sample as feasible. The two most common MS² data acquisition approaches for liquid chromatography combined with HRMS are data-dependent acquisition (DDA) and data-independent acquisition (DIA). In DDA, a distinction can be made between a targeted approach (defined by the user, also known as list-dependent) and MS logic (e.g., top N of maximum peak height). DDA is generally used to increase annotation confidence in a non-targeted study. However, the limited MS/MS coverage of detected MS features in DDA has spurred the development of several alternative approaches, referred to as modified DDA approaches [6]. In DIA, precursor windows are sequentially isolated and fragmented within the ion trap, thereby covering all precursor ions of interest. As a result of its wider isolation window, DIA offers the advantage of not requiring any prior knowledge of precursors. However, it produces more complex MS/MS spectra. Different DIA approaches include sequential precursor ion fragmentation (MS/MS^ALL), sequential window acquisition of all theoretical mass spectra (SWATH), and an innovative mode of acquisition known as scanning quadrupole DIA (SONAR) [7]. MS/MS^ALL can be divided into two methods, MS^E and all-ion fragmentation (AIF). MS^E is a mode that alternates between low-energy and elevated-energy scans, providing more comprehensive information about the sample. In AIF, all ions in the collision cell are fragmented without precursor ion selection. 
Therefore, AIF full scans combined with MS¹ scans provide an opportunity to retroactively analyze additional compounds of interest based on hypotheses that arise in the future [8]. However, in AIF, the selectivity is lost because no precursor ion selection is used, so the links between precursors and their fragment ions become untraceable due to the complexity of the resulting MS² spectra. Several deconvolution algorithms have been developed, including MS-DIAL [9], DIA-Umpire [10], R-MetaboList [11], and CorrDec (correlation-based deconvolution) [12] to link AIF parent ions to their associated fragment ions and extract the relevant pseudo-MS/MS spectra. One recent study developed an automated multi-sample-based correlation AIF workflow (MetaboAnnotatoR) for the annotation of -omics LC−MS AIF data sets [13]. In fact, research in this area is ongoing, specifically in the field of metabolomics, as well as evaluating different MS modes and data types [14]. Though these software programs offer a good starting point for NTA of water samples, they were originally developed for -omics research, and to the best of our knowledge, they have not yet been thoroughly tested for their effectiveness and functionality in NTA of water samples using the DIA approach. In fact, high-coverage non-target environmental screening presents several complications to researchers mainly due to the large diversity of environmental matrices and chemical space, the occurrence of low-intensity but highly environmentally effective contaminants, and substantial matrix effects in highly contaminated water samples [15,16,17]. In most of the mentioned tools, the precursor–fragment ion connection is performed by peak matching or peak intensity correlation matching across a fixed retention time region. Relying on peak shape or correlation shape-based feature tracing is highly problematic in the case of co-elution of compounds or embedded peaks. 
Moreover, when co-eluting compounds produce similar mass fragments, their intensity correlation is so small that it cannot be detected in deconvoluted MS² spectra. Additionally, some adducts, in-source fragments, and isotopologues are not always taken into account.

In recent years, extended multivariate curve resolution–alternating least squares (MCR-ALS) combined with a binning procedure has been developed for non-target analysis [18, 19]. Data compression and matrix construction can also be conducted according to searches of regions of interest (ROI), which are regions of data points with a high density ranked by a certain "data void" [20, 21]. Following the coupling of ROI with the MCR-ALS method in metabolomics studies [22,23,24], the method has been utilized for non-target analysis in environmental metabolomics [25], micropollutant screening in aquatic environments [15, 26], wastewater proteomics [27], polymer degradation in aquatic environments [28], and recently in the processing of different MS acquisition modes in a non-targeted metabolomics study [29]. The main strength of employing MCR-ALS in NTA studies is that, unlike most data processing strategies, which analyze one m/z channel (feature) at a time for each sample and require alignment and finally a componentization step, MCR-ALS is based on a bilinear factor decomposition concept. Thus, all information regarding the mass features of each MCR-ALS component, such as precursor ions, their associated isotopic peaks, and adduct peaks, can be recovered at once and considered for identification purposes. Additionally, this method does not require background signal correction, since it recovers all chemicals, solvent peaks, and background signals responsible for systematic variation in the data sets. Moreover, MCR-ALS is the most flexible multi-way model for handling retention time shifts across chromatographic runs. In fact, because of the bilinear factor decomposition basis, alignment of retention time shifts is not strictly required before performing MCR-ALS modeling, which is advantageous [30]. 
The reports, however, find that shift corrections and trilinearity constraints (and other constraints whenever applicable) can improve reliability and reduce uncertainty in many situations, depending on the data structure [31, 32]. Using this concept, MS² AIF data across different LC runs can be compiled as matrices, and multi-way data modeling can be applied to them, in a similar way as MS¹ full-scan data. In fact, by employing this method, a component-based profile resolution strategy rather than a feature-based deconvolution method is developed for MS² spectral recovery. One further step is fusing two blocks of data sets and their data processing. This work aims to develop the concept of MCR-ALS for simultaneous decomposition of LC-MS¹ and LC-MS² data sets in AIF acquisition mode in surface water samples, in a non-targeted way. In this study, we investigate how data integration in different MS acquisition modes can facilitate unknown peak identification in samples of different complexity. In addition to the curve resolution in each individual MS measurement mode, extended MCR-ALS allows researchers to analyze them simultaneously (both row-wise and column-wise) by fusing the data. The present study is the first to report the joint processing of augmented full MS/AIF modes data for NTA of organic micropollutants in water samples using a multi-way decomposition method. The advantages of this method for a simultaneous analysis of multiple chromatographic runs include the resolution of all components having a systematic variation within a raw data set in all instrumental modes and their relative abundance, and obtaining a direct connection between each precursor ion (together with its adducts, isotopes, and fragments) and its corresponding MS/MS spectral profiles. 
In order to achieve this, we prepared reference spectra for 14 pharmaceuticals and hormones frequently detected in surface water, evaluated the decomposition process in individual models of MS¹, MS², and fused MS¹-MS² data, and used different validation samples by spiking the target compounds in tap water and river water samples. Moreover, the match quality scores with the reference spectra were used to evaluate the quality of resolved MS² spectral profiles and to clarify some MCR challenges in this area. Ultimately, as a “real-life” application in this field, the developed approach was applied to the retrospective NTA of an independent set of surface water samples, following the classification and prioritization step. Our main objective was to implement a strategy for annotating highly prioritized chemicals based on the relevant chromatographic segments.

Material and methods

Chemicals, samples, and data acquisition

Authentic standards of 14 targeted chemicals including primidone, caffeine, acetaminophen, sulfamethoxazole, trimethoprim, testosterone, carbamazepine, naproxen, ibuprofen, gemfibrozil, fluoxetine, ciprofloxacin, estrone, and progesterone were purchased from Sigma-Aldrich (Saint Louis, MO, USA), and detailed information on these chemicals is provided in Table S1 (see Electronic Supplementary Material, S1). The mixed standard solutions (set I samples) were prepared in methanol at 1000 μg/L and stored at −20 °C, and working solutions at different concentrations (0.1, 0.5, 1, 10, 50, and 100 μg/L) were prepared in ultrapure water. LC-MS-grade water, LC-grade methanol, and formic acid were purchased from Merck (Darmstadt, Germany). All samples were analyzed in duplicate. The performance of the proposed methodology was then evaluated by spiking the mixed standard targets into two types of water samples with varying levels of matrix complexity (validation samples). To this end, tap water samples and river water samples collected from the Ruhr River south of Essen (Germany) were considered. Set II samples represent a “low-level” matrix complexity, and the non-spiked and all spiked tap water samples (TW1–TW7) were directly injected into the LC-Q-Orbitrap-MS system. To generate set III samples, which represent "high-level" matrix complexity, the river water sample was first extracted by solid phase extraction (SPE), and then the standard mixtures were spiked into the extracted water (RW1–RW7) with a nominal enrichment factor of 100. Details of sample preparation are presented in S2. Set II and set III samples were prepared at six concentration levels of the 14 pharmaceuticals and hormones (0.1 to 100 μg/L). Furthermore, the proposed approach was applied to an independent set of surface water samples (set IV) as part of a non-target study. Samples were collected in May 2019 at five different points in rivers of northern Iran. 
Details of sampling points and the laboratory and data analysis practices for set IV samples are presented in S3. All measurements were conducted using a Dionex UltiMate 3000 LC system (Thermo Scientific) hyphenated to a high-resolution accurate-mass Orbitrap mass spectrometer (Q Exactive, Thermo Scientific) with acquisition parameters and conditions described in S2 (data sets I to III) and S3 (data set IV).

Initial data preparation and matrix arrangement

The MS¹ and MS² AIF information in structure arrays was converted to peak lists (cell array of matrices containing m/z values and ion intensity values) using the “mzxml2peaks.m” function of the MATLAB Bioinformatics Toolbox (version 4.3.1) by setting LevelsValue to “1” and “2,” respectively. Next, an ROI-based strategy was applied to all measured LC-HRMS/MS signals for data compression and preparation of the final data matrices [21,22,23], with the ROI analysis performed on the individual peak lists of each data file. By using ROIs, these initial arrays with their irregularly distributed measured m/z and MS instrument signal intensities can be converted into data matrices appropriate for multivariate data analysis. The ROI selection in the “ROIpeaks” function [24] is governed by the mass intensity threshold (SNR_thr), the m/z error or mass accuracy of the spectrometer, and the minimum number of occurrences of m/z values in consecutive scans, which were set at 0.01% of the maximum MS signal intensity, 0.003 amu for the Orbitrap MS analyzer, and 15, respectively. At the end of this step, data matrices for each sample in both MS¹ and MS² modes of measurement were created. Then, according to Figs. S1, S2 and Fig. 1, different data arrangements were made to model the data sets by the extended MCR method: (a) column-wise augmentation (CWA) of all samples in each of MS¹ or MS² modes (using the “MSroiaug” function), (b) row-wise augmentation (RWA) of LC-MS¹ and LC-MS² for each sample, and (c) row-column-wise augmentation (RCWA) of all samples for both MS¹ and MS² modes. Each individual data matrix can be segmented before this step in order to simplify the further curve resolution process and localize component information within narrow time frames. If there are high background signals, or if peak windows cannot be selected such that each peak is completely contained in at least one window, overlapping windows can be selected.
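The three data arrangements can be illustrated with a minimal NumPy sketch. It is not the authors' MATLAB workflow: the matrix sizes are hypothetical, random numbers stand in for ROI-compressed intensities, and every sample is assumed to already share the same scan axis and the same m/z ROI columns within each MS level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: scans per run, MS1 ROIs, MS2 ROIs, number of samples.
n_scans, n_ms1, n_ms2, n_samples = 40, 12, 8, 3

# One (scans x ROI) matrix per sample and MS level; random non-negative
# values stand in for ROI-compressed ion intensities.
ms1 = [rng.random((n_scans, n_ms1)) for _ in range(n_samples)]
ms2 = [rng.random((n_scans, n_ms2)) for _ in range(n_samples)]

# (a) CWA: samples stacked on top of each other, sharing the m/z columns.
d_cwa_ms1 = np.vstack(ms1)
d_cwa_ms2 = np.vstack(ms2)

# (b) RWA: MS1 and MS2 blocks of one sample side by side, sharing the scans.
d_rwa = [np.hstack((a, b)) for a, b in zip(ms1, ms2)]

# (c) RCWA: the row-wise fused blocks of all samples stacked vertically.
d_global = np.vstack(d_rwa)

print(d_cwa_ms1.shape, d_cwa_ms2.shape, d_rwa[0].shape, d_global.shape)
```

In this layout each row of the global matrix is one scan of one sample, and each column is one MS¹ or MS² ROI, so a single resolved component carries an elution profile per sample and one fused MS¹-MS² spectrum.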

MCR-ALS processing of LC-MS¹, LC-MS², and fused LC-MS¹-MS² data

MCR-ALS is a well-known method that decomposes bilinear data sets into pure component profiles describing the measured variance of the system [33]. The original form of bilinear factor decomposition through MCR-ALS can be extended to a more powerful model for the decomposition of augmented data matrices and simultaneous modeling of several samples (see Fig. S1). Moreover, data fusion can be accomplished by using different spectrometric measurements to investigate a system as row-wise augmented matrices. In LC-HRMS measurements, the concept of data fusion can be adopted for a row-wise appending of data matrices in different acquisition modes of MS¹ and MS² AIF, as shown in Fig. S2 for one sample. For multiple sample analysis, row-wise fused data sets can be augmented on top of each other to provide a global data matrix as shown in Fig. 1. As a result of the inclusion of one or more of the included data matrices, these new augmented data matrices always exhibit favorable features that affect the resolution of the most complex data matrices [30].

To obtain an appropriate and meaningful data structure, data matrices in column- and row-wise data augmentations should share column vector space (MS¹ and MS² spectra) and row vector space (LC elution windows) with the other appended matrices, respectively. Therefore, while column-wise chromatographic alignment is not necessarily a prerequisite before MCR-ALS processing of RCWA data sets, the elution profiles in row-wise augmented matrices must match before modeling begins. Misalignment can be checked visually or by individual modeling of the LC-MS¹ and LC-MS² data blocks. The retention times of the resolved LC-MS¹ chromatograms can then be used as the reference index for adjusting the LC-MS² data. Bilinear decomposition of each global fused LC-MS¹-MS² matrix, containing K (no. of samples)×2 matrices can also be shown as:

$${\textbf{D}}_{\textbf{global}}=\left[\begin{array}{c}{\textbf{D1}}_{\textbf{LCMS1}-\textbf{MS2}}\\ {\textbf{D2}}_{\textbf{LCMS1}-\textbf{MS2}}\\ {\textbf{D3}}_{\textbf{LCMS1}-\textbf{MS2}}\\ \vdots \\ {\textbf{DK}}_{\textbf{LCMS1}-\textbf{MS2}}\end{array}\right]=\left[\begin{array}{c}\textbf{C1}\\ \textbf{C2}\\ \textbf{C3}\\ \vdots \\ \textbf{CK}\end{array}\right]{\textbf{S}}_{\textbf{MS1}-\textbf{MS2}}^{\textbf{T}}+\left[\begin{array}{c}{\textbf{E1}}_{\textbf{LCMS1}-\textbf{MS2}}\\ {\textbf{E2}}_{\textbf{LCMS1}-\textbf{MS2}}\\ {\textbf{E3}}_{\textbf{LCMS1}-\textbf{MS2}}\\ \vdots \\ {\textbf{EK}}_{\textbf{LCMS1}-\textbf{MS2}}\end{array}\right]={\textbf{C}}_{\textbf{aug}}\ {\textbf{S}}_{\textbf{MS1}-\textbf{MS2}}^{\textbf{T}}+{\textbf{E}}_{\textbf{MS1}-\textbf{MS2}}$$

(1)

where the rows in matrix D_global contain RWA matrices recorded in full-scan MS¹ and MS² AIF modes. C_aug contains the elution time profiles of N compounds eluted in both modes of measurements which are present in all individual sub-matrices, and ${\textbf{S}}_{\textbf{MS1}-\textbf{MS}\textbf{2}}^{\textbf{T}}$ represents the pure MS¹ and MS² spectra associated with LC profiles. E_MS1−MS2 is the global matrix of residuals not fitted by the model. Thus, a unified MCR model can be used to obtain information regarding all involved LC profiles, taking into account different peak shapes, retention times and peak areas, as well as MS¹ and MS² profiles.
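The bilinear model of Eq. 1 can be sketched with a small alternating least-squares loop. This is a simplified illustration rather than the MCR-ALS 2.0 toolbox: non-negativity is imposed by clipping unconstrained least-squares solutions, the stopping rule is a plain relative change of the residual norm, the initial spectra are taken from rows near each peak apex as a crude stand-in for purest-variable selection, and all data are synthetic.

```python
import numpy as np

def mcr_als(d, s_init, max_iter=100, tol=1e-3):
    """Fit D ~ C @ S.T with non-negative C and S (clipped least squares)."""
    s = np.array(s_init, dtype=float)
    prev = None
    for _ in range(max_iter):
        # C-step: least-squares elution profiles for the current spectra.
        c = np.clip(d @ np.linalg.pinv(s.T), 0.0, None)
        # S-step: least-squares fused MS1-MS2 spectra for the current profiles.
        s = np.clip((np.linalg.pinv(c) @ d).T, 0.0, None)
        # Normalize each spectrum to a maximum of one; move the scale into C.
        peak = s.max(axis=0)
        peak[peak == 0.0] = 1.0
        s, c = s / peak, c * peak
        resid = np.linalg.norm(d - c @ s.T)
        # Stop when the residual changes by less than tol (0.1 %) relatively.
        if prev is not None and abs(prev - resid) <= tol * prev:
            break
        prev = resid
    return c, s

def gauss(t, mu, sigma=3.0):
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2)

# Synthetic example: two partly overlapping compounds in two samples.
rng = np.random.default_rng(1)
t = np.arange(40)
profiles = np.column_stack((gauss(t, 14), gauss(t, 24)))
c_true = np.vstack((profiles * [1.0, 0.5], profiles * [0.3, 1.0]))  # C_aug
s_true = rng.random((10, 2))          # 10 fused MS1-MS2 channels (hypothetical)
d = c_true @ s_true.T                 # noise-free D_global

s_init = d[[14, 64], :].T             # rows near the apex of each compound
c, s = mcr_als(d, s_init)
lof = 100 * np.linalg.norm(d - c @ s.T) / np.linalg.norm(d)
print(f"lack of fit: {lof:.3f} %")
```

Because one set of spectra is shared by all stacked samples, each column of `c` holds the elution profile of a component in every sample, while the matching column of `s` holds its fused MS¹-MS² spectrum.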

In order to obtain the best solution, it is crucial to carefully examine the number of components, initial estimate profiles, and constraints. In selecting the number of components in each D_global, a visual inspection, incremental approach, statistical approach, or a combination of these approaches can be used. The incremental approach can be used to build MCR-ALS models with a low number of chemical components (based on singular value decomposition [SVD] or principal component analysis [PCA]) and incrementally increase the number of chemical components as the model progresses [33]. Depending on criteria such as explained variance or model stability, and reliability of resolved chromatograms, MS¹ and MS² spectra, the optimal number of components can be determined [15, 34]. The initial estimates of spectra or LC profiles were produced by the SIMPLe-to-use Interactive Self-modeling Mixture Analysis (SIMPLISMA) method [35], according to the most pure elution regions or pure m/z values of the involved components, respectively. Then, bilinear factor decomposition of D_global and estimation of C_aug and S^T matrices were performed by iterative least-squares minimization of the Frobenius norm of E (residuals/errors), under constraints of non-negativity in C_aug and S^T factor matrices, and normalization of the mass spectra of the resolved components to the maximum signal intensity equal to one [33]. Here, the data sets were modeled without background correction and in most cases without chromatographic peak alignment. However, for a few cases, after shift corrections using icoshift [36] and the fulfillment of the trilinear structure, the trilinearity constraint was also added to improve the quality of results. Other constraints can be implemented during the ALS optimization such as closure, unimodal, selectivity, and local rank [37], which were not considered in this work. 
Regarding the correspondence criterion, the default setting assumption was used in MCR-ALS analysis for adhering to a non-targeted concept. In fact, due to the clear advantage of MCR-ALS for flexible implementation of constraints, a wide variety of data sets can be processed effectively by selecting the appropriate restricting conditions, thus reducing the uncertainty associated with bilinear factor decomposition [31]. Finally, the iterative optimization is continued, until the convergence criterion is fulfilled. This criterion is based on determining the relative standard deviation of residual changes between two consecutive iterations below a predetermined threshold, (i.e., 0.1%). Furthermore, the quality of the decomposition process can be determined by the lack of fit (Lof%) and the amount of variance explained, R², as defined by Eqs. 2 and 3:

$$\textrm{Lof}\ \left(\%\right)=100\ \sqrt{\frac{\sum_{\textrm{i}=1}^{\textrm{m}}\sum_{\textrm{j}=1}^{\textrm{n}}{\textrm{e}}_{\textrm{i},\textrm{j}}^2}{\sum_{\textrm{i}=1}^{\textrm{m}}\sum_{\textrm{j}=1}^{\textrm{n}}{\textrm{d}}_{\textrm{i},\textrm{j}}^2}},\quad {\textrm{e}}_{\textrm{i},\textrm{j}}={\textrm{d}}_{\textrm{i},\textrm{j}}-{\hat{\textrm{d}}}_{\textrm{i},\textrm{j}}$$

(2)

$${\textrm{R}}^2\left(\%\right)=100\ \frac{\sum_{\textrm{i}=1}^{\textrm{m}}\sum_{\textrm{j}=1}^{\textrm{n}}{\textrm{d}}_{\textrm{i},\textrm{j}}^2-\sum_{\textrm{i}=1}^{\textrm{m}}\sum_{\textrm{j}=1}^{\textrm{n}}{\textrm{e}}_{\textrm{i},\textrm{j}}^2}{\sum_{\textrm{i}=1}^{\textrm{m}}\sum_{\textrm{j}=1}^{\textrm{n}}{\textrm{d}}_{\textrm{i},\textrm{j}}^2}$$

(3)

where d_ij is an element of the experimental data matrix, d̂_ij is the corresponding value reproduced by the model, and e_ij is the corresponding residual element of the E matrix.
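As a small worked example, the two fit metrics can be computed as follows. The sketch assumes the square-root form of the lack of fit that is conventional for MCR-ALS, and the toy matrices are hypothetical values chosen so that the sums are easy to check by hand.

```python
import numpy as np

def lack_of_fit(d, d_hat):
    """Lof (%): root of residual sum of squares over data sum of squares."""
    e = d - d_hat
    return 100.0 * np.sqrt(np.sum(e ** 2) / np.sum(d ** 2))

def explained_variance(d, d_hat):
    """R2 (%): share of the data sum of squares reproduced by the model."""
    e = d - d_hat
    return 100.0 * (np.sum(d ** 2) - np.sum(e ** 2)) / np.sum(d ** 2)

# Toy data: sum of d^2 is 100, sum of e^2 is 1 (hypothetical values).
d = np.array([[6.0, 8.0], [0.0, 0.0]])
d_hat = np.array([[6.0, 8.0], [0.6, 0.8]])

lof = lack_of_fit(d, d_hat)
r2 = explained_variance(d, d_hat)
print(round(lof, 6), round(r2, 6))
```

With these definitions the two metrics are linked by R² = 100 − Lof²/100, so a lack of fit of 10% corresponds to an explained variance of 99%.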

Data evaluation

In the current study, the mixed standard solutions (data set I) were arranged globally (D_Aug-rcw-MS¹ _MS²) and subjected to extended MCR-ALS modeling to generate a robust source of reference chemical information, including resolved chromatographic profiles and MS¹ and MS² spectra and their relative abundance (Fig. 1). These results were further compared with individual modeling of CWA LC-MS¹ and LC-MS² data sets. Then, the method was used for decomposition of two validation data sets II and III. To this end, MCR models were created and evaluated for one global RCWA LC-MS¹-MS² data set. Finally, the feasibility of the method was further evaluated for individual modeling of RW fused LC-MS¹-MS² data sets for extracted river water samples enriched with target compounds with contamination levels of 100 and 10 μg/L (Fig. S2). For NTA of data set IV, the peak areas obtained via MCR modeling of CWA D_LC-MS1 were put into a matrix whose rows and columns corresponded to the water samples and the resolved MCR-ALS components. This matrix of non-target data was subjected to multivariate methods including PCA and orthogonal partial least squares–discriminant analysis (OPLS-DA). Then, the LC-MS¹ and LC-MS² data sets for some chromatographic segments were fused to be processed by global MCR-ALS models and simultaneous decomposition of MS¹ (including precursor ions) and MS² AIF spectra.

Chromatograms were recorded in profile mode using Xcalibur software (Thermo Fisher). All chromatographic data sets were then converted into mzXML files using MSConvertGUI software [38]. Next, data files were imported into MATLAB (The MathWorks, Inc., version 9.9, 2020b, Natick, MA, USA) for further data preprocessing and postprocessing as mzXMLStruct using the “mzxmlread.m” function of the MATLAB Bioinformatics Toolbox. The calculations involving MCR-ALS were performed in MATLAB software using the MCR-ALS 2.0 toolbox available at www.mcrals.info. The icoshift routine was downloaded from www.models.life.ku.dk/algorithms. Chemometrics data processing (set IV samples) and prioritization of relevant contaminants were performed using PLS Toolbox version 8.9 (Eigenvector Research, Inc., Wenatchee, WA, USA) in the MATLAB computational and visualization environment. Following the decomposition process for fused LC-MS¹-MS² data sets for the standard samples, and confirming their correspondence with the extracted ion chromatograms (EICs) from the original data in Xcalibur software, the resolved MS¹ and MS² spectral profiles were used for confirmation of known non-target chemicals (validation sets) and confirmation/tentative identification of unknown contaminants in surface water samples, based on multiple lines of evidence. These include (1) a positive hit in the MS/MS libraries mzCloud (mzCloud; https://www.mzcloud.org) and PubChem (https://pubchem.ncbi.nlm.nih.gov/) and the in-house MS² spectral library from experimental data in DDA, for the most intense mass fragment in the resolved MS¹ profile (assigned to theoretical exact m/z [M+H]⁺ or [M+Na]⁺ precursors within a 5 ppm m/z error). For the validation step, due to the availability of reference MS² profiles (in-house library), a similarity score or MS² spectrum match was calculated as a dot product between reference MS² profiles and the resolved AIF MS² profiles as follows:

$${\textrm{MS}}^2\ \textrm{similarity}\ \left(\%\right)=100\times \frac{{\left(\sum {MS}_{res}{MS}_{ref}\right)}^2}{\sum {MS_{res}}^2\sum {MS_{ref}}^2}$$

(4)

where MS_res and MS_ref are the vectors of resolved AIF MS² profiles and AIF (or DDA) reference mass spectrum, respectively. Further evidence includes (2) a match between chromatographic peak shapes and retention time, and (3) availability of reference materials. To classify tentatively identified features, the scheme proposed by Schymanski was used [39].
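The spectral match and the precursor mass check can be sketched as below. The sketch reads Eq. 4 as a squared normalized dot product of two intensity vectors that have already been placed on a common m/z axis; the function names and the intensity values are hypothetical, and the two m/z values are the carbamazepine values quoted in this article.

```python
def ms2_similarity(resolved, reference):
    """Squared normalized dot product (%) of two aligned intensity vectors."""
    dot = sum(a * b for a, b in zip(resolved, reference))
    norm = sum(a * a for a in resolved) * sum(b * b for b in reference)
    return 100.0 * dot * dot / norm

def ppm_error(measured_mz, theoretical_mz):
    """Relative m/z deviation in parts per million."""
    return 1e6 * (measured_mz - theoretical_mz) / theoretical_mz

# Hypothetical intensities on a shared m/z axis (one entry per fragment).
resolved = [1.0, 0.5, 0.0, 0.2]
reference = [1.0, 0.4, 0.1, 0.2]
score = ms2_similarity(resolved, reference)

# m/z values for carbamazepine as quoted in the text.
error = ppm_error(237.1017, 237.1013)
print(f"similarity {score:.1f} %, m/z error {error:.2f} ppm, "
      f"within 5 ppm: {abs(error) <= 5.0}")
```

The score is independent of the absolute intensity scale of either spectrum, so a resolved profile normalized to a maximum of one can be compared directly with a library spectrum.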

Results and discussion

Decomposition of LC-HRMS/MS standard data set

Initially, each individual CWA LC-MS¹ and LC-MS² data set of the mixed standard solutions (set I samples) was analyzed using the MCR-ALS method. MS¹ and MS² data were simultaneously processed in the next step by fusing the matrices in a row-wise way. All models performed optimally with 25 resolved components, including 11 background signals and 14 target compounds. The explained variance and lack of fit for the global models were ≥99% and ≤6.5%, respectively (for the individual models, the values were ≥99% and 3–6%). Figures S3 and S4 show the resolved chromatograms for the mixed standard solution at concentrations of 100 μg/L (in three models) and 0.5 μg/L in fused mode, respectively. As can be seen in these figures, all targeted compounds have been resolved well chromatographically and there is a high level of coherence between the profiles across the MS¹, MS², and MS¹-MS² mass spectral data. Furthermore, the difference in total ion currents (TICs) in LC-Q-Orbitrap for MS¹ and MS² data acquisition modes, which reflect different ion sensitivities, can be compensated for by simultaneously analyzing both MS¹ and MS² signals (see Fig. S5 as an example). With this approach, all patterns and features of both MS levels are captured in a unified chromatogram, components are identified more accurately, and the level of information is generally higher than if each block were modeled separately. A pairwise comparison between the MS¹ and MS² profiles retrieved through individual models and their hybrid model counterparts showed excellent agreement, making the global model robust and superior to the individual ones, as it is capable of recovering MS² profiles and directly connecting them to relevant MS¹ spectra (precursor ions) and chromatographic profiles. Correlation analysis between the areas under the resolved LC profiles in the MS¹-MS² model and each of the MS models confirmed the high quality of chromatographic resolution of the fused model (R² values >0.993). 
Moreover, based on a regression analysis between peak areas and concentrations of each chemical (between 0.1 and 100 μg/L), the correlation coefficients obtained from fused data modeling ranged from 0.993 to 0.998 (p-value of lack-of-fit test >0.05), suggesting that the methodology can also be used as a complementary approach for quantification purposes (Fig. S6a and b). This study, however, focused on qualitative aspects of the results when constructing an integrative model for non-targeted analysis of RCWA LC-MS¹-MS² data sets.
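The linearity check between resolved peak areas and concentrations amounts to an ordinary least-squares fit. The sketch below uses the six concentration levels given in the methods section (0.1 to 100 μg/L); the peak areas are hypothetical placeholders, not values from this study, and the lack-of-fit test mentioned above is not reproduced.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept and Pearson r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / (sxx * syy) ** 0.5

conc = [0.1, 0.5, 1.0, 10.0, 50.0, 100.0]        # concentration levels, ug/L
areas = [0.21, 1.02, 1.95, 20.3, 99.0, 201.5]    # hypothetical resolved areas

slope, intercept, r = linear_fit(conc, areas)
print(f"slope {slope:.3f}, intercept {intercept:.3f}, r {r:.4f}")
```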

Figure 2a–c shows an example of MCR-ALS modeling of global augmented data matrices for the carbamazepine standard [M+H]⁺ (m/z 237.1013). Here, the chromatographic profiles of carbamazepine (MCR component 4) and its MS¹ and MS² AIF spectral counterparts in all standard samples (with R² = 0.998) have been recovered at once. As shown in the inset of Fig. 2c, the MS² fragments and their ratios match quite well with the relative peak heights of the corresponding EICs of pure standard carbamazepine. In this way, a bilinear factor decomposition method can group all signals from isotopic peaks, adducts, and different charge states of a single compound at the MS¹ level, as well as all mass fragments for every eluting compound at the MS² level. This can be achieved without initial preprocessing steps such as background correction and retention time shift alignment, given the high selectivity of LC-MS signals and the correspondingly high quality of the initial estimates of LC profiles obtained with the pure variable detection approach [21, 24, 29, 40, 41]. However, owing to the flexibility of MCR-ALS, additional constraints can be added to the process if a higher quality of profiles is required in challenging situations, provided the data structure meets the required criteria (see below). Figure 2d also reports the MS² spectrum of carbamazepine under the same instrumental conditions (CE of 30 eV) in ddMS² mode. A comparison between the MS² spectrum resolved in AIF mode with the current method and the ddMS² profile shows a similarity score of 80% and an increased relative sensitivity of the main mass fragment 194.0964 and two other fragments relative to the precursor ion 237.1017 when the measurement is carried out in AIF mode. The following cases further illustrate the benefits of the proposed method.
An example is naproxen (MCR component 11), for which the correct information on its MS¹ spectrum and precursor [M+H]⁺ (m/z 231.1016) was not immediately clear due to in-source degradation. However, with this method and through global componentization, we were able to recover the true MS¹ pattern of naproxen, with the most abundant fragment ion at m/z 185.0960, and to make a straightforward (manual) assignment to its MS² spectrum, which was further confirmed by mzCloud. Another example is gemfibrozil, where a pairwise comparison with its corresponding retrieved MS² profile (MCR component 14) allowed the most intense MS¹ peak (m/z 273.1459) to be manually assigned to the [M+Na]⁺ adduct as the precursor ion to be followed for annotation purposes in validation data sets. Figure 3 shows a comparison between acetaminophen and gemfibrozil with precursor ions [M+H]⁺ and [M+Na]⁺ in their retrieved MS¹ spectra and their associated resolved MS² AIF spectra.
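To make the idea of resolving fused MS¹ and MS² data with one bilinear model more concrete, the following is a minimal sketch on synthetic data. It is not the authors' implementation: non-negativity is imposed by simple clipping instead of a proper non-negative least-squares solver, the initial estimates are perturbed true profiles rather than pure-variable estimates, and all names (`mcr_als`, `D_fused`, `S1_hat`, `S2_hat`) are hypothetical.

```python
import numpy as np

def mcr_als(D, C0, n_iter=200):
    """Alternating least squares for D ~ C @ St, non-negativity by clipping."""
    C = C0.copy()
    for _ in range(n_iter):
        # Spectra given elution profiles, then profiles given spectra
        St = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)
        C = np.clip(np.linalg.lstsq(St.T, D.T, rcond=None)[0].T, 0, None)
    return C, St

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 60)[:, None]
# Two co-eluting Gaussian elution profiles (retention times x components)
C_true = np.exp(-0.5 * ((t - np.array([0.45, 0.55])) / 0.06) ** 2)
S1_true = rng.random((2, 30))   # synthetic MS1 spectra (components x m/z)
S2_true = rng.random((2, 20))   # synthetic MS2 AIF spectra

# Row-wise fusion: both blocks share the retention-time axis
D_fused = np.hstack([C_true @ S1_true, C_true @ S2_true])

# Assumed initial estimates: noisy versions of the true profiles
C0 = np.clip(C_true + 0.1 * rng.standard_normal(C_true.shape), 0, None)
C, St = mcr_als(D_fused, C0)

# Row k of S1_hat and row k of S2_hat belong to the same elution profile C[:, k]
S1_hat, S2_hat = St[:, :30], St[:, 30:]
```

Because each resolved component carries one elution profile and one fused spectrum, splitting the spectrum at the MS¹/MS² boundary gives the precursor-level and fragment-level spectra of the same component without a separate matching step.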

However, as mentioned before, one significant consideration in the modeling of fused LC-MS/MS data is that the LC profiles in both modes of MS data acquisition should be synchronized and span the same retention time range [29]. Although this requirement has generally been met in the current data sets, there have been some instances of distortion. One such instance, the processing of fused data matrices in the retention time range of 10 to 11 min, is explained here in more detail. In this range, proper recovery of the MS² AIF spectrum of fluoxetine is a challenging case. Figure S7a–b illustrates the EICs of the strongly co-eluting carbamazepine and fluoxetine for their characteristic ions in MS¹ and MS² data acquisition modes. A substantial difference between the ion ratios of these chemicals, due to their different ionization efficiency and fragmentation behavior, is clear. The significantly lower abundance of MS/MS AIF fragments for fluoxetine and the smaller peak width relative to the MS¹ peak, which is most likely the result of strong over-fragmentation of the AIF precursor ion [42], lead to a discrepancy in the LC windows of fluoxetine in the row space of the fused MS¹ and MS² data matrices and prevent efficient MS² spectrum recovery. This was further confirmed by individual MCR-ALS modeling of the CWA data matrices LC-MS¹ and LC-MS², following proper alignment of chromatograms and adding trilinearity constraints to the processing workflow (Fig. S7c–d). However, following row-wise concatenation and modeling of these two data blocks, while the MS¹ spectrum of this compound was effectively recovered (due to the high purity of initial estimates and the complementary role of the MS¹ data matrix in the resolution process), its MS² profile was mainly characterized by carbamazepine fragmentation patterns with a base MS/MS fragment of 194.0964 (Fig. S7e–f).
By contrast, the MS² AIF spectrum of fluoxetine recovered through MCR-ALS modeling of the LC-MS² data alone showed a similarity score of 96.8% with its corresponding pure standard (Fig. S7g–h). In a real non-target data set, this type of issue can be tracked and checked by analyzing the LC-MS¹ and LC-MS² data in both a fused and a non-fused way and comparing the results.

Consequently, the simultaneous curve resolution of full-scan MS¹ and MS² AIF data in the fused method provides a two-sided advantage: it directly links the acquired MS² profiles with their MS¹ spectral counterparts (and the responsible precursor ions), and it exploits the complementary information of each resolved component to facilitate the proper assignment of MS¹ and MS² spectra. Finally, the recovered MS² fragments and their ratios for the most prominent ions were compared with the EICs from the original data in Xcalibur software. The similarity scores were greater than 99% for all the target compounds. The resulting MS² AIF spectra, along with the LC and MS¹ information obtained through modeling of global data set I (Table S1), were then used as reference chemical information for the proof of concept in validation water samples.
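The metric behind the similarity scores is not specified in this section. As an illustration only, the sketch below assumes a plain cosine (normalized dot product) score between two centroided spectra, with peaks paired within an m/z tolerance. The function name, the tolerance, and the peak lists are all hypothetical.

```python
import numpy as np

def cosine_similarity_percent(mz_a, int_a, mz_b, int_b, tol=0.005):
    """Cosine score (%) between two centroided spectra. Peaks are paired
    greedily when their m/z values differ by at most `tol`; each peak of
    the second spectrum is used at most once."""
    a = sorted(zip(mz_a, int_a))
    b = sorted(zip(mz_b, int_b))
    used = set()
    dot = 0.0
    for mz1, i1 in a:
        for j, (mz2, i2) in enumerate(b):
            if j not in used and abs(mz1 - mz2) <= tol:
                dot += i1 * i2
                used.add(j)
                break
    norm = np.sqrt(sum(i * i for _, i in a)) * np.sqrt(sum(i * i for _, i in b))
    return 100.0 * dot / norm if norm > 0 else 0.0

# Hypothetical resolved and reference fragment lists (m/z, relative intensity)
resolved = ([100.0500, 150.0700, 200.1000], [100.0, 12.0, 30.0])
reference = ([100.0502, 150.0698, 200.1003], [100.0, 15.0, 25.0])

score = cosine_similarity_percent(*resolved, *reference)
print(f"similarity = {score:.1f}%")
```

Unmatched peaks contribute to the norms but not to the dot product, so fragments from an interfering compound lower the score, which is the behavior one would want when judging how interference-free a resolved MS² spectrum is.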

Multivariate curve resolution of LC-HRMS/MS validation data set

As the first set of validation samples, the non-spiked tap water and all tap water samples spiked with chemical standards were arranged as a global RCW augmented data matrix and subjected to individual CWA modeling of LC-MS¹ and LC-MS² and, finally, to fused LC-MS¹-MS² data modeling with MCR. The data showed chromatographic regions with different degrees of co-elution with matrix components, retention time shifts, and varying drift patterns and intensities of background signals. The number of components in the models with optimal performance was 40, 36, and 40, respectively. Therefore, fused data modeling shows a clear advantage over individual MS² data modeling, since it captures more components in a non-target assay. Moreover, since MCR can inherently swap the positions of components in individual CWA models, each resolved component (or target) must be assigned separately. MCR modeling of row-wise concatenated data matrices also resolves this issue. The results of global modeling, including the matrix dimension, model performance parameter, resolved chromatographic profiles, and variation in recovered peak areas for 40 components across different samples, are provided in Table S2, Fig. S8, and Fig. S9, respectively. The resolved components include the target standards, background signals, and unknown matrix components. Figure S8 shows the co-elution issues (with matrix components) for some of the target compounds and also presents the resolved MS² AIF spectra using the global model for acetaminophen, testosterone, and gemfibrozil. Their similarity scores with the reference spectra are 90.7%, 96.9%, and 95.4%, respectively. In fact, in addition to the significance of dealing with co-elution issues in NTA of water samples, the presence of background signals with different drift patterns throughout the chromatographic runs can be decisive in recovering high-quality mass spectral profiles through MCR analysis of raw data matrices.
The mentioned issues are especially pertinent for the annotation of trace amounts of pollutants. However, the advantage of extended MCR-ALS similar to other multi-way methods is that it can comprehensively combine the analysis of several samples in experimental series. When different samples with varying concentrations of chemicals are simultaneously subjected to the method, the ability to detect trace peaks increases, and a more robust and reliable estimate of the pure chromatographic and spectral profiles can be obtained [43, 44]. Additionally, the current workflow can be modified to include background correction of data matrices as a preprocessing step, whenever necessary [32].
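The data arrangement discussed above (several samples analyzed together, with MS¹ and MS² blocks fused) can be sketched as follows. This is a layout illustration on random placeholder matrices, assuming every sample shares the same retention-time and m/z axes; the variable names and the helper `peak_areas` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_times, n_mz1, n_mz2 = 3, 50, 40, 25

# Hypothetical per-sample matrices: rows = retention times, columns = m/z channels
ms1 = [rng.random((n_times, n_mz1)) for _ in range(n_samples)]
ms2 = [rng.random((n_times, n_mz2)) for _ in range(n_samples)]

# Column-wise augmentation: samples stacked on top of each other (shared m/z axis)
D1_cwa = np.vstack(ms1)   # shape (n_samples * n_times, n_mz1)
D2_cwa = np.vstack(ms2)   # shape (n_samples * n_times, n_mz2)

# Row-wise fusion: MS1 and MS2 blocks side by side (shared retention-time axis)
D_fused = np.hstack([D1_cwa, D2_cwa])

def peak_areas(c_aug, n_samples):
    """Split one augmented elution profile per sample and sum each segment."""
    return np.array([seg.sum() for seg in np.split(c_aug, n_samples)])
```

With this layout, one resolved spectrum per component is shared by all samples, while the elution profile may differ from sample to sample, which is why retention time shifts between runs do not by themselves break the bilinear model.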

In the end, high-quality MS² spectra were recovered for 13 chemical standards with similarity matches ≥82% (Table S3). Thus, a global MCR-ALS model for fused data sets was effective in recovering high-quality and interference-free MS² spectra for most of the components with the current setup. The utility of the proposed methodology was further assessed on a set of extracted (and pre-concentrated) river water samples, representing a "high-level" matrix complexity, spiked with different concentration levels of target chemicals (data set III). Figure 4a shows an example of the curve resolution process for a subset of the global data matrix (7–10 min) including five target compounds. This LC interval reaches the optimal solution (R² = 99.2%) with 31 components under the non-negativity constraint, including five target compounds and 26 background and unknown river water chemicals. The complexity of the data matrix is clear from Fig. 4b, which shows the resolved LC profiles, representing various co-elution patterns between the target chemicals of interest and other unknown matrix components. The successfully resolved MS¹ and MS² AIF spectra for primidone and trimethoprim are shown in this figure, with match scores of 82.4% and 98.1% against the reference spectra, respectively. The curve resolution process for other LC regions, including eight chemical standards, was also successful. The detected main mass fragments and their relative abundances were in accordance with Table S1 for standard samples, with match quality scores ranging from 88.2% (for acetaminophen) to 99.2% (for carbamazepine).
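The R² value quoted above for the 7–10 min interval is a model fit parameter. A sketch of the figures of merit commonly reported for MCR-ALS models is given below; that the authors used exactly these definitions is an assumption, and the function name `fit_quality` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_quality(D, C, St):
    """Return (explained variance in %, lack of fit in %) for D ~ C @ St."""
    E = D - C @ St                       # residual matrix
    ssq_res = float(np.sum(E ** 2))
    ssq_tot = float(np.sum(D ** 2))
    r2 = 100.0 * (1.0 - ssq_res / ssq_tot)
    lof = 100.0 * np.sqrt(ssq_res / ssq_tot)
    return r2, lof

rng = np.random.default_rng(3)
C = rng.random((50, 4))                  # synthetic elution profiles
St = rng.random((4, 30))                 # synthetic spectra
D = C @ St + 0.01 * rng.standard_normal((50, 30))   # small synthetic noise

r2, lof = fit_quality(D, C, St)
print(f"R2 = {r2:.2f}%, lack of fit = {lof:.2f}%")
```

Under these definitions the two numbers carry the same information on different scales: an explained variance of 99.2% corresponds to a lack of fit of roughly 9%.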

We encountered a challenging case in recovering the estrone MS² profile due to its low abundance in the current experimental setup, where it co-eluted with highly abundant compounds in most of the samples. This complicates the correct identification, with an MS² match of around 60% (using the bilinear factor model), whereas other target pollutants in the extracted river water in the same concentration range had a promising MS² quality score (>80%). Thus, it can be concluded that the quality of resolved MS² spectra in a non-target water environment depends on the contaminant class (MS/MS fragment sensitivity) and the variation patterns across samples. Nevertheless, the main advantage of MCR-ALS for chromatographic data is the flexibility to implement constraints even for a single peak in a matrix, to obtain higher-quality mass spectra. As a result, different levels of model complexity can be covered when real-world situations in NTA of water samples with various degrees of complexity are encountered [31]. For example, in the case mentioned, following a careful alignment of the chromatographic data and implementation of the trilinearity constraint for estrone, the contribution of the main interfering compound (with the mass fragment 98.9845) to the estrone profile was effectively removed, and a higher-quality MS² spectrum with a similarity index of 86% with the corresponding pure sample was recovered (Fig. S10). Further studies are currently underway to automatically accommodate these capabilities in modeling fused MS¹-MS² data in different scenarios.
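The trilinearity constraint mentioned above can be illustrated with a minimal sketch of one common way to apply it to a single component: the augmented elution profile is folded into a retention time by sample matrix and replaced by its rank-1 approximation, so that the component has one peak shape in all samples. This assumes aligned chromatograms of equal length and is not necessarily the exact procedure used in this work; the function name and synthetic profiles are hypothetical.

```python
import numpy as np

def apply_trilinearity(c_aug, n_samples):
    """Replace one augmented elution profile by its best rank-1 (trilinear)
    approximation: a single peak shape scaled differently in each sample."""
    folded = np.asarray(c_aug, dtype=float).reshape(n_samples, -1).T  # (times, samples)
    U, s, Vt = np.linalg.svd(folded, full_matrices=False)
    rank1 = s[0] * np.outer(U[:, 0], Vt[0])   # common shape times per-sample score
    return rank1.T.reshape(-1)                # back to the augmented layout

t = np.linspace(0, 1, 40)
shape = np.exp(-0.5 * ((t - 0.5) / 0.08) ** 2)               # shared peak shape
interferent = 0.3 * np.exp(-0.5 * ((t - 0.7) / 0.05) ** 2)   # hypothetical interference

# Three aligned samples; only the second one carries the interference
c_aug = np.concatenate([1.0 * shape, 0.5 * shape + interferent, 2.0 * shape])
c_tri = apply_trilinearity(c_aug, n_samples=3)
```

In an ALS loop this projection would be applied at each iteration to the selected component only, leaving the other components under the plain bilinear model, which matches the idea of constraining a single peak in the matrix.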

At the end of the analysis of data set III with the global fused data sets, we tried to tentatively identify the remaining unknown MCR components through MS¹ and MS² AIF spectral profiles connected to each recovered LC profile. The final results of identification for 10 MCR components resolved in the global fused model for data set III are presented in Table S4. Figure S11 shows the results of the resolution process for MCR unknown components 9 and 23, identified as benzotriazole and 4-acetamido-antipyrine in the river water sample (data set III), respectively.

Moreover, the fused models were employed for individual modeling of row-wise fused LC-MS¹-MS² data sets (Fig. S2) using two extracted river water samples with different spiked contamination levels (100 and 10 μg/L). In principle, MCR models can be fitted to each individual sample, as MCR does not require three-dimensional data. This allows the identification of elution patterns in individual modeling of samples. Our final findings regarding similarity scores for target standard compounds in modeling of single LC-MS¹-MS² measurements are presented in Table S3. For the higher spiked level in the final extracted (pre-concentrated) sample (100 μg/L), the similarity scores are higher than 78%, except for fluoxetine and estrone. By contrast, at the low spiked level (10 μg/L), the quality scores of most of the chemicals are less than 20% (for individual modeling of one sample). This result supports the importance of simultaneous modeling of multiple LC-MS¹-MS² measurements after matrix augmentation [45, 46] to reduce MCR model errors and ambiguities and provide high-quality resolved MS² AIF spectra. This is especially important for contaminants that show low sensitivity or low abundance or that are highly suppressed due to matrix effects in the water environment. In fact, simultaneous data processing of one or several sets of water samples is in line with real-world advancements and perspectives in aquatic non-target screening (NTS). This methodology can be extremely useful in analyzing surface water samples collected at various times or locations, wastewater samples undergoing chemical or biological treatment, samples for chemical source tracking studies, and water samples measured under a variety of extraction protocols/instrumental conditions.

Non-target screening of surface water samples

The data processing strategy for simultaneous modeling of MS¹ and MS² data sets was further utilized as the end stage of a non-target screening workflow (sample set IV), as an application example. The details of the preliminary curve resolution and multivariate data processing steps are provided in SI-7 and supplementary Figs. S12 to S15.

Each component associated with the second group of surface water samples was initially assigned to its corresponding resolved elution profile by MCR modeling of the original CW LC-MS¹ data set. Then, different fused LC-MS¹-MS² data matrices (Fig. 1) were built from the LC windows that include the prioritized pollutants, according to their location in the chromatograms. As an overview of the overall patterns, Fig. S16 shows TICs for surface water sample WS-18 (sampling site 5, Pirbazar River) in both data acquisition modes together with total ion mass currents for MS¹ and MS²-AIF modes. For instance, for prioritized component 43, an LC window of 11.3–12.1 min was extracted throughout the whole data set of LC-MS¹ and LC-MS² and subjected to global modeling by MCR-ALS. Figure 5 shows the results of this processing for a subset of raw data from sampling sites 3 to 5 for carbamazepine-positive assignment. Among the other prioritized pollutants, we were able to annotate six compounds using the mzCloud database, and the rest of the compounds could not be identified (Table S6). The identified chemicals belong to various classes and include caffeine, carbamazepine (and its primary metabolite carbamazepine-10,11-epoxide), dextromethorphan, piperine, and buphedrine (the urinary metabolite of buphedrone, a drug of abuse), which are mainly released to the environment through non- or insufficiently treated domestic, hospital, or industrial wastewater effluents [47, 48]. Also, the presence of the herbicide bensulfuron-methyl could be attributed to agricultural runoff. Its main application is to control broadleaf weeds in rice paddies [9, 49], and rice is the most important agricultural product of Gilan province. Overall, the identified chemicals could be considered anthropogenic contaminants that were released into the river water via wastewater or non-point runoff.
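The LC-window extraction step described above for prioritized component 43 (11.3–12.1 min) can be sketched as a simple slicing operation before fusion. The retention-time axis, matrix sizes, and the function name `extract_window` are hypothetical, and the sketch assumes that the MS¹ and MS² scans share one retention-time axis.

```python
import numpy as np

def extract_window(rt, D, t_start, t_end):
    """Keep only the rows of D whose retention time lies in [t_start, t_end]."""
    mask = (rt >= t_start) & (rt <= t_end)
    return rt[mask], D[mask, :]

rng = np.random.default_rng(2)
rt = np.linspace(0.0, 20.0, 2001)        # hypothetical retention-time axis (min)
D_ms1 = rng.random((rt.size, 30))        # placeholder LC-MS1 matrix
D_ms2 = rng.random((rt.size, 20))        # placeholder LC-MS2 AIF matrix

rt_w, D1_w = extract_window(rt, D_ms1, 11.3, 12.1)
_, D2_w = extract_window(rt, D_ms2, 11.3, 12.1)

# Fused window, ready for MCR-ALS modeling
D_window = np.hstack([D1_w, D2_w])
```

For a multi-sample data set, the same window would be cut from every sample before stacking the samples on top of each other.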

Conclusion

In the current study, global MCR modeling of fused full-scan MS¹ and MS/MS (AIF) data sets using LC-HRMS/MS measurements has been proposed as a highly efficient approach for enhancing the performance of DIA-based workflows for non-targeted analysis of trace contaminants in surface water samples. With the integration of MS² AIF data matrices to initial concatenated LC-MS¹, precursor ions in the resolved MS¹ spectrum can be directly linked to their corresponding resolved MS/MS spectra, both associated with their unified LC profiles. This facilitates the detection and identification of prioritized contaminants, especially when simultaneous analysis of multiple chromatographic data is considered. Moreover, the implementation of the extended MCR-ALS strategy for simultaneous MS-based data analysis is flexible, expandable, and customizable according to study needs and data structure. Further, the use of a unified model reduces data analysis time and improves results accuracy by modeling fused data sets.

We believe that while this methodology addresses some key needs for highly effective annotation of non-targeted LC-MS/MS AIF data, it still has room to accommodate combined strategies for a more comprehensive capture of chemical space in non-targeted pollution screening studies.

Abbreviations

DIA-AIF: Data-independent acquisition–all-ion fragmentation
LC-HRMS: Liquid chromatography coupled to high-resolution mass spectrometry
EICs: Extracted ion chromatograms
PCA: Principal component analysis
SVD: Singular value decomposition
MCR-ALS: Multivariate curve resolution–alternating least squares
OPLS-DA: Orthogonal partial least squares–discriminant analysis
VIP: Variable importance in projection
NTA: Non-target analysis

References

Aceña J, Stampachiacchiere S, Pérez S, Barceló D. Advances in liquid chromatography–high-resolution mass spectrometry for quantitative and qualitative environmental analysis. Anal Bioanal Chem. 2015;407(21):6289–99.
Hollender J, Schymanski EL, Singer HP, Ferguson PL. Nontarget screening with high resolution mass spectrometry in the environment: ready to go? Environ Sci Technol. 2017;51(20):11505–12.
Nanusha MY, Frøkjær EE, Liigand J, Christensen MR, Hansen HR, Hansen M. Unravelling the occurrence of trace contaminants in surface waters using semi-quantitative suspected non-target screening analyses. Environ Pollut. 2022;315:120346.
Lestremau F, Levesque A, Lahssini A, Magnan de Bornier T, Laurans R, Assoumani A, et al. Development and implementation of automated qualification processes for the identification of pollutants in an aquatic environment from high-resolution mass spectrometric nontarget screening data. ACS ES&T Water. 2023;3(3):765–72.
McEachran AD, Mansouri K, Grulke C, Schymanski EL, Ruttkies C, Williams AJ. “MS-ready” structures for non-targeted high-resolution mass spectrometry screening studies. J Cheminformatics. 2018;10(1):45.
Broeckling CD, Hoyes E, Richardson K, Brown JM, Prenni JE. Comprehensive tandem-mass-spectrometry coverage of complex samples enabled by data-set-dependent acquisition. Anal Chem. 2018;90(13):8020–7.
Yang Y, Yang L, Zheng M, Cao D, Liu G. Data acquisition methods for non-targeted screening in environmental analysis. TrAC Trends Anal Chem. 2023;160:116966.
Eliuk S, Makarov A. Evolution of Orbitrap mass spectrometry instrumentation. Annu Rev Anal Chem. 2015;8(1):61–80.
Tsugawa H, Cajka T, Kind T, Ma Y, Higgins B, Ikeda K, et al. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat Methods. 2015;12(6):523–6.
Tsou C-C, Avtonomov D, Larsen B, Tucholska M, Choi H, Gingras A-C, et al. DIA-umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat Methods. 2015;12(3):258–64.
Peris-Díaz MD, Sweeney SR, Rodak O, Sentandreu E, Tiziani S. R-MetaboList 2: a flexible tool for metabolite annotation from high-resolution data-independent acquisition mass spectrometry analysis. Metabolites. 2019;9(9):187.
Tada I, Chaleckis R, Tsugawa H, Meister I, Zhang P, Lazarinis N, et al. Correlation-based Deconvolution (CorrDec) to generate high-quality MS2 spectra from data-independent Acquisition in Multisample Studies. Anal Chem. 2020;92(16):11310–7.
Graça G, Cai Y, Lau C-HE, Vorkas PA, Lewis MR, Want EJ, et al. Automated annotation of untargeted all-ion fragmentation LC–MS metabolomics data with MetaboAnnotatoR. Anal Chem. 2022;94(8):3446–55.
Guo J, Huan T. Comparison of full-scan, data-dependent, and data-independent acquisition modes in liquid chromatography–mass spectrometry based untargeted metabolomics. Anal Chem. 2020;92(12):8072–80.
Hohrenk LL, Vosough M, Schmidt TC. Implementation of Chemometric tools to improve data mining and prioritization in LC-HRMS for nontarget screening of organic micropollutants in complex water matrixes. Anal Chem. 2019;91(14):9213–20.
Minkus S, Bieber S, Letzel T. Spotlight on mass spectrometric non-target screening analysis: advanced data processing methods recently communicated for extracting, prioritizing and quantifying features. Anal Sci Advan. 2022;3(3-4):103–12.
Helmus R, ter Laak TL, van Wezel AP, de Voogt P, Schymanski EL. patRoon: open source software platform for environmental mass spectrometry based non-target screening. J Cheminformatics. 2021;13(1):1.
Sinanian MM, Cook DW, Rutan SC, Wijesinghe DS. Multivariate curve resolution-alternating least squares analysis of high-resolution liquid chromatography–mass spectrometry data. Anal Chem. 2016;88(22):11092–9.
Navarro-Reig M, Jaumot J, García-Reiriz A, Tauler R. Evaluation of changes induced in rice metabolome by Cd and Cu exposure using LC-MS with XCMS and MCR-ALS data analysis strategies. Anal Bioanal Chem. 2015;407(29):8835–47.
Stolt R, Torgrip RJO, Lindberg J, Csenki L, Kolmert J, Schuppe-Koistinen I, et al. Second-order peak detection for multicomponent high-resolution LC/MS data. Anal Chem. 2006;78(4):975–83.
Gorrochategui E, Jaumot J, Lacorte S, Tauler R. Data analysis strategies for targeted and untargeted LC-MS metabolomic studies: overview and workflow. TrAC Trends Anal Chem. 2016;82:425–42.
Navarro-Reig M, Jaumot J, Baglai A, Vivó-Truyols G, Schoenmakers PJ, Tauler R. Untargeted comprehensive two-dimensional liquid chromatography coupled with high-resolution mass spectrometry analysis of Rice Metabolome using multivariate curve resolution. Anal Chem. 2017;89(14):7675–83.
Ortiz-Villanueva E, Benavente F, Piña B, Sanz-Nebot V, Tauler R, Jaumot J. Knowledge integration strategies for untargeted metabolomics based on MCR-ALS analysis of CE-MS and LC-MS data. Anal Chim Acta. 2017;978:10–23.
Gorrochategui E, Jaumot J, Tauler R. ROIMCR: a powerful analysis strategy for LC-MS metabolomic datasets. BMC Bioinformatics. 2019;20:256.
Sheikholeslami MN, Gómez-Canela C, Barron LP, Barata C, Vosough M, Tauler R. Untargeted metabolomics changes on Gammarus pulex induced by propranolol, triclosan, and nimesulide pharmaceutical drugs. Chemosphere. 2020;260:127479.
Lotfi Khatoonabadi R, Vosough M, Hohrenk LL, Schmidt TC. Employing complementary multivariate methods for a designed nontarget LC-HRMS screening of a wastewater-influenced river. Microchem J. 2021;160:105641.
Perez-Lopez C, Ginebreda A, Carrascal M, Barcelò D, Abian J, Tauler R. Non-target protein analysis of samples from wastewater treatment plants using the regions of interest-multivariate curve resolution (ROIMCR) chemometrics method. J Environ Chem Eng. 2021;9(4):105752.
Vila-Costa M, Martinez-Varela A, Rivas D, Martinez P, Pérez-López C, Zonja B, et al. Advanced analytical, chemometric, and genomic tools to identify polymer degradation products and potential microbial consumers in wastewater environments. Chem Eng J. 2022;442:136175.
Yamamoto FY, Pérez-López C, Lopez-Antia A, Lacorte S, de Souza Abessa DM, Tauler R. Linking MS1 and MS2 signals in positive and negative modes of LC-HRMS in untargeted metabolomics using the ROIMCR approach. Anal Bioanal Chem. 2023;415(25):6213–25.
Tauler R, de Juan A. Chapter 5 - Multivariate curve resolution for quantitative analysis. In: de la Peña AM, Goicoechea HC, Escandar GM, Olivieri AC, editors. Data Handling in Science and Technology. Vol. 29. Elsevier; 2015. p. 247–92.
Zhang X, Tauler R. Flexible implementation of the Trilinearity constraint in multivariate curve resolution alternating least squares (MCR-ALS) of chromatographic and other type of data. Molecules. 2022;27(7):2338.
Vosough M. Current challenges in second-order calibration of hyphenated chromatographic data for analysis of highly complex samples. J Chemom. 2018;32(12):e2976.
Tauler R. Multivariate curve resolution applied to second order data. Chemom Intell Lab Syst. 1995;30(1):133–46.
Sheikholeslami MN, Vosough M, Esfahani HM. On the performance of multivariate curve resolution to resolve highly complex liquid chromatography–full scan mass spectrometry data for quantification of selected immunosuppressants in blood and water samples. Microchem J. 2020;152:104298.
Windig W, Guilment J. Interactive self-modeling mixture analysis. Anal Chem. 1991;63(14):1425–32.
Tomasi G, Savorani F, Engelsen SB. Icoshift: an effective tool for the alignment of chromatographic data. J Chromatogr A. 2011;1218(43):7832–40.
Jaumot J, de Juan A, Tauler R. MCR-ALS GUI 2.0: new features and applications. Chemom Intell Lab Syst. 2015;140:1–12.
Adusumilli R, Mallick P. Data conversion with ProteoWizard msConvert. In: Comai L, Katz JE, Mallick P, editors. Proteomics: methods and protocols. New York, NY: Springer New York; 2017. p. 339–68.
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol. 2014;48(4):2097–8.
Windig W. 2.17 - two-way data analysis: detection of purest variables. In: Brown SD, Tauler R, Walczak B, editors. Comprehensive Chemometrics. Oxford: Elsevier; 2009. p. 275–307.
Hohrenk LL, Vosough M, Schmidt TC. Implementation of Chemometric tools to improve data mining and prioritization in LC-HRMS for nontarget screening of organic micropollutants in complex water matrixes. Anal Chem. 2019;91(14):9213–20.
Sentandreu E, Peris-Díaz MD, Sweeney SR, Chiou J, Muñoz N, Tiziani S. A survey of Orbitrap all ion fragmentation analysis assessed by an R MetaboList package to study small-molecule metabolites. Chromatographia. 2018;81(7):981–94.
Baccolo G, Quintanilla-Casas B, Vichi S, Augustijn D, Bro R. From untargeted chemical profiling to peak tables – a fully automated AI driven approach to untargeted GC-MS. TrAC Trends Anal Chem. 2021;145:116451.
Escandar GM, Olivieri AC. Multi-way chromatographic calibration—a review. J Chromatogr A. 2019;1587:2–13.
Vosough M, Mason C, Tauler R, Jalali-Heravi M, Maeder M. On rotational ambiguity in model-free analyses of multivariate data. J Chemom. 2006;20(6-7):302–10.
Olivieri AC, Tauler R. The effect of data matrix augmentation and constraints in extended multivariate curve resolution–alternating least squares. J Chemom. 2017;31(3):e2875.
Patel M, Kumar R, Kishor K, Mlsna T, Pittman CU Jr, Mohan D. Pharmaceuticals of Emerging Concern in aquatic systems: chemistry, occurrence, effects, and removal methods. Chem Rev. 2019;119(6):3510–673.
Koroša A, Brenčič M, Mali N. Estimating the transport parameters of propyphenazone, caffeine and carbamazepine by means of a tracer experiment in a coarse-gravel unsaturated zone. Water Res. 2020;175:115680.
Lin X-Y, Yang Y-Y, Zhao Y-H, Fu Q-L. Biodegradation of bensulfuron-methyl and its effect on bacterial community in paddy soils. Ecotoxicology. 2012;21(5):1281–90.

Acknowledgments

This work was supported by the German Research Foundation (DFG) [grant number 520243139]. Open access funding was enabled and organized by Projekt DEAL via the Open Access Publication Fund of the University of Duisburg-Essen.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Instrumental Analytical Chemistry and Centre for Water and Environmental Research (ZWU), University of Duisburg-Essen, Universitätsstr. 5, Essen, 45141, Germany
Maryam Vosough, Amir Salemi, Sarah Rockel & Torsten C. Schmidt
Department of Clean Technologies, Chemistry and Chemical Engineering Research Center of Iran, P.O. Box 14335-186, Tehran, Iran
Maryam Vosough
IWW Water Centre, Moritzstr. 26, Mülheim an der Ruhr, 45476, Germany
Torsten C. Schmidt

Authors

Maryam Vosough
Amir Salemi
Sarah Rockel
Torsten C. Schmidt

Contributions

Maryam Vosough: Conceptualization, Methodology, Software, Data analysis, Writing–Reviewing and Editing. Amir Salemi: Methodology, Formal analysis, Reviewing and Editing. Sarah Rockel: Formal analysis, Data collection and Curation. Torsten C. Schmidt: Resources, Funding acquisition, Reviewing and Editing.

Corresponding author

Correspondence to Maryam Vosough.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(DOCX 2052 kb)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Vosough, M., Salemi, A., Rockel, S. et al. Enhanced efficiency of MS/MS all-ion fragmentation for non-targeted analysis of trace contaminants in surface water using multivariate curve resolution and data fusion. Anal Bioanal Chem 416, 1165–1177 (2024). https://doi.org/10.1007/s00216-023-05102-x

Received: 22 September 2023
Revised: 18 November 2023
Accepted: 30 November 2023
Published: 11 January 2024
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00216-023-05102-x
