Introduction

Protein glycosylation refers to the covalent attachment of carbohydrates to polypeptides and represents a class of prevalent and structurally diverse co-translational and post-translational modifications (PTMs) that impact a huge number of biological processes1,2,3,4,5,6. Carbohydrate modifications include single monosaccharides and complex carbohydrate chains, both referred to as glycans. Protein glycosylation is a non-templated process and is mediated by enzymes known as glycosyltransferases, responsible for the initiation or elongation of glycans, and oligosaccharyltransferases, responsible for the addition of whole carbohydrate chains. In cells, the complex interplay between glycosyltransferases or oligosaccharyltransferases, carbohydrate transporters and glycosidases — the enzymes that remove these carbohydrates — fine-tunes the glycan structures observed on individual proteins and regulates glycoprotein function, with effects on biological processes that include cellular development7, cell–cell communication8, host–microorganism interactions9,10 and immunity5,11,12. For example, the recruitment of leukocytes to sites of inflammation is precisely controlled by specific glycan structures that mediate interactions with cell-surface lectins to enable selective and site-specific leukocyte homing5,7,11,12. Dysregulation of glycosylation is associated with numerous diseases, including cancer13,14,15,16, infection and inflammation17,18,19,20,21,22, schizophrenia23 and a wide range of congenital and neurological disorders24,25,26. Unravelling the role of glycosylation under both physiological and pathophysiological conditions is a long-standing goal of glycobiology and has driven the rapid development of methods to track glycosylation for diagnostic and therapeutic purposes27,28.

Glycosylation is a universal protein modification across all domains of life with structurally distinct subclasses and glycan types now recognized29,30,31,32,33,34 (Fig. 1a,b). Our knowledge of mammalian asparagine-linked (N-linked) and serine/threonine-linked (O-linked) glycans is the most developed, and these modifications are therefore the focus of this Primer. Characterizing the glycoproteome involves the identification of glycoproteins as well as definition of the macroheterogeneity (structural diversity owing to the presence or absence of glycans at specific glycosylation sites) and microheterogeneity (structural diversity of glycosylation patterns at individual glycosylation sites)35 within these proteins. Microheterogeneity can arise through differences in the number and type of individual monosaccharide residues within the glycan, the structural arrangements and branching patterns of these monosaccharides or the configuration of anomeric linkages (see Box 1 for a guide to the symbol nomenclature for glycans). Ultimately, identifying glycosylation sites and discrete glycan structures is crucial for understanding the roles of glycan-dependent functions in biological processes.
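As an illustrative sketch of how these two quantities might be summarized from site-level measurements, the Python snippet below computes site occupancy (macroheterogeneity) and the relative abundance of each glycoform at the occupied site (microheterogeneity). The input layout, function name and intensity values are hypothetical and are not taken from any cited study. The sketch also assumes that signal intensities are comparable across glycoforms, which is rarely strictly true in practice.

```python
def summarize_site(intensities):
    """Summarize one glycosylation site.

    `intensities` maps a glycan composition string to a summed signal;
    the key None stands for the unoccupied (non-glycosylated) peptide.
    Returns (occupancy, relative abundance of each glycoform).
    """
    total = sum(intensities.values())
    unoccupied = intensities.get(None, 0.0)
    occupied = total - unoccupied
    # Macroheterogeneity: fraction of the site that carries any glycan.
    occupancy = occupied / total if total else 0.0
    # Microheterogeneity: share of each glycan among the occupied forms.
    glycoforms = {
        glycan: signal / occupied
        for glycan, signal in intensities.items()
        if glycan is not None and occupied
    }
    return occupancy, glycoforms


# Hypothetical measurements for a single site, for illustration only.
site = {None: 20.0, "HexNAc2Hex5": 60.0, "HexNAc4Hex5NeuAc2": 20.0}
occupancy, glycoforms = summarize_site(site)
```

Keeping the unoccupied peptide in the same table as the glycoforms makes both quantities fall out of one pass over the data.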

Fig. 1: Protein glycosylation classes and common glycans observed across mammalian systems.

a | A range of glycosylation types exist, with most eukaryotic cells possessing multiple pathways for protein glycosylation. Glycosylation involves the installation of glycans on proteins, with N-linked pathways targeting the nitrogen of asparagine residues, O-linked pathways targeting the oxygen atoms of serine/threonine residues and C-linked pathways targeting the second carbon of tryptophan residues. Many of these glycosylation events are observed on proteins known to be secreted or displayed extracellularly, as denoted here, owing to the role of glycosylation in mediating extracellular protein stability and membrane protein recognition. Intracellularly, O-GlcNAcylation has a crucial role in cellular signalling events. b | A range of common glycan classes is observed across mammalian N-linked and mucin-type O-linked glycosylation. N-linked glycans include paucimannose, oligomannose, and complex and hybrid structures. Paucimannose carries one to three mannose (Man) residues on a chitobiose core with variable core fucosylation. Oligomannose glycans contain terminal branches composed only of mannose sugars. Complex and hybrid glycans may contain galactose (Gal), N-acetylglucosamine (GlcNAc), N-acetylgalactosamine (GalNAc), fucose (Fuc), N-acetylneuraminic acid (NeuAc) and N-glycolylneuraminic acid (NeuGc) residues in their antennae, with hybrid glycans also containing unsubstituted terminal mannose residues. Eight core structures have been described for mucin-type O-linked glycosylation, which differ in their composition and linkage position of branches to a protein-linked GalNAc. Non-canonical glycans introduced using metabolic oligosaccharide engineering approaches are also possible; for non-canonical glycans, the presence of monosaccharides bearing chemical handles such as alkyne or azide (N3) groups allows glycan-specific labelling and/or enrichment. GlcA, glucuronic acid; Xyl, xylose.

Glycoproteomics refers to the systems-level study of protein-linked glycans and is a rapidly evolving analytical field that aims to profile glycosylation events observed within biological samples36,37. The characterization of intact glycopeptides is an attractive analytical strategy as only intact glycopeptides can provide direct evidence of the site-specific glycosylation of proteins. Bottom-up glycoproteomics using liquid chromatography–tandem mass spectrometry (LC–MS/MS)-based profiling of intact glycopeptides allows for cell-wide, tissue-wide and organism-wide mapping of glycosylation events and the ability to address their functional roles in biological processes38. This is in contrast to commonly used techniques that involve the study of detached glycans (a field known as glycomics39) or formerly N-linked glycosylated peptides (N-glycosylation site mapping40).

LC–MS/MS-driven glycoproteomic approaches have been refined considerably over the past decade and these strategies are increasingly being used for quantitative mapping of glycosylation sites within complex mixtures (as previously reviewed36,38,41,42,43,44,45,46,47,48,49,50,51). Technological and computational advances now enable the characterization of thousands of intact N-glycopeptides and O-glycopeptides within a given glycoproteomics experiment52,53,54,55,56,57,58,59,64,65,66,67,68. The choice of sample will affect the degree of sample processing needed (Table 1). For a given sample, the depth of analysis required is dependent on the total number of proteoforms present and the relative abundance and dynamic range of glycoproteins within the sample. For samples of low complexity, glycosylation analysis can be accomplished with low microgram levels of material, although milligram amounts may be needed for complex samples in which the glycoproteins of interest are present in low concentrations. In general, samples of low complexity with a high glycoprotein abundance will allow for better characterization of glycosites and glycoforms, which underpins the rationale for separating or enriching glycoproteins or glycopeptides before analysis (see below)69,70,71,72.

Table 1 Sample considerations

Biological relevance is important to consider if analysing recombinant glycoproteins from different sources. The observed glycosylation sites and glycan structures of proteins heterologously expressed under in vitro conditions, such as in genetically modified immortalized cell lines, may differ from in vivo sources as the repertoire of expressed glycosyltransferases and glycosidases can vary between cell types32. This is evident for viral envelope glycoproteins such as the HIV-1 envelope protein (Env) and SARS-CoV-2 spike glycoprotein, where higher degrees of N-glycan processing are found on native virions than on individual viral proteins ectopically expressed in cell lines73. Furthermore, there can be notable differences in glycosite occupancy and glycan structure between native oligomeric proteins and individually expressed subunits, likely influenced by differences in the accessibility of the subunits and the protein quaternary structure to glycosyltransferases69,70,74,75. Thus, care should be taken to ensure that the models used reflect the biological question being explored as closely as possible.

The redundant and overlapping specificities of glycosyltransferases have profound impacts on glycosylation patterns, as compensation and competition for substrates can make the observed relationships between glycosyltransferases and glycosylation events highly context dependent even across similar cell types. This is best illustrated for O-linked, mucin-type glycosylation, which is governed by the expression of several members of a large family of GalNAc-transferase (GalNAc-T) isoforms6. A diverse array of biological specimens have been probed to study the breadth of the O-glycoproteome53,66,67,68,76,77,78. The competition for substrates between GalNAc-T isoforms is complex and largely unclear, and genetically engineered cell lines have been used to dissect substrates of specific GalNAc-T isoforms79,80. Further, isogenic cell lines and transgenic animal models generated using gene editing have identified GalNAc-T isoform-specific substrates in the context of both simplified and natural glycan structures79,81,82,83. These findings highlight the benefits of genetic approaches for understanding glycosylation site specificity in situations in which complex interplays exist. Considering this known complexity associated with glycosylation substrates for many glycosylation systems, it is advisable to include several biological replicates representing different clonal lineages of genetically engineered cell lines and to treat only changes that are consistent across these replicates as relevant83,84.

Sample preparation

Protein isolation and buffer considerations

Optimal protein isolation is key for efficient downstream sample processing in all proteomic experiments. Protein extraction from tissues can require pre-treatment with enzymes or ethylenediaminetetraacetic acid (EDTA) to release cells from the extracellular matrix before cell lysis. Once isolated, cells can be lysed with cryogenic homogenization, mechanical disruption using sonication or mechanical grinding in buffers that contain strong detergents such as sodium dodecyl sulfate (SDS) or chaotropic agents85,86,87,88. Complex tissue-derived and cell-derived samples will rarely be solubilized completely and often require clearing of the lysates by centrifugation to remove insoluble material. Homogenization may also be necessary for viscous biological secretions such as sputum or intestinal mucus89,90. It should be noted that several commonly used cationic, anionic or zwitterionic detergents can interfere with proteolytic digestion and may cause LC–MS analyte signal suppression without subsequent clean-up (see below)91,92. MS-compatible detergents such as RapiGest76,93,94, N-dodecyl β-d-maltoside95 or ProteaseMAX96 have been used for glycoproteomic studies to solubilize membrane proteins and can be combined with orthogonal isolation methods such as mechanical disruption to enhance protein isolation77,97. Notably, these MS-compatible detergents can be less effective solubilization agents than strong detergents such as SDS98. The isolation of membrane-bound glycoproteins requires vigorous disruption of the cell membrane followed by a solubilization step that uses detergents or chaotropic agents to prevent the precipitation of hydrophobic proteins99; for soluble secreted glycoproteins, the most important consideration when preparing the sample is to avoid contamination from exogenous protein sources commonly used to maintain cell lines, such as fetal bovine serum, which can be achieved by briefly culturing cells in serum-free medium100.

For many glycoproteomic studies, it may be essential to ensure complete linearization of glycoproteins during solubilization by removing disulfide linkages with the aid of reducing agents such as dithiothreitol (DTT) or tris(2-carboxyethyl)phosphine (TCEP). Ensuring protein linearization can improve the ability of detergents to coat hydrophobic regions within glycoproteins; however, this process also results in the generation of reduced cysteine residues, which are extremely reactive and readily undergo oxidation as well as other chemical transformations. Alkylation of reduced cysteines can ‘cap’ these reactive amino acids, preventing the formation of undesirable cysteine products and the re-formation of disulfide linkages during sample preparation. Iodoacetamide is commonly used to alkylate cysteine residues during glycoproteomic sample preparation. Although alkylation is advantageous for improving the detection of cysteine-containing peptides, it has been noted that incomplete alkylation of cysteines (underalkylation) or unintended alkylation of other residues such as methionine (overalkylation) can cause the misassignment of glycan compositions, as these events shift the glycopeptide mass so that it matches an isobaric alternative glycan composition, leading to incorrect glycopeptide assignment61. Both glycoproteomic61 and proteomic101 studies have highlighted that underalkylation and overalkylation are commonplace, and care should be taken to ensure that alkylation reagent concentrations and incubation times are optimized for the given sample.
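The mass coincidence described above can be made concrete with a short calculation. The Python sketch below uses one well-defined case chosen for illustration, which is not necessarily the example reported in the cited study: a deoxyhexose (fucose) residue plus one unintended carbamidomethyl group has the same elemental composition, and therefore the same monoisotopic mass, as a single HexNAc residue. The dictionary and function names are assumptions made for this example.

```python
# Monoisotopic atomic masses in daltons.
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Elemental compositions: glycan residues are the monosaccharide minus water;
# carbamidomethyl is the group added to a residue by iodoacetamide.
FORMULA = {
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},
    "dHex": {"C": 6, "H": 10, "O": 4},
    "carbamidomethyl": {"C": 2, "H": 3, "N": 1, "O": 1},
}


def mono_mass(name):
    """Monoisotopic mass of a named residue or modification."""
    return sum(ATOM[element] * count for element, count in FORMULA[name].items())


hexnac = mono_mass("HexNAc")
dhex_plus_cam = mono_mass("dHex") + mono_mass("carbamidomethyl")
# The two values agree because C6H10O4 + C2H3NO = C8H13NO5.
difference = abs(hexnac - dhex_plus_cam)
```

Because the two alternatives are isomeric rather than merely close in mass, no gain in precursor mass accuracy can distinguish them; fragment-level evidence or well-controlled alkylation is needed instead.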

Glycoproteome clean-up approaches

To facilitate the analysis of chemically solubilized samples, recent advancements in sample preparation offer attractive solutions to removing interfering chemical agents such as salt and detergents before subsequent MS analysis. Three such approaches are filter-aided sample preparation (FASP)102, suspension traps (S-traps)103,104 and methods based on protein aggregation capture (PAC)105,106,107,108,109 (Fig. 2). These methods involve binding proteins to solid-phase supports such as filters (FASP), quartz mesh (S-traps) or magnetic particles (PAC) and washing with chaotropic agents or organic solvents to remove contaminants; digestion of the bound proteins then releases peptides for subsequent analysis. FASP-based sample preparation is well established and has been implemented in numerous N-glycoproteomic studies across species and tissues64,110, whereas S-traps and PAC-based approaches such as single-pot, solid-phase-enhanced sample preparation (SP3)111 are a more recent addition to the glycoproteomics toolkit (although they have been implemented in several glycoproteomic studies)112,113,114. These approaches can be used for sample amounts as low as a few micrograms to several milligrams of protein, and they result in high peptide recovery rates102,103,104,111. It was recently demonstrated that PAC enables the removal of chemical or affinity tag agents typically used in click-based labelling105,106, making PAC particularly appealing for bioorthogonal glycoproteomic sample preparation.

Fig. 2: Sample preparation.

Glycoproteomic sample preparation can be summarized into six key steps. a | Proteins for glycoproteomic analysis are extracted and solubilized from samples of interest such as from cell culture models using a cell disruptor to lyse the cells. b | Protein mixtures are processed to remove potential interfering reagents for downstream processing with filter-aided sample preparation (FASP), quartz mesh (S-trap) and protein aggregation capture (PAC)-based approaches commonly used. c | The resulting protein preparations are then digested with proteases and/or glycoproteases to generate mixtures that contain the glycopeptides of interest for downstream analysis. Digestion of FASP, S-trap or PAC prepared samples allows the release of peptides from the captured proteins enabling their collection for downstream liquid chromatography–mass spectrometry (LC–MS) analysis. At this stage, glycosidases can also be used to remove specific glycans of interest or modify glycans to enhance their downstream detection by reducing microheterogeneity. d | The resulting peptide mixtures containing the glycopeptides of interest can be concentrated and purified, allowing the removal of non-digested proteins, enzymes or buffer components that may interfere with chemical labelling or enrichment approaches. Several solid-phase clean-up media can be used to achieve this, including C18, hydrophilic–lipophilic balance (HLB) or styrenedivinylbenzene–reverse phase sulfonate (SDB–RPS) resins, which can be implemented in solid-phase extraction (SPE) cartridge, plate or microcolumn (Zip/STAGE tips) formats. e | Further peptide-based chemical derivatization can be undertaken to enable enrichment, quantification or to enhance the detection of glycopeptides during downstream LC–MS analysis. For example, the incorporation of positively charged imidazolium groups within biotin-based enrichment handles can be used to improve electron-driven dissociation (ExD)-based fragmentation. 
f | Glycopeptides of interest can be enriched using affinity approaches before LC–MS analysis, such as streptavidin enrichment of biotin-labelled metabolic oligosaccharide engineering (MOE) samples, lectin weak affinity chromatography (LWAC), which exploits the binding of lectins to specific sugars, or hydrophilic interaction liquid chromatography (HILIC), which retains glycopeptides based on hydrophilic interactions.

Proteome digestion approaches

After clean-up, glycoproteins can be digested using proteases to produce individual peptides and glycopeptides (Fig. 2). The conversion of proteins into (glyco)peptides offers a range of analytical advantages in both downstream separation and mass spectral analysis. Reducing the chemical heterogeneity of a proteome to a mixture of soluble peptides enables separation with much higher resolution than intact proteins. Furthermore, smaller peptides fragment more efficiently and produce simpler spectra, aiding the characterization of modification sites. The workhorse protease for glycoproteomics is trypsin, which cleaves at the C terminus of arginine or lysine residues with high specificity, efficiency and robustness. This generates peptides that can be protonated at the amine-containing N terminus and the arginine/lysine residue at the C terminus, resulting in rich MS/MS spectra when analysed in positive polarity mode. Although trypsin is the protease of choice for most N-glycoproteomic and O-glycoproteomic analyses, O-glycosites are commonly found in dense clusters notoriously resistant to tryptic cleavage owing to a lack of arginine/lysine residues96, which limits the applicability of trypsin to these densely O-glycosylated domains. To address this issue, many groups have employed digestion with several alternative proteases that possess different cleavage specificities to increase proteome coverage, such as chymotrypsin to cleave C-terminally to phenylalanine, tryptophan and tyrosine; GluC, which cleaves C-terminally to glutamic acid and to a lesser extent aspartic acid, or AspN, which cleaves N-terminally to aspartic acid and to some extent glutamic acid72,115,116,117.
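The cleavage specificities listed above can be sketched as a small in silico digestion routine. The Python example below is a minimal illustration under simplifying assumptions: it encodes only the primary rules stated in the text, ignores sequence-context effects and the minor activities of GluC and AspN, and assumes complete digestion with no missed cleavages. The function name, rule table and example sequences are hypothetical.

```python
# Simplified cleavage rules as stated in the text: (target residues, side of cut).
# Context effects (for example proline after K/R) and the minor GluC/AspN
# activities towards Asp/Glu are deliberately ignored.
CLEAVAGE_RULES = {
    "trypsin": ("KR", "C"),
    "chymotrypsin": ("FWY", "C"),
    "GluC": ("E", "C"),
    "AspN": ("D", "N"),
}


def digest(sequence, protease):
    """Return fully cleaved peptides for a one-letter amino acid sequence."""
    residues, side = CLEAVAGE_RULES[protease]
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa not in residues:
            continue
        # C-terminal cutters cleave after the residue, N-terminal ones before it.
        cut = i + 1 if side == "C" else i
        if start < cut < len(sequence):
            peptides.append(sequence[start:cut])
            start = cut
    peptides.append(sequence[start:])
    return peptides


# Hypothetical sequences, for illustration only.
mixed = "GSTAPPAKTSDEFWR"
mucin_like = "TTSTPSAPTTSA"  # no K/R, so trypsin cannot cut
tryptic = digest(mixed, "trypsin")
uncut = digest(mucin_like, "trypsin")
```

Running the same sequence through each rule shows why alternative proteases are combined: the serine/threonine-rich stretch that trypsin leaves intact can only be broken up by enzymes whose target residues happen to occur within it.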

Non-specific proteases such as Pronase and Proteinase K have also been used to analyse a range of glycosylated proteins. Pronase is a commercially available mixture of proteases isolated from Streptomyces griseus that exhibits both exoprotease and endoprotease activities and yields a crude mixture of heterogeneous peptide fragments118. Pronase is useful for the glycoproteomic analysis of samples of modest complexity119; however, the peptide heterogeneity generated by Pronase digestion is a major issue for quantitative site-specific glycan profiling. Similar to Pronase, Proteinase K is an endoprotease that cleaves at the C termini of aliphatic and aromatic residues and is often used in conjunction with trypsin digestion for glycosylation site localization of simple mixtures120. The drawback of both non-specific digestion techniques is that the resultant data must be searched against all theoretical peptides, producing an extremely large search space that increases search time and false discovery rates (FDRs; discussed below)121. Further, the propensity of these proteases to generate relatively short glycopeptides limits their usefulness for complex samples, as mapping the identified glycopeptides to specific proteins can be difficult. Thus, the use of non-specific proteases is typically restricted to single-protein mixtures, where this approach is most appropriately used to characterize regions such as mucin domains that cannot be accessed by other enzymes122. It should also be noted that despite these challenges, the high levels of peptide heterogeneity observed with these enzymes can be advantageous for applications such as the localization of glycosylation events to specific amino acids119,120,122.
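The growth in search space can be illustrated with simple counting. The Python sketch below compares the number of candidate peptides from a single protein when any contiguous stretch is allowed (non-specific digestion) with the number from a fully specific protease. The function names, the protein length, the number of cleavage sites and the length limits are hypothetical values chosen only for illustration, and glycan combinations, which multiply both counts, are ignored.

```python
def count_nonspecific_peptides(protein_length, min_len, max_len):
    """Count contiguous subsequences with min_len <= length <= max_len."""
    return sum(
        protein_length - length + 1
        for length in range(min_len, max_len + 1)
        if length <= protein_length
    )


def count_specific_peptides(n_cleavage_sites, max_missed):
    """Count peptides from a fully specific protease.

    A protein with n internal cleavage sites yields n + 1 fully cleaved
    pieces; allowing m missed cleavages joins m + 1 neighbouring pieces.
    No peptide length filter is applied.
    """
    n_pieces = n_cleavage_sites + 1
    return sum(max(n_pieces - missed, 0) for missed in range(max_missed + 1))


# Hypothetical 500-residue protein with 50 internal cleavage sites.
nonspecific = count_nonspecific_peptides(500, min_len=6, max_len=30)
specific = count_specific_peptides(50, max_missed=2)
```

For these assumed values the non-specific search considers roughly 80 times more peptide candidates per protein, before any glycan compositions are taken into account.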

Glycoproteome-centric proteases (O-glycoproteases)

Glycoproteases are increasingly being used in O-linked glycoproteomic studies123. O-glycoproteases have modest peptide sequence specificities, cleaving the peptide backbone based on the presence of various O-linked glycans and allowing the digestion of glycosylated regions resistant to other proteases. OgpA, derived from Akkermansia muciniphila and marketed and sold as OpeRATOR, was the first commercial O-glycoprotease. This enzyme cleaves at the N terminus of serine or threonine residues that bear truncated glycans such as GalNAc or GalNAc-Gal, also known as core 1 O-glycans (Fig. 1b). OgpA has been used for the digestion of isolated O-glycoproteins, cell lysates and tissues56,124. Its main drawback is that it is unable to cleave glycopeptides decorated with sialic-acid-containing O-glycans; thus, samples must be sialidase-treated before proteolytic digestion. Additionally, OgpA can be inefficient in regions that are densely glycosylated, requiring downstream electron-based fragmentation for confident O-glycosite localization63.

Several glycoproteases other than OgpA have been introduced to the field. Secreted protease of C1 esterase inhibitor (StcE), derived from enterohaemorrhagic Escherichia coli, is specific for a serine/threonine*-X-serine/threonine motif, cleaving before the second serine/threonine (the asterisk indicates that the first serine/threonine is invariably glycosylated). StcE improved the analysis of densely O-glycosylated mucin-domain glycoproteins, increasing protein sequence coverage, the number of glycosites identified and the number of localized glycans in proteins studied96. Expanding on this concept exploiting the diversity of bacterial glycoproteases as glycoproteomic tools, the Bertozzi group compiled a glycoprotease toolkit of six additional enzymes: Bacteroides thetaiotaomicron 4244 (BT4244), A. muciniphila 0627 (AM0627), 1514 (AM1514) and 0608 (AM0608), enteroaggregative E. coli protease involved in colonization (Pic), and Streptococcus pneumoniae zinc metalloprotease C (ZmpC), where each has a different cleavage motif125. Similarly, other groups have demonstrated that enzymes such as the coagulation-targeting metalloendopeptidase (CpaA) of Acinetobacter baumannii126 and the immunomodulating metalloprotease (IMPa) from Pseudomonas aeruginosa also cleave glycosylated serine and threonine residues with unique specificities127.
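The StcE motif described above can be expressed as a short scan over a sequence with known glycosites. The Python sketch below reads the motif literally as stated in the text (a glycosylated serine/threonine, any residue, then a serine/threonine, with cleavage before the second serine/threonine) and does not model any further glycan or sequence preferences of the enzyme. The function name, the 0-based indexing and the example sequence are assumptions made for illustration.

```python
def stce_cleavage_sites(sequence, glycosites):
    """Return 0-based cut positions for the S/T*-X-S/T motif.

    `glycosites` is a set of 0-based indices of glycosylated residues.
    A returned position p means the backbone is cut between residues
    p - 1 and p, that is, immediately before the second S/T of the motif.
    """
    cuts = []
    for i in range(len(sequence) - 2):
        first_is_glycosylated = sequence[i] in "ST" and i in glycosites
        if first_is_glycosylated and sequence[i + 2] in "ST":
            cuts.append(i + 2)
    return cuts


# Hypothetical sequence: A0 P1 T2 P3 S4 A5 A6 T7 G8 S9 T10 A11
peptide = "APTPSAATGSTA"
cuts_two_sites = stce_cleavage_sites(peptide, {2, 7})
cuts_no_glycans = stce_cleavage_sites(peptide, set())
```

Making glycosylation of the first residue a precondition reflects why such enzymes complement trypsin: the same sequence is cut or left intact depending on its glycosylation state.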

Endoglycosidases and exoglycosidases

Endoglycosidases release oligosaccharides from the protein attachment site or within the glycan chain, whereas exoglycosidases trim monosaccharides from the non-reducing termini of the glycan chain128. The removal of glycans or the reduction of glycan heterogeneity can concentrate the observable signal of glycosylated or previously glycosylated peptides to a limited number of chemical species, which can enhance the detection of glycosylation events. One of the most commonly used endoglycosidases is PNGase F, which cleaves intact N-glycans from proteins and deamidates the previously modified asparagine residue to aspartic acid. Similar enzymes such as Endo F and Endo H cleave within the chitobiose N-glycan core to leave a single GlcNAc on the modified asparagine residues129,130. A universal endo-O-glycosidase has not been characterized, although some glycosidases can remove truncated O-glycan structures, for example, OglyZOR, a commercially available endoglycosidase derived from Streptococcus oralis that hydrolyses truncated core 1 O-glycans. Commercial glycosidases derived from S. pneumoniae and Enterococcus faecalis that release core 1 and (to a limited extent) core 3 O-glycans are also available. Many O-glycosidases have limited activity if the glycans are modified by sialic acid or GlcNAc and thus must be used in conjunction with other glycosidases to remove these modifications44.
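The residual mass left on a formerly N-glycosylated asparagine differs between these enzymes and can be derived from elemental compositions. The Python sketch below is a minimal illustration: the function and constant names are assumptions, and the Endo F and Endo H value considers only the single retained GlcNAc, ignoring any core fucose that may remain attached to it.

```python
# Monoisotopic atomic masses in daltons.
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Asn -> Asp replaces the side-chain amide NH2 with OH: lose N and H, gain O.
DEAMIDATION = ATOM["O"] - ATOM["N"] - ATOM["H"]

# One HexNAc residue (C8H13NO5), the GlcNAc left after cleavage in the core.
HEXNAC = 8 * ATOM["C"] + 13 * ATOM["H"] + ATOM["N"] + 5 * ATOM["O"]

RESIDUAL_SHIFT = {
    "PNGase F": DEAMIDATION,  # glycan fully removed, Asn converted to Asp
    "Endo F": HEXNAC,         # single GlcNAc retained on Asn
    "Endo H": HEXNAC,         # single GlcNAc retained on Asn
}


def residual_mass_shift(enzyme):
    """Mass added to the unmodified peptide after enzyme treatment (Da)."""
    return RESIDUAL_SHIFT[enzyme]


pngase_shift = residual_mass_shift("PNGase F")
endo_h_shift = residual_mass_shift("Endo H")
```

The PNGase F signature is a shift of less than 1 Da, whereas the retained GlcNAc adds roughly 203 Da, which is one reason the single-GlcNAc remnant is a more distinctive marker of a former glycosylation site.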

Exoglycosidase treatment is commonly used to simplify glycoproteomic analyses. Sialidases are often used to remove sialic acids, reduce microheterogeneity and limit the number of detected glycoforms, which can improve the identification of glycopeptides131. Broad-acting sialidases such as neuraminidase A can remove sialic acid residues α2,3, α2,6 or α2,8 linked to a glycan, whereas some sialidases are specific for a particular linkage; for example, Clostridium perfringens neuraminidase is commonly used to cleave α2,3 linkages78. Other exoglycosidases used in O-glycoproteomics include β1,4-galactosidase from S. pneumoniae, which removes β1,4-linked galactose, and β-N-acetylhexosaminidase — also from S. pneumoniae — which removes terminal non-reducing HexNAc residues from oligosaccharides49. Owing to the innate specificity of these enzymes, exoglycosidases are useful for trimming glycans for targeted characterization of glycan epitopes and simplifying glycoproteomic analysis. However, removing monosaccharides does limit the information that can be gleaned using intact glycoproteomics.

Chemical and biological affinity-based glycopeptide enrichment

In-depth glycoproteomic analysis benefits from the selective enrichment of glycopeptides. Affinity-based approaches are broadly used across the field and can be classified as chemical or biological in nature. In this section we introduce common protocols for N-glycopeptide and O-glycopeptide enrichment; for a detailed discussion of the breadth of glycopeptide enrichment approaches used across the community, readers are referred to the extensive literature on this topic36,41,43,129,132.

Some of the first proteome-scale studies of glycosylation events used chemical enrichment strategies such as the covalent tethering of glycoproteins or glycopeptides to hydrazide-based resins through cis-diols within the carbohydrate chains. These approaches allow the formation of covalent linkages between resins and the glycopeptides or glycoproteins of interest and allow the removal of non-glycosylated peptides or proteins with detergents or chaotropic agents followed by the elution of the enriched glycopeptides by enzymatic or chemical cleavage of the linked glycans133,134,135,136,137,138,139,140,141. The need to release N-glycans of glycopeptides using PNGase F or the acid hydrolysis of hydrazide-linked sialic acids in these methods has led to the development of alternative chemical enrichment approaches that do not require the removal or alteration of glycan structures. For example, several boronic acid-based resins have been developed that allow glycopeptide enrichment using reversible covalent tethering of glycopeptides66,68. The lectin Vicia villosa agglutinin (VVA) is also well suited for the enrichment of glycopeptides that bear a single O-GalNAc (Tn, Fig. 1b); this lectin was implemented into the SimpleCell O-glycoproteomics approach, where cultured cells are genetically engineered to express homogeneous O-GalNAc glycosylation76,77. Both lectin weak affinity chromatography (LWAC) and antibody-based enrichment allow glycopeptides to be isolated and eluted with competitive free-carbohydrate solutions155 or through denaturation of the affinity protein with acid114. In addition to its use in studying N-linked and O-linked glycosylation, LWAC-based enrichment has also been applied to study O-Man glycosylation. LWAC-based enrichment of O-Man glycopeptides has been achieved using concanavalin A (ConA) lectin, which recognizes O-linked, but not C-linked, α-mannose sugars94,157,158. It is important to note that the broad and poorly defined specificities of most lectins can complicate the interpretation of glycopeptide enrichment results, and care must be taken when drawing conclusions about the glycans enriched with a given lectin.

Metabolic engineering of oligosaccharides for glycopeptide enrichment

Metabolic oligosaccharide engineering (MOE; Fig. 2) has emerged as an important strategy to profile N-glycans and O-glycans58,93,159,160. In MOE, monosaccharides are chemically modified with tags and incorporated into proteins by the endogenous glycosylation machinery. The tags are stable in the cellular environment but reactive in bioorthogonal click chemistry reactions, such as copper-mediated azide-alkyne cycloaddition161. The addition of ‘clicked’ functionalized biotin allows tagged glycopeptides to be enriched using streptavidin-conjugated beads before MS analysis129,162. Metabolic incorporation of clickable alkyne- or azide-modified sugars has been demonstrated for mapping N-glycosites93 and O-GalNAc163,164,165 or O-GlcNAc proteomes166,167. One benefit of MOE is that the functionalized glycans can be incorporated into glycan structures without a chain-terminating effect, allowing additional sugars to be added by endogenous glycosyltransferases. However, labelling efficiency in MOE is extremely low, and reagents are of limited specificity as they can be interconverted and incorporated into unintended glycan structures. A bump-and-hole strategy can be used to label cellular glycans with engineered GalNAc-Ts that accept bumped GalNAc donors168,169,170, delineating GalNAc-T specificities. This strategy has been further developed using a metabolic labelling probe (GalNAzMe) for specific labelling of O-glycans171, as well as clickable tags (ITag) that stably increase glycopeptide charge172.

Analysis of glycopeptides

Glycopeptides are typically characterized using LC–MS/MS, whereby glycopeptides eluted from an LC column are ionized by electrospray ionization (ESI) and sequenced using a suite of tandem MS (MS/MS) dissociation methods41,48,49. Parameters for LC and MS/MS stages are key decision points in glycoproteomic experiments and ultimately have consequences for data quality and interpretation. Matrix-assisted laser desorption/ionization (MALDI)–MS is also a popular high-throughput approach for glycopeptide analysis, although the ability to automate ESI and directly couple it to separation technologies allows a greater dynamic range for complex samples and has made ESI-based LC–MS/MS the mainstay of most glycoproteomic methods. ESI-based LC–MS/MS strategies are therefore the focus of this section.

Liquid chromatography-based separation of glycopeptides

Most glycoproteomic methods use low-pH (pH <2) reverse phase liquid chromatography (RP-LC) to separate glycopeptides before MS/MS, with a C18-based stationary phase and flow rates that range from tens to hundreds of nanolitres per minute (nanoflow). RP-LC is a versatile and robust method widely used in proteomics as it offers a combination of high peak capacity and simplicity173. The retention and thus separation of glycopeptides in the RP-LC column is mostly driven by the hydrophobicity of the peptide backbone, although the size, conformation and monosaccharide content of glycans also contribute to retention behaviour174,175,176. Retention times are useful for glycopeptide identification in combination with the accurate precursor mass and tandem MS spectra, especially when ambiguous MS/MS spectra generate several potential glycopeptide candidates. Prediction tools can help incorporate this orthogonal information from RP-LC177,178,179, although adoption of these data into informatic tools is not yet ubiquitous.

There is no universal separation technique that is ideal for all classes of glycoconjugates129, and although RP-LC is the dominant separation modality in LC–MS/MS glycoproteomics, it does have some drawbacks, such as the co-elution of isomeric glycoforms owing to their identical peptide sequences180,181,182. Although the use of elevated column temperatures in RP-LC can allow the separation of isomeric N-glycopeptides and O-glycopeptides183, this does not always provide adequate separation of all isomeric species. Alternatively, HILIC-LC, in which separation is largely influenced by the hydrophilicity imparted by glycan moieties, can be used in online glycopeptide separations and is effective at separating isomeric species that differ only in glycan linkage position and branching184,185,186. Several HILIC-LC resins exist187 and new HILIC resins provide novel separation characteristics that may be beneficial for specific glycopeptide classes181. Another RP-LC alternative uses porous graphitized carbon (PGC) as the stationary phase, which retains polar compounds with MS-compatible solvents188 and is highly advantageous for separating released glycans189. Its use for separating glycopeptides is somewhat complicated as both hydrophobicity and charge contribute to retention using this separation modality190,191,192; furthermore, highly sialylated glycopeptides and glycopeptides derived from commonly used proteases such as trypsin, GluC or chymotrypsin are difficult to elute from the resin, meaning non-specific proteases that generate shorter glycopeptides are typically required193,194,195,196,197. PGC-LC has been shown to separate isomeric N-glycopeptides and O-glycopeptides198, and separation of glycopeptides with α2,3-linked or α2,6-linked sialic acids can be modulated by column temperature199. However, challenges with the elution of large glycopeptides owing to the retention of hydrophobic species have limited the widespread use of PGC-LC in LC–MS/MS glycoproteomics. 
We compare separation techniques in Table 2. It is worth noting that although the above-mentioned LC-based approaches are traditionally performed using columns, they can also be successfully employed using chip-based fluidic devices180.

Table 2 Online separation options for glycopeptide analysis

Non-liquid chromatography-based separation of glycopeptides

Separation techniques other than LC are increasingly finding applications in the fine structural analysis of glycans and glycopeptides38. Online capillary electrophoresis (CE) is an emerging tool for glycoproteomics that can separate glycopeptide isomers and offer potential improvements in reproducibility and sensitivity200,201,202,203. Electrophoretic mobility in CE is governed by glycopeptide charge-to-size ratios, and, as a result, glycan composition (and especially sialic acid content) can affect migration, providing glycan-based separation of glycoforms of the same peptide backbone204,205,206. Gas-phase separations of glycopeptides following LC or CE can also be used to separate isomeric glycopeptides; these techniques include ion mobility spectrometry (IMS) approaches207,208,209,210 such as travelling-wave IMS211,212,213,214,215, differential/high-field asymmetrical waveform IMS216,217,218,219 and drift-tube IMS220,221,222,223. In addition to allowing isomeric separation, IMS has also been shown to enable separation of glycosylated species from non-modified peptides, providing access to glycopeptides incompatible with chromatographic enrichment224,225.

The benefits of individual separation approaches (which are summarized in Table 2) can be leveraged together. Offline separation is typically used to fractionate complex mixtures of glycopeptides (usually enriched before fractionation) into multiple samples, with each sample then analysed by LC–MS/MS using an orthogonal separation modality. This fractionation approach can markedly increase sensitivity by reducing the complexity of the mixture being analysed in each online LC–MS/MS analysis; however, it dramatically decreases throughput as the analysis of a single sample is spread across multiple LC–MS/MS acquisitions. One such prominent ‘2D’ glycoproteomic approach is offline high-pH RP-LC followed by online low-pH RP-LC57,228,229,230, although offline fractionation with HILIC-LC, PGC-LC and CE has also been used prior to online low-pH RP-LC44,119,231,232. Other combinations of glycopeptide separation techniques can provide unique advantages of separating on both glycan and peptide components182, such as offline RP-LC coupled with online CE203, offline HILIC-LC coupled with offline PGC-LC followed by MALDI–MS233 and offline RP-LC coupled with online HILIC-LC43,77,258,259,260,261.

ExD is also valuable for highly charged species, although the generation of sequence-informative fragment ions decreases at low precursor cation charge densities262. This can be problematic for glycopeptide analysis, in which neutral or negatively charged glycans add mass without a concomitant addition of positive charge. Additionally, glycan size and attachment site can affect ExD dissociation owing to secondary gas-phase structure effects263. Hybrid fragmentation methods that combine ExD with collisions (for example, electron transfer/higher-energy collision dissociation, or EThcD) or photons (activated-ion ETD) can address these issues57,241,264,265. 
Beyond improving fragment ion generation from ExD itself, these hybrid methods also generate fragment ion types from each dissociation mode — for example, in the EThcD regime, c/z-type peptide fragment ions are generated from ETD, and b/y-type peptide fragment ions and B/Y-type glycan fragment ions are generated from beamCID59,241,264,266. Photon-based dissociation methods, particularly ultraviolet photodissociation (UVPD), have also shown promise for generating information-rich spectra with multiple fragment ion types for glycopeptides267,268,269,270, but have yet to be explored for large-scale glycoproteomics.

Although ExD and related hybrid methods can generate high-quality spectra for both N-glycopeptides and O-glycopeptides, these methods often have reaction times of tens to hundreds of milliseconds per spectrum262. BeamCID, by comparison, provides near instantaneous fragmentation. BeamCID or SCE-beamCID methods are therefore more suited for large-scale N-glycopeptide analyses, where b/y-type ions — some of which retain an initiating HexNAc — and B/Y-type ions are mostly sufficient for identification271. Conversely, ExD-centric methods are favourable for O-glycopeptide characterization despite high time costs, as c/z-type ions that retain intact glycan modifications are often necessary for O-glycosite localization59,63,252,258,259,266,272. Experiments that require ExD often combine beamCID and ExD in a product-dependent fashion273,274,275. In product-dependent acquisition schemes, more expedient beamCID methods are used to sequentially fragment precursor ions to look for potential glycopeptides. Once a specific product ion is observed, for example, abundant oxonium ions from a given precursor, the instrument then triggers an ExD spectrum for that same ion, creating complementary pairs of beamCID and ExD spectra for the same precursor ions and relegating ExD spectral acquisition to only those ions that are likely to be glycopeptides.
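The product-dependent triggering logic described above can be sketched in a few lines. This is a minimal illustration and not any vendor's acquisition method: the function names, the mass tolerance, the relative intensity threshold and the requirement for two matching ions are assumptions chosen for the example, and the listed m/z values are commonly reported singly charged oxonium ions for HexNAc-, Hex- and NeuAc-containing glycans.

```python
# Commonly reported glycan oxonium ions (singly charged, monoisotopic m/z).
OXONIUM_IONS = {
    "HexNAc fragment": 138.0550,
    "HexNAc": 204.0867,
    "NeuAc - H2O": 274.0921,
    "NeuAc": 292.1027,
    "HexHexNAc": 366.1395,
}


def matched_oxonium_ions(peaks, tol_ppm=20.0, min_rel_intensity=0.05):
    """Return the names of oxonium ions observed in a beamCID spectrum.

    peaks: list of (mz, intensity) tuples.
    tol_ppm and min_rel_intensity are illustrative defaults (assumptions).
    """
    if not peaks:
        return []
    base_peak = max(intensity for _, intensity in peaks)
    found = []
    for name, target in OXONIUM_IONS.items():
        tol = target * tol_ppm * 1e-6
        for mz, intensity in peaks:
            if abs(mz - target) <= tol and intensity >= min_rel_intensity * base_peak:
                found.append(name)
                break
    return found


def should_trigger_exd(peaks, min_ions=2, **kwargs):
    """Decide whether an ExD scan should follow this beamCID scan.

    Requiring at least `min_ions` distinct oxonium ions is a hypothetical rule.
    """
    return len(matched_oxonium_ions(peaks, **kwargs)) >= min_ions


if __name__ == "__main__":
    glyco_scan = [(204.0868, 1000.0), (366.1392, 400.0), (500.3, 800.0)]
    print(matched_oxonium_ions(glyco_scan), should_trigger_exd(glyco_scan))
```

In this sketch only precursors whose beamCID spectra pass the check would receive the slower ExD scan, which mirrors the idea of restricting ExD acquisition to likely glycopeptides.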

Glycopeptide data acquisition approaches

Glycoproteomic methods rely heavily on data-dependent acquisition (DDA)38: here, the first mass spectrometer (MS1) scan measures intact glycopeptide ions across a wide m/z range (for example, m/z 400–1,800) as they elute from the LC column and are ionized by ESI. Ions are then isolated using ~1–3 atomic mass unit (amu) windows, fragmented using one of the dissociation strategies discussed above, and the subsequent fragment ions are measured in an MS/MS spectrum with the underlying assumption that fragment ions are largely derived from a single precursor ion. DDA typically prioritizes ions by abundance and sequentially selects analytes for MS/MS analysis, starting with the most abundant and/or desired charge states.
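As a rough illustration of the abundance-driven selection described above, the following sketch picks the most intense precursors from a single MS1 scan and assigns each an isolation window. The data layout, function name and default values (ten precursors, charge states 2 to 5, a 2 m/z isolation width) are assumptions made for the example and do not reflect a specific instrument method.

```python
def select_topn_precursors(ms1_peaks, top_n=10, allowed_charges=(2, 3, 4, 5),
                           isolation_width=2.0):
    """Pick the top-N most abundant precursors from one MS1 scan.

    ms1_peaks: list of dicts with keys 'mz', 'intensity' and 'charge'.
    Returns one dict per selected precursor, most intense first, including
    the (lower, upper) m/z bounds of its isolation window.
    """
    # Keep only the desired charge states, then rank by intensity.
    candidates = [p for p in ms1_peaks if p["charge"] in allowed_charges]
    candidates.sort(key=lambda p: p["intensity"], reverse=True)

    half = isolation_width / 2.0
    return [
        {"mz": p["mz"], "charge": p["charge"],
         "window": (p["mz"] - half, p["mz"] + half)}
        for p in candidates[:top_n]
    ]


if __name__ == "__main__":
    scan = [
        {"mz": 800.4, "intensity": 5e5, "charge": 3},
        {"mz": 650.3, "intensity": 9e5, "charge": 1},   # excluded: charge 1
        {"mz": 1200.6, "intensity": 7e5, "charge": 2},
        {"mz": 950.5, "intensity": 1e5, "charge": 4},
    ]
    for precursor in select_topn_precursors(scan, top_n=2):
        print(precursor)
```

Real acquisition software typically adds further rules that are omitted here for brevity.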

As an alternative to DDA, data-independent acquisition (DIA) isolates large overlapping windows of ions that are designed to cover a user-defined mass range276,277. Each window of ions may contain multiple peptide and glycopeptide species that co-isolate and are thus co-fragmented, and as a result MS/MS spectra contain fragments from multiple precursor ions50. DIA methods iterate over the same windows in a repeating fashion with a defined duty cycle regardless of the signal in MS1 scans, which can aid in sampling of low-abundance ions and improve reproducibility across multiple acquisitions. The complex MS/MS spectra resulting from DIA are challenging to interpret, especially for inherently complex analytes like glycopeptides277. A particular challenge that remains unresolved is that related glycopeptide forms tend to generate near-indistinguishable fragment patterns, making it difficult to determine which precursor a given fragment arises from when several such forms are captured in the same window. Several DIA methods for glycoproteomics have emerged in recent years278,279,280,281,282,283,284,285,286, and the momentum of DIA in traditional proteomics will likely propel a growth in DIA for glycoproteomics in the future if the above challenge can be overcome50. DIA could be especially beneficial for structure-focused glycoproteomics, as partially resolved, co-eluting glycoforms can be distinguished based on unique chromatogram profiles of fragment ions, enabling quantification of isobaric glycoforms38.
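The window scheme that DIA cycles through can be illustrated with a short sketch. The function name, the fixed window width and the 1 m/z overlap between neighbouring windows are assumptions for the example; published DIA methods may use variable-width or staggered windows instead.

```python
def dia_windows(mz_start, mz_end, width, overlap=1.0):
    """Build fixed-width, overlapping isolation windows covering a mass range.

    Returns a list of (lower, upper) m/z tuples. Each window overlaps its
    neighbour by `overlap`; the last window is clipped to `mz_end`.
    """
    if width <= overlap:
        raise ValueError("window width must exceed the overlap")
    step = width - overlap
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        if upper >= mz_end:
            break
        lower += step
    return windows


if __name__ == "__main__":
    scheme = dia_windows(400, 1000, 25, 1)
    print(len(scheme), scheme[0], scheme[-1])
```

Because the same list of windows is fragmented in every cycle regardless of the MS1 signal, the number of windows multiplied by the time per MS/MS scan gives a first estimate of the cycle time, which is one reason slower dissociation methods are hard to accommodate.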

In DDA, the ability to combine several dissociation methods or acquisition styles (for example, product-dependent methods) allows the use of dynamic acquisition schemes that can leverage the strengths of multiple dissociation approaches252. Conversely, DIA requires rapid MS/MS acquisition to enable iterative sampling of all m/z windows across the mass range, which limits the range of dissociation methods that can be implemented efficiently and the ability to dynamically switch between dissociation methods. This limits DIA largely to beamCID-based strategies as ExD spectra simply require too much time to acquire, meaning most glycoproteomic methods that employ DIA to date have focused on simple mixtures of N-glycopeptides278,279,280,281,282,283,284,285. Although O-glycoproteomic studies using DIA have been described, they currently rely on additional DDA-based ExD methods for O-glycosite localization286. Instrumentation that reduces acquisition times for ExD spectra could have the potential to enable ExD-based DIA methods for large-scale glycoproteomics287,288.

Quantification approaches and multiplexing

Several strategies exist for the relative quantification of glycosylation across different samples including those targeted at live cells, proteins or peptides. These methods vary in their multiplexing capacity, quantification accuracy and time and cost effectiveness.

The most common type of quantification is label-free quantification (LFQ). Here, signal intensity or spectral counts are considered to determine relative abundance and each LC–MS analysis corresponds to a single sample, resulting in no sample multiplexing. LFQ analysis has been used to study a range of glycoproteomes including O-GalNAc286 and N-linked glycosylation events289. Although extremely accessible and cost effective, LFQ methods can be less accurate than other methods290.

Stable isotope labelling by amino acids in culture (SILAC) is a highly accurate yet costly method to identify and quantify relative differential changes in complex protein samples291. In this technique, cells are grown in the presence of ‘heavy’ 13C-labelled or 15N-labelled amino acid isotopologues to allow their incorporation into proteins, which leads to an observed mass shift in the MS1 spectrum of labelled peptides. By mixing labelled and unlabelled samples, the relative abundance of peptides or glycopeptides can be determined by comparing the ratio of the light and heavy forms at the MS1 level52,291,292. SILAC typically enables the multiplexing of up to three samples and has been used for N-glycoproteomic studies to understand insulin resistance within adipocytes52, track N-glycan processing and monitor temporal and stress-induced changes in O-GlcNAcylation events156,293. Other stable isotope-based labelling strategies for quantification at the MS1 level include dimethyl294,295 or diethyl296 labelling of peptides, which offers an inexpensive alternative for large-scale experiments and multiplexing of up to three samples296,297. These approaches have been applied for differential glycoproteomic analyses of O-GalNAc and O-Man glycoproteomes, allowing the study of the substrate specificities of GalNAc-Ts79,81 and the mannosyltransferases POMT1 and POMT2 (ref.157) and TMTC1–TMTC4 (ref.158).
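The MS1-level comparison of light and heavy forms described above reduces to simple arithmetic, sketched below. The example assumes one particular heavy label pair, 13C6 15N2-lysine and 13C6 15N4-arginine, which the text does not specify; the function names are hypothetical.

```python
import math

# Monoisotopic mass shifts of two widely used heavy amino acids
# (assumed labels for this example only).
LYS8_SHIFT = 8.014199    # 13C6 15N2-lysine
ARG10_SHIFT = 10.008269  # 13C6 15N4-arginine


def heavy_mz(light_mz, charge, n_lys, n_arg):
    """Expected m/z of the heavy partner of a light (glyco)peptide ion."""
    return light_mz + (n_lys * LYS8_SHIFT + n_arg * ARG10_SHIFT) / charge


def log2_heavy_to_light(light_intensity, heavy_intensity):
    """log2 ratio of heavy to light MS1 intensities; None if either is missing."""
    if light_intensity <= 0 or heavy_intensity <= 0:
        return None
    return math.log2(heavy_intensity / light_intensity)


if __name__ == "__main__":
    # A doubly charged tryptic glycopeptide ending in one lysine.
    print(round(heavy_mz(1000.0, 2, n_lys=1, n_arg=0), 4))
    print(log2_heavy_to_light(1.0e6, 2.0e6))
```

The glycan itself carries no label in this scheme, so the mass shift depends only on the number of labelled residues in the peptide backbone.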

A further strategy to enhance multiplexing is the use of isobaric labels that contain different stable isotopes298,299,300 such as isobaric tags for relative and absolute quantification (iTRAQ)301 and tandem mass tags (TMT)298. Upon fragmentation, reporter ions of various masses are generated and their intensities are used for quantification at the MS/MS or MS/MS/MS (MS3) level302,303 with multiplexed analyses of up to 18 samples possible304. An additional advantage of isobaric labelling for glycoproteomics is a notable increase in the observed charge states of glycopeptides, which enhances electron-driven fragmentation305. Despite these advantages, the high price of isobaric labels and the ability to label only submilligram quantities of samples using standard commercial kits306 are potential drawbacks. TMT-based labelling has been applied to studying O-GalNAc84,307, O-GlcNAc308,309 and N-glycoproteomes310,311.

For sensitive applications in the clinical setting, absolute quantification of select glycopeptides is possible using internal standards such as stable isotope-labelled counterparts, which allow normalization across samples and direct comparison of analyte concentrations between different patients312,313. This approach enables reliable quantification of glycopeptides of interest in large patient cohorts, although it is limited by the time-consuming and high-cost synthesis of relevant glycopeptide standards.

Results

Comprehensive characterization of glycopeptides from MS data involves determining the peptide sequence, the site (or sites) of glycosylation and identity of the attached glycans. A growing number of software solutions enable the identification of glycosylation events (Table 3), and computational approaches associated with glycopeptide identification are rapidly developing. Below, we highlight the features of different fragmentation data and discuss the existing tools and emerging bioinformatic methods. We also highlight the conceptual frameworks that underpin glycopeptide assignments, localizing glycosylation sites and defining glycans.

Table 3 Software tools for glycopeptide annotation of MS data

Glycopeptide sequence determination

Decades of developments in proteomics have provided various robust methods for identifying peptide sequences from MS data by comparing protein sequences from a reference database in silico with the observed spectra314,315. Such methods include Mascot316, SEQUEST317, Andromeda318 and MS Amanda319. Handling the addition of attached glycans of varying complexity poses great challenges with existing proteomic workflows; below, we discuss two major approaches that address these challenges, distinguished by whether peptide fragment ions are searched with or without attached glycans.

Searching peptide ions with the attached glycan: ‘variable modification’ searches

When treating attached glycans as variable modifications on peptides (Fig. 3a), possible glycan masses are specified on allowed sites, and theoretical glycopeptides containing these glycan masses are generated from the peptide sequences provided in a proteome database. The precursor mass for a given MS/MS spectrum is used to select candidate glycopeptides, which are then scored by comparing the observed MS/MS spectrum with the theoretical fragment ions of the glycopeptide candidates. Sequences supported by sufficient peptide fragment ion evidence result in a peptide spectral match (PSM). Glycopeptides present two major challenges for this approach: first, the heterogeneity of possible glycan structures can result in a huge number of candidate glycopeptides to consider when multiple possible glycosylation sites are available in a peptide sequence. Second, glycan fragments are often lost from glycopeptide ions in collisional or hybrid activation methods; as glycan modifications are specified as an integral part of the peptide in this approach, they are expected to be present in both MS1 and MS/MS spectra, and the loss of a glycan or parts thereof in the MS/MS spectrum will prevent matching theoretical ions containing the glycan (Fig. 3). For this reason, traditional proteomics tools have severely limited sensitivity for the sequencing of glycopeptides using collision-activation-based fragmentation.
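The candidate-generation step of a variable modification search can be sketched as follows. This is a simplified illustration, not the implementation of any named search engine: it assumes a single N-glycan per peptide, restricts allowed sites to the N-X-S/T sequon (X not proline), uses standard monoisotopic residue masses and a hypothetical two-entry glycan list, and omits the fragment ion scoring that would follow.

```python
import re

WATER = 18.010565
PROTON = 1.007276
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
MONOSACCHARIDE_MASS = {
    "Hex": 162.052824, "HexNAc": 203.079373,
    "Fuc": 146.057909, "NeuAc": 291.095417,
}
# N-X-S/T sequon with X != P; the lookahead lets adjacent sequons all match.
SEQUON = re.compile(r"N(?=[^P][ST])")


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def glycan_mass(composition):
    """Monoisotopic mass a glycan composition adds to the peptide."""
    return sum(MONOSACCHARIDE_MASS[m] * n for m, n in composition.items())


def n_glycosites(sequence):
    """Zero-based positions of asparagines within a sequon."""
    return [m.start() for m in SEQUON.finditer(sequence)]


def candidate_glycopeptides(precursor_mz, charge, peptides, glycans, tol_ppm=10.0):
    """List (peptide, site, glycan name) candidates matching the precursor mass."""
    observed = (precursor_mz - PROTON) * charge
    candidates = []
    for pep in peptides:
        sites = n_glycosites(pep)
        if not sites:
            continue
        base = peptide_mass(pep)
        for name, composition in glycans.items():
            theoretical = base + glycan_mass(composition)
            if abs(theoretical - observed) <= theoretical * tol_ppm * 1e-6:
                candidates.extend((pep, site, name) for site in sites)
    return candidates


if __name__ == "__main__":
    glycans = {"G0F": {"HexNAc": 4, "Hex": 3, "Fuc": 1},
               "Man5": {"HexNAc": 2, "Hex": 5}}
    print(candidate_glycopeptides(1317.5266, 2, ["EEQYNSTYR", "LVNPSK"], glycans))
```

Even in this toy form, the number of candidates grows with the product of peptides, allowed sites and glycan compositions, which illustrates the search-space expansion noted above.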

Fig. 3: Glycopeptide sequence identification methods.

a | Glycans can be searched as a variable modification of peptides, similar to how other post-translational modifications (PTMs) are identified in common proteomics searches. The in silico prediction of the search tool assumes that the fragment ions observed in the tandem mass spectrometry (MS/MS) events will preserve the glycan at the site of attachment in the peptide. b | For glycopeptides fragmented by collisional activation, offset-style searches can look for peptide ions that have lost the glycan directly within MS/MS scans. c | The glycan-first method of separating the precursor mass into peptide and glycan components uses a series of Y-type ions resulting from a known core structure to determine the glycan mass. Subtracting the glycan mass from the precursor mass yields the peptide mass, which is then used to determine candidate peptide sequences that are compared with the peptide fragment ions observed. d | The alternative peptide-first method uses an offset-style search to identify the peptide sequence from peptide fragment ions that have lost the glycans. The resulting peptide mass is subtracted from the precursor mass to yield the glycan mass, which can be matched to a specific composition or structure using the observed Y-type ions. m/z, mass to charge ratio.

Glycoproteomics-focused sequencing approaches can address the above challenges. One approach is to adapt an existing search engine to filter spectra for the presence of oxonium ions and add glycan masses to observed peptide ions112,229,320,321. A variation of this method179,322 first groups glycopeptide spectra using clustering methods before searching, allowing glycopeptide annotations to be transferred from one identified spectrum to the entire cluster. Other tools, including Byonic323,324,325,326,327, perform their own variable modification-style search with the inclusion of peptide fragment ions with various glycan additions or losses, using various scoring methods to evaluate glycopeptides (note that although this method is extremely sensitive, concerns have been raised about the accuracy of this approach328). Alternatively, tools such as Protein Prospector329 use a multi-step search, whereby an initial open search determines common glycan masses to be included in a second, more specific search330,331. Overall, variable modification searches are straightforward to implement for the localization of glycans — particularly those on glycopeptides fragmented by electron-based activation methods — although the inclusion of additional fragment types can reduce search speed, and some methods have reduced sensitivity in collision-activation data owing to glycan losses.

Searching peptide ions missing fragmented glycans: ‘offset’ searches

In offset searches, peptide sequence ions are searched directly without glycans (Fig. 3b). This offers greatly improved sensitivity over variable modification approaches for glycopeptides fragmented by collisional activation, as peptide fragments that have lost glycans (Fig. 3b) can be matched and contribute to the peptide score. The most common implementation of this method is a ‘glycan-first’ search, in which a series of Y-type ions corresponding to a common glycan core structure is used to determine the mass of the glycan and, by extension, the glycan-free peptide mass, which is then used to search for peptide fragment ions without the glycan (Fig. 3c). This approach has proved popular77. Further, computational approaches have been proposed to control glycopeptide FDRs at both the glycan and peptide levels54,348.
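The mass bookkeeping behind the glycan-first search described earlier in this section (Fig. 3c) can be sketched as below. The example assumes singly charged (deconvoluted) fragment m/z values and an N-glycan core that yields Y ions at the peptide, peptide plus one HexNAc, peptide plus two HexNAc and peptide plus two HexNAc and one Hex; the function name, tolerance and minimum number of matched core ions are hypothetical choices.

```python
HEXNAC = 203.079373
HEX = 162.052824
PROTON = 1.007276

# Mass offsets of core Y ions relative to the glycan-free peptide ion (Y0).
CORE_OFFSETS = [0.0, HEXNAC, 2 * HEXNAC, 2 * HEXNAC + HEX]


def split_precursor(precursor_mass, peaks_mz, tol=0.02, min_matches=3):
    """Split a neutral precursor mass into peptide and glycan components.

    peaks_mz: singly charged fragment m/z values from one MS/MS spectrum.
    Returns (peptide_mass, glycan_mass), or None if no peak anchors a
    sufficiently complete core Y-ion series.
    """
    best = None  # (number of matched core ions, candidate Y0 m/z)
    for y0 in sorted(peaks_mz):
        matches = sum(
            any(abs(mz - (y0 + offset)) <= tol for mz in peaks_mz)
            for offset in CORE_OFFSETS
        )
        if matches >= min_matches and (best is None or matches > best[0]):
            best = (matches, y0)
    if best is None:
        return None
    peptide_mass = best[1] - PROTON
    return peptide_mass, precursor_mass - peptide_mass


if __name__ == "__main__":
    peaks = [204.0867, 366.1395, 1189.5120, 1392.5914, 1595.6708, 1757.7236]
    print(split_precursor(2633.0386, peaks))
```

The returned peptide mass would then be used to select candidate sequences whose glycan-free fragment ions are compared with the spectrum, while the glycan mass is matched to a composition.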

In contrast to the statistical controls for the peptide sequences assigned to glycopeptides, which are generally considered robust, the determination of glycan composition or structure is acknowledged to be a key limitation of intact glycopeptide analysis365. The software tools for the determination of glycan composition described above use a fragment-ion-based method for assigning glycans, and the accuracy of such assignments has largely been evaluated manually or with empirically determined score filters62. Manual expert-based curation of output data is time-consuming and often prohibitive for large-scale analysis of glycopeptides, prompting the development of glycan-specific FDR methods to enable automated control of false assignments. The linear sequence of amino acid residues can be reversed or shuffled to make a decoy peptide with the same amino acid composition as the target; however, non-linear glycans comprising multiple different building blocks of identical masses require a different method for decoy generation. GlycoPepEvaluator366 and IQ-GPA323 generate decoys by substituting monosaccharides and reversing or altering the glycopeptide sequence to obtain a decoy glycopeptide that is an isobar of the target glycopeptide and that contains a nonsensical glycan (Table 3). An alternative ‘spectrum-based’ FDR method implemented in GlycoPAT324 and pGlyco346 generates decoy glycans by applying random mass shifts to the fragment ions of a target glycan, preserving the fragmentation characteristics of the target glycan and assessing the likelihood of random matches to ions in the mass spectrum. This approach has been adopted by GPSeeker368,369. N-glycoproteomics has been extensively used to analyse various sources of neural tissue in an attempt to identify biomarkers for neural diseases, including stem cell-derived neural cells, mouse brains and patient-derived CSF88,370,371. 
Recently, comparative in-depth N-glycoproteomic analysis of CSF samples from healthy controls and patients with Alzheimer disease demonstrated differential N-glycosylation patterns between cohorts368. Similarly, comparisons of postmortem human Alzheimer disease and control brain tissue have shown quantitative changes in N-glycosite occupancy in clinically relevant proteins372.

N-glycoproteomics has also been explored as a tool for the early detection of cancer. Cancer models studied so far include ovarian cancer cell lines with differential resistance to the chemotherapeutic agent doxorubicin373, as well as patient serum samples369, and native and xenografted tissues from ovarian serous carcinoma369,374,375. These studies have demonstrated that the detection of select glycopeptide signatures may be useful in diagnostic applications, for the stratification of patients or to follow disease progression. Studies in other cancers have also shown differential abundance of select N-glycopeptides between tissues, serum and bodily fluids from healthy donors and patients with cancer, further suggesting that alterations in specific N-linked glycosylation events may correlate with cancer progression13,376 and that the integration of N-glycoproteomic profiles can improve diagnostic sensitivity compared with proteomics alone377,378,379,380,381.

Mapping O-glycosylation

The application of O-glycoproteomics to a range of biological questions has resulted in a massive expansion of the mammalian O-glycoproteome53,76,77,124, leading to unexpected findings such as O-glycosylated neuropeptides and peptide hormones67,382, O-glycans in LDLR-related protein linker sequences80 and extensive O-glycosylation of viral envelope proteins66,78.

The discovery of O-glycoproteases and their inactive mutants has led to the development of O-glycoprotein and mucin-domain glycoprotein enrichment methods. A notable example of using catalytically active O-glycoproteases for O-glycosite enrichment is the site-specific extraction of O-linked glycopeptides (ExoO) approach, which has been used to identify O-glycosites on more than 1,000 proteins across human kidney tissue, T cells and serum samples124. Inactive O-glycoproteases have also been shown to be robust affinity tools for enabling the differentiation of cancer-associated changes in mucin-domain-containing glycoproteins96,125. A recent preprint publication showed that inactive StcE-based enrichment was capable of isolating hundreds of O-glycopeptides from patient-derived ascites fluid, including many from MUC16 — the classic, gold-standard biomarker for ovarian cancer383.

Genetic knockouts of specific GalNAc-Ts have identified isoform-specific substrates in various cell lines and tissues68,79,81,82,83 that could give information on the pathophysiological mechanisms that drive congenital disorders of glycosylation384. Further genetic engineering-driven glycoproteomic strategies — first using zinc finger nucleases76,385 and more recently CRISPR-based approaches83,158 — have led to the discovery of novel glycosylation pathways such as an O-mannosylation system responsible for glycosylation of cadherins158. These discovery-driven applications of glycoproteomics have expanded our understanding of carbohydrate-binding proteins386,387,388, providing insights into how glycan recognition may have an important role in cancer development.

Glycoproteomics in multi-omic studies

Multi-omic approaches that combine transcriptomic and glycoproteomic analyses can provide context for the global consequences of N-glycoproteomic or O-glycoproteomic changes in cell systems, disease models and clinical specimens79,115,116,389,390. For example, in a clinical setting, combining N-glycoproteomic-based classification of tumours with transcriptomic changes led to biomarker discovery and prospective therapeutic targets based on the pathways identified391. Further, public genomic, transcriptomic or proteomic repositories of patient cohort data can be excellent sources of data for correlation with glycoproteomic data381, an approach that has been used to help understand global regulatory networks in cell differentiation programmes392.

Another successful multi-omic approach is to combine glycoproteomic data with data from phosphoproteomic analyses393,394,395,396,397. The integration of phosphoproteomics, proteomics, transcriptomics and glycoproteomics can provide comprehensive insights into disease mechanisms or tissue development, as recently shown for both N-linked and O-linked glycans83,398. In such multi-omic studies, transcript expression can be correlated with protein expression, and cross-referencing of PTMs with protein abundance and signalling networks gives a narrow selection of relevant targets for downstream study83.

The modelling of glycans at specific sites can be useful for understanding the functional impacts of changes in glycosylation. Multiple platforms provide tools for predicting 3D structures of carbohydrates attached to glycoproteins399, and it has been argued that new tools such as AlphaFold2 should be modifiable to incorporate PTMs such as glycosylation, which will enable far more realistic structural predictions400. Integrative bioinformatics tools such as the GlycoDomainViewer401, which allow glycosylation sites to be assessed within the context of the protein sequence, domain architecture and other known PTM events, are now also beginning to emerge.

Although MS-based glycoproteomic applications are becoming more mainstream, several challenges remain. Comprehensive characterization of glycosite microheterogeneity and reliable quantification of glycopeptides harbouring different glycans is still challenging in complex clinical samples. These challenges are exacerbated when the amount of sample is limited and when multi-omic analysis from an identical sample is required. Methods that preserve the natural context and provide reliable quantification should be prioritized given the limitations of cell culture-based systems. One of the next milestones for the community will be applying glycoproteomics at the level of individual cell types, or even at the single-cell level, which could provide insight into the spatiotemporal regulation of glycosylation in different tissues. Recent progress in MOE labelling has now shown that cell line-specific glycoprotein tagging can be achieved within in vivo models (as shown in a recent preprint article), opening new opportunities to explore cell lineage glycoproteomes in native contexts402. As the field develops, translating the findings of glycosite mapping studies into a deeper understanding of the molecular mechanisms regulated by glycosylation will become the central goal of glycoproteomics.

Reproducibility and data deposition

Glycoproteomics is still a maturing field and, unlike proteomics and other omic disciplines, has yet to experience consolidation and harmonization of its experimental methodologies and informatics approaches. As the glycoproteomics community grows, it will be important to establish conventions and move towards the use of standardized approaches that reflect best practice for the collection, management and sharing of data. Below, we discuss factors that lead to known reproducibility issues.

Variations in data collection

A key factor that contributes to the lack of reproducibility in glycopeptide data sets across laboratories is the inconsistent and often incomplete description of sample handling, sample processing and data acquisition parameters such as those relating to LC–MS/MS experiments. Experimental variations in peptide generation, chemical derivatization or labelling steps and glycopeptide enrichment can greatly affect the resulting glycopeptide data and are often not fully explained. These differences can be compounded in the LC–MS/MS acquisition process by, for example, changing MS ionization and fragmentation behaviours. For these reasons, it is crucial to fully describe these parameters in published research. It should be noted that MS instrument cleanliness and chromatography performance are also vitally important for data integrity403.

A diverse set of experimental methods are available for glycoproteomics data generation as demonstrated by several glycopeptide-focused multi-laboratory studies conducted through the Human Proteome Organization’s Human Disease Glycomics/Proteome Initiative404,405, the Association of Biomolecular Resource Facilities406 and the National Institute of Standards and Technology (NIST)407. Although analytical diversity could be considered a strength of the field, several of these experimental methods, some using highly customized and non-commercial reagents, are employed by few groups worldwide, therefore making data difficult to reproduce. Standardization of methods across laboratories could reduce some of these observed variations, although we acknowledge it is unlikely that a one-size-fits-all approach to methodologies would be advantageous for many biological questions.

Variations in data analysis

Analysis of glycopeptide data is challenging and a source of variation in glycoproteomic experiments. A recent multi-institutional study performed by the Human Proteome Organization (HUPO) Human Glycoproteomics Initiative evaluated software tools for serum N-glycopeptide and O-glycopeptide analysis using glycopeptide data sets provided from various glycoproteomic laboratories. The study found that the identified glycopeptides varied dramatically between laboratories even when the same informatic tools were employed, confirming that variables such as pre-processing and post-processing methods substantially affect glycopeptide assignments even on identical data sets365. Although this comparison identified several high-performance search strategies, the large variability in the performance of software tools and search parameters highlights that ongoing benchmarking to track and compare the performance of glycoproteomic informatics used across the community is crucial.

Data deposition and sharing

Data repositories will be essential for glycoproteomics data to comply with the FAIR data deposition standards408. The MIRAGE initiative has taken the lead in proposing reporting guidelines for glycomics409, and these are currently undergoing refinement to provide guidelines for glycoproteomic data. The MIRAGE guidelines have been adopted by several journals to ensure that consistent information is reported for glycomic experiments with the goal that the finalized glycoproteomics guidelines will provide a clear framework for the glycoproteomics community. To facilitate the sharing of data, glycoproteomic-centric repositories have been launched, for example, GlycoPOST410, which assigns unique identifiers to raw MS data for individual projects and provides input forms and spreadsheets to give users a template for providing metadata required by MIRAGE guidelines. The database UniCarb-DR411 is complementary to GlycoPOST and allows users to visualize glycan structures annotated in the raw MS data. At the time of writing, UniCarb-DR and GlycoPOST are both available from the GlyCosmos Glycoscience Portal412. ProteomeXchange413,414 is also available for (glyco)proteomic LC–MS data deposition. As avenues for data sharing are now established, all published glycoproteomic data should be made publicly available. Many journals are already beginning to implement this requirement and it is important to note that ensuring the public availability of data will be a community effort.

Limitations and optimizations

Several assumptions and experimental trade-offs shape the conclusions that can be drawn from glycoproteomic studies. Although workflows used to undertake glycoproteomics are continuously improving, a clear understanding of potential limitations and the underpinning assumptions associated with these workflows is needed to best interpret glycoproteomic data.

One MS/MS event, multiple glycoforms

A common assumption for glycopeptide MS/MS events is that each of the resulting spectra contains a single glycoform; however, multiple isobaric glycans57 or isomeric glycosylation states415 may be observed within a single MS/MS spectrum. Isobaric glycans and isomeric glycopeptides possess similar elution profiles when separated using chromatography approaches such as RP-LC, resulting in mixtures of glycoforms being subjected to MS/MS analysis (Fig. 5a). This leads to the generation of chimeric spectra that complicate the assignment of glycosylation sites and glycan arrangements (Fig. 5). Chimeric spectra have been observed in N-linked glycoproteomic studies54, and O-linked glycopeptides are known to display multiple isomeric species63. Careful analysis of chromatography-separated isomers270 or use of additional separation techniques such as IMS216 can help to resolve co-eluting isomeric glycosylation sites.

Fig. 5: Glycopeptide co-fragmentation and chimeric spectra.

Glycopeptide co-elution and co-isolation of isomeric species can lead to the generation of chimeric spectra containing fragments from two or more precursor ions. a | Glycopeptide isomers can possess unique elution properties when separated with reverse phase separation, although some isomers may have closely related elution profiles. b | The presence of multiple glycopeptide isomers in samples can result in the observation of multiple overlapping Gaussian features in the chromatogram. c,d | Examining tandem mass spectrometry (MS/MS) spectra corresponding to different retention times results in distinct MS/MS spectra containing different mixtures of isomeric glycopeptide species. These chimeric spectra are identifiable by the presence of fragment ions corresponding to the modification attached to two residues, such as the c12 and z4 ions highlighted in blue. Mixtures of isomeric glycopeptides can result in chimeric spectra, supporting the assignment of mutually exclusive glycosylation events. ETD, electron transfer dissociation; m/z, mass to charge ratio.

MS-based glycan class assignments

MS data provide limited insights into monosaccharide identity or linkage information (see above). This lack of information limits the ability to assign glycan classes on the basis of mass alone. Although the conservation of glycosylation pathways in eukaryotic glycosylation systems does constrain many glycan compositions, which allows glycan classes to be predicted and/or assigned with reasonable confidence416,417, it is important to note that these should still be treated as unconfirmed assignments. Orthogonal methodologies can be used to further support the presence of specific glycans or linkage configurations such as the use of exoglycosidases418; the release of glycans and confirmation of specific glycans using isomeric resolving approaches such as PGC310,367; or the analysis of oxonium fragmentation patterns to support monosaccharide assignments114,350. In situations where glycans are ambiguous, restraint in the assignments of glycan classes is best practice. Alternatively, an increasingly accessible way to corroborate glycopeptide assignments is the use of synthetic glycopeptide standards, which allow subtle changes in retention time or fragmentation properties to be detected to support glycan identities419.
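
The limits of mass-based assignment can be illustrated with a short Python sketch that computes glycan composition masses and lists all compositions consistent with an observed mass. The residue masses are standard monoisotopic values supplied here as assumptions rather than taken from this Primer, and the function names, count limits and tolerances are hypothetical. The example shows that one NeuAc plus a single 13C isotope spacing lies within about 0.02 Da of two Fuc residues, so two different compositions can be confused if the monoisotopic peak is misassigned.

```python
from itertools import product

# Monoisotopic residue masses in Da (assumed standard values).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}
C13_SPACING = 1.00335  # approximate 13C - 12C mass difference in Da


def composition_mass(comp):
    """Sum of residue masses for a composition such as {"Hex": 5, "HexNAc": 4}."""
    return sum(RESIDUE_MASS[name] * comp.get(name, 0) for name in RESIDUE_MASS)


def candidate_compositions(target, tol_da, max_counts):
    """List every composition within tol_da of target, up to max_counts."""
    names = list(RESIDUE_MASS)
    ranges = [range(max_counts[name] + 1) for name in names]
    hits = []
    for counts in product(*ranges):
        comp = dict(zip(names, counts))
        if abs(composition_mass(comp) - target) <= tol_da:
            hits.append(comp)
    return hits


# Hypothetical search limits for illustration.
limits = {"Hex": 12, "HexNAc": 8, "Fuc": 4, "NeuAc": 4}
fuc2 = {"Hex": 5, "HexNAc": 4, "Fuc": 2, "NeuAc": 0}
neuac1 = {"Hex": 5, "HexNAc": 4, "Fuc": 0, "NeuAc": 1}

gap = composition_mass(fuc2) - (composition_mass(neuac1) + C13_SPACING)
print(f"Fuc2 vs NeuAc1 + one 13C: {gap:.4f} Da")
print(candidate_compositions(composition_mass(fuc2), 0.005, limits))
```

Even when a single composition is returned, the mass does not distinguish isomeric monosaccharides (all hexoses share one residue mass) or linkages, which is why such results remain unconfirmed assignments.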

Ambiguous localizations

The community’s ability to assign glycosylation sites has seen a dramatic improvement over the past decade with multiple innovations in instrumentation and data acquisition, such as increased accessibility to ExD dissociation methods on multiple instrument platforms and improved data collection approaches57,341. These innovations do not guarantee that localization information will be obtained for a given glycopeptide, and a large proportion of glycopeptides are not able to be localized within most data sets. A growing question within the field is whether site localization is needed for all glycosylation experiments, especially if localization comes at the cost of speed and subsequent glycoproteomic depth252. Glycopeptide-focused DIA analysis286,339, which is undertaken using beamCID, highlights this change in thinking and the growing acceptance of site ambiguity. Many in the field advocate that sites should be assigned either as localized or non-localized on the basis of the available fragmentation information76,77,286 (Box 2). Further, a formal system to stratify glycosylation site ambiguity on the basis of site localization probability was recently proposed by Lu, Riley et al.336 to provide a means to categorize assignment quality. In reality, not all biological questions need complete unambiguous glycosylation site assignments; for example, studies in which the focus is the identification of glycans367,420 or the quantification of glycopeptide abundances52,310 will not be affected by site ambiguity. By contrast, site localization can be crucial for confirming atypical glycosylation events such as tyrosine O-glycosylation76,421 or when attempting to fully characterize the site-specific glycosylation of a protein of interest, especially when both N-glycans and O-glycans are present. It should be noted that at least partial localization of glycans may be required for peptides with multiple glycosylations to avoid misassignment of glycan compositions63,335.
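
The distinction between localized and non-localized sites can be sketched in Python as follows. This is a simplified illustration, not the probability-based scheme of ref. 336: it assumes a single glycan per peptide, uses only c-type and z-type fragment ions, and treats each observation as certain. All function names and the example peptide are hypothetical.

```python
def covered_positions(ion_type, n, length):
    """1-based residue positions contained in a c_n or z_n fragment ion."""
    if ion_type == "c":
        return range(1, n + 1)  # N-terminal fragment
    if ion_type == "z":
        return range(length - n + 1, length + 1)  # C-terminal fragment
    raise ValueError(f"unsupported ion type: {ion_type}")


def consistent_sites(length, candidate_sites, observations):
    """Keep candidate sites that agree with every (ion_type, n, carries_glycan)."""
    sites = set(candidate_sites)
    for ion_type, n, carries_glycan in observations:
        span = covered_positions(ion_type, n, length)
        sites = {s for s in sites if (s in span) == carries_glycan}
    return sorted(sites)


def classify(length, candidate_sites, observations):
    """Label a glycopeptide by how many candidate sites remain."""
    remaining = consistent_sites(length, candidate_sites, observations)
    if len(remaining) == 1:
        return "localized"
    if not remaining:
        return "conflicting"  # may indicate a chimeric spectrum
    return "non-localized"


# Hypothetical 15-residue peptide with candidate sites at positions 4 and 12.
print(classify(15, [4, 12], [("c", 5, False)]))  # c5 lacks the glycan
print(classify(15, [4, 12], [("c", 13, True)]))  # c13 spans both sites
print(classify(15, [4, 12], [("c", 5, True), ("z", 4, True)]))
```

In the last case no single site explains both fragments, which under the single-glycan assumption is the kind of pattern expected from co-fragmented positional isomers.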

Outlook

Glycosylation shapes nearly all biological processes across all domains of life, and there has been a rapid growth in glycobiology-focused efforts over the past two decades to define and understand the role of the complex and dynamic glycoproteome. The development of chemical biology tools for tagging glycoproteins93,162,169,170, enrichment techniques to isolate glycopeptides114,129 and new glycoproteome-specific reagents such as O-glycoproteases96,125 have greatly improved our ability to site-specifically map glycosylation across biological systems. Over the coming years, improved access to glycoproteomics toolkits promises to stimulate further activity in the field and promote an increasing number of studies exploring fundamental and applied questions in glycobiology.

Glycoproteomics has shown potential to differentiate disease subtypes, stratify patients and predict clinical outcomes in complex human diseases such as cancer398,422, inflammation423,424 and microbial infections425,426, and there is great potential for glycoproteomic analysis to improve diagnostic sensitivity and precision376. Community-based development of robust methods and software that implement best practice for data interpretation, standardization and sharing will be essential for clinical translation; this has begun with the establishment of glycoproteomic focused sharing platforms such as GlycoPOST410. Although these developments are promising, ease of use and implementation is still the major hurdle currently limiting the translation of glycoproteomics to the clinic.

It is important for glycopeptide-focused software solutions to be developed in parallel with new practical techniques. Future tools should aim to be customizable to facilitate the analysis of diverse glycoproteomes beyond the mammalian realm, including in plants, invertebrates and microbial systems31,33,34. For future software solutions, the crucial challenges will be identifying and localizing multi-glycosylated peptides, statistical control of glycopeptide identification and distinguishing glycan structural isomers.

Marked improvements in proteomic sample multiplexing, chromatography and MS acquisition speed are likely to lead to increased throughput in the field of glycoproteomics. Peptide-based sample multiplexing techniques using tandem mass tags currently allow 18 samples to be analysed within a single proteomic experiment304. Multiplexing can also be used to provide structural insights by allowing the incorporation of samples treated with specific glycosylation inhibitors112 or the inclusion of genetic knockouts of specific glycosyltransferases or glycoside hydrolases371,427, enabling glycan class or isoform information to be obtained that may otherwise be missed.

Improvements in glycoproteomic depth are likely to come from new tools. The recent demonstration of a large range of bacterial glycan-targeting hydrolytic enzymes125 shows that the current repertoire of glycoproteases represents only a small subset of possible enzymatic activities and specificities. As our understanding of glycan-modifying enzymes improves428,429, so too will our ability to rationally modify and tailor these enzymes to target or enrich specific glycosylation sites and their glycans of interest. Modified enzymes and affinity tools generated against specific glycans430 will be particularly valuable to advance less-mature areas of glycoproteomics such as C-glycosylation431. Additional methods for unbiased, untargeted quantitative profiling of multiple glycosylation classes in a single experiment will also be crucial.

Applications such as single-cell analysis and top-down glycoproteomics still face significant technical barriers. Although isobaric labelling approaches are increasingly used for single-cell proteomic analysis432,433 and have the potential to enable single-cell glycoproteomics, it remains to be seen how applicable these approaches will be. The use of charge detection MS434,435,436 has the potential to radically improve top-down glycoform characterization, and integration of these approaches for glycoproteomics will require further development. Non-MS-based DNA-sequencing methods using oligonucleotide-labelled lectins have been used by several groups to explore glycosylation changes at the single-cell level113,437. Further, a recent study demonstrated that non-glycosylated and glycosylated forms of peptides can be resolved using nanopore sequencing438, suggesting that this technique may enable single-molecule analysis of glycopeptides and glycoproteins. Although these technologies are still in their infancy, they have considerable potential to provide orthogonal information to MS-based glycoproteomics.

Great strides have been made in glycoproteomics-based identification of glycosylation events and the discovery of new or unusual types of protein glycosylation33,367,439. Over the coming years, glycoproteomics will increasingly provide valuable mechanistic insight into the formation and role of protein-linked glycans in biological processes. New insights into mechanisms such as the requirement of N-linked fucosylation for ricin toxicity371 or the role of specific O-GlcNAcylation sites in metabolic regulation440 have already been established using glycoproteomics. Further, multi-omic integration has enabled a holistic understanding of biological systems and it is likely that the integration of glycoproteomics with other omic techniques for the analysis of large cohorts will further enhance our knowledge at a population level. For example, the identification of common genetic variants associated with differences in glycosylation through genome-wide association studies may further enhance mechanistic insights and unravel potential disease predispositions424,441.

As methods and technologies continue to evolve, one of the most exciting opportunities for the field will be further integration and improvements in the bioinformatic space. Across the life sciences, the growing application of machine learning approaches is leading to new ways to model, analyse and handle large data sets of increasing complexity and information content442,443. Machine learning and artificial intelligence are not used routinely by the glycoproteomics community, although their increasing use in proteomics444 suggests that these approaches will become commonplace in glycoproteomics workflows. Collectively, these transformative tools are likely to make glycosylation analysis accessible to a wider range of life scientists, ultimately improving our understanding of organismal development, disease adaptation and evolution.