Introduction

The covalent attachment of ubiquitin (Ub) to protein substrates (ubiquitination) is an important post-translational modification (PTM) that regulates diverse cellular functions [1,2,3,4]. Ub is a small, highly conserved 76-residue protein in eukaryotes. Its C-terminal glycine (G76) is covalently attached to the substrate protein through a sequential cascade of E1 Ub-activating enzymes, E2 Ub-conjugating enzymes, and E3 Ub ligases (Fig. 1A) [5, 6]. Ub can be reversibly removed from the substrate by a family of Ub hydrolases known as deubiquitinases (DUBs) [7]. Cellular ubiquitination homeostasis is maintained by the orchestrated interplay of 2 E1 enzymes, ~40 E2 enzymes, more than 1000 E3 ligases, and approximately 100 DUBs encoded by the human genome [8,9,10]. Aberrant activity of these ubiquitin-related enzymes dysregulates protein ubiquitination and promotes the pathogenesis of numerous diseases, such as cancer and neurodegenerative diseases [11,12,13,63].

Alkylation with iodoacetamide (IAA) can generate a lysine adduct with the same elemental composition as the K-ε-GG remnant, which can be mistaken for ubiquitination. This IAA-induced artifact can now be avoided by using the less reactive chloroacetamide or by lowering the reaction temperature and the IAA concentration [64]. In addition, antibody-based profiling of K-ε-GG peptides may exhibit a slight peptide sequence bias [65].

UbiSite antibody-based approach to profiling ubiquitination sites

To overcome some drawbacks of the diGly approach, Akimov et al. recently generated an antibody, UbiSite, that recognizes the C-terminal 13 amino acids of Ub (ESTLHLVLRLRGG) remaining on modified residues after Lys-C digestion (Fig. 3B). By combining UbiSite-based enrichment with pre-fractionation and MS analysis, the authors identified over 63,000 unique ubiquitination sites on 9200 proteins in human HepG2 and Jurkat cells treated with a proteasome inhibitor [66]. In addition, they identified 104 N-terminal ubiquitination sites, which the diGly approach misses. As one of the few tools able to enrich peptides bearing protein N-terminal ubiquitination, the UbiSite antibody-based approach may be useful for studying the function and regulation of N-terminal ubiquitination [67].
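The remnant that UbiSite recognizes follows directly from protease specificity. The minimal sketch below digests the human Ub sequence in silico, assuming simplified cleavage rules (Lys-C cuts after every Lys; trypsin cuts after Lys or Arg unless the next residue is Pro) and ignoring missed cleavages; the helper names are hypothetical. The C-terminal fragment is the part that stays isopeptide-linked to a modified residue on the substrate.

```python
# Human ubiquitin (76 residues); Gly76 forms the isopeptide bond to the substrate.
UB = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
      "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")


def digest(seq: str, sites: str, skip_before_pro: bool) -> list[str]:
    """Cut after every residue listed in `sites` (simplified protease rules)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq[:-1]):
        if aa in sites and not (skip_before_pro and seq[i + 1] == "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    peptides.append(seq[start:])
    return peptides


def remnant(enzyme: str) -> str:
    """C-terminal Ub fragment left attached to a modified residue."""
    if enzyme == "LysC":
        return digest(UB, "K", skip_before_pro=False)[-1]
    if enzyme == "trypsin":
        return digest(UB, "KR", skip_before_pro=True)[-1]
    raise ValueError(f"unsupported enzyme: {enzyme}")


print("Lys-C remnant:  ", remnant("LysC"))     # ESTLHLVLRLRGG (13 residues)
print("trypsin remnant:", remnant("trypsin"))  # GG
```

With trypsin only the diGly stub remains (the epitope of K-ε-GG antibodies); with Lys-C the last cut falls after K63, leaving the 13-residue UbiSite epitope.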

The UbiSite approach has two main advantages. First, because the antibody recognizes a sequence specific to the Ub C-terminus (ESTLHLVLRLRGG), it markedly improves specificity toward ubiquitinated peptides by reducing the non-specific enrichment of peptides modified by Ub-like proteins such as NEDD8 and ISG15. Second, UbiSite enables enrichment of protein N-terminal ubiquitination, advancing the understanding of this modification. However, the ESTLHLVLRLRGG remnant is long, so it generates many additional fragment ions during MS/MS analysis, which impairs the identification of ubiquitination sites when Lys-C is the only protease used [45]. Therefore, an additional enzymatic digestion step benefits site profiling with the UbiSite approach [45].
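To put the length difference in numbers, the sketch below computes the mass each remnant adds to the modified residue (monoisotopic residue masses; forming the isopeptide bond releases one water, so the shift equals the residue sum) and counts the peptide bonds inside the remnant that can also break during MS/MS. Counting internal bonds is only a crude proxy, assumed here for illustration, for the extra fragment ions mentioned above.

```python
# Monoisotopic residue masses (Da) for the residues that occur in the two remnants.
RESIDUE_MASS = {
    "G": 57.02146, "S": 87.03203, "V": 99.06841, "T": 101.04768,
    "L": 113.08406, "E": 129.04259, "H": 137.05891, "R": 156.10111,
}


def remnant_mass_shift(remnant: str) -> float:
    """Mass added to the modified residue by an isopeptide-linked remnant.

    Forming the isopeptide bond releases one water, so the added mass is simply
    the sum of the remnant's residue masses.
    """
    return sum(RESIDUE_MASS[aa] for aa in remnant)


def internal_bonds(remnant: str) -> int:
    """Peptide bonds inside the remnant, each a potential extra fragmentation site."""
    return len(remnant) - 1


for label, rem in [("trypsin, K-e-GG", "GG"), ("Lys-C, UbiSite", "ESTLHLVLRLRGG")]:
    print(f"{label}: +{remnant_mass_shift(rem):.4f} Da, "
          f"{internal_bonds(rem)} internal peptide bond(s)")
```

The diGly tag adds about 114 Da and one internal bond, whereas the UbiSite remnant adds more than 1.4 kDa and twelve internal bonds, which is why trimming it with a second protease can simplify the spectra.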

Insights into N-terminal ubiquitination using the antibody toolkit

Conjugation of Ub to the ε-amino group of lysine is the most common type of ubiquitination. In addition, the α-amino group of the protein N-terminus has been identified as a non-canonical ubiquitination target [68, 69]. However, apart from the UbiSite approach, current methods mainly focus on lysine ubiquitination, and the scarcity of approaches for analyzing N-terminal ubiquitination hampers its functional study. Three approaches for profiling protein N-terminal ubiquitination have been reported: the anti-GGX mAb-based approach, the UbiSite-based approach, and the StUbEx PLUS approach [66, 70, 71]. For example, Davies et al. generated four monoclonal antibodies (anti-GGX mAbs) that selectively recognize tryptic peptides with an N-terminal diGly remnant rather than a K-ε-GG group, enabling the specific enrichment and identification of protein N-terminal ubiquitination [70]. UBE2W is the only E2 Ub-conjugating enzyme reported to regulate protein N-terminal ubiquitination, and the authors used anti-GGX mAbs to enrich and analyze UBE2W-regulated N-terminal ubiquitination events [72]. They identified 152 unique N-terminal ubiquitination sites derived from 109 endogenous proteins, 32 of which were reported as potential UBE2W substrate sites, demonstrating that the anti-GGX mAb-based approach is suitable for profiling protein N-terminal ubiquitination.

Among the three methods for analyzing N-terminal ubiquitination, the anti-GGX mAb-based approach selectively enriches tryptic peptides with an N-terminal diglycine remnant rather than a diglycine remnant on lysine. By contrast, the UbiSite antibody recognizes the 13-residue Ub remnant on both protein N-termini and lysine residues after Lys-C digestion. Each antibody also shows some bias toward particular sequences. StUbEx PLUS (see below), in turn, is an antibody-free approach that differs from both antibody-based approaches and shows no bias between lysine and N-terminal ubiquitination. This is why only a small fraction of the N-terminal ubiquitination sites identified by the anti-GGX mAb-based approach overlapped with those identified by the UbiSite and StUbEx PLUS approaches [66, 70, 71]. This result highlights both the limitations and the complementarity of current approaches.
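Overlap between methods, as discussed above, is usually summarized as the number of shared sites or a set-similarity index. The sketch below computes both for every pair of methods; the site sets are invented placeholders (not data from [66, 70, 71]), and the Jaccard index is only one common choice of similarity measure.

```python
from itertools import combinations


def pairwise_overlap(site_sets: dict[str, set[str]]) -> dict[tuple[str, str], tuple[int, float]]:
    """Shared-site count and Jaccard index (shared / union) for every pair of methods."""
    result = {}
    for a, b in combinations(sorted(site_sets), 2):
        shared = site_sets[a] & site_sets[b]
        union = site_sets[a] | site_sets[b]
        result[(a, b)] = (len(shared), len(shared) / len(union) if union else 0.0)
    return result


# Hypothetical N-terminal sites (protein:position), NOT data from the cited studies.
sites = {
    "anti-GGX": {"PROT1:N1", "PROT2:N1", "PROT3:N2", "PROT4:N1"},
    "UbiSite": {"PROT1:N1", "PROT5:N1", "PROT6:N2"},
    "StUbEx PLUS": {"PROT2:N1", "PROT5:N1", "PROT7:N1"},
}

for (a, b), (n_shared, jaccard) in pairwise_overlap(sites).items():
    print(f"{a} vs {b}: {n_shared} shared site(s), Jaccard = {jaccard:.2f}")
```

Low Jaccard values between methods, as in this toy case, are what one would expect when each method samples a different, partly biased subset of the true N-terminal sites.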

Antibody-free approaches to profiling ubiquitination sites

Antibody-based strategies are the most widely used approaches to systematically profiling protein ubiquitination at the peptide level. However, they have shortcomings, such as (a) bias toward the amino acid sequence surrounding the ubiquitination sites and (b) the high cost of antibodies, which limits their widespread application. As an alternative, Gevaert et al. reported an antibody-free approach, termed Ub COmbined FRActional DIagonal Chromatography (Ub-COFRADIC) (Fig. 3C), for enriching and identifying protein ubiquitination at the peptide level [73, 74]. In brief, all primary amino groups are blocked by acetylation at the protein level, after which the USP2 catalytic core domain (USP2cc) is used to hydrolyze Ub from ubiquitinated proteins, re-exposing free ε-amine groups at the ubiquitination sites. A glycine linked to a hydrophobic tert-butyloxycarbonyl (BOC) group is then coupled to these free ε-amines, and the tagged peptides are enriched by two reverse-phase HPLC (RP-HPLC) runs performed before and after TFA-based removal of the BOC groups; because BOC removal makes the tagged peptides more hydrophilic, they shift out of their original elution window and can be isolated. Gevaert et al. used this approach to profile protein ubiquitination in native human Jurkat cell lysate and in Arabidopsis thaliana, identifying over 7500 endogenous ubiquitination sites on 3300 different proteins and 3009 endogenous ubiquitination sites on 1607 proteins, respectively.
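The diagonal selection step can be illustrated with a toy model. Assuming each peptide's retention time is recorded in the primary RP-HPLC run (BOC intact) and again after TFA treatment, untagged peptides stay on the "diagonal" while Gly-BOC-tagged peptides elute earlier once the hydrophobic BOC group is gone. The peptide names, retention times, and the 2-minute shift threshold are hypothetical; the real method works with collected fractions rather than single retention values.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    name: str
    rt_primary: float    # retention time (min), first RP-HPLC run, BOC intact
    rt_secondary: float  # retention time (min) after TFA-based BOC removal


def select_shifted(peptides: list[Peptide], min_shift: float = 2.0) -> list[str]:
    """Names of peptides that elute at least `min_shift` min earlier after BOC removal.

    Untagged peptides elute at about the same time in both runs (the 'diagonal');
    removing the hydrophobic BOC group makes tagged peptides elute earlier.
    """
    return [p.name for p in peptides if p.rt_primary - p.rt_secondary >= min_shift]


# Hypothetical peptides and retention times, for illustration only.
peptides = [
    Peptide("untagged_1", 30.0, 29.9),
    Peptide("gly_boc_tagged", 42.5, 35.1),
    Peptide("untagged_2", 18.2, 18.3),
]
print(select_shifted(peptides))  # ['gly_boc_tagged']
```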

Another antibody-free approach is StUbEx PLUS (Fig. 3D). Akimov et al. extended the Stable Tagged Ub Exchange (StUbEx) strategy, originally used to enrich ubiquitinated proteins, into StUbEx PLUS, which specifically enriches ubiquitinated peptides [71]. In the StUbEx PLUS system, a His-tag is inserted between serine 65 and threonine 66 of recombinant Ub. After proteolytic digestion with lysine-specific enzymes, the tag remains attached to the ubiquitination sites, allowing enrichment with Ni-NTA beads. Because Ub-like proteins do not carry the His-tag, interference from neddylation and ISGylation is also avoided. Using the StUbEx PLUS strategy, 41,589 unique ubiquitination sites on 7762 proteins were identified in U-2 OS cells.
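The sketch below shows why the tag survives digestion: inserting it between Ser65 and Thr66 places it downstream of Lys63, the last lysine in Ub, so it stays on the Lys-C remnant. The six-residue His tag, the simplified Lys-C rule, the substring test standing in for Ni-NTA capture, and the helper names are assumptions for illustration, not the exact StUbEx PLUS construct.

```python
# Human ubiquitin (76 residues).
UB = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
      "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")
HIS_TAG = "HHHHHH"  # assumption: a hexahistidine tag


def insert_after(seq: str, position: int, insert: str) -> str:
    """Insert `insert` after 1-based residue `position` of `seq`."""
    return seq[:position] + insert + seq[position:]


def lysc_remnant(seq: str) -> str:
    """C-terminal fragment after Lys-C (simplified: cut after every Lys)."""
    return seq[seq.rfind("K") + 1:]


def ni_nta_binds(peptide: str, tag: str = HIS_TAG) -> bool:
    """Crude stand-in for Ni-NTA capture: the peptide carries the full His tag."""
    return tag in peptide


tagged_ub = insert_after(UB, 65, HIS_TAG)  # between Ser65 and Thr66
for name, protein in [("wild-type Ub", UB), ("His-tagged Ub", tagged_ub)]:
    rem = lysc_remnant(protein)
    print(f"{name}: remnant {rem}, captured by Ni-NTA: {ni_nta_binds(rem)}")
```

Only the remnant derived from tagged Ub is captured; peptides modified by untagged endogenous Ub or by Ub-like proteins would pass through in this simplified model.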

Antibody-free approaches have shown a powerful ability to enrich and profile protein ubiquitination, and easier-to-handle, cheaper antibody-free approaches are still much needed to overcome the drawbacks of antibody-based methods. Recently, we proposed an antibody-free approach, termed AFUP, that profiles protein ubiquitination by selectively clicking the ubiquitinated lysine, resulting in the identification of 7103 ubiquitination sites with high confidence from 5 mg of HeLa lysate (in preparation).

Antibody-free strategies play a vital role in studies of protein ubiquitination because they overcome drawbacks such as bias toward the sequence surrounding ubiquitination sites and high cost. Nevertheless, although antibody-free approaches enable global identification of protein ubiquitination, they still have shortcomings. First, the datasets obtained by antibody-free approaches are considerably smaller than those obtained by antibody-based approaches, suggesting that more effective antibody-free approaches are needed to expand the depth of ubiquitination profiling. Second, the Ub-COFRADIC approach is complicated and time-consuming. Third, StUbEx PLUS and similar approaches require a tagged Ub, which alters the structure of Ub, may introduce artifacts, and limits their application to animal or patient tissues. Therefore, a high-throughput, convenient, and effective antibody-free approach for identifying protein ubiquitination is urgently needed.

Insights into ubiquitinated proteins at the level of Ub chain architecture

Ub chains with distinct topologies regulate protein stability, protein–protein interactions, and protein localization in eukaryotic cells, and thus act as versatile signals [75]. The approaches discussed above mainly reveal ubiquitinated substrates and ubiquitination sites but cannot reveal Ub chain architecture. Several strategies can be used to detect Ub chain architectures. For example, linkage-specific antibodies that recognize particular Ub linkages are used to identify distinct Ub chain topologies [8]. According to the analytes, MS-based proteomics is currently categorized into bottom-up proteomics (BUP), middle-down proteomics (MDP), and top-down proteomics (TDP) (Fig. 4). Bottom-up proteomics is the traditional high-throughput strategy in which proteins are digested into small peptides for LC–MS analysis. However, complete digestion makes it impossible to distinguish protein isoforms. Middle-down proteomics uses restricted digestion to generate longer peptides for LC–MS analysis, so that each analyte covers a larger portion of the protein. Compared with bottom-up and middle-down proteomics, top-down proteomics requires no digestion and instead analyzes intact proteins directly by LC–MS to obtain a comprehensive characterization of the analyzed protein. In this section, we mainly discuss MS-based approaches to mapping the topologies of Ub chains (Fig. 5).

Fig. 4

Schematic workflow of bottom-up proteomics, middle-down proteomics, and top-down proteomics. The protein structure was downloaded from the National Center for Biotechnology Information (MMDB ID: 209664, PDB ID: 7KW7) [104].

Fig. 5

Approaches to gaining insight into the architecture of Ub chains. A Schematic diagram of the BUP strategy of UbiCRest to characterize the substrate ubiquitin chain type. B Schematic diagram of the MDP strategy of Ub-clipping to characterize the substrate ubiquitin architecture. C Schematic diagram of the TDP strategy to characterize the substrate ubiquitin architecture.

BUP is the conventional analytical approach. A disadvantage of the BUP strategy is that trypsin digestion destroys the architectural information on polyUb or branched chains. Therefore, whole-cell K-ε-GG analysis does not reveal the topology of Ub chains attached to substrates. To obtain chain information, linkage-specific antibodies, Affimers, or Ub-binding domains are first used to enrich a specific chain type, and the BUP strategy is then used to identify the linkage-specific substrates [38, 39, 56, 76]. Several reviews summarize the application of linkage-specific chain enrichment strategies to studying polyUb chain topologies [28, 49, 77,78,79]. Mapping Ub chain topologies requires detecting Ub molecules that are themselves modified by other Ub molecules. Ohtake et al. used a Ub mutant in which arginine 54 was replaced by alanine; because trypsin no longer cleaves at residue 54, a single tryptic peptide can carry diGly remnants on both K48 and K63, which enables discrimination between branched K48/K63 linkages and unbranched linkages [80]. The authors revealed that Ub chains branched at K48 and K63 regulate nuclear factor κB (NF-κB) signaling. Ub chain restriction (UbiCRest) is another approach to analyzing Ub chain architecture (Fig. 5A), in which substrates (ubiquitinated proteins or polyUb chains) are treated with a panel of linkage-specific DUBs in parallel reactions [81]. Several issues should be considered when using UbiCRest. First, some DUBs are not strictly specific and may cleave unexpected chain types (cross-specificity). Second, many DUBs cannot hydrolyze long (n > 4) chains from substrates [82]. Third, simple UbiCRest cannot resolve heterotypic chains [83,84,85]. In addition to the methods described above, MS-based absolute quantification of ubiquitin (Ub-AQUA) is a standard way to determine the diversity of ubiquitin linkages covalently attached to a protein substrate.
By synthesizing isotope-labeled internal-standard tryptic peptides corresponding to mono-ubiquitin and polyubiquitin chains bound to cyclin B1, Kirkpatrick et al. revealed that cyclin B1 is modified by a complex ubiquitin chain architecture linked through Lys63, Lys11, and Lys48 [86]. By combining affinity chromatography with protein standard absolute quantification (PSAQ) mass spectrometry, Kaiser et al. developed a strategy, termed ubiquitin-PSAQ, to quantify the cellular concentrations of ubiquitin species by spiking stable isotope-labeled free ubiquitin and ubiquitin conjugates into lysates [87]. The authors used ubiquitin-PSAQ to measure the concentrations of ubiquitin species in cell lines as well as in mouse and human brain tissue, and found that these concentrations varied significantly between samples. Therefore, BUP-based Ub-AQUA strategies will play a vital role in determining the ubiquitin topology of specific substrates and the concentrations of ubiquitin species across the whole proteome.
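Ub-AQUA and ubiquitin-PSAQ both rest on isotope dilution: the endogenous amount equals the light/heavy signal ratio multiplied by the known amount of spiked heavy standard. The sketch applies this to linkage-specific tryptic signature peptides; all peak areas and the spike amount are hypothetical, and real workflows add calibration and quality controls not shown here.

```python
def absolute_amount(light_area: float, heavy_area: float, spiked_fmol: float) -> float:
    """Isotope dilution: endogenous amount = (light / heavy) x spiked standard."""
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return light_area / heavy_area * spiked_fmol


# Tryptic signature peptides for three linkages (K(GG) = diGly-modified Lys).
SIGNATURE = {
    "K11": "TLTGK(GG)TITLEVEPSDTIENVK",
    "K48": "LIFAGK(GG)QLEDGR",
    "K63": "TLSDYNIQK(GG)ESTLHLVLR",
}
# Hypothetical (light, heavy) peak areas and spike amount, NOT measured data.
AREAS = {"K11": (1.0e6, 4.0e6), "K48": (8.0e6, 4.0e6), "K63": (2.0e6, 4.0e6)}
SPIKED_FMOL = 100.0

amounts = {link: absolute_amount(l, h, SPIKED_FMOL) for link, (l, h) in AREAS.items()}
total = sum(amounts.values())
for link, fmol in amounts.items():
    print(f"{link} ({SIGNATURE[link]}): {fmol:.0f} fmol, {100 * fmol / total:.1f}%")
```

Because each linkage has its own heavy standard, differences in ionization efficiency between signature peptides cancel out, which is the main reason for spiking one standard per linkage rather than comparing raw intensities.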

As an alternative, the MDP approach has been the most widely used for determining branched chains. Using limited tryptic digestion, the MDP strategy has been applied to determine the abundance of branched Ub chains and to detect specific branched linkages, such as K6/K48 and K29/K48 [88,89,90,91]. However, because trypsin is highly active toward Lys and Arg residues, limited tryptic digestion is difficult to control in the MDP strategy. Ub-clipping is an MDP strategy (Fig. 5B) that uses an engineered viral protease, Lbpro*. Lbpro* was created by mutating Leu102 to Trp in Lbpro, the leader protease of foot-and-mouth disease virus, so that it efficiently cleaves all types of diubiquitin. Lbpro* cleaves after Arg74 of Ub, leaving the signature C-terminal diGly attached to the modified residue [92]. Swatek et al. used Ub-clipping to quantify branch-point Ub and surprisingly found that about 10–20% of Ub chains appeared to be branched. The authors also showed that PINK1/PARKIN-mediated mitophagy predominantly exploits mono- and short-chain polyUb [93].
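Because Lbpro* removes Gly75-Gly76 from every Ub but leaves a GG on each modified lysine, every clipped Ub(1-74) species reports how many of its own lysines carried Ub, and species with two GG remnants mark branch points. The sketch below assigns GG counts from intact-mass shifts and computes a simple branched fraction; the 0.02 Da tolerance and the observed values are hypothetical, and this fraction is a simplified metric, not necessarily the one used by Swatek et al.

```python
GG_SHIFT = 114.04293  # Da added per diGly remnant (two Gly residues)


def count_gg(mass_shift: float, tol: float = 0.02) -> int:
    """Number of diGly remnants implied by a mass shift over unmodified Ub(1-74)."""
    n = round(mass_shift / GG_SHIFT)
    if n < 0 or abs(mass_shift - n * GG_SHIFT) > tol:
        raise ValueError(f"shift {mass_shift:.4f} Da is not a whole number of GG remnants")
    return n


def describe(n_gg: int) -> str:
    """Interpretation of a clipped Ub(1-74) species by its diGly count."""
    if n_gg == 0:
        return "unmodified (monoUb or chain terminus)"
    if n_gg == 1:
        return "modified at one Lys (within a chain)"
    return "modified at two or more Lys (branch point)"


def branched_fraction(intensity_by_gg: dict[int, float]) -> float:
    """Share of clipped Ub signal carrying two or more diGly remnants."""
    total = sum(intensity_by_gg.values())
    return sum(v for n, v in intensity_by_gg.items() if n >= 2) / total


# Hypothetical deconvoluted intact-mass results: (mass shift in Da, intensity).
observed = [(0.0, 60.0), (114.043, 28.0), (228.086, 12.0)]
by_gg: dict[int, float] = {}
for shift, intensity in observed:
    n = count_gg(shift)
    by_gg[n] = by_gg.get(n, 0.0) + intensity
    print(f"+{shift:.3f} Da -> {n} x GG: {describe(n)}")
print(f"branched fraction: {branched_fraction(by_gg):.2f}")
```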

Compared with BUP and MDP, TDP may be an ideal platform for analyzing proteomes bearing Ub chains of different lengths, linkages, and architectures (Fig. 5C) [28]. The major challenge of TDP is that, as molecular weight increases, gas-phase dissociation produces overlapping fragments with low signal-to-noise (S/N) ratios [94]. Therefore, optimizing instrumental parameters is important for better characterization of Ub chains by TDP [95]. Lee et al. used a TDP strategy with electron-transfer/collision-induced dissociation (ETciD) activation to achieve extensive fragmentation, facilitating the characterization of chain topography and lysine linkage sites such as K48 and K63 linkages [96]. However, although TDP has been used to analyze the topologies of some well-defined ubiquitinated proteins, its application to complex, heterogeneous mixtures has not yet been realized. With advances in sample preparation and analytical approaches, TDP is expected to play a vital role in analyzing Ub chain topologies in complex systems.
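One reason TDP needs extensive fragmentation is that chains differing only in linkage have identical intact masses. The sketch below, assuming monoisotopic residue masses and unmodified human Ub subunits, computes intact masses of free polyUb chains; the linkage labels never change the result, so only fragment ions (as in the ETciD work above) can localize the linkages. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues found in Ub, and water.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "L": 113.08406, "I": 113.08406, "N": 114.04293, "D": 115.02694,
    "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}
WATER = 18.01056
UB = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
      "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")


def protein_mass(seq: str) -> float:
    """Monoisotopic mass of an unmodified linear protein."""
    return sum(RESIDUE[aa] for aa in seq) + WATER


def polyub_mass(linkages: list[str]) -> float:
    """Monoisotopic mass of a free polyUb chain with len(linkages) + 1 subunits.

    Each isopeptide bond releases one water whichever lysine is used, so the
    linkage labels do not enter the calculation: linkage isomers are isobaric.
    """
    n = len(linkages) + 1
    return n * protein_mass(UB) - (n - 1) * WATER


for chain in (["K48"], ["K63"], ["K48", "K48"], ["K48", "K63"]):
    print(f"{'/'.join(chain)}-linked, {len(chain) + 1} Ub: {polyub_mass(chain):.2f} Da")
```

Each added subunit contributes roughly 8.5 kDa and 75 more backbone bonds, so both the isobaric problem and spectral crowding grow with chain length.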

MS-based proteomics plays an important role in the identification of Ub chains and linkages. However, it should be noted that strategies for detecting branched chains by MS usually include an enrichment step to increase the recovery of substrates before digestion or direct analysis.

Prediction of protein ubiquitination via computational algorithms

Current methodologies for systematically analyzing protein ubiquitination fall into two categories: MS-based strategies and computational strategies. MS-based experimental strategies are often expensive, labor-intensive, and time-consuming. By comparison, predicting protein ubiquitination with machine-learning methods offers a simple and rapid solution and provides valuable information for further laboratory studies [97].

Over the past decade, researchers have successfully combined various feature extraction methods with machine-learning algorithms to predict protein ubiquitination sites. These computational approaches predict new ubiquitination sites by learning the sequence-context characteristics of experimentally verified sites. UbiPred, reported by Tung et al., was the first tool for predicting ubiquitination sites; it uses a support vector machine (SVM) with 31 informative physicochemical features selected from published amino acid indices [98, 99]. Using UbiPred, the authors identified 23 ubiquitination sites, which were further validated. In 2018, He et al. reported a multimodal deep architecture for identifying ubiquitination sites and evaluated it on the PLMD database, achieving 66.4% specificity, 66.7% sensitivity, and 66.43% accuracy [100]. Recently, Wang et al. reported a method named HUbiPred, which uses binary encoding and physicochemical properties of amino acids as training input and integrates two kinds of neural networks to build the model [101]. HUbiPred greatly improved prediction accuracy compared with previous predictors such as DeepUbi and hCKSAAP_UbSite. An increasing number of bioinformatics tools have been developed for predicting protein ubiquitination sites, and we refer readers to several excellent reviews discussing their methods, predictive algorithms, functionality, and properties [97, 102, 103].
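Sequence-based predictors typically begin by cutting a fixed-length window around each candidate lysine and converting it into a numeric vector. The sketch below shows one common choice, binary (one-hot) encoding, applied to Ub itself; the window half-width of 10 residues and the "X" padding are assumptions, and this is not the exact encoding used by UbiPred, HUbiPred, or any other cited tool. The resulting vectors would be the input to a classifier such as an SVM or a neural network.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; 'X' marks padding


def lysine_windows(seq: str, half_width: int = 10, pad: str = "X") -> list[tuple[int, str]]:
    """(1-based position, window) for every Lys, padded at the termini."""
    padded = pad * half_width + seq + pad * half_width
    return [(i + 1, padded[i:i + 2 * half_width + 1])
            for i, aa in enumerate(seq) if aa == "K"]


def one_hot(window: str) -> list[int]:
    """Binary encoding: 20 bits per position; padding positions are all zeros."""
    return [1 if aa == ref else 0 for aa in window for ref in AMINO_ACIDS]


UB = ("MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
      "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG")
windows = lysine_windows(UB)
print([pos for pos, _ in windows])                 # [6, 11, 27, 29, 33, 48, 63]
print(windows[0][1], len(one_hot(windows[0][1])))  # XXXXXMQIFVKTLTGKTITLE 420
```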

Because no single algorithm can accurately predict all protein ubiquitination sites, fusing multiple computational methods may be an effective way to predict them comprehensively. In addition, a large amount of protein ubiquitination data has accumulated rapidly over the last decade. As expected, improvements in the ubiquitination sequence logo are consistent with the increased sensitivity of recently developed predictors [103]. Because computational methods inevitably produce false-positive results, prediction results must be verified experimentally.
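A minimal form of the fusion idea is to average the per-site scores of several predictors and judge the result with the sensitivity, specificity, and accuracy metrics used to evaluate such tools. The three score lists, the labels, the equal weights, and the 0.5 threshold below are all hypothetical; practical ensembles often learn the weights or use stacking instead.

```python
from typing import Optional


def fuse(score_lists: list[list[float]], weights: Optional[list[float]] = None) -> list[float]:
    """Weighted mean of per-site probabilities from several predictors."""
    if weights is None:
        weights = [1.0 / len(score_lists)] * len(score_lists)
    return [sum(w * s for w, s in zip(weights, site)) for site in zip(*score_lists)]


def classify(scores: list[float], threshold: float = 0.5) -> list[int]:
    """Call a site positive when its score reaches the threshold."""
    return [int(s >= threshold) for s in scores]


def metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), accuracy = correct/all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical scores for four candidate lysines (labels: 1 = verified site).
labels = [1, 1, 0, 0]
predictor_a = [0.9, 0.4, 0.6, 0.2]
predictor_b = [0.7, 0.6, 0.3, 0.1]
predictor_c = [0.8, 0.7, 0.4, 0.3]

print("A alone:", metrics(labels, classify(predictor_a)))
print("fused:  ", metrics(labels, classify(fuse([predictor_a, predictor_b, predictor_c]))))
```

In this toy case averaging corrects both errors of predictor A; with real data, fusion reduces but does not remove false positives, so experimental validation is still required.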

Conclusions

Protein ubiquitination is one of the most difficult PTMs to identify because of the large size of Ub, its low abundance, and its dynamic regulation. To identify protein ubiquitination, diverse enrichment approaches have been developed at multiple levels, including the protein level and the peptide level (Table 1). Because MS has become the most powerful tool for precisely identifying PTMs, these enrichment approaches combined with advanced MS have enabled the identification of tens of thousands of ubiquitination sites on thousands of ubiquitinated proteins. Because MS-based experimental methods are often expensive, labor-intensive, and time-consuming, machine-learning tools trained on reported ubiquitination datasets have recently been developed to predict protein ubiquitination sites. However, these tools are built on different training datasets, prediction algorithms, functionalities, and features, which complicates their use and application. Despite various limitations, researchers can now analyze thousands of ubiquitination sites using experimental methods or prediction approaches. However, machine-learning prediction does not capture the complex nature of Ub chain topologies: because it predicts only ubiquitination sites rather than Ub chain topologies, experimental methods remain the only way to gain insight into Ub chain architecture.

Table 1 Comparison of different strategies for ubiquitination characterization: from ubiquitinated protein to ubiquitin chain architecture

The complexity of ubiquitination stems from the ability of Ub to form polymers of different lengths (number of Ub molecules), linkages, and overall architectures. The conformation of polyUb plays a vital role in regulating substrate function in diverse physiological and pathological processes. Although a series of techniques have been developed to detect branched Ub chains, none of them can reveal both the topology and the length of branched chains. A methodological breakthrough is urgently needed to provide systematic insight into the overall architecture of Ub chains, such as the exact arrangement of different linkages. Top-down proteomics (TDP) may be a promising tool for mapping the structure of Ub chains of different lengths, linkages, and architectures. However, improving the S/N ratio of ubiquitinated substrates remains a major challenge.

In summary, protein ubiquitination analysis has made significant progress over the past decade, but major challenges remain. Promising new methods and dedicated databases will help untangle the complexities of ubiquitination and facilitate the discovery of biomarkers associated with abnormal protein ubiquitination in a variety of diseases.