Background

Cancer immunotherapy has improved treatment options for cancer patients, especially those with metastatic spread [19]. Despite this success, the clinical outcome of immunotherapy is inconsistent both within and across cancer entities. Sufficient infiltration of effector T-cells into the tumour microenvironment and their activation against cancer cells can be seen as predictors of response to T-cell-based immunotherapies [15,16,1).

Next, PeptideProphet was used to validate detected peptides with the following parameters: accmass: TRUE, decoyprobs: TRUE, expectScore: TRUE, Glycosylation: FALSE, ICAT: FALSE, masswidth: 5, minimum probability after first pass of a peptide: 0.9, minimum number of NTT in a peptide: 2, among other parameters (Supplementary Table 1). Isobaric quantification was then performed with the following parameters: bestPSM: TRUE, level: 2, minProb: 0.7, ion purity cut-off: 0.5, tolerance: 20 ppm, among other parameters (Supplementary Table 1). Next, to retain only confident peptides, peptides were filtered using stringent False Discovery Rate (FDR) filtering with the following parameters: FDR < 0.01, peptideProbability: 0.7, among other parameters (Supplementary Table 1). Next, TMT-Integrator was used to create integrated reports with isobaric quantification across all samples with the following parameters: retention time normalization: False, minimum peptide probability on top of FDR filtering: 0.9, among other parameters (Supplementary Table 1).

Substitutant peptides were fetched from the reports of TMT-Integrator (version 3.1.0). Using an R script, peptides with a log2-transformed intensity score above 0 in a sample were considered positively detected in that sample. As described before [13], for intra-tumour type analysis a filter on the maximum number of samples was applied to retain peptides with more specific expression. W > F substitutants were exempt from this filter because their distribution was significant and specific wherever significance was observed. All tumour types were shown to be exclusive in the analysis of database 1 [13], while GBM, UCEC, and PDA did not show this exclusivity in the analysis of database 2. This filtering optimizes the signal for gene expression correlation analysis. Furthermore, the script was used to draw bar plots depicting the cumulative number of tryptophan substitutants detected in the scans.
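
The detection rule and the sample-count filter described above can be illustrated with a minimal Python sketch. The original analysis used an R script that is not reproduced here. The data layout (a mapping from peptide to per-sample log2 intensities), the function names, the `max_samples` threshold and the toy peptides are all assumptions made for illustration, and the exemption of W > F substitutants from the filter is not modelled.

```python
from collections import Counter

def detected_samples(log2_intensity):
    """Return the samples in which a peptide counts as positively detected,
    i.e. its log2-transformed intensity is above 0 (missing values are skipped)."""
    return {sample for sample, value in log2_intensity.items()
            if value is not None and value > 0}

def substitutants_per_sample(report, max_samples=None):
    """Count detected peptides per sample.

    report: {peptide: {sample: log2 intensity or None}} (hypothetical layout).
    max_samples: optional cap; peptides detected in more samples are dropped.
    """
    counts = Counter()
    for peptide, intensities in report.items():
        samples = detected_samples(intensities)
        if max_samples is not None and len(samples) > max_samples:
            continue  # detected too broadly to be considered specific
        counts.update(samples)  # each detecting sample gains one peptide
    return counts

# Toy report with made-up peptides and intensities.
report = {
    "PEPTIDEfK": {"S1": 1.2, "S2": -0.4, "S3": None},
    "AfGHK": {"S1": 0.3, "S2": 0.8, "S3": 2.1},
}
print(substitutants_per_sample(report))
print(substitutants_per_sample(report, max_samples=2))
```

Keeping detection and filtering in separate functions makes it easy to change the intensity threshold or the sample cap independently.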

Gene expression data were downloaded in GCT format from the PDC database. For each sample, the counts of W-substitutants were combined with the gene expression profiles. Perl scripts were designed to count the number of substitutants when a gene is lowly expressed (intensity < 0) or highly expressed (intensity > 0). P-values for the comparison were calculated using the Wilcoxon test.
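
As a rough illustration of this comparison, the sketch below splits per-sample substitutant counts by the sign of a gene's intensity and compares the two groups. The original Perl scripts are not shown in the text, so the data layout and function names are hypothetical. The sketch also assumes that the test is the two-sample Wilcoxon rank-sum (Mann-Whitney U) test and uses a normal approximation without tie or continuity correction, so its p-values are approximate.

```python
import math

def split_counts_by_expression(samples, gene):
    """Split substitutant counts into low (intensity < 0) and high (intensity > 0)
    expression groups; samples with an intensity of exactly 0 fall in neither group.

    samples: list of {"count": int, "expression": {gene: intensity}} (hypothetical layout).
    """
    low = [s["count"] for s in samples if s["expression"][gene] < 0]
    high = [s["count"] for s in samples if s["expression"][gene] > 0]
    return low, high

def rank_sum_p_value(x, y):
    """Approximate two-sided p-value of the Wilcoxon rank-sum test."""
    n1, n2 = len(x), len(y)
    # U counts the pairs in which the x value exceeds the y value; ties count half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))

# Toy data: made-up counts and IDO1 intensities.
samples = [
    {"count": 0, "expression": {"IDO1": -1.1}},
    {"count": 1, "expression": {"IDO1": -0.3}},
    {"count": 2, "expression": {"IDO1": -0.8}},
    {"count": 1, "expression": {"IDO1": -0.2}},
    {"count": 5, "expression": {"IDO1": 0.9}},
    {"count": 6, "expression": {"IDO1": 1.4}},
    {"count": 7, "expression": {"IDO1": 0.5}},
    {"count": 8, "expression": {"IDO1": 2.0}},
]
low, high = split_counts_by_expression(samples, "IDO1")
print(low, high, rank_sum_p_value(low, high))
```

For real analyses, an established implementation with exact p-values and tie handling is preferable to this approximation.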

Data sources

Eight independent human tumour types, namely Lung Squamous Cell Carcinoma (LSCC, [14]), Clear Cell Renal Cell Carcinoma (CCRCC, [5]), Glioblastoma (GBM, [20]), Head and Neck Squamous Cell Carcinoma (HNSCC, [11]), Hepatocellular Carcinoma (HCC, [9]), Ovarian Serous Cystadenocarcinoma (OVSCC, [12]), Pancreatic Ductal Carcinoma (PDA, [3]) and Uterine Corpus Endometrial Carcinoma (UCEC, [7]), were analysed to generate the ABPEPserver database (Table 1) [1]. CPTAC IDs of the datasets are provided in Table 1 and Supplementary Table 2.

Table 1 This table details the tumour types used to build ABPEPserver, with the information on identified substitutants and the number of tumours and adjacent normal tissues used to build the analysis

Design of database and web tools

A MySQL database was created to organize the output data efficiently, keep them easily retrievable and avoid the storage difficulties of multiple large files. For each cancer type, we stored substitutant counts, together with the individual substitutant peptides and gene cluster data identified from the proteomic analysis and the associated gene expression. Data for tumour and adjacent normal tissue samples are stored so that they can be distinguished for comparison.
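
The actual ABPEPserver schema is not given in the text. The sketch below only illustrates the idea of storing per-sample substitutant counts with a tissue label so that tumour and adjacent normal samples can be compared. It uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name, column names and all values are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE substitutant_count (
    cancer_type  TEXT NOT NULL,
    sample_id    TEXT NOT NULL,
    tissue       TEXT NOT NULL CHECK (tissue IN ('tumour', 'adjacent_normal')),
    substitution TEXT NOT NULL,
    n_peptides   INTEGER NOT NULL,
    PRIMARY KEY (cancer_type, sample_id, substitution)
);
""")

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO substitutant_count VALUES (?, ?, ?, ?, ?)",
    [
        ("UCEC", "T1", "tumour", "W>F", 12),
        ("UCEC", "N1", "adjacent_normal", "W>F", 1),
        ("LSCC", "T2", "tumour", "W>F", 7),
    ],
)

def totals_by_tissue(conn, cancer_type):
    """Sum substitutant counts per tissue label for one cancer type."""
    rows = conn.execute(
        "SELECT tissue, SUM(n_peptides) FROM substitutant_count "
        "WHERE cancer_type = ? GROUP BY tissue ORDER BY tissue",
        (cancer_type,),
    )
    return dict(rows.fetchall())

print(totals_by_tissue(conn, "UCEC"))
```

Storing the tissue label as a column rather than in separate tables keeps tumour versus adjacent normal comparisons to a single grouped query.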

Implementation

ABPEPserver is an R/Shiny application that allows users to interact with and visualize our data and analysis. We used the R package RMySQL 0.10.23 to connect our database to the application.

The home page of ABPEPserver provides users with background scientific information, the methods and the cancer types used in the study. Users are then offered two web tools, namely “Analyze” and “W > F Substitutants” (tryptophan to phenylalanine substitutants). In the “Analyze” module, the user can explore the enrichment of substitutants and their association with molecular gene expression signatures. The “W > F Substitutants” module, in turn, can be used to browse individual substitutants across multiple cancer types and to compare their expression in tumour versus adjacent normal tissue. This module also reports which database was used to detect each peptide. The corresponding files of this module are downloadable.

Utility and discussion

Description of utility

ABPEPserver is a web database that serves as a platform to identify and characterize the expression of substitutants, a recently identified class of aberrant proteins with immunotherapeutic potential, across various human cancer types. In addition, ABPEPserver allows analysis of the association between molecular gene expression signatures and the enrichment of substitutants. As an example, such analysis was shown to be essential for pinpointing the role of T-cell infiltration and direct causal proteins (such as IDO1) in the expression of W > F substitutant peptides (Fig. 2) [13]. This shows that the expression of substitutant peptides is regulated by IDO1 expression, which is induced via the T-cell infiltration pathway.

Furthermore, ABPEPserver displays enrichment differences of substitutant peptides between tumours and tumour-adjacent normal tissues, an analysis that can be used to uncover cancer-specific mechanisms. Lastly, downloadable text files from ABPEPserver can be used to identify common cancer-specific substitutants, which can provide the foundation for predicting neoepitopes for wide-ranging immunotherapeutic applications. Altogether, ABPEPserver provides detailed information on the identity of the substitutants, their cancer-specific expression and their immunotherapeutic potential.

User interface

Main page

The main page of ABPEPserver displays background information, detailed methods and the cancer types used to construct ABPEPserver, along with supplementary information on each cancer type (Supplementary Fig. 1A).

Analyze module

The “Analyze” module allows the user to select the cancer type and database of interest and to draw bar plots, scatter-contour plots and violin plots for analytical purposes (Supplementary Fig. 1B). Bar plots display the various types of W-substitutants and their enrichment relative to each other. Scatter-contour plots allow association analysis between all proteins and the number of substitutants. Violin plots allow analysis of the association between an individual protein and the number of substitutants. For example, IDO1 expression was associated with substitutant peptide expression using this analysis (Fig. 2). IDO1 is an enzyme that catabolizes tryptophan in the cell; hence, the association of IDO1 expression with substitutants is biologically meaningful. The “Analyze” module therefore provides critical biological insights into substitutant peptide expression and can be used to design cancer-specific immunotherapy studies.

W > F substitutants module

This module allows users to select the cancer type and database of interest and plot and download individual peptides for potential immunotherapeutic applications (Supplementary Fig. 1C). The utility of this module is demonstrated in a case study below.

Case study: identification of immunocompetent substitutants using ABPEPserver

Using the “W > F Substitutants” module for UCEC (Uterine Cancer) and the displayed scatter plot, two example peptides (fGHPAGK and SVLGCfK) were identified using the fully substitutant database (database 1) and found to be expressed in a highly tumour-specific manner (73 tumours and 0 tumour-adjacent normal tissues, and 45 tumours and 0 tumour-adjacent normal tissues, respectively) (Fig. 3A). This tumour-specific expression implies that it is feasible to target these antigens specifically in cancerous tissues without reactivity against normal cells, provided that these peptides can bind to HLA molecules and be presented on the cell surface. Indeed, NETMHC [1] based prediction shows that many combinations of these two peptides have potentially strong binding affinity to one or multiple HLA super-alleles (Fig. 3B-C). This analysis indicates that the discovered peptides may harbour strong immunotherapeutic potential and warrant experimental validation. Thus, ABPEPserver can be used to identify potential cancer-specific antigens for immunotherapeutic applications.
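
The selection step in this case study can be sketched as a simple filter over a downloaded occurrence table: keep peptides that occur in tumour samples but in no adjacent normal sample. The function name, the `min_tumours` parameter and the data layout are assumptions for illustration. The counts for fGHPAGK and SVLGCfK are taken from the text, while the third entry is a made-up contrast case.

```python
def tumour_specific(occurrences, min_tumours=1):
    """Return (peptide, tumour count) pairs for peptides absent from adjacent
    normal tissue, sorted by decreasing tumour count.

    occurrences: {peptide: (n_tumour_samples, n_adjacent_normal_samples)}.
    """
    hits = [
        (peptide, n_tumour)
        for peptide, (n_tumour, n_normal) in occurrences.items()
        if n_normal == 0 and n_tumour >= min_tumours
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)

occurrences = {
    "SVLGCfK": (45, 0),     # counts reported in the case study
    "fGHPAGK": (73, 0),     # counts reported in the case study
    "EXAMPLEfK": (10, 4),   # hypothetical peptide also seen in normal tissue
}
print(tumour_specific(occurrences))
```

Raising `min_tumours` restricts the result to peptides shared by many tumours, which may be useful when looking for broadly applicable targets.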

Fig. 3 Case study: utility of ABPEPserver. (A) A scatter plot from ABPEPserver (W > F Substitutants module) for UCEC (Uterine Cancer), depicting total occurrences in tumour tissue samples on the X-axis and total occurrences in adjacent normal tissue samples on the Y-axis. Two peptides with highly tumour-specific expression are identified. The database used for this scatter plot is the fully substitutant database (database 1). (B, C) Tables displaying peptide combinations for the peptides selected in (A) that have HLA-binding ability to one or more HLA super-alleles. Rank is displayed as inverse rank from NETMHC [23] analysis, where peptides with rank < 1 are identified as strong binders and peptides with 1 < rank < 2 are identified as weak binders
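
The rank thresholds stated in the Fig. 3 caption can be written as a small helper. This is a sketch, not part of ABPEPserver or NETMHC: the function name and labels are hypothetical, the input is assumed to be the rank itself rather than the displayed inverse rank, and the handling of a rank of exactly 1 is an assumption because the caption does not specify it.

```python
def classify_binder(rank):
    """Classify a peptide-HLA pair from its predicted rank."""
    if rank < 1:
        return "strong"
    if rank < 2:
        # The caption leaves rank == 1 undefined; counting it as weak is an
        # assumption of this sketch.
        return "weak"
    return "not classified as a binder"

for rank in (0.4, 1.5, 3.0):
    print(rank, classify_binder(rank))
```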

Future outlook

In the future, we plan to expand ABPEPserver to include other kinds of newly discovered aberrant peptides, such as ribosomal-frameshift-associated chimeric peptides. In response to tryptophan shortage, ribosomes have been observed to change frame at the tryptophan-associated “TGG” codon, leading to the synthesis of W-chimeras. Since W-chimeras have so far been observed only in cell-culture systems, it is important to analyze whether they are also expressed in the eight cancer types analyzed here. If such expression is observed, the next step is to examine its association with gene expression pathways.

Conclusions

We present ABPEPserver, a database of aberrant substitutant peptides in human cancer. Substitutant peptides result from tryptophan to phenylalanine misincorporation events and are generated in human cancer due to T-cell infiltration and subsequent tryptophan depletion [13]. The “Analyze” module of ABPEPserver allows exploration of the gene expression signatures of substitutant peptide expression in multiple human cancer types, organized as peptides detected in tumours and tumour-adjacent normal tissue. The “W > F Substitutants” module allows exploration of individual substitutant peptides in multiple patient samples and offers download features. The presented case study exemplifies that the substitutant peptides identified by ABPEPserver harbour immunotherapeutic potential. Hence, ABPEPserver is a valuable resource for the scientific community invested in anti-tumour immunotherapy development.