Introduction

In 1971, the human polyomavirus type 1, named BK virus (BKV), was isolated for the first time from a patient's urine sample [1]. BKV is a non-enveloped virus with an icosahedral capsid 40 nm in diameter that encloses a circular, double-stranded DNA genome of about 5 kb. It belongs to the polyomavirus genus of the Polyomaviridae family, which also includes JC virus (JCV) and simian virus 40 (SV40). BKV is divided into four subtypes (I–IV): subtype I is the most prevalent (80%), followed by subtype IV (15%), whereas subtypes II and III are rare. However, some reports show that subtype III is detected more frequently in HIV-infected patients [2]. BKV persistently infects nearly the entire human population; infection is acquired in early childhood after maternal antibodies wane, and IgG seroprevalence exceeds 90% in healthy children aged 5–9 years [3,4,5,6,7,8,9]. The exact route of BKV transmission has not yet been identified [10]. The main mode of transmission appears to be respiratory, as supported by the detection of BKV DNA in the respiratory tract and tonsils of children, suggesting that the oropharynx may be the initial site of infection [11,12,13]. Other proposed routes are fecal–oral, blood transfusion, and transplacental, based on the detection of BKV in sewage, leukocytes, and fetal tissue, respectively [12, 14,15,16,17,18]. Other studies indicate that BKV can also be transmitted via semen and organ transplantation [19, 20, 22]. Primary BKV infection is mainly asymptomatic; when symptoms do appear, the most common are fever and nonspecific respiratory infection or urinary tract disease [11, 20, 23, 24]. BKV enters the circulation through infected tonsils; peripheral blood mononuclear cells then become infected and disseminate the virus to several sites, most notably the kidneys, where it establishes latency.
The virus remains dormant in the uroepithelium and renal tubular epithelial cells, and intermittent reactivation with asymptomatic viruria may occur [12, 21, 25]. Moreover, clinical findings over the past decades reveal that BKV can remain latent in leukocytes, brain tissue, and lymph nodes, which suggests a notable role in various complications in these organs [24, 26]. Immunosuppressive therapy can reactivate the virus, which then begins to replicate and triggers a series of events that starts with tubular cell lysis and viruria. The virus then proliferates in the interstitium and crosses into the peritubular capillaries, causing viremia. Ultimately, tissue damage may result from direct viral cytolytic effects and secondary inflammatory responses [12, 21, 24, 25]; the outcome depends on the extent of damage, inflammation, and fibrosis. These complex interactions between BKV and the immune system produce the different clinicopathological manifestations of BKV infection. Recent studies show that BKV, like JCV, can be an etiologic agent of progressive multifocal leukoencephalopathy (PML) [27, 28]. There are also several reports of human neoplasia associated with BKV, so the oncogenic potential of this pathogen should not be neglected [7]. BKV is one of the most common opportunistic pathogens causing posttransplant viral infections and leads to allograft loss in renal transplant recipients. These posttransplant infections follow the potent immunosuppressive regimens that are routinely applied to transplant recipients to reduce acute allograft rejection [29, 30]. In the first posttransplant year, approximately 15% of renal transplant recipients develop BKV infection, which can cause BKV-associated nephropathy (BKVN), a condition with limited treatment options. If left untreated, BKVN can progress to allograft failure [21].
Antiviral agents show minimal or no efficacy in clearing BKV and cause a high incidence of side effects. Hence, reduction of immunosuppression is the mainstay of BKVN treatment [9, 21, 31,32,33], an approach that itself poses a risk to renal transplant recipients. An effective prophylactic strategy could therefore prevent most of the complications this pathogen causes in the human population. Vaccines have long played a substantial role in the prevention and eradication of viral diseases [34] and could offer a reliable prophylactic approach against BKV complications. In recent years, mRNA-based vaccine technology has emerged as a game-changing platform for therapeutic and prophylactic applications owing to its high potency, capacity for rapid development, low-cost manufacturing, and safe administration.

Immunoinformatics, also known as computational immunology, is a branch of bioinformatics at the interface between computational analysis and immunological data and resources, and it plays a vital role in vaccine design [35, 36]. By efficiently predicting suitable antigens, epitopes, carriers, and adjuvants, immunoinformatic approaches have drastically reduced the time and cost of vaccine development [37].

In this study, we aimed to design a novel multi-epitope mRNA vaccine consisting of cytotoxic T lymphocyte (CTL), helper T lymphocyte (HTL), and linear B lymphocyte (LBL) epitopes derived from antigenic BK virus proteins, combined with a highly immunogenic adjuvant and other essential elements, using immunoinformatic and computational strategies.

Methods

Workflow of the study

The workflow of the study is shown in Fig. 1. The analysis conducted in this study can be divided into 17 sections.

Fig. 1

Population coverage of the selected T lymphocyte epitopes. Globally, the epitopes cover 93.77% of the world's population. The highest and lowest regional coverage are in Europe (97.54%) and Central America (8.48%), respectively

Retrieval of protein sequences

The reference proteome of BKV (organism ID: 1891762) was retrieved from the UniProt Proteomes database (https://www.uniprot.org/proteomes/), and the reference sequences were downloaded and stored in FASTA format.
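The downloaded FASTA files can be read with a few lines of standard-library Python. The sketch below is illustrative only: the function name `parse_fasta` and the example record are hypothetical, and keying each record by the first token of its header line is an assumption (UniProt headers take the form `>db|accession|entry_name description`).

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-separated token after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            tokens = line[1:].split()
            header = tokens[0] if tokens else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence lines before any header are ignored
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical example record (not a real BKV entry)
example = """>sp|P00000|TEST_BKV Hypothetical protein
MKTAYIAK
QRQISFVK
"""
print(parse_fasta(example.splitlines()))
```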

Antigenic proteome selection

The antigenicity of the six BK virus proteins was evaluated using both the VaxiJen v2.0 (http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html) and ANTIGENpro (http://scratch.proteomics.ics.uci.edu/) [38] servers. VaxiJen employs an alignment-independent method based on the auto cross-covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Depending on the target organism dataset (bacterial, viral, or tumor proteins), its accuracy ranges from 70 to 89% [38, 39]; the default threshold (0.4) was used. ANTIGENpro achieved an accuracy of 76% on the combined dataset in cross-validation experiments [40].
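To make the ACC idea concrete, the sketch below turns a sequence of any length into a fixed-length vector by averaging products of residue descriptors at increasing lags, which is why no alignment is needed. It is a minimal illustration, not VaxiJen's code: VaxiJen uses the z-scale descriptors of the amino acids and a trained classifier on top of the ACC vector, neither of which is reproduced here, and the two-descriptor table in the example is hypothetical.

```python
from typing import Dict, List, Sequence


def acc_transform(seq: str, descriptors: Dict[str, Sequence[float]], max_lag: int) -> List[float]:
    """Auto cross-covariance (ACC) features of a protein sequence.

    For each lag L = 1..max_lag and each descriptor pair (j, k):
        ACC_jk(L) = sum_{i=1}^{n-L} d_j(res_i) * d_k(res_{i+L}) / (n - L)
    The output length is max_lag * m * m (m = descriptors per residue),
    independent of sequence length.
    """
    values = [descriptors[aa] for aa in seq]  # raises KeyError for unknown residues
    n = len(values)
    m = len(next(iter(descriptors.values())))
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(m):
            for k in range(m):
                if n - lag <= 0:
                    features.append(0.0)  # sequence shorter than the lag
                    continue
                total = sum(values[i][j] * values[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Hypothetical two-descriptor table (not real z-scale values)
toy_table = {"A": (1.0, 0.0), "K": (0.0, 1.0)}
print(acc_transform("AKAK", toy_table, max_lag=2))
```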

Prediction and assessment of CTL epitopes

Both the IEDB MHC-I binding tool (http://tools.iedb.org/mhci/) and the RANKPEP MHC-I binding prediction tool (http://imed.med.ucm.es/Tools/rankpep.html) were used to predict 9-mer CTL epitopes for 12 MHC class I supertype alleles: A*0101, A*0201, A*0301, A*1101, A*2301, A*2402, B*0702, B*0801, B*3501, B*4001, B*4402, and B*4403. In IEDB, the recommended method, NetMHCpan 4.1 EL, which is trained on eluted ligand data, was used [41], and epitopes were selected based on their scores (higher scores indicate stronger predicted binding). The RANKPEP server predicts peptide binders to MHC-I molecules from protein sequences or sequence alignments using position-specific scoring matrices (PSSMs); it also identifies MHC-I ligands whose C-terminal end is likely to result from proteasomal cleavage [42, 43]. In RANKPEP, epitopes were selected at a 2% threshold of top-scoring peptides. The overlapping regions (epitopes) predicted by both IEDB and RANKPEP were selected for further analysis.
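The two selection rules above (PSSM scoring with a top-2% cut-off, then keeping only peptides predicted by both servers) can be sketched as follows. The PSSM here is a toy three-position matrix with hypothetical values, not a RANKPEP matrix for any of the listed alleles, and the IEDB hit set is a placeholder; in practice the 9-mer matrices and scores come from the servers themselves.

```python
from typing import Dict, List, Set

PSSM = List[Dict[str, float]]  # one {residue: score} dict per peptide position


def pssm_score(peptide: str, pssm: PSSM) -> float:
    """Sum of position-specific scores; residues missing from a column score 0."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must equal PSSM length")
    return sum(column.get(aa, 0.0) for aa, column in zip(peptide, pssm))


def top_scoring(protein: str, pssm: PSSM, fraction: float = 0.02) -> Set[str]:
    """Score every overlapping k-mer and keep the top `fraction` (at least one)."""
    k = len(pssm)
    kmers = [protein[i:i + k] for i in range(len(protein) - k + 1)]
    ranked = sorted(kmers, key=lambda p: pssm_score(p, pssm), reverse=True)
    n_keep = max(1, round(len(ranked) * fraction))
    return set(ranked[:n_keep])


def consensus(iedb_hits: Set[str], rankpep_hits: Set[str]) -> Set[str]:
    """Keep only peptides predicted by both servers."""
    return iedb_hits & rankpep_hits


# Toy 3-position PSSM with hypothetical values (real RANKPEP matrices are 9 positions wide)
toy_pssm: PSSM = [{"A": 1.0}, {"K": 2.0}, {"L": 3.0}]
rankpep_like = top_scoring("AKLGGGAKV", toy_pssm)
print(consensus({"AKL", "GGG"}, rankpep_like))
```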

Immunogenicity of the epitopes of interest was evaluated by the class I immunogenicity tool of the IEDB Analysis Resource (http://tools.iedb.org/immunogenicity/) [44]. Only epitopes that showed a positive value for immunogenicity were kept for the next stage of evaluation.

Afterward, the antigenicity of the selected CTL epitopes was evaluated by the VaxiJen v2.0 server.

The selected epitopes were evaluated for toxicity with the ToxinPred server (https://webs.iiitd.edu.in/raghava/toxinpred/algo.php) using the SVM (Swiss-Prot)-based method with default parameters [45]. Their allergenicity was also predicted with two online tools, AllerTOP 2.0 (https://www.ddg-pharmfac.net/AllerTOP/data.html) [46] and AllergenFP v1.0 (https://ddg-pharmfac.net/AllergenFP/), giving priority to AllerTOP 2.0 results because it has a higher accuracy (88.7%) than AllergenFP (87.9%) [47, 48].
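The CTL filtering cascade described in this subsection (immunogenicity, antigenicity, toxicity, allergenicity) amounts to applying successive filters and recording how many epitopes survive each stage. The sketch below assumes the server outputs have already been collected into one record per peptide; the field names are hypothetical, the VaxiJen cut-off is taken as ≥ 0.4, and an epitope is treated as non-allergenic only when both AllerTOP 2.0 and AllergenFP agree, which is one possible reading of how the two servers were combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CtlPrediction:
    peptide: str
    immunogenicity: float     # IEDB class I immunogenicity score
    vaxijen: float            # VaxiJen v2.0 antigenicity score
    toxic: bool               # ToxinPred (SVM) call
    allergen_allertop: bool   # AllerTOP 2.0 call
    allergen_fp: bool         # AllergenFP v1.0 call


def ctl_funnel(preds: List[CtlPrediction], vaxijen_cutoff: float = 0.4
               ) -> Tuple[List[CtlPrediction], List[Tuple[str, int]]]:
    """Apply the filters in order; return survivors and the count after each stage."""
    stages: List[Tuple[str, Callable[[CtlPrediction], bool]]] = [
        ("immunogenic", lambda p: p.immunogenicity > 0),
        ("antigenic", lambda p: p.vaxijen >= vaxijen_cutoff),
        ("non-toxic", lambda p: not p.toxic),
        ("non-allergenic", lambda p: not (p.allergen_allertop or p.allergen_fp)),
    ]
    kept = list(preds)
    counts = [("input", len(kept))]
    for name, passes in stages:
        kept = [p for p in kept if passes(p)]
        counts.append((name, len(kept)))
    return kept, counts


# Hypothetical predictions for three made-up 9-mers
demo = [
    CtlPrediction("AAAAAAAAA", 0.12, 0.85, False, False, False),
    CtlPrediction("CCCCCCCCC", -0.05, 0.90, False, False, False),
    CtlPrediction("DDDDDDDDD", 0.20, 0.31, False, False, False),
]
survivors, counts = ctl_funnel(demo)
print([p.peptide for p in survivors], counts)
```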

Prediction and assessment of HTL epitopes

In this study, both the IEDB MHC-II binding tool (http://tools.iedb.org/mhcii/) and the RANKPEP MHC-II binding prediction tool (http://imed.med.ucm.es/Tools/rankpep.html) were used to predict 15-mer HTL epitopes for eight MHC class II supertype alleles: DRB1*0101, DRB1*0301, DRB1*0401, DRB1*0701, DRB1*1101, DRB1*1302, DRB1*1501, and DRB5*0101. The IEDB-recommended 2.22 combinatorial approach, which combines the Consensus, NN-align, SMM-align, CombLib, Sturniolo, and NetMHCIIpan methods, was chosen [49,50,51,52,53], with human as the host species. Epitopes were selected based on a low adjusted rank, which indicates good binders. As for MHC-I, the RANKPEP server predicts peptide binders to MHC-II molecules from protein sequences or sequence alignments using position-specific scoring matrices (PSSMs).

The overlapping regions predicted by both IEDB and RANKPEP were selected for further analysis.

The antigenicity and toxicity of the selected epitopes were evaluated with VaxiJen v2.0 and ToxinPred, respectively. The antigenic, non-toxic epitopes were then checked for allergenicity with AllerTOP 2.0 and AllergenFP v1.0, giving priority to AllerTOP 2.0 results. The remaining epitopes were assessed for their ability to induce three cytokines: (1) interferon-γ (IFN-γ), using IFNepitope (http://crdd.osdd.net/raghava/ifnepitope/scan.php) [54] with the motif and support vector machine (SVM) hybrid approach; (2) interleukin-4 (IL-4), using IL4pred (https://webs.iiitd.edu.in/raghava/il4pred/design.php) [55] with the motif and SVM hybrid approach and an SVM threshold of 0.2; and (3) interleukin-10 (IL-10), using IL10pred (https://webs.iiitd.edu.in/raghava/il10pred/predict3.php) [56] with the SVM prediction model and an SVM threshold of −0.3. Only epitopes that were highly antigenic, non-allergenic, non-toxic, and predicted to induce all three cytokines were selected for the final vaccine construct.
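The cytokine criterion can be expressed as a single predicate. In this sketch the IFNepitope result is taken as a positive/negative call, while IL4pred and IL10pred are reduced to their SVM scores compared with the thresholds stated above (0.2 and −0.3); treating a score equal to the threshold as an inducer, and ignoring the motif component of the IL4pred hybrid method, are simplifying assumptions. The class and function names are hypothetical.

```python
from dataclasses import dataclass

IL4_SVM_THRESHOLD = 0.2    # IL4pred SVM threshold used in this study
IL10_SVM_THRESHOLD = -0.3  # IL10pred SVM threshold used in this study


@dataclass
class CytokineCalls:
    peptide: str
    ifn_gamma_inducer: bool  # IFNepitope (motif + SVM hybrid) call
    il4_svm: float           # IL4pred SVM score
    il10_svm: float          # IL10pred SVM score


def induces_all_three(c: CytokineCalls) -> bool:
    """True only if the HTL epitope is predicted to induce IFN-gamma, IL-4, and IL-10."""
    return (c.ifn_gamma_inducer
            and c.il4_svm >= IL4_SVM_THRESHOLD
            and c.il10_svm >= IL10_SVM_THRESHOLD)


# Hypothetical scores for a made-up 15-mer
print(induces_all_three(CytokineCalls("GGGGGGGGGGGGGGG", True, 0.35, -0.10)))
```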

Prediction and assessment of LBL epitopes

In this study, the BepiPred Linear Epitope Prediction 2.0 tool (http://tools.iedb.org/bcell/), which uses a random forest algorithm, was used to predict LBL epitopes with the default threshold of 0.5 [57]. The antigenicity of the predicted LBL epitopes was calculated with the VaxiJen v2.0 [38] and ANTIGENpro servers, and epitopes were selected using a VaxiJen threshold of 0.4. The antigenic epitopes were then checked for toxicity with ToxinPred [45] and for allergenicity with AllerTOP 2.0 [46, 47] and AllergenFP v1.0 [48], giving priority to AllerTOP 2.0 results. Finally, all selected epitopes were evaluated with the IgPred server (https://webs.iiitd.edu.in/raghava/igpred/help.html) for IgG, IgM, and IgA inducibility.
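BepiPred 2.0 reports a score for each residue, and linear epitopes correspond to runs of consecutive residues at or above the threshold. The sketch below shows that post-processing step under stated assumptions: the minimum segment length of 6 is a hypothetical choice (it matches the shortest epitope reported in the Results), the scores in the example are made up, and the random forest model itself is not reproduced.

```python
from typing import List, Sequence, Tuple


def linear_epitopes(seq: str, scores: Sequence[float], threshold: float = 0.5,
                    min_length: int = 6) -> List[Tuple[int, str]]:
    """Return (1-based start, peptide) for runs of residues with score >= threshold."""
    if len(seq) != len(scores):
        raise ValueError("one score per residue is required")
    segments: List[Tuple[int, str]] = []
    start = None
    # A -inf sentinel closes a run that reaches the end of the sequence
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, seq[start:i]))
            start = None
    return segments


# Made-up per-residue scores for a hypothetical 14-residue sequence
demo_seq = "MKTAYIAKQRQISF"
demo_scores = [0.1] + [0.6] * 7 + [0.1] * 6
print(linear_epitopes(demo_seq, demo_scores))
```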

Epitope conservancy analysis

To evaluate the conservancy of the regions selected as epitopes, 150 sequences of each antigenic BKV protein, representing the four distinct BKV genotypes, were randomly selected. These sequences were downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/) and analyzed with the IEDB conservancy tool (http://tools.iedb.org/conservancy/) [58].
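Epitope conservancy is essentially the percentage of sequences that contain the epitope at or above a chosen identity level. The simplified sketch below uses ungapped sliding windows and an exact-match default (identity 1.0); it illustrates the idea rather than reproducing the IEDB tool, and the example sequences are hypothetical.

```python
from typing import Sequence


def best_identity(epitope: str, protein: str) -> float:
    """Highest fraction of identical positions between the epitope and any equal-length window (no gaps)."""
    k = len(epitope)
    best = 0.0
    for i in range(len(protein) - k + 1):
        window = protein[i:i + k]
        matches = sum(a == b for a, b in zip(epitope, window))
        best = max(best, matches / k)
    return best


def conservancy(epitope: str, proteins: Sequence[str], identity: float = 1.0) -> float:
    """Percentage of protein sequences containing the epitope at >= `identity`."""
    if not proteins:
        raise ValueError("at least one protein sequence is required")
    hits = sum(best_identity(epitope, p) >= identity for p in proteins)
    return 100.0 * hits / len(proteins)


# Hypothetical variant sequences
variants = ["MKTAYIAK", "MKTAWIAK", "GKTAYIAQ"]
print(conservancy("KTAYIA", variants), conservancy("KTAYIA", variants, identity=0.8))
```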

Molecular docking studies between T lymphocyte epitopes and MHC alleles

The binding affinity of the CTL and HTL epitopes for their corresponding MHC alleles was evaluated by molecular docking. The MHC allele structures were downloaded from the RCSB PDB (https://www.rcsb.org/) [59, 60] and processed in PyMOL to remove unnecessary ligands. In total, the 3D crystal structures of 12 class I and class II MHC supertype alleles were downloaded: HLA-A*0101 (PDB ID: 4NQV), HLA-A*0201 (PDB ID: 4UQ3), HLA-A*0301 (PDB ID: 3RL2), HLA-A*1101 (PDB ID: 1X7Q), HLA-A*2402 (PDB ID: 5HGA), HLA-B*0702 (PDB ID: 5EO1), HLA-B*0801 (PDB ID: 3SPV), HLA-B*3501 (PDB ID: 3LKN), HLA-DRB1*0101 (PDB ID: 4AH2), HLA-DRB1*0401 (PDB ID: 5LAX), HLA-DRB1*1101 (PDB ID: 6CPL), and HLA-DRB1*1501 (PDB ID: 5V4M). The 3D structures of the epitopes were then predicted with the PEP-FOLD 3.5 server (https://bioserv.rpbs.univ-paris-diderot.fr/services/PEP-FOLD3/) [61,62,63]. Docking between the epitopes and the MHC molecules was performed with the ClusPro 2.0 server (https://cluspro.bu.edu/) [64]. This web-based tool performs rigid-body docking by sampling billions of conformations, clusters the lowest-energy poses by pairwise root-mean-square deviation (RMSD), and refines them by energy minimization; the docked protein–protein complexes are scored with an energy function that accounts for shape complementarity. For each epitope–MHC pair, the best cluster (complex) with the lowest docking energy score was selected for vaccine construction. Finally, PyMOL was used to visualize the epitope–MHC interactions.
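To illustrate the clustering step, the sketch below computes the RMSD between docked poses (atoms listed in the same order, in the common frame of the fixed receptor, so no superposition is applied) and then greedily takes the pose with the most neighbors within a radius as the next cluster center. This is a simplified version of the idea, not ClusPro's implementation; the radius and the toy one-atom poses are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two poses whose atoms are listed in the same order (no superposition)."""
    if len(a) != len(b) or not a:
        raise ValueError("poses must contain the same, non-zero number of atoms")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(total / len(a))


def greedy_clusters(poses: List[Sequence[Coord]], radius: float) -> List[List[int]]:
    """Repeatedly pick the pose with the most unassigned neighbours within `radius` as a cluster centre."""
    unassigned = set(range(len(poses)))
    clusters: List[List[int]] = []
    while unassigned:
        neighbours = {i: [j for j in unassigned if rmsd(poses[i], poses[j]) <= radius]
                      for i in unassigned}
        centre = max(sorted(neighbours), key=lambda i: len(neighbours[i]))
        members = neighbours[centre]
        clusters.append([centre] + sorted(j for j in members if j != centre))
        unassigned -= set(members)
    return clusters


# Four hypothetical one-atom "poses" (coordinates in Å)
poses = [[(0.0, 0.0, 0.0)], [(0.5, 0.0, 0.0)], [(0.8, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
print(greedy_clusters(poses, radius=1.0))
```

Clusters come out largest first, mirroring the idea of ranking docking solutions by cluster size.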

Prediction of population coverage

In computational vaccine design, population coverage analysis estimates the proportion of individuals worldwide likely to respond to the designed vaccine, based on the frequencies of the human leukocyte antigen (HLA) alleles that bind the selected epitopes. The combined coverage of the selected T lymphocyte epitopes, with their corresponding MHC class I and II alleles as input, was analyzed with the IEDB population coverage tool (http://tools.iedb.org/population/) [65].
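The core calculation behind population coverage can be sketched under two simplifying assumptions: Hardy–Weinberg equilibrium within each HLA locus and independence between loci. If f is the summed frequency of the alleles at a locus that present at least one vaccine epitope, an individual lacks such an allele at that locus with probability (1 − f)²; coverage is one minus the product of these probabilities over loci. The IEDB tool is more detailed (for example, it also reports the average number of epitope hits per individual), and the allele frequencies below are hypothetical.

```python
from typing import Dict, Iterable


def population_coverage(allele_freqs: Dict[str, Dict[str, float]],
                        presenting_alleles: Iterable[str]) -> float:
    """Fraction of individuals with at least one allele that presents a vaccine epitope.

    allele_freqs maps each HLA locus to {allele: frequency}. Assumes Hardy-Weinberg
    equilibrium within loci and independence between loci.
    """
    presenting = set(presenting_alleles)
    p_uncovered = 1.0
    for alleles in allele_freqs.values():
        f = sum(freq for allele, freq in alleles.items() if allele in presenting)
        p_uncovered *= (1.0 - f) ** 2  # neither chromosome carries a presenting allele
    return 1.0 - p_uncovered


# Hypothetical allele frequencies for one population
freqs = {"HLA-A": {"A*02:01": 0.30, "A*01:01": 0.20}, "HLA-B": {"B*07:02": 0.10}}
print(population_coverage(freqs, ["A*02:01", "B*07:02"]))
```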

Designing the mRNA vaccine construct

The ORF of the immunogenic mRNA vaccine construct contains five main elements: (1) the Kozak sequence, (2) epitopes, (3) adjuvant, (4) linkers (spacers), and (5) a stop codon. The start codon is part of the Kozak sequence [66, 67], and the sequence surrounding the stop codon may also be optimized [68]. The epitopes included in the construct were antigenic, non-allergenic, non-toxic, and (for HTLs only) cytokine-inducing. In addition to appropriate antigenic epitopes, a potent adjuvant is critical to boost the immune response [69]. In this study, the 50S ribosomal protein L7/L12 (UniProt ID: P9WHE3) [70] was added to the construct as the adjuvant. The adjuvant and the LBL epitopes were linked by an EAAAK linker, and the LBL epitopes were joined to one another with GPGPG linkers. GPGPG linkers were also used to connect the LBL, HTL, and CTL epitope blocks, while the CTL epitopes were joined to one another with AAY linkers.

The tissue plasminogen activator (tPA) secretory signal sequence (UniProt ID: P00750) and the MHC I-targeting domain (MITD) (UniProt ID: Q8WV92) were added at the 5′ and 3′ ends of the ORF, respectively. The signal peptide was linked to the adjuvant by an AAY linker.
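The linker scheme above can be written as a small assembly function. The element order follows the construct described in this section (signal peptide, AAY, adjuvant, EAAAK, GPGPG-joined LBL epitopes, GPGPG, HTL epitope, GPGPG, AAY-joined CTL epitopes, AAY, MITD); the function name is hypothetical, joining several HTL epitopes with GPGPG is an assumption (the construct contains only one), and the example uses placeholder strings rather than the actual sequences.

```python
from typing import Sequence


def assemble_vaccine_peptide(signal: str, adjuvant: str, lbl: Sequence[str],
                             htl: Sequence[str], ctl: Sequence[str], mitd: str) -> str:
    """Concatenate construct elements with the linkers described in the text."""
    peptide = signal + "AAY" + adjuvant + "EAAAK"   # tPA signal, AAY, adjuvant, EAAAK
    peptide += "GPGPG".join(lbl)                     # LBL epitopes joined by GPGPG
    peptide += "GPGPG" + "GPGPG".join(htl)           # GPGPG before the HTL block
    peptide += "GPGPG" + "AAY".join(ctl)             # GPGPG, then CTLs joined by AAY
    peptide += "AAY" + mitd                          # AAY before the MITD sequence
    return peptide


# Placeholder element strings, not the real sequences
print(assemble_vaccine_peptide("SIG", "ADJ", ["L1", "L2"], ["H1"], ["C1", "C2"], "MITD"))
```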

The 5′ end of the vaccine mRNA was capped with 7-methyl(3′-O-methyl)GpppG (anti-reverse cap analog, ARCA). The 5′ UTR of human β-globin (NCBI ID: NM_000518.5) and the 3′ UTR of the rabbit β-globin gene (GenBank ID: V00882.1) flanked the 5′ and 3′ ends of the ORF, respectively, and a 120-nucleotide poly(A) tail was added at the 3′ end.
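A similar sketch covers the transcript layout: 5′ UTR, Kozak context, ORF, 3′ UTR, and a 120-nt poly(A) tail. The cap is added chemically during in vitro transcription and is not part of the sequence string. The Kozak context `GCCACC` placed before the AUG is the commonly used consensus and is an assumption here, as are the placeholder UTR sequences in the example; the β-globin UTR sequences themselves should be taken from the cited NCBI/GenBank records.

```python
KOZAK = "GCCACC"  # assumed Kozak context placed immediately before the AUG start codon
STOP_CODONS = {"UAA", "UAG", "UGA"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def assemble_mrna(utr5: str, orf: str, utr3: str, polya_length: int = 120) -> str:
    """5' UTR + Kozak + ORF + 3' UTR + poly(A); the 5' cap is not part of the sequence."""
    orf = to_rna(orf)
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    if not orf.startswith("AUG") or orf[-3:] not in STOP_CODONS:
        raise ValueError("ORF must start with AUG and end with a stop codon")
    return to_rna(utr5) + KOZAK + orf + to_rna(utr3) + "A" * polya_length


# Placeholder UTRs and a minimal ORF (Met-Ala-stop), not the real construct
mrna = assemble_mrna("GGG", "ATGGCCTAA", "CCC")
print(len(mrna))
```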

Prediction of antigenicity, allergenicity, toxicity, and physicochemical properties of the final vaccine construct

The translated peptide sequence of the mRNA vaccine was used to predict the properties of the designed vaccine. Only the adjuvant and the selected epitopes, together with their linkers, were considered; the tPA and MITD sequences were excluded because they are expected to be cleaved upon entry into the secretory pathway and the MHC-I pathway, respectively. The antigenicity of the vaccine was predicted with the VaxiJen v2.0 [38] and ANTIGENpro [40] servers, its allergenicity with the AllerTOP 2.0 [46] and AllergenFP v1.0 [48] servers, and its toxicity with the ToxinPred server [45]. The ProtParam server (https://web.expasy.org/protparam/) [71] was used to predict physicochemical properties of the construct, including amino acid composition, molecular weight (Mw), theoretical isoelectric point (pI), instability index (II), aliphatic index (AI), and grand average of hydropathicity (GRAVY).
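Two of the ProtParam indices have simple closed forms and can be checked by hand: GRAVY is the mean Kyte–Doolittle hydropathy over all residues, and the aliphatic index is AI = X(Ala) + 2.9·X(Val) + 3.9·(X(Ile) + X(Leu)), where X are mole percentages. The sketch below implements these plus the charged-residue counts; the instability index, pI, and molecular weight need larger lookup tables and are left to ProtParam.

```python
# Kyte-Doolittle hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """AI = X(Ala) + 2.9*X(Val) + 3.9*(X(Ile) + X(Leu)), with X in mole percent."""
    n = len(seq)

    def pct(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return pct("A") + 2.9 * pct("V") + 3.9 * (pct("I") + pct("L"))


def charged_counts(seq: str) -> tuple:
    """(negatively charged Asp+Glu, positively charged Arg+Lys)."""
    return seq.count("D") + seq.count("E"), seq.count("R") + seq.count("K")


print(gravy("AV"), aliphatic_index("AVIL"), charged_counts("DEKRRA"))
```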

Analysis of post‐translational modification

Post-translational modifications (PTMs) of the designed vaccine construct, including glycosylation, phosphorylation, and acetylation, were predicted with the NetNGlyc 1.0, NetPhos 3.0, and NetAcet 1.0 servers, respectively (http://www.cbs.dtu.dk/services/). In addition, lipid PTMs, namely N-terminal glycine myristoylation and GPI anchoring, were evaluated with the MyrPS/NMT (https://mendel.imp.ac.at/myristate/) [72] and big-PI/GPI animals (https://mendel.imp.ac.at/gpi/gpi_server.html) [73] servers, respectively.
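N-linked glycosylation requires the N-X-S/T sequon (X any residue except proline). NetNGlyc applies neural networks on top of this motif, so the regular-expression scan below is only a quick pre-check that lists candidate sites; a construct with no sequons cannot be N-glycosylated at all. The function name is hypothetical.

```python
import re
from typing import List

# N, then any residue except P, then S or T; the lookahead allows overlapping sequons
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sequons(protein: str) -> List[int]:
    """1-based positions of Asn residues that start an N-X-S/T sequon (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(n_glyc_sequons("MNASNPTNNST"))
```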

Tertiary structure modelling and validation

Several 3D models of the vaccine were generated with the trRosetta server (https://yanglab.nankai.edu.cn/trRosetta/). This server uses a deep residual neural network to predict inter-residue distance and orientation distributions from the input sequence, converts these predictions into smooth restraints, and builds 3D models by direct energy minimization [74,75,76]. The best model was chosen after validation with the SAVES server (https://saves.mbi.ucla.edu/), which includes ERRAT [77], Verify-3D [78, 79], and PROCHECK (Ramachandran plot analysis) [80], as well as with ProSA-web (https://prosa.services.came.sbg.ac.at/prosa.php) [81, 82]. ERRAT examines nonbonded atomic interactions [77], Verify-3D assesses the compatibility of the 3D model with its own amino acid sequence [83], and PROCHECK evaluates the stereochemical quality of the model, including the backbone dihedral angles shown in the Ramachandran plot [80]. ProSA-web provides an easy-to-use interface to the ProSA program: it calculates an overall quality score for an input structure, displays it in the context of all known protein structures, and highlights problematic parts of the structure in a 3D molecule viewer. If the score falls outside the range characteristic of native proteins, the structure likely contains errors [81].
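The Ramachandran plot is built from the backbone dihedral angles φ (C(i−1), N, Cα, C) and ψ (N, Cα, C, N(i+1)). The sketch below computes a signed dihedral angle from four atomic positions with the standard atan2 formula; reading coordinates from a PDB file and classifying angles into Ramachandran regions, which PROCHECK does, are not shown.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, in the range [-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi of residue i = dihedral(C[i-1], N[i], CA[i], C[i]); psi = dihedral(N[i], CA[i], C[i], N[i+1])
print(dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))  # trans arrangement
```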

Molecular docking analysis of the vaccine peptide

The ClusPro 2.0 server (https://cluspro.bu.edu/) was used to evaluate the interactions between the vaccine peptide and toll-like receptor 4 (TLR4), TLR9, C-type lectin 2, and the MHC class I and class II receptors [64]. ClusPro 2.0 predicts protein–protein interactions using the PIPER docking algorithm and generates four sets of models with different scoring schemes (balanced, electrostatic-favored, hydrophobic-favored, and van der Waals + electrostatics). The structures of TLR4 (PDB ID: 4G8A), TLR9 (UniProt ID: Q9NR96), C-type lectin 2 (PDB ID: 3WBP), the MHC class I receptor (PDB ID: 1I1Y), and the MHC class II receptor (PDB ID: 1KG0) were used as receptor molecules, with the PDB entries retrieved from the RCSB PDB (https://www.rcsb.org/) [59, 60]; the vaccine peptide was used as the ligand molecule.

The PROtein binDIng enerGY prediction (PRODIGY) tool of the HADDOCK server (https://wenmr.science.uu.nl/prodigy/) [84] was used to compute the binding energy of the receptor–ligand complexes from their 3D structures. Based on intermolecular contacts and properties of the non-interface surface, PRODIGY predicts the binding free energy (ΔG) and the dissociation constant (Kd). The FireDock server (https://bioinfo3d.cs.tau.ac.il/FireDock/) [85, 86] was used to refine and re-score the docked complexes, ranking them by binding score and global binding energy. The residues at the vaccine-TLR4 interface were mapped with the PDBsum server (https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/Generate.html) [87].
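ΔG and Kd are linked by the standard thermodynamic relation ΔG = RT ln Kd (Kd in mol/L relative to a 1 M standard state). The sketch below converts between the two at a chosen temperature; it assumes ΔG in kcal/mol and is meant for sanity-checking reported values, not as a reimplementation of PRODIGY's contact-based predictor.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol: float, temp_celsius: float = 37.0) -> float:
    """Dissociation constant (M) from binding free energy, using dG = RT ln Kd."""
    temp_k = temp_celsius + 273.15
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


def dg_from_kd(kd_molar: float, temp_celsius: float = 37.0) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant (M)."""
    temp_k = temp_celsius + 273.15
    return R_KCAL * temp_k * math.log(kd_molar)


kd = kd_from_dg(-18.7)  # a complex with dG = -18.7 kcal/mol at 37 °C
print(f"{kd:.2e} M")
```

Each additional −1.4 kcal/mol (roughly RT·ln 10 at 37 °C) lowers Kd by about one order of magnitude, which is a quick way to compare the complexes.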

Molecular dynamic simulation of the vaccine-TLR4 complex

The iMODS server (http://imods.chaconlab.org/) [88] was used for molecular dynamics (MD) studies of the vaccine-TLR4 complex. Starting from the 3D structure of the complex, iMODS performs normal mode analysis (NMA) and reports mobility (NMA B-factors), deformability, eigenvalues, the covariance matrix, and the elastic network (linking) matrix. The vaccine-TLR4 complex with the lowest binding energy was used for this analysis.
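iMODS performs normal mode analysis in internal (torsional) coordinates. As a simpler stand-in that shows the same ideas (a spring network, eigenvalues, and per-atom mobility), the sketch below implements the Cartesian anisotropic network model (ANM) with NumPy. The cutoff and spring constant are hypothetical defaults; the six lowest eigenvalues correspond to rigid-body motions and are skipped, and the first non-zero eigenvalue reflects the stiffness of the structure (lower values mean easier deformation).

```python
import numpy as np


def anm_hessian(coords: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """3N x 3N Hessian of an anisotropic network model (springs between atoms closer than cutoff)."""
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue  # no spring between distant atoms
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal blocks are minus the sum of the off-diagonal blocks in their row
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def normal_modes(coords: np.ndarray, cutoff: float = 15.0):
    """Eigenvalues (ascending) and eigenvectors (columns) of the ANM Hessian."""
    return np.linalg.eigh(anm_hessian(coords, cutoff))


def mobility(eigvals: np.ndarray, eigvecs: np.ndarray, n_rigid: int = 6) -> np.ndarray:
    """Relative per-atom fluctuation: sum over internal modes of |v_i|^2 / lambda (proportional to NMA B-factors)."""
    n_atoms = eigvecs.shape[0] // 3
    fluct = np.zeros(n_atoms)
    for k in range(n_rigid, len(eigvals)):
        v = eigvecs[:, k].reshape(n_atoms, 3)
        fluct += (v ** 2).sum(axis=1) / eigvals[k]
    return fluct


# Toy example: 10 random "C-alpha" positions (hypothetical coordinates, in Å)
rng = np.random.default_rng(0)
coords = rng.normal(scale=5.0, size=(10, 3))
vals, vecs = normal_modes(coords, cutoff=100.0)
print(vals[6], mobility(vals, vecs))
```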

Codon optimization and secondary structure prediction of the mRNA vaccine

Codon optimization of the final construct is a crucial step for efficient translation of the mRNA vaccine in host cells. The codons of the final vaccine were optimized for expression in human cells with the GenSmart Codon Optimization Tool (https://www.genscript.com/tools/gensmart-codon-optimization).

Next, the quality of the optimized codons was assessed with the rare codon analysis tool (https://www.genscript.com/tools/rare-codon-analysis), which calculates three parameters that strongly influence protein expression in the host: (1) the codon adaptation index (CAI), which reflects the translation efficiency of the mRNA; (2) the codon frequency distribution (CFD), which reveals the presence of tandem rare codons; and (3) the GC content.
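CAI is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w (its usage divided by the usage of the most frequent synonymous codon in a reference set), so a sequence built only from the preferred codons scores 1.0. The sketch below uses a deliberately tiny, approximate human usage table for three amino acids (illustrative values, not the GenScript reference set) and skips codons not in the table; GC content is computed alongside.

```python
import math
from typing import Dict

# Approximate human codon usage fractions for three amino acids only
# (illustrative subset; a real analysis needs the full reference table).
CODON_USAGE: Dict[str, Dict[str, float]] = {
    "A": {"GCC": 0.40, "GCT": 0.26, "GCA": 0.23, "GCG": 0.11},
    "K": {"AAG": 0.57, "AAA": 0.43},
    "L": {"CTG": 0.40, "CTC": 0.20, "CTT": 0.13, "TTG": 0.13, "CTA": 0.07, "TTA": 0.07},
}

# Relative adaptiveness: usage divided by that of the most used synonymous codon
WEIGHTS: Dict[str, float] = {
    codon: freq / max(table.values())
    for table in CODON_USAGE.values()
    for codon, freq in table.items()
}


def cai(cds: str) -> float:
    """Codon adaptation index: geometric mean of w over codons present in WEIGHTS."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    logs = [math.log(WEIGHTS[c]) for c in codons if c in WEIGHTS]
    if not logs:
        raise ValueError("no codons covered by the usage table")
    return math.exp(sum(logs) / len(logs))


def gc_content(seq: str) -> float:
    """Percentage of G and C nucleotides."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


print(cai("GCCAAGCTG"), cai("GCGAAA"), gc_content("GCCAAGCTG"))
```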

The optimized DNA sequence generated by the GenSmart codon optimization tool was then converted to an RNA sequence with the DNA > RNA > Protein tool (http://biomodel.uah.es/en/lab/cybertory/analysis/trans.htm).
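DNA-to-RNA conversion only replaces T with U on the coding strand; translating the result back is a useful check that the optimized sequence still encodes the intended peptide. The sketch below builds the standard genetic code from its compact 64-letter form; the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons enumerated with first, second, third base in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA to mRNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(seq: str, to_stop: bool = True) -> str:
    """Translate DNA or RNA in frame 1; unknown codons become 'X'."""
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


rna = transcribe("ATGGCCGCCTAA")
print(rna, translate(rna))
```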

Afterward, the RNAfold tool of the ViennaRNA Package 2.0 (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) [89] was used to predict the secondary structure of the mRNA construct. RNAfold predicts the secondary structure thermodynamically and reports its minimum free energy (MFE). Both the MFE structure and the centroid secondary structure, together with their free energies, were evaluated.
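RNAfold minimizes free energy with a nearest-neighbor energy model, which is too involved to reproduce here. To illustrate how dynamic programming yields a dot-bracket secondary structure, the sketch below implements the much simpler Nussinov algorithm, which maximizes the number of base pairs (Watson–Crick and G–U pairs, hairpin loops of at least 3 nt). It is a teaching substitute, not RNAfold: its structures and scores are not free energies.

```python
from typing import List, Tuple

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximum number of base pairs and one optimal dot-bracket structure."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    dp: List[List[int]] = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best

    # Traceback to recover one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))
```

For the actual MFE analysis, the ViennaRNA Python bindings expose `RNA.fold(seq)`, which returns the MFE structure and its energy in kcal/mol.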

In silico immune simulation

The C-IMMSIM server (https://kraken.iac.rm.cnr.it/C-IMMSIM/) was used to perform an in silico immune simulation to estimate the immunogenicity and immune response profile of the designed peptide vaccine [90]. The server is an agent-based model that uses position-specific scoring matrices (PSSMs) derived from machine learning techniques to predict immune interactions. For most currently used vaccines, the minimum recommended interval between the first and second doses is 4 weeks [91]. Accordingly, three injections, each containing 1000 antigen proteins and no LPS, were administered at time-steps 1, 84, and 168 (each time-step corresponds to 8 h of real time, and the first dose was given at the start of the simulation). The other parameters, namely random seed, simulation volume, and number of simulation steps, were set to 12345, 10 µl, and 1050, respectively.
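Because one C-IMMSIM time-step corresponds to 8 h, the dosing schedule and simulation length convert to days with simple arithmetic, as sketched below under that 8 h/step setting.

```python
HOURS_PER_STEP = 8  # C-IMMSIM: one time-step = 8 h of real time


def steps_to_days(steps: int) -> float:
    return steps * HOURS_PER_STEP / 24


doses = [1, 84, 168]
print([steps_to_days(s) for s in doses])                          # time of each injection, in days
print([steps_to_days(b - a) for a, b in zip(doses, doses[1:])])   # intervals between injections, in days
print(steps_to_days(1050))                                        # total simulated time, in days
```

A spacing of about 84 steps therefore corresponds to roughly 4 weeks, and 1050 steps cover 350 days.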

Results

Retrieval of protein sequences

The BKV proteome consists of six proteins, whose sequences were downloaded from the UniProt database; they are listed in Table 1 with their corresponding accession numbers.

Table 1 The antigenicity prediction of BKV proteins

Antigenic proteome selection

The VaxiJen v2.0 and ANTIGENpro results showed that five of the six proteins were antigenic (Table 1); these were selected as targets for the in silico design of an mRNA vaccine against BKV.

Prediction and assessment of CTL epitopes

A total of 63 high-scoring CTL epitopes binding to MHC-I supertype alleles were selected from the overlapping epitope regions predicted by both IEDB and RANKPEP. Of these, 32 were predicted to be immunogenic by the IEDB class I immunogenicity tool, and 18 of the 32 were predicted to be antigenic by VaxiJen. All 18 were predicted to be non-toxic by ToxinPred, and 14 were predicted to be non-allergenic by both AllerTOP 2.0 and AllergenFP v1.0. From these 14 epitopes, four were selected as the final CTL epitopes based on the lowest docking energy scores with their corresponding MHC alleles, calculated with ClusPro 2.0. The final epitopes are highlighted in red in Table S1.

Prediction and assessment of HTL epitopes

The overlapping peptide regions predicted by the IEDB and RANKPEP MHC-II binding tools yielded 29 high-scoring HTL epitopes for the eight MHC-II supertype alleles. Ten of the 29 epitopes were predicted to be antigenic by VaxiJen. All 10 antigenic epitopes were predicted to be non-toxic by ToxinPred, but only three were predicted to be non-allergenic by both AllerTOP 2.0 and AllergenFP v1.0. After IFN-γ, IL-4, and IL-10 inducibility was assessed with IFNepitope, IL4pred, and IL10pred, respectively, only one epitope met all the criteria and was selected for vaccine construction (Table 2 and Table S2).

Table 2 The list of selected epitopes for vaccine construction

Prediction and assessment of LBL epitopes

BepiPred Linear Epitope Prediction 2.0 predicted 53 LBL epitopes ranging from 6 to 29 amino acids in length. Thirty-six were predicted to be antigenic by VaxiJen; of these, 32 were non-toxic and 15 were non-allergenic. The four most antigenic of these 15 epitopes were selected, and all four were predicted to be IgG epitopes by the IgPred server (Table 2).

Epitope conservancy analysis

Analysis with the IEDB conservancy tool showed that the nine final selected epitopes lie in highly conserved regions (Table 3).

Table 3 The conservancy prediction of the final selected epitopes of the vaccine construct

Molecular docking between T lymphocyte epitopes and MHC alleles

Based on the largest number of cluster members and the lowest weighted energy score of the epitope–MHC complexes, five of the 15 selected T lymphocyte epitopes were chosen for the vaccine construct. The selected epitopes and their corresponding MHC alleles are shown in Supplementary Tables S1 and S2.

Prediction of population coverage

The population coverage of the T lymphocyte epitopes used in the vaccine construct was analyzed with the IEDB population coverage tool. The global population coverage of the vaccine was 93.77%. Europe, North America, and Oceania showed the highest regional coverage (97.54%, 94.07%, and 92.69%, respectively), whereas Central America showed the lowest (8.48%) (Fig. 2).

Fig. 2

The workflow of designing the mRNA vaccine against BKV

Notably, MHC molecules are highly polymorphic, and more than a thousand different human MHC alleles have been identified. Consequently, MHC molecules show widely varying binding specificities for their cognate epitopes, and population coverage differs among ethnic groups.

Vaccine construction

The final mRNA vaccine construct contained the following parts, in order from the 5′ to the 3′ end, as shown in Fig. 3.

Fig. 3

Schematic presentation of the designed mRNA BKV vaccine

7-methyl(3′-O-methyl)GpppG (ARCA) cap–5′ UTR–Kozak sequence–tPA (signal peptide)–AAY linker–50S ribosomal protein L7/L12 (adjuvant)–EAAAK linker–LBL epitope–GPGPG linker–LBL epitope–GPGPG linker–LBL epitope–GPGPG linker–LBL epitope–GPGPG linker–HTL epitope–GPGPG linker–CTL epitope–AAY linker–CTL epitope–AAY linker–CTL epitope–AAY linker–CTL epitope–AAY linker–MITD sequence–stop codon–3′ UTR–poly(A) tail.

Prediction of antigenicity, allergenicity, toxicity, and physicochemical properties of the final vaccine construct

The antigenicity, allergenicity, toxicity, and physicochemical analyses predicted the vaccine to be safe and suitable for human use (Table 4).

Table 4 Antigenicity, allergenicity, toxicity, and several physicochemical properties evaluation of the peptide vaccine

VaxiJen and ANTIGENpro both predicted the vaccine to be antigenic, with scores of 0.5553 and 0.884519, respectively. AllerTOP 2.0 and AllergenFP v1.0 both predicted it to be non-allergenic, and ToxinPred predicted it to be non-toxic. Regarding physicochemical properties, the vaccine peptide consists of 359 amino acids and has a molecular weight of 37.91 kDa. Its pI was calculated to be 7.70, indicating that it is slightly basic (a protein with a pI above 7 carries a net positive charge at neutral pH). Of the 359 residues, 45 are negatively charged (Asp + Glu) and 46 are positively charged (Arg + Lys). The instability index (II) was 30.34, implying that the construct would be stable after expression (an II > 40 indicates instability) [92]. The aliphatic index was 79.47, suggesting that the construct is thermostable [93]. The GRAVY was negative (−0.271), indicating that the vaccine is hydrophilic. The estimated half-life of the vaccine peptide is 30 h in mammalian reticulocytes (in vitro), > 20 h in yeast (in vivo), and > 10 h in Escherichia coli (in vivo).

Analysis of post‐translational modifications

Several PTMs were predicted for the final vaccine construct. NetNGlyc 1.0 predicted no N-glycosylation sites, NetPhos 3.0 predicted 19 phosphorylation sites (Ser: 8, Thr: 11, Tyr: 0), and NetAcet 1.0 predicted no N-acetylation site. In addition, no lipid PTMs (N-terminal glycine myristoylation or GPI anchoring) were predicted by the MyrPS/NMT and big-PI/GPI animals servers, respectively (Table S3).

Tertiary structure modelling and validation

The 3D model of the final vaccine was built with the trRosetta web server, which generated five models. The best model was chosen based on the highest estimated TM-score and on validation with ERRAT, Verify-3D, PROCHECK, and ProSA-web, and is shown in Fig. 4A (Table S4). The selected model had an ERRAT score of 94.774 and a Verify-3D score of 61.28% (Fig. 4B and C). Ramachandran plot analysis showed that 94.2%, 5.5%, 0.3%, and 0% of residues were located in the most favored, additional allowed, generously allowed, and disallowed regions, respectively (Fig. 4D). The Z-score calculated by ProSA-web was −5.44 (Fig. 4E).

Fig. 4

Tertiary structure prediction and validation of the vaccine peptide: A 3D structure of the vaccine peptide; B ERRAT analysis; C Verify-3D analysis; D Ramachandran plot analysis from the PROCHECK server; E Z-score analysis from ProSA-web

Molecular docking analysis of the vaccine peptide

Molecular docking analysis was performed to assess the interaction between the designed vaccine peptide and its receptors. Using ClusPro 2.0, the vaccine peptide was docked with TLR4 (PDB ID: 4G8A), TLR9 (UniProt ID: Q9NR96), C-type lectin 2 (PDB ID: 3WBP), the MHC class I receptor (PDB ID: 1I1Y), and the MHC class II receptor (PDB ID: 1KG0). The docked poses with the largest number of cluster members were then analyzed for binding affinity and dissociation constant at 37 °C with the PRODIGY web server, which gave ∆G values of −18.7, −13.0, −15.8, −15.5, and −12.2 kcal/mol for the vaccine-TLR4, vaccine-TLR9, vaccine-C-type lectin 2, vaccine-MHC I receptor, and vaccine-MHC II receptor complexes, respectively (Supplementary Figs. S1 and S2). The negative Gibbs free energies and low dissociation constants indicate that all the interactions are energetically favorable. Moreover, the vaccine-TLR4 complex had the most negative ∆G among the studied receptors, suggesting that the vaccine binds preferentially to TLR4. The ∆G and dissociation constant (Kd) values of the docked complexes are shown in Table 5.

Table 5 Predicted binding affinities (∆G) and dissociation constants (Kd) of the docked complexes of the designed vaccine with TLR4, TLR9, C-type lectin 2, MHC-I receptor, and MHC-II receptor, calculated by the PRODIGY server

Refinement of the docked complexes with the FireDock server gave global energies of −173.38, −156.08, and −201.61 for the vaccine-TLR4, vaccine-MHC I receptor, and vaccine-MHC II receptor complexes, respectively (Table 6). The vaccine-TLR4 docked complex and its interacting residues, evaluated with the PDBsum server, are shown in Fig. 5A and B.

Table 6 The global energy, atomic contact energy (ACE), and hydrogen-bond (HB) energy of the refined docked complexes of vaccine-TLR4, vaccine-MHC I receptor, and vaccine-MHC II receptor
Fig. 5

(A) Vaccine-TLR4 complex, (B) interacting residues between docked TLR4 (chain A) and vaccine (chain E) predicted by PDBsum, (C) deformability graph, (D) B-factor graph, (E) eigenvalue, (F) covariance matrix (correlated residues in red, uncorrelated in white, and anti-correlated in blue), and (G) elastic network model (darker gray regions indicate stiffer regions)

Molecular dynamics simulation

To determine the stability and physical motions of the atoms and molecules of the designed vaccine construct, the vaccine-TLR4 docked complex was subjected to an MD simulation study based on normal mode analysis (NMA) using the iMODS server.

Peaks in the deformability graph indicate the deformable regions of the complex (Fig. 5C). The B-factor graph compares the mobility computed by NMA with the B-factors of the PDB structure of the complex (Fig. 5D). The eigenvalue of the complex was 9.859968 × 10⁻⁷ (Fig. 5E), and the eigenvalues increased gradually with each successive mode. The covariance matrix (Fig. 5F) depicts the coupling between pairs of residues, with correlated residues colored red, uncorrelated residues white, and anti-correlated residues blue, all scattered across dynamical regions. The elastic network model, illustrated as a connectivity matrix, identifies which atom pairs are connected by springs; as seen in Fig. 5G, the stiffer regions were found within each of the separate chains of the complex. In conclusion, the iMODS analysis indicated reduced mobility and deformability in the coordinates of the complex structure, which signifies the stability of the vaccine-TLR4 complex interface.
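iMODS performs NMA in internal (torsional) coordinates, which is not reproduced here. As a simplified stand-in, the sketch below uses a Gaussian network model (GNM), a related elastic network approach on Cα atoms: springs join residues within an assumed 7 Å cutoff, the eigenvalues of the resulting Kirchhoff matrix measure the stiffness of each mode, and its pseudo-inverse gives the residue covariance (normalized so that +1, 0, and −1 correspond to the red, white, and blue of a covariance map) and a predicted B-factor profile. The chain coordinates are randomly generated placeholders, not the vaccine-TLR4 complex.

```python
import numpy as np


def gnm_modes(coords: np.ndarray, cutoff: float = 7.0, gamma: float = 1.0):
    """Gaussian network model for C-alpha coordinates (N x 3, in angstroms)."""
    n = len(coords)
    # Residues closer than the cutoff are joined by identical springs.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(n, dtype=bool)
    # Kirchhoff (connectivity) matrix: -gamma off-diagonal, degree on diagonal.
    kirchhoff = -gamma * contact.astype(float)
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)  # ascending eigenvalues
    # Covariance = pseudo-inverse, skipping the single zero (rigid-body) mode.
    cov = np.zeros((n, n))
    for k in range(1, n):
        v = eigvecs[:, k]
        cov += np.outer(v, v) / eigvals[k]
    # Normalized cross-correlation: +1 correlated, 0 uncorrelated, -1 anti.
    diag = np.sqrt(np.diag(cov))
    corr = cov / np.outer(diag, diag)
    # Predicted B-factors are proportional to mean-square fluctuations.
    bfactors = np.diag(cov)
    return eigvals, corr, bfactors


# Hypothetical 40-residue chain with consecutive C-alpha atoms 3.8 A apart.
rng = np.random.default_rng(0)
steps = rng.normal(size=(40, 3))
steps = 3.8 * steps / np.linalg.norm(steps, axis=1, keepdims=True)
coords = np.cumsum(steps, axis=0)

eigvals, corr, bfactors = gnm_modes(coords)
print("lowest nonzero eigenvalue:", round(float(eigvals[1]), 4))
print("most mobile residue:", int(np.argmax(bfactors)))
```

In this picture, low-eigenvalue modes are the soft, collective motions, and residues with large diagonal covariance correspond to the peaks of a deformability or B-factor plot.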

Codon optimization and secondary structure prediction of the mRNA vaccine

The codons of the vaccine construct were optimized for efficient translation in host cells using the GenSmart codon optimization tool. The total length of the mRNA was 1077 nucleotides. The quality of the optimized sequence was assessed with the rare codon analysis tool. The estimated codon adaptation index (CAI) of the sequence was 0.92; a CAI greater than 0.8 is generally considered acceptable for expression in the host organism (Fig. 6A). The codon frequency distribution (CFD) of the sequence was 0%, suggesting that the optimized sequence contained no unusual tandem codons; tandem rare codons can reduce translation efficiency or even cause the translational machinery to disengage (Fig. 6B). The average GC content of the adapted sequence was 62.95%, within the 30–70% range considered optimal for efficient expression, suggesting that the vaccine candidate could be expressed well in human cells (Fig. 6C). The secondary structures of the mRNA and their corresponding free energies were evaluated with the RNAfold server. The minimum free energy (MFE) secondary structure had a free energy of −434.70 kcal/mol, and the centroid secondary structure had a free energy of −389.20 kcal/mol. These results suggest that the mRNA could remain stable after manufacturing (Fig. 6D and E).
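The three sequence metrics above can be sketched as follows. GC content is the percentage of G and C bases; CAI is the geometric mean of the relative adaptiveness w of each codon (its usage divided by that of the most-used synonymous codon); and CFD is approximated here as the percentage of codons whose w falls below an assumed cutoff of 0.3. The codon weights are hypothetical placeholders covering only three amino acids, not the human codon-usage table used by the GenSmart or rare codon analysis tools.

```python
import math

# Hypothetical relative-adaptiveness weights; a real analysis would use a full
# human codon-usage table. Only Ala, Lys, and Glu codons are listed here.
WEIGHTS = {
    "GCC": 1.00, "GCT": 0.66, "GCA": 0.57, "GCG": 0.27,  # Ala
    "AAG": 1.00, "AAA": 0.75,                             # Lys
    "GAG": 1.00, "GAA": 0.72,                             # Glu
}
RARE_W = 0.3  # assumed cutoff below which a codon is counted as rare


def codons(seq: str) -> list[str]:
    """Split an ORF into consecutive codons, ignoring a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(seq: str) -> float:
    """Percentage of G and C nucleotides."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def cai(seq: str) -> float:
    """Codon adaptation index: geometric mean of w over the scored codons."""
    ws = [WEIGHTS[c] for c in codons(seq.upper()) if c in WEIGHTS]
    return math.exp(sum(math.log(w) for w in ws) / len(ws))


def cfd(seq: str) -> float:
    """Approximate codon frequency distribution: percentage of rare codons."""
    cs = [c for c in codons(seq.upper()) if c in WEIGHTS]
    return 100.0 * sum(WEIGHTS[c] < RARE_W for c in cs) / len(cs)


orf = "GAAGCCGCCGCCAAG"  # encodes the EAAAK linker
print(f"GC = {gc_content(orf):.1f}%  CAI = {cai(orf):.2f}  CFD = {cfd(orf):.0f}%")
```

Because CAI is a geometric mean, a single poorly adapted codon lowers it more than an arithmetic mean would, which is why optimization tools tend to replace rare codons throughout the sequence.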

Fig. 6

Codon optimization and secondary structure prediction of the designed mRNA vaccine: (A) CAI value, (B) CFD value, (C) GC curve, (D) MFE secondary structure, and (E) centroid secondary structure of the vaccine mRNA
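RNAfold computes MFE and centroid structures with a thermodynamic nearest-neighbor energy model, which is too large to reproduce here. The dynamic-programming idea behind such predictions can be illustrated with the simpler Nussinov algorithm, swapped in purely for illustration: it maximizes the number of nested base pairs (Watson-Crick plus G-U wobble, with an assumed minimum hairpin loop of three bases) and returns a dot-bracket structure, but it yields no free energies in kcal/mol. For real energies, the ViennaRNA Python bindings expose `RNA.fold(sequence)`, which returns the MFE structure and its energy.

```python
# Allowed base pairs, including G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin loop


def nussinov(seq: str) -> str:
    """Maximize nested base pairs by dynamic programming; return dot-bracket."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs in seq[i..j]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j stays unpaired
            for k in range(i, j - MIN_LOOP):  # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    # Traceback: recover one optimal set of non-crossing pairs.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + 1 + dp[k + 1][j - 1] == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    stack.append((k + 1, j - 1))
                    if k > i:
                        stack.append((i, k - 1))
                    break
    return "".join(structure)


print(nussinov("GGGAAAUCC"))
```

Energy-based folders such as RNAfold use the same recursive decomposition but score stacked pairs and loops with measured free energies instead of counting pairs, which is what produces values such as the −434.70 kcal/mol reported above.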

In silico immune simulation studies

The simulated immunological response was investigated through the C-ImmSim server.

The primary response was characterized by high IgM levels. B-cell populations and the levels of IgG1 + IgG2, IgM, and IgG + IgM antibodies increased markedly after the second and third injections, with a concomitant reduction in antigen concentration, indicating the emergence of immunological memory and, consequently, efficient immunity upon subsequent exposure to the antigen (Fig. 7A and B). The TH (T helper) and TC (T cytotoxic) cell populations also responded strongly, with corresponding memory development (Fig. 7C and D). After the third injection, IgG1 levels increased, while IFN-γ concentration and the TH cell population remained high throughout the exposure period. Cytokine levels increased after each injection, as reflected by rising IFN-γ and IL-2, which are considered the most significant cytokines for an antiviral immune response (Fig. 7E).

Fig. 7

In silico simulation of the immune response using the vaccine as an antigen: (A) antigen and immunoglobulins, (B) B-cell population, (C) TH (helper) cell population, (D) TC (cytotoxic) cell population per state, and (E) cytokine and interleukin production

Discussion

Human polyomavirus type 1, or BKV, is a ubiquitous pathogen that causes various comorbidities and related malignancies in the human population. It is a significant risk factor for immunocompromised individuals, particularly kidney transplant recipients, in whom it causes renal transplant dysfunction and allograft loss [21, 24]. The efficacy of current therapeutics in clinical studies remains unsatisfactory, and only a limited number of studies have applied immunoinformatic approaches to design vaccine targets for BKV [12, 94]. In this study, a reverse vaccinology strategy was applied to design an mRNA vaccine against BKV by identifying the most antigenic proteins of the pathogen.

Conventional vaccine approaches, such as live-attenuated, inactivated, and subunit vaccines, are effective for the prevention of various diseases; however, they have not met the need for a completely safe, non-allergenic vaccine that can be developed rapidly and produced at large scale [95, 96]. Peptide-based vaccines have been reported to have low immunogenicity, especially when used alone without adjuvants [97, 98]. Plasmid DNA vaccines, known as third-generation vaccines, raise safety concerns such as oncogene activation due to genomic integration of the immunizing DNA and the possible induction of anti-DNA antibodies [99, 100]. Major technological innovation and research investment over the past decade have made mRNA therapeutics a rapidly growing field and a potent platform for solving many of the challenges in vaccine development for both infectious disease and cancer [101]. mRNA vaccines have several advantages over subunit, killed, and live-attenuated virus vaccines, as well as DNA-based vaccines. Because mRNA is a non-infectious and non-integrating platform, there is no potential risk of infection or insertional mutagenesis. Moreover, mRNA is degraded by normal cellular processes, and its in vivo half-life can be regulated by various modifications and delivery methods [102,103,104,105]. The immunogenicity of mRNA vaccines can also be modified and downmodulated to increase their safety profile [106]. Low production cost, rapid development, and acceptable efficacy are further advantages of mRNA vaccines. Because they are expressed in the cytoplasm without needing to enter the nucleus, they possess higher efficacy than DNA vaccines [107]. 
Simply altering the mRNA sequence allows almost any target protein to be expressed for new indications, and reusing an already established production process makes manufacturing versatile, flexible, time-saving, and cost-effective [67, 98, 108]. Although mRNA vaccines have some inherent drawbacks, such as immunogenicity, instability, and inefficient delivery, recent advances in synthesis technology and structural modifications of mRNA sequences have led to promising results [109,110,111,112,113]. Notably, the first two mRNA vaccines, developed by Pfizer/BioNTech (BNT162b2) and Moderna (mRNA-1273), received FDA emergency use authorization in the USA in December 2020 [114, 115].

To our knowledge, this is the first study to design a multi-epitope mRNA vaccine against BKV by identifying the most antigenic epitopes in the entire BKV proteome.

Vaccination can provide immunological memory that persists for several years to several decades [116,117,118]. Stimulation of both B and T lymphocyte-mediated immunity is considered vital for any successful vaccination strategy, as it provides a faster and more efficient immune response when the host later encounters the target pathogen.

The cytotoxic activity of CTLs is an important part of the immune response in viral infections. Virus-infected cells degrade some of the viral proteins and present the resulting fragments to CTLs in combination with MHC I molecules. Recognition of these degraded fragments of viral proteins, called epitopes, by CTLs leads to the killing of infected cells through the release of cytotoxic molecules [119]. In our study, 63 CTL epitopes were identified. Epitopes with the highest binding scores and overlapping regions across a variety of prediction methods were selected for further analysis of immunogenicity, antigenicity, allergenicity, and toxicity. Following this analysis, 14 epitopes were confirmed to be both safe and antigenic and were suitable for incorporation into the vaccine construct (Supplementary Table S1).

After proteolytic cleavage of viral antigens, antigen-presenting cells present the resulting viral peptides to HTLs, also known as CD4+ T cells, in combination with MHC-II molecules, which activates the HTLs. The HTLs then secrete a wide range of cytokines and chemokines, such as IFN-γ, IL-4, and IL-10, which play vital roles in the immune response against the pathogen [120,121,122,123]. Here, 29 high-scoring, shared HTL peptide regions were selected for further analysis of antigenicity, allergenicity, toxicity, and the ability to induce cytokines such as IFN-γ, IL-4, and IL-10. After evaluation, only one epitope met all the criteria and was selected for the vaccine construct (Table 2).

B lymphocytes secrete specific antibodies that neutralize viral pathogens. Their differentiation into long-lived plasma cells and memory B lymphocytes provides long-term immunological protection [124, 125]. B lymphocytes are activated when B cell receptors bind to either soluble or membrane-bound epitopes [126]. Reliable prediction of B lymphocyte epitopes with various computational tools is a crucial part of designing an efficient vaccine [127, 128]. B cell epitopes are of two types: linear and conformational [127]. In our investigation, 53 LBL epitopes were identified for further analysis; after they were evaluated for antigenicity, allergenicity, toxicity, and IgG inducibility, four LBLs were found suitable for use in the vaccine construct (Table 2).
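The screening applied to the CTL, HTL, and LBL candidates amounts to keeping only the peptides that pass every predictor. A minimal sketch follows, using hypothetical peptide records rather than results from this study; the antigenicity cutoff of 0.4 is an assumed VaxiJen-style threshold for viral antigens, and further checks, such as cytokine induction for HTLs or IgG inducibility for LBLs, would be added as extra fields in the same way.

```python
from dataclasses import dataclass


@dataclass
class Epitope:
    peptide: str
    antigenicity: float  # e.g. a VaxiJen-style antigenicity score
    allergen: bool       # predicted allergen?
    toxic: bool          # predicted toxin?


ANTIGENICITY_THRESHOLD = 0.4  # assumed cutoff for viral antigens


def passes_filters(e: Epitope) -> bool:
    """Keep epitopes predicted to be antigenic, non-allergenic, and non-toxic."""
    return (e.antigenicity >= ANTIGENICITY_THRESHOLD
            and not e.allergen
            and not e.toxic)


candidates = [  # hypothetical predictions, not results from this study
    Epitope("PEPTIDEA", 0.85, False, False),
    Epitope("PEPTIDEB", 0.31, False, False),  # below antigenicity cutoff
    Epitope("PEPTIDEC", 1.10, True, False),   # predicted allergen
    Epitope("PEPTIDED", 0.62, False, True),   # predicted toxin
    Epitope("PEPTIDEE", 0.47, False, False),
]
selected = [e.peptide for e in candidates if passes_filters(e)]
print(selected)
```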

Based on the conservancy of the selected epitopes among BKV strains, the vaccine is predicted to provide high protection against this virus (Table 3).
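Epitope conservancy can be estimated as the percentage of strain sequences in which the epitope occurs at or above a chosen identity threshold, in the spirit of the IEDB conservancy analysis. The sketch below slides the epitope along each strain sequence, takes the best-matching window, and counts the strains that reach the threshold; the strain sequences and the epitope are hypothetical placeholders.

```python
def identity(a: str, b: str) -> float:
    """Percent identity between two equal-length peptides."""
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def best_match_identity(epitope: str, protein: str) -> float:
    """Highest identity of the epitope to any same-length window of the protein."""
    k = len(epitope)
    windows = (protein[i:i + k] for i in range(len(protein) - k + 1))
    return max((identity(epitope, w) for w in windows), default=0.0)


def conservancy(epitope: str, strains: list[str], threshold: float = 100.0) -> float:
    """Percent of strain sequences matching the epitope at >= threshold identity."""
    hits = sum(best_match_identity(epitope, s) >= threshold for s in strains)
    return 100.0 * hits / len(strains)


strains = [  # hypothetical strain sequences
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYLAKQRQISFVKSHFSRQ",  # one substitution inside the epitope
    "MKSAYIAKQRQLSFVKSHFSRQ",  # substitutions outside the epitope
]
print(conservancy("AYIAKQRQ", strains))
print(conservancy("AYIAKQRQ", strains, threshold=80.0))
```

Lowering the identity threshold counts near-identical variants as conserved, so reported conservancy values should always be read together with the threshold used.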

Molecular docking is an integral part of vaccine design studies, which is widely used to evaluate the binding affinity of CTL and HTL epitopes to their corresponding MHC alleles.

Our 15 selected T lymphocyte epitopes (14 CTL and one HTL) had 12 corresponding MHC alleles. Each of these CTL and HTL epitopes was docked with one of its corresponding alleles. Among the CTL epitopes, only four had the largest cluster sizes and the lowest energy scores, indicating the highest binding affinity. Notably, the single HTL epitope also had an acceptable binding energy and cluster size and was confirmed for inclusion in our vaccine construct (Supplementary Tables S1 and S2). All five of these T lymphocyte epitopes were finally selected for the vaccine construct (Table 2).

Population coverage analysis of the T lymphocyte epitopes, based on the worldwide distribution of MHC alleles, revealed a global coverage of 93.77%. Furthermore, higher coverage was predicted for the countries reported as the most affected by BKV (Fig. 2).
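A simplified version of the kind of calculation behind such population coverage estimates is sketched below. It assumes Hardy-Weinberg equilibrium within each HLA locus and independence between loci: an individual is uncovered at a locus only if neither chromosome carries a recognized allele, and is uncovered overall only if this holds at every locus. The allele frequencies are hypothetical placeholders, not the data behind the 93.77% figure.

```python
def population_coverage(locus_freqs: dict[str, list[float]]) -> float:
    """Fraction of individuals expected to carry at least one recognized allele.

    locus_freqs maps each HLA locus to the population frequencies of the alleles
    that restrict the selected epitopes.
    """
    uncovered = 1.0
    for freqs in locus_freqs.values():
        f = sum(freqs)  # combined frequency of recognized alleles at this locus
        uncovered *= (1.0 - f) ** 2  # both chromosomes lack a recognized allele
    return 1.0 - uncovered


example = {  # hypothetical allele frequencies for one population
    "HLA-A": [0.3, 0.2],
    "HLA-B": [0.2, 0.1],
}
print(f"coverage = {100 * population_coverage(example):.2f}%")
```

Because coverage compounds across loci, adding epitopes restricted by alleles at a different locus tends to raise coverage more than adding another allele at an already well-covered locus.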

In our study, a potential multi-epitope mRNA vaccine model against BKV, including T- and B-cell epitopes, was constructed. The TLR4 agonist 50S ribosomal protein L7/L12 from Mycobacterium tuberculosis was used as an adjuvant to enhance the immune response. 50S ribosomal protein L7/L12 induces the MyD88 and TRIF signaling pathways and can enhance dendritic cell (DC) activation and Th1 polarization [129]. The translation efficiency and half-life of mRNA are influenced by the length of the poly(A) tail; an A120 tail, which has been shown to elevate and prolong protein expression, was therefore used in our mRNA vaccine [130]. The 5′ UTR of the human β-globin gene, which can improve mRNA translation efficiency, and the 3′ UTR of the rabbit β-globin gene, which can modify antibody titer, seroconversion, and cytokine profiles, were also added to the vaccine construct [130, 131]. The 7-methyl(3′-O-methyl)GpppG cap (also known as ARCA [132, 133]) was used to cap the 5′ end; the 5′ cap and the 3′ poly(A) tail have been found to synergistically regulate mRNA translational efficiency [134]. The MITD sequence was added because it has been shown to enhance CD4+ T cell antigen presentation efficiency [135]. The signal peptide of human tissue plasminogen activator (tPA) was also added to the vaccine to improve its immunogenicity [136].

Selecting appropriate epitope-specific linkers is a crucial step in designing a multi-epitope vaccine, as it allows the domains to function independently without interfering with one another. The linkers used in this study were chosen based on their length and rigidity or flexibility [137, 138]. The EAAAK linker was placed between the adjuvant and the epitopes to improve the bioactivity of the fusion protein, achieve a high level of expression, and increase the stability of the vaccine construct [138]. Previous studies indicated that GPGPG and AAY linkers placed between the predicted HTL and CTL epitopes, respectively, produce synergistic immunogenicity, enabling the rational design of a potent multi-epitope vaccine [139, 140]. Our designed mRNA vaccine construct has a total length of 1077 nucleotides.
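The linker scheme described above can be sketched as simple string assembly of the peptide. The block order (adjuvant, then CTL epitopes joined by AAY, then HTL epitopes joined by GPGPG), the GPGPG joint between the CTL and HTL blocks, and the short peptide strings are all hypothetical placeholders; the actual construct also contains LBL epitopes and the other elements described earlier, which are omitted here.

```python
ADJUVANT_LINKER = "EAAAK"  # between the adjuvant and the epitopes
CTL_LINKER = "AAY"         # between CTL epitopes
HTL_LINKER = "GPGPG"       # between HTL epitopes (and, by assumption, before them)


def assemble(adjuvant: str, ctl: list[str], htl: list[str]) -> str:
    """Hypothetical layout: adjuvant-EAAAK-[CTLs joined by AAY]-GPGPG-[HTLs joined by GPGPG]."""
    ctl_block = CTL_LINKER.join(ctl)
    htl_block = HTL_LINKER.join(htl)
    return adjuvant + ADJUVANT_LINKER + ctl_block + HTL_LINKER + htl_block


# Placeholder sequences, not the epitopes selected in this study.
construct = assemble("MSKLE", ["YLLPAIVV", "SMFHRWNL"], ["QWLTPNAGRSIVKE"])
print(construct, len(construct))
```

Keeping the linkers as named constants makes it easy to test alternative linker choices, for example swapping AAY for another cleavage-favoring linker, without touching the epitope lists.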

Computational analysis of the peptide translated from our mRNA construct predicted that the vaccine is nearly neutral, stable, highly antigenic, non-allergenic, hydrophilic, and thermostable, with a long half-life in human reticulocytes, making it a potential candidate for an mRNA vaccine (Table 4).
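Two of the ProtParam-style properties behind the "hydrophilic" and "thermostable" labels can be computed directly from the sequence: GRAVY is the mean Kyte-Doolittle hydropathy of all residues (negative values indicate a hydrophilic protein), and the aliphatic index combines the mole percentages of Ala, Val, Ile, and Leu (higher values are associated with thermostability). The sketch below implements both; the example inputs are the linker peptides only, not the full vaccine sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy; negative values indicate a hydrophilic protein."""
    return sum(KD[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index: A + 2.9*V + 3.9*(I + L), using mole percentages."""
    pct = {aa: 100.0 * seq.count(aa) / len(seq) for aa in "AVIL"}
    return pct["A"] + 2.9 * pct["V"] + 3.9 * (pct["I"] + pct["L"])


for peptide in ("EAAAK", "GPGPG"):
    print(peptide, round(gravy(peptide), 2), round(aliphatic_index(peptide), 1))
```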

Analysis of various post-translational modification sites in the target protein indicated that the only modification sites predicted were 19 phosphorylation sites (Supplementary Table S3). Some studies indicate that phosphorylation of certain proteins leads to their degradation by the ATP-dependent ubiquitin/proteasome pathway [141]. Because this construct contains several such sites, phosphorylation may promote its degradation and subsequent entry into the MHC-I pathway.

Based on the ERRAT, Verify 3D, Ramachandran plot, and ProSA-web evaluations, the 3D model of the vaccine generated by the trRosetta web server was of high quality, and no refinement was needed (Fig. 4 and Supplementary Table S4).

The structure of the vaccine peptide was docked against TLR4 and the MHC-I and MHC-II receptors, and the best complexes were then refined (Tables 5 and 6). The top-ranked vaccine-TLR4 complex, selected by the lowest docking energy score, was used for the MD simulation analysis, which confirmed the stability of the vaccine-TLR4 complex at the atomistic level (Fig. 5C–G).

Codon optimization of the designed mRNA sequence for better expression in the human host was performed using a codon optimization tool. Several properties of the optimized sequence indicated that the mRNA would be expressed efficiently by the host cells. The predicted CAI value of 0.92 suggested that the vaccine codons can be efficiently expressed in human cells (Fig. 6A). Moreover, the predicted CFD value suggested that the mRNA sequence contained no unusual tandem codons that could reduce translation efficiency in the host cell (Fig. 6B). The free energies and secondary structures of the optimized sequence indicated that the mRNA construct would also be stable (Fig. 6D and E).

In silico immune simulation was performed to evaluate the immunogenic potential of our target antigen. The vaccination regimens of the only two FDA-approved mRNA-based vaccines, developed by Pfizer/BioNTech (BNT162b2) and Moderna (mRNA-1273), have been widely established as doses on day 0 and day 21 or 28, respectively, followed by a rest period of at least 6 months between the second and third doses [142,143,144,145,146]. We therefore conducted an additional simulation following the BNT162b2 and mRNA-1273 vaccination regimens, in which the designed vaccine was administered at time-steps 1, 63, and 873 (the first dose was given at time = 0, and each time-step corresponds to 8 h of real time). All other parameters were the same as in the main simulation. The main vaccination scheme, with injections at time-steps 1, 84, and 168, produced superior overall immune responses. In general, immune responses increased with repeated exposure to the antigen (Fig. 7). B-cell and T-cell populations both responded strongly, with corresponding memory development. The levels of IL-2 and IFN-γ, which are cardinal for immune system activation during viral infection, also increased.
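Since each C-ImmSim time-step corresponds to 8 h, the two injection schedules can be converted to days with a one-line calculation. The helper name below is ours, not part of C-ImmSim.

```python
HOURS_PER_STEP = 8  # one C-ImmSim time-step corresponds to 8 h of real time


def steps_to_days(steps: list[int]) -> list[float]:
    """Convert simulation time-steps to days."""
    return [s * HOURS_PER_STEP / 24 for s in steps]


main_schedule = [1, 84, 168]       # main simulation
mrna_like_schedule = [1, 63, 873]  # schedule modeled on BNT162b2/mRNA-1273
print(steps_to_days(main_schedule))
print(steps_to_days(mrna_like_schedule))
```

With 8 h per step, time-steps 84 and 168 fall on days 28 and 56, whereas time-steps 63 and 873 fall on days 21 and 291, leaving about 270 days (roughly 9 months) between the second and third doses of the alternative schedule.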

Conclusion

Our immunoinformatic approach to designing a multi-epitope mRNA vaccine against BKV indicated that the proposed novel vaccine model is an efficient candidate. However, its efficacy must be further validated through meticulous in vitro and in vivo studies and clinical trials.

Supplementary information