Introduction

Tuberculosis (TB) is a transmissible disease that is characterized by the infection of the opportunistic bacterial species Mycobacterium tuberculosis (Mtb), primarily targeting the lung [1, 2]. TB is initiated when Mtb is deposited onto the surface of the lung alveoli from the airborne droplets containing the pathogen. Pathogen-containing droplets are mostly dispersed by people with active pulmonary or laryngeal TB. Inhalation of these droplets results in symptoms such as persistent cough, fever and night sweats [3, 4].

TB is the leading cause of death from a single infectious agent [5]. In 2020, 1.44 million individuals died of TB, 214,000 of whom were HIV-positive [6]. Around 1.7 billion people, almost a quarter of the world population, were latently infected with Mtb in 2014 [7]. Individuals with a healthy immune system can generally suppress the growth of Mtb, but immunocompromised patients (e.g., people with HIV infection or diabetes) cannot mount a sufficient immune response to halt the progression of infection [1, 4, 5, 8].

Pulmonary macrophages play a critical role in the primary immune response against Mtb upon entry, whereas CD4+ T-cells require a minimum of 12 days after aerosol infection to respond [9,10,11]. During this period, the Mtb population increases by > 20,000-fold [9]. Mtb is readily phagocytized by the macrophages present in the alveoli, and most often the entering bacteria are killed by these macrophages. However, some bacteria escape killing and proliferate within the macrophage as intracellular parasites [10].

Because the adaptive immune system takes a long time to be activated against this pathogen, vaccination is essential for better protection against Mtb. Moreover, studies on Mtb-infected patients have revealed various multidrug-resistant as well as extensively drug-resistant strains [12]. Currently, the only approved vaccine against TB is Bacillus Calmette-Guérin (BCG) [12, 13]. BCG is generally given to newborns and protects them from Mtb until the age of 10 years [8]. For adults, the efficacy of the BCG vaccine varies greatly, between 0 and 80% [12]. Several vaccines are in clinical trials and show promising results; for instance, the subunit vaccine M72/AS01E is showing very good results in its clinical trial [14]. However, no mRNA-based vaccine for TB is currently licensed or in clinical trials.

mRNA vaccines are a rapidly developing area. A vast amount of preclinical evidence has been obtained recently, and several human clinical trials have begun [15]. Based on this evidence, mRNA is now considered a safe and effective alternative to (subunit) protein, chimeric virus, and even DNA-based therapies in the form of vaccination [16]. mRNA induces the transient expression and accumulation of selected antigens in the cytoplasm.

The proteasome in the cytoplasm of antigen-presenting cells (APCs) can break down the antigen into peptides. In the endoplasmic reticulum, the peptides with antigenic properties are complexed with nascent major histocompatibility complex (MHC) class I molecules. The peptide–MHC complexes can then activate CD8+ cytotoxic T (Tc) cells when they are expressed on the cell membrane surface of APCs. CD4+ helper T (Th) cells are activated by MHC class II–peptide complexes expressed on the cell membrane surface of APCs. Antigens secreted by, or released from, dead cells that have taken up and translated the exogenous mRNA can bind to B-cells within the extracellular matrix, activating these cells. As a result, all adaptive immune effectors, including B lymphocytes, Tc-cells and Th-cells, will be activated by mRNA-based vaccines [16, 17].

In silico analysis of target proteins has simplified the identification of immunogenic B- and T-cell epitopes, facilitating the detection of antigenic epitopes with the potential to elicit an immune response [18]. Because in silico predictions can minimize the number of experiments required, this strategy is both cost-effective and convenient [19, 20]. However, although in silico vaccine design is quite effective, it might not be fast enough to keep pace with the emergence of new pathogens. All findings must be thoroughly and extensively analyzed in order to identify antigenic regions for designing an effective vaccine, which presents a substantial overhead and can be time-intensive [21]. This technique has been successfully used to develop vaccines against various pathogens, for example, serogroup B Neisseria meningitidis (MenB) [22].

In this study, an mRNA vaccine named MT. P495, targeting the phosphate-binding protein PstS1 of Mtb, was modeled using several bioinformatics tools and tested computationally for its safety and its ability to elicit an immunogenic response. Several types of T-cell and B-cell epitopes present within this antigen were predicted, along with their ability to generate an immune response within the host. PstS1 is an immunodominant, TLR-2 agonist lipoprotein found on the cell membrane surface of Mtb that takes up inorganic phosphate and also functions as an adhesion molecule, facilitating binding to macrophages through the mannose receptor (MR). This mRNA vaccine model thus serves as a ready-to-test model for in vivo evaluation by experimentalists and industry.

Materials and methods

A graphical depiction of the workflow is represented in Fig. 1A.

Fig. 1

Graphical presentation of (A) the steps involved in modeling an mRNA-based vaccine and (B) the mRNA of the MT. P495 vaccine

Retrieval of protein sequence and 3D structure

A total of 3470 sequences of the phosphate-binding protein PstS1 of Mtb were retrieved from the NCBI [23] database. The retrieved protein sequences had a length of 374 amino acids. The first 23 amino acids formed the signal peptide and the remainder formed the PstS1 protein. Multiple sequence alignment (MSA) of the retrieved sequences was performed using Clustal Omega [24] and the result was analyzed using JalView [25]. The consensus sequence was retrieved from JalView and was used for sequence-based epitope prediction. The consensus sequence was 100% identical to the reference sequence of PstS1 retrieved from UniProt [26] (UniProtKB accession number P9WGU1). The X-ray crystal structure of Mtb H37Rv PstS1 was obtained from RCSB PDB [27] (PDB ID 1PC3) for the prediction of structure-based discontinuous B-cell epitopes.
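The consensus and signal-peptide steps described above can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length; the function names and the toy alignment are hypothetical, and the column-wise majority vote is a simplification that does not reproduce JalView's consensus calculation.

```python
from collections import Counter

SIGNAL_PEPTIDE_LENGTH = 23  # first 23 residues, as stated in the text


def consensus(aligned):
    """Return the most common residue in each column of an alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def mature_region(sequence, signal_length=SIGNAL_PEPTIDE_LENGTH):
    """Drop the N-terminal signal peptide and keep the rest of the protein."""
    return sequence[signal_length:]


# Toy alignment (hypothetical residues, not PstS1 data)
toy_alignment = ["MKIRLH", "MKIRLH", "MKLRLH"]
toy_consensus = consensus(toy_alignment)
```

Comparing the consensus with a reference sequence through `percent_identity` mirrors the 100% identity check against the UniProt entry.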

Identification of cytotoxic T-lymphocyte (CTL) epitopes

To predict CD8+ CTL epitopes, NetCTL 1.2 Server [28] was used. The NetCTL server predicts CTL epitopes in a given protein sequence using the stabilized matrix base method [29]. The threshold level for prediction of a CTL epitope was set at 0.5 with a specificity of 0.94 along with a sensitivity of 0.89. Further analysis was performed on predicted epitopes with a higher combined score.

The SMMPMBEC prediction method [30] of the T-cell epitope prediction resource of the Immune Epitope Database (IEDB) [31] was used to predict MHC-I-binding alleles, both frequently and infrequently occurring. The half-maximal inhibitory concentration (IC50) threshold was set at 250 nM.

Helper T-lymphocyte (HTL) epitopes identification

The T-cell epitope prediction resource of the IEDB server was used to predict promiscuous CD4+ HTL epitopes. The NN-align 2.3 (NetMHCII 2.3) method [32] was employed in this study to predict MHC-II-binding alleles. Peptide length was set to be 15. For further analysis, predicted peptides having an IC50 value less than 50 nM were considered.
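The IC50 cut-offs used for the MHC-I (250 nM) and MHC-II (50 nM) predictions amount to a simple filter over the prediction tables. The sketch below is illustrative only: the `Binding` record, the function name, and the toy rows are hypothetical, and treating both cut-offs as strict "less than" comparisons is an assumption for the MHC-I case.

```python
from dataclasses import dataclass

MHC_I_IC50_MAX_NM = 250.0   # threshold from the text; strict "<" is an assumption
MHC_II_IC50_MAX_NM = 50.0   # the text states "less than 50 nM"


@dataclass(frozen=True)
class Binding:
    peptide: str
    allele: str
    ic50_nm: float


def strong_binders(predictions, max_ic50_nm):
    """Map each peptide to the alleles it binds with IC50 below the cut-off."""
    result = {}
    for p in predictions:
        if p.ic50_nm < max_ic50_nm:
            result.setdefault(p.peptide, []).append(p.allele)
    return result


# Toy MHC-II predictions (hypothetical 15-mers and IC50 values)
toy_rows = [
    Binding("AAAAAAAAAAAAAAK", "HLA-DRB1*01:01", 12.5),
    Binding("AAAAAAAAAAAAAAK", "HLA-DRB1*04:01", 75.0),
    Binding("GGGGGGGGGGGGGGS", "HLA-DRB1*01:01", 49.9),
]
toy_htl = strong_binders(toy_rows, MHC_II_IC50_MAX_NM)
```

Grouping alleles per peptide makes it easy to rank peptides by how many alleles they bind, which is one way to read "promiscuous" epitopes.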

Evaluation of the selected epitopes

To analyze the antigenic, allergenic and toxic properties of all predicted epitopes, the VaxiJen v2.0 server [33], AllerTOP v.2.0 server [34] and ToxinPred server [35] were employed, respectively. VaxiJen predicts the antigenic probability of a given peptide in an alignment-independent manner [33]. In this study, the target organism was set to bacteria and the threshold was set to 0.5. To categorize peptides as allergenic or non-allergenic, AllerTOP v.2.0 uses the k-nearest neighbours (kNN) method [34]. ToxinPred predicts the toxicity of a given peptide by employing both a dipeptide-based SVM algorithm and the MEME/MAST algorithm [35]. The conservancy of the epitopes was determined using the IEDB conservancy analysis tool [36] with a 100% sequence identity threshold.
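The screening logic above can be sketched as follows. This is not the implementation of any of the servers: the function names are hypothetical, the inputs stand for results already obtained from VaxiJen, AllerTOP and ToxinPred, and two assumptions are made, namely that a score equal to the threshold counts as antigenic and that conservancy at 100% identity is an exact substring match.

```python
ANTIGENICITY_THRESHOLD = 0.5  # VaxiJen threshold used in the text


def conservancy(epitope, sequences):
    """Percentage of sequences containing the epitope as an exact match."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    hits = sum(epitope in seq for seq in sequences)
    return 100.0 * hits / len(sequences)


def passes_screen(antigenicity, is_allergen, is_toxic,
                  threshold=ANTIGENICITY_THRESHOLD):
    """Keep epitopes that are antigenic, non-allergenic and non-toxic."""
    return antigenicity >= threshold and not is_allergen and not is_toxic


# Toy isolates (hypothetical sequences, not PstS1 data)
toy_isolates = ["MKIRLHTLLA", "MKIRLHTLLA", "MKIRLQTLLA", "MKIRLHTLLA"]
toy_conservancy = conservancy("IRLHT", toy_isolates)
```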

The cytokine-inducing ability of the HTL epitopes was assessed further. The IL4pred server [37] was used to predict the interleukin-4 (IL4)-inducing ability of the selected peptides, while the interleukin-10 (IL10)-inducing ability was assessed using the IL10pred server [38]. The interferon-γ (IFN-γ)-inducing probability of the HTL peptides was assessed using the IFNepitope server [39].

Estimation of population coverage

The population coverage of the selected CTL and HTL epitopes was evaluated. Employing the population coverage tool [40] of the IEDB server, population coverage of the selected epitopes and their corresponding MHC HLA-binding alleles was predicted.
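As a rough illustration of what a population coverage figure represents, the sketch below estimates the fraction of individuals carrying at least one allele that binds a selected epitope. It is a simplified model, not the IEDB tool's algorithm: it assumes Hardy-Weinberg proportions within each locus and independence between loci, and all allele frequencies shown are hypothetical.

```python
def locus_coverage(covered_allele_freqs):
    """Fraction of diploid individuals carrying >= 1 covered allele at a locus."""
    total = sum(covered_allele_freqs)
    if not 0.0 <= total <= 1.0:
        raise ValueError("allele frequencies must sum to a value in [0, 1]")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(loci):
    """Coverage across several loci, assuming the loci are independent."""
    uncovered = 1.0
    for freqs in loci:
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical frequencies of epitope-binding alleles at two loci
toy_loci = [[0.10, 0.20], [0.50]]
toy_coverage = combined_coverage(toy_loci)
```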

Prediction of three-dimensional (3D) structure of the epitopes and HLA proteins

After confirmation of epitope antigenicity, non-allergenicity, non-toxicity, conservancy and the availability of alleles, the 3D structures of the selected 8 CTL peptides and 3 HTL peptides were predicted using the PEP-FOLD server at the Ressource Parisienne en Bioinformatique Structurale (RPBS) Mobyle Portal [41]. The server predicted the five most probable structures for each peptide sequence. For further investigation, the predicted model with the lowest energy was chosen.

In this study, HLA-C*12:03 and HLA-DRB1*01:01 were used to validate the binding between the selected epitopes and HLA molecules. The 3D structure of HLA-DRB1*01:01 was retrieved from RCSB PDB (PDB ID 4I5B). Because the 3D structure of HLA-C*12:03 was unavailable in the PDB, the SWISS-MODEL server [42], which performs homology-based modeling, was used to predict the 3D structure of this molecule. The sequence of HLA-C*12:03 was retrieved from the IPD-IMGT/HLA Database [43] (IMGT/HLA Accession No. HLA00455). The predicted structure was validated using PROCHECK [44], MolProbity [45], ProSA-web [46] and QMEAN [47].

Molecular docking of the selected epitopes

To confirm the interaction between the selected alleles and their respective epitopes, molecular docking was carried out at the DINC 2.0 web server [48] following the protocol described elsewhere [49]. To predict the binding energy of HLA-C*12:03 with an epitope, the box centre was set at 13.7, − 60.0, and − 20.2 Å in the X, Y, and Z axes, respectively. Similarly, to predict the binding affinity of HLA-DRB1*01:01 with an epitope, the box centre was set at 39.9, 47.5 and 93.0 Å in the X, Y, and Z axes, respectively. In both cases, the box size was set to automatic (based on ligand). Results from this server were further analyzed using BIOVIA Discovery Studio [50] and UCSF Chimera [51].

Molecular dynamics simulation

Following a procedure described elsewhere [49], MD simulations were carried out employing GROMACS (v2021) [52]. For the simulation process, the CHARMM36 all-atom additive force field [53] was used. The protein was cleaned using the Dock Prep functionality of UCSF Chimera, and missing residues were then added using the Dunbrack rotamer library, employing an in-house Python script. The GROMACS pdb2gmx tool was used to add hydrogens to the protein. GROMACS was also used to generate the protein topology for the HLA and the epitope. The system was contained within a cubic box of the simple point charge extended (SPC/E) water model [53], with a minimum distance of 1.0 nm between the wall and any component of the protein. The system was neutralized by adding Na+ (sodium) and Cl− (chloride) ions to an ionic strength of 0.15 M.

For 100 picoseconds (ps), the system was equilibrated utilizing the NVT and NPT ensembles. For the next 100 ps under an isothermal ensemble, soft coupling with the Berendsen thermostat (NVT) [54] was used to progressively heat the minimized system to the target temperature. Position restraints were imposed on the ligand before performing NVT simulations to prevent the ligand from drifting away from the protein during equilibration. All of the bonds were constrained using the LINCS algorithm [55]. The NPT ensemble (constant pressure and constant temperature) was run at a temperature of 300 K, employing periodic boundary conditions (PBC).

The system was then coupled to a Parrinello–Rahman barostat [56] for an equilibration period of 100 ps at 1 bar of pressure. The Particle Mesh Ewald (PME) method was employed to process the electrostatic interactions. Short-range van der Waals interactions were computed with a cutoff (rvdW) of 1.0 nm. Simulations were run for 100 ns. GROMACS was used to compute the root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent accessible surface area (SASA) (using van der Waals volumes and radii) [57] and hydrogen bonds (H-bonds). An in-house Python script with the Matplotlib [58] and NumPy [59] libraries, as well as R (version 3.6.3) [60] with the Peptides library [61], was used to generate trajectory plots and figures.
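For readers unfamiliar with the trajectory metrics listed above, the following sketch shows how RMSD and Rg are defined. It is not the GROMACS implementation: the function names and coordinates are hypothetical, the RMSD is computed without the least-squares superposition that is normally applied first, and the code uses only the Python standard library.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two coordinate sets (no fitting)."""
    if len(coords) != len(reference) or not coords:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords, reference)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    total_mass = sum(masses)
    centre = [
        sum(m * p[i] for p, m in zip(coords, masses)) / total_mass
        for i in range(3)
    ]
    weighted = sum(
        m * sum((p[i] - centre[i]) ** 2 for i in range(3))
        for p, m in zip(coords, masses)
    )
    return math.sqrt(weighted / total_mass)


# Toy two-atom system (hypothetical coordinates in nm)
reference_frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted_frame = [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
toy_rmsd = rmsd(shifted_frame, reference_frame)
toy_rg = radius_of_gyration(reference_frame, [12.0, 12.0])
```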

Conformational B-cell epitopes identification

The conformational B-cell epitopes were identified by using the ElliPro [62] tool of the IEDB. ElliPro can identify both the linear and conformational B-cell epitopes based on the 3D structure of the given protein. In this study, the minimum PI (protrusion index) score was set to 0.5 and the maximum distance was set to 6 Å for the prediction of conformational B-cell epitopes.

Linear B-cell (LBL) epitopes identification

To predict LBL epitopes, the linear B-cell epitope prediction tool of the IEDB was used. Epitopes were predicted using the Emini surface accessibility prediction [63], BepiPred Linear Epitope Prediction 2.0 [64], Kolaskar and Tongaonkar antigenicity [65] and Karplus and Schulz flexibility prediction [66] methods available at the IEDB server. Epitopes with a length between 10 and 40 amino acids were selected. Linear epitopes predicted by ElliPro were also selected. The antigenicity, allergenicity, toxicity and conservancy of the selected epitopes were assessed as described earlier.

Immune simulation of the epitopes

In silico immune simulation was carried out at the C-ImmSim server [67] to characterize the immune response profile of the selected peptides. Two injections of the target antigen were administered 4 weeks apart, at time-steps 1 and 84 (the first dose is administered at time = 0 and each time-step corresponds to 8 h in real life), with a dose of 1000 antigen proteins each, containing no LPS. The simulation was conducted for 5000 simulation steps. Host HLAs were selected according to their frequency of occurrence; frequently occurring HLA alleles were selected to perform the study. Other simulation parameters were kept at their defaults.
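The relation between simulation time-steps and real time stated above can be checked with a short calculation. The constants come from the text; the function and variable names are hypothetical.

```python
HOURS_PER_STEP = 8          # each time-step corresponds to 8 h (from the text)
INJECTION_STEPS = (1, 84)   # time-steps of the two injections (from the text)
TOTAL_STEPS = 5000          # simulation length (from the text)


def steps_to_days(n_steps):
    """Convert a number of simulation time-steps to days."""
    return n_steps * HOURS_PER_STEP / 24


interval_days = steps_to_days(INJECTION_STEPS[1] - INJECTION_STEPS[0])
total_days = steps_to_days(TOTAL_STEPS)
```

With these values, the two doses are 83 steps apart, about 27.7 days, which matches the stated interval of roughly four weeks.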

Designing the vaccine mRNA construct

The open reading frame (ORF) of a conventional mRNA-based vaccine consists of five fundamental parts. The target antigen, also known as the gene of interest (GOI), is linked to an adjuvant by a linker. This construct is flanked by 5′ and 3′ untranslated regions (UTRs) and a terminal poly(A) tail. The 5′ end is capped with a Cap1 (m7GpppNm) structure. In this study, PstS1, which exhibited high immunogenic activity in several studies [68,69,70], was used as the GOI. As an adjuvant, the 50S ribosomal protein L7/L12 (UniProtKB accession number P9WHE3) from Mtb was used, while the signal peptide from the tissue plasminogen activator (tPA, UniProtKB accession number P00750) of Homo sapiens was also used.

Two peptide linkers were used to join the polypeptide chains. The GGGGSEAAAKGGGGS linker was used to link the GOI and the adjuvant. Another peptide linker, AAY, was used between the signal peptide and the adjuvant. In this study, the 5′ UTR from the human β-globin gene (NCBI accession number NM_000518.5) and the 3′ UTR from the rabbit β-globin gene (GenBank [71] accession number V00882.1) were used to flank the construct. A 120-nucleotide (nt) long poly(A) tail was added to complete the vaccine construct (Fig. 1B). The construct was named MT. P495.
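The layout of the construct can be summarized in a short sketch. The linker sequences and the poly(A) length are taken from the text; everything else is a placeholder or an assumption: the function names are hypothetical, the toy sequences are not the real tPA, L7/L12, PstS1 or UTR sequences, and the N- to C-terminal order (signal peptide, AAY, adjuvant, GGGGSEAAAKGGGGS, GOI) is inferred from the description of the linkers.

```python
LINKER_SIGNAL_ADJUVANT = "AAY"              # from the text
LINKER_ADJUVANT_GOI = "GGGGSEAAAKGGGGS"     # from the text
POLY_A_LENGTH = 120                         # from the text


def assemble_protein(signal_peptide, adjuvant, goi):
    """Join the polypeptide parts in the assumed N- to C-terminal order."""
    return (signal_peptide + LINKER_SIGNAL_ADJUVANT
            + adjuvant + LINKER_ADJUVANT_GOI + goi)


def assemble_mrna(utr5, orf, utr3, poly_a_length=POLY_A_LENGTH):
    """Flank the coding sequence with the UTRs and append the poly(A) tail."""
    return utr5 + orf + utr3 + "A" * poly_a_length


# Placeholder sequences (hypothetical, for illustration only)
toy_protein = assemble_protein("MDA", "MAK", "GSK")
toy_mrna = assemble_mrna("GGG", "AUGGCCUAA", "CCC")
```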

Optimizing codons and predicting secondary structure of the vaccine mRNA

Codon optimization is important for the vaccine mRNA to be efficiently translated by the host cells. Therefore, the codons of the final vaccine construct were optimized for efficient expression in human cells using several codon optimization tools: JCat [72], GeneArt Instant Designer by Thermo Fisher, GenSmart™ Codon Optimization by GenScript (GS), and the Codon Optimization Tool by Integrated DNA Technologies (IDT). The quality of the optimized codons was analyzed using the Rare Codon Analysis tool by GS. This tool predicts the translation efficiency of the mRNA, expressed as the codon adaptation index (CAI) value. It also detects any tandem unusual codons, shown as the codon frequency distribution (CFD). Based on these parameters, the best-optimized sequence was chosen for further assessment.
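The CAI mentioned above is, in its standard definition, the geometric mean of the relative adaptiveness weights of the codons in a coding sequence. The sketch below illustrates that definition only; it does not reproduce the GenScript tool, the function name is hypothetical, and the weight table is a toy example rather than human codon usage data.

```python
import math


def codon_adaptation_index(cds, weights):
    """Geometric mean of the relative adaptiveness of the codons in `cds`.

    Codons missing from `weights` are skipped, which mirrors the usual
    exclusion of Met, Trp and stop codons from the calculation.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy weights for alanine codons (hypothetical values)
toy_weights = {"GCC": 1.0, "GCU": 0.5}
cai_optimal = codon_adaptation_index("GCCGCC", toy_weights)
cai_mixed = codon_adaptation_index("GCCGCU", toy_weights)
```

A sequence built only from the most-used codon of each amino acid scores 1.0, and any less-favoured codon lowers the value.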

The secondary structure of the mRNA construct was predicted using the RNAfold tool of ViennaRNA Package 2.0 [73]. Both the minimum free energy (MFE) structure and the centroid secondary structure of the mRNA were obtained from this tool along with their MFE.

Physicochemical properties assessment of the vaccine peptide

For the assessment of physicochemical properties, several bioinformatics tools were used. Because the tPA signal sequence will be cleaved by the protease, it was excluded, and only the adjuvant and the target antigen were assessed. To predict the antigenicity of this peptide segment, VaxiJen v2.0 [33] and ANTIGENpro [74] were used. Allergenicity of the peptide was predicted using AllerTOP v.2.0 [34]. The toxicity of the peptide was predicted using ToxinPred [35]. Various physicochemical properties [i.e., theoretical isoelectric point (pI), instability index (II), aliphatic index (AI), and grand average of hydropathicity (GRAVY)] were predicted using ProtParam [75]. Adhesin probability was checked using Vaxign [111].

All the amino acids of PstS1 interacting with those antibodies were also identified in this study as conformational B-cell epitopes. MSA showed that most of the interacting residues were 100% conserved in all the isolates of Mtb. Twenty-two linear epitopes were also identified and further analyzed for their antigenic, allergenic and toxicity profiles. After analysis, eight epitopes were found to be safe and immunogenic. Among them, several were found to overlap with previously identified epitopes [112, 113] (Supplementary Table 4). This evidence suggests that PstS1 would be able to efficiently stimulate B-cells.

In this study, a theoretical mRNA vaccine, MT. P495, was delineated against Mtb. To construct the MT. P495 mRNA, PstS1 was used as the GOI. To generate a stronger immune response, an adjuvant was linked to the GOI, as adjuvants play an important role in improving humoral and cellular immune responses [114, 115]. The 50S ribosomal protein L7/L12 from Mtb was used as the adjuvant; it is itself a TLR-4 agonist and has the potential to induce dendritic cell maturation [116]. The translation efficiency is also influenced by the length of the poly(A) tail. mRNAs with an A120 tail showed increased, prolonged expression of the protein, and a poly(A) tail longer than A120 did not show any significant effect on the expression of the target protein [117]. UTRs from two different genes were used in this vaccine construct: the 5′ UTR of the human β-globin gene and the 3′ UTR of the rabbit β-globin gene. These UTRs are important for the efficient translation of the mRNA. The UTRs of the human β-globin gene can increase the efficiency of mRNA translation [118]. The 3′ UTR of the rabbit β-globin gene can influence seroconversion, antibody titer and cytokine profiles [117, 119]. To cap the 5′ end, Cap1 (m7GpppNm) was used. The half-life of mRNA is also increased by the synergistic effect of the 5′ cap and the 3′ poly(A) tail [120]. The signal peptide, obtained from the tPA of H. sapiens, can improve the immunogenicity of the vaccine [121]. Two different linkers were used to join the components of the vaccine construct. Linkers join two polypeptide chains (the protein moieties) [122] and are also used to increase the stability of the final product of the mRNA, the protein [122, 123]. The linker GGGGSEAAAKGGGGS can increase both the thermal and the pH stability of the fused protein [124]. The other linker, AAY, is an in vivo cleavable linker that acts as a proteasomal cleavage site [125]. The final length of the mRNA construct was 1856 nts.

The codons of the delineated mRNA were optimized for efficient translation and better expression in the host (Supplementary Fig. S9; Supplementary Table 5). The properties of the optimized sequence suggested that host cells will efficiently express the mRNA. No tandem unusual codons were identified which is also an indicator of efficient translation. For optimization of the codons, we integrated two different codon optimization tools. In vivo experiments also suggest that the combined use of these two tools can increase protein expression efficiency [126]. The optimized codons have been used for the prediction of the secondary structure of the mRNA. Predicted free energy and the structures indicated that the mRNA will be stable.

The physicochemical properties of the vaccine peptide were assessed to evaluate the safety of the vaccine (Supplementary Table 6). The peptide was predicted to be antigenic, non-allergenic, and non-toxic, suggesting that it is likely to be safe for human use. Since the protein is predicted to have a long half-life in mammalian reticulocytes, it is likely to remain stable inside the host cells and be bioavailable for a long time. After the safety assessment, the structure of the vaccine peptide (Fig. 4) was predicted, and molecular docking was performed to assess its ability to bind TLRs and MR. Both ligand peptides (adjuvant-linked GOI and GOI alone) could bind the receptors with high affinity (Table 2). The vaccine peptide was predicted to bind the receptors with a higher affinity than the GOI alone, about 100 times more tightly. The stability of the receptor–ligand complex was then checked (Fig. 5) using the vaccine–TLR4 complex. The deformability graph and covariance map indicated that the complex will be stable.
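To put the 100-fold difference in binding into energetic terms, the standard relation ΔΔG = RT ln(fold change) can be used. The sketch below is an illustration under two assumptions that are not stated in the text: that "more tightly" refers to a ratio of dissociation constants, and that the temperature is 298.15 K. The function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Free-energy gap (kcal/mol) matching a fold difference in Kd."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold_change)


ddg_100_fold = ddg_from_fold_change(100.0)
```

Under these assumptions, a 100-fold tighter interaction corresponds to a binding free energy that is lower by roughly 2.7 kcal/mol.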

In vitro transcription (IVT), rather than an in vivo method, is a popular and preferred method for producing mRNA on a large scale [127]. In this study, promoter and termination sequences specific to the T7 polymerase were used for the efficient synthesis of RNA by IVT. Maintaining a homogeneous poly(A) tail length is a major problem of this method: the poly(A) tails often vary in length when a circular plasmid is used. To solve this problem, the pJAZZ-OK® linear plasmid was used as the cloning vector [84]. The total length of the recombinant plasmid was 14,977 bp.

All of the results above, as well as previous research on humans as hosts, suggest that the proposed mRNA vaccine candidate, MT. P495, will probably elicit a strong immune response, specific against Mtb. In order to develop a viable Mtb vaccine in the future, this modeled mRNA is an excellent vaccine model that can be readily employed for laboratory testing, including in vitro as well as in vivo studies.