Introduction

It has been about 2.5 years since the first case of novel coronavirus pneumonia was reported. Since then, millions of COVID-19 cases, including many deaths, have been reported to the WHO [1]. Although this is not the first time humans have been hit by a coronavirus, no previous coronavirus epidemic has spread as widely [2, 3]. The virus that caused this outbreak was named SARS-CoV-2 [4]. The pandemic has affected health, the economy, culture, and politics around the world [5, 6]. Drug development therefore needs to be accelerated to bring the pandemic under control.

SARS-CoV-2 is an enveloped β-coronavirus whose particles appear round or oval. The viral genome contains about 30,000 nucleotides and, from 5′ to 3′, encodes two polyproteins and four structural proteins (spike, envelope, membrane, and nucleocapsid proteins), interspersed with several accessory proteins [7, 8]. Mpro (nsp5) and PLpro (a domain in nsp3) cleave the two polyproteins into 16 non-structural proteins, which are crucial for SARS-CoV-2 replication and immune evasion [9,10,11,12].

Because its function closely resembles that of the 3C protease of picornaviruses, Mpro is also called 3C-like protease. Mpro has a Cys-His catalytic dyad, unlike other chymotrypsin-like proteases and many serine or cysteine proteases. It consists of 306 amino acids, and most of its cleavage sites have the form Leu-Gln↓(Ser, Ala, Gly), where ↓ marks the cleaved bond. The mass of the protease determined by mass spectrometry is 33,797.0 Da [13]. The Mpro structure is divided into three regions: region 1 (residues 8-101) and region 2 (residues 102-184) have an antiparallel β-barrel structure, region 3 (residues 201-303) contains five α-helices, and regions 2 and 3 are connected by a long loop (residues 185-200). Mpro is active as a dimer and inactive as a monomer, and region 3 plays a crucial role in dimerization [14]. SARS-CoV-2 Mpro shares 96% sequence identity with SARS-CoV Mpro. However, SARS-CoV-2 carries two substitutions, T285A and I286L, which bring the three regions closer together and ultimately increase catalytic turnover [15, 16].

The substrate-binding pocket of the protease lies between regions 1 and 2, and its active site is made up of four subpockets [17, 18] (Fig. 1). SARS-CoV-2 Mpro has also been found to have a novel role: it may directly cleave host proteins involved in the normal immune response, including NLRP12 and TAB1. This nsp5-mediated cleavage of NLRP12 and TAB1 may explain the elevated cytokine levels and heightened inflammatory response seen in patients [19].

Fig. 1 Pocket of Mpro

Mpro is an attractive target for antiviral drug development because of its functional importance and the lack of close human homologues [20].

So far, the FDA has approved one new oral coronavirus drug, Paxlovid, which combines the Mpro inhibitor nirmatrelvir with ritonavir, an inhibitor of the cytochrome P450 enzyme CYP3A4 [21]. In addition, S-217622, developed by Japan’s Shionogi & Co., Ltd, has entered clinical trials. The drug has high metabolic stability, high oral bioavailability, and low clearance, and it is a promising drug candidate against SARS-CoV-2 [22]. Several non-covalent inhibitors are listed in Table 1 [23,24,25,26].

Table 1 Non-covalent inhibitors

Given the severity of transmission of the new coronavirus, drugs are still needed to fight or live with COVID-19. This paper identifies potential non-covalent inhibitors of Mpro, which lays a foundation for the design of novel non-covalent Mpro inhibitors.

Results

Sequence alignment and protein structure alignment

First, we compared the amino acid sequences of the main protease from five variants of SARS-CoV-2 and found that the sequence identity was 99.8% [28] (Fig. 2). Two mutations were observed, K90R and P132H. We then retrieved the crystal structures of these two mutants (PDB: 7U29 [24], 7TVX [29]) and of wild-type Mpro (PDB: 7RNK [30]) from the RCSB PDB and compared them. Superposing the three structures with their active pockets highlighted shows that the changes in the active pocket are almost negligible (Fig. 3). Pairwise structure alignment [31] gives an RMSD of 0.5 Å and a TM-score of 0.99 between 7RNK and 7U29 (K90R), and an RMSD of 0.35 Å and a TM-score of 0.98 between 7RNK and 7TVX (P132H). These results suggest that Mpro is highly conserved across multiple SARS-CoV-2 variants.

Fig. 2 Amino acid sequence alignment

Fig. 3 Superposition of the three crystal structures and minor differences in the active pocket

Drug-like screening and structure-based virtual screening

We selected 5365 drug-like compounds from 15,033 compounds (libraries L5610 and L5600 from TargetMol); the drug-likeness criteria were taken in part from ADMETlab 2.0 [32]. Before the structure-based virtual screening, we used the PDB structures 6LU7 and 6W79 to validate the docking protocol [33]. By repeatedly docking X77 into Mpro from 6LU7, we found that Qvina2.1 [20, 34] was reliable in evaluating non-covalent binding between small molecules and proteins. The RMSD between X77 in the crystal structure and the highest-scoring docked conformation is 1.275 Å, with a docking score of −8.3 kcal/mol, which supports the reliability of Qvina2.1 (Fig. 4).
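The redocking check compares a docked pose with the crystallographic pose of the same ligand. A minimal sketch of that comparison follows, assuming both poses are expressed in the same receptor frame and list their atoms in the same order; the function name and the coordinates are illustrative placeholders, not X77 data.

```python
import math

def pose_rmsd(ref, pose):
    """RMSD between two poses without superposition (matched atom order)."""
    if len(ref) != len(pose) or not ref:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom_ref, atom_pose in zip(ref, pose)
        for a, b in zip(atom_ref, atom_pose)
    )
    return math.sqrt(squared / len(ref))

# Toy coordinates in Å: the "docked" pose is the "crystal" pose shifted by 1 Å in z.
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(round(pose_rmsd(crystal, docked), 3))  # 1.0
```

A widely used rule of thumb treats a redocked pose within about 2 Å of the crystal pose as a successful reproduction. Symmetry-equivalent atoms are not handled by this sketch.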

Fig. 4 X77 and Mpro docking by Qvina2.1 (cyan: crystal structure, green: docking structure)

Using Qvina2.1, we selected 901 of the 5365 drug-like compounds with a docking score below −8.0 kcal/mol. The eight compounds with the best (most negative) docking scores are listed in Table 2, together with their docking score, MW, logP, nHA, nHD, and TPSA. The correspondence between compound numbers and compound names is given in the supplementary materials (Table S1).

Table 2 Top eight compounds screened by Qvina2.1

Pharmacophore modeling and pharmacophore screening

Because several active Mpro inhibitors are known, we selected a set of crystal structures from the RCSB PDB (PDB: 6lze, 6m0k, 6wtj, 6y2f, 7jyc, 7k0g, 7k0h, 7k40, 7lbn, 7ltj, 7lzv, 7m04, 7tdu, 7tfr, 7tll [35,36,37,38,39,40,41,42,43]) to define the pharmacophore features of the inhibitors, including hydrogen bond donors (HD) and acceptors (HA), aromatic rings (AR), and covalent bond (CV) features (Fig. 5). Almost all inhibitors contain hydrogen bond donors, hydrogen bond acceptors, and covalent bond features, differing only in their number. The inhibitor in 7ltj is unusual in that it contains an aromatic ring that forms a π-π stacking interaction with SARS-CoV-2 Mpro. Besides the feature types, the pharmacophore models also capture the diversity of feature positions.

Fig. 5 Pharmacophore modeling by AncPhore

We also used AncPhore to generate low-energy conformations of the 901 compounds selected by Qvina2.1, with at most ten conformations per compound. The ten compounds with the highest APScore [44] were then selected from this conformation database (Table 3). They closely match the pharmacophore features of 7ltj, whose ligand is Mcule-5948770040 (compound 9000 is shown as an example in Fig. 6). The high match rate for 7ltj may arise because AncPhore assigns Mcule-5948770040 relatively few interactions with Mpro, which would make its pharmacophore model easier to satisfy.

Table 3 Top ten compounds screened by Qvina2.1 combined with AncPhore

Fig. 6 Compound 9000 matched with Mcule-5948770040

Molecular dynamic simulation

RMSD

The chemical structure of X77 was extracted from the co-crystal structure of Mpro with this non-covalent inhibitor (PDB: 6w79 [27]), processed, and docked into Mpro (PDB: 6lu7). The pose with the best Qvina2.1 score was used as the initial conformation of the control group for molecular dynamics simulation. RMSD measures how far a structure deviates from a reference. From each 100-ns MD trajectory, we extracted the RMSD of the complex relative to the Mpro backbone (Fig. 7a) and the RMSD of the Mpro backbone itself (Fig. 7b). We also extracted the RMSD of each ligand relative to its position in the initial conformation of the complex (Fig. 8) to examine whether the ligand remains stable in its initial position. In all complexes, the RMSD of the complex followed the same trend as the RMSD of the Mpro backbone, so only the RMSD trends of the complex and the ligand are discussed here.

Fig. 7 a The RMSD of each complex relative to the Mpro backbone in the 100-ns MD. b The RMSD of the Mpro relative to itself in the 100-ns MD

Fig. 8 The RMSD of ligands relative to ligands in the 100-ns MD

The RMSD of the complexes of Mpro with eight compounds (2246, 2308, 2933, 3717, 5606, 481, 2432, 2908) remained stable throughout the 100-ns simulation. For 1047 (45 ns), 4368 (55 ns), 2490 (22 ns), 3083 (40 ns), 5125 (15 ns), and 9000 (65 ns), the RMSD was low at first, rose at a certain point, and then remained stable (the time at which the RMSD stabilized after rising is given in parentheses). The RMSD of the Mpro-783 complex fluctuated during the first 50 ns and was stable during the last 50 ns. The Mpro-1179 complex had a lower RMSD between 60 and 80 ns and was stable otherwise. The RMSD of the Mpro-1543 complex increased steadily during the first 50 ns, remained stable from 50 to 85 ns, and then decreased slightly. The most interesting complex is Mpro-9328, whose RMSD rose and fell sharply between 32 and 35 ns. This reflects a short period of instability that did not affect the subsequent RMSD, which remained stable for a long time. Finally, the RMSD of the control complex Mpro-X77 increased between 35 and 65 ns but was low and stable for the rest of the simulation.

The ligand RMSD shows that almost none of the compounds stayed in their initial conformation. Some compounds (1047, 2308, 481) held one conformation initially and then crossed an energy barrier to settle into another low-energy conformation. Others (4368, 5125, 3083, X77) were initially unstable and then held a stable conformation for a long period. Most compounds (2246, 2933, 783, 1179, 2432, 1543, 5606, 9328, 2908, 9000, 3717) fluctuated, although some of them (1179, 3717, 5606, 2908) became conformationally stable in the final period. Such RMSD behavior is probably normal, because many highly flexible small molecules have no single stable conformation but rather a large number of low-energy conformations separated by low energy barriers that are easily crossed.

Root mean square fluctuation analysis

RMSF measures how much each atom fluctuates around its average position over time, and it characterizes the flexibility and mobility of the protein’s amino acid residues over the whole simulation. The RMSF results (Fig. 9) show that compound 9328 perturbs the protein residues to a greater extent than the other compounds, which is consistent with the RMSD results for its complex. The effects of the other compounds on residue flexibility were similar to those of X77.
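As an illustration of the quantity being plotted, the sketch below computes the RMSF of a single atom as the root mean square deviation of its position from its time-averaged position. It assumes the frames have already been fitted to a common reference; the function name and the two-frame toy track are placeholders, not simulation data.

```python
import math

def rmsf(positions):
    """RMSF of one atom from a list of (x, y, z) positions, one per frame."""
    n = len(positions)
    if n == 0:
        raise ValueError("at least one frame is required")
    # Time-averaged position of the atom.
    mean = [sum(p[k] for p in positions) / n for k in range(3)]
    # Mean squared deviation from that average position.
    msd = sum(
        sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in positions
    ) / n
    return math.sqrt(msd)

# Toy track: the atom alternates between two points 2 Å apart.
track = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(rmsf(track))  # 1.0
```

Per-residue values such as those in an RMSF plot are typically obtained by applying the same calculation to each residue's atoms and averaging or selecting a representative atom.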

Fig. 9 The RMSF of Mpro amino acid residues in the 100-ns MD

H_Bond number

Hydrogen bonds are crucial in maintaining the stability of the complex. We counted the number of hydrogen bonds formed between each compound and Mpro over the whole simulation (Fig. 10). Compounds 2308, 4368, 5606, 3083, and 1543 formed at least as many hydrogen bonds with Mpro as X77 did. The number of hydrogen bonds with Mpro increased from 30 ns for 3717 and from 60 ns for 2490, and decreased from 30 ns for 783 and from 50 ns for 1047. The remaining compounds formed fewer hydrogen bonds with Mpro.

Fig. 10 H_Bond number between ligands and Mpro in the 100-ns MD

FEL (free energy landscape)

The free energy landscape (FEL) shows how the free energy of a complex varies with its conformation. It is drawn from the part of the trajectory in which the RMSD of the complex is stable; a stable molecule generally has only one energy well, from which the characteristic structure of the complex can be identified. We therefore used the trajectory after 50 ns for each complex and plotted the free energy landscape with Rg on the abscissa and RMSD on the ordinate (Fig. 11). Rg is the radius of gyration of the complex, which characterizes how the compactness of the structure changes during the simulation.

Fig. 11 FEL of all compounds and corresponding characteristic conformation (red: initial conformation in simulation, cyan: characteristic low-energy conformation corresponding to time)

In each map, the color scale from blue to red indicates increasing free energy, so the blue low-energy regions correspond to the more stable conformations. We extracted the low-energy conformations indicated in each free energy landscape separately. The complexes of compounds 1047, 5606, and 9328 with Mpro show two low-energy conformations, while the other complexes have only one. Nevertheless, all compounds resemble X77 in that their binding to Mpro leads to a similar inhibited state of the complex.

MM-GBSA

We also used the trajectory after 50 ns to calculate the binding free energy between each ligand and Mpro (Table 4). The binding energies of 1543, 2308, and 5606 with Mpro are slightly lower than that of X77, indicating that these three compounds interact with Mpro slightly more strongly than X77 does. The binding energies of 3717 and 9000 are slightly higher than that of X77, suggesting that these two compounds bind Mpro slightly more weakly, while the binding free energies of the other complexes are reasonable. Notably, the electrostatic interaction of 1047 and of 481 with Mpro is positive: each docked small molecule carries a negative charge, as does the protein, so the two repel each other in the gas phase. Because these two compounds are negatively charged, their polar solvation energy is also larger in magnitude, which appears as a more negative ΔEGB. Compounds 2246 and 9000 are positively charged after docking, which explains the energy decomposition terms for these two compounds.

Table 4 Binding free energy of complex

Interaction between Mpro and its potential inhibitors

Based on the MD analysis and MM-GBSA results, we identified five potential inhibitors of Mpro: 1543, 2308, 3717, 5606, and 9000. We then analyzed the interactions between these five compounds and Mpro (Fig. 12). In the Mpro-1543 complex, Glu166 forms a hydrogen bond with the sulfur-oxygen bond of the compound, Gln192 forms a hydrogen bond with the hydroxyl group of 1543, and the catalytic residue His41 forms a π-π stacking interaction with one of the benzene rings of 1543. In the Mpro-2308 complex, the amide group of 2308 plays an important role by forming hydrogen bonds with multiple residues of the main protease. In the Mpro-3717 complex, the sulfur-oxygen bond acts as a hydrogen bond acceptor, forming hydrogen bonds with Gly143, Ser144, His163, and His172; in addition, the sulfur atom forms a π-sulfur interaction with the imidazole ring of His172. In the Mpro-5606 complex, π-alkyl interactions predominate. In the Mpro-9000 complex, the most distinctive interaction is a halogen bond, which arises from the fluorine in the compound, with Thr26 acting as the electron donor.

Fig. 12 Interaction between Mpro and potential inhibitors

Discussion

In this study, we demonstrated the conservation of Mpro and preliminarily screened 18 compounds based on the structures of known Mpro inhibitors. Through further molecular dynamics simulation and MM-GBSA binding free energy calculation, we identified five potential inhibitors of Mpro and analyzed their interactions with Mpro. These five compounds interact with Mpro in more diverse ways than previously discovered non-covalent inhibitors, owing to the diversity of functional groups in the compounds themselves. Like other work of this type, we carried out a complete virtual screening workflow against SARS-CoV-2 Mpro; the main difference lies in the screening methods chosen. Some studies use classic QSAR screening, and others design small molecules de novo. The force fields and software used in the MD simulation stage also differ between studies, although GROMACS remains well suited to simulating biological systems. A limitation of this work is that we were unable to test the screened compounds in enzyme activity assays. Nevertheless, the computational experiments provide a detailed view of the characteristics of the potential inhibitors.

Three compounds (1047, 3717, 5606) have a sulfur-oxygen bond and thus provide hydrogen bond acceptors and sulfur atoms. Compound 9000 has a trifluoromethyl group and a cyano group, which makes it very similar to nirmatrelvir, but because of its position the cyano group seems to have only weak van der Waals interactions with the main protease. We also found that the positive control X77 occupies almost the entire active pocket of SARS-CoV-2 Mpro. The two sulfur-containing residues of Mpro, Met49 and Cys145, form π-sulfur interactions with the aromatic ring of X77, and the N and O atoms of X77 mainly act as hydrogen bond acceptors. The potential inhibitors we identified do not completely occupy the active pocket of the enzyme but do provide interactions outside it. Overall, the structures of these compounds offer some ideas for drug design.

Materials and methods

Compounds and database

The compound database is from TargetMol, and the compound files are from ChemDiv.

Sequence alignment

The nucleotide sequences of SARS-CoV-2 variants were downloaded from NCBI. These variants include B.1.1.7 (α), B.1.351 (β), P.1 (γ), B.1.617.2 (δ), BA.1 (ο), BA.1.1 (ο), BA.2 (ο), and BA.3 (ο). The nucleotide sequence of each variant was translated into an amino acid sequence using the ExPASy Translate tool, and DNAMAN was used for amino acid sequence alignment.
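For readers who want to reproduce an identity figure from an alignment, the sketch below shows one common definition of pairwise percent identity: identical positions divided by aligned positions, ignoring columns where either sequence has a gap. This is an assumption for illustration; DNAMAN may define identity differently, and the two short strings are toy fragments rather than full Mpro sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences, skipping gapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

# Toy fragments differing by one K -> R substitution at the fifth position.
reference = "SGFRKMAFPS"
variant = "SGFRRMAFPS"
print(percent_identity(reference, variant))  # 90.0
```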

Drug-like screening

For drug-like screening, we used the predicted physicochemical properties in the sdf files of the compounds and applied part of the drug-likeness criteria of ADMETlab 2.0, i.e., 100 < MW < 600, 0 < nHA < 12, 0 < nHD < 7, 0 < TPSA < 140, and 0 < logP < 3.
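The criteria above amount to a set of strict range checks on five properties. A minimal sketch of such a filter is given below; the dictionary keys mirror the property names used in the text, while the function name and the two example property sets are hypothetical and do not correspond to compounds in the library.

```python
# Bounds quoted in the text; every inequality is strict.
DRUG_LIKE_BOUNDS = {
    "MW": (100, 600),
    "nHA": (0, 12),
    "nHD": (0, 7),
    "TPSA": (0, 140),
    "logP": (0, 3),
}

def is_drug_like(props):
    """True if every property lies strictly inside its allowed range."""
    return all(low < props[name] < high
               for name, (low, high) in DRUG_LIKE_BOUNDS.items())

# Hypothetical property sets for illustration only.
passes = {"MW": 350.4, "nHA": 5, "nHD": 2, "TPSA": 78.9, "logP": 2.1}
fails = {"MW": 350.4, "nHA": 5, "nHD": 2, "TPSA": 78.9, "logP": 4.2}
print(is_drug_like(passes), is_drug_like(fails))  # True False
```

Because the lower bounds are strict, a compound with no hydrogen bond donors (nHD = 0) would be excluded under the criteria as written.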

Preparation for docking and virtual screening

We chose chain A of 6LU7 as the receptor. The 6LU7 structure was downloaded from the RCSB PDB website and pretreated with PyMOL to remove water molecules, chain B, and the N3 ligand; AutoDock Tools was then used to add hydrogens to the protein, calculate charges, and set atom types. The small molecules from the L5610 and L5600 libraries of TargetMol were converted into 3D structures with Open Babel and prepared with the prepare_ligand4.py script from AutoDock Tools. The grid box was centered at (x = −10.351, y = 13.549, z = 69.904) with a size of (x = 22.5, y = 28.5, z = 26.25).

We then validated the docking model using the processed receptor and ligands, and wrote a script for virtual screening with Qvina2.1 using energy_range = 3, num_modes = 9, and exhaustiveness = 24.
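The grid box and search settings can be collected into a Vina-style configuration file, the format that QuickVina 2 shares with AutoDock Vina. The sketch below only builds the file text; it is not the authors' screening script, and the function name and file names are hypothetical placeholders.

```python
# Grid box and search settings quoted in the text.
CENTER = {"x": -10.351, "y": 13.549, "z": 69.904}
SIZE = {"x": 22.5, "y": 28.5, "z": 26.25}
SEARCH = {"energy_range": 3, "num_modes": 9, "exhaustiveness": 24}

def build_config(receptor, ligand, out):
    """Return the text of a Vina-style config file for one ligand."""
    lines = [f"receptor = {receptor}", f"ligand = {ligand}", f"out = {out}"]
    lines += [f"center_{axis} = {value}" for axis, value in CENTER.items()]
    lines += [f"size_{axis} = {value}" for axis, value in SIZE.items()]
    lines += [f"{key} = {value}" for key, value in SEARCH.items()]
    return "\n".join(lines) + "\n"

# File names below are hypothetical placeholders.
config_text = build_config(
    "6lu7_chainA.pdbqt", "ligand_0001.pdbqt", "ligand_0001_out.pdbqt"
)
print(config_text)
```

A screening loop would write one such file (or reuse one and vary the ligand) per compound and pass it to the docking executable through its config option; the executable name depends on the local installation.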

AncPhore screening based on Mpro inhibitor structures

We used AncPhore [27] to generate up to 10 conformations for each of the 901 small molecules selected by Qvina2.1. We also downloaded several co-crystal structures of Mpro and its inhibitors from the RCSB PDB, including 6lze, 6m0k, 6wtj, 6y2f, 7jyc, 7k0g, 7k0h, 7k40, 7lbn, 7ltj, 7lzv, 7m04, 7tdu, 7tfr, and 7tll. The correspondence between PDB IDs and inhibitor names is given in the supplementary materials (Table S2). Each complex was processed with PyMOL to save the protein in pdb format and the ligand in mol2 format; individual pharmacophore models were then generated with AncPhore and combined to screen the small molecule database.

Molecular dynamic simulation and post-MD analysis

In the MD simulations, the force field used was Amber14SB + GAFF. RESP2 charges of the small molecules were calculated using Multiwfn in combination with ORCA [26, 45,46,47]. GROMACS (5.1.2) was used to set up the system in a dodecahedral box of TIP3P water with a distance of 1 nm between the solute and the box edge. The complexes were neutralized with 0.15 M NaCl. We then performed energy minimization with the steepest descent method until the maximum force fell below 100 kJ/mol/nm [48].

In the NVT phase, the temperature was raised from 0 to 300 K over 100 ps and then held for a further 100 ps. Bonds involving hydrogen atoms were constrained with LINCS. The V-rescale thermostat was used without pressure coupling. The Verlet cutoff scheme was used, with periodic boundary conditions in all directions. Subsequently, NPT equilibration was run for 500 ps with the Berendsen barostat and without generating initial velocities [49]. The production MD run was 100 ns long, with the barostat changed to Parrinello-Rahman and the thermostat unchanged.

From the trajectories we extracted the RMSD of the protein, small molecule, and complex; the RMSF of the protein residues; the number of hydrogen bonds formed between the small molecule and the protein; the rdf for the last 50 ns of the simulation; and the radius of gyration of the protein. We then combined the Rg (with the x, y, and z components removed) and RMSD of the complex and generated the free energy landscape files with the gmx sham command. The xpm2png.py script was used to render the free energy landscape, and the shamlog and bindex files were used to locate the conformation of the complex at the lowest-energy point.
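Conceptually, a free energy landscape over two coordinates is obtained by binning the sampled (Rg, RMSD) pairs and converting bin populations to free energies with G = −kT ln(P/Pmax), so that the most populated bin sits at zero. The sketch below illustrates this relation in plain Python; it is not the gmx sham implementation, and the function name, bin count, and toy data are assumptions.

```python
import math
from collections import Counter

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)

def free_energy_landscape(rg, rmsd, bins=2, temperature=300.0):
    """Map each populated (Rg bin, RMSD bin) to G in kJ/mol relative to the most populated bin."""
    if len(rg) != len(rmsd) or not rg:
        raise ValueError("rg and rmsd must be non-empty and equally long")

    def bin_index(value, low, high):
        if high == low:
            return 0
        return min(int((value - low) / (high - low) * bins), bins - 1)

    rg_low, rg_high = min(rg), max(rg)
    rmsd_low, rmsd_high = min(rmsd), max(rmsd)
    counts = Counter(
        (bin_index(x, rg_low, rg_high), bin_index(y, rmsd_low, rmsd_high))
        for x, y in zip(rg, rmsd)
    )
    most_populated = max(counts.values())
    kt = KB * temperature
    return {cell: -kt * math.log(n / most_populated) for cell, n in counts.items()}

# Toy data in nm: four frames in one basin, one frame elsewhere.
rg_values = [2.20, 2.21, 2.20, 2.21, 2.30]
rmsd_values = [0.20, 0.21, 0.20, 0.21, 0.35]
fel = free_energy_landscape(rg_values, rmsd_values)
print(min(fel, key=fel.get))  # (0, 0), the most populated bin
```

The frames falling in the lowest-energy bin are the candidates for the characteristic conformation of the complex.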

Binding free energy calculation

We used gmx_MMPBSA [50] with an input file for MM-GBSA calculation that set a start time of 50 ns and an end time of 100 ns, covering 5000 frames and sampling every 10th frame. All other settings were left at their defaults.

In summary, the binding free energy is calculated as follows:

$$\Delta {G}_{bind}=\left(\Delta {G}_{vdw}+\Delta {G}_{ele}\right)+\left(\Delta {G}_{solv,polar}+\Delta {G}_{solv,nonpolar}\right)$$
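The decomposition can be read as a gas-phase part (van der Waals plus electrostatic) and a solvation part (polar plus non-polar). The sketch below sums the four terms with hypothetical values in kcal/mol, chosen only to show how an unfavorable (positive) electrostatic term can be offset by a favorable polar solvation term; they are not taken from Table 4.

```python
def binding_free_energy(vdw, ele, solv_polar, solv_nonpolar):
    """Sum the four terms of the binding free energy equation."""
    gas_phase = vdw + ele
    solvation = solv_polar + solv_nonpolar
    return gas_phase + solvation

# Hypothetical terms (kcal/mol) for a ligand with the same charge sign as the protein.
terms = {"vdw": -40.0, "ele": 25.0, "solv_polar": -10.0, "solv_nonpolar": -5.0}
delta_g = binding_free_energy(**terms)
print(delta_g)  # -30.0
```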

Data analysis

GraphPad Prism 9.0 was used to process the data, and PyMOL and Discovery Studio Visualizer were used for visualization.