1 INTRODUCTION

Thrombotic and hemorrhagic complications caused by disorders of the blood coagulation system occur in oncology, hematology, immunology, and other fields of medicine, and are a cause of mortality and disability [1]. The development of new anticoagulants capable of preventing pathological conditions without disrupting normal hemostasis is therefore an urgent task.

The blood coagulation system is a cascade of biochemical reactions that sequentially activate blood coagulation factors (proteins), which ultimately leads to the conversion of water-soluble fibrinogen (factor I) into insoluble fibrin (factor Ia) and the formation of a clot. Low molecular weight inhibitors of thrombin (factor IIa) and factor Xa are currently widely used as anticoagulants, but they have a number of disadvantages, of which bleeding is the most dangerous. Blood coagulation factors XIIa and XIa are among the most promising therapeutic targets for creating a new generation of anticoagulants, since a deficiency of these factors protects against thrombosis without causing spontaneous bleeding [1–4].

The cascade of biochemical reactions can be activated via two pathways: the extrinsic (tissue factor) pathway and the intrinsic (contact) pathway [1]. The position of coagulation factor XIIa in the cascade makes it possible to create a new class of anticoagulants based on factor XIIa inhibitors that block the intrinsic pathway of coagulation activation without disturbing the chain of plasma hemostasis reactions activated by the extrinsic pathway [1–4]. Laboratory animals with factor XII deficiency, including mice, rabbits, and even primates, are either completely protected from the development of induced thrombosis or suffer significantly less from its consequences.

At the initial stage of the search for inhibitors of a given target protein, molecular docking is widely used [5, 6]. Docking can be used only if the three-dimensional structure of the target protein is known. Until recently, however, no three-dimensional structure of factor XIIa was available. In a recent study [7], the crystal structures of factor XIIa were determined in complexes with two inhibitors: a non-covalent benzamidine inhibitor and a covalent synthetic inhibitor. Based on these structures, a three-dimensional model of factor XIIa can be created and used to search for its inhibitors by molecular modeling.

A variety of inhibitors of coagulation factor XIIa have been tested in in vitro and in vivo experiments, including monoclonal antibodies [8, 9], natural peptide or protein inhibitors [10], a synthetic peptide macrocycle inhibitor [11], an RNA aptamer [12], a small interfering RNA [13], and an antisense oligonucleotide [14]. Various experimentally confirmed inhibitors of factor XIIa, including low molecular weight inhibitors, are presented in the review [15]. Five of the most active low molecular weight inhibitors from [15] are shown in Fig. 1.

Fig. 1. Structures and activities of several inhibitors of factor XIIa from [15].

Selective but weak inhibitors of factor XIIa based on 3,6-disubstituted coumarins were found using scaffold-based discovery [16, 17]. A study [18] identified seventeen weak low molecular weight factor XIIa inhibitors with IC\({}_{50}\) values below 50 \(\mu\)M using virtual high-throughput screening, where IC\({}_{50}\) is the inhibitor concentration that reduces the activity of the target protein (here, factor XIIa) by 50\(\%\). To date, none of the existing factor XIIa inhibitors has entered clinical trials in humans. Thus, the development of low molecular weight factor XIIa inhibitors is at an early stage, and the task of creating anticoagulants based on such inhibitors is extremely urgent, especially given the important role of anticoagulants in the treatment of COVID-19.

In this study, virtual screening of a database of about 19000 drug-like low molecular weight organic compounds was carried out using docking to search for non-covalent inhibitors of factor XIIa. Several hundred molecules with the best docking scores, which estimate the protein-ligand binding free energy, were subjected to quantum-chemical calculations of the enthalpy of protein-ligand binding. The sixteen best candidate inhibitors of factor XIIa were selected for in vitro experimental testing.

2 MODEL DESCRIPTION

The reliability of docking-based virtual screening for inhibitors of a given target protein strongly depends on the quality of the three-dimensional models of the target protein and ligands, and on the quality of the docking and post-processing programs.

2.1 Protein Model

To create a model of coagulation factor XIIa, we used experimental data from the Protein Data Bank (PDB) [19]. The PDB contains 3D structures of crystallized proteins and their complexes with ligands. These structures provide the Cartesian coordinates of all heavy (non-hydrogen) atoms of the proteins and their complexes. Several characteristics determine the quality of a structure; the main ones are the resolution, which should be better than 2.5 Å, and the absence of missing atoms and amino acid residues. Because we are searching for non-covalent inhibitors, we restrict ourselves to complexes of factor XIIa with non-covalent inhibitors. At the time of this study, the PDB contained only five structures of factor XIIa, and only two of them, with PDB IDs 6B74 and 6B77, were crystallized with inhibitors. Of these, only the 6B74 structure contains a non-covalent inhibitor [7], so this complex was used to prepare the 3D model of the target protein for docking.

The 6B74 complex contains 2 chains: A and B, corresponding to a heavy chain and a catalytic domain of factor XIIa. The catalytic triad consists of three amino acid residues: \(His57\), \(Asp102\), and \(Ser195\). The active site of this protein is represented by three main binding pockets—S1, S1\({}^{\prime}\) and S3 (see Fig. 2).

Fig. 2. The active site of coagulation factor XIIa (PDB ID: 6B74) with the known inhibitor RF1 (see Fig. 1). The geometry is shown prior to the formation of a covalent bond between a benzyl chloride group and \(Ser195\). The protein active site is represented by the solvent accessible surface. The position of the inhibitor in the protein is obtained as a result of docking using the SOL program.

To prepare the target protein model for docking, all extraneous components that do not belong to the protein, such as salts, ions, and ligands, must be removed from the PDB structure. Hydrogen atoms are added to the protein structure at pH = 7.4 using the Aplite program [20, 21]. Each atom is assigned a type in accordance with the MMFF94 force field [22]. A force field is a set of classical potentials describing intra- and intermolecular interactions. The coordinates of the geometric center of the crystallized native ligand define the center of the docking cube, which is the spatial region in which the ligand can be placed during docking.
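The placement of the docking cube described above can be illustrated with a minimal sketch. The coordinates, the cube edge length, and the function names below are hypothetical and chosen only for illustration; they are not taken from SOL or from the 6B74 structure.

```python
def geometric_center(coords):
    """Return the unweighted mean position of a list of (x, y, z) atoms."""
    n = len(coords)
    return tuple(sum(atom[i] for atom in coords) / n for i in range(3))


def docking_cube(center, edge):
    """Return (min_corner, max_corner) of a cube with the given edge length."""
    half = edge / 2.0
    lower = tuple(c - half for c in center)
    upper = tuple(c + half for c in center)
    return lower, upper


def inside_cube(point, cube):
    """True if a point lies inside the cube (boundaries included)."""
    lower, upper = cube
    return all(lower[i] <= point[i] <= upper[i] for i in range(3))


# Hypothetical heavy-atom coordinates (in angstroms) of a native ligand.
native_ligand = [(10.0, 4.0, -2.0), (12.0, 6.0, 0.0), (11.0, 5.0, -1.0)]
center = geometric_center(native_ligand)

# The edge length is an assumed value; the text does not state the one used.
cube = docking_cube(center, edge=22.0)
```

Any ligand atom for which `inside_cube` returns `False` would lie outside the region allowed during docking.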

To validate the model of the target protein, the native ligand co-crystallized with the target protein is docked. The best position of the native ligand found by docking is compared with its crystallized position by calculating the root-mean-square deviation (RMSD) between the two positions. Native docking is considered successful if RMSD \(<\) 2 Å.
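The validation criterion can be written down as a short sketch. It assumes that both poses list the same atoms in the same order and are expressed in the same (protein) coordinate frame, so no superposition is performed; ligand symmetry is ignored. The function names and coordinates are illustrative only.

```python
import math


def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two poses with matching atom order."""
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (a[i] - b[i]) ** 2
        for a, b in zip(pose_a, pose_b)
        for i in range(3)
    )
    return math.sqrt(squared / len(pose_a))


def native_docking_successful(docked, crystal, threshold=2.0):
    """Apply the RMSD < 2 angstrom success criterion from the text."""
    return rmsd(docked, crystal) < threshold


# Hypothetical coordinates: the docked pose is the crystal pose shifted by 1 angstrom.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]
```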

2.2 Preparation of Ligands and Database

For virtual screening, a database of low molecular weight drug-like compounds of Voronezh State University (VSU) [23], consisting of more than 19 thousand drug-like molecules, is used. In this database, ligands are stored in 2D format. The LigPrep module [24] is used to protonate these molecules at pH = 7.4 and generate their low-energy 3D conformers. As a result, more than 30000 different conformers are prepared and used for virtual screening.

2.3 SOL Docking Program

This docking program is described in detail in several publications [20, 25, 26], and we present here only its main features. SOL finds the position of the ligand in the active site of the target protein based on the docking paradigm [6], which assumes that the ligand position corresponds to the global minimum of the energy of the protein-ligand complex. The search for the global energy minimum is carried out using a genetic algorithm of global optimization, which works as follows. An initial population of individuals is randomly generated to represent various possible positions of the ligand in the docking region, and this population evolves over a given number of generations. The main parameters of the genetic algorithm are the population size (30000) and the number of generations (1000). The population size is preserved from generation to generation. In the course of evolution, the strongest individuals survive; an individual is stronger than another if its value of the target energy function is more negative. Individuals in each subsequent generation are created by direct transfer of the strongest individuals and by crossover and mutation. Each ligand position is encoded by a chromosome consisting of genes (dimensionless numbers between 0 and 1), each of which describes one of the ligand's degrees of freedom, so that each chromosome corresponds to an individual position of the ligand in the protein. The chromosome representation is convenient for the crossover and mutation operations.
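The general scheme of such a genetic algorithm can be sketched as follows. This is not the SOL implementation: the parent selection rule (top half of the population), the single-point crossover, the mutation rule, the elite fraction, and the toy energy function are all assumptions made for illustration, and the population size and number of generations are far smaller than the values quoted above.

```python
import random


def run_ga(energy, n_genes, pop_size=60, generations=30,
           elite_fraction=0.1, mutation_rate=0.1, seed=0):
    """Minimize `energy` over chromosomes whose genes lie in [0, 1)."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    n_elite = max(1, int(elite_fraction * pop_size))
    history = []  # best energy found in each generation

    for _ in range(generations):
        population.sort(key=energy)  # strongest (most negative) first
        history.append(energy(population[0]))
        # Direct transfer of the strongest individuals.
        next_gen = [chrom[:] for chrom in population[:n_elite]]
        parents = population[:pop_size // 2]
        # Fill the rest by crossover and mutation; population size is preserved.
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 1
            child = mother[:cut] + father[cut:]
            for g in range(n_genes):
                if rng.random() < mutation_rate:
                    child[g] = rng.random()
            next_gen.append(child)
        population = next_gen

    best = min(population, key=energy)
    history.append(energy(best))
    return best, history


def toy_energy(chromosome):
    """Hypothetical stand-in for the target energy function."""
    return sum((gene - 0.3) ** 2 for gene in chromosome)


best, history = run_ga(toy_energy, n_genes=6, seed=1)
```

Because the strongest individuals are copied unchanged into the next generation, the best energy recorded in `history` cannot increase from one generation to the next.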

The target function of the global optimization is the sum of the energy of the ligand in the field of the protein and the ligand strain energy, both calculated using the MMFF94 force field. To accelerate the global optimization, the interaction energies of the ligand atoms with the protein are precomputed and stored at the nodes of a 3D grid using the SOLGRID module. The grid is created in the docking cube, which covers the whole active site of the protein, and all possible positions of the ligand must lie inside the cube. During the global optimization, the energy of the ligand in the field of the protein is calculated from the grid potentials as the sum of the energies of all ligand atoms. If a ligand atom is located between nodes, its energy is interpolated using the eight nearest nodes. The solution to the optimization problem is the position of the ligand corresponding to the lowest value of the target energy function.
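Interpolation over the eight nearest nodes can be illustrated with standard trilinear interpolation. The text does not specify the interpolation formula used in SOL, so trilinear weights are an assumption here, as are the uniform grid spacing, the per-atom-type grids, and all names and values below.

```python
import math


def trilinear(grid, origin, spacing, point):
    """Interpolate a grid value at `point` from the 8 surrounding nodes.

    grid[i][j][k] holds the precomputed energy at node
    origin + spacing * (i, j, k); each dimension needs at least 2 nodes.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for axis in range(3):
        t = (point[axis] - origin[axis]) / spacing
        if t < 0 or t > dims[axis] - 1:
            raise ValueError("point lies outside the docking cube")
        i = min(int(math.floor(t)), dims[axis] - 2)
        base.append(i)
        frac.append(t - i)
    i, j, k = base
    fx, fy, fz = frac
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1.0 - fx)
                          * (fy if dj else 1.0 - fy)
                          * (fz if dk else 1.0 - fz))
                value += weight * grid[i + di][j + dj][k + dk]
    return value


def ligand_energy(atoms, grids, origin, spacing):
    """Sum interpolated energies over atoms given as (atom_type, xyz) pairs."""
    return sum(trilinear(grids[atom_type], origin, spacing, xyz)
               for atom_type, xyz in atoms)


# Hypothetical 3x3x3 grid holding the linear function x + 2y + 3z.
demo_grid = [[[x + 2.0 * y + 3.0 * z for z in range(3)]
              for y in range(3)] for x in range(3)]
demo_grids = {"C": demo_grid}
demo_atoms = [("C", (0.0, 0.0, 0.0)), ("C", (1.0, 1.0, 1.0))]
```

Trilinear interpolation reproduces a linear function exactly, which makes the demo grid a convenient check of the weights.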

To increase the reliability of the global energy optimization, the genetic algorithm is run several times independently. After each run, the position of the ligand corresponding to the lowest value of the target function is found. By default, the number of independent runs is 50, which gives 50 independently obtained solutions of the global optimization problem. These 50 solutions are grouped into clusters using the criterion RMSD \(<\) 1 Å, where RMSD is the root-mean-square deviation between two ligand poses calculated over all ligand atoms. An indicator that the solution of the global optimization problem has been found reliably is a relatively high population of the cluster containing the ligand pose with the lowest value of the target energy function (the first cluster). Docking is considered successful if the population of the first cluster is at least 10. To rank different ligands, the SOL scoring function is used, which consists of two terms: the energy of the ligand in the field of the protein and an entropic contribution proportional to the number of internal rotational degrees of freedom of the ligand.
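The clustering step and the reliability criterion can be sketched as follows. The text gives only the RMSD \(<\) 1 Å criterion, so the greedy assignment used here (each pose joins the first cluster whose lowest-energy member lies within the cutoff) is an assumption, and the poses and energies are invented for illustration.

```python
import math


def rmsd(pose_a, pose_b):
    """RMSD over all atoms of two poses with matching atom order."""
    squared = sum((a[i] - b[i]) ** 2
                  for a, b in zip(pose_a, pose_b) for i in range(3))
    return math.sqrt(squared / len(pose_a))


def cluster_poses(poses, energies, cutoff=1.0):
    """Greedily group poses; clusters[0] contains the lowest-energy pose."""
    order = sorted(range(len(poses)), key=lambda i: energies[i])
    clusters = []
    for idx in order:
        for cluster in clusters:
            # cluster[0] is the lowest-energy member of that cluster.
            if rmsd(poses[idx], poses[cluster[0]]) < cutoff:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def docking_is_reliable(clusters, min_population=10):
    """Apply the criterion that the first cluster holds at least 10 poses."""
    return len(clusters[0]) >= min_population


def shifted(pose, dx):
    return [(x + dx, y, z) for x, y, z in pose]


# Hypothetical run results: 12 nearly identical poses and 3 distant ones.
base = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
poses = ([shifted(base, 0.01 * i) for i in range(12)]
         + [shifted(base, 5.0 + 0.2 * i) for i in range(3)])
energies = [-6.0 + 0.01 * i for i in range(12)] + [-4.0, -3.9, -3.8]
clusters = cluster_poses(poses, energies)
```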

The docking of the native ligand of the 6B74 complex is successful: the RMSD from the crystallized position of the native ligand is \({<}\)2 Å, and the clustering analysis of the docking solutions shows a high population of the first cluster (49 poses). This indicates a high probability that the global minimum of the target energy function has been found, since 49 of the 50 independent runs of the global optimization algorithm resulted in the ligand pose with the best energy, and all these poses differ from each other by RMSD \(<\) 1 Å. The SOL score of the native ligand is –5.19 kcal/mol. The prepared model of factor XIIa was therefore chosen for the further search for inhibitors.

Virtual screening of the database with the SOL program was carried out on the Lomonosov-2 supercomputer [27] of Lomonosov Moscow State University. The high-throughput screening used thousands of computational cores to run thousands of docking tasks, with one core per ligand. Depending on the size of the ligand and the number of its torsions, docking of one ligand on one computational core takes from one to several hours.

2.4 Protein-ligand Binding Enthalpy

In this work, a quantum-chemical method is used to calculate the enthalpy of protein-ligand binding at the second stage of virtual screening for compounds with best values of the SOL scoring function. The enthalpy of formation of a protein-ligand complex is calculated using the PM7 semiempirical quantum-chemical method [28], which allows calculations at the level of accuracy of density functional theory (DFT) methods [29, 30] with a correct description of the formation of hydrogen and halogen bonds, as well as dispersion interactions. To describe the polar interactions of molecules with water in quantum-chemical calculations, the COSMO implicit solvent model [31] is used. In this solvent model, water is replaced by a metal or dielectric with infinite permittivity, and then a correction factor is introduced that takes into account the finite value of the permittivity of water. The PM7 and COSMO methods are implemented in the MOPAC program [32]. The program also includes the MOZYME module, which allows fast quantum-chemical calculations of proteins and protein-ligand complexes using the method of localized molecular orbitals. The binding enthalpy is calculated according to the equation

$$\triangle H_{\textrm{bind}}=\triangle H_{\textrm{PL}}-(\triangle H_{\textrm{P}}+\triangle H_{\textrm{L}}),$$

where \(\triangle H_{\textrm{PL}}\) is the enthalpy of formation of the protein-ligand complex, \(\triangle H_{\textrm{P}}\) is the enthalpy of formation of the unbound protein, and \(\triangle H_{\textrm{L}}\) is the enthalpy of formation of the unbound ligand. These enthalpies of formation are calculated using MOPAC as follows.

The enthalpy of formation of the unbound protein is calculated using the PM7 method and the COSMO solvent model without geometry optimization for the same protein conformation as used for docking.

The enthalpy of formation of the unbound ligand \(\triangle H_{\textrm{L}}\) is calculated as follows. For the unbound ligand, low energy conformers are generated using the LigPrep module. Then, local optimization of the energy of each conformer is carried out by the PM7 method without taking into account the solvent, and the energy of the minimum found for each conformer is recalculated using PM7 and the COSMO solvent model. When calculating the binding enthalpy, the most negative value of the PM7+COSMO enthalpy of formation among all these conformers is used.

To calculate \(\triangle H_{\textrm{PL}}\), the energy of the protein-ligand complex is locally optimized from the initial position of the ligand found during docking. Optimization is carried out using PM7 without solvent by varying positions of all ligand atoms at fixed protein atoms. The enthalpy of formation of the complex is then recalculated for the found minimum using PM7 and the COSMO solvent model. The obtained value of the enthalpy of formation is used as the value of \(\triangle H_{\textrm{PL}}\) when calculating the enthalpy of protein-ligand binding.
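The way the three enthalpies of formation are combined can be illustrated with a short sketch. The numerical values are hypothetical and serve only to show the arithmetic; in practice they would come from the PM7+COSMO calculations described above.

```python
def binding_enthalpy(h_complex, h_protein, ligand_conformer_enthalpies):
    """Binding enthalpy: H_PL - (H_P + H_L), all in kcal/mol.

    H_L is taken as the most negative enthalpy of formation among
    the conformers of the unbound ligand.
    """
    h_ligand = min(ligand_conformer_enthalpies)
    return h_complex - (h_protein + h_ligand)


# Hypothetical enthalpies of formation (kcal/mol).
h_pl = -12560.0                     # protein-ligand complex
h_p = -12400.0                      # unbound protein
h_l_conformers = [-98.5, -102.0, -100.3]  # unbound ligand conformers

dh_bind = binding_enthalpy(h_pl, h_p, h_l_conformers)
```

With these illustrative numbers, the lowest ligand conformer (–102.0 kcal/mol) is used, giving a binding enthalpy of –58.0 kcal/mol.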

3 RESULTS AND DISCUSSION

At the first stage of virtual screening of the VSU database, more than 30000 different conformers were docked into the active site of the prepared model of coagulation factor XIIa using the SOL docking program. Two criteria were used to select the best ligands for the subsequent postprocessing stage: the SOL score should be more negative than \(-\)6.00 kcal/mol, and the population of the first cluster should be greater than or equal to 10. A total of 423 ligands met these two criteria, and the protein-ligand binding enthalpy was calculated for all of them. Using the criterion \(\triangle H_{\textrm{bind}}\) < \(-\)53 kcal/mol, we selected the 41 best compounds. These 41 compounds were then grouped into 25 clusters according to their chemical similarity using the DataWarrior software [33] with a similarity threshold of 0.85. One or two representative ligands were selected from each cluster, giving 28 ligands for further interaction analysis, some of which were then discarded for lack of specific interactions. Finally, 16 ligands were selected for experimental in vitro testing of their ability to inhibit factor XIIa; their calculated characteristics and chemical structures are presented in Table 1 and Fig. 3, respectively.
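The two numerical selection stages can be summarized in a short sketch. Only the three thresholds come from the text; the data structure, the function names, and the example ligands are hypothetical, and the subsequent similarity clustering and interaction analysis are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

SOL_SCORE_CUTOFF = -6.00           # kcal/mol, score must be more negative
MIN_FIRST_CLUSTER_POPULATION = 10  # poses in the first cluster
DH_BIND_CUTOFF = -53.0             # kcal/mol, enthalpy must be more negative


@dataclass
class Candidate:
    ligand_id: str
    sol_score: float
    first_cluster_population: int
    dh_bind: Optional[float] = None


def passes_docking_stage(c):
    return (c.sol_score < SOL_SCORE_CUTOFF
            and c.first_cluster_population >= MIN_FIRST_CLUSTER_POPULATION)


def passes_enthalpy_stage(c):
    return c.dh_bind is not None and c.dh_bind < DH_BIND_CUTOFF


def select(candidates, compute_dh_bind):
    """Run the docking filter, then compute and filter binding enthalpies."""
    stage1 = [c for c in candidates if passes_docking_stage(c)]
    for c in stage1:
        c.dh_bind = compute_dh_bind(c)  # expensive step, done only for stage1
    return [c for c in stage1 if passes_enthalpy_stage(c)]


# Hypothetical ligands and precomputed enthalpies for illustration.
library = [
    Candidate("A", -6.5, 30),
    Candidate("B", -5.8, 45),   # fails the SOL score criterion
    Candidate("C", -7.1, 4),    # fails the cluster population criterion
    Candidate("D", -6.2, 12),   # passes docking, fails enthalpy
]
enthalpies = {"A": -60.0, "D": -40.0}
selected = select(library, lambda c: enthalpies[c.ligand_id])
```

Applying the cheaper docking filter first means that the quantum-chemical enthalpy is computed only for ligands that survive the first stage.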

Table 1. Calculated SOL score and protein-ligand binding enthalpy \(\triangle H_{\textrm{bind}}\) for the best ligands selected for experimental testing. Ligand name is an ID of the compound in the VSU database.

Fig. 3. Structures of 16 best compounds selected for experimental testing of the factor XIIa inhibition.

These 16 ligands belong to 14 different chemical classes. In terms of novelty, only one compound, 224836, contains a scaffold similar to the central core of the confirmed pyrrolo-dihydroquinoline-based factor XIIa inhibitors with micromolar activity published in [34]. The chemical classes of the remaining compounds are novel and differ from the chemical classes of published, experimentally confirmed inhibitors of factor XIIa [34–36]. As an example, the specific interactions of ligand 2572 with the active site of factor XIIa are shown in Fig. 4.

Fig. 4. The specific interactions of the ligand with ID \(=\) 2572 with the amino acid residues in the active site of factor XIIa are presented schematically: dotted arrows show hydrogen bonds, solid arrows show halogen bonds, and \(\pi\)-stacking is shown by a solid line ending with solid dots.

Figure 4 shows several specific interactions of ligand 2572 with the amino acid residues of the factor XIIa active site: two hydrogen bonds with \(Tyr99\) and \(Gly193\), several halogen bonds with \(Asp189\), \(Gly193\), \(Asp194\) and \(Ser195\), and \(\pi\)-stacking with \(His57\).

4 CONCLUSIONS

In this study, structure-based virtual screening of the whole database of drug-like compounds of Voronezh State University was carried out to identify inhibitors of blood coagulation factor XIIa. Docking followed by quantum-chemical calculations of the protein-ligand binding enthalpy was used for ligand selection, and the presence of specific interactions and chemical diversity were also taken into account. As a result, the 16 best compounds were selected for further experimental in vitro testing. The selected compounds belong to various chemical classes, and only one class, pyrrolo-dihydroquinoline, has been published as a core of experimentally confirmed factor XIIa inhibitors.