Abstract
Context
With the wide application of deep learning in drug research and development, de novo molecular design methods based on recurrent neural networks (RNNs) have shown clear advantages for generating drug molecules. An RNN can learn the internal chemical structure of molecules in much the same way that it learns the structure of sentences in a natural language processing task. Although techniques for generating target-specific molecular libraries with RNN models are mature, research on drug design and screening remains highly active, and de novo drug design methods that generate larger numbers of valid compounds are still needed.
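Treating molecules as a language presupposes that each molecule is written as a token sequence an RNN can read one step at a time. The sketch below assumes SMILES strings and a regex tokenizer of the kind commonly used for SMILES language models. The pattern, the start/end markers `G` and `E`, and the helper names `tokenize_smiles`, `build_vocab`, `encode` and `next_token_pairs` are hypothetical illustrations, not taken from the authors' code.

```python
import re

# Hypothetical pattern: bracket atoms such as [N+], two-letter halogens,
# two-digit ring-closure labels such as %10, then any single remaining character.
SMILES_TOKEN_PATTERN = re.compile(r"(\[[^\]]+\]|Br|Cl|%\d{2}|.)")


def tokenize_smiles(smiles):
    """Split a SMILES string into the tokens an RNN consumes one step at a time."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # The tokens must rebuild the input exactly, otherwise something was skipped.
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r}")
    return tokens


def build_vocab(smiles_list, start="G", end="E"):
    """Map every token to an integer id; hypothetical start/end markers get ids 0 and 1."""
    vocab = {start: 0, end: 1}
    for smiles in smiles_list:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab, start="G", end="E"):
    """Return [start, tokens..., end] as integer ids."""
    return [vocab[start]] + [vocab[t] for t in tokenize_smiles(smiles)] + [vocab[end]]


def next_token_pairs(ids):
    """Language-model style training pair: predict each token from the ones before it."""
    return ids[:-1], ids[1:]
```

For example, `tokenize_smiles("CC(=O)Cl")` yields `['C', 'C', '(', '=', 'O', ')', 'Cl']`; keeping `Cl` as one token is what lets the model treat it as a single "word" rather than two characters.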
Methods
In this study, we designed an RNN-based molecular generation model that abandons the traditional stacked RNN and instead uses a nested long short-term memory (Nested LSTM) structure. To enrich the library of focused molecules for a specific target, we fine-tuned the model on molecules active against novel coronavirus pneumonia (COVID-19) and screened the generated molecules with machine learning models. After rigorous screening, the selected molecules were docked into the SARS-CoV-2 main protease (M-pro) receptor with AutoDock2.4 to identify the top three potential inhibitors. Subsequently, 100-ns molecular dynamics simulations were run with Amber22. The ligands were parameterized with the GAFF2 force field and the protein with the ff19SB force field, and each complex was solvated in a truncated octahedral box of TIP3P water. After the simulations, the stability of the ligand-protein complexes was assessed by analyzing the root-mean-square deviation (RMSD), hydrogen bonds, and MM-GBSA binding free energies. The results indicate that the model can perform de novo drug design and that the generated molecules have the potential to become drug candidates.
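The architectural change described above, replacing stacked LSTM layers with a Nested LSTM (Moniz and Krueger, 2018), can be illustrated with a single forward step. In a plain LSTM the memory update is $c_t = f_t \odot c_{t-1} + i_t \odot g_t$. In a Nested LSTM, $i_t \odot g_t$ is instead passed as the input to an inner LSTM and $f_t \odot c_{t-1}$ as that inner LSTM's previous hidden state, and the inner output becomes $c_t$. The following is a minimal pure-Python sketch under simplifying assumptions: tanh for every candidate and output activation, one-hot inputs instead of embeddings, random untrained weights, and hypothetical helper names. It is not the authors' TensorFlow/Keras implementation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Return W @ v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_gate_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for gates i, f, o and candidate g."""
    return {
        gate: (
            [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)],
            [0.0] * n_hidden,
        )
        for gate in "ifog"
    }


def lstm_gates(params, inp):
    """Sigmoid input/forget/output gates and the tanh candidate for one step."""
    i = [sigmoid(z) for z in affine(*params["i"], inp)]
    f = [sigmoid(z) for z in affine(*params["f"], inp)]
    o = [sigmoid(z) for z in affine(*params["o"], inp)]
    g = [math.tanh(z) for z in affine(*params["g"], inp)]
    return i, f, o, g


def nested_lstm_step(x, h_prev, c_prev, c_inner_prev, outer, inner):
    """One time step of a two-level Nested LSTM (simplified sketch)."""
    # Outer gates read the current input and the previous outer hidden state.
    i, f, o, g = lstm_gates(outer, x + h_prev)
    # A plain LSTM would set c = f*c_prev + i*g; here i*g becomes the inner
    # LSTM's input and f*c_prev becomes its previous hidden state.
    x_inner = [a * b for a, b in zip(i, g)]
    h_inner_prev = [a * b for a, b in zip(f, c_prev)]
    ii, fi, oi, gi = lstm_gates(inner, x_inner + h_inner_prev)
    c_inner = [fv * cv + iv * gv for fv, cv, iv, gv in zip(fi, c_inner_prev, ii, gi)]
    h_inner = [ov * math.tanh(cv) for ov, cv in zip(oi, c_inner)]
    # The inner LSTM's output is used as the outer memory cell c_t.
    c = h_inner
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c, c_inner


def demo(seed=0, input_size=4, hidden_size=3):
    """Feed two one-hot tokens through a randomly initialised Nested LSTM cell."""
    rng = random.Random(seed)
    outer = init_gate_params(input_size + hidden_size, hidden_size, rng)
    inner = init_gate_params(2 * hidden_size, hidden_size, rng)
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    c_inner = [0.0] * hidden_size
    for token_id in (0, 1):
        x = [1.0 if k == token_id else 0.0 for k in range(input_size)]
        h, c, c_inner = nested_lstm_step(x, h, c, c_inner, outer, inner)
    return h, c, c_inner
```

The design point this sketch shows is that the extra depth lives inside the memory update of a single cell, whereas stacking adds depth by feeding one layer's hidden states into another layer.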
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs00894-023-05772-5/MediaObjects/894_2023_5772_Fig9_HTML.png)
Data availability
In this article, all data are available for download. For example, the pre-training dataset can be downloaded from ChEMBL at https://www.ebi.ac.uk/chembl/ , and the N3 inhibitor and 7BQY protein structure data can be downloaded from the RCSB PDB at https://www.rcsb.org/ . All calculations in this study were performed in the TensorFlow-GPU 2.3.0 framework with Keras 2.8.0 and Python 3.7.13; TensorFlow and Keras are freely available from https://pypi.org/project/tensorflow/2.3.0/ and https://pypi.org/project/keras/2.8.0/ .
Code availability
The source code and training data used in the experiment are provided at: https://github.com/****1025/Drug_RNN .
Funding
This work was supported by the National Natural Science Foundation of China (12261060) and the Jiangxi province graduate student innovation special funds (YC2021-S029).
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception, design, and analysis. Model design and construction, material preparation, and data collection were performed by **** Zou. Long Zhao performed the molecular docking and molecular dynamics simulations. The first draft of the manuscript was written by **** Zou. Shao** Shi commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary information
ESM 1
(PDF 1143 kb)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zou, J., Zhao, L. & Shi, S. Generation of focused drug molecule library using recurrent neural network. J Mol Model 29, 361 (2023). https://doi.org/10.1007/s00894-023-05772-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00894-023-05772-5