Generation of focused drug molecule library using recurrent neural network

  • Original Paper
Journal of Molecular Modeling

Abstract

Context

With the wide application of deep learning in drug research and development, de novo molecular design methods based on recurrent neural networks (RNNs) have shown strong advantages in drug molecule generation. An RNN model can learn the internal chemical structure of molecules in much the same way as it learns sequences in a natural language processing task. Although RNN-based techniques for generating target-specific molecular libraries are mature, research on drug design and screening remains highly active, and de novo design methods that generate larger numbers of valid compounds are still needed.
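The analogy to natural language processing can be made concrete with a minimal sketch. It assumes molecules are written as SMILES strings, which the abstract does not state explicitly, and every name in it (`tokenize`, `build_vocab`, `next_token_pairs`, the `<s>` and `</s>` markers) is hypothetical rather than taken from the authors' code. It only shows how a molecule string could be turned into the (current token, next token) pairs on which a sequence model is typically trained.

```python
from typing import Dict, List, Tuple

# Hypothetical sequence markers; they are not SMILES symbols and not from the paper.
START, END = "<s>", "</s>"


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens.

    Bracket atoms such as [NH4+] and the two-letter halogens Cl and Br are
    kept whole; every other character is its own token. An unclosed bracket
    raises ValueError.
    """
    tokens: List[str] = []
    i = 0
    while i < len(smiles):
        if smiles[i] == "[":
            j = smiles.index("]", i)  # ValueError if the bracket never closes
            tokens.append(smiles[i:j + 1])
            i = j + 1
        elif smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def build_vocab(corpus: List[str]) -> Dict[str, int]:
    """Map each token seen in the corpus to an integer id (markers come first)."""
    symbols = sorted({tok for s in corpus for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate([START, END] + symbols)}


def next_token_pairs(smiles: str, vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (input id, target id) pairs for next-token prediction."""
    ids = [vocab[START]] + [vocab[t] for t in tokenize(smiles)] + [vocab[END]]
    return list(zip(ids[:-1], ids[1:]))


if __name__ == "__main__":
    corpus = ["CCO", "c1ccccc1", "CC(=O)Cl"]  # toy molecules for illustration only
    vocab = build_vocab(corpus)
    print(tokenize("CC(=O)Cl"))
    print(next_token_pairs("CCO", vocab))
```

In a sketch like this, the model sees one token at a time and is trained to predict the following one, so generating a molecule amounts to sampling tokens until the end marker appears.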

Methods

In this study, we designed an RNN-based molecular generation model that replaces the traditional stacked-RNN architecture with a Nested long short-term memory (Nested LSTM) network structure. To enrich the library of focused molecules for specific targets, we fine-tuned the model on molecules active against novel coronavirus pneumonia (COVID-19) and screened the generated molecules using machine learning models. Following rigorous screening, the selected molecules were docked to the SARS-CoV-2 M-pro receptor using AutoDock2.4 to identify the top 3 potential inhibitors. Subsequently, 100-ns molecular dynamics simulations were conducted using Amber22. The molecules were parameterized with the GAFF2 force field and the proteins were modeled with the ff19SB force field, with solvation in a truncated octahedral TIP3P solvent box. After the simulations, the stability of the ligand-protein complexes was assessed by analysis of RMSD, hydrogen bonds, and MM-GBSA. The results indicate that the model can carry out de novo drug design and that the generated molecules have the potential to be drug candidates.
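As an illustration of the RMSD stability measure mentioned above, the following is a minimal sketch of the standard root-mean-square deviation formula, sqrt((1/N) Σ |r_i − r_i_ref|²). It is not the authors' analysis pipeline: it assumes each frame has already been superimposed on the reference structure (trajectory tools normally perform this alignment first), and the function names and toy coordinates are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally sized sets of atom positions.

    Assumes the frame is already aligned to the reference; no fitting is done here.
    """
    if not frame or len(frame) != len(reference):
        raise ValueError("frame and reference must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsd_series(
    trajectory: Sequence[Sequence[Coord]],
    reference: Optional[Sequence[Coord]] = None,
) -> List[float]:
    """RMSD of every frame against the reference (the first frame by default)."""
    ref = trajectory[0] if reference is None else reference
    return [rmsd(frame, ref) for frame in trajectory]


if __name__ == "__main__":
    # Made-up two-atom "trajectory" in arbitrary units, for illustration only.
    toy_trajectory = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
    ]
    for index, value in enumerate(rmsd_series(toy_trajectory)):
        print(f"frame {index}: RMSD = {value:.3f}")
```

A series that levels off over the course of a simulation is usually read as a sign that the complex has settled into a stable conformation, whereas a steadily growing series suggests drift.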

Data availability

All data used in this article are available for download. For example, the pre-training dataset is available at https://www.ebi.ac.uk/chembl/ , and the N3 and 7BQY datasets are available at https://www.rcsb.org/ . All calculations in this study were performed in the TensorFlow-GPU 2.3.0 framework with Keras 2.8.0 and Python 3.7.13; TensorFlow and Keras are freely available from https://pypi.org/project/tensorflow/2.3.0/ and https://pypi.org/project/keras/2.8.0/ .

Code availability

The source code and training data used in the experiment are provided at: https://github.com/****1025/Drug_RNN .

Funding

This work was supported by the National Natural Science Foundation of China (12261060) and the Jiangxi province graduate student innovation special funds (YC2021-S029).

Author information

Contributions

All authors contributed to the study conception, design, and analysis. Model design and construction, material preparation, and data collection were performed by **** Zou. Long Zhao completed molecular docking and molecular dynamics simulation. The first draft of the manuscript was written by **** Zou. Shao** Shi commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Shao** Shi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Supplementary information

ESM 1

(PDF 1143 kb)

About this article

Cite this article

Zou, J., Zhao, L. & Shi, S. Generation of focused drug molecule library using recurrent neural network. J Mol Model 29, 361 (2023). https://doi.org/10.1007/s00894-023-05772-5

  • DOI: https://doi.org/10.1007/s00894-023-05772-5
