Automatic Quantitative Structure–Activity Relationship Modeling to Fill Data Gaps in High-Throughput Screening

  • Protocol
High-Throughput Screening Assays in Toxicology

Part of the book series: Methods in Molecular Biology (MIMB, volume 2474)

Abstract

Advances in high-throughput screening (HTS) have revolutionized the data landscape of the environmental and health sciences. However, new compounds must still be synthesized and tested experimentally to obtain HTS data, which remains costly and time-consuming when a large set of new compounds needs to be studied in many assays. Quantitative structure–activity relationship (QSAR) modeling is a standard method for filling data gaps for new compounds. The major challenge for many toxicologists, especially those with limited computational backgrounds, is efficiently developing optimized QSAR models for each assay with missing data for certain test compounds. This chapter introduces a freely available and user-friendly QSAR modeling workflow that trains and optimizes models using five algorithms without requiring a programming background.

Author information

Corresponding author

Correspondence to Hao Zhu.

Copyright information

© 2022 This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply

About this protocol

Cite this protocol

Ciallella, H.L., Chung, E., Russo, D.P., Zhu, H. (2022). Automatic Quantitative Structure–Activity Relationship Modeling to Fill Data Gaps in High-Throughput Screening. In: Zhu, H., Xia, M. (eds) High-Throughput Screening Assays in Toxicology. Methods in Molecular Biology, vol 2474. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2213-1_16

  • DOI: https://doi.org/10.1007/978-1-0716-2213-1_16

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2212-4

  • Online ISBN: 978-1-0716-2213-1

  • eBook Packages: Springer Protocols
