Automatic Quantitative Structure–Activity Relationship Modeling to Fill Data Gaps in High-Throughput Screening

Ciallella, Heather L.; Chung, Elena; Russo, Daniel P.; Zhu, Hao

doi:10.1007/978-1-0716-2213-1_16

Part of the book series: Methods in Molecular Biology (MIMB, volume 2474)

Abstract

Advances in high-throughput screening (HTS) have revolutionized the data landscape of the environmental and health sciences. However, new compounds must still be synthesized and tested experimentally to obtain HTS data, which remains costly and time-consuming when a large set of new compounds needs to be studied in many assays. Quantitative structure–activity relationship (QSAR) modeling is a standard method for filling data gaps for new compounds. The major challenge for many toxicologists, especially those with limited computational backgrounds, is efficiently developing optimized QSAR models for each assay with missing data for certain test compounds. This chapter introduces a freely available and user-friendly QSAR modeling workflow that trains and optimizes models using five algorithms and requires no programming background.

Author information

Authors and Affiliations

Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Heather L. Ciallella, Elena Chung, Daniel P. Russo & Hao Zhu
Department of Chemistry, Rutgers University, Camden, NJ, USA
Daniel P. Russo & Hao Zhu

Corresponding author

Correspondence to Hao Zhu.

Editor information

Editors and Affiliations

Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, USA
Hao Zhu
National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, USA
Menghang Xia

About this protocol

Cite this protocol

Ciallella, H.L., Chung, E., Russo, D.P., Zhu, H. (2022). Automatic Quantitative Structure–Activity Relationship Modeling to Fill Data Gaps in High-Throughput Screening. In: Zhu, H., Xia, M. (eds) High-Throughput Screening Assays in Toxicology. Methods in Molecular Biology, vol 2474. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2213-1_16

DOI: https://doi.org/10.1007/978-1-0716-2213-1_16
Published: 17 March 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2212-4
Online ISBN: 978-1-0716-2213-1
eBook Packages: Springer Protocols
