First report of q-RASAR modeling toward an approach of easy interpretability and efficient transferability

  • Original Article
  • Molecular Diversity

Abstract

Quantitative structure–activity relationship (QSAR) and read-across techniques have recently been merged into an emerging field, read-across structure–activity relationship (RASAR), which uses the chemical similarity concepts of read-across (an unsupervised step) and then develops a supervised learning model (as in QSAR). The RASAR method has so far been used only for graded predictions or classification modeling. In this work, we attempt, for the first time, to apply RASAR to quantitative predictions (q-RASAR) using a case study of androgen receptor binding affinity data. We have computed a number of error-based and similarity-based measures, such as the weighted standard deviation of the predicted values, the coefficient of variation of the computed predictions, the average similarity level of close training compounds for each query molecule, the standard deviation and coefficient of variation of similarity levels, the maximum similarity levels to positive and negative close training compounds, and a concordance measure indicating similarity to positive, negative, or both classes of close training compounds. We have combined these additional measures with the selected chemical descriptors from the previously developed QSAR model, redeveloped new partial least squares models from the training set, and predicted the endpoint for the query data set. Interestingly, these new models outperform the original QSAR model in internal and external validation quality. In this study, we have also introduced a new similarity-based concordance measure (the Banerjee-Roy coefficient) that can significantly contribute to the model quality. A q-RASAR model also has the advantage over read-across predictions of providing easy interpretation and indicating quantitative contributions of important chemical features. The strategy described here should be applicable to other biological/toxicological/property data modeling for enhanced quality of predictions, easy interpretability, and efficient transferability.
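
To make the similarity-based and error-based measures listed in the abstract more concrete, the following is a minimal sketch of how such quantities might be computed for a single query compound. It is not the authors' implementation: the Tanimoto similarity on set-based fingerprints, the choice of k close training compounds, the response threshold that separates "positive" from "negative" compounds, and all function and key names are assumptions made for illustration. The Banerjee-Roy coefficient is not shown because its formula is not given in this section.

```python
from math import sqrt
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rasar_descriptors(query_fp, train_fps, train_y, k=3, threshold=0.0):
    """Similarity- and error-based measures for one query compound.

    Assumptions (illustrative only): Tanimoto similarity, k nearest training
    compounds, and 'positive' meaning a training response above `threshold`.
    """
    sims = [tanimoto(query_fp, fp) for fp in train_fps]
    # Indices of the k most similar (close) training compounds.
    close = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    s = [sims[i] for i in close]
    y = [train_y[i] for i in close]

    total = sum(s)
    if total == 0:
        raise ValueError("query has zero similarity to all close compounds")
    # Similarity-weighted read-across prediction and its weighted spread.
    pred = sum(si * yi for si, yi in zip(s, y)) / total
    w_sd = sqrt(sum(si * (yi - pred) ** 2 for si, yi in zip(s, y)) / total)

    avg_sim = mean(s)
    sd_sim = pstdev(s)
    pos = [si for si, yi in zip(s, y) if yi > threshold]
    neg = [si for si, yi in zip(s, y) if yi <= threshold]
    return {
        "ra_prediction": pred,
        "weighted_sd": w_sd,
        "cv_prediction": w_sd / abs(pred) if pred else float("nan"),
        "avg_sim": avg_sim,
        "sd_sim": sd_sim,
        "cv_sim": sd_sim / avg_sim,
        "max_sim_pos": max(pos, default=0.0),
        "max_sim_neg": max(neg, default=0.0),
    }


# Toy data (hypothetical): fingerprints as sets of on-bit indices.
train_fps = [{1, 2, 3, 4}, {1, 2, 3}, {7, 8, 9}, {1, 2, 5, 6}]
train_y = [1.0, 2.0, -1.0, -0.5]
desc = rasar_descriptors({1, 2, 3, 4}, train_fps, train_y, k=3)
```

In a q-RASAR workflow of the kind the abstract describes, values like these would be computed for every training and query compound and appended as extra columns to the selected chemical descriptors before a partial least squares model is fitted. That fitting step is omitted here.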

Data and software availability

The data set was originally collected from the Endocrine Disruptor Knowledge Base (EDKB) database (https://www.fda.gov/science-research/bioinformatics-tools/endocrine-disruptor-knowledge-base) and is available in the Supplementary Information in Excel format. The DTC Laboratory tools used in this study are available free of charge from https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home and http://teqip.jdvu.ac.in/QSAR_Tools/

Abbreviations

QSAR: Quantitative structure–activity relationship
RA: Read-across
RASAR: Read-across structure–activity relationship
PLS: Partial least squares
ICP: Intelligent consensus predictions
CM: Consensus model
SD: Standard deviation
CV: Coefficient of variation

Funding

This research is funded by the Science and Engineering Research Board (SERB), New Delhi, under the MATRICS scheme (MTR/2019/000008). AB thanks Jadavpur University, Kolkata, for a scholarship.

Author information

Contributions

The manuscript was written through contributions of both authors. Both authors have given approval to the final version of the manuscript.

Corresponding author

Correspondence to Kunal Roy.

Ethics declarations

Conflict of interest

Declared none.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

RASAR Descriptor Calculator v1.0 has now been made available at https://sites.google.com/jadavpuruniversity.in/dtc-lab-software/home.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary Materials SI-1 contains raw data files in the Excel format (ZIP 653 KB)

11030_2022_10478_MOESM2_ESM.docx

Supplementary Material SI-2 contains score plots, applicability domain plots and randomization plots of models M1 to M4 (DOCX 1671 KB)

About this article

Cite this article

Banerjee, A., Roy, K. First report of q-RASAR modeling toward an approach of easy interpretability and efficient transferability. Mol Divers 26, 2847–2862 (2022). https://doi.org/10.1007/s11030-022-10478-6

  • DOI: https://doi.org/10.1007/s11030-022-10478-6
