Validation subset selections for extrapolation oriented QSPAR models

Szántai-Kis, Csaba; Kövesdi, István; Kéri, György; Örfi, László

doi:10.1023/B:MODI.0000006538.99122.00

Published: March 2003

Volume 7, pages 37–43, (2003)

Molecular Diversity

Csaba Szántai-Kis^1,2,
István Kövesdi^3,4,
György Kéri^1,4,5 &
László Örfi^1,4,5


Abstract

One of the most important features of QSPAR models is their predictive ability, which should be checked by external validation. In this work we examined three types of external validation set selection methods for their usefulness in in silico screening. The usefulness of the selection methods was studied as follows: 1) We generated thousands of QSPR models and stored them in 'model banks'. 2) We selected a final top model from the model banks based on three different validation set selection methods. 3) We predicted large data sets, which we called 'chemical universe sets', and calculated the corresponding standard errors of prediction (SEPs). The models were generated from small fractions of the available water solubility data during a genetic algorithm (GA) variable subset selection procedure. The external validation sets were constructed by random selection, uniformly distributed selection or perimeter-oriented selection. We found that the models performing best on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to make many extrapolations. We also compared the final top models obtained from the external validation set selection methods on three independent 'chemical universe sets' of different sizes.
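
The three selection strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the algorithms, so everything below is an assumption made for illustration and not the authors' implementation: "uniformly distributed" is read as a greedy maximin spread over descriptor space, "perimeter-oriented" as picking the compounds farthest from the descriptor centroid, and SEP as a root-mean-square prediction error. The function names and the descriptor matrix `X` are hypothetical.

```python
import numpy as np


def random_selection(X, k, rng):
    """Pick k validation indices uniformly at random, without replacement."""
    return rng.choice(len(X), size=k, replace=False)


def uniform_selection(X, k):
    """Greedy maximin pick (assumed reading of 'uniformly distributed').

    Starts from the point nearest the centroid, then repeatedly adds the
    point whose distance to the already chosen points is largest.
    """
    centroid = X.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(X - centroid, axis=1)))
    chosen = [first]
    # nearest[i] = distance from point i to its closest chosen point
    nearest = np.linalg.norm(X - X[first], axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(nearest))
        chosen.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(X - X[nxt], axis=1))
    return np.array(chosen)


def perimeter_selection(X, k):
    """Pick the k points farthest from the centroid (assumed reading of
    'perimeter-oriented')."""
    dist = np.linalg.norm(X - X.mean(axis=0), axis=1)
    return np.argsort(dist)[-k:]


def sep(y_true, y_pred):
    """Standard error of prediction, taken here as the RMS residual
    (an assumption; other definitions use n - 1 or a bias correction)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Under these assumptions, each candidate model in a model bank would be scored with `sep` on the validation indices returned by one of the three selectors, and the lowest-scoring model kept as the final top model for that selection method.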

Author information

Authors and Affiliations

Cooperative Research Center, Semmelweis University, Pf 131, Budapest 5, Hungary, 1367
Csaba Szántai-Kis, György Kéri & László Örfi
Department of Pharmaceutical Chemistry, Semmelweis University, Budapest, Hungary, 1092
Csaba Szántai-Kis
EGIS Pharmaceuticals Ltd., Budapest, Hungary, 1106
István Kövesdi
VICHEM Ltd., Budapest, Hungary, 1022
István Kövesdi, György Kéri & László Örfi
Department of Medical Chemistry, Semmelweis University Peptide Biochemistry Research Group of the Hungarian Academy of Sciences, Budapest, Hungary, 1088
György Kéri & László Örfi

Corresponding author

Correspondence to László Örfi.

About this article

Cite this article

Szántai-Kis, C., Kövesdi, I., Kéri, G. et al. Validation subset selections for extrapolation oriented QSPAR models. Mol Divers 7, 37–43 (2003). https://doi.org/10.1023/B:MODI.0000006538.99122.00

Issue Date: March 2003
DOI: https://doi.org/10.1023/B:MODI.0000006538.99122.00
