Validation subset selections for extrapolation oriented QSPAR models

Molecular Diversity

Abstract

Predictive ability is one of the most important features of QSPAR models, and it should be checked by external validation. In this work we examined three methods of selecting the external validation set for their usefulness in in silico screening. The study proceeded in three steps: (1) we generated thousands of QSPR models and stored them in 'model banks'; (2) we selected a final top model from the model banks using each of the three validation set selection methods; (3) we used the selected models to predict large data sets, which we called 'chemical universe sets', and calculated the corresponding standard errors of prediction (SEPs). The models were generated from small fractions of the available water solubility data during a genetic algorithm (GA) variable subset selection procedure. The external validation sets were constructed by random selection, by uniformly distributed selection, or by perimeter-oriented selection. We found that the models performing best on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to extrapolate extensively. We also compared the final top models obtained with each external validation set selection method on three independent 'chemical universe sets' of different sizes.
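The three ways of choosing an external validation set can be illustrated with a minimal Python sketch. The abstract does not define the selection rules, so everything below is an assumption made for illustration: "perimeter-oriented" is taken to mean the points farthest from the centroid of the descriptor space, "uniformly distributed" is approximated by a greedy maximin spread, and the SEP is taken as the root-mean-square prediction error. All function names are hypothetical and are not the authors' implementation.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a list of descriptor vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def random_selection(points, k, seed=0):
    """Pick k indices at random, without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(len(points)), k))


def perimeter_selection(points, k):
    """ASSUMED rule: take the k points farthest from the centroid."""
    c = centroid(points)
    ranked = sorted(range(len(points)),
                    key=lambda i: distance(points[i], c), reverse=True)
    return sorted(ranked[:k])


def uniform_selection(points, k):
    """ASSUMED rule: greedy maximin spread, seeded at the point nearest the centroid."""
    c = centroid(points)
    chosen = [min(range(len(points)), key=lambda i: distance(points[i], c))]
    while len(chosen) < k:
        rest = [i for i in range(len(points)) if i not in chosen]
        # Add the point whose nearest already-chosen neighbour is farthest away.
        chosen.append(max(rest, key=lambda i: min(distance(points[i], points[j])
                                                  for j in chosen)))
    return sorted(chosen)


def sep(observed, predicted):
    """ASSUMED definition: root-mean-square error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Toy descriptor space: a 3 x 3 grid of two-dimensional points.
grid = [[x, y] for y in range(3) for x in range(3)]
print(random_selection(grid, 4))
print(perimeter_selection(grid, 4))
print(uniform_selection(grid, 4))
```

Under these assumptions, a perimeter-oriented validation set rewards models that predict well at the edge of the training domain, which is the situation a model faces when most of the data it must predict lies outside the small fraction it was built from.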

Author information

Corresponding author

Correspondence to László Örfi.

About this article

Cite this article

Szántai-Kis, C., Kövesdi, I., Kéri, G. et al. Validation subset selections for extrapolation oriented QSPAR models. Mol Divers 7, 37–43 (2003). https://doi.org/10.1023/B:MODI.0000006538.99122.00

  • DOI: https://doi.org/10.1023/B:MODI.0000006538.99122.00
