Comparison of logP and logD correction models trained with public and proprietary data sets

Journal of Computer-Aided Molecular Design

Abstract

In drug discovery, the octanol/water partition and distribution coefficients, logP and logD, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. A variety of established methods, mostly fragment- or atom-based, are available to calculate logP, whereas logD prediction generally relies on calculated logP and pKa to estimate the neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations that generally lead to systematic errors for chemically related molecules, while pKa estimation is generally more difficult because of the interplay of electronic, inductive and conjugation effects in ionizable moieties. We propose an integrated machine learning QSAR modeling approach that predicts logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions beyond that of commercial software. The performance of these models compares favorably with that of models built with a larger set of proprietary logD data.
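
The dependence of logD on logP and pKa mentioned above can be illustrated with the textbook relation for a single ionizable group, logD = logP - log10(1 + 10^(pH - pKa)) for an acid and logD = logP - log10(1 + 10^(pKa - pH)) for a base. The sketch below is only an illustration under stated assumptions: it assumes a monoprotic compound and that only the neutral species partitions into octanol. The function names are hypothetical, and this is not the implementation used by the authors or by any commercial software.

```python
import math


def ionization_penalty(pka: float, ph: float, kind: str) -> float:
    """Return log10(1 + 10^x) for one ionizable group.

    x = pH - pKa for an acid, pKa - pH for a base.
    Assumption: monoprotic compound, only the neutral form partitions.
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return math.log10(1.0 + 10.0 ** exponent)


def logd_from_logp(logp: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Estimate logD at the given pH from logP and a single pKa."""
    return logp - ionization_penalty(pka, ph, kind)


def logp_from_logd(logd: float, pka: float, ph: float = 7.4, kind: str = "acid") -> float:
    """Invert the relation: recover logP from a logD value and a single pKa."""
    return logd + ionization_penalty(pka, ph, kind)


if __name__ == "__main__":
    # Hypothetical acid with logP = 3.0 and pKa = 4.4, evaluated at pH 7.4.
    logd = logd_from_logp(3.0, 4.4, ph=7.4, kind="acid")
    print(f"logD(7.4) = {logd:.3f}")
    print(f"recovered logP = {logp_from_logd(logd, 4.4, ph=7.4, kind='acid'):.3f}")
```

The inverse function mirrors the step in which logP is derived from a logD prediction and a predicted pKa. Any error in the pKa propagates directly into the derived logP, which is one reason the quality of the pKa estimate matters.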

Data availability

The ChEMBL logD data, as described in the methods section, are included in the supplemental material, along with predictions. We also provide as supplemental material a detailed list of the descriptors used in model training and an analysis of the property space for the three main sets used in this work. Due to licensing limitations, the BioByte logD data sets are not available in this publication. Moka calculations for the various sets are subject to licensing limitations and are shown only for two examples in the results section. Genentech’s internal logD data sets are not available for publication.

Acknowledgements

At Genentech: Huy Nguyen, Fabio Broccatelli, and Hao Zheng.

Author information

Corresponding author

Correspondence to Ignacio Aliagas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (CSV 543 KB)

Supplementary file2 (PDF 34 KB)

About this article

Cite this article

Aliagas, I., Gobbi, A., Lee, ML. et al. Comparison of logP and logD correction models trained with public and proprietary data sets. J Comput Aided Mol Des 36, 253–262 (2022). https://doi.org/10.1007/s10822-022-00450-9

  • DOI: https://doi.org/10.1007/s10822-022-00450-9
