Unsupervised Learning Methods and Similarity Analysis in Chemoinformatics

Living reference work entry
First Online: 01 January 2016

pp 1–38

Handbook of Computational Chemistry

Katarzyna Odziomek, Anna Rybinska & Tomasz Puzyn

Abstract

In this chapter, we present an overview of various chemometric methods, appropriate for analyzing and interpreting data from social media, industry, academia, medicine, and other sources. We discuss unsupervised machine-learning techniques used for grouping (hierarchical cluster analysis, k-means) and exploring (principal component analysis, self-organizing Kohonen maps) all types of data, both quantitative and qualitative. For each method described in this chapter, we explain the basic concepts, provide a rudimentary algorithm, and present practical applications. All the examples are based on a set of molecular descriptors calculated for a selected group of persistent organic pollutants (POPs).

Author information

Authors and Affiliations

Laboratory of Environmental Chemometrics, Faculty of Chemistry, University of Gdansk, 80-308, Gdansk, Poland
Katarzyna Odziomek, Anna Rybinska & Tomasz Puzyn

Corresponding author

Correspondence to Katarzyna Odziomek.

Editor information

Editors and Affiliations

Jackson State University, Jackson, Mississippi, USA
Jerzy Leszczynski

Copyright information

© 2016 Springer Science+Business Media Dordrecht

About this entry

Cite this entry

Odziomek, K., Rybinska, A., Puzyn, T. (2016). Unsupervised Learning Methods and Similarity Analysis in Chemoinformatics. In: Leszczynski, J. (eds) Handbook of Computational Chemistry. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6169-8_53-1

DOI: https://doi.org/10.1007/978-94-007-6169-8_53-1
Received: 12 January 2016
Accepted: 01 February 2016
Published: 14 March 2016
Publisher Name: Springer, Dordrecht
Online ISBN: 978-94-007-6169-8
eBook Packages: Springer Reference Chemistry and Mat. Science; Reference Module Physical and Materials Science; Reference Module Chemistry, Materials and Physics