Feature Selection Using Correlation Matrix on Metagenomic Data with Pearson Enhancing Inflammatory Bowel Disease Prediction

International Conference on Artificial Intelligence for Smart Community

Abstract

The Fourth Industrial Revolution has produced a vast number of innovative applications. These can be used in numerous areas to create wealth and to improve people's quality of life. We focus first on medical problems, where better prediction can help. In this study, we ask whether Inflammatory Bowel Disease (IBD) prediction, and hence early detection of related illness, can be enhanced by feature selection on metagenomic data. Over the last few years, this prediction task has remained a challenge; because information is scarce and data are lacking, the problem has not been studied thoroughly. To revisit the subject, we propose a new way of enhancing IBD prediction by using a correlation matrix with the Pearson correlation coefficient on metagenomic data. Our aim is to find out whether predictions improve when using a specific number of features selected by the Pearson correlation coefficient. The results of the proposed method are promising: when we pick out highly correlated features, the model predicts better than it does with randomly selected features.
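
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state whether correlation is measured against the disease label or between pairs of features, so this example assumes the former: each feature is ranked by the absolute value of its Pearson correlation with a binary label, and the top k are kept. The function names (`pearson_with_label`, `select_top_k`), the synthetic data, and the value of k are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def pearson_with_label(X, y):
    """Pearson correlation between each column of X and the label vector y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)  # center every feature column
    yc = y - y.mean()        # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Zero-variance columns have an undefined correlation; report 0 for them.
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, Xc.T @ yc / denom, 0.0)
    return r


def select_top_k(X, y, k):
    """Indices of the k features with the largest |Pearson r| against y."""
    r = pearson_with_label(X, y)
    order = np.argsort(-np.abs(r), kind="stable")
    return order[:k]


# Synthetic stand-in for an abundance table (assumption: 200 samples, 50 features).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 50
y = rng.integers(0, 2, size=n_samples)     # hypothetical labels: 1 = IBD, 0 = healthy
X = rng.random((n_samples, n_features))
X[:, 3] += 2.0 * y                         # feature 3 rises with the label
X[:, 7] -= 2.0 * y                         # feature 7 falls with the label

selected = select_top_k(X, y, k=2)                         # correlation-based choice
random_pick = rng.choice(n_features, size=2, replace=False)  # random baseline choice
```

Either index set can then be used to subset the columns of `X` before training a classifier, which is the comparison the abstract reports. In a cross-validated evaluation, the ranking would normally be computed on the training fold only, so that the held-out labels do not influence which features are kept.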


Author information

Corresponding author

Correspondence to Huong Hoang Luong.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Luong, H.H. et al. (2022). Feature Selection Using Correlation Matrix on Metagenomic Data with Pearson Enhancing Inflammatory Bowel Disease Prediction. In: Ibrahim, R., K. Porkumaran, Kannan, R., Mohd Nor, N., S. Prabakar (eds) International Conference on Artificial Intelligence for Smart Community. Lecture Notes in Electrical Engineering, vol 758. Springer, Singapore. https://doi.org/10.1007/978-981-16-2183-3_102

  • DOI: https://doi.org/10.1007/978-981-16-2183-3_102

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-2182-6

  • Online ISBN: 978-981-16-2183-3

  • eBook Packages: Computer Science, Computer Science (R0)
