Comparison of KNN and SVM Methods for the Accuracy of Individual Race Classification Prediction Based on SNP Genetic Data

  • Conference paper
Proceeding of the 3rd International Conference on Electronics, Biomedical Engineering, and Health Informatics

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 1008)

Abstract

A single nucleotide polymorphism (SNP) is a DNA sequence variation within a population in which a single nucleotide in the genome differs. Many statistical methods have been proposed to predict the racial classification of individuals from SNP genetic data. Choosing the right classification method matters because it determines the accuracy of the classification results. This research aims to identify which of two popular machine learning (ML) classification methods, K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), achieves the higher average accuracy. The study used SNP genetic data for 120 samples from two races, CEU (European) and Yoruba (African); for each sample, 10 SNPs were selected at the same locations. Each classification method was tested with the percentage of test data set to 10, 20, 30, 40 and 50, using Euclidean distance for the KNN method. The results show that, for predicting individual race from SNP genetic data, KNN has a better average prediction accuracy than SVM if the SNP locations used in the tests are highly correlated with the sample class. In this case, the highest average accuracy is 98.906% for KNN and 98.779% for SVM. According to the Wilcoxon statistical test at a significance level of α = 0.05, the difference between the highest average accuracies of KNN and SVM is significant. The benefit of this research is to help identify the right classification method for predicting individual racial classification from SNP genetic data.
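The KNN side of the experiment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption made for illustration: the genotypes are synthetic (coded as 0, 1 or 2 copies of an allele), the per-class allele frequencies are hypothetical, k = 3 is an arbitrary choice (the abstract does not state k), and each test percentage is evaluated with a single hold-out split rather than repeated runs. The SVM side is left out because it would need a numerical library.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length genotype vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def make_synthetic_samples(n_per_class=60, n_snps=10, seed=0):
    """Build 2 * n_per_class synthetic samples with n_snps genotypes each.

    The allele frequencies below are hypothetical, chosen only so that
    the SNPs correlate strongly with the class label.
    """
    rng = random.Random(seed)
    allele_freq = {"CEU": 0.1, "Yoruba": 0.9}
    X, y = [], []
    for label, p in allele_freq.items():
        for _ in range(n_per_class):
            # Genotype = number of copies of the allele (0, 1 or 2).
            genotype = [
                int(rng.random() < p) + int(rng.random() < p)
                for _ in range(n_snps)
            ]
            X.append(genotype)
            y.append(label)
    return X, y


def holdout_accuracy(X, y, test_fraction, k=3, seed=0):
    """Accuracy of KNN on one random hold-out split."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train_X = [X[i] for i in train_idx]
    train_y = [y[i] for i in train_idx]
    correct = sum(
        knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
    )
    return correct / n_test


X, y = make_synthetic_samples()
results = {
    frac: holdout_accuracy(X, y, frac) for frac in (0.1, 0.2, 0.3, 0.4, 0.5)
}
for frac, acc in results.items():
    print(f"test data {frac:.0%}: KNN accuracy {acc:.3f}")
```

To mirror the comparison in the abstract, one would collect a matching list of SVM accuracies over the same splits and compare the paired values with a Wilcoxon signed-rank test, for example with `scipy.stats.wilcoxon`; whether the authors used that particular tool is not stated.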

Corresponding author

Correspondence to Prihanto Ngesti Basuki.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Basuki, P.N., Sri Yulianto, J.P., Setiawan, A. (2023). Comparison of KNN and SVM Methods for the Accuracy of Individual Race Classification Prediction Based on SNP Genetic Data. In: Triwiyanto, T., Rizal, A., Caesarendra, W. (eds) Proceeding of the 3rd International Conference on Electronics, Biomedical Engineering, and Health Informatics. Lecture Notes in Electrical Engineering, vol 1008. Springer, Singapore. https://doi.org/10.1007/978-981-99-0248-4_28
