Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning

  • Conference paper
Information Management and Big Data (SIMBig 2023)

Abstract

This paper presents a federated learning (FL) approach to training an AI model for SARS-CoV-2 variant classification. We analyze SARS-CoV-2 spike sequences in a distributed way, without sharing data, to detect different variants of this rapidly mutating coronavirus. Our method preserves the confidentiality of local data (which may be stored in different locations) while still allowing us to reliably detect and identify different known and unknown variants of the novel coronavirus SARS-CoV-2. Using the proposed approach, we achieve an overall accuracy of 93% on the coronavirus variant identification task. We also explain how the proposed model follows the main laws of federated learning, such as the laws of data ownership, data privacy, model aggregation, and model heterogeneity. Because the proposed model is distributed, it could scale easily to “Big Data”. We plan to use this proof of concept to implement a privacy-preserving pandemic response strategy.
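The abstract describes training without sharing raw data and combining locally trained models through model aggregation. As a hedged illustration of that general idea only, the sketch below uses Federated Averaging (FedAvg, McMahan et al., 2017) with a toy logistic-regression model. This preview does not state the paper's actual model, sequence features, or aggregation rule, so the functions `local_train`, `fed_avg`, and `federated_round`, the learning rate, and the synthetic data are all hypothetical. The sketch is also binary for brevity, whereas variant identification is a multi-class task.

```python
"""Minimal FedAvg-style sketch: clients train locally, a server averages weights.

Hypothetical illustration only; not the authors' implementation.
"""
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Dataset = List[Tuple[Vector, int]]  # (feature vector, binary label); toy stand-in


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_train(weights: Vector, data: Dataset, lr: float = 0.1, epochs: int = 5) -> Vector:
    """Run plain SGD for logistic regression on one client's private data."""
    w = list(weights)  # copy so the incoming global model is not mutated
    for _ in range(epochs):
        for x, y in data:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            grad = pred - y  # derivative of log-loss with respect to the logit
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights: Sequence[Vector], client_sizes: Sequence[int]) -> Vector:
    """Average client models, weighting each by its number of local samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def federated_round(global_w: Vector, clients: Sequence[Dataset]) -> Vector:
    """One communication round: only weights leave each client, never raw data."""
    updates = [local_train(global_w, data) for data in clients]
    sizes = [len(data) for data in clients]
    return fed_avg(updates, sizes)


if __name__ == "__main__":
    random.seed(0)

    # Synthetic toy data: two features plus a bias term; label is 1 when x0 > x1.
    def make_client(n: int) -> Dataset:
        out = []
        for _ in range(n):
            a, b = random.random(), random.random()
            out.append(([a, b, 1.0], int(a > b)))
        return out

    clients = [make_client(40), make_client(60), make_client(20)]
    w = [0.0, 0.0, 0.0]
    for _ in range(10):
        w = federated_round(w, clients)

    correct = sum(
        int((sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > 0.5) == bool(y))
        for c in clients
        for x, y in c
    )
    print(f"training accuracy after 10 rounds: {correct / sum(len(c) for c in clients):.2f}")
```

Weighting by local sample count is the standard FedAvg choice. Clients holding more sequences pull the global model further, and the server only ever sees parameter vectors.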



Author information

Correspondence to Sarwan Ali.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chourasia, P., Murad, T., Tayebi, Z., Ali, S., Khan, I.U., Patterson, M. (2024). Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning. In: Lossio-Ventura, J.A., et al. Information Management and Big Data. SIMBig 2023. Communications in Computer and Information Science, vol 2142. Springer, Cham. https://doi.org/10.1007/978-3-031-63616-5_6


  • DOI: https://doi.org/10.1007/978-3-031-63616-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-63615-8

  • Online ISBN: 978-3-031-63616-5

  • eBook Packages: Computer Science, Computer Science (R0)
