Abstract
This paper presents a federated learning (FL) approach to training an AI model for SARS-CoV-2 variant classification. We analyze SARS-CoV-2 spike sequences in a distributed way, without data sharing, to detect different variants of this rapidly mutating coronavirus. Our method maintains the confidentiality of local data (which could be stored in different locations) yet allows us to reliably detect and identify known and unknown variants of SARS-CoV-2. Using the proposed approach, we achieve an overall accuracy of \(93\%\) on the coronavirus variant identification task. We also detail how the proposed model follows the main laws of federated learning: data ownership, data privacy, model aggregation, and model heterogeneity. Because the proposed model is distributed, it could scale easily to “Big Data”. We plan to use this proof of concept to implement a privacy-preserving pandemic response strategy.
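To make the model aggregation step concrete, the following is a minimal sketch of one common aggregation rule, federated averaging (FedAvg), in which a server combines locally trained parameter vectors weighted by each site's sample count, so that raw sequences never leave the sites. The abstract does not state which aggregation rule the authors use, so FedAvg is an assumption here; the function name `federated_average` and the example sites and numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Combine per-client parameter vectors into one global vector.

    Each client's vector is weighted by its share of the total number of
    training samples (FedAvg-style aggregation; assumed, not taken from
    the paper).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report vectors of equal length")

    aggregated = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        share = n / total  # this client's fraction of all samples
        for i, value in enumerate(weights):
            aggregated[i] += share * value
    return aggregated


# Hypothetical example: two sites share only model parameters, not sequences.
site_a = [0.0, 4.0]  # parameters trained locally on 100 sequences
site_b = [2.0, 0.0]  # parameters trained locally on 300 sequences
global_weights = federated_average([site_a, site_b], [100, 300])
print(global_weights)  # [1.5, 1.0]
```

Weighting by sample count means a site holding more sequences has proportionally more influence on the global model, while only parameters are exchanged with the server.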
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chourasia, P., Murad, T., Tayebi, Z., Ali, S., Khan, I.U., Patterson, M. (2024). Efficient Classification of SARS-CoV-2 Spike Sequences Using Federated Learning. In: Lossio-Ventura, J.A., et al. Information Management and Big Data. SIMBig 2023. Communications in Computer and Information Science, vol 2142. Springer, Cham. https://doi.org/10.1007/978-3-031-63616-5_6
DOI: https://doi.org/10.1007/978-3-031-63616-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-63615-8
Online ISBN: 978-3-031-63616-5
eBook Packages: Computer Science, Computer Science (R0)