Abstract
Genomic research has advanced considerably over the last decade. Activities such as genome sequencing generate enormous volumes of data, and analysing these data is essential for extracting insight that can be applied in real-life scenarios. However, such analysis is extremely difficult because of the volume of the data and their unique characteristics and complexities. Big data analytics offers candidate approaches for this purpose, and there have been research efforts in that direction. This paper explores the relationship between genomic data and big data analytics. It analyses the challenges in processing genomic data, addresses how big data analytics concepts can be applied to genomic data processing, and outlines future trends in combined research on genomics and big data analytics.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sarma, H.K.D. (2022). Genomic Data and Big Data Analytics. In: Sarma, H.K.D., Balas, V.E., Bhuyan, B., Dutta, N. (eds) Contemporary Issues in Communication, Cloud and Big Data Analytics. Lecture Notes in Networks and Systems, vol 281. Springer, Singapore. https://doi.org/10.1007/978-981-16-4244-9_15
DOI: https://doi.org/10.1007/978-981-16-4244-9_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4243-2
Online ISBN: 978-981-16-4244-9
eBook Packages: Engineering, Engineering (R0)