Genomic Data and Big Data Analytics

Sarma, Hiren Kumar Deva

doi:10.1007/978-981-16-4244-9_15

Hiren Kumar Deva Sarma

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 281)

Abstract

Genomic research has been highly prominent in recent times, and the last decade has seen substantial progress in the field. Activities such as genome sequencing generate huge amounts of data. Analysing these data is important for gaining meaningful insight that can be applied in real-life scenarios. However, analysing such volumes of data is extremely difficult because of their unique characteristics and complexities. Big data analytics approaches can be explored for this purpose, and there have been research efforts in that direction. In this paper, the relationship between genomic data and big data analytics is explored. Challenges in processing genomic data are analysed. The question of how big data analytics concepts can be applied to genomic data processing is addressed. Future trends in combined research in genomics and big data analytics are outlined.

Author information

Authors and Affiliations

Department of Information Technology, Sikkim Manipal Institute of Technology, Majitar, Sikkim, 737136, India
Hiren Kumar Deva Sarma

Editor information

Editors and Affiliations

Department of Information Technology, Sikkim Manipal Institute of Technology, Majitar, Sikkim, India
Hiren Kumar Deva Sarma
Department of Automatics and Applied Software, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas
Department of Information Technology, Sikkim Manipal Institute of Technology, Majitar, Sikkim, India
Bhaskar Bhuyan
Department of Computer Science and Engineering, Marwadi University, Rajkot, Gujarat, India
Nitul Dutta

About this paper

Cite this paper

Sarma, H.K.D. (2022). Genomic Data and Big Data Analytics. In: Sarma, H.K.D., Balas, V.E., Bhuyan, B., Dutta, N. (eds) Contemporary Issues in Communication, Cloud and Big Data Analytics. Lecture Notes in Networks and Systems, vol 281. Springer, Singapore. https://doi.org/10.1007/978-981-16-4244-9_15

DOI: https://doi.org/10.1007/978-981-16-4244-9_15
Published: 01 December 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4243-2
Online ISBN: 978-981-16-4244-9
eBook Packages: Engineering, Engineering (R0)
