Genomic Data and Big Data Analytics

  • Conference paper
Contemporary Issues in Communication, Cloud and Big Data Analytics

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 281)

Abstract

Genomic research has been highly prominent in recent times. Society has witnessed huge progress in genomic research in the last decade. The amount of data generated by activities such as genome sequencing is huge. It is important to analyse such a huge amount of data to acquire meaningful insight, so that the resulting knowledge finds application in real-life scenarios. However, analysing such a huge volume of data is extremely difficult because of the unique characteristics and complexities of these data. Big data analytics approaches can be explored for this purpose, and there have been research efforts in that direction. In this paper, the relationship between genomic data and big data analytics is explored. Challenges in processing genomic data are analysed. The question of how big data analytics concepts can be applied to genomic data processing is addressed. Future trends in the combined research direction of genomics and big data analytics are outlined.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sarma, H.K.D. (2022). Genomic Data and Big Data Analytics. In: Sarma, H.K.D., Balas, V.E., Bhuyan, B., Dutta, N. (eds) Contemporary Issues in Communication, Cloud and Big Data Analytics. Lecture Notes in Networks and Systems, vol 281. Springer, Singapore. https://doi.org/10.1007/978-981-16-4244-9_15

  • DOI: https://doi.org/10.1007/978-981-16-4244-9_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-4243-2

  • Online ISBN: 978-981-16-4244-9

  • eBook Packages: Engineering, Engineering (R0)