Big Data Analytics: Views from Statistical and Computational Perspectives


Abstract

Without doubt, the most discussed current trend in computer science and statistics is big data, yet the term means different things to different people. For the statistician, the issue is how to extract usable information from datasets that are too large and complex for many traditional or classical methods to handle. For the computer scientist, big data poses problems of data storage and management, communication, and computation. For the citizen, big data raises questions of privacy and confidentiality. This introductory chapter touches on some key aspects of big data and its analysis. Far from being an exhaustive overview of this fast-emerging field, it is a discussion of statistical and computational views that the authors owe to many researchers, organizations, and online sources.



Corresponding author

Correspondence to Saumyadipta Pyne.

Copyright information

© 2016 Springer India

About this chapter

Cite this chapter

Pyne, S., Prakasa Rao, B.L.S., Rao, S.B. (2016). Big Data Analytics: Views from Statistical and Computational Perspectives. In: Pyne, S., Rao, B., Rao, S. (eds) Big Data Analytics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3628-3_1
