Abstract
Without any doubt, the most discussed current trend in computer science and statistics is big data. The term means different things to different people. For the statistician, the issue is how to extract usable information from datasets that are too large and complex for many traditional or classical methods to handle. For the computer scientist, big data poses problems of data storage and management, communication, and computation. For the citizen, big data raises questions of privacy and confidentiality. This introductory chapter touches on some key aspects of big data and its analysis. Far from being an exhaustive overview of this fast-emerging field, it is a discussion of statistical and computational views that the authors owe to many researchers, organizations, and online sources.
Copyright information
© 2016 Springer India
About this chapter
Cite this chapter
Pyne, S., Prakasa Rao, B.L.S., Rao, S.B. (2016). Big Data Analytics: Views from Statistical and Computational Perspectives. In: Pyne, S., Rao, B., Rao, S. (eds) Big Data Analytics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3628-3_1
DOI: https://doi.org/10.1007/978-81-322-3628-3_1
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-3626-9
Online ISBN: 978-81-322-3628-3
eBook Packages: Mathematics and Statistics (R0)