Abstract
Big data is no longer “all just hype”; together with the AI technology stack, it is widely applied in nearly all aspects of business, government, and other organizations. Its influence goes far beyond a simple technical innovation and reaches all areas of the world. This chapter first gives a historical review of big data, followed by a discussion of its characteristics, from the 3 V’s up to the 10 V’s of big data. The chapter then introduces the technology stacks an organization needs to build a big data application, from infrastructure, platform, and ecosystem to constructional units and components. Finally, we provide some online big data resources for reference.
Notes
- 1.
The term “information overload” was popularized by Alvin Toffler in his book Future Shock (1971).
- 2.
Containers are software packages that include everything they need to run, such as code, dependencies, and libraries. A container differs from a virtual machine (VM) in that containers share the host OS kernel, whereas each VM carries a full copy of an OS kernel.
- 5.
To avoid confusion, we always use AM to denote the Application Master, which exists per application, and we use the full name, Application Manager, for the component of the Resource Manager.
- 15.
Mob: Medium-sized Objects.
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Abu-Salih, B., Wongthongtham, P., Zhu, D., Chan, K.Y., Rudra, A. (2021). Introduction to Big Data Technology. In: Social Big Data Analytics. Springer, Singapore. https://doi.org/10.1007/978-981-33-6652-7_2
DOI: https://doi.org/10.1007/978-981-33-6652-7_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6651-0
Online ISBN: 978-981-33-6652-7
eBook Packages: Business and Management (R0)