Introduction to Big Data Technology

Chapter in: Social Big Data Analytics

Abstract

Big data is no longer “all just hype”. Together with the AI technology stack, it is now widely applied in nearly every aspect of business, government, and other organizations. Its influence extends far beyond a single technical innovation and reaches into virtually every area of the world. This chapter first gives a historical review of big data, followed by a discussion of its characteristics, from the 3V’s up to the 10V’s. It then introduces the technology stacks an organization can use to build a big data application, from infrastructure, platform, and ecosystem down to constructional units and components. Finally, we provide some online big data resources for reference.

Notes

  1. The term “information overload” was popularized by Alvin Toffler in his book Future Shock (1970).

  2. Containers are software packages that include everything they need to run, such as code, dependencies, and libraries. Containers differ from virtual machines (VMs) in that containers share the host OS kernel, whereas each VM runs its own full copy of an OS kernel.

  3. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html

  4. https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html

  5. To avoid confusion, we always use AM to denote the Application Master, which runs per application, and we spell out Application Manager in full when referring to the component of the Resource Manager.

  6. https://spark.apache.org/docs/3.0.0-preview2/

  7. https://spark.apache.org/graphx/

  8. https://mesos.apache.org/

  9. https://kubernetes.io/

  10. https://www.alluxio.io/

  11. https://cassandra.apache.org/

  12. https://spark.apache.org/streaming/

  13. https://giraph.apache.org/

  14. https://en.wikipedia.org/wiki/GraphLab

  15. MOB: Medium-sized Objects.

  16. https://cloud.google.com/bigtable/docs/schema-design

  17. https://hbase.apache.org/apache_hbase_reference_guide.pdf

  18. http://nutch.apache.org/

  19. https://en.wikipedia.org/wiki/CNET

  20. https://lucene.apache.org/

  21. https://tika.apache.org/

  22. https://lucene.apache.org/solr/guide/8_5/a-quick-overview.html#a-quick-overview

  23. https://ant.apache.org/

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Abu-Salih, B., Wongthongtham, P., Zhu, D., Chan, K.Y., Rudra, A. (2021). Introduction to Big Data Technology. In: Social Big Data Analytics. Springer, Singapore. https://doi.org/10.1007/978-981-33-6652-7_2