Abstract
Although many attempts have been made to define data science, no consensus definition has yet been reached. One reason it is difficult to agree on a single definition is the multifaceted nature of data science: it can be described as a science, a research method, a discipline, a workflow, or a profession. No single definition can capture all of these facets. In this chapter, we first take an interdisciplinary perspective and review the background to the development of data science (Sect. 2.1). We then present data science from several perspectives: data science as a science (Sect. 2.2), as a research method (Sect. 2.3), as a discipline (Sect. 2.4), as a workflow (Sect. 2.5), and as a profession (Sect. 2.6). We conclude by highlighting three main characteristics of data science: its interdisciplinarity, the diversity of its learners, and its research-oriented nature (Sect. 2.7).
Notes
1. The Earth image was originally posted to Flickr by DonkeyHotey at https://flickr.com/photos/47422005@N04/5679642883. It was reviewed on 4 December 2020 by FlickreviewR 2 and confirmed to be licensed under the terms of CC BY 2.0.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hazzan, O., Mike, K. (2023). What is Data Science? In: Guide to Teaching Data Science. Springer, Cham. https://doi.org/10.1007/978-3-031-24758-3_2
DOI: https://doi.org/10.1007/978-3-031-24758-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24757-6
Online ISBN: 978-3-031-24758-3
eBook Packages: Computer Science, Computer Science (R0)