Abstract
Selecting data sources is a crucial step in providing a useful information base to support decision-makers. While any data source can add potential value to decision making, its integration always entails considerable effort. For decision-makers, data sources must contain relevant information in an appropriate scope. The data scientist must assess whether integrating the data sources is technically possible and how much effort it requires. Therefore, a taxonomy was developed to identify the data sources relevant to the decision-maker and to minimize the data integration effort. The taxonomy was developed and evaluated with real data sources and six companies from different industries. The final taxonomy consists of sixteen dimensions that support the data scientist and the decision-maker in selecting data sources for the big data integration process. As a result, an efficient and effective big data integration process can be carried out with a minimum number of data sources to be integrated.
Notes
- 5. Powered by Crunchbase: https://data.crunchbase.com/docs/open-data-map.
- 6. Crunchbase 2013 Snapshot ©, Creative Commons Attribution License [CC-BY], https://data.crunchbase.com/docs/2013-snapshot.
- 9. At the time of access still freely available: https://public.enigma.com/browse/collection/stock-exchanges-company-listings/50a2457d-6407-4581-8f14-5d37a9410fa9.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kruse, F., Schröer, C., Gómez, J.M. (2021). Data Source Selection Support in the Big Data Integration Process – Towards a Taxonomy. In: Ahlemann, F., Schütte, R., Stieglitz, S. (eds) Innovation Through Information Systems. WI 2021. Lecture Notes in Information Systems and Organisation, vol 48. Springer, Cham. https://doi.org/10.1007/978-3-030-86800-0_1
DOI: https://doi.org/10.1007/978-3-030-86800-0_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86799-7
Online ISBN: 978-3-030-86800-0
eBook Packages: Computer Science, Computer Science (R0)