Abstract
Data quality management is a major challenge today due to the proliferation of abundant and heterogeneous datasets. All organizations that realize and maintain data-intensive advanced applications have to deal with data quality related problems on a daily basis. In these organizations, data quality related problems are registered in natural language, and the organizations subsequently rely on ad hoc, non-systematic, and expensive solutions to categorize and resolve the registered problems. In this contribution we present a formal description of an innovative data quality resolving architecture that semantically and dynamically maps the descriptions of data quality related problems to data quality attributes. Through this mapping, we reduce complexity (the dimensionality of data quality attributes is far smaller than that of the natural language space) and enable data analysts to directly use the methods and tools proposed in the literature. Another challenge in data quality management is choosing appropriate solutions for data quality problems, owing to a lack of insight into the long-term or broader effects of candidate solutions. This difficulty becomes particularly prominent in flexible architectures where loosely linked data are integrated (e.g., in data spaces or open data settings). We also present a decision support framework for the solution selection process that evaluates the cost-benefit values of candidate solutions. The paper reports on a proof-of-concept tool for the proposed architecture and its evaluation.
References
Choenni, S., Leertouwer, E.: Public safety mashups to support policy makers. In: Andersen, K.N., Francesconi, E., Grönlund, Å., Engers, T.M. (eds.) EGOVIS 2010. LNCS, vol. 6267, pp. 234–248. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15172-9_22
Netten, N., van den Braak, S., Choenni, S., Leertouwer, E.: Elapsed times in criminal justice systems. In: Proceedings of the 8th International Conference on Theory and Practice of Electronic Governance (ICEGOV), pp. 99–108. ACM (2014)
van Dijk, J., Kalidien, S., Choenni, S.: Smart monitoring of the criminal justice system. Government Information Quarterly. Elsevier (2016). doi:10.1016/j.giq.2015.11.005
Christoulakis, M., Spruit, M., van Dijk, J.: Data quality management in the public domain: a case study within the Dutch justice system. Int. J. Inf. Qual. 4(1), 1–17 (2015)
Birman, K.P.: Consistency in distributed systems. In: Guide to Reliable Distributed Systems, pp. 457–470. Springer, Heidelberg (2012)
Davenport, T.H., Glaser, J.: Just-in-time delivery comes to knowledge management. Harvard Bus. Rev. 80(7), 107–111 (2002)
Bargh, M.S., van Dijk, J., Choenni, S.: Dynamic data quality management using issue tracking systems. IADIS Int. J. Comput. Sci. Inf. Syst. 10(2), 32–51 (2015). Isaias, P., Paprzycki, M. (eds.)
Bargh, M.S., Mbgong, F., van Dijk, J., Choenni, S.: A framework for dynamic data quality management. In: Proceedings of the IADIS International Conference Information Systems Post-Implementation and Change Management, pp. 134–142 (2015)
Bargh, M., van Dijk, J., Choenni, S.: Management of data quality related problems - exploiting operational knowledge. In: Proceedings of the 5th International Conference on Data Management Technologies and Applications (DATA), pp. 31–42. SciTePress (2016)
Batini, C., Cappiello, C., Francalanci, C., Maurino, A.: Methodologies for data quality assessment and improvement. ACM Comput. Surv. 41(3), 16–52 (2009). Article no. 16
Wand, Y., Wang, R.Y.: Anchoring data quality dimensions in ontological foundations. Commun. ACM 39(11), 86–95 (1996). ACM
Davoudi, S., Dooling, J.A., Glondys, B., Jones, T.D., Kadlec, L., Overgaard, S.M., Ruben, K., Wendicke, A.: Data quality management model (2015 Update). J. AHIMA 86(10), 62–65 (2015). expanded web version
Knowledgent: White Paper Series: Building a Successful Data Quality Management Program. http://knowledgent.com/whitepaper/building-successful-data-quality-management-program/. Accessed 31 Oct 2015
Halevy, A., Rajaraman, A., Ordille, J.: Data integration: the teenage years. In: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 9–16. VLDB Endowment (2006)
Wang, R.Y., Strong, D.M.: Beyond accuracy: what data quality means to data consumers. J. Manage. Inf. Syst. 12(4), 5–33 (1996)
Price, R., Shanks, G.: A semiotic information quality framework. In: Proceedings of International Conference on Decision Support Systems (DSS), pp. 658–672 (2004)
Woodall, P., Borek, A., Parlikad, A.K.: Data quality assessment: the hybrid approach. Inf. Manage. 50, 369–382 (2013)
Bargh, M.S., Choenni, S., Meijer, R.: Privacy and information sharing in a judicial setting: a wicked problem. In: Proceedings of the 16th Annual International Conference on Digital Government Research, pp. 97–106. ACM, New York (2015)
Jiang, L., Barone, D., Borgida, A., Mylopoulos, J.: Measuring and comparing effectiveness of data quality techniques. In: Eck, P., Gordijn, J., Wieringa, R. (eds.) CAiSE 2009. LNCS, vol. 5565, pp. 171–185. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02144-2_17
Bugzilla Website. https://www.bugzilla.org. Accessed 31 Oct 2015
JIRA Software Website. https://www.atlassian.com/software/jira. Accessed 31 Oct 2015
H2desk Website. https://www.h2desk.com. Accessed 31 Oct 2015
TOPdesk Website. http://www.topdesk.nl. Accessed 31 Oct 2015
Canovas Izquierdo, J.L., Cosentino, V., Rolandi, B., Bergel, A., Cabot, J.: GiLA: GitHub label analyzer. In: IEEE 22nd International Conference on Software Analysis, Evolution and Reengineering, pp. 479–483, Montreal, Canada (2015)
Environmental Protection Agency: Data quality assessment: a reviewer’s guide. Technical report EPA/240/B-06/002, EPA QA/G-9R (2006)
Pipino, L.L., Lee, Y.W., Wang, R.Y.: Data quality assessment. Commun. ACM 45(4), 211–218 (2002). ACM
Eppler, M.J., Wittig, D.: Conceptualizing information quality: a review of information quality frameworks from the last ten years. In: Proceedings of the Conference on Info Quality, pp. 83–96 (2000)
Lee, Y.: Crafting rules: context-reflective data quality problem solving. J. Manage. Inf. Syst. 20(3), 93–119 (2003)
Ryu, K.S., Park, J.S., Park, J.H.: A data quality management maturity model. ETRI J. 28(2), 191–204 (2006)
Kornai, A.: The algebra of lexical semantics. In: Mathematics of Language, pp. 174–199. Springer, Heidelberg (2010)
Mooney, R.J.: Learning for semantic parsing. In: Gelbukh, A. (ed.) CICLing 2007. LNCS, vol. 4394, pp. 311–324. Springer, Heidelberg (2007). doi:10.1007/978-3-540-70939-8_28
Acknowledgements
Partial results of this work were presented earlier in [9]. Tables, figures and equations have their origin in this paper, unless stated otherwise.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
van Dijk, J., Bargh, M.S., Choenni, S., Spruit, M. (2017). Maturing Pay-as-you-go Data Quality Management: Towards Decision Support for Paying the Larger Bills. In: Francalanci, C., Helfert, M. (eds) Data Management Technologies and Applications. DATA 2016. Communications in Computer and Information Science, vol 737. Springer, Cham. https://doi.org/10.1007/978-3-319-62911-7_6
DOI: https://doi.org/10.1007/978-3-319-62911-7_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62910-0
Online ISBN: 978-3-319-62911-7
eBook Packages: Computer Science, Computer Science (R0)