Abstract
Event logs are invaluable for conducting process mining projects, offering insights into process improvement and data-driven decision-making. However, data quality issues affect the correctness and trustworthiness of these insights, making preprocessing tasks a necessity. Despite this recognized importance, preprocessing is still carried out ad hoc, with little systematic support. This paper presents a systematic literature review that establishes a comprehensive repository of preprocessing tasks and their usage in case studies. We identify six high-level and 20 low-level preprocessing tasks in case studies. Log filtering, transformation, and abstraction are commonly used, while log enriching, integration, and reduction are less frequent. These results are a first step toward more structured and transparent event log preprocessing, which in turn enhances the reliability of process mining.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, Y., Stein Dani, V., Beerepoot, I., Lu, X. (2024). Turning Logs into Lumber: Preprocessing Tasks in Process Mining. In: De Smedt, J., Soffer, P. (eds) Process Mining Workshops. ICPM 2023. Lecture Notes in Business Information Processing, vol 503. Springer, Cham. https://doi.org/10.1007/978-3-031-56107-8_8
DOI: https://doi.org/10.1007/978-3-031-56107-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56106-1
Online ISBN: 978-3-031-56107-8
eBook Packages: Computer Science (R0)