ABC in Root Cause Analysis: Discovering Missing Information and Repairing System Failures

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13810))

Abstract

Root-cause analysis (RCA) is a crucial task in software system maintenance, where system logs play an essential role in capturing system behaviours and describing failures. Automatic RCA approaches are desirable, but they face the challenge that the knowledge model (KM) extracted from system logs can be faulty when the logs do not correctly represent some information. Unrepresented information that is required for successful RCA is called missing information (MI). Although much work has focused on automatically finding root causes of system failures from the given logs, automated RCA with MI remains under-explored. This paper proposes using the Abduction, Belief Revision and Conceptual Change (ABC) system to automate RCA after repairing the system’s KM so that it contains the MI. First, we show how ABC can be used to discover MI and repair the KM. Then we demonstrate how ABC automatically finds and repairs root causes. Based on automated reasoning, ABC considers the effect of changing a cause when repairing a system failure: the root cause is the one whose change leaves the fewest failures. Although ABC outputs multiple possible solutions for experts to choose from, it greatly reduces the manual work of discovering MI and analysing root causes, especially in large-scale system management, where any reduction in manual work is beneficial. This is the first application of an automatic theory repair system to RCA tasks: the KM is not only used but also improved, because our approach can guide engineers to produce KMs or higher-quality logs that contain the spotted MI, thus improving the maintenance of complex software systems.
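The two ideas in the abstract (spotting MI when an observed failure cannot be derived from the KM, and ranking a cause by how few failures remain once it is changed) can be illustrated with a toy propositional sketch. This is not ABC's implementation, which operates on Datalog-like theories; the atoms, rules and function names below are hypothetical and chosen only for illustration, and "changing a cause" is simplified here to deleting a fact.

```python
# Toy KM: facts are atom names; each rule is (head, [body atoms]).
# All names are hypothetical and only illustrate the idea.
RULES = [
    ("db_write_error", ["disk_full"]),
    ("api_500", ["db_write_error"]),
    ("backup_failed", ["disk_full"]),
    ("login_failed", ["cert_expired"]),
]
FAILURES = {"api_500", "backup_failed", "login_failed"}


def forward_chain(facts, rules):
    """Derive every atom entailed by the facts (naive fixed point)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived


def abduce_missing(facts, rules, observed):
    """Single atoms whose addition makes an unexplained observation derivable."""
    if observed in forward_chain(facts, rules):
        return []  # already explained, so nothing is missing
    candidates = {b for _, body in rules for b in body} - set(facts)
    return sorted(
        c for c in candidates
        if observed in forward_chain(set(facts) | {c}, rules)
    )


def rank_root_causes(facts, rules, failures):
    """Order facts by how many failures stay derivable after removing each."""
    ranking = []
    for cause in sorted(facts):
        derived = forward_chain(set(facts) - {cause}, rules)
        ranking.append((len(failures & derived), cause))
    return sorted(ranking)


# The log recorded only disk_full, yet login_failed was observed:
# the KM cannot explain it, so cert_expired is proposed as candidate MI.
missing = abduce_missing({"disk_full"}, RULES, "login_failed")

# After adding the MI to the KM, rank the candidate root causes.
ranking = rank_root_causes({"disk_full", "cert_expired"}, RULES, FAILURES)
print(missing, ranking)
```

In this toy KM, removing `disk_full` leaves one failure derivable while removing `cert_expired` leaves two, so `disk_full` is ranked first. As in the abstract, the ranked list would still be offered to an expert rather than applied automatically.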

Notes

  1. The damage caused by MI in RCA is described in Fig. 2 and further discussed in the next section.

  2. ABC’s code is available on GitHub: https://github.com/XuerLi/ABC_Datalog.

  3. Keeping \(\implies\) is required by the inference of refutation.

  4. For example, a triple represents an alarm about a system failure.

  5. A cause may be missing while its logical consequence exists in a KM, e.g., only the latter is recorded in the log.

References

  1. Ceri, S., Gottlob, G., Tanca, L.: Logic Programming and Databases. Surveys in Computer Science, Springer, Berlin (1990). https://doi.org/10.1007/978-3-642-83952-8

  2. Chapman, A., et al.: Dataset search: a survey. VLDB J. 29(1), 251–272 (2020)

  3. Cherrared, S., Imadali, S., Fabre, E., Gössler, G.: SFC self-modeling and active diagnosis. IEEE Trans. Network Serv. Manage. 18, 2515–2530 (2021)

  4. Dalal, S., Chhillar, R.S.: Empirical study of root cause analysis of software failure. ACM SIGSOFT Software Engineering Notes 38(4), 1–7 (2013)

  5. Gallier, J.: SLD-Resolution and Logic Programming. Chapter 9 of Logic for Computer Science: Foundations of Automatic Theorem Proving (2003). Originally published by Wiley 1986

  6. He, P., Zhu, J., He, S., Li, J., Lyu, M.R.: An evaluation study on log parsing and its use in log mining. In: 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), pp. 654–661. IEEE (2016)

  7. He, S., Zhu, J., He, P., Lyu, M.R.: Loghub: a large collection of system log datasets towards automated log analytics. arXiv preprint arXiv:2008.06448 (2020)

  8. Jia, T., Chen, P., Yang, L., Li, Y., Meng, F., Xu, J.: An approach for anomaly diagnosis based on hybrid graph model with logs for distributed services. In: 2017 IEEE International Conference on Web Services (ICWS), pp. 25–32. IEEE (2017)

  9. Kowalski, R.A., Kuehner, D.: Linear resolution with selection function. Artif. Intell. 2, 227–60 (1971)

  10. Li, X.: Automating the Repair of Faulty Logical Theories. Ph.D. thesis, School of Informatics, University of Edinburgh (2021)

  11. Lin, Q., Zhang, H., Lou, J.G., Zhang, Y., Chen, X.: Log clustering based problem identification for online service systems. In: 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C), pp. 102–111. IEEE (2016)

  12. Lu, J., Dousson, C., Krief, F.: A self-diagnosis algorithm based on causal graphs. In: The Seventh International Conference on Autonomic and Autonomous Systems, ICAS, vol. 2011 (2011)

  13. Pfenning, F.: Datalog. Lecture 26, 15–819K: Logic Programming (2006). https://www.cs.cmu.edu/~fp/courses/lp/lectures/26-datalog.pdf

  14. Qiu, J., Du, Q., Yin, K., Zhang, S.L., Qian, C.: A causality mining and knowledge graph based method of root cause diagnosis for performance anomaly in cloud applications. Appl. Sci. 10(6), 2166 (2020)

  15. Shima, K.: Length matters: clustering system log messages using length of words. arXiv preprint arXiv:1611.03213 (2016)

  16. Smaill, A., Li, X., Bundy, A.: ABC repair system for Datalog-like theories. In: KEOD, pp. 333–340 (2018)

  17. Solé, M., Muntés-Mulero, V., Rana, A.I., Estrada, G.: Survey on models and techniques for root-cause analysis. arXiv preprint arXiv:1701.08546 (2017)

  18. Urbonas, M., Bundy, A., Casanova, J., Li, X.: The use of max-sat for optimal choice of automated theory repairs. In: Bramer, M., Ellis, R. (eds.) SGAI 2020. LNCS (LNAI), vol. 12498, pp. 49–63. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-63799-6_4

  19. Wang, F., et al.: LEKG: a system for constructing knowledge graphs from log extraction. In: The 10th International Joint Conference on Knowledge Graphs (2021)

  20. Zawawy, H., Kontogiannis, K., Mylopoulos, J.: Log filtering and interpretation for root cause analysis. In: 2010 IEEE International Conference on Software Maintenance, pp. 1–5. IEEE (2010)

  21. Zhou, Q., Gray, A.J., McLaughlin, S.: Seanet-towards a knowledge graph based autonomic management of software defined networks. arXiv preprint arXiv:2106.13367 (2021)

  22. Zhu, R., et al.: TREAT: automated construction and maintenance of probabilistic knowledge bases from logs (extended abstract). In: The 8th Annual Conference on Machine Learning, Optimization and Data Science (LOD) (2022)

Acknowledgment

The authors would like to thank Huawei for supporting the research and providing the data on which this paper is based, under grant CIENG4721/LSC. We also gratefully acknowledge UKRI grant EP/V026607/1 and the support of ELIAI (The Edinburgh Laboratory for Integrated Artificial Intelligence), EPSRC grant no. EP/W002876/1. Thanks are also due to Zhenhao Zhou for valuable discussions around network software systems. Finally, the anonymous reviewers gave us very useful feedback that improved the quality of this paper.

Author information

Corresponding author

Correspondence to Xue Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, X. et al. (2023). ABC in Root Cause Analysis: Discovering Missing Information and Repairing System Failures. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2022. Lecture Notes in Computer Science, vol 13810. Springer, Cham. https://doi.org/10.1007/978-3-031-25599-1_26

  • DOI: https://doi.org/10.1007/978-3-031-25599-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25598-4

  • Online ISBN: 978-3-031-25599-1

  • eBook Packages: Computer Science (R0)
