1 Introduction

The last decade saw a sharp increase in research papers concerning interpretability for Artificial Intelligence (AI), also referred to as eXplainable AI (XAI). In 2020, the number of papers containing “interpretable AI”, “explainable AI”, “XAI”, “explainability”, or “interpretability” had increased to more than three times that of 2010, following the trend shown in Fig. 1.

Fig. 1 Trends of the publications containing “interpretable AI” or “explainable AI” as keywords

Being applied to an increasingly large number of applications and domains, AI solutions mostly divide into the two approaches illustrated in Fig. 2. On the one side, we have Symbolic AI, where symbolic reasoning on knowledge bases is an important element of automated intelligent agents, reflecting human social constructs in the virtual world (Russell and Norvig 2002). To communicate intuitions and results, humans (henceforth agents) tend to construct and share rational explanations, which are a means to match intuitive and analytical cognition (Omicini 2020). On the other side, Machine Learning (ML) and Deep Learning (DL) models reach high performance by learning from data and through experience. The complexity of the tasks in both approaches has increased over time, together with the complexity of the models being used and their opacity. A rising interest in interpretability came with the increasing opacity of these systems and with the frequent adoption of “black-box” methods such as DL, as documented by multiple studies (Miller 2019; Lipton 2018; Tjoa and Guan 2020; Murdoch et al. 2019; Adadi and Berrada 2018; Arya et al.).

Fig. 3 Differences of definitions in domains other than ML development. In this diagram, interpretable is equated to explainable, since most social domains equate the two terms for simplicity

4.2 A global definition of interpretable AI

As an important contribution of this work, we derive a multidisciplinary definition of interpretable AI that may be adopted in both the social and the legal sciences.

In daily language, an instance, or an object of interest, is defined as interpretable if it is possible to find its interpretation, hence if we can find its meaning (Simpson 2009). Interpretability can thus be conceived as the capability to characterize something as interpretable. A formal definition of interpretability exists in the field of mathematical logic, and it can be summarized as the possibility of interpreting, or translating, one formal theory into another while preserving the validity of each theorem of the original theory during the translation (Tarski et al. 1953). The translated theory thus assigns meaning to the original theory and is an interpretation of it. The translation may be needed, for instance, to move into a simplified space where the original theory is easier to understand and can be presented in a different language.

From these explicit definitions, we can derive a multidisciplinary definition of interpretability that embraces both technical and social aspects: “Interpretability is the capability of assigning meaning to an instance by a translation that does not change its original validity”. The definition of interpretable AI can then be derived by clarifying what should be translated: “An AI system is interpretable if it is possible to translate its working principles and outcomes in human-understandable language without affecting the validity of the system”. This definition represents the shared goal that several technical approaches aim to achieve when applied to AI. In some cases, as we discuss in Sec. 4.4, the definition is relaxed to include approximations of the AI system that maintain its validity as much as possible. Interpretability is needed to make the output generation process of an AI system explainable and understandable to humans, and it is often obtained through a translation process. Such a process may be introduced directly at the design stage as an additional task of the system. If not available by design, interpretability may be obtained by post-hoc explanations that aim at improving the understandability of how the outcome was generated. Interpretability can thus be sought through iterations and in multiple forms (e.g. graphical visualizations, natural language, or tabular data), which can be adapted to the receiver. This fosters the auditability and accountability of the system.

4.3 A global taxonomy

In what follows we present a global taxonomy for interpretable AI, summarizing the multiple viewpoints and perspectives gathered in this work. Table 4 presents the taxonomy with further detail on the domain-specific definitions used in each of the eight fields studied in this work, namely law, ethics, cognitive psychology, machine learning, symbolic AI, sociology, labour rights, and healthcare research. Brackets specify the domain in which each definition applies. If a term applies to both social and technical experts, it is provided first and marked by the (global) identifier; otherwise, it is marked with the domain-specific identifier, e.g. EU law, sociology, etc. Practitioners in any of the above-mentioned fields may refer to this table to obtain a common definition for each term in the taxonomy and to inspect all the exceptions and variations of the same term in the literature. Our objective is not to impose one taxonomy over another, but rather to raise awareness of the multiple definitions of each word in each domain, and to create a common terminology that researchers may refer to in order to reduce misinterpretations.

Table 4 Taxonomy of Interpretable AI for the social and technical sciences

The following subsections explain how the proposed taxonomy adapts to each field's respective needs, challenges and goals in terms of ML interpretability.

4.4 Use of the proposed terminology to classify interpretability techniques

In this section, we show how the terminology in Table 3 can be used to classify ML interpretability techniques. To do so, we group popular interpretability techniques into the families shown in Table 5. On the basis of this, Table 6 summarizes how each family of techniques can provide the properties described in Table 3. In the following, we give more insights concerning the classifications provided in Tables 5 and 6.

Due to their low complexity, models such as decision trees and sparse linear models have inherent interpretability, meaning they can be interpreted without the use of additional interpretability techniques (Molnar 2019). These methods are intelligible, according to the definition in Table 3, ID 4. Black-box models, such as deep learning models, have surpassed the performance of traditional systems on complex problems such as image classification. However, due to their high complexity, they require techniques to interpret their decisions and behavior. These techniques often involve considering a close approximation of the model behavior that may hold in the locality of an instance (i.e. local interpretability) or for the entire set of inputs (i.e. global interpretability). They can be grouped according to the following criteria: (1) their scope, (2) whether they are model-agnostic or model-specific, and (3) the type of explanation they produce.
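To make inherent interpretability concrete, the following sketch (our illustrative Python example, not code from the cited works; the dataset and hyperparameters are arbitrary) fits a sparse linear model and a shallow decision tree whose decision logic can be read directly from the fitted parameters:

# Illustrative sketch of inherently interpretable models: the fitted
# parameters themselves constitute the explanation.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Sparse linear model: the few nonzero coefficients are the explanation.
lasso = Lasso(alpha=1.0).fit(X, y)
for name, coef in zip(X.columns, lasso.coef_):
    if coef != 0:
        print(f"{name}: {coef:+.2f}")

# Shallow decision tree: the learned rules can be printed verbatim.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

No additional interpretability technique is needed here: the printed coefficients and rules are the model.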

The scope of a technique indicates the granularity of the decisions it can explain, either global or local. Global interpretability techniques explain the behavior of the system as a whole, answering the question “How does the model make predictions?”, while local interpretability techniques explain an individual prediction or a group of predictions, answering the question “How did the model make a certain prediction or group of predictions?” (Lipton 2018).
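The contrast between the two scopes can be illustrated with a small synthetic example (the black-box model, data, and perturbation scheme below are arbitrary choices made only for illustration): a global view aggregates feature importance over an entire test set, while a local view probes how a single prediction reacts to perturbations of one instance.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Global scope ("How does the model make predictions?"): importance of each
# feature averaged over the whole test set.
glob = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
print("global importances:", np.round(glob.importances_mean, 3))

# Local scope ("How did the model make this prediction?"): how the predicted
# probability for one instance changes when each feature is nudged.
x0 = X_te[0].copy()
base = model.predict_proba([x0])[0, 1]
for j in range(X.shape[1]):
    x_pert = x0.copy()
    x_pert[j] += X_tr[:, j].std()
    print(f"feature {j}: delta = {model.predict_proba([x_pert])[0, 1] - base:+.3f}")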

Model-agnostic techniques can be applied to any model class to extract explanations, unlike model-specific techniques, which are restricted to a specific model class. Interpretability techniques can also be roughly divided by their result, i.e. the type of explanation they produce, creating multiple families of techniques. It is important to note that some types of explanations are strongly preferred in practice: half of the studies using interpretability techniques in the oncological field use either saliency maps or feature importance (Amorim et al. 2021). These techniques can produce data points that explain the behavior of the model (Kim et al. 2016; Lapuschkin et al. 2015), visualizations of internal features (Olah et al. 2017), or simpler models that approximate the original model (Ribeiro et al. 2016; Lakkaraju et al. 2016; Lundberg and Lee 2017). It is important to choose the right technique based on its scope and family to reach the desired objective. Table 5 presents the families of techniques, their definitions and important references (Molnar 2019).
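To make the model-agnostic idea concrete, the following is a minimal hand-rolled sketch in the spirit of the local surrogate approach of Ribeiro et al. (2016); the Gaussian sampling, kernel width, and function name are our illustrative assumptions rather than the reference implementation. Because it only queries the model's prediction function, it can wrap any model class:

import numpy as np
from sklearn.linear_model import Ridge

def local_surrogate_explanation(predict_proba, x0, feature_scales,
                                n_samples=1000, kernel_width=1.0, seed=0):
    """Fit a weighted linear surrogate around x0 and return its coefficients."""
    rng = np.random.default_rng(seed)
    # Sample perturbations in the neighbourhood of the instance of interest.
    Z = x0 + rng.normal(scale=feature_scales, size=(n_samples, x0.size))
    yz = predict_proba(Z)[:, 1]            # query the black box on the samples
    # Weight the samples by their proximity to x0 (RBF kernel).
    d = np.linalg.norm((Z - x0) / feature_scales, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # The weighted linear fit is valid only locally; its coefficients act as
    # feature attributions for this single prediction.
    return Ridge(alpha=1.0).fit(Z, yz, sample_weight=w).coef_

Applied to a probabilistic classifier and an instance of interest, for example local_surrogate_explanation(model.predict_proba, x0, X_tr.std(axis=0)) with the hypothetical names used in the previous sketch, the function returns one signed weight per feature, meant to approximate the black box only in the neighbourhood of that instance.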

Based on Tables 1, 2 and 4, we present Table 6, where we group families of interpretability techniques by their scope and classify them by their suitability to achieve each of the objectives mentioned in Tables 1 and 2. To achieve interpretability as intended in Table 3 (ID 1), local techniques are preferable, since they allow users to interpret the outcomes of a system and thus increase its interpretability. Global techniques can be rather inaccurate at a local level, although they are better suited to exposing the mechanisms of a system in general. The decision-making process can become more transparent (ID 3) at the local or global level, depending on the scope of the interpretability techniques. Intelligibility (ID 4) is a characteristic of inherently interpretable models. It can be achieved for more complex models by approximating the decision function either locally or globally with an inherently interpretable model. It is also important to point out that even when the model is inherently interpretable, the features used to train it can sometimes be hard to understand, particularly for non-experts in feature engineering.
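As a minimal sketch of this global-approximation route to intelligibility (synthetic data and arbitrary model choices, not tied to any specific entry of Table 5), a shallow decision tree can be trained to mimic a black-box classifier, so that the tree's printed rules stand in for the black box globally, at the cost of possible local inaccuracy:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
black_box = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Global surrogate: the shallow tree is trained on the black box's *outputs*,
# so its printed rules approximate the black box over the whole input space.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(
    X_tr, black_box.predict(X_tr))
print(export_text(surrogate))

# Overall agreement with the black box on held-out data; a high value can
# still hide regions of the input space where the approximation is poor.
print("agreement:", accuracy_score(black_box.predict(X_te), surrogate.predict(X_te)))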

As for accountability, systems need to justify their outcomes and behavior to be accountable, and thus any technique that offers interpretability or explainability can help to achieve this. Similarly, these techniques can also be used to examine the global behavior or the reasoning behind local decisions, and thereby provide auditability (ID 7). Finally, robustness (ID 9) is not achievable merely by understanding the behavior of the model. It rather requires finding or producing instances that make the model misbehave, exposing limitations of the model, or identifying data points that lie outside the training data distribution.
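To illustrate the difference between explaining and stress-testing, the helper below (an illustrative sketch, not a method from the cited literature) randomly searches for a small perturbation that changes a classifier's prediction; finding such an instance says something about robustness that no explanation of the unperturbed decision would reveal:

import numpy as np

def find_misbehaving_instance(predict, x0, scale=0.1, n_trials=2000, seed=0):
    """Randomly search for a small perturbation of x0 that flips the prediction."""
    rng = np.random.default_rng(seed)
    original = predict([x0])[0]
    for _ in range(n_trials):
        candidate = x0 + rng.normal(scale=scale, size=x0.shape)
        if predict([candidate])[0] != original:
            return candidate       # evidence that the decision is fragile here
    return None                    # no failure found at this perturbation scale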

Table 5 Definitions of families of interpretability techniques
Table 6 Classification of families of interpretability techniques

At this point, we remark that interpretability techniques come with inherent risks. A desired property of interpretability is to help the end-user create the right mental model of an AI system. However, if one considers AI models to be lossy compressions of data, then interpretability outcomes are a lossy compression of the model and are severely underspecified. In other words, it is possible to generate several different interpretations of the same observations. If used improperly, interpretability techniques can open new sources of risk. In some settings, interpretability outcomes can be arbitrarily changed. For example, Aïvodji et al. (2019) demonstrate a case of “fair washing”, in which fair rules are obtained that represent an underlying unfair model. It is also possible for an AI system that predicts grades to be gamed if the underlying logic is fully transparent. Model explanations can reveal that an AI model's criteria are illegal or provide grounds for appeals (Weller 2019). Finally, transparency also makes explicit the trade-offs involved in decisions, which may otherwise remain hidden (Coyle and Weller 2020).

From these considerations, it follows that interpretability requires a context-based scientific evaluation. Two standard approaches for such evaluations are (a) establishing baselines based on domain insights to evaluate the quality of explanations, and (b) leveraging end-user studies to determine effectiveness. For instance, user experiments have been used for trust calibration (knowing when and when not to trust AI outputs) in joint decision-making (Zhang et al. 2020). In another interesting approach, Lakkaraju et al. (2016) measured the teaching performance of end-users to establish how effective explanations are in communicating model behavior, with good teaching performance indicating better model understanding.

Several quantitative measures to assess explanation risks have also been proposed in the literature. A common approach relies on surrogates, which approximate a complex model with a simpler interpretable one. Properties of the simpler model can then help address questions on the extent to which the original model is interpretable. Common measures include fidelity, the fraction of times the simpler model agrees with the complex one, and complexity, the number of elements of the simpler model a user needs to parse to understand an outcome. Faithfulness metrics measure the correlation between the feature importance assigned by the AI model and that assigned by the explanation. Sensitivity measures (Yeh et al. 2015) assess how much explanations change under small perturbations of the input.

Current research indicates that the forthcoming decades will focus on the full development of conversational informatics (Nishida 2014; Calvaresi et al. 2021). Multi-Agent Systems (MAS) are modeled after human societies: within MAS, agents communicate with each other, sharing syntax and ontology. They interact via the Agent Communication Languages (ACL) standard, shaped around Searle's theory of human communication based on speech acts (Searle et al. 1969). Therefore, multi-agent interpretability and explainability require multi-disciplinary efforts to capture all the diverse dimensions and nuances of human conversational acts, transposing such skills to conversational agents (Ciatto et al. 2019, 2020). Equipping virtual entities with explanation capabilities (directed either to humans or to other virtual agents) fits into the view of socio-technical systems, where both humans and artificial components play the role of system components (Whitworth 2006). Ongoing international projects revolve around these concepts. For example, they are tackling intra- and inter-agent explainability (EXPECTATION), actualizing explainable assistive robots (COHERENT), countering information manipulation with knowledge graphs and semantics (CIMPLE), and relating action to effect via causal models of the environment (CausalXRL). Explainable agents can leverage symbolic AI techniques to provide a rational and shareable representation of their own specific cognitive processes and results. Being able to manipulate such a representation allows building one or more personalized explanations to meet the background of the explainee (human or virtual) and boost the success of the explanation process and the overall interaction.
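Returning to the quantitative measures introduced earlier in this subsection (fidelity, complexity, and faithfulness), the sketch below shows one plausible way to compute them; the function names, the choice of a decision-tree surrogate, and the mean-ablation scheme used for faithfulness are our illustrative assumptions rather than definitions taken from the cited works.

import numpy as np
from scipy.stats import pearsonr

def fidelity(black_box, surrogate, X):
    """Fraction of inputs on which the surrogate agrees with the black box."""
    return np.mean(surrogate.predict(X) == black_box.predict(X))

def complexity(surrogate_tree):
    """Number of elements a user has to parse, here the surrogate's leaf count."""
    return surrogate_tree.get_n_leaves()

def faithfulness(black_box, x0, attributions, feature_means):
    """Correlate attributed importances with the drop in the black box's output
    when each feature of x0 is ablated to its training-set mean."""
    base = black_box.predict_proba([x0])[0, 1]
    drops = []
    for j in range(x0.size):
        x_ablated = x0.copy()
        x_ablated[j] = feature_means[j]
        drops.append(base - black_box.predict_proba([x_ablated])[0, 1])
    return pearsonr(np.abs(attributions), np.abs(drops))[0]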