Abstract
One goal of Explainable Artificial Intelligence (XAI) is to interpret and explain the inferential processes of data-driven, machine-learned models so that humans can comprehend them. Achieving this goal requires a reliable tool for collecting the opinions of human users about the explanations that XAI methods generate for trained complex models. Psychometrics, the science behind psychological assessment, studies the theory and techniques for measuring latent constructs such as intelligence, introversion, and conscientiousness. The knowledge developed in psychometrics was exploited to develop and evaluate a novel questionnaire for reliably evaluating the explanations produced by XAI methods. Explainability is a multi-faceted concept, so a set of questions was needed to assess its various facets and return a comprehensive, reliable measurement of explainability. The questionnaire development process was divided into two phases. First, a pilot study was designed and carried out to test the first version of the questionnaire; its results were then used to create a second, refined version. The questionnaire was evaluated by assessing 1) its internal structure with Exploratory Factor Analysis, which examines the interrelationships between the questionnaire’s items, 2) its reliability with Cronbach’s alpha, and 3) its construct validity by comparing the distribution of the questionnaire’s answers with a set of quantitative metrics. Results showed that the questionnaire is promising, as it was deemed a valid and reliable tool for evaluating XAI methods.
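The abstract evaluates reliability with Cronbach’s alpha, which estimates the internal consistency of a multi-item questionnaire from the item variances and the variance of the total score. A minimal sketch of that computation is below; the respondent matrix is hypothetical illustrative data, not the study’s actual responses.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance of totals)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of each respondent's total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 5 respondents x 4 questionnaire items
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
])
print(round(cronbach_alpha(scores), 3))  # prints 0.919
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency, which is the kind of threshold a reliability analysis like the one described here would apply to each questionnaire scale.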
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Vilone, G., Longo, L. (2023). Development of a Human-Centred Psychometric Test for the Evaluation of Explanations Produced by XAI Methods. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1903. Springer, Cham. https://doi.org/10.1007/978-3-031-44070-0_11
Print ISBN: 978-3-031-44069-4
Online ISBN: 978-3-031-44070-0