Abstract
Nowadays, diverse artificial intelligence techniques are applied to analyse datasets in the legal domain. In particular, several studies aim to predict decisions in order to help the competent authority resolve a specific legal process. However, AI-based prediction algorithms are usually black boxes, and explaining why an algorithm predicted a given label remains challenging. Therefore, this paper proposes a 5-step methodology for analysing legal documents from the agency responsible for resolving administrative sanction procedures related to consumer protection. Our methodology starts with corpus collection, pre-processing, and TF vectorisation. Then, fifteen machine learning and deep learning algorithms are tested, and the best-performing one is selected based on quality metrics. Interpretability is emphasised, with SHAP scores used to explain the predictions. The results show that our methodology contributes to understanding the decisive influence of legal terms and their connection to the decision made by the competent authority. By providing tools for legal professionals to make more informed decisions, develop effective legal strategies, and ensure fairness and transparency in legal decision-making, this methodology has broad implications for legal areas beyond consumer disputes, including administrative procedures such as bankruptcies and unfair competition.
Data Availability
The dataset is available at https://github.com/huvaso/Interpretability_Legal_Domain
Code Availability
For reproducibility purposes, the code is available at https://github.com/huvaso/Interpretability_Legal_Domain
Notes
National Institute for the Defense of Competition and the Protection of Intellectual Property https://www.gob.pe/indecopi
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
The authors have contributed in equal measure to the drafting of this document.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Alcántara Francia, O.A., Nunez-del-Prado, M. & Alatrista-Salas, H. Exploring the interpretability of legal terms in tasks of classification of final decisions in administrative procedures. Qual Quant (2024). https://doi.org/10.1007/s11135-024-01882-1