Part of the book series: Synthesis Lectures on Human Language Technologies (SLHLT)

Abstract

In this chapter, we discuss how behavioral patterns in NLP models can be analyzed. We show how models can be systematically tested for linguistic knowledge and introduce methods for diagnosing cognitively implausible sensitivity to perturbations.
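One common form of behavioral test for linguistic knowledge evaluates a language model on minimal pairs: the test passes when the model scores the grammatical variant of a sentence higher than a minimally different ungrammatical one (as in the subject–verb agreement benchmarks cited below). The sketch here is purely illustrative — the `train_bigram_scorer` helper is a hypothetical stand-in for a real language model's scoring function, and the tiny corpus is invented:

```python
# Minimal sketch of minimal-pair behavioral testing. A toy bigram
# counter stands in for a real language model's sentence scorer.
from collections import Counter

def train_bigram_scorer(corpus):
    """Return a scoring function based on raw bigram counts."""
    counts = Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.split()
        for a, b in zip(tokens, tokens[1:]):
            counts[(a, b)] += 1

    def score(sentence):
        # Sum of bigram counts; unseen bigrams contribute zero.
        tokens = ["<s>"] + sentence.split()
        return sum(counts[(a, b)] for a, b in zip(tokens, tokens[1:]))

    return score

def prefers_grammatical(score, grammatical, ungrammatical):
    # The behavioral test passes when the model assigns the higher
    # score to the grammatical member of the minimal pair.
    return score(grammatical) > score(ungrammatical)

corpus = ["the keys are here", "the keys are lost", "the key is here"]
score = train_bigram_scorer(corpus)
print(prefers_grammatical(score, "the keys are here", "the keys is here"))  # → True
```

With a real model, `score` would typically be the (log-)probability the model assigns to the sentence; the pass/fail logic stays the same.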

Notes

  1. Unless dropout is applied during testing; see Sect. 4.1.2.

  2. The example is taken from [56], but we replaced the original main verb puts with places to avoid ambiguity with the past-tense form.

  3. For an interesting discussion of the fuzzy definition of the “right to explanation”, see Chapter 12.3 of Søgaard [122].

References

  1. Nick Chater, Joshua B Tenenbaum, and Alan Yuille. Probabilistic models of cognition: Conceptual foundations, 2006.

  2. John B Watson. Psychology as the behaviorist views it. Psychological review, 20 (2): 158, 1913.

  3. Yvette Graham, Christian Federmann, Maria Eskevich, and Barry Haddow. Assessing human-parity in machine translation on the segment level. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 4199–4207, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.375. https://aclanthology.org/2020.findings-emnlp.375.

  4. Kaiyang Zhou, Ziwei Liu, Yu Qiao, Tao Xiang, and Chen Change Loy. Domain generalization: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.

  5. Alan Ramponi and Barbara Plank. Neural unsupervised domain adaptation in NLP—A survey. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6838–6855, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.603. https://aclanthology.org/2020.coling-main.603.

  6. Nan Du, Yanping Huang, Andrew M Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, et al. GLaM: Efficient scaling of language models with mixture-of-experts. In International Conference on Machine Learning, pages 5547–5569. PMLR, 2022.

  7. Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311, 2022.

  8. Dirk Hovy and Shannon L. Spruit. The social impact of natural language processing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 591–598, Berlin, Germany, August 2016. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-2096. https://aclanthology.org/P16-2096.

  9. Emily M. Bender and Batya Friedman. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics, 6:587–604, 2018. https://doi.org/10.1162/tacl_a_00041. https://aclanthology.org/Q18-1041.

  10. Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. Datasheets for datasets. Communications of the ACM, 64 (12): 86–92, 2021.

  11. Amandalynne Paullada, Inioluwa Deborah Raji, Emily M Bender, Emily Denton, and Alex Hanna. Data and its (dis) contents: A survey of dataset development and use in machine learning research. Patterns, 2 (11): 100336, 2021.

  12. Alan Baker. Simplicity. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University, Summer 2022 edition, 2022.

  13. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Occam’s razor. Information Processing Letters, 24 (6): 377–380, 1987.

  14. Jacob Feldman. The simplicity principle in human concept learning. Current Directions in Psychological Science, 12(6):227–232, 2003. https://doi.org/10.1046/j.0963-7214.2003.01267.x.

  15. Nick Chater and Paul Vitányi. Simplicity: a unifying principle in cognitive science? Trends in Cognitive Sciences, 7 (1): 19–22, 2003. ISSN 1364-6613. https://doi.org/10.1016/S1364-6613(02)00005-0. https://www.sciencedirect.com/science/article/pii/S1364661302000050.

  16. Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hannaneh Hajishirzi, Gabriel Ilharco, Daniel Khashabi, Kevin Lin, Jiangming Liu, Nelson F. Liu, Phoebe Mulcaire, Qiang Ning, Sameer Singh, Noah A. Smith, Sanjay Subramanian, Reut Tsarfaty, Eric Wallace, Ally Zhang, and Ben Zhou. Evaluating models’ local decision boundaries via contrast sets. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1307–1323, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.117. https://aclanthology.org/2020.findings-emnlp.117.

  17. Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, and Benjamin Van Durme. Hypothesis only baselines in natural language inference. In Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics, pages 180–191, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/S18-2023. https://aclanthology.org/S18-2023.

  18. Mor Geva, Yoav Goldberg, and Jonathan Berant. Are we modeling the task or the annotator? An investigation of annotator bias in natural language understanding datasets. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1161–1166, Hong Kong, China, November 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1107. https://aclanthology.org/D19-1107.

  19. Karan Goel, Nazneen Fatema Rajani, Jesse Vig, Zachary Taschdjian, Mohit Bansal, and Christopher Ré. Robustness gym: Unifying the NLP evaluation landscape. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations, pages 42–55, Online, June 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-demos.6. https://aclanthology.org/2021.naacl-demos.6.

  20. Kawin Ethayarajh, Yejin Choi, and Swabha Swayamdipta. Understanding dataset difficulty with \(\mathcal{V}\)-usable information. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 5988–6008. PMLR, 17–23 Jul 2022. https://proceedings.mlr.press/v162/ethayarajh22a.html.

  21. Jonathan K. Kummerfeld, David Hall, James R. Curran, and Dan Klein. Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1048–1059, Jeju Island, Korea, July 2012. Association for Computational Linguistics. https://aclanthology.org/D12-1096.

  22. Marc-Antoine Rondeau and T. J. Hazen. Systematic error analysis of the Stanford question answering dataset. In Proceedings of the Workshop on Machine Reading for Question Answering, pages 12–20, Melbourne, Australia, July 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-2602. https://aclanthology.org/W18-2602.

  23. Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, and Daniel Weld. Errudite: Scalable, reproducible, and testable error analysis. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 747–763, Florence, Italy, July 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1073. https://aclanthology.org/P19-1073.

  24. Aparna Elangovan, Jiayuan He, and Karin Verspoor. Memorization vs. generalization: Quantifying data leakage in NLP performance evaluation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1325–1335, Online, April 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.113. https://aclanthology.org/2021.eacl-main.113.

  25. Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 107–112, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2017. https://aclanthology.org/N18-2017.

  26. Jonathan Kamp, Lisa Beinborn, and Antske Fokkens. Perturbations and subpopulations for testing robustness in token-based argument unit recognition. In Proceedings of the 9th Workshop on Argument Mining, pages 62–73, Online and in Gyeongju, Republic of Korea, October 2022. International Conference on Computational Linguistics. https://aclanthology.org/2022.argmining-1.5.

  27. Ruiqi Zhong, Dhruba Ghosh, Dan Klein, and Jacob Steinhardt. Are larger pretrained language models uniformly better? Comparing performance at the instance level. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 3813–3827, Online, August 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.findings-acl.334. https://aclanthology.org/2021.findings-acl.334.

  28. Urja Khurana, Eric Nalisnick, and Antske Fokkens. How emotionally stable is ALBERT? Testing robustness with stochastic weight averaging on a sentiment analysis task. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 16–31, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eval4nlp-1.3. https://aclanthology.org/2021.eval4nlp-1.3.

  29. Barbara Plank, Dirk Hovy, and Anders Søgaard. Linguistically debatable or just plain wrong? In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 507–511, Baltimore, Maryland, June 2014. Association for Computational Linguistics. https://doi.org/10.3115/v1/P14-2083. https://aclanthology.org/P14-2083.

  30. Ellie Pavlick and Tom Kwiatkowski. Inherent disagreements in human textual inferences. Transactions of the Association for Computational Linguistics, 7:677–694, 2019. https://doi.org/10.1162/tacl_a_00293. https://aclanthology.org/Q19-1043.

  31. Xinliang Frederick Zhang and Marie-Catherine de Marneffe. Identifying inherent disagreement in natural language inference. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4908–4915, Online, June 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.naacl-main.390. https://aclanthology.org/2021.naacl-main.390.

  32. Runzhe Zhan, Xuebo Liu, Derek F. Wong, and Lidia S. Chao. Difficulty-aware machine translation evaluation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 26–32, Online, August 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-short.5. https://aclanthology.org/2021.acl-short.5.

  33. Pepa Atanasova, Jakob Grue Simonsen, Christina Lioma, and Isabelle Augenstein. A diagnostic study of explainability techniques for text classification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3256–3274, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.263. https://aclanthology.org/2020.emnlp-main.263.

  34. Allyson Ettinger. What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models. Transactions of the Association for Computational Linguistics, 8:34–48, 2020. https://doi.org/10.1162/tacl_a_00298. https://aclanthology.org/2020.tacl-1.3.

  35. Shi Feng, Eric Wallace, Alvin Grissom II, Mohit Iyyer, Pedro Rodriguez, and Jordan Boyd-Graber. Pathologies of neural models make interpretations difficult. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3719–3728, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1407. https://aclanthology.org/D18-1407.

  36. Shrey Desai and Greg Durrett. Calibration of pre-trained transformers. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 295–302, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.21. https://aclanthology.org/2020.emnlp-main.21.

  37. Jennifer Hu, Jon Gauthier, Peng Qian, Ethan Wilcox, and Roger Levy. A systematic assessment of syntactic generalization in neural language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1725–1744, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.158. https://aclanthology.org/2020.acl-main.158.

  38. Joris Baan, Wilker Aziz, Barbara Plank, and Raquel Fernandez. Stop measuring calibration when humans disagree. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 1892–1915, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. https://aclanthology.org/2022.emnlp-main.124.

  39. Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, and Noah A. Smith. The right tool for the job: Matching model and instance complexities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6640–6651, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.593. https://aclanthology.org/2020.acl-main.593.

  40. Jasmijn Bastings, Sebastian Ebert, Polina Zablotskaia, Anders Sandholm, and Katja Filippova. “Will you find these shortcuts?” A protocol for evaluating the faithfulness of input salience methods for text classification. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 976–991, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics. https://aclanthology.org/2022.emnlp-main.64.

  41. Jack C Richards. Curriculum development in language teaching. Cambridge University Press, 2001.

  42. Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, and Samuel R. Bowman. BLiMP: The benchmark of linguistic minimal pairs for English. Transactions of the Association for Computational Linguistics, 8: 377–392, 2020. https://doi.org/10.1162/tacl_a_00321. https://aclanthology.org/2020.tacl-1.25.

  43. Alex Warstadt, Yian Zhang, Xiaocheng Li, Haokun Liu, and Samuel R. Bowman. Learning which features matter: RoBERTa acquires a preference for linguistic generalizations (eventually). In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 217–235, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.16. https://aclanthology.org/2020.emnlp-main.16.

  44. Jennifer C. White and Ryan Cotterell. Examining the inductive bias of neural language models with artificial languages. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 454–463, Online, August 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.38. https://aclanthology.org/2021.acl-long.38.

  45. Lukas Galke, Yoav Ram, and Limor Raviv. What makes a language easy to deep-learn? arXiv preprint arXiv:2302.12239, 2023.

  46. Limor Raviv, Marianne de Heer Kloots, and Antje Meyer. What makes a language easy to learn? A preregistered study on how systematic structure and community size affect language learnability. Cognition, 210: 104620, 2021. ISSN 0010-0277. https://doi.org/10.1016/j.cognition.2021.104620. https://www.sciencedirect.com/science/article/pii/S0010027721000391.

  47. Andrew T Hendrickson and Amy Perfors. Cross-situational learning in a Zipfian environment. Cognition, 189: 11–22, 2019.

  48. Amir Shufaniya and Inbal Arnon. A cognitive bias for Zipfian distributions? Uniform distributions become more skewed via cultural transmission. Journal of Language Evolution, 7(1):59–80, 2022.

  49. Ori Lavi-Rotbain and Inbal Arnon. The learnability consequences of Zipfian distributions in language. Cognition, 223:105038, 2022.

  50. Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, and Yejin Choi. Dataset cartography: Mapping and diagnosing datasets with training dynamics. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 9275–9293, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.746. https://aclanthology.org/2020.emnlp-main.746.

  51. A. M. Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.

  52. Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, et al. State-of-the-art generalisation research in NLP: a taxonomy and review. arXiv preprint arXiv:2210.03050, 2022.

  53. Frank Keller. Cognitively plausible models of human language processing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics: Short Papers, pages 60–67, 2010.

  54. Roberto Navigli. Natural language understanding: Instructions for (present and future) use. In IJCAI, volume 18, pages 5697–5702, 2018.

  55. Emily M. Bender and Alexander Koller. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.463. https://aclanthology.org/2020.acl-main.463.

  56. Mario Giulianelli, Jack Harding, Florian Mohnert, Dieuwke Hupkes, and Willem Zuidema. Under the hood: Using diagnostic classifiers to investigate and improve how language models track agreement information. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 240–248, Brussels, Belgium, November 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5426. https://aclanthology.org/W18-5426.

  57. Harm Brouwer, Francesca Delogu, Noortje J Venhuizen, and Matthew W Crocker. Neurobehavioral correlates of surprisal in language comprehension: A neurocomputational model. Frontiers in Psychology, 12: 615538, 2021.

  58. Tal Linzen, Emmanuel Dupoux, and Yoav Goldberg. Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4:521–535, 2016. https://doi.org/10.1162/tacl_a_00115. https://aclanthology.org/Q16-1037.

  59. Jaap Jumelet and Dieuwke Hupkes. Do language models understand anything? On the ability of LSTMs to understand negative polarity items. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 222–231, Brussels, Belgium, November 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5424. https://aclanthology.org/W18-5424.

  60. Rebecca Marvin and Tal Linzen. Targeted syntactic evaluation of language models. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1192–1202, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1151. https://aclanthology.org/D18-1151.

  61. Jon Gauthier, Jennifer Hu, Ethan Wilcox, Peng Qian, and Roger Levy. SyntaxGym: An online platform for targeted evaluation of language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 70–76, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-demos.10. https://aclanthology.org/2020.acl-demos.10.

  62. Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, and Marco Baroni. Colorless green recurrent networks dream hierarchically. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1195–1205, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1108. https://aclanthology.org/N18-1108.

  63. Jannis Vamvas and Rico Sennrich. On the limits of minimal pairs in contrastive evaluation. In Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, pages 58–68, Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.blackboxnlp-1.5. https://aclanthology.org/2021.blackboxnlp-1.5.

  64. Hector Levesque, Ernest Davis, and Leora Morgenstern. The Winograd Schema Challenge. In Thirteenth International Conference on the Principles of Knowledge Representation and Reasoning, 2012.

  65. Ernest Davis, Leora Morgenstern, and Charles L Ortiz. The first Winograd Schema Challenge at IJCAI-16. AI Magazine, 38 (3): 97–98, 2017.

  66. Rachel Rudinger, Jason Naradowsky, Brian Leonard, and Benjamin Van Durme. Gender bias in coreference resolution. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 8–14, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-2002. https://aclanthology.org/N18-2002.

  67. Vid Kocijan, Ernest Davis, Thomas Lukasiewicz, Gary Marcus, and Leora Morgenstern. The defeat of the Winograd Schema Challenge. arXiv preprint arXiv:2201.02387, 2022.

  68. Tal Linzen. How can we accelerate progress towards human-like linguistic generalization? In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5210–5217, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.465. https://aclanthology.org/2020.acl-main.465.

  69. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium, November 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/W18-5446. https://aclanthology.org/W18-5446.

  70. Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. SuperGLUE: A stickier benchmark for general-purpose language understanding systems. Advances in neural information processing systems, 32, 2019.

  71. Sebastian Ruder. Challenges and Opportunities in NLP Benchmarking. http://ruder.io/nlp-benchmarking, 2021.

  72. Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, et al. Beyond the imitation game: Quantifying and extrapolating the capabilities of language models. arXiv preprint arXiv:2206.04615, 2022.

  73. Lloyd S Shapley. A value for n-person games. Classics in game theory, 69, 1997.

  74. Ian Covert, Scott M Lundberg, and Su-In Lee. Explaining by removing: A unified framework for model explanation. Journal of Machine Learning Research, 22(209), 2021.

  75. Amit Dhurandhar, Pin-Yu Chen, Ronny Luss, Chun-Chen Tu, Paishun Ting, Karthikeyan Shanmugam, and Payel Das. Explanations based on the missing: towards contrastive explanations with pertinent negatives. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 590–601, 2018.

  76. Sandipan Sikdar, Parantapa Bhattacharya, and Kieran Heese. Integrated directional gradients: Feature interaction attribution for neural NLP models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 865–878, Online, August 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.71. https://aclanthology.org/2021.acl-long.71.

  77. Jaap Jumelet, Willem Zuidema, and Dieuwke Hupkes. Analysing neural language models: Contextual decomposition reveals default reasoning in number and gender assignment. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 1–11, Hong Kong, China, November 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/K19-1001. https://aclanthology.org/K19-1001.

  78. Marco Tulio Ribeiro, Tongshuang Wu, Carlos Guestrin, and Sameer Singh. Beyond accuracy: Behavioral testing of NLP models with CheckList. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4902–4912, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.442. https://aclanthology.org/2020.acl-main.442.

  79. Koustuv Sinha, Prasanna Parthasarathi, Joelle Pineau, and Adina Williams. UnNatural Language Inference. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 7329–7346, Online, August 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.acl-long.569. https://aclanthology.org/2021.acl-long.569.

  80. Eric Wallace, Pedro Rodriguez, Shi Feng, Ikuya Yamada, and Jordan Boyd-Graber. Trick me if you can: Human-in-the-loop generation of adversarial examples for question answering. Transactions of the Association for Computational Linguistics, 7:387–401, 2019. https://doi.org/10.1162/tacl_a_00279. https://aclanthology.org/Q19-1029.

  81. Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill, and Torsten Zesch. Don’t take “nswvtnvakgxpm” for an answer – the surprising vulnerability of automatic content scoring systems to adversarial input. In Proceedings of the 28th International Conference on Computational Linguistics, pages 882–892, Barcelona, Spain (Online), December 2020. International Committee on Computational Linguistics. https://doi.org/10.18653/v1/2020.coling-main.76. https://aclanthology.org/2020.coling-main.76.

  82. Wencong You and Daniel Lowd. Towards stronger adversarial baselines through human-AI collaboration. In Proceedings of NLP Power! The First Workshop on Efficient Benchmarking in NLP, pages 11–21, Dublin, Ireland, May 2022. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.nlppower-1.2. https://aclanthology.org/2022.nlppower-1.2.

  83. Robin Jia, Aditi Raghunathan, Kerem Göksel, and Percy Liang. Certified robustness to adversarial word substitutions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4129–4142, Hong Kong, China, November 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1423. https://aclanthology.org/D19-1423.

  84. Siwon Kim, Jihun Yi, Eunji Kim, and Sungroh Yoon. Interpretation of NLP models through input marginalization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 3154–3167, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.255. https://aclanthology.org/2020.emnlp-main.255.

  85. Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. In International conference on machine learning, pages 1885–1894. PMLR, 2017.

  86. Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Qiu Han, Guoyin Wang, Eduard Hovy, and Jiwei Li. Interpreting deep learning models in natural language processing: A review. arXiv preprint arXiv:2110.10470, 2021.

  87. Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2368–2378, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/N19-1246. https://aclanthology.org/N19-1246.

  88. Sebastian Ruder and Avi Sil. Multi-domain multilingual question answering. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts, pages 17–21, Punta Cana, Dominican Republic & Online, November 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-tutorials.4. https://aclanthology.org/2021.emnlp-tutorials.4.

  89. Yiding Hao, Simon Mendelsohn, Rachel Sterneck, Randi Martinez, and Robert Frank. Probabilistic predictions of people perusing: Evaluating metrics of language model performance for psycholinguistic modeling. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, pages 75–86, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.cmcl-1.10. https://aclanthology.org/2020.cmcl-1.10.

  90. Edwin Simpson and Iryna Gurevych. Scalable Bayesian preference learning for crowds. Machine Learning, 109(4): 689–718, April 2020. ISSN 0885-6125. https://doi.org/10.1007/s10994-019-05867-2.

  91. Edwin Simpson and Iryna Gurevych. A Bayesian approach for sequence tagging with crowds. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1093–1104, Hong Kong, China, November 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1101. https://aclanthology.org/D19-1101.

  92. Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1050–1059, New York, New York, USA, 20–22 Jun 2016. PMLR. https://proceedings.mlr.press/v48/gal16.html.

  93. Francesco Verdoja and Ville Kyrki. Notes on the behavior of MC dropout. In ICML Workshop on Uncertainty & Robustness in Deep Learning, 2021.


  94. Artem Shelmanov, Evgenii Tsymbalov, Dmitri Puzyrev, Kirill Fedyanin, Alexander Panchenko, and Maxim Panov. How certain is your Transformer? In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 1833–1840, Online, April 2021. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.eacl-main.157. https://aclanthology.org/2021.eacl-main.157.

  95. Xiang Zhou, Yixin Nie, and Mohit Bansal. Distributed NLI: Learning to predict human opinion distributions for language reasoning. In Findings of the Association for Computational Linguistics: ACL 2022, pages 972–987, Dublin, Ireland, May 2022. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.79. https://aclanthology.org/2022.findings-acl.79.

  96. Alexios Gidiotis and Grigorios Tsoumakas. Should we trust this summary? Bayesian abstractive summarization to the rescue. In Findings of the Association for Computational Linguistics: ACL 2022, pages 4119–4131, Dublin, Ireland, May 2022. Association for Computational Linguistics. https://doi.org/10.18653/v1/2022.findings-acl.325. https://aclanthology.org/2022.findings-acl.325.

  97. Burrhus F Skinner. Reinforcement today. American Psychologist, 13 (3): 94, 1958.


  98. Kai A Krueger and Peter Dayan. Flexible shaping: How learning in small steps helps. Cognition, 110 (3): 380–394, 2009.


  99. Natalie Schluter and Daniel Varab. When data permutations are pathological: the case of neural natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4935–4939, Brussels, Belgium, October-November 2018. Association for Computational Linguistics. https://doi.org/10.18653/v1/D18-1534. https://aclanthology.org/D18-1534.

  100. Guy Hacohen and Daphna Weinshall. On the power of curriculum learning in training deep networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2535–2544. PMLR, 09–15 Jun 2019. https://proceedings.mlr.press/v97/hacohen19a.html.

  101. Petru Soviany, Radu Tudor Ionescu, Paolo Rota, and Nicu Sebe. Curriculum learning: A survey. International Journal of Computer Vision, pages 1–40, 2022.


  102. Benjamin Samuel Bloom. Taxonomy of educational objectives: The classification of educational goals. Cognitive domain, 1956.


  103. Daniel Campos. Curriculum learning for language modeling. arXiv preprint arXiv:2108.02170, 2021.

  104. Arielle Borovsky, Jeffrey L Elman, and Anne Fernald. Knowing a lot for one’s age: Vocabulary skill and not age is associated with anticipatory incremental sentence interpretation in children and adults. Journal of experimental child psychology, 112 (4): 417–436, 2012.


  105. Lucas Willems, Salem Lahlou, and Yoshua Bengio. Mastering rate based curriculum learning. arXiv preprint arXiv:2008.06456, 2020.

  106. John P. Lalor and Hong Yu. Dynamic data selection for curriculum learning via ability estimation. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 545–555, Online, November 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.48. https://aclanthology.org/2020.findings-emnlp.48.

  107. Tom Kocmi and Ondřej Bojar. Curriculum learning and minibatch bucketing in neural machine translation. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 379–386, 2017.


  108. Cao Liu, Shizhu He, Kang Liu, Jun Zhao, et al. Curriculum learning for natural answer generation. In IJCAI, pages 4223–4229, 2018.


  109. Mrinmaya Sachan and Eric Xing. Easy questions first? A case study on curriculum learning for question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 453–463, Berlin, Germany, August 2016. Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-1043. https://aclanthology.org/P16-1043.

  110. Alexis Conneau and Guillaume Lample. Cross-lingual language model pretraining. Advances in neural information processing systems, 32, 2019.


  111. Pratik Joshi, Sebastin Santy, Amar Budhiraja, Kalika Bali, and Monojit Choudhury. The state and fate of linguistic diversity and inclusion in the NLP world. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6282–6293, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-main.560. https://aclanthology.org/2020.acl-main.560.

  112. Laura Cabello Piqueras and Anders Søgaard. Are pretrained multilingual models equally fair across languages? In Proceedings of the 29th International Conference on Computational Linguistics, pages 3597–3605, Gyeongju, Republic of Korea, October 2022. International Committee on Computational Linguistics. https://aclanthology.org/2022.coling-1.318.

  113. Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, and Madian Khabsa. XLM-V: Overcoming the vocabulary bottleneck in multilingual masked language models, 2023.


  114. Saima Malik-Moraleda, Dima Ayyash, Jeanne Gallée, Josef Affourtit, Malte Hoffmann, Zachary Mineroff, Olessia Jouravlev, and Evelina Fedorenko. An investigation across 45 languages and 12 language families reveals a universal language network. Nature Neuroscience, 25: 1–6, August 2022. https://doi.org/10.1038/s41593-022-01114-5.

  115. Elisabeth Norcliffe, Alice C. Harris, and T. Florian Jaeger. Cross-linguistic psycholinguistics and its critical role in theory development: early beginnings and recent advances. Language, Cognition and Neuroscience, 30 (9): 1009–1032, 2015. https://doi.org/10.1080/23273798.2015.1080373.

  116. Daniel Kahneman. A perspective on judgment and choice: mapping bounded rationality. American psychologist, 58(9):697, 2003.


  117. Emiel van Miltenburg. Stereotyping and bias in the Flickr30k dataset. In Proceedings of Multimodal Corpora: Computer vision and language processing (MMC 2016), pages 1–4. 2016.


  118. Emily M. Bender, Dirk Hovy, and Alexandra Schofield. Integrating ethics into the NLP curriculum. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Tutorial Abstracts, pages 6–9, Online, July 2020. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.acl-tutorials.2. https://aclanthology.org/2020.acl-tutorials.2.

  119. Ninareh Mehrabi, Fred Morstatter, Nripsuta Saxena, Kristina Lerman, and Aram Galstyan. A survey on bias and fairness in machine learning. ACM Computing Surveys, 54(6), July 2021. ISSN 0360-0300. https://doi.org/10.1145/3457607.

  120. Tom McCoy, Ellie Pavlick, and Tal Linzen. Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3428–3448, Florence, Italy, July 2019. Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1334. https://aclanthology.org/P19-1334.

  121. Olivia Guest and Andrea E Martin. On logical inference over brains, behaviour, and artificial neural networks. Computational Brain & Behavior, pages 1–15, 2023.


  122. Anders Søgaard. Explainable natural language processing. Synthesis Lectures on Human Language Technologies, 14(3):1–123, 2021.



Author information


Correspondence to Lisa Beinborn.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter


Cite this chapter

Beinborn, L., Hollenstein, N. (2024). Behavioral Patterns. In: Cognitive Plausibility in Natural Language Processing. Synthesis Lectures on Human Language Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-43260-6_4


  • DOI: https://doi.org/10.1007/978-3-031-43260-6_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43259-0

  • Online ISBN: 978-3-031-43260-6

  • eBook Packages: Synthesis Collection of Technology (R0)
