Abstract
We investigated the challenging task of generalizable automatic short answer scoring (ASAS), in which a scoring model must generalize to target domains (for which only limited labeled data are available) that have no overlap with the auxiliary domains on which the model is trained. To address this task, we introduced a framework based on the Prototypical Neural Network (PNN). Specifically, for a target short answer instance whose score needs to be determined, the framework first calculates the distance between this target instance and each cluster of support instances, where the support instances are a set of labeled short answer instances grouped into clusters according to their labels, i.e., the ground-truth scores. It then assigns the target instance the ground-truth score of the cluster closest to it. Through extensive empirical studies on an open-source ASAS dataset consisting of 10 different question prompts, we observed that the proposed approach consistently outperformed the baselines across settings with different numbers of support instances. We further observed that the proposed approach performed better when trained on a wider range of data sources than when trained on restricted data sources, suggesting that including more data sources for training may improve the generalizability of the proposed framework.
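The scoring rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each cluster is represented by the mean of its support embeddings, that distance is Euclidean, and that the toy 2-D vectors stand in for the outputs of a text encoder. The function names `build_prototypes` and `predict_score` are hypothetical.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available since Python 3.8


def build_prototypes(support):
    """Group support embeddings by score and average each group.

    `support` is a list of (embedding, score) pairs. Using the mean as the
    cluster representative is an assumption of this sketch.
    """
    clusters = defaultdict(list)
    for embedding, score in support:
        clusters[score].append(embedding)
    return {
        score: [sum(dim) / len(members) for dim in zip(*members)]
        for score, members in clusters.items()
    }


def predict_score(target, prototypes):
    """Return the score whose prototype is closest to the target embedding."""
    return min(prototypes, key=lambda score: dist(target, prototypes[score]))


# Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
support = [
    ([0.0, 0.0], 0), ([0.2, 0.0], 0),
    ([1.0, 1.0], 1), ([1.2, 1.0], 1),
    ([2.0, 2.0], 2), ([2.2, 2.0], 2),
]
prototypes = build_prototypes(support)
predicted = predict_score([1.0, 0.9], prototypes)
print(predicted)
```

In this sketch the support set plays the role of the limited labeled data from the target domain, so changing the number of pairs in `support` corresponds to varying the number of support instances.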
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Zeng, Z., Li, L., Guan, Q., Gašević, D., Chen, G. (2023). Generalizable Automatic Short Answer Scoring via Prototypical Neural Network. In: Wang, N., Rebolledo-Mendez, G., Matsuda, N., Santos, O.C., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2023. Lecture Notes in Computer Science, vol 13916. Springer, Cham. https://doi.org/10.1007/978-3-031-36272-9_36
DOI: https://doi.org/10.1007/978-3-031-36272-9_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-36271-2
Online ISBN: 978-3-031-36272-9
eBook Packages: Computer Science (R0)