Abstract
Most existing automated assessment (AA) systems focus on holistic scoring and fall short of providing learners with comprehensive feedback. In this paper, we propose a Multi-Task Automated Assessment (MTAA) system that outputs detailed scores along multiple dimensions of essay quality to provide instructional feedback. The system is built on multi-task learning and incorporates Orthogonality Constraints (OC) to learn distinct information from different tasks. To achieve better training convergence, we develop a training strategy, Dynamic Learning Rate Decay (DLRD), which adapts each task's learning rate based on how quickly its loss is descending. The results show that our proposed system achieves state-of-the-art performance on two benchmark datasets: ELLIPSE and ASAP++. Furthermore, we use ChatGPT to assess essays in both zero-shot and few-shot settings on an ELLIPSE subset. The findings suggest that ChatGPT has not yet reached the scoring consistency of our MTAA system or of human raters.
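The DLRD strategy described in the abstract can be sketched as follows. The abstract does not give the exact update rule, so this is a minimal illustrative version under our own assumptions (the class and parameter names `DLRDScheduler`, `decay`, and `floor` are ours, not the authors'): a task whose loss stops descending has its learning rate multiplicatively decayed, while a still-improving task keeps its current rate.

```python
class DLRDScheduler:
    """Illustrative per-task learning-rate scheduler in the spirit of DLRD.

    Not the authors' implementation: a hedged sketch that decays a task's
    learning rate whenever that task's loss plateaus or increases.
    """

    def __init__(self, task_names, base_lr=2e-5, decay=0.9, floor=1e-7):
        self.lrs = {t: base_lr for t in task_names}
        self.prev_loss = {t: None for t in task_names}
        self.decay = decay        # multiplicative decay factor (assumed)
        self.floor = floor        # lower bound on any learning rate (assumed)

    def step(self, losses):
        """Update per-task learning rates from the latest task losses."""
        for task, loss in losses.items():
            prev = self.prev_loss[task]
            if prev is not None:
                # Relative descent rate of this task's loss since last step.
                descent_rate = (prev - loss) / max(prev, 1e-12)
                if descent_rate <= 0:  # loss plateaued or went up: decay lr
                    self.lrs[task] = max(self.lrs[task] * self.decay, self.floor)
            self.prev_loss[task] = loss
        return dict(self.lrs)
```

For example, after two steps in which one task's loss drops while another's rises, only the stalled task's learning rate is decayed; the improving task trains on at its original rate.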
Notes
- 1.
- 2.
- 3. The train/test split can be found at https://github.com/Aries-chen/MTAA/blob/main/README.md.
- 4. The ELLIPSE rubric is available at https://docs.google.com/document/d/1OSbRELoWKlq8chYmujAaHJqMwFZnwt2PnnbSXfOJkIY/edit.
- 5. We used the GPT-4-0613 API. The prompts used in our experiments are available at https://github.com/Aries-chen/MTAA/blob/main/Few-shot_prompt.txt.
- 6. This comparison is excluded for ASAP++ because it lacks a clear evaluation rubric, which makes it difficult to provide precise prompts to ChatGPT.
- 7. Previous work has only reported QWK.
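QWK (Quadratic Weighted Kappa) is the standard agreement metric in automated essay scoring: it measures agreement between two sets of ordinal ratings, weighting disagreements by the square of their distance, with 1.0 for perfect agreement and 0.0 for chance-level agreement. A minimal self-contained implementation:

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two equal-length rating lists."""
    n = max_rating - min_rating + 1
    # Observed confusion matrix of rating pairs.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    num_items = len(rater_a)
    # Marginal histograms give the expected matrix under independence.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / num_items
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator
```

Identical ratings yield QWK = 1.0, while ratings that agree no better than their marginals predict yield QWK near 0.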
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, S., Lan, Y., Yuan, Z. (2024). A Multi-task Automated Assessment System for Essay Scoring. In: Olney, A.M., Chounta, IA., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science(), vol 14830. Springer, Cham. https://doi.org/10.1007/978-3-031-64299-9_22
Print ISBN: 978-3-031-64298-2
Online ISBN: 978-3-031-64299-9
eBook Packages: Computer Science, Computer Science (R0)