Abstract
Most existing automated assessment (AA) systems focus on holistic scoring and fall short of providing learners with comprehensive feedback. In this paper, we propose a Multi-Task Automated Assessment (MTAA) system that outputs detailed scores along multiple dimensions of essay quality to provide instructional feedback. The system is built on multi-task learning and incorporates Orthogonality Constraints (OC) to learn distinct information from different tasks. To achieve better training convergence, we develop a training strategy, Dynamic Learning Rate Decay (DLRD), which adapts each task's learning rate based on how quickly its loss is descending. The results show that our proposed system achieves state-of-the-art performance on two benchmark datasets: ELLIPSE and ASAP++. Furthermore, we use ChatGPT to assess essays in both zero-shot and few-shot settings on an ELLIPSE subset. The findings suggest that ChatGPT has not yet reached the scoring consistency of our MTAA system or of human raters.
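The DLRD strategy described in the abstract can be sketched as follows. The abstract does not give the exact update rule, so this is a minimal illustrative version under our own assumptions (the class and parameter names `DLRDScheduler`, `decay`, and `floor` are ours, not the authors'): a task whose loss stops descending has its learning rate multiplicatively decayed, while a still-improving task keeps its current rate.

```python
class DLRDScheduler:
    """Illustrative per-task learning-rate scheduler in the spirit of DLRD.

    Not the authors' implementation: a hedged sketch that decays a task's
    learning rate whenever that task's loss plateaus or increases.
    """

    def __init__(self, task_names, base_lr=2e-5, decay=0.9, floor=1e-7):
        self.lrs = {t: base_lr for t in task_names}
        self.prev_loss = {t: None for t in task_names}
        self.decay = decay        # multiplicative decay factor (assumed)
        self.floor = floor        # lower bound on any learning rate (assumed)

    def step(self, losses):
        """Update per-task learning rates from the latest task losses."""
        for task, loss in losses.items():
            prev = self.prev_loss[task]
            if prev is not None:
                # Relative descent rate of this task's loss since last step.
                descent_rate = (prev - loss) / max(prev, 1e-12)
                if descent_rate <= 0:  # loss plateaued or went up: decay lr
                    self.lrs[task] = max(self.lrs[task] * self.decay, self.floor)
            self.prev_loss[task] = loss
        return dict(self.lrs)
```

For example, after two steps in which one task's loss drops while another's rises, only the stalled task's learning rate is decayed; the improving task trains on at its original rate.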
Notes
- 1.
- 2.
- 3. The train/test split can be found at https://github.com/Aries-chen/MTAA/blob/main/README.md.
- 4. The ELLIPSE rubric is available at https://docs.google.com/document/d/1OSbRELoWKlq8chYmujAaHJqMwFZnwt2PnnbSXfOJkIY/edit.
- 5. We used the GPT-4-0613 API. The prompts used in our experiments are available at https://github.com/Aries-chen/MTAA/blob/main/Few-shot_prompt.txt.
- 6. This comparison is excluded for ASAP++ because it lacks a clear evaluation rubric, which makes it difficult to provide precise prompts to ChatGPT.
- 7. Previous work has only reported QWK.
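QWK (Quadratic Weighted Kappa) is the standard agreement metric in automated essay scoring: it measures agreement between two sets of ordinal ratings, weighting disagreements by the square of their distance, with 1.0 for perfect agreement and 0.0 for chance-level agreement. A minimal self-contained implementation:

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic Weighted Kappa between two equal-length rating lists."""
    n = max_rating - min_rating + 1
    # Observed confusion matrix of rating pairs.
    observed = [[0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        observed[a - min_rating][b - min_rating] += 1
    num_items = len(rater_a)
    # Marginal histograms give the expected matrix under independence.
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    numerator = 0.0
    denominator = 0.0
    for i in range(n):
        for j in range(n):
            weight = ((i - j) ** 2) / ((n - 1) ** 2)  # quadratic penalty
            expected = hist_a[i] * hist_b[j] / num_items
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator
```

Identical ratings yield QWK = 1.0, while ratings that agree no better than their marginals predict yield QWK near 0.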
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, S., Lan, Y., Yuan, Z. (2024). A Multi-task Automated Assessment System for Essay Scoring. In: Olney, A.M., Chounta, IA., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science(), vol 14830. Springer, Cham. https://doi.org/10.1007/978-3-031-64299-9_22
Print ISBN: 978-3-031-64298-2
Online ISBN: 978-3-031-64299-9
eBook Packages: Computer Science, Computer Science (R0)