A Multi-task Automated Assessment System for Essay Scoring

  • Conference paper
  • Artificial Intelligence in Education (AIED 2024)

Abstract

Most existing automated assessment (AA) systems focus on holistic scoring and fall short of providing learners with comprehensive feedback. In this paper, we propose a Multi-Task Automated Assessment (MTAA) system that outputs detailed scores along multiple dimensions of essay quality to provide instructional feedback. The system is built on multi-task learning and incorporates Orthogonality Constraints (OC) to learn distinct information from different tasks. To achieve better training convergence, we develop a training strategy, Dynamic Learning Rate Decay (DLRD), which adapts each task's learning rate to how quickly its loss descends. The results show that our proposed system achieves state-of-the-art performance on two benchmark datasets: ELLIPSE and ASAP++. Furthermore, we use ChatGPT to assess essays in both zero-shot and few-shot settings on a subset of ELLIPSE. The findings suggest that ChatGPT has not yet achieved a level of scoring consistency comparable to that of our MTAA system or of human raters.
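The implementation itself is not shown in this preview; the following is a minimal PyTorch-style sketch, under our own assumptions, of the two mechanisms named in the abstract: task-specific heads whose features are kept apart by a soft orthogonality penalty, and a per-task learning-rate scale that decays faster for tasks whose loss is dropping faster. The names (MTAAHead, orthogonality_penalty, dlrd_update), the six ELLIPSE-style dimensions, and all hyper-parameters are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a multi-task essay-scoring head with an
# Orthogonality Constraint (OC) penalty and a Dynamic Learning Rate Decay
# (DLRD) heuristic.  Names and hyper-parameters are illustrative assumptions,
# not the authors' released implementation.
import torch
import torch.nn as nn


class MTAAHead(nn.Module):
    """Per-dimension regression heads on top of a shared encoder embedding."""

    def __init__(self, hidden_size: int = 768, num_tasks: int = 6):
        super().__init__()
        # One projection per scoring dimension; the OC term keeps their
        # outputs from collapsing onto the same subspace.
        self.task_proj = nn.ModuleList(
            [nn.Linear(hidden_size, hidden_size) for _ in range(num_tasks)]
        )
        self.scorers = nn.ModuleList(
            [nn.Linear(hidden_size, 1) for _ in range(num_tasks)]
        )

    def forward(self, pooled: torch.Tensor):
        # pooled: [batch, hidden_size] essay embedding from the shared encoder.
        feats = [proj(pooled) for proj in self.task_proj]
        scores = [head(f).squeeze(-1) for head, f in zip(self.scorers, feats)]
        return torch.stack(scores, dim=-1), feats  # scores: [batch, num_tasks]


def orthogonality_penalty(feats):
    """Soft OC loss: squared Frobenius norm of every cross-task Gram matrix,
    which is zero when the tasks' feature spaces are mutually orthogonal."""
    penalty = feats[0].new_zeros(())
    for i in range(len(feats)):
        for j in range(i + 1, len(feats)):
            penalty = penalty + (feats[i].transpose(0, 1) @ feats[j]).pow(2).sum()
    return penalty


def dlrd_update(lr_scales, prev_losses, curr_losses, decay=0.9):
    """DLRD heuristic: shrink a task's learning-rate scale in proportion to
    how fast its loss is descending, so quickly converging tasks slow down."""
    new_scales = []
    for scale, prev, curr in zip(lr_scales, prev_losses, curr_losses):
        descent_rate = max(prev - curr, 0.0) / max(prev, 1e-8)
        new_scales.append(scale * decay ** descent_rate)
    return new_scales
```

In training, the overall objective would then be something like the sum of per-task MSE losses plus a small multiple of the OC penalty, with each task's parameter group stepped at base_lr * lr_scales[t]; the encoder choice, loss weights, and exact DLRD schedule used in the paper are not reproduced here.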


Notes

  1. https://www.ets.org/toefl.html.

  2. https://www.ets.org/gre.html.

  3. The train/test split can be found at https://github.com/Aries-chen/MTAA/blob/main/README.md.

  4. The ELLIPSE rubric is available at https://docs.google.com/document/d/1OSbRELoWKlq8chYmujAaHJqMwFZnwt2PnnbSXfOJkIY/edit.

  5. We used the GPT-4-0613 API. The prompts used in our experiments are available at https://github.com/Aries-chen/MTAA/blob/main/Few-shot_prompt.txt.

  6. This comparison excludes ASAP++ because it lacks a clear evaluation rubric, making it difficult to provide precise prompts for ChatGPT.

  7. Previous work has only reported QWK; a minimal QWK computation is sketched below.
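For reference, quadratic weighted kappa (QWK), the agreement statistic mentioned in note 7, measures how closely two sets of ordinal ratings match, penalising disagreements by the square of their distance on the rating scale. A minimal way to compute it is scikit-learn's cohen_kappa_score with quadratic weights; the toy score vectors below are illustrative, not data from the paper.

```python
# Quadratic weighted kappa between human and model scores on an ordinal scale.
# The example ratings are made up for illustration; real ELLIPSE scores range
# from 1.0 to 5.0 in 0.5 steps, and rounding model predictions to rubric
# levels is an assumption about the evaluation setup.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 4, 2, 5, 3, 4]
model_scores = [3, 4, 3, 4, 3, 4]

qwk = cohen_kappa_score(human_scores, model_scores, weights="quadratic")
print(f"QWK = {qwk:.3f}")  # 1.0 is perfect agreement, 0 is chance-level
```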


Author information


Corresponding author

Correspondence to Shigeng Chen.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, S., Lan, Y., Yuan, Z. (2024). A Multi-task Automated Assessment System for Essay Scoring. In: Olney, A.M., Chounta, I.-A., Liu, Z., Santos, O.C., Bittencourt, I.I. (eds) Artificial Intelligence in Education. AIED 2024. Lecture Notes in Computer Science, vol 14830. Springer, Cham. https://doi.org/10.1007/978-3-031-64299-9_22

  • DOI: https://doi.org/10.1007/978-3-031-64299-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-64298-2

  • Online ISBN: 978-3-031-64299-9

  • eBook Packages: Computer Science, Computer Science (R0)
