An Empirical Study of Developing Automated Scoring Engine Using Supervised Latent Dirichlet Allocation

  • Conference paper
Quantitative Psychology (IMPS 2020)

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 353)

Abstract

The use of constructed-response and performance-oriented items is becoming increasingly common in educational measurement. These items may take the form of written essays or short answers and may appear in both high- and low-stakes assessments. Constructed responses may be scored by human raters or by an automated scoring engine. Topic modeling provides a tool for mining textual data to detect its latent semantic structure. The supervised latent Dirichlet allocation (sLDA) model is widely used in text analysis. In this study, we examine and compare the utility of different sLDA models for detecting the latent topic structure of, and scoring, responses on an English language arts test.
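To make the pipeline concrete, the sketch below is a minimal two-stage stand-in for sLDA-based scoring, not the authors' implementation: unsupervised LDA topic proportions from scikit-learn feed a separate linear regression onto human scores, and human-machine agreement is checked with quadratic weighted kappa. Genuine sLDA (McAuliffe & Blei, 2008) estimates the topics and the score model jointly; the responses and scores here are hypothetical toy data.

```python
# Hypothetical illustration: a two-stage stand-in for sLDA-based scoring.
# True sLDA fits topics and the score model jointly; here unsupervised
# LDA topic proportions feed a separate linear regression.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression
from sklearn.metrics import cohen_kappa_score

# Toy constructed responses with human rater scores (hypothetical data).
responses = [
    "the character learns from failure and grows by the end of the story",
    "the story shows the character failing and then learning a lesson",
    "i liked the book because it was fun to read",
    "the book was fun and i liked reading it",
]
human_scores = np.array([3, 3, 1, 1])

# Stage 1: bag-of-words counts, then LDA document-topic proportions.
counts = CountVectorizer(stop_words="english").fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # each row is a topic mixture summing to 1

# Stage 2: regress human scores on topic proportions, round to score scale.
reg = LinearRegression().fit(theta, human_scores)
machine_scores = np.clip(np.rint(reg.predict(theta)), 1, 3).astype(int)

# Human-machine agreement via quadratic weighted (Cohen's) kappa.
print(cohen_kappa_score(human_scores, machine_scores, weights="quadratic"))
```

In practice the regression and kappa would be computed on held-out responses rather than the training set; the in-sample evaluation above only keeps the example short.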



Author information


Corresponding author

Correspondence to Jiawei Xiong.

Cite this chapter

Xiong, J., et al. An Empirical Study of Developing Automated Scoring Engine Using Supervised Latent Dirichlet Allocation. In: Wiberg, M., Molenaar, D., González, J., Böckenholt, U., Kim, JS. (eds) Quantitative Psychology. IMPS 2020. Springer Proceedings in Mathematics & Statistics, vol 353. Springer, Cham. https://doi.org/10.1007/978-3-030-74772-5_38
