Abstract
The use of constructed-response and performance-oriented items is becoming increasingly common in educational measurement. These items may take the form of written essays or short answers and may appear in both high- and low-stakes assessments. Constructed responses may be scored by human raters or by an automated scoring engine. Topic modeling provides a tool for mining textual data to detect latent semantic structures. The supervised Latent Dirichlet Allocation (sLDA) model is widely used in text analysis. In this study, we examine and compare the utility of different sLDA models for detecting the latent topic structure and scoring responses on a test of English language arts.
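To make the modeling approach concrete, the following is a minimal sketch of topic modeling applied to constructed responses. The chapter uses supervised LDA, which fits topics and scores jointly; scikit-learn ships only the unsupervised LatentDirichletAllocation, so this sketch substitutes a two-stage approximation (unsupervised topics, then a regression of human scores on topic proportions). The example responses and rubric scores are invented for illustration, not taken from the study's data.

```python
# Two-stage stand-in for sLDA: unsupervised LDA topics + score regression.
# (True sLDA estimates the topic-score relationship jointly.)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LinearRegression

# Hypothetical constructed responses and human rubric scores.
responses = [
    "The author uses imagery to show the storm's danger.",
    "Storm danger is shown with strong verbs and imagery.",
    "The character feels sad because her friend moved away.",
    "Her friend moving away makes the character feel lonely.",
]
human_scores = [2, 2, 1, 1]

# Bag-of-words counts, then a small LDA topic model.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(responses)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)  # per-response topic proportions

# Regress human scores on topic proportions to score new responses.
reg = LinearRegression().fit(theta, human_scores)
print("predicted scores:", reg.predict(theta).round(2))
```

In sLDA proper, the response variable influences the topics that are learned; the two-stage version above only approximates that behavior but conveys the pipeline: tokenize responses, estimate topic proportions, and relate those proportions to human scores.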
DOI: https://doi.org/10.1007/978-3-030-74772-5_38