Abstract
For the sake of transparency, society often demands the use of the unweighted total score, especially in high-stakes situations such as exams. In the Rasch model, the total score is the sufficient statistic: all relevant information of the measurement is captured by the unweighted sum of the item scores. For this reason, many practitioners want to use the Rasch model. However, in many practical applications the Rasch model does not fit, and the data are better described by a model that also includes a slope parameter. Although the total score is not the sufficient statistic in these types of models, the unweighted item sum score can still be used to compare candidates’ results on different equated tests. In a re-evaluation of the true-score equating procedure, we show how the benefits of the better-fitting model can be combined with the use of the total score in the context of equating cut-off scores. We present the advantages of the total score and show how it can also be used when the Rasch model does not hold. An example describes how the procedure works in practice. Finally, we reflect on the practical implications, meaning, and usefulness of the slope parameter, also known as the a-parameter.
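The idea of equating a cut-off score through true scores, while still reporting the unweighted total score, can be sketched as follows. This is a minimal illustration under stated assumptions, not the chapter's own implementation: it assumes a two-parameter logistic (2PL) model with already-calibrated item parameters for two tests X and Y on a common scale, and the item parameters, the cut score, and the function names (`tcc`, `theta_for_true_score`, `equate_cut_score`) are all hypothetical. The sketch finds the ability θ at which the expected unweighted total score on X equals the cut score, and then reports the expected unweighted total score on Y at that same θ. The slope parameters enter only through the test characteristic curves (TCCs); candidates are still scored by the plain sum of their item scores.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (slope a, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected unweighted total score at ability theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def theta_for_true_score(score, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Invert the TCC by bisection; valid because the TCC increases in theta when all a > 0."""
    if not tcc(lo, items) < score < tcc(hi, items):
        raise ValueError("score must lie strictly between the TCC values at the bracket ends")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate_cut_score(cut_x, items_x, items_y):
    """True-score equating: map a cut score on test X to the expected total score on test Y."""
    theta = theta_for_true_score(cut_x, items_x)
    return tcc(theta, items_y)

# Hypothetical (a, b) item parameters for two short five-item tests.
test_x = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (0.6, 1.0)]
test_y = [(1.0, -0.8), (1.3, -0.2), (0.7, 0.3), (1.1, 0.8), (0.9, 1.2)]

cut_y = equate_cut_score(3.0, test_x, test_y)
print(f"Cut score 3 on X corresponds to a true score of {cut_y:.2f} on Y")
```

Bisection is used here because, with all slopes positive, the TCC is strictly increasing in θ, so the inverse is unique and a simple bracketing search is guaranteed to converge. The equated true score is generally not an integer, so an operational procedure still needs a rule for turning it into a cut-off on the observed total score; the sketch leaves that choice open.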
Notes
- 1.
A specific national assessment on this topic for special education was also performed, but in order not to make the example too complicated, these data are not included here.
- 2.
Note that TCCs may also cross under the Rasch model, depending on the distribution of the item difficulties in the tests. Non-crossing TCCs are not a property of the Rasch model. However, in practice it is often found that they do not cross.
- 3.
The COTAN review system gives no reference to the actual source, and none could be found; only a paper from 2002 by these four authors was found online.
- 4.
In an unpublished pilot study by Remco Feskens and Bas Hemker in 2020, similar numbers were found for OPLM. The estimates obtained with the Rasch model seemed to be better and more robust when there were fewer than 400 observations per item.
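Note 2 states that TCCs can cross even under the Rasch model. A small constructed example makes this concrete; the two item-difficulty sets are hypothetical and chosen only for illustration. Both tests have two items: test A has both difficulties at 0, test B has difficulties −3 and 3. Because the two Rasch item curves of test B are symmetric around θ = 0, the two TCCs meet exactly at θ = 0; below that point test B has the higher expected total score, above it test A does.

```python
import math

def p_rasch(theta, b):
    """Probability of a correct response under the Rasch model (difficulty b, slope fixed at 1)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def tcc(theta, difficulties):
    """Test characteristic curve: expected total score at ability theta."""
    return sum(p_rasch(theta, b) for b in difficulties)

# Hypothetical item difficulties: same number of items, different spread.
test_a = [0.0, 0.0]
test_b = [-3.0, 3.0]

for theta in (-1.0, 0.0, 1.0):
    print(f"theta={theta:+.1f}  TCC_A={tcc(theta, test_a):.3f}  TCC_B={tcc(theta, test_b):.3f}")
# theta=-1.0  TCC_A=0.538  TCC_B=0.899
# theta=+0.0  TCC_A=1.000  TCC_B=1.000
# theta=+1.0  TCC_A=1.462  TCC_B=1.101
```

The crossing arises purely from the spread of the difficulties: all slopes are equal, yet the test with widely spread difficulties has a flatter TCC than the test with concentrated difficulties.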
References
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 396–479). Addison-Wesley. https://ci.nii.ac.jp/naid/10011544105/
Cizek, G. J., & Bunch, M. B. (2007). The Nedelsky method. In G. J. Cizek & M. B. Bunch (Eds.), Standard setting (pp. 68–74). SAGE. https://doi.org/10.4135/9781412985918
College voor Toetsing en Examens. (2015). Regeling omzetting scores in cijfers bij centrale examinering mbo [Rules for transforming scores into grades for the central exams in vocational school]. CvTE-15.01457. https://wetten.overheid.nl/BWBR0036876/2017-08-01
Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicológica, 32(1), 107–132. https://eric.ed.gov/?id=EJ925442
Evers, A., Sijtsma, K., Lucassen, W., & Meijer, R. R. (2010). The Dutch review process for evaluating the quality of psychological tests: History, procedure, and results. International Journal of Testing, 10, 295–317. https://doi.org/10.1080/15305058.2010.518325
Evers, A., Lucassen, W., Meijer, R. R., & Sijtsma, K. (2015). COTAN review system for evaluating test quality (p. 41). NIP, Utrecht. https://www.psynip.nl/wp-content/uploads/2019/05/NIP-Brochure-Cotan-2018-correctie-1.pdf
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222(594–604), 309–368. https://doi.org/10.1098/rsta.1922.0009
Glas, C. A. W., & Verhelst, N. D. (1989). Extensions of the partial credit model. Psychometrika, 54(4), 635–659. https://doi.org/10.1007/BF02296401
Grayson, D. A. (1988). Two-group classification in latent trait theory: Scores with monotone likelihood ratio. Psychometrika, 53(3), 383–392. https://doi.org/10.1007/BF02294219
Hemker, B. T., Sijtsma, K., Molenaar, I. W., & Junker, B. W. (1996). Polytomous IRT models and monotone likelihood ratio of the total score. Psychometrika, 61(4), 679–693. https://doi.org/10.1007/BF02294042
Hemker, B. T., Sijtsma, K., Molenaar, I. W., & Junker, B. W. (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika, 62(3), 331–347. https://doi.org/10.1007/BF02294555
Huynh, H. (1994). A new proof for monotone likelihood ratio for the sum of independent Bernoulli random variables. Psychometrika, 59(1), 77–79. https://doi.org/10.1007/BF02294266
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking. Springer. https://doi.org/10.1007/978-1-4939-0317-7
Kreiner, S., & Christensen, K. B. (2014). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210–231. https://doi.org/10.1007/s11336-013-9347-z
Lord, F. M. (1980). Applications of item-response theory to practical testing problems. Lawrence Erlbaum Associates.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague/De Gruyter.
Mokken, R. J. (1997). Nonparametric models for dichotomous responses. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 351–367). Springer. https://doi.org/10.1007/978-1-4757-2691-6_20
Mokken, R. J., & Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6(4), 417–430. https://doi.org/10.1177/014662168200600404
Molenaar, I. W. (1983). Some improved diagnostics for failure of the Rasch model. Psychometrika, 48(1), 49–72. https://doi.org/10.1007/BF02314676
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206
Organisation for Economic Co-operation and Development. (2000). Measuring student knowledge and skills: The PISA 2000 assessment of reading, mathematical and scientific literacy. OECD Publishing. https://www.oecd.org/pisa/sitedocument/PISA-2015-Technical-Report-Chapter-9-Scaling-PISA-Data.pdf
Organisation for Economic Co-operation and Development. (2017). PISA 2015 technical report. OECD. https://www.oecd.org/pisa/data/2015-technical-report/PISA2015_TechRep_Final.pdf
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danmarks Paedagogiske Institut.
Rasch, G. (1968). An individualistic approach to item analysis. In P. F. Lazarsfeld & N. W. Henry (Eds.), Readings in mathematical social science (pp. 89–108). Science Research Associates. https://www.rasch.org/memo19662.pdf
Roelofs, E. C., Emons, W. H. M., & Verschoor, A. J. (2021). Exploring task features that predict psychometric quality of test items: The case for the Dutch driving theory exam. International Journal of Testing, 21(2), 80–104. https://doi.org/10.1080/15305058.2021.1916506
Shakespeare, W. (1600). Hamlet. England.
Van den Brink, W. P., & Mellenbergh, G. J. (1998). Testleer en testconstructie [Test theory and test construction]. Boom Koninklijke Uitgevers.
Verhelst, N. D., & Glas, C. A. W. (1993). A dynamic generalization of the Rasch model. Psychometrika, 58(3), 395–415. https://doi.org/10.1007/BF02294648
Verhelst, N. D., & Verstralen, H. H. F. M. (1994). The one parameter logistic model: Computer program and manual. Universiteit Twente, CITO. https://research.utwente.nl/en/publications/the-one-parameter-logistic-model%2D%2Dcomputer-program-and-manual(5966c6a4-dd6d-4cac-8a38-b2527c8de44c).html
Zwitser, R. J., & Maris, G. (2016). Ordering individuals with sum scores: The introduction of the nonparametric Rasch model. Psychometrika, 81(1), 39–59. https://doi.org/10.1007/s11336-015-9481-x
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Hemker, B.T. (2023). To a or not to a: On the Use of the Total Score. In: van der Ark, L.A., Emons, W.H.M., Meijer, R.R. (eds) Essays on Contemporary Psychometrics. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-031-10370-4_13
DOI: https://doi.org/10.1007/978-3-031-10370-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10369-8
Online ISBN: 978-3-031-10370-4
eBook Packages: Education, Education (R0)