To a or not to a: On the Use of the Total Score

Hemker, Bas T.

doi:10.1007/978-3-031-10370-4_13

Part of the book series: Methodology of Educational Measurement and Assessment (MEMA)

Abstract

For the sake of transparency, society often demands the use of the unweighted total score, especially in high-stakes situations such as exams. In the Rasch model, the total score is a sufficient statistic for the ability parameter: all relevant information in the measurement is captured by the unweighted sum of the item scores. For this reason, many practitioners want to use the Rasch model. However, in many practical applications the Rasch model does not fit, and the data are better described by a model that also includes a slope parameter. Although the total score is not a sufficient statistic in such models, the unweighted sum score can still be used to compare candidates’ results on different equated tests. In a revaluation of the true-score equating procedure, we show how the benefits of the better-fitting model can be combined with the use of the total score in the context of equating cut-off scores. We present the advantages of the total score and show how it can be used even when the Rasch model does not hold. An example describes how the procedure works in practice. Finally, we reflect on the practical implications, meaning, and usefulness of the slope parameter, also known as the a-parameter.
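The idea of equating a cut-off score through a model with slope parameters can be illustrated with a minimal sketch. It assumes the textbook form of true-score equating under a two-parameter logistic model, which is not necessarily the exact procedure of the chapter. All item parameters, the cut-off of 3, and the function names are hypothetical and chosen only for illustration.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function with slope a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_total(theta, items):
    """Test characteristic curve: expected unweighted total score at theta."""
    return sum(p_correct(theta, a, b) for a, b in items)

def theta_for_score(score, items, lo=-10.0, hi=10.0, tol=1e-10):
    """Invert the TCC by bisection; it is strictly increasing when all a > 0."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_total(mid, items) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def equate_cutoff(cutoff, reference_items, new_items):
    """Map a cut-off on the reference test to the new test's score scale."""
    theta = theta_for_score(cutoff, reference_items)
    return expected_total(theta, new_items)

# Hypothetical (a, b) parameters, assumed to be on a common scale.
reference_test = [(0.8, -1.0), (1.0, -0.5), (1.2, 0.0), (1.5, 0.5), (1.0, 1.0)]
# Assumed new form: the same items, each half a logit more difficult.
harder_test = [(a, b + 0.5) for a, b in reference_test]

equated = equate_cutoff(3.0, reference_test, harder_test)
print(f"Equated cut-off on the harder test: {equated:.2f}")
```

Candidates are still compared on their unweighted total scores; the slope parameters enter only through the test characteristic curves that link the two score scales. In practice the equated value would still have to be rounded to an attainable total score, a step this sketch leaves out.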

Notes

1. A specific national assessment on this topic for special education was also performed, but to keep the example from becoming too complicated, these data are not included here.
2. Note that test characteristic curves (TCCs) may also cross under the Rasch model, depending on the distribution of the item difficulties in the tests. It is not a property of the Rasch model that the TCCs do not cross. However, in practice it is often found that they do not.
3. No reference to the actual is given in the COTAN review system and was not found; only a paper from 2002 by these four authors was found online.
4. In an unpublished pilot study by Remco Feskens and Bas Hemker in 2020, similar numbers were found for OPLM. The estimated with the Rasch model seemed to be better and more robust when there are fewer than 400 observations per item.
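The point of note 2 can be checked numerically with a small sketch. The two tests below are hypothetical: both have two Rasch items, but one concentrates the difficulties at 0 while the other spreads them to -2 and 2.

```python
import math

def rasch_tcc(theta, difficulties):
    """Expected total score at theta for a test of Rasch items."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)

# Hypothetical item difficulties, chosen only to make the curves cross.
test_a = [0.0, 0.0]    # difficulties concentrated at 0: steeper TCC
test_b = [-2.0, 2.0]   # difficulties spread out: flatter TCC

for theta in (-1.0, 0.0, 1.0):
    diff = rasch_tcc(theta, test_a) - rasch_tcc(theta, test_b)
    print(f"theta={theta:+.1f}  TCC_A - TCC_B = {diff:+.3f}")
```

The sign of the difference changes around theta = 0, so the two curves cross even though every item follows the Rasch model.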

References

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 396–479). Addison-Wesley. https://ci.nii.ac.jp/naid/10011544105/
Cizek, G. J., & Bunch, M. B. (2007). The Nedelsky method. In G. J. Cizek & M. B. Bunch (Eds.), Standard setting (pp. 68–74). SAGE. https://doi.org/10.4135/9781412985918
College voor Toetsing en Examens. (2015). Regeling omzetting scores in cijfers bij centrale examinering mbo [Rules for transforming scores into grades for the central exams in vocational school]. CvTE-15.01457. https://wetten.overheid.nl/BWBR0036876/2017-08-01
Eggen, T. J. H. M., & Verhelst, N. D. (2011). Item calibration in incomplete testing designs. Psicologica, 32(1), 107–132. https://eric.ed.gov/?id=EJ925442
Evers, A., Sijtsma, K., Lucassen, W., & Meijer, R. R. (2010). The Dutch review process for evaluating the quality of psychological tests: History, procedure, and results. International Journal of Testing, 10, 295–317. https://doi.org/10.1080/15305058.2010.518325
Evers, A., Lucassen, W., Meijer, R. R., & Sijtsma, K. (2015). COTAN review system for evaluating test quality (p. 41). NIP, Utrecht. https://www.psynip.nl/wp-content/uploads/2019/05/NIP-Brochure-Cotan-2018-correctie-1.pdf
Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222(594–604), 309–368. https://doi.org/10.1098/rsta.1922.0009
Glas, C. A. W., & Verhelst, N. D. (1989). Extensions of the partial credit model. Psychometrika, 54(4), 635–659. https://doi.org/10.1007/BF02296401
Grayson, D. A. (1988). Two-group classification in latent trait theory: Scores with monotone likelihood ratio. Psychometrika, 53(3), 383–392. https://doi.org/10.1007/BF02294219
Hemker, B. T., Sijtsma, K., Molenaar, I. W., & Junker, B. W. (1996). Polytomous IRT models and monotone likelihood ratio of the total score. Psychometrika, 61(4), 679–693. https://doi.org/10.1007/BF02294042
Hemker, B. T., Sijtsma, K., Molenaar, I. W., & Junker, B. W. (1997). Stochastic ordering using the latent trait and the sum score in polytomous IRT models. Psychometrika, 62(3), 331–347. https://doi.org/10.1007/BF02294555
Huynh, H. (1994). A new proof for monotone likelihood ratio for the sum of independent Bernoulli random variables. Psychometrika, 59(1), 77–79. https://doi.org/10.1007/BF02294266
Kolen, M. J., & Brennan, R. L. (2014). Test equating, scaling, and linking. Springer. https://doi.org/10.1007/978-1-4939-0317-7
Kreiner, S., & Christensen, K. B. (2014). Analyses of model fit and robustness. A new look at the PISA scaling model underlying ranking of countries according to reading literacy. Psychometrika, 79(2), 210–231. https://doi.org/10.1007/s11336-013-9347-z
Lord, F. M. (1980). Applications of item-response theory to practical testing problems. Lawrence Erlbaum Associates.
Mokken, R. J. (1971). A theory and procedure of scale analysis. The Hague/De Gruyter.
Mokken, R. J. (1997). Nonparametric models for dichotomous responses. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory (pp. 351–367). Springer. https://doi.org/10.1007/978-1-4757-2691-6_20
Mokken, R. J., & Lewis, C. (1982). A nonparametric approach to the analysis of dichotomous item responses. Applied Psychological Measurement, 6(4), 417–430. https://doi.org/10.1177/014662168200600404
Molenaar, I. W. (1983). Some improved diagnostics for failure of the Rasch model. Psychometrika, 48(1), 49–72. https://doi.org/10.1007/BF02314676
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176. https://doi.org/10.1177/014662169201600206
Organisation for Economic Co-operation and Development. (2000). Measuring student knowledge and skills: The PISA 2000 assessment of reading, mathematical and scientific literacy. OECD Publishing. https://www.oecd.org/pisa/sitedocument/PISA-2015-Technical-Report-Chapter-9-Scaling-PISA-Data.pdf
Organisation for Economic Co-operation and Development. (2017). PISA 2015 technical report. OECD. https://www.oecd.org/pisa/data/2015-technical-report/PISA2015_TechRep_Final.pdf
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Danmarks Paedagogiske Institut.
Rasch, G. (1968). An individualistic approach to item analysis. In P. F. Lazarsfeld & N. W. Henry (Eds.), Readings in mathematical social science (pp. 89–108). Science Research Associates. https://www.rasch.org/memo19662.pdf
Roelofs, E. C., Emons, W. H. M., & Verschoor, A. J. (2021). Exploring task features that predict psychometric quality of test items: The case for the Dutch driving theory exam. International Journal of Testing, 21(2), 80–104. https://doi.org/10.1080/15305058.2021.1916506
Shakespeare, W. (1600). Hamlet. England.
Van den Brink, W. P., & Mellenbergh, G. J. (1998). Testleer en testconstructie [Test theory and test construction]. Boom Koninklijke Uitgevers.
Verhelst, N. D., & Glas, C. A. W. (1993). A dynamic generalization of the Rasch model. Psychometrika, 58(3), 395–415. https://doi.org/10.1007/BF02294648
Verhelst, N. D., & Verstralen, H. H. F. M. (1994). The one parameter logistic model: Computer program and manual. Universiteit Twente, CITO. https://research.utwente.nl/en/publications/the-one-parameter-logistic-model%2D%2Dcomputer-program-and-manual(5966c6a4-dd6d-4cac-8a38-b2527c8de44c).html
Zwitser, R. J., & Maris, G. (2016). Ordering individuals with sum scores: The introduction of the nonparametric Rasch model. Psychometrika, 81(1), 39–59. https://doi.org/10.1007/s11336-015-9481-x

Author information

Authors and Affiliations

CITO – Research and Development, Arnhem, The Netherlands
Bas T. Hemker

Corresponding author

Correspondence to Bas T. Hemker.

Editor information

Editors and Affiliations

Research Institute of Child Development and Education, University of Amsterdam, Amsterdam, The Netherlands
L. Andries van der Ark
Department of Methodology and Statistics, Tilburg University, Tilburg, The Netherlands
Wilco H. M. Emons
The expertise group Psychometrics and Statistics, University of Groningen, Groningen, The Netherlands
Rob R. Meijer

About this chapter

Cite this chapter

Hemker, B.T. (2023). To a or not to a: On the Use of the Total Score. In: van der Ark, L.A., Emons, W.H.M., Meijer, R.R. (eds) Essays on Contemporary Psychometrics. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-031-10370-4_13

DOI: https://doi.org/10.1007/978-3-031-10370-4_13
Published: 16 March 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10369-8
Online ISBN: 978-3-031-10370-4
eBook Packages: Education, Education (R0)
