Abstract
An agreement table with n ≥ 3 ordered categories can be collapsed into n − 1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen's weighted kappa with linear weights can be obtained from these n − 1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
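The identity stated in the abstract is easy to check numerically. The following is a minimal sketch, not code from the paper: it computes the linearly weighted kappa for an n×n table of relative frequencies, then Cohen's kappa and its denominator for each of the n − 1 collapsed 2×2 tables, and verifies that the denominator-weighted mean of the 2×2 kappas reproduces the weighted kappa. The 4×4 table used here is hypothetical illustrative data.

import numpy as np

def linear_weighted_kappa(p):
    """Cohen's weighted kappa with linear weights for an n x n table
    of relative frequencies p (rows: rater A, columns: rater B)."""
    n = p.shape[0]
    i, j = np.indices((n, n))
    w = 1.0 - np.abs(i - j) / (n - 1)            # linear agreement weights
    row, col = p.sum(axis=1), p.sum(axis=0)      # marginal proportions
    po = np.sum(w * p)                           # weighted observed agreement
    pe = np.sum(w * np.outer(row, col))          # weighted chance agreement
    return (po - pe) / (1.0 - pe)

def collapsed_2x2_kappas(p):
    """Cohen's kappa and its denominator for each of the n - 1 2x2 tables
    obtained by dichotomizing at the cutpoints between adjacent categories."""
    n = p.shape[0]
    kappas, denoms = [], []
    for k in range(1, n):                        # cut between categories k-1 and k
        a = p[:k, :k].sum()                      # both raters in the 'low' group
        b = p[:k, k:].sum()
        c = p[k:, :k].sum()
        d = p[k:, k:].sum()                      # both raters in the 'high' group
        o = a + d                                # observed agreement
        e = (a + b) * (a + c) + (c + d) * (b + d)  # chance agreement
        kappas.append((o - e) / (1.0 - e))
        denoms.append(1.0 - e)
    return np.array(kappas), np.array(denoms)

# Hypothetical 4x4 table of relative frequencies (sums to 1)
p = np.array([[0.15, 0.05, 0.02, 0.00],
              [0.04, 0.20, 0.05, 0.01],
              [0.01, 0.05, 0.18, 0.04],
              [0.00, 0.02, 0.03, 0.15]])

kw = linear_weighted_kappa(p)
kappas, denoms = collapsed_2x2_kappas(p)
print(kw)                                        # linearly weighted kappa
print(np.average(kappas, weights=denoms))        # identical weighted mean

The equality holds because a cell whose ratings differ by d categories is split by exactly d of the n − 1 cutpoints, so the numerator and denominator of the linearly weighted kappa are, up to the common factor 1/(n − 1), the sums of the numerators and denominators of the 2×2 kappas.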
References
Agresti, A. (1990). Categorical data analysis. New York: Wiley.
Artstein, R., & Poesio, M. (2005). NLE technical note: Vol. 05-1. Kappa³ = alpha (or beta). Colchester: University of Essex.
Berry, K.J., & Mielke, P.W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48, 921–933.
Brennan, R.L., & Prediger, D.J. (1981). Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.
Brenner, H., & Kliebsch, U. (1996). Dependence of weighted kappa coefficients on the number of categories. Epidemiology, 7, 199–202.
Cicchetti, D., & Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. The American Journal of EEG Technology, 11, 101–109.
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.
Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.
Conger, A.J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322–328.
Davies, M., & Fleiss, J.L. (1982). Measuring agreement for multinomial data. Biometrics, 38, 1047–1051.
Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.
Fleiss, J.L. (1981). Statistical methods for rates and proportions. New York: Wiley.
Fleiss, J.L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.
Fleiss, J.L., Cohen, J., & Everitt, B.S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323–327.
Heuvelmans, A.P.J.M., & Sanders, P.F. (1993). Beoordelaarsovereenstemming [Interrater agreement]. In Eggen, T.J.H.M., & Sanders, P.F. (Eds.), Psychometrie in de Praktijk [Psychometrics in practice] (pp. 443–470). Arnhem: Cito Instituut voor Toetsontwikkeling.
Holmquist, N.S., McMahon, C.A., & Williams, E.O. (1968). Variability in classification of carcinoma in situ of the uterine cervix. Obstetrical & Gynecological Survey, 23, 580–585.
Hsu, L.M., & Field, R. (2003). Interrater agreement measures: comments on kappaₙ, Cohen's kappa, Scott's π and Aickin's α. Understanding Statistics, 2, 205–219.
Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84, 289–297.
Jakobsson, U., & Westergren, A. (2005). Statistical methods for assessing agreement for ordinal data. Scandinavian Journal of Caring Sciences, 19, 427–431.
Janson, H., & Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement, 61, 277–289.
Kraemer, H.C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.
Kraemer, H.C., Periyakoil, V.S., & Noda, A. (2002). Tutorial in biostatistics: kappa coefficients in medical research. Statistics in Medicine, 21, 2109–2129.
Krippendorff, K. (2004). Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research, 30, 411–433.
Kundel, H.L., & Polansky, M. (2003). Measurement of observer agreement. Radiology, 228, 303–308.
Landis, J.R., & Koch, G.G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, 363–374.
Mielke, P.W., & Berry, K.J. (2009). A note on Cohen’s weighted kappa coefficient of agreement with linear weights. Statistical Methodology, 6, 439–446.
Mielke, P.W., Berry, K.J., & Johnston, J.E. (2007). The exact variance of weighted kappa with multiple raters. Psychological Reports, 101, 655–660.
Mielke, P.W., Berry, K.J., & Johnston, J.E. (2008). Resampling probability values for weighted kappa with multiple raters. Psychological Reports, 102, 606–613.
Nelson, J.C., & Pepe, M.S. (2000). Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research, 9, 475–496.
Popping, R. (1983). Overeenstemmingsmaten voor Nominale Data [Agreement measures for nominal data]. Unpublished doctoral dissertation, Rijksuniversiteit Groningen, Groningen.
Popping, R. (2010). Some views on agreement to be used in content analysis studies. Quality & Quantity, 44, 1067–1078.
Schouten, H.J.A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.
Schuster, C. (2004). A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement, 64, 243–253.
Scott, W.A. (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.
Vanbelle, S., & Albert, A. (2009a). Agreement between two independent groups of raters. Psychometrika, 74, 477–491.
Vanbelle, S., & Albert, A. (2009b). Agreement between an isolated rater and a group of raters. Statistica Neerlandica, 63, 82–100.
Vanbelle, S., & Albert, A. (2009c). A note on the linearly weighted kappa coefficient for ordinal scales. Statistical Methodology, 6, 157–163.
Visser, H., & de Nijs, T. (2006). The map comparison kit. Environmental Modelling & Software, 21, 346–358.
Warrens, M.J. (2008a). On similarity coefficients for 2×2 tables and correction for chance. Psychometrika, 73, 487–502.
Warrens, M.J. (2008b). On the equivalence of Cohen’s kappa and the Hubert–Arabie adjusted Rand index. Journal of Classification, 25, 177–183.
Warrens, M.J. (2009). k-adic similarity coefficients for binary (presence/absence) data. Journal of Classification, 26, 227–245.
Warrens, M.J. (2010a). Inequalities between kappa and kappa-like statistics for k×k tables. Psychometrika, 75, 176–185.
Warrens, M.J. (2010b). Cohen’s kappa can always be increased and decreased by combining categories. Statistical Methodology, 7, 673–677.
Warrens, M.J. (2010c). A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika, 75, 328–330.
Warrens, M.J. (2010d). A formal proof of a paradox associated with Cohen’s kappa. Journal of Classification, 27, 322–332.
Warrens, M.J. (2010e). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4, 271–286.
Warrens, M.J. (2011). Weighted kappa is higher than Cohen's kappa for tridiagonal agreement tables. Statistical Methodology, 8, 268–272.
Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374–378.
Cite this article
Warrens, M.J. Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas. Psychometrika 76, 471–486 (2011). https://doi.org/10.1007/s11336-011-9210-z