Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas

Abstract

An agreement table with n∈ℕ≥3 ordered categories can be collapsed into n−1 distinct 2×2 tables by combining adjacent categories. Vanbelle and Albert (Stat. Methodol. 6:157–163, 2009c) showed that the components of Cohen’s weighted kappa with linear weights can be obtained from these n−1 collapsed 2×2 tables. In this paper we consider several consequences of this result. One is that the weighted kappa with linear weights can be interpreted as a weighted arithmetic mean of the kappas corresponding to the 2×2 tables, where the weights are the denominators of the 2×2 kappas. In addition, it is shown that similar results and interpretations hold for linearly weighted kappas for multiple raters.
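
Concretely, the main result can be checked numerically: collapse the table at each of the n−1 cut points, compute Cohen's kappa κ = (P_o − P_e)/(1 − P_e) and its denominator 1 − P_e for every collapsed 2×2 table, and form the mean of the kappas weighted by those denominators. A minimal sketch in Python with NumPy is given below; the 4×4 table and all function names are hypothetical, chosen only to illustrate the identity, and are not taken from the paper.

    import numpy as np

    def linearly_weighted_kappa(p):
        # Cohen's weighted kappa with linear weights, in its disagreement form:
        # 1 - (observed weighted disagreement) / (chance-expected weighted disagreement).
        n = p.shape[0]
        i, j = np.indices((n, n))
        v = np.abs(i - j)                                   # linear disagreement weights |i - j|
        expected = np.outer(p.sum(axis=1), p.sum(axis=0))   # products of the marginals
        return 1.0 - (v * p).sum() / (v * expected).sum()

    def collapsed_2x2_kappas(p):
        # For each of the n - 1 cut points, collapse categories 1..c versus c+1..n
        # and return Cohen's kappa together with its denominator 1 - P_e.
        n = p.shape[0]
        kappas, denominators = [], []
        for c in range(1, n):
            a = p[:c, :c].sum()   # both raters in the low group
            b = p[:c, c:].sum()   # rater 1 low, rater 2 high
            d = p[c:, :c].sum()   # rater 1 high, rater 2 low
            e = p[c:, c:].sum()   # both raters in the high group
            p_o = a + e
            p_e = (a + b) * (a + d) + (d + e) * (b + e)
            kappas.append((p_o - p_e) / (1.0 - p_e))
            denominators.append(1.0 - p_e)
        return np.array(kappas), np.array(denominators)

    # Hypothetical 4x4 agreement table (raw counts), used only to illustrate the identity.
    counts = np.array([[20.0, 5.0, 2.0, 1.0],
                       [4.0, 15.0, 6.0, 2.0],
                       [1.0, 7.0, 18.0, 5.0],
                       [0.0, 2.0, 4.0, 8.0]])
    p = counts / counts.sum()

    kappas, denominators = collapsed_2x2_kappas(p)
    print(linearly_weighted_kappa(p))                         # linearly weighted kappa
    print(np.dot(denominators, kappas) / denominators.sum())  # same value, up to rounding

Because the weights are the denominators 1 − P_e, cut points with little chance agreement contribute relatively more to the linearly weighted kappa.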


References

  • Agresti, A. (1990). Categorical data analysis. New York: Wiley.

  • Artstein, R., & Poesio, M. (2005). NLE technical note: Vol. 05-1. Kappa³ = alpha (or beta). Colchester: University of Essex.

  • Berry, K.J., & Mielke, P.W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48, 921–933.

  • Brennan, R.L., & Prediger, D.J. (1981). Coefficient kappa: some uses, misuses, and alternatives. Educational and Psychological Measurement, 41, 687–699.

  • Brenner, H., & Kliebsch, U. (1996). Dependence of weighted kappa coefficients on the number of categories. Epidemiology, 7, 199–202.

  • Cicchetti, D., & Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. The American Journal of EEG Technology, 11, 101–109.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 213–220.

  • Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70, 213–220.

  • Conger, A.J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88, 322–328.

  • Davies, M., & Fleiss, J.L. (1982). Measuring agreement for multinomial data. Biometrics, 38, 1047–1051.

  • Fleiss, J.L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378–382.

  • Fleiss, J.L. (1981). Statistical methods for rates and proportions. New York: Wiley.

  • Fleiss, J.L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33, 613–619.

  • Fleiss, J.L., Cohen, J., & Everitt, B.S. (1969). Large sample standard errors of kappa and weighted kappa. Psychological Bulletin, 72, 323–327.

  • Heuvelmans, A.P.J.M., & Sanders, P.F. (1993). Beoordelaarsovereenstemming [Interrater agreement]. In T.J.H.M. Eggen & P.F. Sanders (Eds.), Psychometrie in de Praktijk [Psychometrics in practice] (pp. 443–470). Arnhem: Cito Instituut voor Toetsontwikkeling.

  • Holmquist, N.S., McMahon, C.A., & Williams, E.O. (1968). Variability in classification of carcinoma in situ of the uterine cervix. Obstetrical & Gynecological Survey, 23, 580–585.

  • Hsu, L.M., & Field, R. (2003). Interrater agreement measures: comments on kappa_n, Cohen’s kappa, Scott’s π and Aickin’s α. Understanding Statistics, 2, 205–219.

  • Hubert, L. (1977). Kappa revisited. Psychological Bulletin, 84, 289–297.

  • Jakobsson, U., & Westergren, A. (2005). Statistical methods for assessing agreement for ordinal data. Scandinavian Journal of Caring Sciences, 19, 427–431.

  • Janson, H., & Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement, 61, 277–289.

  • Kraemer, H.C. (1979). Ramifications of a population model for κ as a coefficient of reliability. Psychometrika, 44, 461–472.

  • Kraemer, H.C., Periyakoil, V.S., & Noda, A. (2004). Tutorial in biostatistics: kappa coefficients in medical research. Statistics in Medicine, 21, 2109–2129.

  • Krippendorff, K. (2004). Reliability in content analysis: some common misconceptions and recommendations. Human Communication Research, 30, 411–433.

  • Kundel, H.L., & Polansky, M. (2003). Measurement of observer agreement. Radiology, 228, 303–308.

  • Landis, J.R., & Koch, G.G. (1977). An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33, 363–374.

  • Mielke, P.W., & Berry, K.J. (2009). A note on Cohen’s weighted kappa coefficient of agreement with linear weights. Statistical Methodology, 6, 439–446.

  • Mielke, P.W., Berry, K.J., & Johnston, J.E. (2007). The exact variance of weighted kappa with multiple raters. Psychological Reports, 101, 655–660.

  • Mielke, P.W., Berry, K.J., & Johnston, J.E. (2008). Resampling probability values for weighted kappa with multiple raters. Psychological Reports, 102, 606–613.

  • Nelson, J.C., & Pepe, M.S. (2000). Statistical description of interrater variability in ordinal ratings. Statistical Methods in Medical Research, 9, 475–496.

  • Popping, R. (1983). Overeenstemmingsmaten voor Nominale Data [Agreement measures for nominal data]. Unpublished doctoral dissertation, Rijksuniversiteit Groningen, Groningen.

  • Popping, R. (2010). Some views on agreement to be used in content analysis studies. Quality & Quantity, 44, 1067–1078.

  • Schouten, H.J.A. (1986). Nominal scale agreement among observers. Psychometrika, 51, 453–466.

  • Schuster, C. (2004). A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement, 64, 243–253.

  • Scott, W.A. (1955). Reliability of content analysis: the case of nominal scale coding. Public Opinion Quarterly, 19, 321–325.

  • Vanbelle, S., & Albert, A. (2009a). Agreement between two independent groups of raters. Psychometrika, 74, 477–491.

  • Vanbelle, S., & Albert, A. (2009b). Agreement between an isolated rater and a group of raters. Statistica Neerlandica, 63, 82–100.

  • Vanbelle, S., & Albert, A. (2009c). A note on the linearly weighted kappa coefficient for ordinal scales. Statistical Methodology, 6, 157–163.

  • Visser, H., & de Nijs, T. (2006). The map comparison kit. Environmental Modelling & Software, 21, 346–358.

  • Warrens, M.J. (2008a). On similarity coefficients for 2×2 tables and correction for chance. Psychometrika, 73, 487–502.

  • Warrens, M.J. (2008b). On the equivalence of Cohen’s kappa and the Hubert–Arabie adjusted Rand index. Journal of Classification, 25, 177–183.

  • Warrens, M.J. (2009). k-adic similarity coefficients for binary (presence/absence) data. Journal of Classification, 26, 227–245.

  • Warrens, M.J. (2010a). Inequalities between kappa and kappa-like statistics for k×k tables. Psychometrika, 75, 176–185.

  • Warrens, M.J. (2010b). Cohen’s kappa can always be increased and decreased by combining categories. Statistical Methodology, 7, 673–677.

  • Warrens, M.J. (2010c). A Kraemer-type rescaling that transforms the odds ratio into the weighted kappa coefficient. Psychometrika, 75, 328–330.

  • Warrens, M.J. (2010d). A formal proof of a paradox associated with Cohen’s kappa. Journal of Classification, 27, 322–332.

  • Warrens, M.J. (2010e). Inequalities between multi-rater kappas. Advances in Data Analysis and Classification, 4, 271–286.

  • Warrens, M.J. (2011). Weighted kappa is higher than Cohen’s kappa for tridiagonal agreement tables. Statistical Methodology, 8, 268–272.

  • Zwick, R. (1988). Another look at interrater agreement. Psychological Bulletin, 103, 374–378.

Author information

Correspondence to Matthijs J. Warrens.

About this article

Cite this article

Warrens, M.J. Cohen’s Linearly Weighted Kappa is a Weighted Average of 2×2 Kappas. Psychometrika 76, 471–486 (2011). https://doi.org/10.1007/s11336-011-9210-z
