Measurement of observer agreement

Harold L Kundel; Marcia Polansky

doi:10.1148/radiol.2282011860

Radiology. 2003 Aug;228(2):303-8. doi: 10.1148/radiol.2282011860. Epub 2003 Jun 20.

Authors

Harold L Kundel¹, Marcia Polansky

Affiliation

¹ Department of Radiology and MCP Hahnemann School of Public Health, University of Pennsylvania Medical Center, 3600 Market St, Suite 370, Philadelphia, PA 19104, USA. kundel@rad.upenn.edu

PMID: 12819342

Abstract

This review describes the statistical measures used in diagnostic imaging to express observer agreement for categorical data. These measures characterize the reliability of imaging methods and the reproducibility of disease classifications, and they are occasionally used, with great care, as a surrogate for accuracy. The review concentrates on the chance-corrected indices kappa and weighted kappa. Examples from the imaging literature illustrate the method of calculation and the effects of both disease prevalence and the number of rating categories. Other, less frequently used measures of agreement, including multiple-rater kappa, are referenced and described briefly.
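For readers who want to reproduce the kind of calculation the abstract mentions, here is a minimal Python sketch of the textbook two-rater formulas. It is not code from the article. Kappa is (p_obs - p_exp) / (1 - p_exp), where p_obs is the observed proportion of agreement and p_exp is the agreement expected by chance from the two raters' marginal totals. Weighted kappa applies the same formula after giving partial credit to near-miss disagreements. The function names `agreement_weights` and `kappa`, the choice of linear and quadratic weights, and all example tables are illustrative assumptions and are not data or weighting choices taken from the paper.

```python
from typing import List, Sequence


def agreement_weights(k: int, scheme: str = "none") -> List[List[float]]:
    """Return a k x k matrix of agreement weights (1.0 on the diagonal).

    "none" gives the identity matrix (ordinary, unweighted kappa);
    "linear" and "quadratic" give partial credit that shrinks with the
    distance between ordered categories (common choices, assumed here).
    """
    if k < 2:
        raise ValueError("need at least two rating categories")
    weights = []
    for i in range(k):
        row = []
        for j in range(k):
            d = abs(i - j) / (k - 1)  # normalized distance between categories
            if scheme == "none":
                row.append(1.0 if i == j else 0.0)
            elif scheme == "linear":
                row.append(1.0 - d)
            elif scheme == "quadratic":
                row.append(1.0 - d ** 2)
            else:
                raise ValueError(f"unknown weighting scheme: {scheme!r}")
        weights.append(row)
    return weights


def kappa(table: Sequence[Sequence[float]], scheme: str = "none") -> float:
    """Chance-corrected agreement for a square two-rater contingency table.

    table[i][j] counts cases that rater 1 placed in category i and
    rater 2 placed in category j.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    row_p = [sum(table[i]) / n for i in range(k)]                       # rater 1 marginals
    col_p = [sum(table[i][j] for i in range(k)) / n for j in range(k)]  # rater 2 marginals
    w = agreement_weights(k, scheme)
    # Observed (weighted) agreement and the agreement expected by chance alone.
    p_obs = sum(w[i][j] * table[i][j] / n for i in range(k) for j in range(k))
    p_exp = sum(w[i][j] * row_p[i] * col_p[j] for i in range(k) for j in range(k))
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Hypothetical 2x2 tables (disease present / absent); both show 90% raw agreement.
balanced = [[45, 5], [5, 45]]   # prevalence 50%
low_prev = [[5, 5], [5, 85]]    # prevalence 10%
print(f"{kappa(balanced):.3f}")  # prints 0.800
print(f"{kappa(low_prev):.3f}")  # prints 0.444

# Hypothetical 3-category ordinal table with only adjacent-category disagreements.
ordinal = [[20, 5, 0], [5, 20, 5], [0, 5, 20]]
for scheme in ("none", "linear", "quadratic"):
    # prints: none 0.624, then linear 0.709, then quadratic 0.800
    print(scheme, f"{kappa(ordinal, scheme):.3f}")
```

Raw agreement is identical in the two 2x2 tables, yet kappa falls from 0.80 to about 0.44 when one category dominates, which illustrates the prevalence effect. For the ordinal table, weights that credit near misses raise kappa above its unweighted value, so the weighting scheme should be reported alongside the result. With only two categories, both weighting schemes reduce to the identity matrix and weighted kappa equals ordinary kappa. As a cross-check, scikit-learn's `cohen_kappa_score` accepts `weights="linear"` or `weights="quadratic"` for label vectors rather than a contingency table.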

Publication types

Research Support, U.S. Gov't, P.H.S.

MeSH terms

Data Interpretation, Statistical*
Diagnostic Imaging*
Humans
Observer Variation*
Reproducibility of Results

Grants and funding

P01 CA53141/CA/NCI NIH HHS/United States