Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests

https://doi.org/10.1016/S0167-5877(00)00115-X

Abstract

We review the principles and practical application of receiver-operating characteristic (ROC) analysis for diagnostic tests. ROC analysis can be used for diagnostic tests with outcomes measured on ordinal, interval or ratio scales. The dependence of the diagnostic sensitivity and specificity on the selected cut-off value must be considered for a full test evaluation and for test comparison. All possible combinations of sensitivity and specificity that can be achieved by changing the test’s cut-off value can be summarised using a single parameter: the area under the ROC curve. The ROC technique can also be used to optimise cut-off values with regard to a given prevalence in the target population and the cost ratio of false-positive and false-negative results. However, plots of optimisation parameters against the selected cut-off value provide a more direct method for cut-off selection. Candidates for such optimisation parameters are linear combinations of sensitivity and specificity (with weights selected to reflect the decision-making situation), odds ratio, chance-corrected measures of association (e.g. kappa) and likelihood ratios. We discuss some recent developments in ROC analysis, including meta-analysis of diagnostic tests, correlated ROC curves (paired-sample design) and chance- and prevalence-corrected ROC curves.
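One of the optimisation parameters mentioned above, an equally weighted linear combination of sensitivity and specificity, corresponds to Youden's index J = Se + Sp - 1. The following sketch (our own illustration with hypothetical ELISA readings, not data from the study) selects the cut-off that maximises J:

```python
# Sketch: cut-off selection by maximising Youden's index J = Se + Sp - 1.
# Hypothetical data; higher test values are assumed to indicate disease.
def best_cutoff(diseased, non_diseased):
    """Return the observed value that maximises J among candidate cut-offs."""
    candidates = sorted(set(diseased + non_diseased))

    def j(c):
        se = sum(x >= c for x in diseased) / len(diseased)
        sp = sum(x < c for x in non_diseased) / len(non_diseased)
        return se + sp - 1

    return max(candidates, key=j)

d_pos = [55, 60, 70, 80, 90, 95]   # diseased (D+), hypothetical readings
d_neg = [10, 20, 30, 40, 50, 65]   # non-diseased (D-), hypothetical readings
print(best_cutoff(d_pos, d_neg))   # -> 55
```

Unequal weights on Se and Sp (reflecting the relative costs of false-negative and false-positive results) can be substituted in `j` without changing the search itself.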

Introduction

The crude results of most serodiagnostic tests are measured on ordinal (e.g. grading scheme or sample titration) or continuous (e.g. quantitative readings of single-dilution tests) scales. For all diagnostic tests (except those producing dichotomous outcomes), a value on the original scale is selected as a decision threshold (cut-off value) to define positive and negative test outcomes. Comparison of the dichotomised test results against the true status of individuals (as determined by a reference or “gold standard” test) allows estimation of the diagnostic sensitivity (Se, the probability of a positive test outcome in a diseased individual) and specificity (Sp, the probability of a negative test outcome in a non-diseased individual) (see Greiner and Gardner, 2000). It is well recognised that Se and Sp are inversely related through the choice of cut-off value. When increasing values of a measurement are associated with disease, higher (lower) cut-off values are generally associated with a lower (higher) Se and a higher (lower) Sp. This relationship has two important implications. First, we would like to select a cut-off value such that the desired operating characteristics (Se, Sp) are achieved. Second, we realise that Se and Sp at a single cut-off value do not describe the test’s performance at other potential cut-off values. The latter also implies that the effect of the selected cut-off value should be taken into account when comparing diagnostic tests. These problems are addressed by receiver-operating characteristic (ROC) analysis and its derivatives.
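The inverse relationship between Se and Sp can be made concrete with a minimal sketch (hypothetical readings; `se_sp` is our own helper, not a function from the article):

```python
# Sketch: Se and Sp at a given cut-off, assuming higher values indicate disease.
def se_sp(diseased, non_diseased, cutoff):
    """Return (Se, Sp) for a cut-off applied to raw test values."""
    se = sum(x >= cutoff for x in diseased) / len(diseased)
    sp = sum(x < cutoff for x in non_diseased) / len(non_diseased)
    return se, sp

# Hypothetical ELISA readings
d_pos = [55, 60, 70, 80, 90, 95]   # diseased (D+)
d_neg = [10, 20, 30, 40, 50, 65]   # non-diseased (D-)

print(se_sp(d_pos, d_neg, 50))   # low cut-off:  Se = 1.0, Sp = 4/6
print(se_sp(d_pos, d_neg, 70))   # high cut-off: Se = 4/6, Sp = 1.0
```

Raising the cut-off trades Se for Sp, which is exactly why a single (Se, Sp) pair cannot characterise the test as a whole.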

The ROC methodology was developed in the early 1950s for the analysis of signal detection in technical sciences and was first used in medicine in the late 1960s for the assessment of imaging devices (reviewed by Zweig and Campbell, 1993). ROC analysis has been increasingly used for the evaluation of clinical laboratory tests (Metz, 1978; Henderson, 1993; Schulzer, 1994; Smith, 1995). However, Henderson and Bhayana (1995) reported a lack of consistency with respect to the presentation of ROC analyses. The use of ROC analysis is still limited in the medical and veterinary literature. A systematic review of evaluation (validation) studies of serodiagnostic tests published in 12 biomedical journals in 1995 revealed that ROC analysis was used in only 3 of 65 medical studies and 1 of 33 veterinary studies (Greiner and Wind, unpublished).

We review practically relevant features of ROC curves and related approaches with emphasis on cut-off selection and test comparison. Data obtained by enzyme-linked immunosorbent assays (ELISAs) for the detection of Trypanosoma antibodies will be used as an example. The presentation will refer to continuous ELISA data because this test format is often used for seroepidemiologic applications. The principles, however, apply also to continuous and ordinal diagnostic tests in general. Finally, we describe some extensions of classical ROC-analysis methodology. In the following examples, increasing values of a test result are associated with increasing likelihood of disease.


Example data

We use a random subset of data from a validation study of antibody ELISAs for the detection of Trypanosoma antibodies in bovine serum. In this study, a negative control group was sampled from non-exposed (Germany) and from exposed (parasitologically non-infected cattle from a tsetse-infested area in Uganda) cattle populations. The positive control group was sampled from the exposed (parasitologically confirmed) population (Greiner et al., 1997). Test antigen derived from blood-stream form …

Basic principles of ROC curves

The underlying assumption of ROC analysis is that a diagnostic variable (e.g. ELISA values) is used to discriminate between two mutually exclusive states of tested animals. In the following discussion, we consider the true disease status (denoted D+ and D− for diseased and non-diseased animals, respectively), but note that various other conditions, such as infected/non-infected or protected/non-protected, established using an appropriate reference method, could also be the aim of diagnostic …
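Under this assumption, the empirical ROC curve is traced by treating every observed value as a candidate cut-off and plotting Se (true-positive rate) against 1 − Sp (false-positive rate); the area under the curve (AUC) then follows from the trapezoidal rule. A minimal sketch with hypothetical data (our own illustration):

```python
# Sketch: empirical ROC curve points and trapezoidal AUC.
# Higher values of the diagnostic variable are assumed to indicate disease.
def roc_points(diseased, non_diseased):
    """Return (1 - Sp, Se) pairs for every observed cut-off, from (0,0) to (1,1)."""
    cutoffs = sorted(set(diseased + non_diseased), reverse=True)
    points = [(0.0, 0.0)]
    for c in cutoffs:
        se = sum(x >= c for x in diseased) / len(diseased)
        fpr = sum(x >= c for x in non_diseased) / len(non_diseased)
        points.append((fpr, se))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

d_pos = [55, 60, 70, 80, 90, 95]   # hypothetical diseased (D+) readings
d_neg = [10, 20, 30, 40, 50, 65]   # hypothetical non-diseased (D-) readings
print(round(auc(roc_points(d_pos, d_neg)), 3))   # -> 0.944
```

For the empirical curve, this trapezoidal AUC coincides with the Mann-Whitney estimate of the probability that a randomly chosen D+ animal scores higher than a randomly chosen D− animal.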

Recent developments

Confidence bands for ROC curves are needed for inferences from a visual comparison of curves for two or more tests. Methods based on the Greenhouse–Mantel test (Schäfer, 1994), Kolmogorov–Smirnov test and bootstrapping (Campbell, 1994) have been suggested for construction of confidence bands. Confidence intervals for the AUC for diagnostic systems that involve multiple tests were developed by Reiser and Faraggi (1997).
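As one illustration of the resampling idea, a percentile-bootstrap confidence interval for the AUC can be sketched as follows. This is our own minimal sketch, not the specific procedure of any cited author; `auc_of` is a hypothetical helper based on the Mann-Whitney statistic:

```python
# Sketch: percentile-bootstrap confidence interval for the AUC.
import random

def auc_of(diseased, non_diseased):
    """Mann-Whitney AUC estimate: P(X_D+ > X_D-), with ties counted as 0.5."""
    n = len(diseased) * len(non_diseased)
    wins = sum((x > y) + 0.5 * (x == y)
               for x in diseased for y in non_diseased)
    return wins / n

def bootstrap_auc_ci(diseased, non_diseased, n_boot=2000, alpha=0.05, seed=1):
    """Resample each group with replacement and take percentile bounds."""
    rng = random.Random(seed)
    aucs = sorted(
        auc_of(rng.choices(diseased, k=len(diseased)),
               rng.choices(non_diseased, k=len(non_diseased)))
        for _ in range(n_boot))
    lo = aucs[int(alpha / 2 * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Stratified resampling (within the D+ and D− groups separately, as above) preserves the group sizes of the original study; the same scheme extends to pointwise bands for the curve itself.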

Another topic of current methodological research is the analysis of …

Software for ROC analysis

Software for ROC analysis is available in various formats, including commercial, shareware and stand-alone products, statistical-program packages with built-in or user-defined ROC modules, and spreadsheet calculation macros. Some available programmes are listed in Table 3. However, the list is not comprehensive and we have not compared the relative advantages of the listed programmes. Some features (based on our experience and on information provided by the producers) are listed as a guide. A …

Conclusions

ROC analysis visualises the cut-off dependency of ordinal or continuous diagnostic tests and provides an estimate of test accuracy that is independent of specific cut-off values and prevalence. ROC curves allow comparison between different diagnostic tests. In addition, the curve provides information that enables the diagnostician to optimise the use of a test through targeted selection of cut-off values for particular diagnostic strategies.

References (54)

  • Irwig, L., et al., 1995. Meta-analytic methods for diagnostic test accuracy. J. Clin. Epidemiol.
  • Metz, C.E., 1978. Basic principles of ROC analysis. Semin. Nucl. Med.
  • Simel, D.L., et al., 1993. Likelihood ratios for continuous test results — making the clinician’s job easier or harder? J. Clin. Epidemiol.
  • Sondik, E.J., 1982. Clinical evaluation of test strategies. A decision analysis of parameter estimation. Clin. Lab. Med.
  • Vida, S., 1993. A computer program for non-parametric receiver operating characteristic analysis. Comput. Methods Programs Biomed.
  • Albert, A., 1982. On the use of likelihood ratios in clinical chemistry. Clin. Chem.
  • Anderson, J.A., 1982. Logistic regression. In: Krishnaiah, P.R., Kanal, L.N. (Eds.), Handbook of Statistics. …
  • Beam, C.A., 1998. Analysis of clustered data in receiver operating characteristic studies. Stat. Meth. Med. Res.
  • Beam, C.A., et al., 1991. A statistical method for the comparison of a discrete diagnostic test with several continuous diagnostic tests. Biometrics.
  • Bennett, B.M., 1972. On comparisons of sensitivity, specificity, and predictive value of a number of diagnostic procedures. Biometrics.
  • Campbell, G., 1994. Advances in statistical methodology for the evaluation of diagnostic and laboratory tests. Stat. Med.
  • Choi, B.C.K., 1998. Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test. Am. J. Epidemiol.
  • Dorfman, D.D., et al., 1968. Maximum likelihood estimation of parameters of signal detection theory — a direct solution. Psychometrika.
  • Gefeller, O., et al., 1994. How to correct for chance agreement in the estimation of sensitivity and specificity of diagnostic tests. Methods Inf. Med.
  • Hanley, J.A., 1988. The robustness of the “binormal” assumptions used in fitting ROC curves. Med. Decis. Mak.
  • Hanley, J.A., et al., 1982. The meaning and use of the area under a receiver operating characteristic curve. Radiology.
  • Hanley, J.A., et al., 1983. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology.