Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests
Introduction
The raw results of most serodiagnostic tests are measured on ordinal (e.g. grading scheme or sample titration) or continuous (e.g. quantitative readings of single-dilution tests) scales. For all diagnostic tests except those producing dichotomous outcomes, a value on the original scale is selected as a decision threshold (cut-off value) to define positive and negative test outcomes. Comparing the dichotomised test results against the true status of individuals (as determined by a reference or "gold standard" test) allows estimation of the diagnostic sensitivity (Se, the probability of a positive test outcome in a diseased individual) and specificity (Sp, the probability of a negative test outcome in a non-diseased individual) (see Greiner and Gardner, 2000). It is well recognised that Se and Sp are inversely related as the cut-off value changes. When increasing values of a measurement are associated with disease, higher (lower) cut-off values are generally associated with lower (higher) Se and higher (lower) Sp. This relationship has two important implications. First, we would like to select a cut-off value such that the desired operating characteristics (Se, Sp) are achieved. Second, Se and Sp at a single cut-off value do not describe the test's performance at other potential cut-off values. The second point also implies that the choice of cut-off value must be taken into account when diagnostic tests are compared. These problems are addressed by receiver-operating characteristic (ROC) analysis and its derivatives.
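To make the inverse relation between Se and Sp concrete, the following Python sketch dichotomises readings at several cut-off values. It assumes that a result at or above the cut-off is called positive (consistent with higher values indicating disease). The readings and the function name `se_sp` are hypothetical and are not data from the study described below.

```python
from typing import Sequence, Tuple


def se_sp(diseased: Sequence[float], non_diseased: Sequence[float],
          cutoff: float) -> Tuple[float, float]:
    """Se and Sp when a result >= cutoff is called positive."""
    true_pos = sum(1 for x in diseased if x >= cutoff)
    true_neg = sum(1 for y in non_diseased if y < cutoff)
    return true_pos / len(diseased), true_neg / len(non_diseased)


# Hypothetical ELISA readings (e.g. optical densities); NOT the study data.
d_pos = [0.42, 0.55, 0.61, 0.70, 0.38, 0.83, 0.29, 0.66]  # D+ animals
d_neg = [0.12, 0.25, 0.31, 0.08, 0.19, 0.40, 0.22, 0.15]  # D- animals

for c in (0.2, 0.3, 0.4, 0.5):
    se, sp = se_sp(d_pos, d_neg, c)
    print(f"cut-off {c:.1f}: Se = {se:.3f}, Sp = {sp:.3f}")
```

With these invented readings, raising the cut-off from 0.2 to 0.5 lowers Se from 1.0 to 0.625 while Sp rises from 0.5 to 1.0, so no single cut-off summarises the test.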
The ROC methodology was developed in the early 1950s for the analysis of signal detection in technical sciences and was first used in medicine in the late 1960s for the assessment of imaging devices (reviewed by Zweig and Campbell, 1993). ROC analysis has been increasingly used for the evaluation of clinical laboratory tests (Metz, 1978; Henderson, 1993; Schulzer, 1994; Smith, 1995). However, Henderson and Bhayana (1995) reported a lack of consistency with respect to the presentation of ROC analyses. The use of ROC analysis is still limited in the medical and veterinary literature. A systematic review of evaluation (validation) studies of serodiagnostic tests published in 12 biomedical journals in 1995 revealed that ROC analysis has been used in only 3 of 65 medical studies and 1 of 33 veterinary studies (Greiner and Wind, unpublished).
We review the practically relevant features of ROC curves and related approaches, with emphasis on cut-off selection and test comparison. Data obtained by enzyme-linked immunosorbent assays (ELISAs) for the detection of Trypanosoma antibodies are used as an example. The presentation refers to continuous ELISA data because this test format is often used in seroepidemiologic applications; the principles, however, apply equally to other continuous and ordinal diagnostic tests. Finally, we describe some extensions of classical ROC-analysis methodology. In the following examples, increasing test values are associated with an increasing likelihood of disease.
Example data
We use a random subset of data from a validation study of antibody ELISAs for the detection of Trypanosoma antibodies in bovine serum. In this study, a negative control group was sampled from non-exposed (Germany) and from exposed (parasitologically non-infected cattle from a tsetse-infested area in Uganda) cattle populations. The positive control group was sampled from the exposed (parasitologically confirmed) population (Greiner et al., 1997). Test antigen derived from blood-stream form
Basic principles of ROC curves
The underlying assumption of ROC analysis is that a diagnostic variable (e.g. ELISA values) is used to discriminate between two mutually exclusive states of tested animals. During the following discussion, we consider the true disease status (denoted D+ and D− for diseased and non-diseased animals, respectively) but note that various other conditions such as infected/non-infected and protected/non-protected established using an appropriate reference method could also be the aim of diagnostic
Recent developments
Confidence bands for ROC curves are needed to draw inferences from a visual comparison of the curves of two or more tests. Methods based on the Greenhouse–Mantel test (Schäfer, 1994), the Kolmogorov–Smirnov test and bootstrapping (Campbell, 1994) have been suggested for constructing such bands. Confidence intervals for the area under the ROC curve (AUC) of diagnostic systems that involve multiple tests were developed by Reiser and Faraggi (1997).
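As a simpler illustration of resampling-based inference than the confidence bands cited above, the sketch below computes a percentile bootstrap confidence interval for the empirical AUC of a single test. It is not the method of Campbell (1994) or of Reiser and Faraggi (1997). It assumes independent observations, resamples the D+ and D− groups separately, and estimates the AUC as the Mann–Whitney proportion of correctly ranked (D+, D−) pairs, counting ties as one half. The data and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Proportion of (D+, D-) pairs ranked correctly; ties count one half."""
    score = 0.0
    for x in pos:
        for y in neg:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5
    return score / (len(pos) * len(neg))


def bootstrap_auc_ci(pos: Sequence[float], neg: Sequence[float],
                     n_boot: int = 2000, alpha: float = 0.05,
                     seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for the empirical AUC.

    D+ and D- animals are resampled separately (stratified bootstrap), so each
    replicate keeps the original group sizes. Assumes independent observations.
    """
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        stats.append(auc_mann_whitney(boot_pos, boot_neg))
    stats.sort()
    lo = stats[int(round((alpha / 2) * (n_boot - 1)))]
    hi = stats[int(round((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi


# Hypothetical readings for D+ and D- animals (not the study data).
d_pos = [0.42, 0.55, 0.61, 0.70, 0.38, 0.83, 0.29, 0.66]
d_neg = [0.12, 0.25, 0.31, 0.08, 0.19, 0.40, 0.22, 0.15]

auc = auc_mann_whitney(d_pos, d_neg)
lo, hi = bootstrap_auc_ci(d_pos, d_neg)
print(f"AUC = {auc:.3f}, 95% bootstrap CI {lo:.3f} to {hi:.3f}")
```

Fixing the seed makes the interval reproducible; with small groups such as these, the percentile interval is only a rough guide and its upper limit can reach 1.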
Another topic of current methodological research is the analysis of
Software for ROC analysis
Software for ROC analysis is available in various formats including commercial, shareware or stand-alone products, statistical-program packages with built-in or user-defined ROC modules, and spreadsheet calculation macros. Some available programmes are listed in Table 3. However, the list is not comprehensive and we have not compared the relative advantages of the listed programmes. Some features (based on our experience and information provided by the producers) are listed as a guide. A
Conclusions
ROC analysis visualises the cut-off dependence of ordinal or continuous diagnostic tests and provides an estimate of accuracy that is independent of specific cut-off values and of prevalence. ROC curves also allow different diagnostic tests to be compared. In addition, the curve provides information that enables the diagnostician to optimise the use of a test by selecting cut-off values targeted to particular diagnostic strategies.
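The empirical ROC curve and one possible cut-off criterion can be sketched as follows. This is a minimal illustration assuming that results at or above the cut-off are positive: each distinct observed value is tried as a cut-off, the points (1 − Sp, Se) are joined, and the area is accumulated with the trapezoidal rule, which for the empirical curve equals the Mann–Whitney estimate of the AUC. The Youden index J = Se + Sp − 1 is used here only as an example criterion; it is not necessarily the one the authors recommend, and cost-based approaches such as TG-ROC weigh misclassifications differently. Data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each point is (cut-off, 1 - Sp, Se).
Point = Tuple[float, float, float]


def roc_points(pos: Sequence[float], neg: Sequence[float]) -> List[Point]:
    """Empirical ROC points, calling a result positive when it is >= cut-off.

    Cut-offs run from +inf (nothing positive, point (0, 0)) down through every
    distinct observed value; the lowest value makes everything positive (1, 1).
    """
    points: List[Point] = [(float("inf"), 0.0, 0.0)]
    for c in sorted(set(pos) | set(neg), reverse=True):
        se = sum(x >= c for x in pos) / len(pos)
        fpr = sum(y >= c for y in neg) / len(neg)  # 1 - Sp
        points.append((c, fpr, se))
    return points


def trapezoid_auc(points: Sequence[Point]) -> float:
    """Area under the piecewise-linear empirical ROC curve."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def youden_cutoff(points: Sequence[Point]) -> Tuple[float, float]:
    """Cut-off maximising J = Se + Sp - 1 (= Se - (1 - Sp)).

    max() keeps the first maximum, so ties go to the highest cut-off.
    """
    best = max(points[1:], key=lambda p: p[2] - p[1])
    return best[0], best[2] - best[1]


# Hypothetical readings for D+ and D- animals (not the study data).
d_pos = [0.42, 0.55, 0.61, 0.70, 0.38, 0.83, 0.29, 0.66]
d_neg = [0.12, 0.25, 0.31, 0.08, 0.19, 0.40, 0.22, 0.15]

pts = roc_points(d_pos, d_neg)
cut, j = youden_cutoff(pts)
print(f"AUC = {trapezoid_auc(pts):.3f}; Youden cut-off = {cut}, J = {j:.3f}")
```

For the invented readings the trapezoidal area is 61/64 (about 0.953), and the maximal J of 0.75 is first reached at a cut-off of 0.42. The same J is also attained at 0.38 and 0.29, so ties are resolved here in favour of the highest cut-off, i.e. the highest Sp; a different diagnostic strategy could justify a different choice.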
References
The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psychol. (1975).
Notes about determining the cut-off value in enzyme-linked immunosorbent assay (ELISA). Prev. Vet. Med. (1993).
Methods for estimating areas under receiver-operating characteristic curves: illustration with somatic-cell scores in subclinical intramammary infections. Prev. Vet. Med. (1999).
Conditional dependence between tests affects the diagnosis and surveillance of animal diseases. Prev. Vet. Med. (2000).
Two-graph receiver operating characteristic (TG-ROC): update version supports optimisation of cut-off values that minimise overall misclassification costs. J. Immunol. Methods (1996).
Epidemiologic issues in the validation of veterinary diagnostic tests. Prev. Vet. Med. (2000).
A modified ROC analysis for the selection of cut-off values and the definition of intermediate results of serodiagnostic tests. J. Immunol. Methods (1995).
Evaluation and comparison of antibody ELISAs for serodiagnosis of bovine trypanosomosis. Vet. Parasitol. (1997).
Log-linear and logistic modeling of dependence among diagnostic tests. Prev. Vet. Med. (2000).
Is there a gain from chance-corrected measures of diagnostic validity? J. Clin. Epidemiol. (1997).
Meta-analytic methods for diagnostic test accuracy. J. Clin. Epidemiol.
Basic principles of ROC analysis. Semin. Nucl. Med.
Likelihood ratios for continuous test results — making the clinician's job easier or harder? J. Clin. Epidemiol.
Clinical evaluation of test strategies. A decision analysis of parameter estimation. Clin. Lab. Med.
A computer program for non-parametric receiver operating characteristic analysis. Comput. Methods Programs Biomed.
On the use of likelihood ratios in clinical chemistry. Clin. Chem.
Analysis of clustered data in receiver operating characteristic studies. Stat. Meth. Med. Res.
A statistical method for the comparison of a discrete diagnostic test with several continuous diagnostic tests. Biometrics.
On comparisons of sensitivity, specificity, and predictive value of a number of diagnostic procedures. Biometrics.
Advances in statistical methodology for the evaluation of diagnostic and laboratory tests. Stat. Med.
Slopes of a receiver operating characteristic curve and likelihood ratios for a diagnostic test. Am. J. Epidemiol.
Maximum likelihood estimation of parameters of signal detection theory — a direct solution. Psychometrika.
How to correct for chance agreement in the estimation of sensitivity and specificity of diagnostic tests. Methods Inf. Med.
The robustness of the "binormal" assumptions used in fitting ROC curves. Med. Decis. Mak.
The meaning and use of the area under a receiver operating characteristic curve. Radiology.
A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology.