Assessment of the accuracy of diagnostic tests: the cross-sectional study

https://doi.org/10.1016/S0895-4356(03)00206-3

Abstract

  • In diagnostic accuracy studies, the contrast of interest can be one of the following: one single test contrast; comparing two or more single tests; further testing in addition to previous diagnostics; and comparing alternative diagnostic strategies. The clinical diagnostic problem under study must be specified. Studies of “extreme contrasts” (as early phase evaluations) and studies in “clinical practice” settings (assessing clinical value) should be distinguished.

  • Design options are (1) survey of the total study population, (2) case–referent approach, or (3) test-based enrolment. Data collection should generally be prospective, but ambispective and retrospective approaches are sometimes appropriate. In addition to determinants of primary interest [the test(s) under study], possible modifiers of test accuracy and confounding variables must be specified.

  • The reference standard procedure should be independent of the test results. Applying a reference standard can be difficult in case of classification errors, lack of a clear pathophysiologic concept, incorporation bias, or invasive or complex investigations. Possible solutions are an independent expert panel and the delayed-type cross-sectional study (clinical follow-up). Alternatively, a prognostic criterion can be chosen.

  • For studies to be relevant for practice, inclusion criteria must be based on “intention to diagnose” or “intention to screen.” The recruitment procedure is preferably a consecutive series of presenting patients or a target population screening, respectively.

  • Sample size estimation should be routine. Analysis has to be focused on the contrast of interest. Estimating test accuracy and prediction of outcome need different approaches.

  • External (clinical) validation requires repeated studies in other, similar populations. Systematic reviews and meta-analyses also have a role.

  • To enable readers of diagnostic research reports to evaluate whether methodological key issues were addressed, authors are advised to follow the STARD guidelines.

Introduction

Although the ultimate objective of the diagnostic process is to optimize the prognosis of the patient by enabling the clinician to choose an adequate management strategy, an accurate diagnostic assessment is a first and indispensable step.

Making a clinical diagnosis implies classifying the presented health problem in the context of accepted nosologic knowledge. This can result in confirming or excluding the presence of a certain disease, in selecting a disease from a set of candidate diseases, or in concluding that several diseases are simultaneously present [1]. Not infrequently, however, classification beyond the observed symptomatology (e.g., low back pain) cannot be achieved. Sometimes, further classification is possible but not worthwhile, considering the balance between the expected gain in certainty, the burden of making a definitive diagnosis, and clinical relevance.

Apart from diagnostic classification, assessing the clinical severity or monitoring the clinical course of an already diagnosed condition may be the aim of testing. Another clinical application is documenting the precise localization or shape of a diagnosed lesion to support further decision making, for example about surgery.

A possibly new diagnostic test must first go through a phase of pathophysiologic and technical development before its clinical effectiveness in terms of diagnostic accuracy or prognostic impact will be evaluated. The methodology discussed in this article, focused on accuracy, is applicable to the clinical evaluation of tests that have successfully passed this early development. A basic question then to be answered is: what is the probability that this particular patient with this particular test result has a certain disorder or a combination of disorders? Obtaining an evidence-based answer, using clinical epidemiologic research data, requires the analysis of the association between the test result and the appropriate diagnostic classification, that is, the presence or absence of certain diagnoses.
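As an illustration of this basic question for a dichotomous test, the probability of the disorder given a positive result can be written with Bayes' theorem. The symbols (p for the pretest probability, Se for sensitivity, Sp for specificity) and the numbers in the worked line are assumptions chosen for illustration only; they are not taken from the article.

```latex
% Posttest probability of disorder D given a positive test result T+
P(D \mid T^{+}) = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)}

% Worked example with hypothetical values p = 0.10, Se = 0.90, Sp = 0.80
P(D \mid T^{+}) = \frac{0.90 \times 0.10}{0.90 \times 0.10 + 0.20 \times 0.90}
                = \frac{0.09}{0.27} \approx 0.33
```

Under these assumed values, a positive result raises the probability of the disorder from 10% to about 33%, which shows why the estimate depends on the population in which the test is applied as well as on the test itself.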

This article discusses principles, design, and pitfalls of cross-sectional diagnostic accuracy research. In this context, cross-sectional research includes studies in which the measured test results and the health status to be diagnosed essentially represent one point in time for each study subject [2].

Diagnostic research on test accuracy: the basic steps to take

All measures of diagnostic association [3] can be derived from research data on the relation between test results and a reference standard diagnosis. A valid data collection on this relation is the first and foremost point of concern [4], while the various measures can be calculated by applying straightforward analytic methods. Research data for the purpose of diagnostic discrimination are generally collected in cross-sectional research, irrespective of the diagnostic parameters to be used.
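A minimal sketch of how the common measures follow from a single cross-classification of a dichotomous test against the reference standard is given here. The class name `TwoByTwo`, the function `accuracy_measures`, and the counts in the example are hypothetical, and the code assumes that no cell or margin of the table is zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying a dichotomous test against the reference standard."""
    tp: int  # test positive, disorder present
    fp: int  # test positive, disorder absent
    fn: int  # test negative, disorder present
    tn: int  # test negative, disorder absent


def accuracy_measures(t: TwoByTwo) -> dict[str, float]:
    """Derive common measures of diagnostic association from one 2x2 table.

    Assumes all cells and margins are non-zero; no continuity correction is applied.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Predictive values depend on the prevalence in the study population.
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "odds_ratio": (t.tp * t.tn) / (t.fp * t.fn),
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    table = TwoByTwo(tp=90, fp=40, fn=10, tn=160)
    for name, value in accuracy_measures(table).items():
        print(f"{name}: {value:.3f}")
```

The arithmetic is simple, which is the point made above: the validity of the counts entering the table matters far more than the choice of summary measure.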

As

The research question: contrast to be evaluated

The diagnostic research question should define:

  • a. The test or test set to be evaluated.

  • b. The clinical problem for which the use of the test(s) is considered possibly relevant.

  • c. Whether the planned study should evaluate (1) the potential of the test procedure to discriminate between subjects with and without a target disorder in a situation of extreme contrast, or (2) to what extent testing could discriminate in a daily practice clinical setting (where discrimination is, by definition, more difficult).

Outline of the study design

Because study questions on diagnostic accuracy generally evaluate the association between (combinations of) test results and health status (mostly the presence or absence of a target disorder), a cross-sectional design is a natural basic design option. However, this basic design has various modifications, each with specific pros and cons in terms of scientific requirements, burden for the study subjects and efficient use of resources (Box 2).

Determinants

As in any (clinical) epidemiologic study, research questions on diagnostic accuracy can be operationalized in a central “occurrence relation” [12] between independent and dependent variables (Fig. 1).

The independent variable or determinant of primary interest is the test result(s) to be evaluated, and the primary dependent or outcome variable is (presence or absence of) the target disorder. When evaluating a single test, the test results in all study subjects are compared with the reference

Specifying the study population

As in all clinical research, the study population for diagnostic research should be appropriately defined and recruited. The selection of patients is crucial for the study outcome and its external (clinical) validity. Diagnostic accuracy depends on the spectrum of included patients and on the results of related tests performed earlier, and may differ between patients in primary care and those referred to a hospital [6], [7], [22], [27].

Given that the test has successfully passed early phase studies

Adverse effects of test and reference standard

Apart from test accuracy, a test has to be evaluated for the discomfort it causes to patient and doctor. A test should be minimally invasive and carry a minimal risk of adverse effects and serious complications. Measuring these aspects in the context of a diagnostic accuracy study adds to the comparison of the clinical pros and cons of different tests.

For the research community, it is also important to learn about the invasiveness and risks of the reference standard used. For

Statistical aspects

In the planning phase of the study, the needed sample size should be estimated (Box 5). For evaluating the relation between a dichotomous test and the presence of a disorder, conventional programs for sample size estimation can be used. For example, for a case–referent study with equal group sizes, accepting certain values for a type I and a type II error (e.g., 0.05 and 0.20, respectively) and using two-sided testing, one can calculate the number of subjects needed per group to detect a
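A minimal sketch of such a calculation is given here, assuming that the quantity to be detected is a difference between the proportions of positive test results in cases and in referents. It uses the standard normal-approximation formula for comparing two independent proportions without continuity correction. The function name `sample_size_per_group` and the proportions in the example are hypothetical and are not taken from the article.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_cases: float, p_referents: float,
                          alpha: float = 0.05, beta: float = 0.20) -> int:
    """Subjects needed per group to detect p_cases vs p_referents.

    Two-sided test, equal group sizes, normal approximation,
    no continuity correction (all assumptions of this sketch).
    """
    if p_cases == p_referents:
        raise ValueError("the two proportions must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = NormalDist().inv_cdf(1 - beta)        # power = 1 - beta
    p_bar = (p_cases + p_referents) / 2
    # Variance term under the null (pooled) and under the alternative.
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(
        p_cases * (1 - p_cases) + p_referents * (1 - p_referents)
    )
    n = (null_term + alt_term) ** 2 / (p_cases - p_referents) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical planning values: 60% test-positive among cases,
    # 40% among referents, alpha = 0.05, beta = 0.20.
    print(sample_size_per_group(0.60, 0.40))
```

Smaller expected differences between the groups increase the required numbers quickly, so the planning values should be justified from earlier phase studies or clinical judgment rather than chosen for convenience.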

External validation

Analysis of diagnostic accuracy in the collected dataset, especially the results of multivariable analyses, can yield optimistic results that may not be reproduced in clinical practice or similar study populations [36]. Therefore, it is advisable to perform one or more separate external validation studies in independent but clinically similar populations (Box 6). Sometimes, authors derive a diagnostic model in a random half of the research data set and test its performance in the other half

Conclusion

This article was focused on diagnostic research to assess test accuracy. In all phases of developing and executing study protocols for this purpose, specific options, requirements, and pitfalls have to be considered. In reporting results of diagnostic research, it is important to provide all essential information to enable readers to evaluate whether a study addressed the methodologic key issues at stake. To achieve this, authors are advised to follow the guidelines recently published as the

References (39)

  • D.L Sackett et al. The architecture of diagnostic research

  • H.E.J.H Stoffers. Peripheral arterial occlusive disease. Prevalence and diagnostic management in general practice. Thesis (1995)

  • P.C.A.J Vroomen. The diagnosis and conservative treatment of sciatica (1998)

  • J.A Knottnerus et al. Unexplained fatigue and hemoglobin, a primary care study. Can Fam Physician (1986)

  • O.S Miettinen. Theoretical epidemiology. Principles of occurrence research in medicine (1985)

  • D.J Spiegelhalter et al. Statistical and knowledge-based approaches to clinical decision support systems, with an application to gastroenterology. Journal of the Royal Statistical Society (1984)

  • J.A Knottnerus. Application of logistic regression to the analysis of diagnostic data: exact modeling of a probability tree of multiple binary variables. Med Decis Making (1992)

  • L Irwig et al. Designing studies to ensure that estimates of test accuracy are transferable. BMJ (2002)

  • G.J Dinant et al. Reliability of the erythrocyte sedimentation rate in general practice. Scand J Prim Health Care (1989)

This article is an adapted version of: Knottnerus JA, Ed. The evidence base of clinical diagnosis. Chapter 3. London: BMJ Books; 2002; published with written permission obtained from BMJ Books.
