TO THE EDITOR: We noted with interest the recent publication of “Randomized Controlled Trials on PET: A Systematic Review of Topics, Design, and Quality” (1) in The Journal of Nuclear Medicine. We are sure that this article will be confronting for members of the nuclear medicine community, as it again highlights the wide gulf that exists between our profession’s assessment of the patient benefits of PET and the conclusions reached by a highly influential international health technology assessment agency. This continues a theme addressed by us in a recent review in The Journal of Nuclear Medicine (2). Unfortunately, we believe that Scheibler et al. offer a rather simplistic analysis that is based on a superficial review of original data and lacks appropriate clinical perspective. Further, our critical evaluation identifies several methodologic, factual, and conceptual limitations that render the authors’ conclusions untenable.
Even the primary motivation for the review is flawed. The authors opine that randomized controlled trials (RCTs) are a critical component of evidence-based medicine (EBM) and are therefore required to evaluate the benefits of any new technology. It is, however, quite wrong to state that the principles of EBM require RCT evidence before valid conclusions can be drawn about the benefits of new diagnostic tests. A seminal article defining the values of EBM states that “Evidence-based medicine is not restricted to randomized trials and meta-analyses. It involves tracking down the best external evidence with which to answer our clinical questions. To find out about the accuracy of a diagnostic test, we need to find proper cross-sectional studies of patients clinically suspected of harboring the relevant disorder, not a randomized trial” (3).
RCTs are most useful when the mechanism of action of treatments is not fully understood or when there is uncertainty about the benefits versus risks. Unlike drug trials, in which 2 different therapies cannot be administered to a single patient to assess differential response or outcome, it is possible to perform more than one diagnostic test in an individual patient and ascertain which is superior. There is already abundant evidence that the diagnostic accuracy of PET/CT is superior to conventional staging approaches in many cancers (4), thus decreasing the need for RCTs and potentially making them unethical (5). Moreover, many of the RCTs identified by the authors, especially RCTs under way involving lymphoma, are primarily randomized trials of new risk-adapted therapeutic approaches rather than studies of PET per se. These involve a so-called enrichment design, in which the results of PET are used to enrich the sample before randomization. As such, almost all assume, on the basis of previously published studies (6), that PET provides superior prognostic stratification compared with conventional imaging. Even superficial analysis of the titles or the summary protocols of most of these trials makes it patently clear that they are not an evaluation of PET but rather are testing whether alternative treatment strategies can improve outcomes in patients stratified by PET. This is no different from almost any RCT in oncology, which uses imaging for determining patient eligibility or for stratification and as an integral component of response assessment—often a key study endpoint. It would be as nonsensical to consider such studies as being evaluations of conventional imaging as it is to consider many of the cited studies as being trials of PET.
In a more general context, if the authors used their methodology to ascertain the utility of a vast array of investigations or therapeutics such as chest radiography in patients with shortness of breath, defibrillation in cardiac arrest, or use of antibiotics in sepsis, the findings would similarly suggest a lack of clinical utility, since randomized trials are lacking for these medical procedures. As such, their study methodology lacks clinical relevance or perspective, and the result disregards the large body of high-quality scientific studies that demonstrate the high clinical impact of PET/CT.
The authors’ reference to the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach (7) constitutes further evidence of their misrepresentation of the principles of EBM. The GRADE working group publication provides a framework for thinking about diagnostic tests in terms of their impact on outcomes that are important to patients. The working group explicitly recognizes the multidimensional nature of evidence and provides guidelines for using studies of test accuracy to make inferences about the likely impact on patient-important outcomes. The guideline is clear that diagnostic accuracy can be used as a surrogate outcome for benefits and harms to patients and that observational studies can provide a valid basis for direct assessment of the patient benefits of diagnostic tests.
As demonstrated to be a disconcerting feature of many prior health technology assessment publications (2), the authors’ apparent misunderstanding of the values and principles of EBM allows them to accept as representative an analysis of just 0.02% of the PET literature (12 suitable RCTs of 60,174 articles) and to conclude from this limited sample that “it seems to be too early to draw general conclusions on the clinical benefit of this technology.” Even more remarkable is that the results of these 12 studies were not even analyzed for clinical validity but merely characterized by trial design and undefined quality measures. A recent publication from members of the GRADE working group (8) points to the potential for bias inherent in the analysis of Scheibler et al. The GRADE participants state, “Tests can be compared by evaluating the downstream consequences of testing on patient outcomes, either directly in a randomised controlled trial or by decision analysis models that integrate multiple sources of evidence.” These EBM experts describe an approach that “supports a full interpretation of empirical results by enabling trialists to distinguish between true ineffectiveness, poor protocol implementation, and methodologic flaws in the study design.” With reference to the published RCT of Viney et al. (9) in non–small cell lung cancer, they correctly identified that “the failure of PET to reduce the rate of thoracotomies in patients with non–small cell lung cancer was shown to lie with an ill conceived treatment strategy, rather than with efficacy of the test.” Nevertheless, Scheibler et al. cite the Viney RCT as a “negative” result for PET with a “low risk” of bias despite these obvious quality limitations in the conduct of the RCT and the marked spectrum bias that the trial cohort exhibited. This speaks loudly of a lack of clinical perspective. Similarly, the inclusion—from at least 60,000 papers on PET—of the trial by Plewnia et al. (10) involving 6 patients who were randomized to an intervention or sham procedure to treat tinnitus guided by 15O-water PET must raise questions about the validity of the primary data underpinning their conclusions.
Beyond the misguided rationale and scanty assessment of the available literature, there are significant internal inconsistencies in the paper that are of concern. For example, the abstract’s conclusion that a relatively high number of ongoing RCTs of PET in several oncologic fields are expected to produce robust results over the next few years is vastly different in meaning from the statement in the body text that “it is difficult to determine whether an interaction is going to be calculated between the PET result and the effect of therapy.” When the methodologic quality of these pending RCTs cannot be known before publication, and Scheibler et al. grade 50% of the already published RCTs as having a “high” risk of bias, their abstract is, at best, disingenuous and, at worst, misleading. The authors also fail to explain how studies completed in 2006 and 2008, but as yet unpublished, are likely to contribute knowledge about the patient benefits of PET in the future.
The rigor with which they have assessed even the cited trials must also be questioned. As well as minor factual errors such as indicating that trial NCT00313560 in the ClinicalTrials.gov registry was conducted in Australia when it was actually done in the United States, the authors failed to identify NCT00882609, a large international RCT comparing 18F-sodium fluoride PET/CT with conventional bone scanning for detecting skeletal metastases.
Of greater concern, a substantial number of unpublished RCTs listed by Scheibler et al. cannot provide robust information about the independent contribution of PET to patient outcomes. For example, in NCT00367341, the outcome of escitalopram versus cognitive behavioral therapy is being examined. However, the randomization is not enriched by PET, because both the control and the treatment arms are undergoing 18F-FDG PET, and the study additionally has a crossover design. Therefore, the investigators quite appropriately have not included PET in either the primary or secondary objectives of the study, because the relationship between PET and the patient outcomes being measured could not be determined from the trial design. Many other studies listed by Scheibler et al. in Table 5 also use PET in both the control and the experimental arms. Although these trials may provide useful evidence of the merits of PET as a biomarker in particular clinical settings, any evidence of the utility of PET must be considered observational in nature. Although it may be possible to conclude that the combination of PET as a biomarker and a particular intervention strategy does not improve patient outcomes (with due caution, as outlined by the GRADE methodologists above), negative trials of this design should not be used to justify conclusions that PET does not provide patient benefits.
In our view, it is long past time that clinical researchers and methodologists heeded the wisdom of Black, who wrote in 1996 that “the false conflict between those who advocate randomized trials in all situations and those who believe observational data provide sufficient evidence needs to be replaced with a mutual recognition of the complementary roles of the two approaches. Researchers should be united in their quest for scientific rigour in evaluation, regardless of the method used” (11).
In this context, we believe Scheibler et al. have presented an analysis that is so lacking in scientific rigor that it can be fairly judged to have provided a prejudiced appraisal of the evidence pertaining to the patient benefits of PET and the potential benefits of pending and future RCTs.
Footnotes
Published online Oct. 3, 2012.
© 2012 by the Society of Nuclear Medicine and Molecular Imaging, Inc.