Abstract
Randomized controlled trials (RCTs) add important information to diagnostic accuracy studies in the evaluation of PET and PET/CT. We evaluated how many RCTs on PET existed, which clinical topics they addressed, and what their design and quality were. Methods: We searched MEDLINE, EMBASE, and the Cochrane Central Register of Controlled Trials (Clinical Trials) up to August 2010. We also searched in ClinicalTrials.gov and the International Clinical Trials Registry Platform for ongoing RCTs up to March 2011. Titles and abstracts and full texts were screened independently by 2 reviewers. Study characteristics were extracted with standard extraction sheets for ongoing and published RCTs, and risk of bias was assessed for published ones. Results: We identified 54 RCTs, 12 of which were published. The main topics in published studies were non–small cell lung cancer and colorectal cancer; only 3 were conducted in nononcologic fields (this trend was similar in ongoing studies, in which the most common topic was Hodgkin disease). The main indications in the oncologic PET studies were staging in published studies and restaging (mostly including an early assessment of treatment response) in ongoing ones. All except 1 of the published studies applied a marker-based strategy design, whereas about 43% (18/42) of ongoing studies use a more efficient design (Enrichment Design or Marker by Treatment Interaction Design). Conclusion: A relatively high number of ongoing RCTs of PET in several oncologic fields are expected to produce robust results over the next few years. For nononcologic topics, further high-quality studies are still needed to ascertain the benefit of this technique for patients. As funding is usually difficult in nondrug topics, alternative concepts of funding, which should also involve the manufacturers of diagnostic devices, but also more efficient study designs, should be applied to bridge the evidence gap on PET in the near future.
Positron emission tomography (PET) or PET/CT is a rapidly evolving technique that enables the imaging of metabolically active tissue, such as many types of cancer (1). PET is widely applied because of its expected capacity to detect, describe, and monitor various malignant and benign diseases. Between 2004 and 2008, the annual rates of PET examinations increased by 18% in the United States (2). In most European countries, the annual rate of PET examinations is increasing at a similar pace and ranges between 1,000 and 2,000 per 1 million inhabitants (3).
According to the principles of evidence-based medicine, evidence from RCTs measuring patient-relevant outcomes (i.e., mortality, morbidity, and quality of life) is required for new diagnostic tests or markers (e.g., for those with higher sensitivity than existing ones) to draw valid conclusions as to their benefit (4–6). Studies investigating diagnostic test accuracy alone are unable to prove that patients with the disease of interest who were additionally identified with the new test actually benefit from the detection of the disease (7,8). The same applies to test-negative patients additionally identified: here too, it needs to be demonstrated that a reduction in treatment is actually accompanied by an improvement in patient-relevant outcomes. The Grading of Recommendations Assessment, Development and Evaluation working group, therefore, regards test accuracy only as a surrogate for such outcomes (8). Thus, the evaluation of a diagnostic intervention is inevitably linked to the evaluation of a therapeutic intervention, and a benefit will be achieved only if both are effective (9,10).
In the past, most clinical studies on PET have focused on diagnostic accuracy or changes in management, without bridging the gap to patient-relevant outcomes. There has been some debate in the field of nuclear medicine on whether randomized designs are necessary and possible (11,12). It has to be considered that 18F-FDG PET and PET/CT entered clinical practice at the same time as evidence-based medicine and have thus been subjected to closer scrutiny than older technologies. Meanwhile, more RCTs on PET measuring patient-relevant outcomes have been published (13–15), and clinical guidelines and funding agencies call for evidence from RCTs to make positive recommendations or reimbursement decisions (1,5,16). These developments indicate that the future use of PET will heavily depend on the availability of evidence from RCTs. Therefore, it seems important to overcome reservations concerning the conduct of such studies and to improve their design.
The aim of the present review was to systematically identify RCTs on PET measuring patient-relevant outcomes in any medical indication, to outline both the main fields of research and any gaps, and to summarize features of study design and quality. A further aim was to assess whether it is feasible to conduct RCTs to demonstrate a patient-relevant benefit of PET.
MATERIALS AND METHODS
Study Inclusion Criteria
We included both published and unpublished studies comparing PET or PET/CT with standard diagnostic technologies or clinical examinations in patients with any medical condition. Eligible studies were those that used a randomized controlled design. We applied no restrictions on language or time of publication or on the type of PET tracer used.
Randomized Designs for Diagnostic Tests or Markers
Three major types of randomized designs for proving the benefit of diagnostic tests and markers are discussed in the methodologic literature (6,17,18). In their seminal paper, Sargent et al. (17) distinguished 3 designs: marker-based strategy, enrichment, and marker-by-treatment interaction.
Marker-Based Strategy Design.
In the marker-based strategy design, patients are randomized to a group applying the new marker (or diagnostic technology) and a group not applying it (i.e., relying on the old diagnostic strategy for selecting treatment). Similar to drug studies, the new technology should result in better patient-relevant outcomes than the old one.
Enrichment Design.
In the enrichment design, the marker (or diagnostic technology) is used to enrich the sample before randomization. For instance, in the ongoing PETAL (Positron Emission Tomography Guided Therapy of Aggressive Non-Hodgkin's Lymphomas) study (19), only those patients with aggressive non-Hodgkin lymphoma who have a positive PET scan after 2 cycles of the standard therapy R-CHOP (rituximab–cyclophosphamide, doxorubicin, vincristine, and prednisone) are being randomized to receive either a further 6 cycles of R-CHOP or 6 blocks of the more aggressive B-ALL protocol (rituximab, methotrexate, ifosfamide, etoposide, cytarabine, vincristine, cyclophosphamide, doxorubicin, vindesine, and dexamethasone) (19). If studies using this design find a difference in patient-relevant outcomes, a PET-guided therapy plan will have proven to be beneficial. The same technique can be applied to identify test-negative patients in the sample who can then be randomized into 2 different treatment groups (e.g., one with a deescalating treatment strategy, “derichment” design) (20). An enrichment design requires strong arguments against the existence of a marker-by-treatment interaction.
Marker-by-Treatment-Interaction Design.
The marker-by-treatment-interaction design is similar to an RCT on a new drug or any kind of intervention. Patients are randomized to receive either the new drug or the conventional treatment (or a placebo). The diagnostic test is performed before the randomization and should ideally be kept masked. This test is then used to determine whether treatment is particularly effective, or not effective at all, for given (ideally prespecified) subgroups (21). To prove the benefit of the diagnostic test, a qualitative or a strong quantitative interaction between the result of the test (or marker) and the effect of the therapy should be demonstrated in the study (Fig. 1).
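The interaction required in this design can be made concrete with a small calculation. The following is a minimal sketch and is not taken from any of the reviewed studies: it assumes a continuous outcome summarized per subgroup and arm by mean, standard deviation, and sample size, and it tests the difference between the treatment effects in PET-positive and PET-negative patients with a normal approximation. The names `Arm` and `interaction_test` and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class Arm:
    """Outcome summary for one treatment arm within one PET subgroup."""
    mean: float
    sd: float
    n: int


def interaction_test(pos_new, pos_std, neg_new, neg_std):
    """Compare the treatment effect in PET-positive and PET-negative patients.

    Returns the two subgroup effects, their difference (the interaction),
    and a two-sided p-value from a normal approximation.
    """
    effect_pos = pos_new.mean - pos_std.mean
    effect_neg = neg_new.mean - neg_std.mean
    interaction = effect_pos - effect_neg
    # Standard error of a contrast of four independent means.
    arms = (pos_new, pos_std, neg_new, neg_std)
    se = sqrt(sum(arm.sd ** 2 / arm.n for arm in arms))
    z = interaction / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return effect_pos, effect_neg, interaction, p_value


# Hypothetical data: the new treatment helps only PET-positive patients.
effect_pos, effect_neg, interaction, p_value = interaction_test(
    pos_new=Arm(mean=12.0, sd=4.0, n=100),
    pos_std=Arm(mean=8.0, sd=4.0, n=100),
    neg_new=Arm(mean=8.0, sd=4.0, n=100),
    neg_std=Arm(mean=8.0, sd=4.0, n=100),
)
```

In this made-up example the effect is confined to PET-positive patients. Effects of opposite sign in the two subgroups would be the clearest case of a qualitative interaction.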
Search Strategy and Study Selection
We searched for relevant primary studies in MEDLINE (Ovid, 1966 to August 2010), PubMed (NLM, specific search for nonindexed references), EMBASE (Ovid, 1980 to August 2010), and the Cochrane Central Register of Controlled Trials (Clinical Trials–Wiley, to August 2010). The search strategy in MEDLINE is described in detail in Table 1. To identify further studies, we also searched reference lists in secondary publications (systematic reviews and health technology assessment reports) identified within the framework of a series of reports on PET prepared by our institute in the last 3 y. In addition, we searched ClinicalTrials.gov and the International Clinical Trials Registry Platform search portal for completed and ongoing RCTs up to March 2011.
Assessment of Primary Studies
Two reviewers independently screened titles and abstracts of the retrieved citations to identify potentially eligible primary and secondary publications. The full texts of these articles were obtained and evaluated independently by 2 reviewers. Primary publications were identified, and subsequently the full set of inclusion and exclusion criteria was applied to identify eligible studies. All documents retrieved from nonbibliographic sources were also screened for eligibility or relevant information on studies. Disagreements were resolved by a third reviewer.
Data Extraction
From each included study, information was extracted on study characteristics, including country, period of recruitment, aim, length of follow-up, sample size, diseases, and indication; characteristics of the study participants; and characteristics of the test and control interventions and indication for the use of PET within the treatment plan. From completed and published studies, information on risk-of-bias items was also extracted. Details of the studies were extracted using standardized tables developed and routinely used by our institute.
Information and data from publications were supplemented by publicly available reports from study registries. Consistency between the data reported in publications and those reported in additional sources such as study registries was assessed.
The individual steps of the data extraction and risk-of-bias assessment procedures were always conducted by one person and checked by another; disagreements were resolved by consensus. The risk of bias in individual studies was assessed by determining the adequacy of the following quality criteria: randomization and allocation concealment, masking of patients, investigators and outcome assessors, handling and reporting of study discontinuations, and application of the intention-to-treat (ITT) principle. Studies for which it was likely that correcting for methodologic problems would have altered the main results and conclusions were classified as having a high risk of bias.
RESULTS
The search in bibliographic databases yielded a total of 3,118 references (after exclusion of duplicates), from which 12 eligible completed and published studies (21 publications) were identified (Fig. 2).
In addition, of 200 unique entries in study registries, 42 eligible but ongoing studies and 2 completed studies already detected in the bibliographic search were identified. In total, 54 ongoing or completed RCTs were found.
Description of Completed and Published Studies
Tables 2 and 3 show the main characteristics of the 12 studies considered. The studies included a total of 1,242 patients (range, 6–232), with ages ranging from 30 to 67 y. In the 10 studies providing data on sex, 32.4% of the participants were women.
Most of the included studies investigated PET in non–small cell lung cancer (NSCLC; 5/12 studies), followed by colorectal cancer and coronary heart disease (2 each). Nine studies were conducted in oncology (7 of these in staging, 1 in restaging, and 1 in diagnosing recurrent disease), 2 in coronary heart disease, and 1 in tinnitus. Four studies (25%) applied an integrated PET/CT device. Four were conducted in The Netherlands and 2 in Canada. The other countries (Denmark, Italy, Australia, France, Taiwan, and Germany) accounted for 1 study each. Recruitment for the earliest study started in 1998; recruitment periods ranged from 1 to 5 y. One study was closed earlier because of insufficient recruitment and a change in diagnostic technology. Eleven studies applied a marker-based strategy design (one of which was a crossover design) and one an enrichment design. No study using a marker-by-treatment-interaction design was identified.
Two of the 12 published studies had been registered (13,14). Eleven studies were publicly funded, and 1 study was partly financed by a grant from a company producing PET tracers.
According to the authors’ conclusions, the results of 4 of the 12 studies were positive, 7 studies were negative, and 1 showed mixed findings (Table 3).
Risk of Bias
Half of the studies (6/12) showed a low risk of bias (Table 4). Information on allocation concealment was provided in 6 studies. Masking was incomplete in 3 studies. In 10 studies, a sample size calculation was published, and in 11 an ITT analysis was performed. All studies used predefined patient-relevant outcomes. However, in 1 study the primary outcome in the publication had been changed, compared with the primary outcome defined in the registry (13).
Description of Ongoing or Unpublished Studies
Our search in study registries identified 42 ongoing RCTs (Table 5; we also identified the 2 registered published studies, which are not listed here). The most common topic is Hodgkin disease (11 studies), followed by NSCLC (7), colorectal cancer (5), cervical cancer (3), and head and neck cancer (3). Only 2 studies are investigating nononcology topics (dementia and major depressive disorder). The restaging of cancer is the most frequent indication (19 studies, 11 of which are investigating early assessment of treatment response), followed by staging (7) and planning of radiotherapy (7). The estimated sample sizes range from 30 to 1,600 (median, 300). More than half of the studies apply a marker-based strategy design (24/42), 9 a marker-by-treatment-interaction design, and 9 an enrichment design. The largest number of ongoing studies are being conducted in the United States (8), followed by the United Kingdom and France (6 each), Germany (5), and Canada (4). According to the registry data, 6 of these studies should have been completed in 2011, 5 will be completed in 2012, and 6 in 2013.
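The design shares for ongoing studies follow directly from the counts reported in this paragraph. A short sketch that recomputes them, including the 18/42 share of enrichment and interaction designs quoted in the abstract; the dictionary keys are illustrative labels only.

```python
# Counts of ongoing RCTs by design, as reported in this section.
ongoing_designs = {
    "marker-based strategy": 24,
    "marker-by-treatment interaction": 9,
    "enrichment": 9,
}

total = sum(ongoing_designs.values())

# Share of each design among all ongoing studies, in percent.
shares = {
    design: round(100 * count / total)
    for design, count in ongoing_designs.items()
}

# Designs described in the abstract as more efficient.
more_efficient = (
    ongoing_designs["marker-by-treatment interaction"]
    + ongoing_designs["enrichment"]
)
more_efficient_share = round(100 * more_efficient / total)
```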
DISCUSSION
Our systematic review of RCTs on PET and PET/CT identified 12 published and 42 ongoing studies, indicating an increasing number of RCTs in this field. The main fields of research are NSCLC, Hodgkin disease, and colorectal cancer, with only a few studies conducted in nononcologic fields. Half of the published studies showed a low risk of bias.
The aim of this review was to systematically identify and descriptively evaluate the study design and other characteristics of published and ongoing RCTs on PET. It was not the aim to synthesize the results of these studies. For most of the diseases (and indications within diseases), only 1 published RCT was identified, often with a high risk of bias. At this stage, it seems to be too early to draw general conclusions on the clinical benefit of this technology. It is hoped that the current intense research activities identified in our searches for ongoing trials will continue in the next few years. If at least 2 high-quality RCTs with adequate numbers of patients were available in an indication, it would allow robust recommendations on the use of PET in the clinical setting investigated. This may soon be achieved in some indications. For example, 5 RCTs on the staging of NSCLC have already been published, and our group is currently conducting a metaanalysis of these studies.
This review has certain limitations. As we searched only bibliographic databases and trial registries, additional RCTs may have been missed. Further studies might have been identified by systematically searching conference proceedings or by additional hand searches. However, we previously searched conference proceedings within the framework of 7 ongoing reports on PET in different indications and were unable to identify any additional studies in these sources.
To manage the vast amount of literature in this field (60,000 hits without the use of a filter), we applied a filter for RCTs in our searches. Our filter was based on the one developed by Wong et al. (22), which was expanded by a search for the word randomized (or randomised) in the title or abstract. The filter by Wong reached a sensitivity of 93%. Our filter should be even more sensitive. Additionally, searches in ClinicalTrials.gov should have identified more relevant papers. We cannot exclude that relevant RCTs might have been missed with this strategy. However, in our institute’s ongoing PET reports we have used search strategies without this filter for specific diseases; so far, we have not found an RCT that was missed by applying the filter.
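The expectation that the expanded filter is at least as sensitive as the original follows from how the two parts are combined: a record is retained if either the base filter or the added free-text term matches, so every record retrieved by the base filter is still retrieved. A minimal sketch with made-up records illustrates this. The matching functions are simplified stand-ins and do not reproduce the filter of Wong et al. (22); the record fields are hypothetical.

```python
import re


def base_filter(record):
    # Hypothetical stand-in for a published RCT filter (indexing-based).
    return "randomized controlled trial" in record["publication_types"]


def text_word(record):
    # Free-text search for "randomized" or "randomised" in title or abstract.
    text = f'{record["title"]} {record["abstract"]}'
    return re.search(r"\brandomi[sz]ed\b", text, re.IGNORECASE) is not None


def expanded_filter(record):
    # Logical OR: a record passes if either component matches.
    return base_filter(record) or text_word(record)


def sensitivity(filter_fn, relevant_records):
    # Proportion of truly relevant records that the filter retrieves.
    hits = sum(1 for record in relevant_records if filter_fn(record))
    return hits / len(relevant_records)


# Four made-up records, all of which are relevant RCTs.
relevant = [
    {"title": "PET staging in lung cancer",
     "abstract": "Patients were allocated by chance.",
     "publication_types": ["randomized controlled trial"]},
    {"title": "A randomised trial of PET",
     "abstract": "",
     "publication_types": []},
    {"title": "Randomized study of PET/CT",
     "abstract": "",
     "publication_types": ["randomized controlled trial"]},
    {"title": "PET-guided therapy",
     "abstract": "Allocation used sealed envelopes.",
     "publication_types": []},
]

base_sensitivity = sensitivity(base_filter, relevant)
expanded_sensitivity = sensitivity(expanded_filter, relevant)
```

The last record shows the remaining risk: a relevant trial that is neither indexed as an RCT nor described with the search word is missed by both filters.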
Furthermore, the results for ongoing and unpublished studies with a marker-by-treatment-interaction design are somewhat hypothetical. On the basis of registry data, it is difficult to determine whether an interaction is going to be calculated between the PET result and the effect of therapy. Some of the identified studies will probably not calculate such interactions. However, it is also possible that studies applying PET as a predictive marker were not identified, because its use was not documented in the registries.
The results of this review show that the calls in the methodologic literature for RCTs to evaluate the benefit of diagnostic technologies such as PET are not too ambitious and that this type of design is feasible (6,11,16,17). Considering the number of published RCTs on PET, but also the fact that in future approximately 5–7 such RCTs will be published annually, this design could add important information on the patient-relevant benefit of PET to the knowledge on diagnostic and prognostic accuracy.
Depending on the clinical question, disease characteristics (e.g., incidence), and practical and ethical considerations, different RCT designs can be applied (6,17).
Most of the published trials applied a marker-based strategy design and only one an enrichment design. The latter is more efficient, because fewer patients are needed to show a significant difference between the intervention (PET-guided therapy) and the control group (standard therapy). Interestingly, the marker-by-treatment-interaction design has not been identified so far in the published literature on PET. At the moment, we can only speculate on the reasons. For example, this approach might be unfamiliar to researchers in this field, collaboration between different medical professions (e.g., oncologists and radiologists) might be difficult, or industry might not be interested in evaluating tests that might considerably reduce the number of eligible patients. However, this design seems to be increasingly applied in ongoing trials. It offers a highly valid but at the same time pragmatic solution to the evaluation of diagnostic tests. Because it is at least as valid as the 2 other randomized designs for evaluating the benefit of PET, it might offer an efficient alternative. For example, this design, which has been applied successfully in studies on genetic markers, can be applied as a piggyback strategy within any RCT for new drugs or other treatment interventions. The additional effort required for such a study involves the conduct of the PET examinations, the interpretation of the images, and the statistical analyses. In this context, it should be noted that health care professionals involved in the medical treatment of patients undergoing PET should ideally be masked to the test results. A limitation of the marker-by-treatment-interaction design is that it can be applied only to certain clinical questions. If, for instance, PET is used for radiation planning, the result of the test is necessary for the intervention: in this case, the radiation target region found with PET is different from that found with, for example, CT. Because the PET result is part of the intervention, no interaction can be calculated.
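The efficiency argument for the enrichment design made earlier in this section can be illustrated with a standard normal-approximation sample size formula for comparing two means. The sketch below rests on simplifying assumptions that are not taken from the reviewed studies: a continuous outcome with a common standard deviation `sd`, a benefit `delta` that occurs only in patients whose treatment changes because of the PET result, and a fraction `p_changed` of such patients, which is also taken as the fraction of screened patients eligible for randomization in the enrichment design. Under these assumptions the average effect in a marker-based strategy design is diluted to `p_changed * delta`. All names and numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided comparison of two means."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sd / delta) ** 2)


def compare_designs(delta, sd, p_changed, alpha=0.05, power=0.80):
    """Contrast randomized sample sizes under the stated assumptions."""
    # Strategy design: the effect is diluted by patients whose
    # treatment is the same in both arms.
    strategy = n_per_arm(p_changed * delta, sd, alpha, power)
    # Enrichment design: only patients with the relevant PET result
    # are randomized, so the full effect applies.
    enrichment = n_per_arm(delta, sd, alpha, power)
    # Patients who must undergo PET to find enough eligible ones.
    screened = ceil(2 * enrichment / p_changed)
    return {
        "strategy_per_arm": strategy,
        "enrichment_per_arm": enrichment,
        "enrichment_screened_total": screened,
    }


# Hypothetical scenario: half a standard deviation of benefit,
# with PET changing treatment in one quarter of patients.
example = compare_designs(delta=0.5, sd=1.0, p_changed=0.25)
```

Under these assumptions the number of randomized patients grows with the inverse square of `p_changed` in the strategy design, whereas the enrichment design shifts part of the burden to screening. The sketch ignores the extra outcome variance that a mixed population would add, so it is a rough guide rather than a planning tool.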
Our review shows that it is important to precisely describe the clinical indication for the use of PET, because in most studies it was unclear whether PET was tested as an additional or alternative diagnostic device. In addition, because patients included in the studies were rarely described exactly, the transferability of existing study results seems to be questionable. A detailed description of the test (tracer, time of fasting, stage of disease, and so on) and the algorithm for defining cutoff points, especially if the PET results are being interpreted qualitatively (e.g., reporting of the number and qualification of radiologists involved and handling of interrater disagreement), are further important points to consider in future studies. Only a few of the published studies reported unclear or equivocal PET or PET/CT results and how they were interpreted.
Our results indicate that ongoing and unpublished studies have on average approximately 3 times higher estimated patient numbers than published studies. These larger sample sizes, which will provide greater statistical power to detect smaller effects, might be a consequence of the nonsignificant results of some of the published studies. Another explanation could be the noninferiority design of studies investigating the superiority of PET-induced changes in management and concurrent noninferiority for other patient-relevant outcomes such as mortality; this type of design usually requires a higher number of patients.
Because RCTs are not yet mandatory for the approval of nondrug interventions, funding is usually difficult. Only one of the published RCTs identified was (in part) funded by the manufacturer of the diagnostic device. Alternative concepts of funding, which should also involve the manufacturers, and more efficient study designs should be applied to bridge this evidence gap in the near future.
CONCLUSION
In addition to diagnostic and prognostic accuracy studies, RCTs on PET should be conducted to prove the benefit of this technology in terms of patient-relevant outcomes. Although 12 RCTs have already been published and about 5 will be published per year in the future, more high-quality studies are needed to ascertain the benefit of this technology for patients.
DISCLOSURE STATEMENT
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
We thank Ulrike Paschen for her contribution to data extraction and Natalie McGauran for editorial support. No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Jun. 7, 2012.
- © 2012 by the Society of Nuclear Medicine, Inc.
- Received for publication November 21, 2011.
- Accepted for publication February 27, 2012.