Clinical Investigation
1 Clinical Research Center for Blood Diseases, National Hospital Organization Nagoya Medical Center, Nagoya, Japan; and 2 Department of Radiology, National Center for Geriatrics and Gerontology, Obu City, Japan
For correspondence or reprints contact: Teruhiko Terasawa, MD, Institute for Clinical Research and Health Policy Studies, Tufts-New England Medical Center, 750 Washington St., Tufts-NEMC #63, Boston, MA 02111. E-mail: tterasawa@tufts-nemc.org
ABSTRACT
Key Words: 18F-FDG PET; lymphoma; response assessment; residual disease
INTRODUCTION
18F-FDG PET is a promising functional imaging test for patients with malignant lymphoma and other malignancies and has gained wide use during the last decade (4). Its routine use has been recommended to assess the posttherapy response of HD, especially if CT reveals a residual mass (3). Others have also recommended its use for the same purpose on the basis of its high negative predictive value for HD and high positive predictive value for NHL (5). A recent survey of physicians caring for lymphoma patients found that intended management plans were often changed on the basis of 18F-FDG PET results (6).
Recently, a systematic review explored the diagnostic accuracy of 18F-FDG PET for this purpose and assessed the quality of the included studies (7). As with studies of diagnostic tests in other medical fields, this review revealed several methodologic problems affecting both the internal and the external validity of the published studies. The authors, however, estimated summary diagnostic accuracy without considering how major methodologic differences among the original studies might affect test performance. For example, some of the studies included a mix of NHL histologies (indolent, aggressive, and highly aggressive lymphomas), and others included a mix of patients who had received first-line therapy and patients who had received salvage therapy. Diagnostic accuracy estimates from such studies have limited external validity and cannot be applied directly to specific clinical scenarios, because each histologic subtype has its own clinical profile, including treatment strategy, response, and prognosis. The review also did not consider the different units in which the primary studies presented results (e.g., individual patients vs. individual lymphoma lesions, all involved sites vs. only bulky disease, or single vs. multiple inclusions of a patient).
This study was an updated systematic review of the diagnostic accuracy of 18F-FDG PET in assessing the response of HD and aggressive NHL after first-line therapy, with special emphasis on the methodologic issues discussed above.
MATERIALS AND METHODS
Study Selection
Two of us reviewed the pertinent studies to determine eligibility. We included prospective or retrospective studies that evaluated posttherapy response assessment by 18F-FDG PET exclusively in patients with HD or aggressive NHL and that used clinical follow-up, with or without pathologic confirmation, as the reference standard. We included studies that evaluated at least 10 patients. We considered aggressive NHL to be mantle cell lymphoma; follicular center lymphoma, grade III; diffuse large B-cell lymphoma (DLBCL); primary mediastinal large B-cell lymphoma; peripheral T-cell lymphoma, unspecified; angioimmunoblastic T-cell lymphoma; angiocentric lymphoma; and anaplastic large cell lymphoma, T- and null-cell type by the Revised European–American Classification of Lymphoid Neoplasms (REAL)/World Health Organization classification, or corresponding subcategories for the International Working Formulation classification, the Kiel classification, or the Rappaport classification (8). We included only studies of patients who had completed first-line chemotherapy, radiotherapy, or combined-modality therapy and who had undergone conventional imaging tests such as CT, ultrasonography, or MRI for posttreatment restaging just before or after PET. We focused our analysis on studies that used the individual patient as the unit of analysis, irrespective of the number of relapses or the sites of relapse or residual disease, because this is the most appropriate perspective for clinical decision making. We excluded abstracts, editorials, comments, letters, review articles, and case reports, as well as studies enrolling patients with HIV-associated or posttransplant lymphoproliferative disorders.
Many studies did not meet all of the inclusion criteria but did include a relevant subset of patients. For these studies, we contacted the authors by mail or e-mail to request individual patient data or subgroup data relevant to our inclusion criteria. If there was no response within 3 wk, we sent another request; if the third attempt went unanswered, we considered the request declined.
Data Abstraction
Two independent, board-certified hematologists abstracted relevant data from English-language articles. For non–English-language articles, data were extracted by a single reviewer working with a physician who was a native speaker of the relevant language. We used an abstraction form consisting of items recommended in the Standards for Reporting of Diagnostic Accuracy (9). One nuclear medicine specialist evaluated the technical specifications and quality of the PET procedure against recommended guidelines (10). The reviewers were not blinded to the journals in which the studies had been published. On the basis of how participants were enrolled before PET, we categorized studies into 2 groups: "posttherapy evaluation," studies that included patients irrespective of restaging results on conventional imaging tests, and "residual mass evaluation," studies that included only patients with a visible residual mass on conventional imaging tests. For the posttherapy evaluation studies, we also abstracted data on the subgroup of patients who had a residual mass on imaging; if these data were unavailable in the published literature, we requested them from the authors. Discrepancies between reviewers were either clarified by the study authors or resolved by consensus.
Assessment of Study Quality and Applicability
To evaluate the quality and applicability of the included studies, we used an established quality rating system for diagnostic studies (11) and a recently proposed quality evaluation tool (12). In the established system, we examined 6 aspects of study quality: quality of the reference standard, application of the reference standard, independence of test interpretation, description of patient characteristics, cohort assembly, and sample size. We then rated each study as "a" (highest quality), "b," "c," or "d" (lowest quality) according to predefined scoring criteria. The recently proposed tool, which was designed specifically for studies of diagnostic accuracy, comprehensively assesses both methodologic quality and reporting. It consists of 14 items addressing patient spectrum, the reference standard, disease progression bias, partial and differential verification bias, test review bias, clinical review bias, incorporation bias, execution of both the index test and the reference standard, study withdrawals, and indeterminate test results.
Data Synthesis and Statistical Analysis
For each study, we constructed a 2 × 2 contingency table of true-positive, false-positive, false-negative, and true-negative counts, classifying every patient as PET-positive or PET-negative according to 18F-FDG PET and as disease-positive or disease-negative according to the reference standard. We defined a patient as disease-positive if residual disease was confirmed by biopsy or if the disease relapsed during clinical follow-up. We did not pool sensitivity and specificity separately, because that approach ignores the interdependence of these 2 test parameters. Instead, we estimated summary receiver operating characteristic (ROC) curves and elliptic 95% confidence regions for summary sensitivity and specificity by the hierarchical summary ROC method (13). Unlike the conventional linear regression model for computing summary diagnostic measures, this model accounts for variation both within and between studies. We fitted the model by maximum-likelihood estimation with the NLMIXED procedure of SAS/STAT (version 8; SAS Institute) (14). We then plotted the summary ROC curves and the confidence regions for summary sensitivity and specificity with Stata (version 8.2; Stata Corp.) (15). As global measures of the summary ROC curves, we also estimated the area under the curve and the Q* statistic, the common value of sensitivity and specificity at the point on the curve where the two are equal. We explored between-study heterogeneity by visual assessment of ROC plots for the following predetermined items: study design (prospective vs. retrospective), proportion of patients with a residual mass on conventional imaging tests, relapse rate, follow-up period, timing of the PET scan after completion of therapy, and publication year.
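To make the per-study inputs concrete, the following minimal Python sketch computes sensitivity and specificity from one study's patient-level 2 × 2 table, using the definition of disease-positive given above (biopsy-confirmed residual disease or relapse during follow-up). The class name, the 0.5 continuity correction, and the counts in the usage line are illustrative assumptions, not data from any included study. The logit-transformed rates are shown because hierarchical models work on that scale and model the two rates jointly rather than pooling them independently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """One study's patient-level 2 x 2 table (PET result vs. reference standard)."""
    tp: int  # PET-positive, disease-positive (residual disease or relapse)
    fp: int  # PET-positive, disease-negative
    fn: int  # PET-negative, disease-positive
    tn: int  # PET-negative, disease-negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def logit_rates(self, correction: float = 0.5) -> tuple[float, float]:
        """Return (logit TPR, logit FPR).

        A continuity correction (an assumption of this sketch) is added to every
        cell so that studies reporting sensitivity or specificity of exactly 1.00
        still give finite logits.
        """
        tp, fp, fn, tn = (x + correction for x in (self.tp, self.fp, self.fn, self.tn))
        tpr = tp / (tp + fn)
        fpr = fp / (fp + tn)
        return math.log(tpr / (1 - tpr)), math.log(fpr / (1 - fpr))


# Hypothetical counts, for illustration only.
study = TwoByTwo(tp=9, fp=3, fn=1, tn=27)
print(f"sensitivity = {study.sensitivity():.2f}")  # 0.90
print(f"specificity = {study.specificity():.2f}")  # 0.90
```

Several included studies reported a sensitivity or specificity of 1.00, which is why the sketch applies a correction before taking logits; a likelihood-based HSROC fit can instead model the raw counts binomially.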
We also performed post hoc subgroup analyses for the items used for the technical specifications of PET and quality assessment.
RESULTS
For HD studies, nodular sclerosis was the leading histologic subcategory, and the most widely reported first-line treatments were ABVD (doxorubicin, bleomycin, vinblastine, dacarbazine) or MOPP (nitrogen mustard, vincristine, procarbazine, prednisone) plus ABVD or ABV (doxorubicin, bleomycin, vinblastine), with or without radiotherapy (Table 2). In posttherapy evaluations, 35%–72% of patients had a residual mass on conventional imaging (Supplemental Table 2). Relapse rates were similar in posttherapy evaluations and residual mass evaluations, ranging from 4% to 55% and from 0% to 50%, respectively.
For aggressive NHL studies, DLBCL was the leading histologic subcategory, and the most widely adopted first-line therapy was CHOP (cyclophosphamide, doxorubicin, vincristine, prednisone) or comparable doxorubicin-containing regimens with or without radiotherapy (Table 2). In posttherapy evaluations, 26%–56% of patients were found to have a residual mass on conventional imaging (Supplemental Table 2). Relapse rates were 33%–60% for posttherapy evaluations and 33%–67% for residual mass evaluations.
Although reporting of imaging techniques and technologies was limited, the included studies generally followed the Society of Nuclear Medicine guidelines for performing 18F-FDG PET (Supplemental Table 3). Only a single study used PET/CT (30). Most studies adopted qualitative diagnostic criteria: foci of elevated 18F-FDG uptake not explained by physiologic uptake. Two studies (32,34) also adopted quantitative diagnostic criteria based on standardized uptake values (SUVs; 18F-FDG uptake normalized to the injected activity and body weight). Only 2 studies clearly reported that all participants underwent pretherapy PET (17,23). In general, experienced nuclear medicine physicians interpreted the results.
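For reference, the body-weight SUV conventionally used in 18F-FDG PET divides the tissue activity concentration by the injected activity per unit body weight, with the injected activity decay-corrected to scan time. The minimal sketch below assumes a tissue density of 1 g/mL (so the SUV is dimensionless) and an 18F half-life of about 109.8 min; the function name and example values are hypothetical, and the two studies that used quantitative criteria may have applied other normalizations (e.g., lean body mass) or tumor-to-background ratios.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, in minutes


def suv_bw(tissue_kbq_per_ml: float, injected_mbq: float, weight_kg: float,
           minutes_since_injection: float) -> float:
    """Body-weight standardized uptake value (assumes tissue density of 1 g/mL)."""
    # Decay-correct the injected activity to the time of the scan.
    decay = math.exp(-math.log(2) * minutes_since_injection / F18_HALF_LIFE_MIN)
    injected_kbq_at_scan = injected_mbq * 1000.0 * decay
    # kBq/mL divided by kBq/g; with 1 g/mL the ratio is dimensionless.
    return tissue_kbq_per_ml / (injected_kbq_at_scan / (weight_kg * 1000.0))


# Hypothetical values, for illustration only.
print(f"SUV = {suv_bw(12.0, injected_mbq=370.0, weight_kg=70.0, minutes_since_injection=60.0):.1f}")
```

An SUV of 1 corresponds to uptake equal to the injected activity spread uniformly through the body, which is why a uniform threshold on SUV behaves like a common positivity criterion across patients of different size.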
Sensitivity, Specificity, and Summary ROC Curves
HD studies reported widely ranging sensitivities and specificities for 18F-FDG PET. For posttherapy evaluations, sensitivity ranged from 0.50 to 1.00 and specificity ranged from 0.67 to 1.00 (Supplemental Table 2; Fig. 1). For residual mass evaluations, reported estimates had a similarly wide range: 0.43–1.00 for sensitivity and 0.67–1.00 for specificity. The summary ROC curves and confidence regions for summary sensitivity and specificity for posttherapy evaluations and residual mass evaluations were similar (Fig. 2): The area under the curve for the summary ROC curve was 0.94 for posttherapy evaluations and 0.93 for residual mass evaluations, and the Q* statistic was 0.88 for posttherapy evaluations and 0.86 for residual mass evaluations.
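In the standard HSROC formulation (Rutter and Gatsonis), each study's logit probability of a positive test is modeled as (θ_i + α_i X)·exp(−βX), with X = −1/2 for disease-negative and +1/2 for disease-positive patients; θ_i is a positivity-threshold parameter, α_i an accuracy parameter with mean Λ, and β a shape parameter. Eliminating θ gives the summary curve logit(TPR) = Λ·exp(−β/2) + exp(−β)·logit(FPR). The minimal sketch below assumes fitted values of Λ and β are already available (e.g., from an NLMIXED fit), traces that curve, integrates it numerically for the area under the curve, and solves for Q* in closed form. The function names and parameter values are hypothetical and are not the estimates behind the figures reported above.

```python
import math


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sroc_tpr(fpr: float, lam: float, beta: float) -> float:
    """Summary ROC curve implied by HSROC parameters Lambda (accuracy) and beta (shape)."""
    return expit(lam * math.exp(-beta / 2) + math.exp(-beta) * logit(fpr))


def sroc_auc(lam: float, beta: float, n: int = 10_000) -> float:
    """Area under the summary ROC curve over FPR in (0, 1), by the midpoint rule.

    Midpoints avoid FPR = 0 and FPR = 1, where logit(FPR) is infinite.
    """
    h = 1.0 / n
    return sum(sroc_tpr((k + 0.5) * h, lam, beta) for k in range(n)) * h


def q_star(lam: float, beta: float) -> float:
    """Q*: sensitivity (= specificity) where the curve crosses TPR = 1 - FPR."""
    # With logit(FPR) = -logit(TPR) = -y, the curve gives
    # y * (1 + exp(-beta)) = lam * exp(-beta / 2).
    y = lam * math.exp(-beta / 2) / (1.0 + math.exp(-beta))
    return expit(y)


# Hypothetical parameters, for illustration only.
lam, beta = 4.0, 0.0
print(f"AUC = {sroc_auc(lam, beta):.3f}, Q* = {q_star(lam, beta):.3f}")
```

With β = 0 the curve is symmetric and Q* reduces to expit(Λ/2); a nonzero β lets diagnostic accuracy change as the positivity threshold moves along the curve.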
Investigating Heterogeneity
We did not identify any clinical or 18F-FDG PET test characteristics, or any items that assessed the quality and applicability of each study, to explain the heterogeneity of sensitivity and specificity (data not shown).
Quality Assessment of Published Studies
Overall, the quality and reporting of the included studies were limited (Supplemental Table 4), suggesting that they are subject to bias and variation limiting the internal and external validities, respectively, of the test results. The detailed results of the quality assessment can be found in the Supplemental Appendix.
DISCUSSION
Many potential factors may explain the heterogeneity. Relevant test characteristics include differences in the type of PET scanner, the timing of PET after completion of therapy, the criteria for a positive test, and the clinical experience of the interpreters. Relevant patient characteristics include differences in histologic type, especially for aggressive NHL (e.g., DLBCL vs. other aggressive NHLs); therapeutic strategy (e.g., chemotherapy vs. combined-modality therapy); the presence or absence of a visible residual mass on conventional imaging tests; and relapse risk group, such as the International Prognostic Score for advanced-stage HD (36) or the International Prognostic Index for aggressive NHL (8). Among study characteristics, differences in the method of patient selection (e.g., prospective vs. retrospective), the type and application of the reference standard, and patient follow-up have been reported to affect the variability of sensitivity and specificity (9,37). Because the available data were limited, we could not identify specific factors that explain the heterogeneity; this question should be addressed in future studies. Although the reported criteria for a positive PET result appeared almost identical across the HD studies, the large crescent-shaped confidence regions imply a negative correlation between sensitivity and specificity across the included studies, suggesting that variation in the positivity threshold may partially explain the between-study heterogeneity.
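A simple descriptive check often used alongside visual inspection of ROC plots is the Spearman rank correlation between per-study sensitivity and specificity; a clearly negative coefficient is consistent with a threshold effect. This is an illustrative technique, not an analysis reported in this review. The sketch below implements Spearman's rho from scratch (average ranks for ties) so it runs without third-party libraries; the per-study values are hypothetical.

```python
def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either input is constant


# Hypothetical per-study estimates, for illustration only.
sens = [0.50, 0.67, 0.80, 0.88, 1.00]
spec = [1.00, 0.95, 0.90, 0.80, 0.67]
print(f"Spearman rho = {spearman_rho(sens, spec):.2f}")  # -1.00
```

With only a handful of small studies per subgroup, such a coefficient is imprecise, which is one reason a model that estimates the threshold and accuracy parameters jointly is preferred.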
Our study had several important limitations. Because we selected only studies for which pertinent data were available, several important investigations (some of which were included in the previous meta-analysis) may have been excluded. Also, because most data were derived from retrospective studies with poor-quality design and reporting, our conclusions are subject to the biases and variations in the original studies (37). In addition, our review included only a single study in which patients with DLBCL received rituximab in addition to CHOP (22). Because rituximab plus CHOP is a current standard therapy (38), our results may be less applicable to current clinical practice. Further, we included only a single study (30) that used PET/CT, a relatively new and promising technique that may overcome the current technical limitations of PET (39).
In the recently revised consensus recommendations, 18F-FDG PET has become an important component of posttherapy response assessment in clinical trials for HD and DLBCL (40). Although the currently available data have limitations, our systematic review probably supports the clinical relevance of these response criteria for high-risk HD: an incomplete response defined by positive PET findings would have an excellent ability to predict relapse, irrespective of whether a residual tumor mass is found on conventional imaging. For favorable-risk HD, by contrast, patients labeled as incomplete responders after first-line therapy may still have a moderate probability of long-term remission. Clinical investigators adopting the recommendations in therapeutic efficacy trials would therefore need to decide how to manage patients in this category before implementing the trials. For DLBCL and other aggressive NHLs, our results, based on limited clinical evidence, are not sufficient to support the criteria, and further research is needed to validate them.
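The contrast between high-risk and favorable-risk HD follows from the fact that predictive values, unlike sensitivity and specificity, depend on the pretest probability of relapse, which varied widely across the included studies. The minimal sketch below applies Bayes' theorem to show how the positive and negative predictive values of posttherapy PET shift with the relapse rate. The operating point of sensitivity = specificity = 0.88 is borrowed, for illustration only, from the Q* value reported for HD posttherapy evaluations; it is an assumption, not a recommended threshold, and the function name is hypothetical.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity, and pretest probability (Bayes' theorem)."""
    tp = sens * prevalence              # expected true-positive fraction
    fp = (1 - spec) * (1 - prevalence)  # expected false-positive fraction
    fn = (1 - sens) * prevalence        # expected false-negative fraction
    tn = spec * (1 - prevalence)        # expected true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative operating point (assumption): sensitivity = specificity = 0.88.
for relapse_rate in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(0.88, 0.88, relapse_rate)
    print(f"relapse rate {relapse_rate:.0%}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under these assumptions, at a 5% relapse rate the PPV is only about 0.28, so nearly three of four positive scans would be false positives, whereas at a 50% relapse rate the PPV equals the assumed 0.88.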
Our review shows that currently available data do not suffice to answer the phase 3 question of diagnostic accuracy studies: What is the diagnostic accuracy of 18F-FDG PET for posttherapy response assessment of malignant lymphoma? (41). Reliable clinical evidence is especially limited for aggressive NHL. Further investigation should include prospective diagnostic accuracy studies (phase 3) of PET or PET/CT that adopt a more rigorous research methodology. We propose that, ideally, diagnostic accuracy studies should accompany prospective clinical trials to answer efficacy questions. Data on additional therapy, such as involved-field radiotherapy or high-dose chemotherapy with autotransplantation, for posttherapy PET-positive patients are limited. Before the routine clinical implementation of a treatment strategy based on posttherapy PET findings, randomized studies should assess the impact on patients' clinical outcomes, if appropriate (phase 4) (41). Also, determination of the cost-effectiveness of treatment strategies adopting PET is necessary to allow a better understanding of the role of posttherapy PET.
CONCLUSION