In this issue of The Journal of Nuclear Medicine, Cottereau et al. (1) retrospectively evaluate data compiled from a cohort of 106 peripheral T cell lymphoma patients, 50% of whom were previously enrolled in Lymphoma Study Association (LYSA) studies coordinated by 5 French and Belgian centers between 2006 and 2014. The objective was to determine the prognostic value of baseline total metabolic tumor volume (TMTV) calculated with adaptive thresholding methods as compared with TMTV measured with a fixed threshold. The methods used to calculate TMTV included the Daisne method, based on the tumor-to-background ratios to segment tumor volumes, and the Nestle method, which compares the intensity of the tumor to that of the background. Despite substantial differences in cutoff values across different TMTV computing methods (±31%), the authors reported excellent intraclass correlation coefficients (from 0.97 to 0.98) for discriminating low versus high TMTV.
See page 276
Tumor burden has long been considered an important prognostic marker in several lymphoma subtypes; hence, its surrogates, such as bulky lesions, stage III–IV disease, or extranodal disease, have been incorporated in several prognostic models including the IPI (international prognostic index), IPS (international prognostic score for Hodgkin lymphoma [HL]), FLIPI 1 and 2 (prognostic score for follicular lymphoma), MIPI (prognostic score for mantle cell lymphoma), PIT (prognostic index for peripheral T cell lymphoma), and AIP (prognostic model for angioimmunoblastic lymphoma). These prognostic models, however, are not considered sufficient to accurately stratify disease risk categories across patient populations. With the advent of advanced imaging techniques such as 18F-FDG PET/CT and the availability of software offering the sophisticated computer algorithms necessary for accurate tumor segmentation to calculate the sum of voxels of the tumor bulk, it has become possible to quantify the functional activity and the total tumor burden.
Earlier 18F-FDG PET/CT studies found that a high metabolic tumor volume (MTV) was independently associated with progression-free survival (PFS) and overall survival (OS) in HL patients treated with standard ABVD (doxorubicin, bleomycin, vinblastine, and dacarbazine), with or without involved-field radiotherapy (2,3), suggesting that pretherapy MTV, as a measure of metabolically active whole-body tumor bulk, was a predictor of outcome when the conventionally described tumor burden did not reach significance. However, there is also contrasting evidence showing that baseline MTV could not predict survival when the IPS did, whereas the percentage change (Δ) in both MTV and SUVmax at interim PET was associated with PFS and OS (4). It is difficult to generalize from these divergent results because they derive from retrospective analyses that lacked a statistical design to determine an adequate sample size and that carry an inherent risk of bias from population selection, treatment protocols, and segmentation methodologies, the last of which yield varying MTV cutoffs. These limitations raise significant concerns about the internal validity of the published results.
In diffuse large B-cell lymphoma (DLBCL), an aggressive B-cell lymphoma, multiple retrospective studies have investigated pretreatment PET-derived volumetrics as a potential predictor of survival in patients undergoing R-CHOP therapy (5–9). A systematic review of 7 retrospective DLBCL studies (n = 703) suggested that both SUVmax and MTV were significant prognostic factors for PFS (P = 0.038 and P < 0.001, respectively) (5). For OS, only high MTV was a strong predictor of poor prognosis (P < 0.001). When Ann Arbor staging did not predict survival, higher MTV was associated with a significantly inferior event-free survival or PFS compared with the lower-MTV group, independent of the IPI (4,5). Similar to the HL data, combining baseline MTV with the findings of interim PET performed after 2 cycles of chemotherapy improved the predictive value (8). There are, however, contradictory reports (7–9), one of which showed that the baseline SUVmax was a better predictor of event-free survival (P = 0.0002) than MTV and total lesion glycolysis (TLG) (7), and only the IPI score 3 was significantly associated with poor outcome. In the other study, the National Comprehensive Cancer Network IPI was the only significant predictor of PFS (P = 0.024), whereas both the National Comprehensive Cancer Network IPI and MTV were significant predictors of OS (P = 0.039 and 0.043, respectively) (9). More recently, in a prospective cohort of 103 primary mediastinal (thymic) large B-cell lymphoma patients who were enrolled in the International Extranodal Lymphoma Study Group trial IELSG-26 and received combination chemoimmunotherapy, Ceriani et al. showed that only TLG retained statistical significance for both OS (P = 0.001) and PFS (P < 0.001) (11).
These summarized studies used different segmentation techniques, ranging from fixed-threshold methods based on absolute SUVmax (8) to percentage thresholding using 25% (11), 37% (9), 40% (10), 41% (1), or 42% of the SUVmax (7). This methodologic variability resulted in widely disparate cutoff values ranging from 11 to 30 for SUVmax, from 130 to 550 mL for MTV, and from 415 to 2,955 for TLG in the prediction of survival in various lymphoma subtypes. Moreover, these studies did not uniformly compare the prognostic values of SUVmax and conventional prognostic factors with MTV or TLG in a systematic fashion. Overall, the variability in methodology, the lack of demonstrated superiority over the conventional prognosticators, and the use of MTV computing methods in noncontrolled heterogeneous populations have generated skepticism about the internal and external validity of these quantitative parameters as independent prognostic markers. In general, gradient-based methods that factor in the background activity are considered to segment tumor volume more accurately than fixed thresholding methods (12). In addition, the importance of harmonization and cross-calibration across scanners in multicenter studies should be stressed. Another caveat is that all quantitative imaging surrogates should be prospectively validated before being considered for any preclinical or clinical application. Consequently, although there is a suggestion that PET-derived quantitative volumetrics may better prognosticate disease, one should realize that the previously published studies were not optimally designed to discriminate between risk groups to individualize DLBCL treatment.
The shortcomings of the study by Cottereau et al. largely follow the deficiencies of the prior studies published on this topic: a retrospective design, no statistical sample size analysis to attain the objective with a meaningful margin, patients treated with different regimens, the use of different generations of PET cameras, and the lack of cross-calibration of the cameras across centers, all of which might have affected the calculation of SUV and TMTV. Last, propagation of a systematic error cannot be ruled out with the results of this study, again bringing the focus to the internal validity of the results.
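To make the percentage-thresholding approaches cited above more concrete, the following is a minimal sketch, not the procedure used in any of the cited studies. It assumes a lesion is given as a flat list of voxel SUVs with a known voxel volume; the function names, the voxel values, and the 0.5 mL voxel size are all hypothetical. It uses the standard definition of TLG as MTV multiplied by the mean SUV of the segmented voxels, and shows how moving the threshold from 25% to 41% of SUVmax changes both MTV and TLG for the same lesion.

```python
from typing import List, Tuple


def segment_fixed_percent(suv_voxels: List[float], fraction: float) -> List[float]:
    """Return the voxels whose SUV is at least `fraction` of the lesion's SUVmax."""
    threshold = fraction * max(suv_voxels)
    return [suv for suv in suv_voxels if suv >= threshold]


def mtv_and_tlg(suv_voxels: List[float], fraction: float,
                voxel_volume_ml: float) -> Tuple[float, float]:
    """Compute MTV (mL) and TLG (MTV x SUVmean) for a single lesion."""
    kept = segment_fixed_percent(suv_voxels, fraction)
    mtv_ml = len(kept) * voxel_volume_ml
    suv_mean = sum(kept) / len(kept)
    return mtv_ml, mtv_ml * suv_mean


# Hypothetical lesion: illustrative voxel SUVs, not patient data.
lesion = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
voxel_ml = 0.5  # hypothetical voxel volume in mL

for fraction in (0.25, 0.41):
    mtv, tlg = mtv_and_tlg(lesion, fraction, voxel_ml)
    print(f"threshold {fraction:.0%} of SUVmax: MTV = {mtv} mL, TLG = {tlg}")
```

In this toy lesion the 25% threshold retains 6 voxels (MTV 3.0 mL, TLG 18.0), whereas the 41% threshold retains 4 voxels (MTV 2.0 mL, TLG 14.5). A total MTV would be the sum of such per-lesion volumes, so threshold-dependent differences accumulate across lesions.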
Briefly, some preliminary conclusions can be derived from the published literature. First, across the entire series of reported cases, baseline TMTV or TLG was suggested to be a relevant prognostic factor. Second, within a given lymphoma subtype such as HL or DLBCL, significantly varying cutoff values have been reported. Third, the lack of technical information on methods to standardize the quality of the PET images, as well as the absence of stringent methodologic procedures to ensure the reproducibility of the results, potentially undermines the robustness of the prognostic information. Essentially, a shared unit of optimal measure, a platinum-standard metric providing quantitatively accurate and comparable results, still remains to be developed (5). The reported results for TMTV computing and the variability of MTV cutoff values to predict treatment outcome point toward a prognostic role of MTV as a continuous rather than a dichotomous variable. The same conclusions can be drawn from the current data presented by Cottereau et al.: despite a variability in cutoff values of more than 30%, all the models showed an equivalent predictive value with an exceptionally high concordance rate across the different methods. However, although prognostically relevant, the results generated by this retrospective analysis remain of limited utility in a clinical setting. Hence, the predictive superiority of MTV or TLG over SUVmax for survival is yet to be proven in large-scale, prospective, multicenter, well-designed studies. Only a validated cutoff value would be useful in clinical practice to modulate the intensity of treatment for patients showing significantly different risks of treatment failure.
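The observation that cutoffs can differ by about 30% between methods while the low-versus-high classification remains concordant can be illustrated with a toy calculation. This is a sketch under stated assumptions: the two lists of TMTV values are invented for illustration, "method B" is assumed to yield systematically larger volumes than "method A" while preserving patient ranking, and the median is used as a stand-in cutoff, which is not necessarily how the cutoffs in the cited studies were derived.

```python
from statistics import median
from typing import List


def classify_high(volumes_ml: List[float], cutoff_ml: float) -> List[bool]:
    """Label each patient as high-TMTV (True) or low-TMTV (False)."""
    return [volume > cutoff_ml for volume in volumes_ml]


def agreement(labels_a: List[bool], labels_b: List[bool]) -> float:
    """Fraction of patients assigned the same risk group by both methods."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical TMTV values (mL) for the same 8 patients under two methods.
tmtv_method_a = [50, 120, 200, 260, 310, 450, 600, 900]
tmtv_method_b = [70, 150, 270, 330, 420, 580, 800, 1150]

# Assumption: a method-specific median cutoff stands in for a derived cutoff.
cutoff_a = median(tmtv_method_a)
cutoff_b = median(tmtv_method_b)
relative_difference = (cutoff_b - cutoff_a) / cutoff_a

concordance = agreement(classify_high(tmtv_method_a, cutoff_a),
                        classify_high(tmtv_method_b, cutoff_b))

print(f"cutoff A = {cutoff_a} mL, cutoff B = {cutoff_b} mL")
print(f"relative cutoff difference = {relative_difference:.1%}")
print(f"classification agreement = {concordance:.0%}")
```

With these invented numbers the cutoffs are 285 mL and 375 mL, yet every patient falls in the same risk group under both methods, because each method is judged against its own cutoff and the patient ranking is unchanged. The same reasoning suggests why a cutoff derived with one segmentation method should not be transferred to volumes computed with another.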
Footnotes
- Received for publication October 18, 2016.
- Accepted for publication October 24, 2016.
- Published online Nov. 10, 2016.
- © 2017 by the Society of Nuclear Medicine and Molecular Imaging.