Abstract
The GAINED phase 3 trial (ClinicalTrials.gov identifier: NCT01659099) evaluated a PET-driven consolidative strategy in patients with diffuse large B-cell lymphoma. In this post hoc analysis, we aimed to compare the prognostic value of the per-protocol PET interpretation criteria (Menton 2011 consensus) with the change in the SUVmax (ΔSUVmax) alone. Methods: Real-time central review of 18F-FDG PET/CT was performed in 581 patients after 2 cycles (PET2) and 4 cycles (PET4) of immunochemotherapy using the Menton 2011 criteria, combining the ΔSUVmax (cutoffs of 66% and 70% at PET2 and PET4, respectively) and the Deauville scale. In “special cases,” when the baseline SUVmax was less than 10.0 or the interim residual tumor SUVmax was greater than 5.0, the Menton 2011 experts’ consensus agreed that the ΔSUVmax may not be reliable and that the Deauville score is preferable. Prognostic values of Menton 2011 and ΔSUVmax were evaluated by Kaplan–Meier analyses in terms of progression-free survival (PFS). Results: Seventeen percent of patients at PET2 (100/581) and 8% at PET4 (49/581) had PET-negative results by ΔSUVmax but were considered to have PET-positive results according to Menton 2011 with residual SUVmax of greater than 5.0. For the population with PET2-positive results, 2-y PFS was 70% (range, 58%–80%) with ΔSUVmax alone, whereas the outcome tended to be better for those who were considered to have PET-positive results by Menton 2011, 81% (range, 72%–87%). Conversely, all 10 patients with baseline SUVmax of less than 10.0 had PET2-positive results by ΔSUVmax but were considered to have PET2-negative results by Menton 2011. These patients had the same 2-y PFS as patients with PET2-negative/PET4-negative results, indicating that the ΔSUVmax yielded false-positive results in this situation. 
Conclusion: We recommend the use of the ΔSUVmax alone rather than the Menton 2011 criteria for assessing the interim metabolic response in patients with diffuse large B-cell lymphoma, except when the baseline SUVmax is less than 10.0.
Evaluation of the metabolic response in lymphoma by 18F-FDG PET relies on the Deauville 5-point scale (1). For diffuse large B-cell lymphoma (DLBCL), several independent studies have demonstrated that semiquantitative assessment of the early response by computation of the change in the SUVmax (ΔSUVmax) between baseline and interim PET scans is more reproducible than visual assessment and allows the reduction of false-positive interpretation in patients with minimal residual uptake (2–8). The best cutoff for distinguishing good from bad responders is greater than 66% reduction of the SUVmax after 2 cycles of chemotherapy and greater than 70%–92% after 4 cycles. However, these cutoffs have been defined from retrospective studies. In some instances when the baseline SUVmax is relatively low (<10.0) or the interim residual tumor SUVmax is relatively high (>5.0), experts agree that ΔSUVmax may not be appropriate for evaluating the response and that the Deauville 5-point scale is preferable (9). This approach, referred to as the Menton 2011 consensus, was recently used in a large prospective trial (10).
The GAINED (GA In NEwly diagnosed Diffuse large B-cell lymphoma) randomized phase 3 trial (ClinicalTrials.gov identifier: NCT01659099), comparing obinutuzumab (GA101) and rituximab in association with chemotherapy as an induction treatment followed by a PET-driven consolidative strategy (10), was recently conducted by the Lymphoma Study Association (Fig. 1). Patients underwent 18F-FDG PET/CT at baseline (PET0), after 2 cycles of immunochemotherapy (PET2), and after 4 cycles of immunochemotherapy (PET4). Induction treatment was based on 14 d of anthracycline-containing chemotherapy in association with either rituximab or GA101. The consolidation treatment was driven by an early metabolic response at PET2 and PET4 using the Menton 2011 criteria (9): patients were assigned to receive standard immunochemotherapy, high-dose therapy followed by autologous stem cell transplantation (ASCT), or salvage therapy (Fig. 1). The results of this trial demonstrated that obinutuzumab was not superior to rituximab in transplant-eligible patients and that PET-driven treatment escalation using ASCT enabled patients with PET2-positive (PET2+)/PET4-negative (PET4−) results to achieve similar outcomes as patients with PET2-negative (PET2−)/PET4− results (10).
FIGURE 1. Study design and flowchart of PET central review. Patients were randomized to receive induction treatment with either rituximab or obinutuzumab (GA101) immunochemotherapy. Consolidation treatment was driven by PET metabolic response at PET2 and PET4 using Menton 2011 criteria. aaIPI = age-adjusted International Prognostic Index.
Considering these results, we conducted a post hoc analysis to compare the prognostic value of the per-protocol Menton 2011 criteria with the ΔSUVmax alone and to make recommendations for quantitative evaluation of the metabolic response by 18F-FDG PET/CT.
MATERIALS AND METHODS
Imaging Review Process
Imaging was performed between September 2013 and December 2015. Scans were interpreted by the local investigator using on-site image viewers and remotely by independent reviewers from a panel of 10 available readers at 3 French institutions located in Créteil, Dijon, and Nantes. An online central review platform was set up for this purpose (11), and all readers were automatically notified by e-mail each time a scan was uploaded on the platform (Imagys; Keosys). Images were automatically deidentified and controlled by a research engineer. All PET/CT datasets that did not pass the quality control guidelines (incomplete datasets, calibration errors preventing SUVmax calculation, or attenuation maps generated from contrast-enhanced CT) were omitted and not reviewed.
At baseline (PET0), in addition to the local interpretation, 2 independent reviewers were asked to identify the tumor with the most intense uptake using a maximum-intensity projection display with a graded color scale, with red indicating the SUVmax (2,7); in case of discrepancy among these 3 interpretations, with regard to either the location or the SUVmax of the target lesion (≥10% difference between the lowest and the highest measured values), a choice among the interpretations available on the platform was made by an adjudicator. At interim evaluations, in addition to the local interpretation, 2 independent reviewers were asked to measure the whole-body SUVmax on the most intense tumor lesion if residual uptake was present, even if its location differed from that at PET0; if no lesions were identifiable, then the SUVmax was measured in the area of the initially most active tumor at PET0 or, alternatively, was set to 1.0. At all time points, mediastinal blood-pool SUVmax and liver SUVmax were also measured to ensure quality control (12).
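The baseline adjudication trigger described above can be sketched in a few lines of Python. This is only an illustration, not the logic of the review platform: the function name `needs_adjudication` is hypothetical, and the text does not state the denominator of the 10% difference, so the sketch assumes the spread is expressed relative to the lowest reading.

```python
def needs_adjudication(suvmax_readings, locations, threshold=0.10):
    """Return True when the baseline readings disagree.

    Disagreement means either different target-lesion locations or a
    relative SUVmax spread of at least `threshold`. Taking the spread
    relative to the lowest reading is an assumption of this sketch.
    """
    # Any difference in the reported target-lesion location is a discrepancy.
    if len(set(locations)) > 1:
        return True
    lowest, highest = min(suvmax_readings), max(suvmax_readings)
    return (highest - lowest) / lowest >= threshold


# Illustrative (invented) readings: local reader plus 2 central reviewers.
same_site = ["mediastinum"] * 3
print(needs_adjudication([24.1, 23.4, 24.0], same_site))  # small spread
print(needs_adjudication([20.0, 25.0, 21.0], same_site))  # large spread
```

Handling the location check first means that a location mismatch triggers adjudication even when the SUVmax values happen to agree.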
Determination of Metabolic Response
The metabolic response using the Menton 2011 consensus criteria was determined by majority agreement of the 3 readings. ΔSUVmax cutoffs of 66% and 70% were used at PET2 and PET4, respectively (4,13). In “special cases,” when the baseline SUVmax was less than 10.0 or the interim residual tumor SUVmax was greater than 5.0, the Deauville 5-point scale was used (9). This interpretation was used per protocol to allocate patients to the different consolidation arms (continuing immunochemotherapy, high-dose chemotherapy plus ASCT, or salvage therapy). In a post hoc analysis, the ΔSUVmax alone was used to evaluate the metabolic response and classify patients as responders or nonresponders, regardless of special cases and without using the Deauville score.
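The two interpretation rules compared in this analysis can be written out as a minimal Python sketch. The function names are hypothetical, the SUVmax values in the examples are invented, and the handling of a reduction exactly equal to the cutoff (treated here as PET positive) is an assumption, because the text does not specify the boundary case.

```python
CUTOFFS = {"PET2": 66.0, "PET4": 70.0}  # ΔSUVmax cutoffs (%) stated in the text


def delta_suvmax(baseline, interim):
    """Percentage reduction in SUVmax between baseline and interim PET."""
    return 100.0 * (baseline - interim) / baseline


def positive_by_delta(baseline, interim, timepoint):
    """ΔSUVmax alone: PET positive when the reduction does not exceed the cutoff.

    Treating a reduction exactly equal to the cutoff as positive is an
    assumption of this sketch.
    """
    return delta_suvmax(baseline, interim) <= CUTOFFS[timepoint]


def positive_by_menton(baseline, interim, timepoint, deauville):
    """Menton 2011: Deauville score in special cases, ΔSUVmax otherwise."""
    special_case = baseline < 10.0 or interim > 5.0
    if special_case:
        return deauville >= 4  # Deauville 4-5 positive, 1-3 negative
    return positive_by_delta(baseline, interim, timepoint)


# High residual uptake: negative by ΔSUVmax (80% drop), positive by Menton 2011.
print(positive_by_delta(30.0, 6.0, "PET2"), positive_by_menton(30.0, 6.0, "PET2", 4))
# Low baseline uptake: positive by ΔSUVmax (50% drop), negative by Menton 2011.
print(positive_by_delta(8.0, 4.0, "PET2"), positive_by_menton(8.0, 4.0, "PET2", 3))
```

The two printed examples correspond to the two kinds of special case in which the criteria can disagree.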
Statistical Analysis
Concordance between local reading and central review was assessed using Cohen κ-statistics. Progression-free survival (PFS) according to PET2 and PET4 interpretations was estimated using the Kaplan–Meier method and compared using the log-rank test. A P value of less than 0.05 indicated significance. Statistical analyses were performed by the statistical department of the Lymphoma Study Association using SAS 9.3 software (SAS Institute Inc.).
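For readers unfamiliar with the concordance statistic, the following is a minimal pure-Python sketch of Cohen κ for two readers. It is not the SAS procedure used in the study; the function name and the example ratings are invented for illustration.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen kappa for two raters scoring the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the chance agreement implied by each rater's marginal frequencies.
    Undefined (division by zero) if both raters use a single category.
    """
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Invented example: 10 scans read as PET positive (+) or negative (-).
local = ["+", "+", "-", "-", "-", "-", "-", "-", "-", "-"]
central = ["+", "-", "-", "-", "-", "-", "-", "-", "-", "-"]
print(round(cohen_kappa(local, central), 2))
```

In this toy example the readers agree on 9 of 10 scans, yet κ is well below 0.9 because most of that agreement is expected by chance when negative results dominate.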
RESULTS
Population Characteristics
Among the 670 patients included in GAINED, the scans for 636 at PET0, 604 at PET2, and 586 at PET4 were centrally reviewed (Fig. 1). Locally interpreted scans failing quality control were not reviewed. When the quality control of a PET0 examination failed, interpretation of the interim PET scans was not possible on the platform. Therefore, 581 patients (87%) had an interpretable baseline PET scan and centrally reviewed PET2 and PET4 scans. Patient characteristics are presented in Table 1. For the remaining 89 patients, treatment allocation relied on the local interpretation.
TABLE 1. Population Characteristics of GAINED Study Participants and Patients Who Had PET Central Review
Metabolic Response by Different Interpretation Criteria
The median delay between image upload and final interpretation of PET0 scans was 6 d (range, 0–47 d). The median PET0 SUVmax was 23.4 (range, 1.9–100.7) by local reading and 24.1 (range, 3.0–78.4) by central reviewing. Adjudication because of an SUVmax discrepancy of greater than or equal to 10% was required in 131 of the 636 PET0 scans (21%); in 97% of these (n = 127), the adjudicator chose the SUVmax of 1 of the central reviewers. The median number of days between image upload and final interpretation of PET2 or PET4 scans was 0 d (range, 0–44 d; interquartile range, 0–4 d). For 92% of interim PET scans, interpretation of the first reviewer was concordant with the local interpretation and sufficient to draw a conclusion, whereas a second review was necessary for 8%. Finally, the local interpretation was concordant with the final result in 95% of cases, whereas it differed from the conclusion of the reviewers in 5% of cases.
Table 2 summarizes the per-protocol and post hoc PET interpretations for the 581 patients whose PET2 and PET4 scans were both centrally reviewed. Supplemental Table 1 details the interpretations of all scans uploaded on the online platform, including those for patients who had only PET2 or PET4 scans centrally reviewed but not both. PET2 scan results were positive for 28% of patients (n = 164). In 81% of cases (n = 471), a PET2 scan result was directly given by the ΔSUVmax, whereas in 19% (n = 110), a special case was identified and interpretation relied on the Deauville score. In all of these special cases, the Menton 2011 interpretation changed the final conclusion. In 100 cases, a PET2 scan result that was negative by ΔSUVmax was considered positive (Deauville score of ≥4) because the residual SUVmax was greater than 5.0, whereas in 10 patients, a PET2 scan result that was positive by ΔSUVmax was considered negative (Deauville score of ≤3) because the PET0 SUVmax was less than 10.0. PET4 scan results were positive in 16% of patients (n = 93). In 90% of cases (n = 524), a PET4 scan result was directly given by the ΔSUVmax, whereas in 10% (n = 57), a special case was identified. Again, in all of these special cases, the Menton 2011 interpretation changed the final conclusion. In 49 cases, a PET4 scan result that was negative by ΔSUVmax was considered positive, whereas in 8 patients, a PET4 scan result that was positive by ΔSUVmax was considered negative. When the ΔSUVmax alone was used, PET2 and PET4 scan results were positive for only 13% (n = 73) and 9% (n = 54) of patients, respectively, compared with 28% (n = 164) and 16% (n = 93) when the Menton 2011 interpretation was used.
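The percentages in this paragraph follow from the reported counts and the denominator of 581 patients, as the short Python sketch below shows. The dictionary keys and the helper `pct` are hypothetical labels introduced for illustration; all counts are taken from the text.

```python
TOTAL = 581  # patients with PET2 and PET4 both centrally reviewed

# Counts as reported in the text; the keys are illustrative labels only.
counts = {
    "PET2 positive, Menton 2011": 164,
    "PET2 decided by ΔSUVmax": 471,
    "PET2 special cases": 110,
    "PET2 reclassified positive (residual SUVmax > 5.0)": 100,
    "PET2 positive, ΔSUVmax alone": 73,
    "PET4 positive, Menton 2011": 93,
    "PET4 decided by ΔSUVmax": 524,
    "PET4 special cases": 57,
    "PET4 reclassified positive (residual SUVmax > 5.0)": 49,
    "PET4 positive, ΔSUVmax alone": 54,
}


def pct(n, total=TOTAL):
    """Percentage of the reviewed population, rounded to the nearest integer."""
    return round(100 * n / total)


for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL} = {pct(n)}%")
```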
TABLE 2. PET Results According to Central Review Using Per-Protocol Menton 2011 Criteria and Post Hoc Analysis with ΔSUVmax Alone
The Cohen κ-value between local reading and central reviewing was 0.80 at PET2, reaching 0.84 between the 2 central reviewers; at PET4, these values were quite similar, 0.81 and 0.78, respectively. The median relative SUVmax difference between the 2 central reviewers was 0% (range, 0%–333%; interquartile range, 0%–13%), and the locations of the target tumor were identical in 70% of cases. Identical PET interpretations (PET positive or PET negative) were reached between central reviewers in 97% and 94% of cases when ΔSUVmax and Menton 2011 were used, respectively. In contrast, the median relative SUVmax difference between the local investigator and the central review was 3% (range, 0%–5,190%; interquartile range, 0.7%–74%), and the locations of the target tumor were identical in 66% of cases. Interestingly, the ΔSUVmax led to the same conclusion (PET positive or PET negative) between the local investigator and the central reviewer in 97% of cases, whereas this result occurred in only 32% of special cases when Menton 2011 was used.
Survival Analyses
Kaplan–Meier estimates of PFS using the 2 methods of PET interpretation are presented in Figure 2. They were calculated for 581 patients whose PET2 and PET4 scans were both centrally reviewed and for whom 110 events occurred (disease progression). When per-protocol Menton 2011 criteria were used, the 2-y PFS estimates were 90% (range, 87%–93%) in patients with PET2−/PET4− scan results, 84% (range, 74%–90%) in patients with PET2+/PET4− scan results, and 62% (range, 51%–71%) in patients with PET4+ scan results (P < 0.0001). When the ΔSUVmax alone was used, the 2-y PFS estimates were 88% (range, 85%–90%), 77% (range, 60%–88%), and 60% (range, 46%–72%), respectively (P < 0.0001).
FIGURE 2. Kaplan–Meier estimates of progression-free survival according to metabolic response at PET2 and PET4 using Menton 2011 criteria (per protocol) (A) and ΔSUVmax alone (post hoc analysis) (B).
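As a reminder of how a Kaplan–Meier PFS estimate is obtained, the following pure-Python sketch implements the product-limit estimator. It is not the SAS analysis used in the study, it does not include the log-rank test, and the follow-up data are invented toy values rather than trial data.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, survival) at each event time.

    times:  follow-up duration for each patient (e.g., months)
    events: True if progression was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients leave the risk set
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Toy data (months): progression at 6 and 18 months, other patients censored.
toy_times = [6, 12, 18, 24, 30, 36]
toy_events = [True, False, True, False, False, False]
toy_curve = kaplan_meier(toy_times, toy_events)
print(round(survival_at(toy_curve, 24), 3))  # 2-y PFS estimate for the toy data
```

A censored patient lowers the number at risk without producing a step in the curve, which is why the second step in the toy data is larger than the first.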
Focus on Special Cases of Menton 2011 Criteria
As stated earlier, 17% of patients at PET2 (100/581) and 8% at PET4 (49/581) had PET scan results that were negative by ΔSUVmax but that were considered positive by Menton 2011 (residual tumor uptake with SUVmax of >5.0). Interestingly, when we focused on survival analyses for the population with PET2+ scan results, the 2-y PFS estimate was 70% (range, 58%–80%) in patients with PET scan results that were positive by ΔSUVmax, whereas the outcome tended to be better for those with PET scan results that were considered positive by Menton 2011, 81% (range, 72%–87%) (P = 0.099) (Fig. 3A). This finding reached significance at PET4, with 2-y PFS estimates of 51% (range, 36%–65%) and 71% (range, 57%–82%), respectively (P = 0.043) (Fig. 3B), suggesting that Menton 2011 yields false-positive results when residual tumor SUVmax is greater than 5.0. Conversely, 10 patients at PET2 and 8 patients at PET4 with SUVmax of less than 10.0 at baseline had PET scan results that were positive by ΔSUVmax but that were considered PET negative by Menton 2011. These 10 patients had the same 2-y PFS estimate as patients with PET2−/PET4− scan results, that is, 90% (range, 47%–99%) versus 90% (range, 87%–93%), respectively, indicating that ΔSUVmax yields false-positive results when the baseline SUVmax is less than 10.0.
FIGURE 3. Kaplan–Meier estimates of progression-free survival in the subgroup of patients with PET-positive scan results at PET2 (A) and PET4 (B) according to interpretation criteria: Menton 2011 (i.e., “special cases” resulting from an interim SUVmax of >5.0) or ΔSUVmax alone.
DISCUSSION
The GAINED study demonstrates that interim PET/CT can identify slow metabolic responders (PET2+/PET4−) who benefit from therapy escalation. Menton 2011 criteria are quite reproducible, with Cohen κ-values ranging from 0.78 (substantial agreement) to 0.84 (almost perfect agreement) between local and central readers as well as between the 2 independent central reviewers (14). Local interpretation was concordant with the final conclusion of the central review in 95% of cases, which suggests that Menton 2011 criteria can be used in routine practice. However, accurate determination of the baseline SUVmax (most hypermetabolic target) may be problematic, as an SUVmax discrepancy of greater than or equal to 10% was noted in 21% of PET0 interpretations. In these situations, an adjudication was performed, and the adjudicator chose the same SUVmax as a central reviewer in 97% of cases, whereas the adjudicator agreed with the local reader in only 3%. This finding emphasizes that SUVmax identification on baseline PET is more reliable when using dedicated software and strict rules to identify the target lesion (maximum-intensity projection display, use of a graded color scale and a spherical volume of interest), as previously proposed (2,7).
When comparing ΔSUVmax alone with Menton 2011 criteria for the assessment of interim metabolic response, we emphasize that ΔSUVmax alone could be preferentially used in DLBCL patients when a PET-driven escalation strategy is planned. Indeed, ΔSUVmax led to the same conclusion (PET positive or PET negative) between the local investigator and the central review in 97% of cases, whereas this finding occurred in only 32% of the special cases of Menton 2011 interpretations, pointing out the variability of visual interpretation. When evaluating outcomes (Fig. 3), we found that patients with PET scan results that were positive by ΔSUVmax alone tended to have a lower PFS than patients with PET scan results that were negative by ΔSUVmax but that were considered positive by Menton 2011 (interim SUVmax of >5.0). This finding may have affected the PET-driven escalation scheme, as some patients with PET2+ scan results received ASCT intensification although they would have received standard consolidation if the ΔSUVmax alone had been used. Although it is possible that ASCT intensification was beneficial for a small proportion of these patients, their inclusion may also have led to a slight overestimation of the beneficial prognostic impact of ASCT in patients with PET2+/PET4− scan results.
In a retrospective analysis of 189 DLBCL patients homogeneously treated with rituximab + cyclophosphamide, doxorubicin, vincristine, prednisone, Mikhaeel et al. demonstrated that patients with a ΔSUVmax of less than 66% after 2 cycles of rituximab + cyclophosphamide, doxorubicin, vincristine, prednisone had worse PFS than those with a ΔSUVmax of greater than or equal to 66%, whereas the Deauville criteria (1, 2, 3 vs. 4, 5 and 1, 2, 3, 4 vs. 5) were not predictive (15). Recently, Michaud et al. demonstrated, in 166 DLBCL patients who underwent a PET-driven escalation strategy, that the combination of ΔSUVmax and Deauville criteria after 4 cycles of chemotherapy (according to the recommendation of the Menton 2011 consensus) could improve risk stratification for patients with extremely poor prognosis, compared with the Deauville classification alone (16). However, they did not analyze the prognostic value of the ΔSUVmax alone, which, in our series of 581 patients, appears to be even better.
With regard to the special cases arising from a baseline SUVmax of less than 10.0, 10 patients at PET2 (and 8 patients at PET4) had PET scan results that were positive by ΔSUVmax but that were considered PET negative by the Menton 2011 interpretation (Deauville score of ≤3). These patients had the same 2-y PFS estimates as patients with PET2−/PET4− scan results, suggesting that ΔSUVmax calculation is prone to generating false-positive results when the baseline SUVmax is less than 10.0. To date, only the PETAL trial (17) has relied on the ΔSUVmax alone to evaluate the interim metabolic response. However, the PETAL study design was different from ours, and the benefits of the PET-driven strategy in terms of PFS were not comparable with the benefits seen in the GAINED trial, not because of interim PET issues but because of an inappropriate experimental arm in terms of tumor control.
Our study has several limitations. Because the GAINED trial was a PET-driven escalation strategy in which all patients with PET2+/PET4− scan results received ASCT intensification, it remains difficult to definitively evaluate the prognostic impact of the ΔSUVmax alone. In addition, the semiquantitative data provided here apply to the generation of PET/CT scanners in use between 2013 and 2015. Newer generations of digital detectors with improved sensitivity and spatial resolution, together with image reconstruction methods that use resolution or point spread function (PSF) modeling or statistical recovery of partial-volume effects to improve SUV calculation (18), typically result in 2-fold-higher values than those obtained with older scanners. The special cases in which PET results that were negative by ΔSUVmax are considered PET positive because residual uptake is greater than 5.0 will therefore probably be more frequent in the future. To overcome this bias, the current recommendation is to perform an EANM Research Ltd.–compliant reconstruction to interpret interim PET/CT, either visually or semiquantitatively (19).
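The scanner-generation effect described above can be illustrated with a small Python sketch. It assumes, as a deliberate simplification, that a newer reconstruction multiplies both the baseline and the interim SUVmax by the same factor of 2; in practice the gain depends on lesion size and reconstruction settings. The SUVmax values and function names are invented for illustration.

```python
def delta_suvmax(baseline, interim):
    """Percentage reduction in SUVmax between baseline and interim PET."""
    return 100.0 * (baseline - interim) / baseline


def is_residual_special_case(interim, threshold=5.0):
    """Menton 2011 special case triggered by high residual uptake."""
    return interim > threshold


SCALE = 2.0  # assumed uniform gain on both scans (simplification)

older = {"baseline": 20.0, "interim": 3.0}  # invented values
newer = {key: value * SCALE for key, value in older.items()}

for label, scan in (("older scanner", older), ("newer scanner", newer)):
    drop = delta_suvmax(scan["baseline"], scan["interim"])
    special = is_residual_special_case(scan["interim"])
    print(f"{label}: ΔSUVmax = {drop:.0f}%, residual special case = {special}")
```

Under this assumption the ΔSUVmax is unchanged because it is a ratio, whereas the absolute residual uptake crosses the fixed 5.0 threshold, which turns an ordinary case into a special case.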
CONCLUSION
Assessment of interim metabolic response by Menton 2011 criteria is quite reproducible and translatable to routine practice. However, we recommend the use of the ΔSUVmax alone for interim PET evaluation in DLBCL, as many patients with PET-negative scan results by ΔSUVmax and an interim SUVmax of greater than 5.0 are considered to have PET-positive scan results when Menton 2011 is used. In these special cases, the ΔSUVmax alone provides similar or better prognostic discrimination and better agreement between local and central readers. The only situation in which the ΔSUVmax should be interpreted with caution is when the baseline SUVmax is less than 10.0.
DISCLOSURE
This work was supported in part by grants from the French National Agency for Research “France 2030 Investment Plan,” Labex IRON (ANR-11-LABX-18-01), and INCa-DGOS-Inserm_12558 (SIRIC ILIAD). Steven Le Gouill reports grants, personal fees, or nonfinancial support from Roche Genentech during the conduct of the study; personal fees from Celgene; and grants and personal fees from Janssen-Cilag, GILEAD/kite, and Servier outside the submitted work. René-Olivier Casasnovas reports grants, personal fees, and nonfinancial support from Roche Genentech during the conduct of the study; personal fees from MSD, BMS, Abbvie, Amgen, Celgene, Janssen, and Astra Zeneca; and grants and personal fees from Takeda and GILEAD/kite outside the submitted work. No other potential conflict of interest relevant to this article was reported.
KEY POINTS
QUESTION: Is the prognostic value of the ΔSUVmax alone as good as that of the combination of the ΔSUVmax and the Deauville score for early metabolic response evaluation in DLBCL?
PERTINENT FINDINGS: ΔSUVmax is reproducible and can identify early slow metabolic responders for whom a therapy escalation scheme can be proposed. The only situation in which the Deauville score is preferable for evaluating the interim metabolic response is when the baseline SUVmax is less than 10.0.
IMPLICATIONS FOR PATIENT CARE: We recommend the use of the ΔSUVmax alone in routine practice for interim PET evaluation of the therapeutic response in DLBCL to identify patients for whom therapy escalation can be proposed, according to the GAINED trial.
ACKNOWLEDGMENTS
We thank Romain Ricci, Flavie Corbin, and Bastien Lesne for their support with data analysis and the associate central readers Thomas Eugène, Axel Van Der Gucht, and Myriam Sasanelli.
Footnotes
† Deceased.
Published online Sep. 21, 2023.
- © 2023 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication April 14, 2023.
- Revision received August 18, 2023.