Abstract
The value of interim 18F-FDG PET/CT (iPET)–guided treatment decisions in patients with diffuse large B-cell lymphoma (DLBCL) has been the subject of much debate. This investigation compares the Deauville score and the change-in-SUVmax (ΔSUVmax) approach, 2 methods to assess early metabolic response to standard chemotherapy in DLBCL. Methods: Of 609 DLBCL patients participating in the PET-Guided Therapy of Aggressive Non-Hodgkin Lymphomas trial, iPET scans of 596 patients originally evaluated using the ΔSUVmax method were available for post hoc assessment of the Deauville score. A commonly used definition of an unfavorable iPET result according to the Deauville score is an uptake greater than that of the liver, whereas an unfavorable iPET scan with regard to the ΔSUVmax approach is characterized by a relative reduction of the SUVmax between baseline and iPET staging of less than or equal to 66%. We investigated the correlation and concordance of the 2 methods using the Spearman rank correlation coefficient and the agreement in classification, respectively. We further used Kaplan–Meier curves and Cox regression to assess differences in survival between patient subgroups defined by the prespecified cutoffs. Time-dependent receiver-operating-characteristic curve analysis provided information on the discrimination performance of each method. Results: The Deauville score and the ΔSUVmax approach differed in their iPET-based prognosis. The ΔSUVmax approach outperformed the Deauville score in terms of discrimination performance, most likely because of a high number of false-positive decisions by the Deauville score. Cutoff-independent discrimination performance remained low for both methods, but cutoff-related analyses showed promising results.
Both types of analysis favored the ΔSUVmax approach; for example, when patients were segregated by iPET response, the event-free survival hazard ratio was 3.14 (95% confidence interval, 2.22–4.46) for ΔSUVmax and 1.70 (95% confidence interval, 1.29–2.24) for the Deauville score. Conclusion: When treatment intensification is considered, the currently used Deauville score cutoff of an uptake above that of the liver appears inappropriate and is associated with potential harm for DLBCL patients. The ΔSUVmax criterion of a relative reduction in SUVmax of less than or equal to 66% should be considered as an alternative.
- diffuse large B-cell lymphoma
- early metabolic response to therapy
- interim PET
- Deauville score
- deltaSUVmax approach
Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma and shows a widely varying response to standard chemoimmunotherapy, which usually comprises 6 cycles of rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP), with rituximab given to patients whose lymphoma expresses cluster of differentiation 20 (CD20) (1). Although approximately one third of all patients progress after 6 cycles of R-CHOP, a substantial proportion of patients might be overtreated (2,3). Risk-adapted treatment approaches are therefore urgently needed, but they demand precise and reliable tools to guide therapy.
18F-FDG PET has been shown to predict outcome in aggressive lymphomas (4). After 1–4 cycles of treatment, an interim PET/CT (iPET) scan can determine the degree of remaining glucose metabolism (5). Several methods for 18F-FDG PET response assessment at interim staging exist. Staging guidelines recommend the Deauville score, a 5-point ordinal scale based mainly on a visual comparison of the glucose uptake of lymphoma tissue with that of the liver and the mediastinum (6). An unfavorable prognosis, or a positive iPET result, is commonly defined as an uptake greater than that of the liver. An alternative method is the change-in-SUVmax (ΔSUVmax) approach, which uses the SUVmax of the hottest tumor lesion (7). This approach compares SUVmax at baseline and iPET staging; an unfavorable iPET result is defined as a relative SUVmax reduction of less than or equal to 66%, a cutoff that has been confirmed in several studies (3,8–10). The Deauville score is easy to apply and requires only the iPET scan. It is, however, associated with an increased false-positive rate and susceptibility to interreader variability (11–13). A disadvantage of the ΔSUVmax approach is that it requires a baseline scan as a reference. Moreover, some have argued that it assigns too few patients to an unfavorable prognosis to be useful for guiding therapy (14). In contrast to the Deauville score, it provides a semiquantitative assessment that is independent of any background reference region and less prone to interreader variability.
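The two dichotomizations described above can be written in a few lines of code. The following Python sketch is illustrative only: the function names are hypothetical, it assumes that ΔSUVmax is computed from the SUVmax of the hottest lesion on each scan, and it takes the Deauville score as an already assigned integer from 1 to 5 (the visual read itself is not modeled).

```python
def delta_suvmax_percent(suvmax_baseline: float, suvmax_ipet: float) -> float:
    """Relative SUVmax reduction (%) from baseline to interim PET.

    Each argument is assumed to be the SUVmax of the hottest tumor lesion
    on the respective scan.
    """
    if suvmax_baseline <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return 100.0 * (suvmax_baseline - suvmax_ipet) / suvmax_baseline


def unfavorable_by_delta(suvmax_baseline: float, suvmax_ipet: float,
                         cutoff_percent: float = 66.0) -> bool:
    """Unfavorable (positive) iPET if the relative reduction is <= 66%."""
    return delta_suvmax_percent(suvmax_baseline, suvmax_ipet) <= cutoff_percent


def unfavorable_by_deauville(score: int, cutoff: int = 3) -> bool:
    """Unfavorable (positive) iPET if the 5-point Deauville score exceeds 3,
    i.e., uptake greater than that of the liver."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("Deauville score must be an integer from 1 to 5")
    return score > cutoff


# Hypothetical patient: baseline SUVmax 20, iPET SUVmax 5, residual uptake
# visually above liver (Deauville 4). The two methods disagree here.
print(delta_suvmax_percent(20.0, 5.0))  # 75.0
print(unfavorable_by_delta(20.0, 5.0))  # False (favorable)
print(unfavorable_by_deauville(4))      # True (unfavorable)
```

A patient like this one, with a large relative SUVmax reduction but residual uptake above the liver, is exactly the kind of case in which the two cutoffs give opposite answers.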
Although the 2 methods compete for the same role, few studies have compared them directly (9,10,12). The PET-Guided Therapy of Aggressive Non-Hodgkin Lymphomas (PETAL) trial has recently shown that iPET response assessed using the ΔSUVmax approach predicts outcome. In a post hoc analysis of this trial, iPET scans were reassessed for the Deauville score; results for the entire trial population, which comprised a variety of aggressive B-cell and T-cell lymphoma subtypes, have been described before (4). Here, we focus on DLBCL and provide data on the concordance between the Deauville score and the ΔSUVmax method and on their respective discrimination performance.
MATERIALS AND METHODS
Study Population
The PETAL trial (registered under ClinicalTrials.gov NCT00554164 and EudraCT 2006-001641-33) was a multicenter randomized controlled study for patients with newly diagnosed aggressive non-Hodgkin lymphomas investigating treatment options in patients stratified by iPET response (15). The Federal Institute for Drugs and Medical Devices (reference no. 61-3910-4032976) and the ethics committees of all participating sites (reference no. 07-3366) approved the study, and all patients provided written informed consent.
Study Design
Patients were treated with R-CHOP-14, but with 3 wk between cycles 2 and 3 to prevent false-positive results at iPET staging, which uniformly took place after the second cycle (16). Patients with a favorable iPET response (ΔSUVmax > 66%) received either 4 more cycles of R-CHOP or the same treatment plus 2 extra doses of rituximab. Patients with an unfavorable iPET response (ΔSUVmax ≤ 66%) received either 6 additional cycles of R-CHOP or 6 blocks of a more intensive protocol originally intended for Burkitt lymphoma (17). Because outcome remained unaffected by these treatment changes, the entire study population could be used to assess the prognostic value of iPET (4).
18F-FDG PET/CT Imaging
Twenty-three nuclear medicine institutions participated in the PETAL trial. Their local nuclear medicine specialists performed and evaluated 18F-FDG PET scans according to the PETAL study protocol as described previously (4). iPET had to be performed under the same conditions as baseline staging, using the same PET scanner and reconstruction method. All scans had to cover at least the body area from the skull base to midthigh and had to be acquired 60 ± 10 min after tracer injection; patients had to have fasted for at least 4 h, and blood glucose levels were not allowed to exceed 200 mg/dL. The median chemotherapy-free interval before iPET scanning was 20 d, and no patient's chemotherapy-free interval was shorter than 10 d.
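As an illustration of how the acquisition rules above might be checked automatically before an iPET is evaluated, here is a minimal Python sketch. The record layout, field names, and example values are hypothetical assumptions, not part of the PETAL protocol documents, and the skull-base-to-midthigh coverage requirement is not checked.

```python
from dataclasses import dataclass


@dataclass
class ScanRecord:
    """Hypothetical per-scan metadata; field names are illustrative only."""
    uptake_time_min: float      # minutes from tracer injection to acquisition
    fasting_h: float            # hours of fasting before tracer injection
    blood_glucose_mg_dl: float  # blood glucose at injection
    scanner_id: str
    reconstruction: str


def protocol_violations(baseline: ScanRecord, ipet: ScanRecord) -> list:
    """Return deviations from the acquisition rules stated in the text."""
    problems = []
    for label, scan in (("baseline", baseline), ("iPET", ipet)):
        if not 50 <= scan.uptake_time_min <= 70:
            problems.append(f"{label}: uptake time outside 60 +/- 10 min")
        if scan.fasting_h < 4:
            problems.append(f"{label}: fasting shorter than 4 h")
        if scan.blood_glucose_mg_dl > 200:
            problems.append(f"{label}: blood glucose above 200 mg/dL")
    # iPET must reuse the baseline scanner and reconstruction method.
    if ipet.scanner_id != baseline.scanner_id:
        problems.append("iPET acquired on a different scanner than baseline")
    if ipet.reconstruction != baseline.reconstruction:
        problems.append("iPET reconstructed with a different method than baseline")
    return problems


base = ScanRecord(62, 6, 110, "scanner-A", "recon-1")
interim = ScanRecord(75, 5, 120, "scanner-A", "recon-1")
print(protocol_violations(base, interim))
# ['iPET: uptake time outside 60 +/- 10 min']
```

Returning a list of messages rather than a single pass/fail flag makes it easy to report every deviation at once.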
iPET Evaluation
During the trial, iPET scans were evaluated decentrally by local nuclear medicine physicians using the ΔSUVmax method. An iPET response was regarded as unfavorable when the relative SUVmax reduction compared with baseline was 66% or less (4,7). However, iPET scans that were unfavorable by this criterion but showed no unphysiologic 18F-FDG uptake according to visual criteria were regarded as negative (favorable). This modification of the ΔSUVmax approach takes into account that, in patients whose iPET lacks unphysiologic 18F-FDG uptake, a return to physiologic activity may correspond to an SUVmax reduction of less than 66%. After conclusion of the trial, iPET scans of 502 of the 609 DLBCL patients were reevaluated by 1 of 3 experienced nuclear medicine physicians using the Deauville scale, with an unfavorable iPET result defined as a Deauville score of more than 3, that is, an uptake greater than liver SUVmax (6). Where retrievable, iPET scans not available for centralized evaluation were analyzed in the same way by local nuclear medicine experts, yielding 94 additional Deauville scale assessments. Thus, the ΔSUVmax evaluation was uniformly performed decentrally, whereas the Deauville scale evaluation was predominantly centralized. An overview of the patient flow in terms of iPET assessments is shown in Figure 1.
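The PETAL modification can be expressed as a small decision rule layered on top of the 66% cutoff. In this Python sketch, the function name is hypothetical, and the visual finding (presence or absence of unphysiologic 18F-FDG uptake on iPET) is assumed to be supplied as a boolean by the reader.

```python
def petal_unfavorable(suvmax_baseline: float, suvmax_ipet: float,
                      unphysiologic_uptake_on_ipet: bool,
                      cutoff_percent: float = 66.0) -> bool:
    """Modified ΔSUVmax rule as described in the text (illustrative sketch).

    Returns True for an unfavorable (positive) iPET result.
    """
    if suvmax_baseline <= 0:
        raise ValueError("baseline SUVmax must be positive")
    reduction = 100.0 * (suvmax_baseline - suvmax_ipet) / suvmax_baseline
    if reduction > cutoff_percent:
        return False  # favorable by the plain ΔSUVmax criterion
    # Reduction <= 66%: unfavorable only if the visual read still shows
    # unphysiologic uptake; otherwise the scan is reclassified as favorable.
    return unphysiologic_uptake_on_ipet


print(petal_unfavorable(20.0, 4.0, True))   # 80% reduction -> False
print(petal_unfavorable(8.0, 4.0, True))    # 50% reduction, residual uptake -> True
print(petal_unfavorable(8.0, 4.0, False))   # 50% reduction, no residual uptake -> False
```

The visual check only ever moves a patient from unfavorable to favorable, never the other way round.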
Outcome Variables
The prespecified primary endpoint of the PETAL trial, event-free survival, was also the main focus of this investigation; it was defined as the time from iPET staging to disease progression, treatment discontinuation due to excessive toxicity, switch to a nonprotocol treatment, relapse, or death from any cause. To assess the robustness of our results across more commonly used outcomes, we also included the secondary endpoints time to progression, overall survival, and progression-free survival, defined as the time from iPET staging to disease progression, to death from any cause, and to disease progression or death from any cause, respectively.
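To make the endpoint definitions concrete, the following Python sketch derives (time, event) pairs for the 4 endpoints from one patient's event times. The record structure is hypothetical. It assumes that relapse is recorded together with progression (the text lists relapse only under event-free survival, so counting relapse toward time to progression and progression-free survival is an assumption here) and that `last_contact` marks the end of follow-up, equal to the time of death for patients who died.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    """Hypothetical record; times in days from iPET staging (None = not observed)."""
    last_contact: float                             # end of follow-up (time of death if died)
    progression_or_relapse: Optional[float] = None  # assumption: relapse merged with progression
    death: Optional[float] = None
    toxicity_stop: Optional[float] = None           # discontinuation for excessive toxicity
    nonprotocol_switch: Optional[float] = None      # switch to a nonprotocol treatment


def first_event(candidates, last_contact):
    """Earliest observed event as (time, 1); otherwise censored as (last_contact, 0)."""
    observed = [t for t in candidates if t is not None]
    return (min(observed), 1) if observed else (last_contact, 0)


def endpoints(f: FollowUp) -> dict:
    """(time, event) pairs for the 4 endpoints defined in the text."""
    return {
        "EFS": first_event([f.progression_or_relapse, f.toxicity_stop,
                            f.nonprotocol_switch, f.death], f.last_contact),
        "TTP": first_event([f.progression_or_relapse], f.last_contact),
        "OS": first_event([f.death], f.last_contact),
        "PFS": first_event([f.progression_or_relapse, f.death], f.last_contact),
    }


print(endpoints(FollowUp(last_contact=700, nonprotocol_switch=200)))
# {'EFS': (200, 1), 'TTP': (700, 0), 'OS': (700, 0), 'PFS': (700, 0)}
```

The example shows why event-free survival is the most inclusive endpoint: a switch to nonprotocol treatment counts as an event for it but leaves the other 3 endpoints censored.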
Statistical Analysis
We used the reverse Kaplan–Meier method to calculate the patients' median follow-up time. The Spearman rank correlation coefficient assessed the overall association between the 2 iPET methods, whereas agreement in classification indicated the concordance between the subgroups defined by the Deauville score and ΔSUVmax cutoffs. Kaplan–Meier curves were used to investigate differences in outcome between these subgroups, and hazard ratios obtained by Cox regression quantified these differences. To characterize the discrimination performance of the ΔSUVmax approach and the Deauville score, we used time-dependent receiver-operating-characteristic (ROC) analysis to estimate the area under the ROC curve (AUC), sensitivity, and specificity, as well as the predictive values of the 2 methods (18). For this purpose, we used the nearest-neighbor estimator with the time point of interest set at 2 y after iPET staging. A simple bootstrap with 10,000 iterations was used to construct empiric 95% confidence intervals (CIs) for all measures of discrimination performance. For these discrimination measures, an unfavorable iPET response with either method was defined as a positive test result. For the analyses that relate to the 66% cutoff and divide the population into 2 groups (concordance with the Deauville score cutoff, Kaplan–Meier estimation, hazard ratio, sensitivity, specificity, and predictive values), the modification of the ΔSUVmax approach described above for patients with an iPET lacking unphysiologic 18F-FDG uptake was applied. This was not feasible, however, for the correlation with the ordinal Deauville score or for the ROC curve and the AUC, because these analyses are based on the continuous ΔSUVmax variable and make no binary distinction between good and poor prognosis. All statistical analyses were performed with R, version 3.5.1 (R Core Team).
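The reverse Kaplan–Meier method treats censoring as the "event" of interest and events as censored observations, so that patients who had an event do not shorten the apparent follow-up. The authors performed their analyses in R. The dependency-free Python sketch below only illustrates the idea; the function names are hypothetical, and the median is taken as the first time at which the estimated curve falls to 0.5 or below.

```python
def kaplan_meier(times, events):
    """Kaplan–Meier step function as a list of (time, survival) at event times.

    events[i] is 1 if an event was observed at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_total = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_total += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            steps.append((t, survival))
        at_risk -= n_total
    return steps


def km_median(steps):
    """First time at which the curve reaches 0.5 or below; None if never reached."""
    for t, s in steps:
        if s <= 0.5:
            return t
    return None


def reverse_km_median_followup(times, events):
    """Median follow-up: Kaplan–Meier with the event indicator flipped."""
    return km_median(kaplan_meier(times, [1 - e for e in events]))


# Hypothetical data: months from iPET, 1 = event, 0 = censored.
times = [10, 20, 30, 40]
events = [1, 0, 0, 0]
print(reverse_km_median_followup(times, events))  # 30
```

Note that the patient with an event at 10 months is treated as censored in the reverse analysis; a naive median of all observation times would instead be pulled down by early events.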
RESULTS
Clinical Characteristics and Follow-up
Our investigation was restricted to DLBCL patients from the PETAL intention-to-treat population with available data from the post hoc Deauville score analysis, that is, 596 of 609 (97.9%) DLBCL patients participating in the PETAL trial (Fig. 1). The median follow-up time in this population was 51.4 mo (95% CI, 49.7–53.7 mo), comparable to that of the whole DLBCL subgroup of the PETAL trial. Overall, differences in characteristics between the subgroup studied here and the entire DLBCL population of the PETAL trial were negligible (Table 1 and Hüttmann et al. (16)). With regard to event-free survival, 207 patients had an event that ended their follow-up, 164 of them before the ROC analysis time point of interest at 2 y after iPET staging. Kaplan–Meier curves for the entire cohort are shown for all endpoints in Supplemental Figure 1 (supplemental materials are available at http://jnm.snmjournals.org).
Ninety-two of 596 patients had an SUVmax reduction of 66% or less. In 29 of these, iPET scans were devoid of unphysiologic 18F-FDG uptake, resulting in their reassignment to the favorable prognosis group according to the modification of the ΔSUVmax method described before. Patients thus reclassified tended to have a very low baseline SUVmax (median, 7.2; first quartile, 5.4; third quartile, 9.8). Their outcome resembled that of patients with an SUVmax reduction of more than 66% (Supplemental Fig. 2).
Correlation and Concordance
The Spearman ρ between the Deauville score and the ΔSUVmax approach was 0.31 (95% CI, 0.23–0.38). With the respective cutoffs, the number of patients with an unfavorable iPET response was more than 4 times higher with the Deauville score (45.3%; 270/596) than with the ΔSUVmax approach (10.4%; 62/596). Cutoff-based concordance was 63.1% (376/596), with more than a third of the patients having a ΔSUVmax-favorable but Deauville score–unfavorable iPET response (Fig. 2A). The event-free survival curves by concordance showed that patients with a doubly favorable iPET response had the best outcome and that doubly unfavorable patients had the worst. The event-free survival curve for patients with a ΔSUVmax-favorable but Deauville score–unfavorable iPET response, however, was rather close to that of doubly favorable patients (Fig. 2B).
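The reported margins and the concordance figure are sufficient to reconstruct the full 2 × 2 cross-classification underlying the concordance analysis. The cell counts below are an arithmetic consequence of the numbers in the preceding paragraph (596 patients, 270 Deauville score–unfavorable, 62 ΔSUVmax-unfavorable, 376 concordant), not separately reported values; the helper name is hypothetical.

```python
def two_by_two(n_total, n_unfav_a, n_unfav_b, n_concordant):
    """Recover the cells of a 2x2 table of two binary classifications from
    its margins and the number of concordant classifications."""
    n_discordant = n_total - n_concordant
    # Discordant = (A-only unfavorable) + (B-only unfavorable)
    #            = (n_unfav_a - both) + (n_unfav_b - both)
    both, remainder = divmod(n_unfav_a + n_unfav_b - n_discordant, 2)
    if remainder:
        raise ValueError("inputs are not mutually consistent")
    cells = {
        "both_unfavorable": both,
        "a_only_unfavorable": n_unfav_a - both,
        "b_only_unfavorable": n_unfav_b - both,
    }
    cells["both_favorable"] = n_total - sum(cells.values())
    if min(cells.values()) < 0:
        raise ValueError("inputs imply a negative cell count")
    return cells


# A = Deauville score > 3, B = ΔSUVmax <= 66% (modified rule), as reported above.
cells = two_by_two(n_total=596, n_unfav_a=270, n_unfav_b=62, n_concordant=376)
print(cells)
# {'both_unfavorable': 56, 'a_only_unfavorable': 214, 'b_only_unfavorable': 6, 'both_favorable': 320}
# Percentage of patients who are Deauville-unfavorable but ΔSUVmax-favorable:
print(round(100 * cells["a_only_unfavorable"] / 596, 1))  # 35.9
```

Almost all discordance runs in one direction: 214 patients are unfavorable only by the Deauville score, whereas only 6 are unfavorable only by ΔSUVmax, which is consistent with the "more than a third" statement above.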
Discrimination Performance
The event-free survival Kaplan–Meier estimate at the ROC analysis time point of interest (2 y after iPET) was 71.6% (95% CI, 67.0%–76.6%). Global cutoff-independent discrimination performance, as indicated by the AUC, was poor for both approaches for all 4 endpoints but was consistently higher for the ΔSUVmax approach than for the Deauville score (Fig. 3A). Accordingly, both ROC curves tended to be flat for all endpoints (Supplemental Fig. 3). At the given cutoffs, Kaplan–Meier event-free survival curves showed more pronounced segregation of patients with favorable and unfavorable iPET responses with the ΔSUVmax approach than with the Deauville score (Fig. 4). The same was true for the 3 secondary endpoints (Supplemental Fig. 4). The associated hazard ratios were in line with these findings; for event-free survival, for example, the hazard ratio between unfavorable and favorable patients was more than 3 with the ΔSUVmax approach and less than 2 with the Deauville score (Fig. 3B). Sensitivity was higher for the Deauville score (52.5%; 95% CI, 45.5%–59.3%) than for the ΔSUVmax approach (24.6%; 95% CI, 18.6%–31.2%), whereas specificity was lower for the Deauville score (57.5%; 95% CI, 52.8%–62.2%, vs. 88.8%; 95% CI, 85.9%–91.7%), indicating a higher false-positive rate. The positive predictive value favored the ΔSUVmax cutoff across all possible values of the unknown event prevalence. In contrast, the negative predictive value slightly favored the Deauville cutoff over the ΔSUVmax cutoff (Supplemental Fig. 5). For all endpoints, the numeric results of the time-dependent ROC analyses and Cox regression are available in Supplemental Table 1.
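The statement about predictive values across all event prevalences can be understood via Bayes' theorem: for fixed sensitivity and specificity, the positive predictive value grows with the positive likelihood ratio, sensitivity / (1 − specificity), and the negative predictive value grows as the negative likelihood ratio, (1 − sensitivity) / specificity, shrinks. The Python sketch below plugs in the reported point estimates for event-free survival at 2 y and checks the ordering on a grid of prevalences. It ignores bootstrap uncertainty and is not the authors' analysis code.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


# Point estimates for event-free survival at 2 y, as reported in the text.
DELTA_SUVMAX = {"sens": 0.246, "spec": 0.888}
DEAUVILLE = {"sens": 0.525, "spec": 0.575}

prevalences = [i / 100 for i in range(1, 100)]  # 1% ... 99%
ppv_favors_delta = all(ppv(**DELTA_SUVMAX, prev=p) > ppv(**DEAUVILLE, prev=p)
                       for p in prevalences)
npv_favors_deauville = all(npv(**DEAUVILLE, prev=p) > npv(**DELTA_SUVMAX, prev=p)
                           for p in prevalences)
print(ppv_favors_delta, npv_favors_deauville)  # True True

# Illustrative prevalence: 1 - 0.716, from the 2-y event-free survival estimate.
p = 1 - 0.716
print(f"PPV {ppv(**DELTA_SUVMAX, prev=p):.2f} vs {ppv(**DEAUVILLE, prev=p):.2f}")
# PPV 0.47 vs 0.33
```

The positive likelihood ratio is about 2.2 for ΔSUVmax versus about 1.2 for the Deauville cutoff, whereas the negative likelihood ratios are close (about 0.85 vs. 0.83), which matches the large PPV gap and the small NPV difference described above.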
DISCUSSION
In this comparison of methods assessing early metabolic response to standard R-CHOP treatment in DLBCL patients, we showed that the ΔSUVmax approach outperformed the Deauville score in terms of discrimination performance for event-free survival, progression-free survival, overall survival, and time to progression. This applied to the global discrimination measure AUC as well as to the hazard ratios between subgroups defined by the prespecified cutoffs of the ΔSUVmax approach and Deauville score. The concordance of iPET response with regard to the 2 methods’ most commonly used definitions was relatively low, most likely because of a high false-positive rate associated with the Deauville score definition that an unfavorable iPET response consists of uptake greater than liver SUVmax.
Our observations complement smaller studies comparing ΔSUVmax and Deauville score for the evaluation of iPET scans after 2 cycles of R-CHOP or similar regimens. In one study, iPET was prognostic only when the scans were evaluated by the ΔSUVmax method, whereas application of the Deauville criteria failed to yield statistically significant outcome differences (8). In another study, the Deauville scale appeared to predict event-free survival better than the ΔSUVmax approach, but an effect of iPET on overall survival was seen only with the latter method (19). Any comparison of these studies with ours, however, must be made with caution, because they differed with regard to treatment performance, use of granulocyte colony-stimulating factor, and iPET timing (4).
The high false-positive rate of the Deauville score has been reported before and is of critical importance in this setting (11). The general aim of early response assessment is to identify DLBCL patients who do not respond sufficiently to standard R-CHOP therapy and to direct them to different treatment approaches. Such alternative therapies, however, are usually more aggressive, more expensive, or both. Given the high false-positive rate of the Deauville score, more aggressive approaches would raise the ethical issue of increased toxicity in patients who would also have responded satisfactorily to standard therapy, whereas expensive treatments such as modern cellular therapies would waste scarce resources. Owing to its higher specificity, the ΔSUVmax cutoff spares these patients this potential harm, at the price of selecting a smaller fraction of patients for alternative treatment approaches.
Another commonly used cutoff for the Deauville score is a score of more than 2, defined as an uptake greater than that of the mediastinum. Because this cutoff classifies as unfavorable every patient with a Deauville score of more than 3 and, in addition, those with a score of 3, the false-positive rate observed with a Deauville score of more than 3 increases further with a Deauville score of more than 2 (Supplemental Fig. 6); we therefore do not recommend its use in the identification of R-CHOP nonresponders. Two smaller studies suggest that cutoffs above liver activity are more appropriate than a Deauville score of more than 3 for segregating the DLBCL population after 2 treatment cycles, but neither of the 2 proposed cutoffs (1.4- and 1.6-fold liver SUVmax, respectively) has been validated so far (20,21). Nevertheless, it appears appealing to translate the Deauville score from its ordinal scale to a quantitative scale, similar to the ΔSUVmax approach, to have more potential cutoffs to choose from (22,23).
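One way to picture such a quantitative translation is a lesion-to-liver SUVmax ratio with an adjustable threshold. In this hypothetical Python sketch, a threshold of 1.0 mirrors the definition of a Deauville score of more than 3 used in this study (uptake greater than liver SUVmax), and 1.4 and 1.6 correspond to the unvalidated cutoffs mentioned above. Reducing the visual Deauville read to a simple SUVmax ratio is a simplifying assumption, as is the strict "greater than" comparison for every threshold.

```python
def liver_ratio(lesion_suvmax: float, liver_suvmax: float) -> float:
    """Ratio of the hottest residual lesion's SUVmax to liver SUVmax."""
    if liver_suvmax <= 0:
        raise ValueError("liver SUVmax must be positive")
    return lesion_suvmax / liver_suvmax


def unfavorable_by_liver_ratio(lesion_suvmax: float, liver_suvmax: float,
                               threshold: float = 1.0) -> bool:
    """Unfavorable iPET if lesion uptake exceeds threshold x liver SUVmax."""
    return liver_ratio(lesion_suvmax, liver_suvmax) > threshold


# Hypothetical iPET: residual lesion SUVmax 4.5, liver SUVmax 3.0 (ratio 1.5).
for threshold in (1.0, 1.4, 1.6):
    print(threshold, unfavorable_by_liver_ratio(4.5, 3.0, threshold))
# 1.0 True
# 1.4 True
# 1.6 False
```

The same patient is unfavorable under the liver cutoff but favorable under the 1.6-fold cutoff, illustrating how a continuous scale offers a choice of operating points rather than a single fixed one.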
Although the cutoff-based Kaplan–Meier curves and associated hazard ratios between unfavorable and favorable patients indicate good segregation, the AUC is relatively poor for both the Deauville score approach and the ΔSUVmax approach. In our opinion, this is a minor concern, because the aim is not to achieve high global discrimination performance across all possible cutoffs, as assessed by the AUC. Rather, the aim is to identify patients at high risk of failing R-CHOP treatment, that is, to achieve high local discrimination performance at a given criterion, which is why we focused on the analyses associated with the cutoffs of the 2 methods. Another limitation of our comparison of the ΔSUVmax approach and the Deauville score is the modification of the ΔSUVmax approach in patients with an iPET scan lacking unphysiologic 18F-FDG uptake. This modification was consistently applied in the analyses of concordance, hazard ratios, sensitivity, specificity, and predictive values, but it could not be applied to the correlation coefficient, ROC curve, and AUC, which may therefore have been affected. In these latter analyses, we used the relative reduction of SUVmax on the continuous scale for all patients, including those subsequently reclassified to the favorable prognosis group because of a lack of unphysiologic 18F-FDG uptake according to visual criteria. Overall, this reclassification occurred in 29 patients who had an unfavorable iPET response according to their measured relative SUVmax reduction. We again emphasize that the focus of this investigation was on the cutoff-based analyses, as the question of a therapy switch requires a binary yes-or-no decision. Moreover, although numbers are small, the outcome of reclassified patients appeared similar to that of patients with an SUVmax reduction of more than 66%. Thus, the modification of the ΔSUVmax method introduced in the PETAL trial may also be of value in future investigations.
Despite the good local discrimination performance of the ΔSUVmax approach, much is still unknown about its properties. Although several authors (3,8–10) confirmed the 66% SUVmax reduction cutoff originally proposed by Lin et al. (7), data on the interrater reliability and reproducibility of the ΔSUVmax approach are scarce. In our investigation, the Deauville score assessment was a predominantly centralized post hoc analysis; consequently, different nuclear medicine specialists were involved in the Deauville score and ΔSUVmax assessments. Because the ΔSUVmax assessments involved more raters than the Deauville score assessments, interrater variation may have been greater with the ΔSUVmax approach. Itti et al., however, found the 66% SUVmax reduction cutoff to be associated with higher interrater reproducibility than a Deauville score of more than 3; overall, they rated interobserver agreement as "almost perfect" with the ΔSUVmax approach but only "substantial" with the Deauville score (12). In the PETAL trial, ΔSUVmax results for 10% of all iPET scans were reevaluated by nuclear medicine physicians from other trial sites, with a concordance of 97.7% between the first and second readers (4). By contrast, agreement within pairs of experienced nuclear medicine physicians using the Deauville score has been reported to be 77%–90% (24). Nonetheless, this issue calls for further investigation.
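Percent agreement, as reported for the PETAL reevaluation, does not correct for agreement expected by chance, whereas Cohen's kappa does. The Python sketch below computes both for 2 readers and maps kappa to the Landis and Koch verbal bands, which include the labels "substantial" and "almost perfect". Whether Itti et al. used exactly these bands is an assumption, and the reader data are hypothetical.

```python
from collections import Counter


def percent_agreement(reader1, reader2):
    """Fraction of cases classified identically by both readers."""
    if len(reader1) != len(reader2) or not reader1:
        raise ValueError("readers must rate the same, nonempty set of cases")
    return sum(a == b for a, b in zip(reader1, reader2)) / len(reader1)


def cohens_kappa(reader1, reader2):
    """Cohen's kappa for 2 readers and nominal categories."""
    n = len(reader1)
    observed = percent_agreement(reader1, reader2)
    counts1, counts2 = Counter(reader1), Counter(reader2)
    # Agreement expected by chance from each reader's marginal frequencies.
    categories = counts1.keys() | counts2.keys()
    expected = sum(counts1[c] * counts2[c] for c in categories) / n**2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


def landis_koch(kappa):
    """Verbal label for kappa following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


r1 = ["unfav", "fav", "fav", "fav"]
r2 = ["unfav", "fav", "fav", "unfav"]
k = cohens_kappa(r1, r2)
print(percent_agreement(r1, r2), k, landis_koch(k))  # 0.75 0.5 moderate
```

In this toy example, 75% raw agreement corresponds to only moderate chance-corrected agreement, which is why percent agreement and kappa values from different studies are not directly comparable.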
CONCLUSION
The ΔSUVmax definition of an unfavorable iPET response as a relative SUVmax reduction of 66% or less appears to be a more suitable tool than the Deauville score to assess early metabolic response to standard R-CHOP therapy in DLBCL patients, because the Deauville score definition of an unfavorable iPET response as uptake above that of the liver (Deauville score > 3) seems to be associated with a high false-positive rate. When therapy intensification or a switch to an experimental treatment is considered, we recommend the ΔSUVmax approach instead of the Deauville score as a prognostic instrument to guide first-line DLBCL treatment. Whether it should be used as a standalone tool or combined with other patient, tumor, or treatment characteristics requires further study.
DISCLOSURE
Deutsche Krebshilfe provided financial support (grants 107592 and 110515). No other potential conflict of interest relevant to this article was reported.
KEY POINTS
QUESTION: Should the Deauville score (>3) or the ΔSUVmax approach (ΔSUVmax ≤ 66%) be the preferred method to evaluate early response to standard R-CHOP therapy in DLBCL?
PERTINENT FINDINGS: In a post hoc analysis of the PETAL trial, the ΔSUVmax approach had higher discrimination performance than the Deauville score. This was especially true for local discrimination measures associated with the 2 methods’ most commonly used cutoffs, because of an increased false-positive rate for the Deauville score.
IMPLICATIONS FOR PATIENT CARE: To prevent DLBCL patients with a favorable prognosis from harm resulting from unjustified iPET-based treatment intensification, the ΔSUVmax cutoff (ΔSUVmax ≤ 66%) should be considered a standard tool for the assessment of early metabolic treatment response.
Acknowledgments
We thank the patients and investigators for their participation.
Footnotes
Published online May 8, 2020.
- © 2021 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication March 2, 2020.
- Accepted for publication April 9, 2020.
REFERENCES