Abstract
When evaluating 18F-FDG PET images with the Deauville score (DS), quantification of uptake in the tumor and reference organs limits the problem of visual misinterpretation. Compared with conventional reconstruction algorithms, point-spread function (PSF) modeling increases SUVs significantly in tumors but only moderately in the liver, which could affect the DS. We investigated whether the choice of reconstruction algorithm affects the DS and whether any discordance affects the capability of 18F-FDG PET to stratify lymphoma patients. Methods: Overall, 126 patients with diffuse large B-cell lymphoma were included (56 female and 70 male; median age, 65 y; range, 20–88 y). PET data were reconstructed with the unfiltered PSF method. Additionally, a 6-mm filter was applied to PSF images to meet the requirements of the EANM Research Ltd. (EARL) harmonization program from the European Association of Nuclear Medicine (EANM) (PSFEARL). One hundred interim PET (i-PET) and 95 end-of-treatment PET (EoT-PET) studies were analyzed. SUVmax in the liver and aorta was determined using automatic volumes of interest and compared with SUVmax in the residual mass with the highest 18F-FDG uptake. Results: For i-PET, patients were classified as responders and nonresponders in 60 and 40 cases using PSF versus 63 and 37 cases using PSFEARL. Five cases of major discordance (5.0%) occurred (i.e., changes between responder and nonresponder status). For EoT-PET, patients were classified as responders and nonresponders in 69 and 26 cases using PSF versus 72 and 23 cases using PSFEARL. Three cases of major discordance (3.2%) occurred. Concordance (Cohen unweighted κ) between PSF and PSFEARL DSs was 0.82 (95% confidence interval, 0.73–0.91) for i-PET and 0.89 (95% confidence interval, 0.81–0.96) for EoT-PET. The median follow-up periods were 28.4 and 27.4 mo for i-PET and EoT-PET, respectively.
Kaplan–Meier analysis showed statistically significant differences in progression-free survival and overall survival between responders and nonresponders for both i-PET and EoT-PET, regardless of the reconstruction used. Conclusion: The DS was only minimally affected by the choice of PET reconstruction, and risk stratification of diffuse large B-cell lymphoma patients was not affected. Specifically, the use of PSF does not appear to be an issue in routine clinical processes or in multicenter trials. These findings have to be confirmed in escalation and deescalation procedures based on early i-PET.
In the staging, monitoring, and restaging of Hodgkin and non-Hodgkin lymphoma patients, 18F-FDG PET has become the standard procedure (1). During therapy assessment at mid treatment (interim PET [i-PET]) and after completion of chemotherapy (end-of-treatment PET [EoT-PET]), the Deauville score (DS) is used to discriminate between responders and nonresponders (2–4). Responders are usually defined as DS1–DS3, whereas nonresponders are defined as DS4 and DS5, except for deescalating trials in which DS2 may be required at i-PET for entry into the deescalated arm (1,5). Quantitative evaluation of residual tumor masses and reference organs (liver and blood pool) is required when scoring 18F-FDG PET with the DS because this approach limits the problem of visual misinterpretation due to the influence of background activity (1,4).
Over the past few years, new reconstruction algorithms have been released and shown to improve diagnostic accuracy in various solid tumors. In particular, point-spread function (PSF) reconstruction is available from the 3 major PET vendors and is progressively replacing conventional ordered-subset expectation maximization either alone or in addition to time-of-flight (TOF) capability (6). In addition to the improvement in diagnostic performance (7), PSF modeling significantly increases SUV metrics in tumor lesions compared with conventional reconstruction algorithms, but only moderately in the liver and in the vascular background (8,9). Consequently, these increases could affect DS by systematically increasing the score. This issue was exemplified by the RATHL lymphoma trial, which mandated that centers with PSF reconstruction or TOF disable these features when participating in the study (10). The EANM Research Ltd. (EARL) harmonization program from the European Association of Nuclear Medicine (EANM) has been shown to efficiently overcome the issue of reconstruction inconsistencies in nonhematologic solid tumors (11). EARL-accredited centers tend to use 2 PET datasets: one optimized for diagnostic purposes and one using a filter chosen so that the reconstruction meets the EANM/EARL harmonizing standards.
The aim of the present study was to investigate whether the choice of reconstruction algorithm may affect DS in a significant number of patients compared with a reconstruction meeting the EANM/EARL harmonizing standards. The clinical relevance of induced changes was assessed by studying progression-free survival (PFS) and overall survival (OS) in responders versus nonresponders to determine whether potential discordance would affect the risk stratification capability of 18F-FDG PET in lymphoma patients.
MATERIALS AND METHODS
Patient Selection
Patients newly diagnosed with diffuse large B-cell lymphoma (DLBCL) between October 2008 and September 2015 were included in this retrospective study. The patients were referred to our PET unit for baseline PET, i-PET after 4 courses of chemotherapy, or EoT-PET.
For each patient, baseline data were recorded, including age, sex, body mass index, Eastern Cooperative Oncology Group (ECOG) performance status, lactate dehydrogenase level, Ann Arbor stage, extranodal involvement, presence or absence of bone marrow involvement, age-adjusted International Prognostic Index, presence or absence of B symptoms, and presence or absence of bulky disease.
All patients were treated with standard chemotherapy depending on stage, age, and site of initial involvement: they received either rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) or rituximab, doxorubicin, cyclophosphamide, vindesine, bleomycin, and prednisone (R-ACVBP). Follow-up data were recorded at scheduled visits.
According to European regulations, French observational studies without any additional therapy or monitoring procedures do not need ethics approval. Nonetheless, approval to collect data for our study was obtained from the national committee for data privacy (registration no. 2084622v0).
PET Acquisition and Reconstruction Parameters
PET studies were performed according to the EANM guidelines for PET tumor imaging (12). Patients fasted for at least 6 h before receiving an intravenous injection of 18F-FDG (mean injected activity ± SD, 4.00 ± 0.22 MBq/kg). The mean blood glucose level at the time of injection was 1.03 ± 0.23 g/L. In accordance with the EANM guidelines, all PET studies were acquired after 60 ± 5 min of rest in a warm room (mean delay between injection and acquisition, 61.4 ± 6.5 min) on a Biograph TrueV system (Siemens Medical Solutions) with a 6-slice spiral CT component.
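SUVs are used throughout this study. The sketch below shows how the injected activity enters the metric by computing a body-weight SUV in plain Python. It is a generic illustration and not the Syngo.via or EQ⋅PET computation. Several elements are assumptions: a tissue density of 1 g/mL, decay correction of the injected activity to the time the image concentration refers to, the 18F half-life constant, and the example liver concentration. `suv_bw` and `decayed_activity` are hypothetical names.

```python
import math

# Physical half-life of fluorine-18 in minutes (about 109.8 min).
F18_HALF_LIFE_MIN = 109.77


def decayed_activity(injected_mbq: float, elapsed_min: float) -> float:
    """Activity remaining after `elapsed_min`, assuming pure exponential decay."""
    return injected_mbq * math.exp(-math.log(2.0) * elapsed_min / F18_HALF_LIFE_MIN)


def suv_bw(conc_kbq_per_ml: float, injected_mbq: float, weight_kg: float,
           elapsed_min: float = 0.0) -> float:
    """Body-weight SUV = tissue concentration / (activity per unit body weight).

    Assumes a tissue density of 1 g/mL, so MBq/kg equals kBq/mL. `elapsed_min`
    decay-corrects the injected activity to the time the image concentration
    refers to (use 0 if the image is already referenced to injection time).
    """
    activity_mbq = decayed_activity(injected_mbq, elapsed_min)
    return conc_kbq_per_ml / (activity_mbq / weight_kg)


# Hypothetical 70-kg patient injected at 4.00 MBq/kg and imaged about 61 min later.
weight = 70.0
injected = 4.00 * weight  # 280 MBq
liver_conc = 4.5  # hypothetical liver concentration at scan time, kBq/mL
print(round(suv_bw(liver_conc, injected, weight, elapsed_min=61.4), 2))
```

A uniform distribution of the tracer at a concentration equal to injected activity divided by body weight gives an SUV of exactly 1, which is a useful sanity check for any implementation.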
First, a free-breathing CT acquisition was performed with the following parameters: 60 mAs, 130 kVp, pitch of 1, and 6 × 2 mm collimation. The PET emission acquisition was then performed in 3-dimensional mode from the base of the skull to the mid thighs, with acquisition times per bed position of 160 s for normal-weight patients (body mass index ≤ 25 kg/m²) and 220 s for overweight patients (body mass index > 25 kg/m²).
Raw data were reconstructed with a PSF reconstruction algorithm (HD⋅PET data reconstructed with the TrueX algorithm; Siemens Medical Solutions) using 3 iterations, 21 subsets, and no filtering. The matrix size was 168 × 168, resulting in voxels of 4.07 × 4.07 × 4.07 mm. Scatter correction and CT-based attenuation correction were applied.
PET Analysis
PET analysis was performed using Syngo.via and EQ⋅PET software (Siemens Medical Solutions). EQ⋅PET computes SUVs on images optimized for diagnosis (in our center, unfiltered PSF) while also providing SUVs that meet the EANM/EARL harmonizing standards, with no need for a second reconstructed dataset (13). EQ⋅PET has been shown to provide SUV metrics consistent with those obtained from a second reconstruction (14).
All PET examinations were reviewed by a single board-certified PET reader using PSF images for optimal lesion detection. In addition, quantitative values were recorded after application of a 6-mm gaussian filter to obtain harmonized SUVs according to the EARL harmonization program (PSFEARL).
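The effect of the harmonizing filter on SUVmax can be illustrated in one dimension. The sketch below converts a Gaussian full width at half maximum (FWHM) to a standard deviation, σ = FWHM / (2√(2 ln 2)), and expresses it in units of the 4.07-mm voxels. It then smooths a hypothetical profile that contains a one-voxel hot spot. Two assumptions apply: the 6-mm filter is specified as FWHM, and a 1D kernel is used for brevity (a 3D Gaussian is separable, so the same kernel would be applied along each axis). This is not the EQ⋅PET implementation, and the helper names are hypothetical.

```python
import math


def fwhm_to_sigma(fwhm_mm: float) -> float:
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel(sigma_vox: float, radius: int) -> list:
    """Sampled 1D Gaussian of length 2*radius + 1, normalized to sum to 1."""
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile: list, kernel: list) -> list:
    """Convolve a 1D profile with a symmetric kernel, repeating edge values at the borders."""
    radius = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp to the profile edges
            acc += w * profile[j]
        out.append(acc)
    return out


VOXEL_MM = 4.07  # voxel size of the unfiltered PSF images
sigma_vox = fwhm_to_sigma(6.0) / VOXEL_MM  # assumes the 6-mm filter is given as FWHM
kernel = gaussian_kernel(sigma_vox, radius=3)

# Hypothetical profile: background SUV 2.0 with a one-voxel hot spot of SUV 10.0.
profile = [2.0] * 11
profile[5] = 10.0
filtered = smooth(profile, kernel)
print(max(profile), round(max(filtered), 2))
```

The filter preserves the total activity in the profile but lowers its peak. This is why SUVmax of small residual masses drops more than SUVmax of large, uniform reference regions such as the liver.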
For each study, both PSF and PSFEARL quantitative values were recorded. Liver SUVmax (SUVliver) was measured automatically using a 3-cm-diameter volume of interest placed in the right liver lobe. Focal liver lesions were avoided, and examinations with diffuse liver involvement were excluded. Mediastinal SUVmax (SUVmediastinum) was measured automatically using a cylindrical volume of interest, 1 cm in diameter and 2 cm high, placed in the descending aorta. Lesion SUVmax (SUVlesion) was measured with a manually placed volume of interest on the residual target lesion with the most intense uptake. For this purpose, the baseline scan was systematically reviewed to confirm that this site was initially involved.
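Each automatic reference measurement reduces to taking the maximum voxel value inside a fixed geometric volume. The sketch below does this for a sphere on a small synthetic grid. It assumes that the 3-cm liver volume of interest is spherical and that a voxel belongs to it when the voxel center lies within the radius. The grid, the values, and `sphere_voi_max` are hypothetical. The example also shows why focal liver lesions must be kept outside the volume of interest: a lesion inside it would replace the liver reference value with lesion uptake.

```python
import math


def sphere_voi_max(volume, center, diameter_mm, voxel_mm):
    """SUVmax inside a spherical volume of interest.

    `volume` is indexed volume[z][y][x] with isotropic voxels of `voxel_mm`;
    `center` is a (z, y, x) voxel index. A voxel is included when the distance
    between its center and the VOI center is at most the radius.
    """
    radius_mm = diameter_mm / 2.0
    cz, cy, cx = center
    best = -math.inf
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                dist_mm = voxel_mm * math.sqrt((z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2)
                if dist_mm <= radius_mm:
                    best = max(best, value)
    return best


# Hypothetical 9 x 9 x 9 "liver" with uniform SUV 2.0 and a focal lesion of SUV 6.0.
liver = [[[2.0] * 9 for _ in range(9)] for _ in range(9)]
liver[4][4][8] = 6.0

# 3-cm VOI centered away from the lesion (4 voxels = 16.3 mm, beyond the 15-mm radius).
print(sphere_voi_max(liver, (4, 4, 4), diameter_mm=30.0, voxel_mm=4.07))
# The same VOI shifted toward the lesion picks up the focal uptake.
print(sphere_voi_max(liver, (4, 4, 6), diameter_mm=30.0, voxel_mm=4.07))
```

A cylindrical volume of interest for the aortic measurement would differ only in the voxel inclusion test.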
i-PET and EoT-PET results were scored according to the DS (4) as DS1 (no residual uptake), DS2 (uptake ≤ mediastinum), DS3 (uptake > mediastinum but ≤ liver), DS4 (moderately increased uptake > liver), or DS5 (markedly increased uptake > liver, defined as SUVlesion ≥ 2 × SUVliver, or new lesions related to lymphoma).
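The quantitative part of these scoring rules can be written as a small decision function. In the sketch below, `deauville_score` is a hypothetical helper. The `residual_uptake` and `new_lesion` flags stand in for the reader's visual assessment. Ties are resolved as in the text: uptake ≤ mediastinum gives DS2, uptake ≤ liver gives DS3, and uptake ≥ 2 × liver gives DS5.

```python
def deauville_score(suv_lesion: float, suv_mediastinum: float, suv_liver: float,
                    residual_uptake: bool = True, new_lesion: bool = False) -> int:
    """Assign DS1-DS5 from SUVmax values.

    `residual_uptake` and `new_lesion` are stand-ins for the reader's visual judgement.
    """
    if new_lesion:
        return 5
    if not residual_uptake:
        return 1
    if suv_lesion <= suv_mediastinum:
        return 2
    if suv_lesion <= suv_liver:
        return 3
    if suv_lesion >= 2.0 * suv_liver:
        return 5
    return 4


# Hypothetical reference values: mediastinum SUVmax 1.8, liver SUVmax 2.5.
for lesion in (1.5, 2.2, 3.0, 5.0):
    print(lesion, deauville_score(lesion, 1.8, 2.5))
```

Because DS3, DS4, and DS5 are separated only by the liver value, any reconstruction that raises SUVlesion more than SUVliver can push a lesion across these boundaries.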
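Patients with DS1–DS3 were classified as responders, and patients with DS4 or DS5 were classified as nonresponders. ΔSUVmax between baseline PET and i-PET was computed as follows:
$$\Delta\mathrm{SUV}_{\max} = \frac{\mathrm{SUV}_{\max}^{\text{baseline}} - \mathrm{SUV}_{\max}^{\text{i-PET}}}{\mathrm{SUV}_{\max}^{\text{baseline}}} \times 100$$
The cutoff for classification as a responder was a ΔSUVmax greater than 70% (15).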
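The two response criteria can be sketched as follows. ΔSUVmax is taken as the relative decrease in SUVmax from baseline to i-PET, expressed as a percentage. The function names are hypothetical. A ΔSUVmax of exactly 70% is treated as nonresponse because the text requires a value greater than 70%.

```python
def is_responder_ds(ds: int) -> bool:
    """DS1-DS3 are responders; DS4 and DS5 are nonresponders."""
    if ds not in (1, 2, 3, 4, 5):
        raise ValueError(f"invalid Deauville score: {ds}")
    return ds <= 3


def delta_suvmax(suv_baseline: float, suv_interim: float) -> float:
    """Relative SUVmax decrease from baseline to i-PET, in percent."""
    if suv_baseline <= 0:
        raise ValueError("baseline SUVmax must be positive")
    return 100.0 * (suv_baseline - suv_interim) / suv_baseline


def is_responder_delta(suv_baseline: float, suv_interim: float,
                       cutoff_pct: float = 70.0) -> bool:
    """Responder when the SUVmax decrease is strictly greater than the cutoff."""
    return delta_suvmax(suv_baseline, suv_interim) > cutoff_pct


# Hypothetical patient: baseline SUVmax 20.0, i-PET SUVmax 5.0.
print(delta_suvmax(20.0, 5.0), is_responder_delta(20.0, 5.0))  # 75.0 True
```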
Statistical Analysis
Quantitative data are presented as mean ± SD or, when appropriate, as median and range. Agreement between the DSs obtained with PSF and PSFEARL was assessed by calculating the Cohen unweighted κ. The prognostic value of i-PET and EoT-PET was assessed using PFS and OS. PFS was defined as the time from diagnosis to progression or death, and OS as the time from diagnosis to death from lymphoma disease (lymphoma itself or treatment side effects). Survival curves were estimated with the Kaplan–Meier method and compared with the log-rank test. A P value of less than 0.05 was considered statistically significant. Graphs and analyses were produced using GraphPad Prism and MedCalc software.
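The two survival methods can be sketched from scratch. The first is the Kaplan–Meier product-limit estimator. The second is the two-group log-rank test, whose χ² statistic has 1 degree of freedom, so that P = erfc(√(χ²/2)). This is an illustration, not the GraphPad Prism or MedCalc implementation, and the follow-up data below are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    `events[i]` is 1 if progression/death was observed at `times[i]`, 0 if the
    patient was censored then. Censored patients stay at risk at their own time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    o_minus_e = 0.0
    var = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in pooled if g == 0 and s >= t)  # at risk, group A
        n = sum(1 for s, _, _ in pooled if s >= t)  # at risk, both groups
        d_a = sum(e for s, e, g in pooled if g == 0 and s == t)  # events, group A
        d = sum(e for s, e, _ in pooled if s == t)  # events, both groups
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


# Hypothetical PFS data in months (1 = progression/death, 0 = censored).
resp_t, resp_e = [10, 24, 24, 30, 36, 36], [1, 0, 0, 0, 0, 0]
nonresp_t, nonresp_e = [3, 5, 8, 12, 24, 30], [1, 1, 1, 1, 0, 0]

print(kaplan_meier(nonresp_t, nonresp_e))
print(log_rank(resp_t, resp_e, nonresp_t, nonresp_e))
```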
RESULTS
Patient Characteristics
A database of 195 PET examinations (100 i-PET and 95 EoT-PET) for 126 patients was created. Patient clinical characteristics are displayed in Table 1. Thirty-one patients underwent only i-PET, 26 patients underwent only EoT-PET, and 69 patients underwent both i-PET and EoT-PET. Diffuse liver involvement did not occur, and therefore no examination was excluded.
PSF and PSFEARL Agreement for Determination of DS
I-PET Examinations
For i-PET examinations, patients were classified using PSF SUVs as DS1, DS2, DS3, DS4, and DS5 in 24, 13, 23, 31, and 9 cases, respectively, yielding 60 responders and 40 nonresponders. Patients were classified using PSFEARL SUVs as DS1, DS2, DS3, DS4, and DS5 in 24, 17, 22, 31, and 6 cases, respectively, which resulted in 63 responders and 37 nonresponders. DS discordance between PSF and PSFEARL occurred in 14 cases (14.0%): 1 patient moved from DS2 to DS3, 5 from DS3 to DS2, 1 from DS3 to DS4, 4 from DS4 to DS3, and 3 from DS5 to DS4 (Table 2). Concordance between PSF and PSFEARL was almost perfect, with a Cohen κ of 0.82 (95% confidence interval, 0.73–0.91). Notably, only 5 cases of major discordance (5.0%) occurred: 4 patients classified as nonresponders with PSF became responders with PSFEARL, and 1 patient classified as a responder with PSF became a nonresponder with PSFEARL.
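The reported κ can be checked from the counts above. Every off-diagonal change is listed, so the PSF × PSFEARL DS table (Table 2) is fully determined by the text. The sketch below rebuilds that table and computes the unweighted Cohen κ. The 95% confidence interval uses the simple large-sample standard error √(p_o(1 − p_o) / (n(1 − p_e)²)). This estimator is an assumption because the method used by MedCalc is not stated. With it, the sketch returns the κ and interval reported above.

```python
import math


def cohen_kappa(matrix, z=1.96):
    """Unweighted Cohen kappa for a square agreement table, with an approximate CI.

    Standard error (assumed): sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2)).
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    p_o = sum(matrix[i][i] for i in range(k)) / n  # observed agreement
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)
    se = math.sqrt(p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2))
    return kappa, kappa - z * se, kappa + z * se


# Rows: DS1-DS5 with PSF; columns: DS1-DS5 with PSFEARL (i-PET, n = 100).
IPET = [
    [24, 0, 0, 0, 0],
    [0, 12, 1, 0, 0],
    [0, 5, 17, 1, 0],
    [0, 0, 4, 27, 0],
    [0, 0, 0, 3, 6],
]
kappa, lo, hi = cohen_kappa(IPET)
print(f"kappa = {kappa:.2f} (95% CI, {lo:.2f}-{hi:.2f})")
# prints: kappa = 0.82 (95% CI, 0.73-0.91)
```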
Table 3 presents the clinical and PET characteristics, as well as the outcomes, of these patients. For these 5 patients (Table 3), PSF and PSFEARL ΔSUVmax status was concordant in all cases, and the ΔSUVmax status always matched the PSFEARL DS status. In addition, PSF and PSFEARL ΔSUVmax values were highly correlated for all patients in the i-PET group, as shown in Supplemental Figure 1 (supplemental materials are available at http://jnm.snmjournals.org).
EoT-PET Examinations
For EoT-PET examinations, patients were classified using PSF SUVs as DS1, DS2, DS3, DS4, and DS5 in 36, 14, 19, 17, and 9 cases, respectively, yielding 69 responders and 26 nonresponders. Patients were classified using PSFEARL SUVs as DS1, DS2, DS3, DS4, and DS5 in 36, 18, 18, 15, and 8 cases, respectively, which resulted in 72 responders and 23 nonresponders. DS discordance between PSF and PSFEARL occurred in 8 cases (8.4%): 4 patients moved from DS3 to DS2, 3 from DS4 to DS3, and 1 from DS5 to DS4 (Table 4). Concordance between PSF and PSFEARL was almost perfect, with a κ of 0.89 (95% confidence interval, 0.81–0.96). Notably, only 3 cases of major discordance occurred (3.2%), all of which involved a change from nonresponder with PSF to responder with PSFEARL.
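Major discordance is the subset of DS changes that cross the DS3/DS4 boundary. The sketch below rebuilds the EoT-PET PSF × PSFEARL table (Table 4) from the counts and transitions above and collapses it to responder status. `discordance` is a hypothetical helper.

```python
def responder(ds: int) -> bool:
    """DS1-DS3 = responder, DS4-DS5 = nonresponder."""
    return ds <= 3


def discordance(matrix):
    """Return (n, any DS change, major discordance) from a PSF x PSFEARL DS table."""
    n = changed = major = 0
    for psf_ds, row in enumerate(matrix, start=1):
        for earl_ds, count in enumerate(row, start=1):
            n += count
            if psf_ds != earl_ds:
                changed += count
                if responder(psf_ds) != responder(earl_ds):
                    major += count
    return n, changed, major


# Rows: DS1-DS5 with PSF; columns: DS1-DS5 with PSFEARL (EoT-PET, n = 95).
EOT = [
    [36, 0, 0, 0, 0],
    [0, 14, 0, 0, 0],
    [0, 4, 15, 0, 0],
    [0, 0, 3, 14, 0],
    [0, 0, 0, 1, 8],
]
n, changed, major = discordance(EOT)
print(f"{changed}/{n} = {100 * changed / n:.1f}%; major {major}/{n} = {100 * major / n:.1f}%")
# prints: 8/95 = 8.4%; major 3/95 = 3.2%
```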
Table 3 presents the clinical and PET characteristics, as well as the outcomes, for these patients. Figure 1 displays a representative example of a patient who had major discordance between PSF and PSFEARL DSs. Figure 2 displays the lesion-to-liver ratios in DS3–DS5 patients.
Survival Analysis
I-PET Examinations
For i-PET examinations, the median follow-up time for patients was 28.4 mo (range, 3–84 mo). At 2 y, 15 patients (15.0%) experienced progression or relapse of their DLBCL, and 12 patients (12.0%) died from lymphoma disease (lymphoma itself or treatment side effects). For the whole group, the estimated PFS at 2 y was 80.2% ± 4.1%, and the estimated OS at 2 y was 87.3% ± 3.4%. When PSF was used to classify patients, there was a significant difference between the OS and PFS of responders versus nonresponders (Figs. 3A and 3C). The same degree of significance was observed with PSFEARL (Figs. 3B and 3D).
EoT-PET Examinations
For EoT-PET examinations, the median follow-up time for patients was 27.4 mo (range, 6–84 mo). At 2 y, 15 patients (15.8%) experienced progression or relapse of their DLBCL, and 10 patients (10.5%) died from the lymphoma disease (either from lymphoma itself or treatment side effects). For the whole group, the estimated PFS at 2 y was 80.3% ± 4.2%, and the estimated OS at 2 y was 88.5% ± 3.4%. When PSF was used to classify patients, there was a significant difference between the OS and PFS of responders versus nonresponders (Figs. 4A and 4C). Similar results were observed with PSFEARL (Figs. 4B and 4D).
DISCUSSION
In this study, using a methodology that avoided inter- and intraobserver variability, the proportion of cases with discordant DSs between unfiltered PSF reconstruction and EARL-compliant reconstruction of the same PET raw data was moderate (14.0% in i-PET and 8.4% in EoT-PET). When patients were classified as responders (DS1–DS3) versus nonresponders (DS4 and DS5), discordance was even less frequent: 5.0% in i-PET and 3.2% in EoT-PET. More important, we assessed the clinical relevance of the changes induced by PSF modeling and found that the risk stratification capability of 18F-FDG PET in DLBCL patients was not affected by the choice of reconstruction algorithm, because similar 2-y PFS and OS were observed with both algorithms. Of note, most discordant cases were related to minimal changes in the lesion-to-liver ratio on PSF versus EARL-compliant images and would have been eliminated by a less stringent cutoff, such as a lesion-to-liver ratio of 1.4 for i-PET (16).
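The effect of the cutoff can be illustrated with a single hypothetical residual mass. Its lesion-to-liver ratio is slightly above 1 on PSF images and slightly below 1 on EARL-compliant images. The values and the `nonresponder` helper are hypothetical. A cutoff of 1.0 corresponds to the DS4 boundary (uptake > liver).

```python
def nonresponder(suv_lesion: float, suv_liver: float, cutoff: float = 1.0) -> bool:
    """Nonresponder if the lesion-to-liver SUVmax ratio exceeds `cutoff`."""
    return suv_lesion / suv_liver > cutoff


# Hypothetical residual mass measured on both reconstructions.
psf = {"lesion": 2.9, "liver": 2.5}   # ratio 1.16
earl = {"lesion": 2.3, "liver": 2.4}  # ratio about 0.96

for cutoff in (1.0, 1.4):
    a = nonresponder(psf["lesion"], psf["liver"], cutoff)
    b = nonresponder(earl["lesion"], earl["liver"], cutoff)
    print(cutoff, "concordant" if a == b else "discordant")
```

With the 1.0 cutoff, the two reconstructions disagree. With 1.4, both classify the patient as a responder, because neither ratio comes close to the higher threshold.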
It is noteworthy that the PSF reconstruction setting used in our PET center can be considered the worst-case scenario in terms of SUV reconstruction dependency because we do not use any postfiltering. This setting leads to the largest SUV differences in residual masses relative to a standard algorithm, such as ordered-subset expectation maximization, or an EARL-compliant reconstruction. Our group has reported a 66% increase in SUVmax for unfiltered PSF compared with an ordered-subset expectation maximization reconstruction meeting the EARL requirements in small nodal metastases (7). Many centers use a filter with a small kernel (2–3 mm) (13); therefore, the number of discordant cases in DS would likely have been even lower in these centers.
On the basis of the present results, one could conclude that, contrary to the cautious recommendation of some multicenter studies such as the RATHL trial (10), disabling the PSF or PSF+TOF capability of a PET system does not seem warranted when obtaining DSs for a PET response–adapted trial. Furthermore, because PSF and PSF+TOF have been shown to provide similar quantification in terms of activity recovery (13), this statement could be extrapolated to PSF+TOF. Going further, quantitative data from the baseline scan are not required to obtain DSs on i-PET or EoT-PET. Only visual review of the baseline scan remains highly recommended, to identify the initial disease sites and select the target lesion for DS assessment, which improves interobserver concordance (17). One could therefore consider that a patient might be scanned on different PET systems at baseline and follow-up.
However, several considerations argue for caution regarding the use of PSF in PET-driven trials and tolerance of reconstruction inconsistencies between baseline and posttreatment scans. First, scanning a patient on different PET systems would be an issue when computing ΔSUV between baseline PET and i-PET (18). For example, using an EARL-compliant system at baseline and an advanced algorithm such as PSF at i-PET would artificially lower the ΔSUV and yield an inaccurate therapeutic assessment, potentially leading to inappropriate changes in patient management. Second, the use of SUV metrics as prognostic factors when pooling data from several PET systems would be affected by reconstruction inconsistencies. Finally, delineation of metabolically active tumor volume, which is increasingly used as a prognostic factor in lymphoma patients (19,20), is strongly affected by advanced reconstruction algorithms (21). For these reasons, it seems preferable to pursue ongoing harmonization efforts with programs such as EARL or QIBA.
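A worked example shows the direction of the ΔSUV bias. Assume, purely for illustration, a baseline SUVmax of 20 and an i-PET SUVmax of 5 on EARL-compliant images. Assume further that unfiltered PSF raises the i-PET value by the 66% reported for small nodal metastases (7); whether that factor applies to a given residual mass is itself an assumption:

$$\Delta\mathrm{SUV}_{\max}^{\mathrm{EARL}} = \frac{20 - 5}{20} \times 100 = 75\%, \qquad \Delta\mathrm{SUV}_{\max}^{\mathrm{PSF}} = \frac{20 - 1.66 \times 5}{20} \times 100 = \frac{20 - 8.3}{20} \times 100 = 58.5\%$$

With the 70% cutoff, this hypothetical patient would be classified as a responder with matched reconstructions but as a nonresponder when the i-PET scan is read on PSF images.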
In favor of harmonized reconstructions, analysis of discordant cases in the i-PET group (Table 3) showed that ΔSUV was always concordant between both algorithms (i.e., above or below the 70% threshold validated after 4 cycles of chemotherapy in DLBCL patients (18)) and was also concordant with the DS provided by the EARL-compliant reconstruction. This finding must, however, be confirmed by further studies investigating the impact of the reconstruction algorithm on ΔSUV.
Finally, although this was not investigated in the present study, it is noteworthy that discordance between PSF and EARL-compliant reconstructions led to changes between DS2 and DS3, which could affect patient allocation in deescalation trials. Indeed, in those trials, DS3 on early i-PET is generally not considered sufficient for entry into the deescalated arm, and DS2 is required (1,5). Further research is therefore needed before the findings of the present study are applied to deescalation trials.
CONCLUSION
In this study, the i-PET and EoT-PET DSs of DLBCL patients were minimally affected by the choice of PET reconstruction method, and the observed changes did not affect the risk stratification capability of 18F-FDG PET. The use of advanced reconstruction algorithms, such as PSF modeling, seems not to be an issue in routine clinical procedures or in multicenter trials when the DS is computed. However, it seems preferable to pursue ongoing harmonization efforts, at least when computing ΔSUV or pooling SUVs provided by different PET systems.
DISCLOSURE
No potential conflict of interest relevant to this article was reported.
Footnotes
Received for publication September 27, 2017.
Accepted for publication November 28, 2017.
Published online Dec. 14, 2017.
© 2018 by the Society of Nuclear Medicine and Molecular Imaging.
REFERENCES