Abstract
Considering the different treatment strategy for transformed follicular lymphoma (TF) as opposed to follicular lymphoma (FL), diagnosing transformation early in the disease course is important. There is evidence that 18F-FDG has utility as a biomarker of transformation. However, quantitative thresholds may require inclusion of homogeneous non-Hodgkin lymphoma subtypes to account for differences in tracer uptake per subtype. Moreover, because proliferation is a hallmark of transformation, 3′-deoxy-3′-18F-fluorothymidine (18F-FLT) might be superior to 18F-FDG in this setting. To define the best tracer for detection of TF, we performed a prospective head-to-head comparison of 18F-FDG and 18F-FLT in patients with FL and TF. Methods: 18F-FDG and 18F-FLT PET scans were obtained in 17 patients with FL and 9 patients with TF. We measured the highest maximum standardized uptake value (SUVmax), defined as the SUVmax of the lymph node with the highest uptake per patient, and SUVrange, defined as the difference between the SUVmax of the lymph node with the highest and lowest uptake per patient. To reduce partial-volume effects, only lymph nodes larger than 3 cm3 (A50 isocontour) were analyzed. Scans were acquired 1 h after injection of 185 MBq of 18F-FDG or 18F-FLT. To determine the discriminative ability of SUVmax and SUVrange of both tracers for TF, receiver-operating-characteristic curve analysis was performed. Results: The highest SUVmax was significantly higher for TF than FL for both 18F-FDG and 18F-FLT (P < 0.001). SUVrange was significantly higher for TF than FL for 18F-FDG (P = 0.029) but not for 18F-FLT (P = 0.075). The ability of 18F-FDG to discriminate between FL and TF was superior to that of 18F-FLT for both the highest SUVmax (P = 0.039) and the SUVrange (P = 0.012). The cutoff value for the highest SUVmax of 18F-FDG aiming at 100% sensitivity with a maximum specificity was found to be 14.5 (corresponding specificity, 82%). For 18F-FLT, these values were 5.1 and 18%, respectively. When the same method was applied to SUVrange, the cutoff values were 5.8 for 18F-FDG (specificity, 71%) and 1.5 for 18F-FLT (specificity, 36%). Conclusion: Our data suggest that 18F-FDG PET is a better biomarker for TF than 18F-FLT PET. The proposed thresholds of highest SUVmax and SUVrange should be prospectively validated.
Follicular lymphoma (FL) is the most common form of indolent B cell non-Hodgkin lymphoma, accounting for about 30% of all non-Hodgkin lymphomas. Its clinical course varies and is characterized by repeated but transient responses to therapy. Histologic transformation into an aggressive lymphoma occurs in 17%, 28%, and 37% of FL patients after 5, 10, and 15 y, respectively, with an apparent plateau at 15 y, after which transformation rarely seems to occur (1). There is increasing evidence that autologous consolidation of transformation of FL (TF) patients as first-line treatment may improve survival (2–4). Furthermore, retrospective analyses suggest that patients can be cured more often when transformation is diagnosed at an early stage (5,6). Consequently, correct and early diagnosis is a prerequisite for adequate treatment of patients with TF. Transformation can be heralded by rapid growth of lymph nodes, an elevated lactate dehydrogenase, or development of systemic symptoms (7). Histology remains the gold standard, defining transformation as the presence of sheets of blastic cells or frank diffuse large B cell lymphoma in a patient diagnosed with FL. Therefore, it is mandatory to perform biopsy at the slightest suspicion of transformation. However, because transformation may not involve all lymph nodes, sampling errors can lead to a significant diagnostic delay.
This problem might be overcome by the use of PET because this technique allows for whole-body tissue characterization, enabling determination of areas of high metabolic or proliferative activity. Currently, 18F-FDG PET is used for staging and response evaluation in both aggressive and more indolent types of lymphoma (8). There is a clear trend toward higher 18F-FDG uptake in more aggressive histologic subtypes. Therefore, a high uptake in an indolent lymphoma could support the suspicion of transformation. However, there is a considerable overlap in 18F-FDG uptake between aggressive and indolent lymphomas, potentially impairing its utility to detect transformation (9–11). To overcome this problem, alternative tracers might be useful. Conceptually, 3′-deoxy-3′-18F-fluorothymidine (18F-FLT) reflects proliferation more closely than 18F-FDG (12,13). The limited data on 18F-FLT PET in patients with transformed FL suggest a higher 18F-FLT uptake in aggressive lymphoma than in indolent lymphoma, albeit with overlap (14,15).
Studies on the role of PET in the detection of transformation typically comprise a spectrum of histologic subtypes, reporting considerable variability in uptake of 18F-FDG. However, because 18F-FDG uptake may strongly vary among histologic subtypes of indolent lymphoma (16) and their transformation (10), thresholds of 18F-FDG uptake (standardized uptake value) to detect transformation may be a function of the subtype.
To define the best discriminative tracer for the detection of TF, we performed a prospective study with a head-to-head comparison of 18F-FDG and 18F-FLT in a homogeneous patient group consisting of patients with FL and TF only. In addition to maximum tracer uptake, the intrapatient variability of tracer uptake was determined because this parameter might be a more accurate indicator for transformation.
MATERIALS AND METHODS
Patients with untreated histologically proven FL and patients with histologically proven TF were eligible. FL patients underwent a biopsy to establish the diagnosis, defined according to the World Health Organization classification (17), and were included based on this histology. Because it is unethical to obtain a biopsy of all involved lymph nodes in FL patients to rule out histologic transformation in every separate lymph node, we defined FL as a pathologically proven diagnosis of FL in a lymph node, confirmed retrospectively by a clinical course fitting FL. The clinical course comprised either no need for therapy for at least 1 y after inclusion in the study or a complete or partial remission on CT scan after therapy for indolent lymphoma (i.e., therapy without anthracyclines) followed by a treatment-free period of more than 3 mo.
In TF patients, a biopsy was taken because of clinical symptoms suggesting transformation (B symptoms, localized tumor mass growth, or elevated lactate dehydrogenase). Transformation was defined as (areas of) diffuse large B cell lymphoma in a biopsy obtained from a patient previously diagnosed with FL.
The treating hematologist was masked to all data except for the staging results (qualitative assessment) of the 18F-FDG scan, in the context of standard patient care.
Patients were included when they had at least 1 lymph node with a diameter of at least 2 cm (measured on CT scan or ultrasound). Patients were excluded if treatment was started before PET/CT or if they had (transformation of) types of indolent non-Hodgkin lymphomas other than FL. In accordance with the Declaration of Helsinki, all patients gave written informed consent to participate in this single-center study, which was approved by the institutional review board. This trial was registered in the Dutch trial register (NTR code 1487).
PET
Each patient underwent 18F-FDG as well as 18F-FLT PET/CT within 1 wk, in random order, depending on logistics. After at least 6 h of fasting, patients were injected with approximately 185 MBq of 18F-FDG or 18F-FLT intravenously. All studies were performed on a Gemini TOF-64 PET/CT scanner (Philips). Low-dose CT was acquired using a tube current–time product of 30–50 mAs at 120 kV. Images (3 min per bed position) covered the mid thigh to skull vertex trajectory, starting 60 min after injection. Plasma glucose levels were routinely obtained before 18F-FDG PET/CT. Calibration and scanning procedures complied with the guidelines of the European Association of Nuclear Medicine (18).
CT images were reconstructed using an image matrix size of 512 × 512, resulting in voxel sizes of 1.17 × 1.17 mm and a slice thickness of 5 mm. For PET, data were reconstructed by means of a row-action ordered-subset expectation maximization algorithm using default reconstruction parameters. Time-of-flight information was used during reconstruction. Reconstructed images had an image matrix size of 144 × 144, a voxel size of 4 × 4 mm, and a slice thickness of 4 mm. The postreconstruction image resolution was 7 mm in full width at half maximum.
PET images were evaluated by 2 independent observers. Nodal 18F-FDG uptake was classified as positive if uptake exceeded that of liver. 18F-FLT uptake was positive if uptake was enhanced, compared with local background.
18F-FDG and 18F-FLT uptake as defined with standardized uptake value (SUV) (maximum SUV [SUVmax] and 50% and 70% of the sum of maximum and background values [SUV A50% and SUV A70%, respectively]) were measured for all visually positive lymph nodes of at least 3 cm3 (as defined with A50 volume-of-interest isocontouring, to account for partial-volume effects) (19,20).
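For orientation, the body-weight normalization and the glucose correction applied to the 18F-FDG SUVs in this study can be sketched as follows. This is an illustrative sketch only, not the software used here. The function names are hypothetical, the activity concentration and injected activity are assumed to be decay-corrected to the same time point, and the 5.0 mmol/L reference glucose level is an assumption rather than a value stated in this article.

```python
def suv_body_weight(conc_kbq_per_ml, injected_mbq, weight_kg):
    """Body-weight-normalized SUV (g/mL).

    1 MBq/kg equals 1 kBq/g, so the tissue concentration (kBq/mL)
    divided by injected activity per body weight (MBq/kg) gives g/mL.
    Both activities are assumed decay-corrected to the same time point.
    """
    if injected_mbq <= 0 or weight_kg <= 0:
        raise ValueError("injected activity and body weight must be positive")
    return conc_kbq_per_ml * weight_kg / injected_mbq


def glucose_corrected_suv(suv, glucose_mmol_l, reference_mmol_l=5.0):
    """Scale an 18F-FDG SUV by plasma glucose relative to a reference level.

    The 5.0 mmol/L reference is an assumption made for this sketch.
    """
    return suv * glucose_mmol_l / reference_mmol_l


# Hypothetical example: 18.5 kBq/mL in a node, 185 MBq injected, 80 kg patient.
example_suv = suv_body_weight(18.5, 185.0, 80.0)
example_suv_glu = glucose_corrected_suv(example_suv, 6.0)
```

With these hypothetical inputs, a plasma glucose above the reference level raises the corrected SUV, which is why the correction matters for patients such as the diabetic patient described in the Results.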
Tumor volumes of interest were defined using a 3-dimensional (3D) region-growing algorithm, as described previously (21). This algorithm is based on the 3D search algorithm in the IDL software package (Interactive Data Language, version 6.3; Research Systems Inc.). In short, the program first searched for the location of the maximum voxel value within a (semiautomatically or manually) predefined region. Next, using this maximum value (SUVmax) and its location as a starting point, a 3D volume of interest was defined automatically using a 3D region-growing algorithm, including all voxels above a specified threshold. This threshold was set at SUV A50% and SUV A70%. The local background value was derived automatically using a 3D shell of 1 voxel thickness at 1.5 cm from the border of the initially estimated or predefined tumor volume. This initial estimate was based on the 70% of maximum pixel value 3D isocontour (22,23). SUVs were normalized to body weight and to serum glucose for 18F-FDG.
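The region-growing step described above can be illustrated with a minimal Python sketch. It is not the IDL implementation used in this study: the function names, the 6-neighbor connectivity, and the assumption that the local background value is supplied by the caller (rather than derived from a 3D shell) are all simplifications chosen for illustration.

```python
from collections import deque

import numpy as np


def grow_voi(suv, seed_region, background, fraction=0.5):
    """Grow a 3D VOI from the hottest voxel inside seed_region.

    The threshold is `fraction` times the sum of the maximum and the
    background value (0.5 for A50, 0.7 for A70). Returns (mask, suv_max).
    """
    # Step 1: locate the maximum voxel within the predefined region.
    masked = np.where(seed_region, suv, -np.inf)
    seed = tuple(int(i) for i in np.unravel_index(np.argmax(masked), suv.shape))
    suv_max = float(suv[seed])
    threshold = fraction * (suv_max + background)

    # Step 2: add connected voxels above the threshold (6-connectivity assumed).
    mask = np.zeros(suv.shape, dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in steps:
            n = (z + dz, y + dy, x + dx)
            inside = all(0 <= n[i] < suv.shape[i] for i in range(3))
            if inside and not mask[n] and suv[n] > threshold:
                mask[n] = True
                queue.append(n)
    return mask, suv_max


def voi_volume_cm3(mask, voxel_mm=(4.0, 4.0, 4.0)):
    """VOI volume in cm3 for the given voxel dimensions in mm."""
    voxel_cm3 = voxel_mm[0] * voxel_mm[1] * voxel_mm[2] / 1000.0
    return int(mask.sum()) * voxel_cm3
```

With the 4 × 4 × 4 mm PET voxels reported above, each voxel contributes 0.064 cm3, so the 3 cm3 inclusion criterion corresponds to an A50 volume of roughly 47 voxels or more.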
Because transformation in patients with FL might not occur in all lymph nodes simultaneously, we hypothesized that the intrapatient variability of tracer uptake might reflect the process of transformation. For either tracer, and for each patient, apart from measuring the SUVmax of the most avid lymph node (highest SUVmax) we calculated the SUVrange, defined as the difference between maximum and minimum uptake within an individual patient.
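The two per-patient summary measures defined above reduce to simple arithmetic on the SUVmax values of a patient's measured lymph nodes. The sketch below uses a hypothetical function name and made-up values, purely to make the definitions concrete.

```python
def highest_suvmax_and_range(node_suvmax):
    """Return (highest SUVmax, SUVrange) for one patient and one tracer.

    node_suvmax: SUVmax of each measured lymph node in that patient.
    SUVrange is the difference between the highest and lowest nodal SUVmax.
    """
    values = list(node_suvmax)
    if not values:
        raise ValueError("at least one lymph node is required")
    highest = max(values)
    return highest, highest - min(values)


# Hypothetical patient with three measured nodes.
highest, suv_range = highest_suvmax_and_range([4.0, 9.5, 14.5])
```

A patient with a single measured node has an SUVrange of 0 under this definition, so the measure is informative only when several nodes are available.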
Statistics
Correlations were calculated using the Pearson r method. To compare follow-up times and SUVs between FL and TF groups, we used the nonparametric Mann–Whitney U test. The discriminative ability of the highest SUVmax and SUVrange to distinguish the absence and presence of transformation was quantified by means of the area under the receiver-operating-characteristic curve, using our definition of transformation (see the “Materials and Methods” section) as the reference test. From this receiver-operating-characteristic curve analysis, we also determined a cutoff value for detection of transformation. The cutoff value chosen was the largest cutoff value for which sensitivity in the sample was 100% (i.e., maximizing specificity under the restriction of no false-negatives).
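The cutoff rule (maximizing specificity under the restriction of no false-negatives) and the area under the curve can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS or SAS procedure used here: the function names are hypothetical, a patient is assumed to test positive when the value is at or above the cutoff, and the input values in the example are invented.

```python
def cutoff_full_sensitivity(tf_values, fl_values):
    """Cutoff with 100% sensitivity and maximal specificity in the sample.

    Assumes 'positive' means value >= cutoff, so the lowest TF value is the
    highest cutoff that still classifies every TF patient as positive.
    Returns (cutoff, specificity).
    """
    cutoff = min(tf_values)
    true_negatives = sum(1 for v in fl_values if v < cutoff)
    return cutoff, true_negatives / len(fl_values)


def roc_auc(tf_values, fl_values):
    """Area under the ROC curve via the Mann-Whitney pair-counting identity.

    Counts the fraction of (TF, FL) pairs in which the TF value is higher,
    with ties counted as one half.
    """
    score = sum((t > f) + 0.5 * (t == f) for t in tf_values for f in fl_values)
    return score / (len(tf_values) * len(fl_values))


# Invented values for illustration only.
tf = [15.0, 20.0, 30.0]
fl = [5.0, 8.0, 16.0, 10.0]
cutoff, specificity = cutoff_full_sensitivity(tf, fl)
auc = roc_auc(tf, fl)
```

Statistical packages may report a cutoff located between two observed values rather than at an observed value, so a reported threshold need not coincide exactly with the lowest TF measurement.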
Sample size was based on the comparison of mean SUVmax between the FL and TF groups. The planned number of 17 per group would provide 80% power to detect a difference of 1 SD (∼5 units) in mean SUVmax, assuming 2-sided testing at a significance level of 5%. To protect patients from both the physical and the radiation burden of 2 consecutive PET scans, the institutional review board requested an analysis after inclusion of half of the TF patients. This paper presents the results of the study after inclusion of 9 (of a planned number of 17) TF patients. By that time the planned inclusion of 17 FL patients had already been completed. Statistical analyses were performed using the SPSS statistical package (version 20.0; IBM), except for comparison of areas under the curve (AUCs) between 18F-FDG and 18F-FLT, which was performed in SAS (version 9.2; SAS Institute Inc.).
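The sample-size reasoning can be checked approximately with a normal-approximation formula for comparing two means. The sketch below is not the calculation performed for this study; the function name is hypothetical, and the normal approximation is an assumption that slightly underestimates the number required by a t-distribution-based or nonparametric calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_sd, alpha=0.05, power=0.80):
    """Approximate patients per group for a 2-sided comparison of two means.

    effect_size_sd: difference to detect, expressed in SD units.
    Uses n = 2 * ((z_(1 - alpha/2) + z_power) / effect_size)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size_sd) ** 2


# Difference of 1 SD, 2-sided alpha of 5%, 80% power.
approx_n = ceil(n_per_group(1.0))
```

Under this approximation about 16 patients per group are needed, which is in line with the planned 17 per group once the small-sample correction is taken into account.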
RESULTS
From November 2008 until June 2011, we included 17 patients with FL and 9 with histologically proven TF. Median clinical follow-up of all patients was 31.5 mo (range, 14–43 mo). Follow-up time was similar for FL and TF patients (P = 0.79, Table 1). All patients with FL histology at the time of PET/CT satisfied our definition of FL during their subsequent disease course: 6 did not need immediate treatment, 2 of them eventually required treatment during follow-up (after 17 and 21 mo), and 1 of them was diagnosed with TF after 21 mo (sudden increase of a previously stable lymph node). The remaining 11 FL patients reached complete remission on CT scan after chemoimmunotherapy, with a median response duration of 30 mo (range, 14–43 mo). All FL patients were alive at last follow-up.
Eight of 9 TF patients reached complete remission on PET/CT after induction therapy, 7 of whom were eligible for consolidation with autologous stem cell transplantation. Of these 7 patients, only 1 patient relapsed after 30 mo. The patient without consolidation died of secondary acute myeloid leukemia 34 mo after her treatment. In the single patient who obtained a partial remission only on PET/CT after induction therapy, the autologous stem cell transplantation did not result in an improvement of response and progression occurred 3 mo after transplant, eventually leading to death. Median progression-free survival and overall survival for TF patients were both 29 mo (Table 1).
For either tracer, the mean uptake interval between injection and image acquisition was 61 min (SD, 7.9 min). During 18F-FDG PET examination, serum glucose levels ranged from 5.4 to 7.2 mmol/L, except in 1 diabetic TF patient who had a plasma glucose level of 16 mmol/L.
The number of visually positive lymph nodes was similar for 18F-FDG and 18F-FLT PET.
We measured SUVs in 259 lymph nodes in the 26 patients (median, 9 per patient; range, 2–23). Because results of the various SUV metrics were highly concordant for either tracer (r = 0.99, P < 0.01), we report only the SUVmax-based data. SUV A50% can be inferred by multiplying SUVmax by 0.68.
In individual patients, the most avid lymph node was the same for 18F-FDG and 18F-FLT in only 42% (11/26 patients; 5 FL and 6 TF).
The highest intrapatient SUVmax was significantly higher for TF than FL for both 18F-FDG and 18F-FLT (Table 2; both P < 0.001). However, there was a considerable overlap between the SUVmax of TF and FL, for both tracers (Fig. 1). The intrapatient SUVrange of 18F-FDG was significantly higher for TF than FL (Table 2; P = 0.029) but not for 18F-FLT (Table 2; P = 0.075). Values for each individual patient are depicted in Figure 1.
In receiver-operating-characteristic analysis, we found that the ability of 18F-FDG to discriminate between FL and TF was superior to that of 18F-FLT for the highest SUVmax (Table 3; P = 0.039) and for the SUVrange (Table 3, P = 0.012). The cutoff value for the highest SUVmax of 18F-FDG aiming at 100% sensitivity with a maximum specificity was 14.5, with a corresponding specificity of 82% (for 18F-FLT, 5.1 and 18%, respectively). When the same method was applied to the intrapatient SUVrange, the cutoff values were 5.8 for 18F-FDG (corresponding specificity, 71%) and 1.5 for 18F-FLT (corresponding specificity, 36%).
DISCUSSION
In view of the different treatment strategy for TF as opposed to FL, diagnosing transformation early in the course of the disease is of utmost importance. Our head-to-head comparison of 18F-FDG and 18F-FLT in a homogeneous group of patients with either FL or histologically proven TF suggests that when the highest SUVmax or the SUVrange is used, 18F-FDG is superior to 18F-FLT in the detection of TF. When thresholds maximizing sensitivity were used, 18F-FDG’s highest SUVmax and SUVrange correctly identified all transformed patients, misclassifying 3 and 5 FL patients as TF, respectively. In contrast, the highest SUVmax and SUVrange of 18F-FLT were not suited to detect transformation: here, with the aim of detecting all transformed patients, 14 and 12 FL patients were erroneously classified as TF, respectively.
Other studies using 18F-FDG in this setting included mixtures of several lymphoma subtypes, and this heterogeneity may have contributed to the lack of consistency of thresholds of highest SUVmax or SUVrange (9–11). For example, our median 18F-FDG highest SUVmax for FL (10.9, Table 2) is higher than the threshold of 10 proposed by Schöder et al., excluding indolent lymphoma with a specificity of 81% (9). 18F-FDG avidity seems to be related to the histologic subtype of indolent lymphoma and its transformation (16,24,25). Noy et al. reported higher 18F-FDG uptake in transformed FL than in transformed marginal zone lymphoma and chronic lymphocytic leukemia (10). We therefore suggest that thresholds indicating transformation should be investigated in homogeneous patient cohorts. Research on absolute thresholds will strongly benefit from the implementation of standardization of quantitative procedures as proposed in the guidelines of the European Association of Nuclear Medicine (18).
For biologic reasons, we hypothesized that 18F-FLT would be superior to 18F-FDG in detecting transformation. 18F-FLT has been reported as a specific biomarker of proliferation (12,13,15). However, we could neither determine a cutoff value for the highest SUVmax nor find a significant difference between the SUVrange of TF and FL that would allow differentiation. In our series, at optimal sensitivity, the specificity of only 36% would imply an unacceptably high proportion of patients requiring a biopsy to exclude transformation. The 58% discordance rate between nodal sites of highest 18F-FDG and 18F-FLT uptake confirms that these tracers reflect different biologic processes. The poor performance of 18F-FLT may call into question its specificity for proliferation. In an earlier study on FL patients, we showed that 18F-FLT uptake was poorly associated with Ki-67 expression. The observed high 18F-FLT uptake in FL may also be due to 18F-FLT being a substrate for DNA repair (26). The reverse of this hypothesis would be that TF shows a lower uptake than expected based on proliferation. It has been shown that 18F-FLT uptake is underestimated if the tumor relies primarily on de novo thymidine synthesis, thereby bypassing the thymidine salvage pathway that is also used by 18F-FLT (27). It is not known to what extent TF uses this de novo pathway and, consequently, whether TF shows lower 18F-FLT uptake despite being highly proliferative. Moreover, in preclinical models high intrinsic thymidine levels can also inversely affect 18F-FLT uptake, leading to less uptake despite a high tumor proliferation rate. The clinical impact of this phenomenon remains to be determined (28).
In our original study protocol, we had not specified an α-spending function for the interim analysis requested by the ethics committee. In a formal interim analysis, the P values for comparing AUCs that were found would likely have been too large to conclude significance and so strictly we would have had to include an additional 8 TF patients. However, after weighing the burden for the additional patients and our assessment of the probability that in the final analysis a significant difference would have been found in favor of 18F-FLT, it was decided to end the study prematurely.
Our data and thresholds need to be validated, for example, by prospectively implementing 18F-FDG PET routinely on suspected FL transformation. We speculate that in such a setting performance might be better than we have currently observed: our study design did not allow inclusion of critically ill TF patients with high disease burden (and most likely high uptake) because it was unethical to delay treatment until both PET/CT scans had been obtained. Additionally, we cannot exclude that our threshold results were quantitatively biased by the fact that at the time of PET/CT the largest or most rapidly growing lymph node had been excised for histology in the TF patients. Such bias would likely lead to underestimated sensitivity and specificity of highest intrapatient SUVmax and SUVrange (9). On the basis of our data, we suggest that for optimal detection of TF, PET/CT should be performed before the biopsy. At that moment the diagnostic accuracy is optimal; moreover, given the high intraindividual heterogeneity in uptake, PET will be helpful in the decision of where to biopsy. Although no study has reported biopsies of all lymph nodes in a patient, we share the opinion that the lymph node with the highest uptake is most likely the transformed lymph node, also considering data showing that uptake correlates with aggressiveness (9–11,16).
CONCLUSION
Our data suggest that 18F-FDG PET is a better biomarker of TF than 18F-FLT PET. Our proposed SUV-based thresholds indicating TF should be prospectively validated in a real-life clinical setting that is compliant with prevailing guidelines for quantitative 18F-FDG PET.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Jan. 15, 2015.
- © 2015 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
- Received for publication October 7, 2014.
- Accepted for publication December 4, 2014.