Abstract
Evaluation of tumor heterogeneity based on texture parameters has recently attracted much interest in the PET imaging community. However, the impact of reconstruction settings on texture parameters is unclear, especially relating to time-of-flight and point-spread function modeling. Their effects on 55 texture features (TFs) and 6 features based on first-order statistics (FOS) were investigated. Standardized uptake value (SUV) measures were also evaluated as peak SUV (SUVpeak), maximum SUV, and mean SUV (SUVmean). Methods: This study retrospectively recruited 20 patients with lesions in the lung who underwent whole-body 18F-FDG PET/CT. The coefficient of variation (COV) of each feature across different reconstructions was calculated. Results: SUVpeak, SUVmean, 18 TFs, and 1 FOS were the most robust (COV ≤ 5%) whereas skewness, cluster shade, and zone percentage were the least robust (COV > 20%) with respect to reconstruction algorithms using default settings. Heterogeneity parameters had different sensitivities to iteration number. Twenty-four parameters including SUVpeak and SUVmean exhibited variation with a COV less than 5%; 28 parameters, including maximum SUV, showed variation with a COV in the range of 5%–10%. In addition, skewness, cluster shade, and zone percentage were the most sensitive to iteration number. In terms of sensitivity to full width at half maximum (FWHM), 15 TFs and 1 FOS had the best performance with a COV less than 5%, whereas SUVpeak and SUVmean had a COV between 5% and 10%. Grid size had the largest impact on image features, which was demonstrated by only 11 features, including SUVpeak and SUVmean, having a COV less than 10%. Conclusion: Different image features have different sensitivities to reconstruction settings. Iteration number and FWHM of the gaussian filter have a similar impact on the image features. Grid size has a larger impact on the features than iteration number and FWHM. 
The features that exhibited large variations such as skewness in FOS, cluster shade, and zone percentage should be used with caution. The entropy in FOS, difference entropy, inverse difference normalized, inverse difference moment normalized, low gray-level run emphasis, high gray-level run emphasis, and low gray-level zone emphasis are the most robust features.
Recent studies have shown that tumors often display startling intratumoral heterogeneity, which is often associated with adverse tumor biology (1,2). Unfortunately, it is difficult to assess intratumoral heterogeneity with random sampling or biopsy as this does not represent the full extent of phenotypic or genetic variation within a tumor. Given the limitations of current biopsy strategies, there is an important potential for medical imaging, which has the ability to capture intratumoral heterogeneity in a noninvasive way.
In oncology, 18F-FDG PET is currently playing a major role in clinical diagnosis, staging, prognosis, and assessment of response to treatment (3). The use of standardized uptake value (SUV) is now routine in clinical 18F-FDG PET/CT oncology imaging for quantifying glucose metabolic activity in tissues. Maximum SUV (SUVmax), mean SUV (SUVmean), and peak SUV (SUVpeak) are the commonly used metrics (4). However, none of these SUV measures reflect the underlying spatial distribution of 18F-FDG. There is now growing interest in using texture analysis to assess tumor heterogeneity in the field of oncologic imaging.
The first attempt to use texture features (TFs) in PET imaging was to predict treatment outcome in cervical and head and neck cancer (5). Since then, an increasing number of publications have addressed the application of TFs in PET tumor imaging, mostly with 18F-FDG (5–21). These studies mainly focused on the predictive and prognostic value of TFs (5–10,13–25) and on their potential use in radiotherapy planning (20,21). The test–retest reproducibility of TFs from 18F-FDG PET was shown to be comparable to that of SUV (16). In addition, Galavis et al. reported the variability of TFs as a function of different acquisition methods, reconstruction parameters, and postreconstruction filtering parameters (11). However, only filtered backprojection and ordered-subset expectation maximization (OSEM) were investigated. Time-of-flight (TOF) and point-spread function (PSF) modeling have been shown to clearly improve image quality in terms of signal-to-noise ratio (SNR), lesion contrast, and coefficient of variation (COV) in the background and are now standard in most scanners (22–26). TOF improves SNR and thus reduces heterogeneity due to noise. PSF modeling incorporates into the system matrix the physical processes that degrade image resolution; it typically improves resolution and thus improves the evaluation of tumor heterogeneity by delineating higher-definition structure within a lesion.
In this context, the objective of this study was to investigate variability of TF in 18F-FDG PET images due to different reconstruction settings including postreconstruction gaussian filters with different full width at half maximum (FWHM), different iteration numbers, and voxel size. The filtered backprojection algorithm was not included in this study, because it has been adequately addressed in a previous study (11).
MATERIALS AND METHODS
Patients
This retrospective study was approved by the Domain Specific Review Board and Institutional Review Board of National Health Care Group, Singapore, and the requirement to obtain informed consent was waived. Patients with lesions in the lung who underwent whole-body 18F-FDG PET/CT at the National University Hospital, Singapore, from February 2013 to May 2014 were retrospectively enrolled while excluding those patients with a lesion volume less than 5 cm3 (14). Only the lesions located in the lung were included for analysis. Finally, this study included 17 patients with non–small cell lung cancer (mean age, 70 y; age range, 50–82 y; 12 men, 5 women), 1 female patient with nasopharyngeal cancer (age, 71 y), and 2 female patients with lymphoma (age, 60 and 75 y). Clinical staging for the non–small cell lung cancer cohort was as follows: stage 1A (n = 1), stage 2B (n = 1), stage 3A (n = 1), stage 3B (n = 4), and stage 4 (n = 10). The patients with nasopharyngeal cancer and lymphoma were found to be stage 4.
Data Acquisition
All patients underwent whole-body 18F-FDG PET/CT using a Biograph 64 mCT scanner (Siemens). Patients fasted for at least 8 h to ensure a serum glucose level less than 10 mmol/L. PET acquisition started 71 ± 9 min (range, 52–88 min) after injection of 229.4 ± 22.2 MBq (6.2 ± 0.6 mCi; range, 181.3–270.1 MBq [4.9–7.3 mCi]) of 18F-FDG. PET raw data were acquired for 1 min per bed position with the exception of the bed position covering the liver, which was acquired for 3 min. CT was performed with a tube voltage of 120 kV and a tube current of 50 mAs.
Image Reconstruction
Four investigations listed in Table 1 were performed to explore the impact of reconstruction settings on image features. The numbers of subsets for reconstructions without TOF and with TOF were 24 and 21, respectively. The image slice thickness was 5 mm. In total, 96 images were produced for each patient. The PET images were corrected for attenuation and scatter based on the CT scan.
Tumor Segmentation
In this study, SUV was calculated as 18F-FDG uptake with decay correction normalized to injected dose and patient body weight. A tumor VOI was delineated using a fixed threshold set to 40% of the SUVmax in the lesion and followed by a morphologic closing operation to include necrotic regions since tumor heterogeneity is often associated with hypoxia and subsequent necrosis (27). In addition, a manual adjustment to exclude neighboring nodes or metastases was made for each VOI if necessary. As in the study by Orlhac et al. (14), only lesions with a volume greater than 5 cm3 were included in the subsequent analysis, because the impact of the partial-volume effect and textural measurements may not be relevant in small regions. Thus, the impact of the partial-volume effect can be neglected. Previous studies (22–26) demonstrated that OSEM + PSF + TOF produced better image quality than other methods in terms of SNR, contrast, and lesion detectability. Therefore, in this study, the tumor VOIs were delineated on the image reconstructed by OSEM + PSF + TOF with default settings and then applied to the other methods. This yielded 20 VOIs in non–small cell lung cancer, 2 VOIs in nasopharyngeal cancer, and 2 VOIs in lymphoma with a volume of 36.1 ± 42.3 cm3 (range, 5.3–153.1 cm3). SUVmean, SUVpeak, and SUVmax of the lesions reconstructed by OSEM + PSF + TOF with default settings were 6.4 ± 2.7 (range, 3.1–12.0), 8.5 ± 4.0 (range, 4.0–18.8), and 10.5 ± 4.5 (range, 5.3–21.2), respectively.
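The SUV normalization and the fixed-threshold delineation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the `seed_mask` input (standing in for the manual restriction to a single lesion), the 6-connected structuring element, and the single closing iteration are all assumptions, and NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy import ndimage


def suv_bw(activity_kbq_per_ml, injected_dose_mbq, weight_kg):
    """Body-weight SUV: decay-corrected uptake divided by dose per gram.

    kBq/mL divided by (MBq/kg == kBq/g) gives g/mL.
    """
    return activity_kbq_per_ml * weight_kg / injected_dose_mbq


def segment_lesion(suv, seed_mask, fraction=0.40, closing_iterations=1):
    """Threshold at `fraction` of the lesion SUVmax, then apply morphologic closing.

    suv       : 3-D array of SUV values
    seed_mask : boolean 3-D array roughly bounding one lesion (hypothetical input)
    """
    suv_max = suv[seed_mask].max()
    voi = seed_mask & (suv >= fraction * suv_max)
    # Closing (dilation followed by erosion) fills small low-uptake gaps
    # such as necrotic cores that fall below the threshold.
    structure = ndimage.generate_binary_structure(3, 1)  # 6-connected (assumption)
    voi = ndimage.binary_closing(voi, structure=structure,
                                 iterations=closing_iterations)
    return voi & seed_mask


# Synthetic lesion: a 5 x 5 x 5 cube of uptake with one cold central voxel.
suv = np.zeros((11, 11, 11))
suv[3:8, 3:8, 3:8] = 10.0
suv[5, 5, 5] = 1.0
voi = segment_lesion(suv, np.ones(suv.shape, dtype=bool))
```

In the synthetic example the cold central voxel lies below the 40% threshold but is recovered by the closing step, so the VOI covers the whole cube.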
Image Features
The TFs being used in 18F-FDG PET imaging can be categorized into 2 types: second-order and high-order features. A second-order feature is usually calculated from the gray-level cooccurrence matrix (GLCM) (28). There are various matrices to calculate the high-order features, including the gray-level run length matrix (GLRLM) (29), gray-level size zone matrix (GLSZM) (30), neighboring gray-level dependence matrix (NGLDM) (31), and neighborhood gray-tone difference matrix (NGTDM) (32). To reduce image noise, resampling the original PET values is necessary. Orlhac et al. (14) suggested that at least 32 gray levels should be used in the computation of TFs. Bin sizes of 32, 64, and 128 were investigated in this study. In addition, 6 first-order statistics (FOS) based on histogram analysis were also included. The TFs and FOS and their acronyms are summarized in Supplemental Table 1 (supplemental materials are available at http://jnm.snmjournals.org). In addition, SUVmax, SUVmean, and SUVpeak (a spheric VOI having a volume of 1 mL in a position that provides the maximal VOI average) were computed for comparison.
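The resampling step and histogram-based statistics can be illustrated with a short sketch. The text does not give the resampling formula, so the relative min-max scheme below is an assumption, as are the function names and the choice of entropy, energy, and skewness as examples of FOS.

```python
import numpy as np


def discretize(values, n_levels=64):
    """Resample VOI values into n_levels gray levels between the VOI min and max.

    The min-max scheme is an assumed resampling rule, not taken from the text.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        return np.zeros(values.shape, dtype=int)
    levels = np.floor(n_levels * (values - lo) / (hi - lo)).astype(int)
    # The maximum value maps to n_levels; fold it into the top bin.
    return np.minimum(levels, n_levels - 1)


def first_order_stats(levels, n_levels=64):
    """Histogram entropy and energy, plus skewness of the gray levels."""
    flat = np.asarray(levels).ravel()
    p = np.bincount(flat, minlength=n_levels) / flat.size
    nonzero = p[p > 0]
    x = flat.astype(float)
    sd = x.std()
    skewness = float(np.mean((x - x.mean()) ** 3) / sd ** 3) if sd > 0 else 0.0
    return {
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
        "energy": float(np.sum(p ** 2)),
        "skewness": skewness,
    }


# 64 equally spaced values occupy each of the 64 gray levels exactly once.
levels = discretize(np.arange(64), n_levels=64)
stats = first_order_stats(levels, n_levels=64)
```

A uniform histogram over 64 levels gives the maximum entropy of 6 bits and zero skewness, which is a convenient sanity check before applying the functions to lesion data.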
The 3-dimensional GLCM used in this study was extended from the original 2-dimensional GLCM by summing voxel pair probabilities in a 3-dimensional image. Thirteen different directions and a spatial distance of 1 voxel displacement were selected (14). The average TF over the 13 directions was used in this study, and the same 13 directions were used to calculate GLRLM. The 26 nearest neighbors in 3 dimensions were used for NGLDM and NGTDM. In addition, only the intensity difference of zero between the voxel and its neighbors was considered for NGLDM.
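A 3-dimensional cooccurrence matrix over 13 directions at a distance of 1 voxel can be sketched as below. This is an illustrative reading of the description, not the authors' code: the function names, the symmetric counting, the restriction of voxel pairs to the VOI mask, and the use of contrast as the example feature are assumptions.

```python
import itertools
import numpy as np


def directions_13():
    """Half of the 26 nearest-neighbor offsets; the other half are mirror images."""
    return [d for d in itertools.product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def _paired_slices(d, n):
    """Slices selecting each voxel and its neighbor at offset d along one axis."""
    return slice(max(0, -d), n - max(0, d)), slice(max(0, d), n - max(0, -d))


def glcm_3d(levels, mask, offset, n_levels):
    """Symmetric, normalized cooccurrence matrix for one offset inside the VOI."""
    pairs = [_paired_slices(d, n) for d, n in zip(offset, levels.shape)]
    first = tuple(p[0] for p in pairs)
    second = tuple(p[1] for p in pairs)
    a, b = levels[first], levels[second]
    valid = mask[first] & mask[second]  # both voxels must lie in the VOI
    glcm = np.zeros((n_levels, n_levels))
    np.add.at(glcm, (a[valid], b[valid]), 1)
    glcm = glcm + glcm.T  # count each pair in both orders
    total = glcm.sum()
    return glcm / total if total > 0 else glcm


def glcm_contrast(p):
    """Contrast: sum over (i, j) of p(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))


def mean_over_directions(levels, mask, n_levels, feature):
    """Average one feature over the 13 directions."""
    return float(np.mean([feature(glcm_3d(levels, mask, d, n_levels))
                          for d in directions_13()]))


# Two neighboring voxels with gray levels 0 and 1 along the last axis.
levels = np.array([[[0, 1]]])
mask = np.ones(levels.shape, dtype=bool)
p = glcm_3d(levels, mask, (0, 0, 1), n_levels=2)
```

Averaging the feature values over directions, rather than pooling counts into one matrix, follows the wording "the average TF over the 13 directions".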
Data Analysis
To characterize the variation of image features over the different reconstruction parameters, we calculated the COV as follows:

$$\mathrm{COV} = \frac{\mathrm{SD}}{\mathrm{mean}} \times 100\%$$

where SD and mean are the SD and the mean value of each image feature, respectively, over the different reconstruction settings. The mean value of the COV of all lesions for each TF and FOS was used to characterize feature variability. All features were categorized into 4 groups based on COV: very small (COV ≤ 5%), small (5% < COV ≤ 10%), intermediate (10% < COV ≤ 20%), and large (COV > 20%) range of variation. In the last 3 studies, all features were categorized into 4 groups based on the following rule: for each feature f, if it had a higher value of COV for one reconstruction algorithm than for the other reconstruction algorithms, it would be assigned to the group based on this higher value.
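The COV calculation and the grouping rule can be sketched in a few lines. The use of the sample SD and of the absolute value of the mean (so that features with a negative mean, such as skewness, still yield a positive COV) are assumptions not stated in the text, and the function names are illustrative.

```python
import numpy as np


def cov_percent(values):
    """COV (%) of one feature for one lesion across reconstruction settings.

    Sample SD (ddof=1) and abs(mean) are assumptions of this sketch.
    """
    values = np.asarray(values, dtype=float)
    return float(100.0 * values.std(ddof=1) / abs(values.mean()))


def feature_variability(feature_values):
    """Mean COV over lesions; feature_values has shape (n_lesions, n_settings)."""
    return float(np.mean([cov_percent(row) for row in feature_values]))


def categorize(cov):
    """Map a COV (%) to the 4 ranges of variation used in the text."""
    if cov <= 5:
        return "very small"
    if cov <= 10:
        return "small"
    if cov <= 20:
        return "intermediate"
    return "large"


def group_across_algorithms(cov_by_algorithm):
    """Assign the group from the highest COV among the reconstruction algorithms."""
    return categorize(max(cov_by_algorithm.values()))


# One feature measured on 2 lesions under 3 reconstruction settings.
example = np.array([[9.0, 10.0, 11.0],
                    [18.0, 20.0, 22.0]])
variability = feature_variability(example)
```

Both lesions in the example have a COV of 10%, so the feature would fall in the small-variation group.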
In addition, the quality of PET images reconstructed with default settings was analyzed using the SNR in the liver. A spheric mask with a diameter of 3 cm was placed in the liver, and the 4 default reconstructions were evaluated using the same mask. The SNR was defined as the ratio of the mean to the SD of the uptake within the mask.
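The liver SNR measurement can be sketched as follows. The mask center and voxel size are hypothetical inputs, the sample SD is an assumption, and the function names are illustrative.

```python
import numpy as np


def spherical_mask(shape, center, diameter_mm, voxel_size_mm):
    """Boolean mask of voxels whose centers lie within a sphere.

    center is given in voxel indices and voxel_size_mm per axis; both are
    hypothetical inputs for this sketch.
    """
    grids = np.indices(shape)
    dist2 = sum(((g - c) * s) ** 2
                for g, c, s in zip(grids, center, voxel_size_mm))
    return dist2 <= (diameter_mm / 2.0) ** 2


def snr(image, mask):
    """Ratio of the mean to the SD of the uptake within the mask."""
    roi = image[mask]
    return float(roi.mean() / roi.std(ddof=1))


# A 3 cm sphere on a grid of 5 mm isotropic voxels (radius of 3 voxels).
mask = spherical_mask((21, 21, 21), (10, 10, 10), 30.0, (5.0, 5.0, 5.0))
```

Reusing the same mask for every reconstruction, as the text describes, keeps the comparison independent of mask placement.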
RESULTS
Change of Image Features over Default Reconstruction Settings
PET images of a representative lung cancer patient reconstructed by the default reconstruction settings for OSEM, OSEM + TOF, OSEM + PSF, and OSEM + TOF + PSF are shown in Figure 1. The delineated lesion is also shown in green in the figure. Consistent with the previous findings (22–26), OSEM + TOF + PSF produced visually the best image among the chosen 4 reconstruction algorithms with their default settings. In addition, quantitative assessment supported the fact that OSEM + TOF + PSF produced the best image with an SNR of 9.1 in the liver, whereas the SNR of OSEM, OSEM + PSF, and OSEM + TOF were 5.2, 5.5, and 8.5, respectively. The results with a bin size of 32, 64, and 128 were similar with only a few differences. For example, the COV of energy from GLCM changed from 5% for a bin size of 64 to a range of 5%–10% for a bin size of 32. In this study, 64 was selected as the reference bin size. SUVmean and SUVpeak exhibited a small variation, with a COV less than 5%, and SUVmax had a COV in the range of 5%–10%. Most of the features (58/61) had a COV less than 20%, indicating small differences across the 4 algorithms (Table 2).
Impact of Iteration Number on Image Features
Three features (skewness, cluster shade [CS], and zone percentage [ZP]) demonstrated a large variation (COV > 20%). SUVmean, SUVpeak, and 22 features measuring heterogeneity showed a very small variation (COV ≤ 5%; Table 3). The entropy from FOS, GLCM, and NGLDM were all insensitive to iteration number. In addition, 27 features fell in the small-variation group (5% < COV ≤ 10%), including contrast from GLCM and NGTDM. SUVmax also fell in this range. The remaining 9 features had a COV between 10% and 20%. The results of each individual reconstruction algorithm were comparable (Supplemental Table 2), and the image features with different ranges of variation across the 4 reconstruction algorithms are shown in Figure 2A. Only 8 features displayed differences between OSEM and OSEM + PSF. In addition, CS was the only parameter exhibiting a large variation in all reconstruction algorithms.
Impact of FWHM on Image Features
The numbers of features (TF and FOS) in the 4 groups (very small, small, intermediate, and large variation) were 16, 16, 21, and 8, respectively (Table 4). All of the TFs from GLRLM exhibited a COV less than 20%. GLCM also yielded TFs with variation less than 20%, except for CS. Among the features derived from FOS, only variance and skewness had large variations (COV > 20%), and 4 other features (entropy, COV, kurtosis, and energy) had small variations (COV ≤ 10%). The TFs from GLSZM had a large range of COV, including 4 features (short-zone emphasis, ZP, short-zone low gray-level emphasis, and long-zone high gray-level emphasis) with a COV larger than 20%. The impact of the FWHM on TF and FOS for each individual reconstruction algorithm is shown in Supplemental Table 3. In addition, 20 image features including SUVmean, SUVpeak, and SUVmax exhibited different ranges of variation (Fig. 2B). Variance, skewness, CS, and ZP were in the large-variation group for all reconstruction algorithms. SUVmean and SUVpeak exhibited small variation, and SUVmax showed intermediate variation. Overall, the reconstruction algorithms all demonstrated similar performance.
Impact of Grid Size on Image Features
Two grid sizes (128 × 128 and 256 × 256) were used to evaluate the impact on TF and FOS with default iteration number and a FWHM of 2.5 mm (Table 5). More than half of the features (35/61) were sensitive to grid size, as demonstrated by the high COV (> 20%), including 1 of 6 features from FOS, 11 of 21 features from GLCM, 8 of 11 features from GLRLM, 6 of 13 features from GLSZM, 4 of 5 features from NGLDM, and 5 of 5 features from NGTDM. The entropy from FOS, difference entropy, inverse difference normalized (IDN), inverse difference moment normalized (IDMN), low gray-level run emphasis, high gray-level run emphasis, and low gray-level zone emphasis were stable with respect to the grid size (COV ≤ 5%). Moreover, only sum average and sum entropy showed small variations (5% < COV ≤ 10%). Others (17/61 features) exhibited intermediate sensitivity to the grid size (10% < COV ≤ 20%). The grid size had a similar impact on TF and FOS for all the reconstruction algorithms (Supplemental Table 4), and the corresponding features, with different variations for these 4 reconstruction algorithms, are shown in Figure 2C. More than half of the 61 features (TF and FOS) demonstrated a large variation. SUVmean and SUVpeak had a COV less than 10%. The COV of SUVmax was between 10% and 20%.
DISCUSSION
Our primary aim was to investigate the impact of reconstruction settings on image features derived from 18F-FDG PET tumor images acquired with a PET/CT scanner incorporating PSF and TOF. Filtered backprojection was not included in this study because it is now rarely used clinically. Our results demonstrated that most of the features (58/61) had a COV less than 20% across the 4 algorithms with default settings, which differs from the variation reported across all possible reconstruction settings derived from filtered backprojection and OSEM in a previous study (11).
Although the results with a bin size of 32, 64, and 128 are similar, choosing the optimal bin size is still challenging. In a recent investigation of the prognostic value of the TF obtained from 18F-FDG PET images in patients with oropharyngeal carcinoma (12), uniformity from GLCM was found to be a significant prognostic factor whereas its predictive value depended on the bin size. The optimal discretization size may depend on the noise and resolution of the PET images and should be carefully considered for different objectives.
Coarseness from NGTDM was considered to be correlated with human perception of image granularity and was deemed clinically useful. The coarseness from NGTDM can be used not only to predict response of chemoradiotherapy for non–small cell lung cancer (10) and esophageal cancer (18) but also to differentiate normal tissues from head and neck tumors and lymph nodes (8). However, it was found to be sensitive to reconstruction settings with a small variation (5% < COV ≤ 10%) over the 4 default reconstruction settings, intermediate variation (10% < COV ≤ 20%) for both impact of iteration number and FWHM, and large variation (COV > 20%) to grid size. Therefore, it is desirable to standardize the reconstruction for multicenter clinical trials if such imaging biomarkers are to be used.
The smaller grid size meant that the bigger voxel size was more affected by the tissue fraction effect, and the image had a more uniform intensity distribution, which subsequently influenced TF and FOS. The grid size had the largest effect on image features, as demonstrated by 17 of the features with a COV between 10% and 20% and 35 of 61 features with a COV greater than 20% in this study (Table 5). Postreconstruction gaussian filters can improve the SNR of an image. However, there is a trade-off between noise and spatial resolution. A larger FWHM leads to poorer spatial resolution although the resultant image has less noise. The FWHM of the gaussian filter was found to be the second largest factor contributing to the variation of image features. Compared with grid size and FWHM, the iteration number had less impact on image features. In addition, TF and FOS were not sensitive to reconstruction algorithms with default settings and iteration number.
Volumetric tumor delineation is a prerequisite for computing TF and FOS. Although a large number of approaches have been proposed to segment tumors in PET images, accurate tumor segmentation is still a challenging task (33). A simple thresholding method (40% of SUVmax) was used here, possibly leading to an imperfect tumor delineation. However, the inaccurate segmentation is unlikely to change the relative difference due to different reconstruction settings for each feature. Other limitations of the present study include a relatively small patient cohort and a single lesion site. Therefore, a larger patient cohort study with different tumor types needs to be performed in the future. Moreover, it is absolutely necessary to investigate the clinical relevance and biologic basis of the PET texture features.
A new PET imaging biomarker should be insensitive to reconstruction settings or at least not worse than conventional indices such as SUV measures. In this work, as expected, SUVmax, measuring single-voxel uptake, was more sensitive to reconstruction settings than SUVmean and SUVpeak. However, SUVmax is still more robust than some TF and FOS such as skewness from FOS, CS from GLCM, and ZP from GLSZM. The image features with a smaller variation than SUVmax are indicated for further investigation as to their clinical relevance. These features include the entropy from FOS; difference entropy, IDN, and IDMN from GLCM; low gray-level run emphasis (LGRE) and high gray-level run emphasis (HGRE) from GLRLM; and low gray-level zone emphasis (LGZE) from GLSZM.
CONCLUSION
In this study, we analyzed the effect of reconstruction settings, including different algorithms, iteration number, FWHM of the postreconstruction gaussian filter, and grid size, on TFs and first-order features. Different TF and FOS had different sensitivities to reconstruction settings. Iteration number and the FWHM of the gaussian filter had a similar impact for all 4 reconstruction algorithms. Compared with iteration number and FWHM, grid size had a larger impact on image features. Careful selection of image features for research, clinical practice, and clinical trials is essential.
DISCLOSURE
The costs of publication of this article were defrayed in part by the payment of page charges. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734. No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Jul. 30, 2015.
- © 2015 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
- Received for publication March 2, 2015.
- Accepted for publication July 6, 2015.