Research Article: Physics and Instrumentation

Experimental Multicenter and Multivendor Evaluation of the Performance of PET Radiomic Features Using 3-Dimensionally Printed Phantom Inserts

Elisabeth Pfaehler, Joyce van Sluis, Bram B.J. Merema, Peter van Ooijen, Ralph C.M. Berendsen, Floris H.P. van Velden and Ronald Boellaard
Journal of Nuclear Medicine March 2020, 61 (3) 469-476; DOI: https://doi.org/10.2967/jnumed.119.229724
  • Elisabeth Pfaehler: Department of Nuclear Medicine and Molecular Imaging, Medical Imaging Center, University Medical Center Groningen, Groningen, The Netherlands
  • Joyce van Sluis: Department of Nuclear Medicine and Molecular Imaging, Medical Imaging Center, University Medical Center Groningen, Groningen, The Netherlands
  • Bram B.J. Merema: Department of Oral and Maxillofacial Surgery, University Medical Center Groningen, Groningen, The Netherlands
  • Peter van Ooijen: Department of Radiology, University Medical Center Groningen, Groningen, The Netherlands
  • Ralph C.M. Berendsen: Department of Medical Physics, Zuyderland Medical Center, Heerlen, The Netherlands
  • Floris H.P. van Velden: Section of Nuclear Medicine, Department of Radiology, Leiden University Medical Center, Leiden, The Netherlands
  • Ronald Boellaard: Department of Nuclear Medicine and Molecular Imaging, Medical Imaging Center, University Medical Center Groningen, Groningen, The Netherlands; and Department of Radiology and Nuclear Medicine, VU University Medical Center, Amsterdam, The Netherlands

Abstract

The sensitivity of radiomic features to several confounding factors, such as reconstruction settings, makes their clinical use challenging. To investigate the impact of harmonized image reconstructions on feature consistency, a multicenter phantom study was performed using 3-dimensionally printed phantom inserts reflecting realistic tumor shapes and heterogeneous uptake. Methods: Tumors extracted from real PET/CT scans of patients with non–small cell lung cancer served as models for three 3-dimensionally printed inserts. Different heterogeneity patterns were realized by printing separate compartments that could be filled with different activity solutions. The inserts were placed in the National Electrical Manufacturers Association image-quality phantom and scanned several times. First, a list-mode scan was acquired and 5 statistically equal replicates were reconstructed. Second, the phantom was scanned 4 times on the same scanner. Third, the phantom was scanned on 6 PET/CT systems. All images were reconstructed using EANM Research Ltd. (EARL)–compliant and locally clinically preferred reconstructions. EARL-compliant reconstructions were performed without (EARL1) or with (EARL2) point-spread function modeling. Images were analyzed with and without resampling to 2-mm cubic voxels. Images were discretized with a fixed bin width (FBW) of 0.25 and a fixed bin number (FBN) of 64. The intraclass correlation coefficient (ICC) of each scan setup was calculated and compared across reconstruction settings. An ICC above 0.75 was regarded as high. Results: The percentage of features yielding a high ICC was largest for the statistically equal replicates (70%–91% for FBN; 90%–96% for FBW discretization). For scans acquired on the same system, the percentage decreased, but most features still yielded a high ICC (FBN, 52%–63%; FBW, 75%–85%). The percentage of features yielding a high ICC decreased further in the multicenter setting. In this case, the percentage was larger for images reconstructed with EARL-compliant reconstructions: for example, 40% for EARL1 and 60% for EARL2 versus 21% for the clinically preferred setting with FBW discretization. This benefit was more pronounced when images were discretized with FBW and resampled to isotropic voxels. Conclusion: EARL-compliant reconstructions harmonize a wide range of radiomic features. FBW discretization and resampling to isotropic voxels enhance the benefits of EARL-compliant reconstructions.

  • 18F-FDG PET/CT radiomic features
  • feature harmonization
  • image reconstruction

Personalized cancer treatment is one of the main promises of modern medicine. Analyzing the combinations of patient genetics and tumor phenotype in medical images can provide additional information on treatment response and diagnosis and therefore has the potential to help in clinical decision making (1). One part of this approach is the rapidly growing field of radiomics, which aims to extract a large number of feature values from medical images describing tumor phenotype and tumor inter- and intraheterogeneity (2–4). In PET/CT images, radiomics has shown promising results in the assessment of treatment response and patient survival for several cancer types, such as head-and-neck or lung cancer (5,6).

Besides these positive results, many studies have reported the limitations and challenges of radiomics, including the sensitivity of feature values to differences in reconstruction algorithm, voxel size, smoothing, and discretization method (7–9). To make radiomic studies comparable across patients, institutions, and scanners, radiomic features must be harmonized across centers. The European Association of Nuclear Medicine (EANM) aims to reduce the variability of measurements in multicenter clinical trials through its EANM Research Ltd. (EARL) accreditation program (10). The program harmonizes basic SUV metrics (SUVmax, SUVmean, and SUVpeak) by comparing scans of the National Electrical Manufacturers Association (NEMA) NU2-2012 image-quality phantom. To this end, each center chooses 1 reconstruction setting that uses an iterative reconstruction algorithm and complies with the EARL standards (EARL1). It has been shown that reconstructions including resolution modeling (based on the point-spread function [PSF]) can also be used to harmonize PET/CT systems (EARL2) (11). In addition to the EARL-compliant reconstructions, every center usually applies 1 reconstruction with settings optimized for lesion detection, which is used for clinical reads. As illustrated in Figure 1, PET/CT image quality differs across these 3 reconstruction settings, which therefore strongly affect the extracted radiomic features (Table 1).

FIGURE 1.

Biograph Vision PET scan of a patient with non–small cell lung cancer, reconstructed with EARL2, EARL1, and the clinically preferred reconstruction (from left to right).

TABLE 1

Radiomic Features of Patient Displayed in Figure 1 Found to Give Valuable Information About Survival in Lung Cancer Patients (31) for Different Reconstruction Settings

EARL harmonization is based on basic SUV metrics. To the best of our knowledge, no multicenter experimental study has yet investigated the effect of EARL harmonization on the variability of more complex radiomic features. Such a study requires scanning the same object, one with realistic heterogeneous uptake, at multiple centers and comparing the feature values across centers. Commercially available phantoms such as the NEMA image-quality phantom are not optimal for this purpose because they contain only spherical, homogeneous-uptake objects. Therefore, in this study, 3-dimensionally printed phantom inserts were designed and built on the basis of tumors extracted from typical PET scans, reflecting more realistic uptake distributions than spheres. These inserts were scanned at 3 institutions on 6 different PET/CT systems. Feature values were extracted from EARL-compliant (EARL1 and EARL2) and locally clinically preferred reconstructions, and the reliability, repeatability, and reproducibility of the radiomic features are reported.

MATERIALS AND METHODS

Phantom Design and 3-Dimensional Printing

Three 3-dimensionally printed phantom inserts, modeled on PET scans of patients with non–small cell lung cancer, were used in this study. Several non–small cell lung cancer tumors showing various heterogeneous uptake patterns were visually assessed, and 3 tumors with different shapes and uptake characteristics were selected as models for 3-dimensional printing. These tumors were segmented, slightly smoothed, scaled, and converted to a stereolithography file for printing. Differences in uptake were realized by printing 2 separate compartments that could be filled with different activity solutions. The uptake patterns comprise a homogeneous tumor (tumor 1), a tumor with heterogeneous uptake in the sagittal view (tumor 2), and a tumor with a necrotic core (tumor 3). The sizes of the inserts are listed in Table 2. The inserts were printed on a Form 2 printer (Formlabs Inc.), which uses stereolithography to cure its photopolymer clear resin (FLGPCL02; Formlabs Inc.). The 3-dimensional inserts and the corresponding tumors are shown in Figure 2. The inserts were placed at equal distances in the NEMA NU-2 image-quality phantom. The feature values of the phantom inserts were verified to lie within the range of radiomic feature values extracted from ten 18F-FDG PET/CT studies of non–small cell lung cancer patients (12). More than 82% of the features are well within the clinically expected range, and only 1.6% deviate strongly from the clinical data. Therefore, the inserts generate feature values that are representative of clinical data.
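The range check described above can be sketched as follows. This is a minimal sketch under stated assumptions: the criterion used here (a phantom value counts as in range if it lies between the minimum and maximum of the clinical values for the same feature) is our simplification, not the method of reference 12, and all feature names and numbers are made up for illustration.

```python
from typing import Dict, List


def fraction_within_clinical_range(
    phantom: Dict[str, float], clinical: Dict[str, List[float]]
) -> float:
    """Fraction of phantom features lying within [min, max] of the clinical
    values for the same feature name (assumed criterion, for illustration)."""
    shared = [name for name in phantom if name in clinical and clinical[name]]
    if not shared:
        raise ValueError("no features in common")
    inside = sum(
        1
        for name in shared
        if min(clinical[name]) <= phantom[name] <= max(clinical[name])
    )
    return inside / len(shared)


# Hypothetical feature values, not data from the study
clinical_values = {
    "stat_mean": [3.1, 5.4, 7.8, 4.2],
    "glcm_contrast": [10.0, 25.0, 40.0],
    "morph_volume_ml": [4.0, 60.0, 120.0],
}
phantom_values = {"stat_mean": 6.0, "glcm_contrast": 55.0, "morph_volume_ml": 30.0}

print(f"{fraction_within_clinical_range(phantom_values, clinical_values):.2f}")
```

A min-max criterion is the simplest choice; a stricter check (for example, distance from the clinical median in units of the clinical spread) would be needed to separate "well within" from "large variation" as the text does.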

TABLE 2

Size of 3-Dimensionally Printed Inserts

FIGURE 2.

(Top) PET/CT images of the original tumor (left) and phantom insert (right) for tumors 1, 2, and 3 (from left to right). (Bottom) Corresponding stereolithography models with tumor-to-background ratio (TBR).

Phantom Scans

To obtain features that are comparable across institutions and PET/CT systems, only reliable, repeatable, and reproducible features should be used. Reliable features show only marginal differences when extracted from images acquired under exactly the same conditions. Repeatable features show small differences when extracted from repeated scans of the same subject. Reproducible features remain nearly unchanged across different PET/CT systems, image acquisition settings, and reconstruction settings.

To measure reliability, the NEMA image-quality phantom containing the inserts was scanned once on a Biograph mCT64 (Siemens Healthcare). The scan was acquired in list mode, and 5 statistically equal replicates of 60 s each were reconstructed. Three reconstruction settings were applied: an EARL-compliant reconstruction (EARL1: time of flight [TOF] with gaussian smoothing of 5 mm in full width at half maximum), an EARL-compliant reconstruction including PSF (EARL2: PSF + TOF with gaussian smoothing of 5 mm in full width at half maximum), and the clinically preferred setting of this institution (PSF + TOF with gaussian smoothing of 7 mm in full width at half maximum). The homogeneous insert, the outer part of the necrotic-core insert, and the lower part of the third insert were filled with an activity solution yielding a tumor-to-background ratio of about 10:1. The upper part of the third insert was filled with an activity solution yielding a tumor-to-background ratio of 5:1, and the necrotic core and the spheres were filled with water (Fig. 2). The 5 statistically equal replicates represent an ideal situation because the 5 images differ only in their noise pattern.
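For readers implementing a comparable post-filter, the gaussian smoothing kernels quoted above (5 mm and 7 mm in full width at half maximum) usually have to be converted into a standard deviation in voxel units. The sketch below uses the standard relation FWHM = 2 sqrt(2 ln 2) sigma; the 2-mm voxel size in the example is a hypothetical value, not a reconstruction setting reported in the study.

```python
import math

# FWHM = 2 * sqrt(2 * ln 2) * sigma, i.e. sigma is about 0.4247 * FWHM
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_sigma_voxels(fwhm_mm: float, voxel_size_mm: float) -> float:
    """Convert a gaussian post-filter FWHM (mm) into a sigma in voxel units."""
    if fwhm_mm < 0 or voxel_size_mm <= 0:
        raise ValueError("FWHM must be >= 0 and voxel size > 0")
    return fwhm_mm * FWHM_TO_SIGMA / voxel_size_mm


# 5 mm (EARL-compliant settings) and 7 mm (clinically preferred) FWHM from the
# text; the 2-mm voxel size is a hypothetical example value.
for fwhm in (5.0, 7.0):
    print(f"FWHM {fwhm} mm -> sigma {gaussian_sigma_voxels(fwhm, 2.0):.3f} voxels")
```

A sigma in voxel units is what filters that operate on array indices expect; `scipy.ndimage.gaussian_filter`, for example, takes its `sigma` in pixels, so anisotropic voxels need a separate sigma per axis.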

To measure repeatability, the phantom was scanned 4 times independently on the same system (Biograph mCT64). That is, for every scan, the phantom was filled with an activity solution and placed at a slightly different position in the scanner. To compensate for differences in phantom filling, the scan duration was adjusted so that statistically equal replicates were obtained. The exact activity in the tumors, spheres, and background is listed in Table 3 for each scan. Images were reconstructed with the same reconstruction settings as described above. For every scan, the inserts were delineated separately, which could lead to slightly different delineations. This scenario therefore reflects a more realistic clinical setup.
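The text does not state how the scan duration was adjusted. One simple way to obtain statistically equivalent images, sketched below as an assumption, is to keep the product of the activity concentration at scan time and the scan duration constant, so that the expected number of detected counts matches a reference scan. The helper names, the 18F half-life constant, and the example values are ours; the model ignores dead time, randoms, and decay during the frame.

```python
import math

F18_HALF_LIFE_MIN = 109.77  # physical half-life of fluorine-18, minutes


def decayed(value: float, elapsed_min: float,
            half_life_min: float = F18_HALF_LIFE_MIN) -> float:
    """Activity (or activity concentration) left after elapsed_min minutes."""
    return value * math.exp(-math.log(2.0) * elapsed_min / half_life_min)


def matched_duration_s(ref_duration_s: float, ref_conc: float,
                       new_conc: float) -> float:
    """Duration giving the same expected counts as the reference scan, assuming
    counts are proportional to concentration x duration (no dead time, no
    randoms, no decay within the frame)."""
    if ref_conc <= 0 or new_conc <= 0:
        raise ValueError("activity concentrations must be positive")
    return ref_duration_s * ref_conc / new_conc


# Hypothetical example: reference background 2.0 kBq/mL scanned for 60 s;
# a new filling of 3.0 kBq/mL (at calibration) is scanned 45 min later.
conc_at_scan = decayed(3.0, 45.0)
print(f"{conc_at_scan:.3f} kBq/mL -> {matched_duration_s(60.0, 2.0, conc_at_scan):.1f} s")
```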

TABLE 3

Activity in Phantom Background and Tumor Inserts for 4 Scans Acquired on Same Scanner and Multicenter Setting

Furthermore, a multicenter study was performed to measure reproducibility. The inserts were scanned at 3 institutions on 6 PET/CT systems: 4 manufactured by Siemens Healthcare (Biograph mCT40, Biograph mCT64, Horizon with an extra ring of detectors [TrueV option], and Biograph Vision), 1 by Philips Healthcare (Vereos), and 1 by GE Healthcare (Discovery MI 4 ring). The data were reconstructed with a clinically relevant scan duration of 60 s, adjusted for differences in phantom filling across centers. Table 3 lists the phantom fillings for each scan. Images were reconstructed using the scanner-specific reconstruction settings complying with the EANM standards (EARL1 and EARL2), as well as with the locally clinically preferred settings of each institution. The reconstruction algorithm, matrix size, and smoothing kernel applied for each scanner are listed in Table 4. The inserts were segmented separately for each scan.

TABLE 4

Applied Reconstruction Algorithm, Matrix Size, and Smoothing Factor for Each Scanner

PET Analysis

Segmentations were performed with in-house–developed software for the analysis and segmentation of PET images. Segmentations were done manually on the low-dose CT portion of each scan.

Radiomic features were calculated with in-house–developed software written in C++ (13). All feature definitions follow the Image Biomarker Standardization Initiative and have been verified against the available benchmarks (14). In total, 436 radiomic features were extracted. Before feature calculation, the images were converted to SUVs so that the phantom background had an SUVmean of 1. Features were calculated both for images with the original voxel size and for images resampled to 2-mm cubic voxels, as recommended (15). Images and binary segmentation masks were resampled using trilinear interpolation. Before textural features were extracted, the images were discretized using a fixed bin number (FBN) of 64 and a fixed bin width (FBW) of 0.25 SUV.
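The preprocessing steps above can be illustrated on a flat list of voxel values. This sketch assumes IBSI-style discretization formulas (FBN: the ROI's own minimum-to-maximum range split into a fixed number of bins; FBW: bins of fixed width starting at a lower bound) and is not the in-house C++ software used in the study. Whether the study anchored the FBW bins at 0 SUV or at the ROI minimum is not stated, so the lower bound is left as a parameter; the voxel values are hypothetical.

```python
import math
from statistics import fmean
from typing import List, Optional, Sequence


def normalize_to_background(roi: Sequence[float],
                            background: Sequence[float]) -> List[float]:
    """Scale ROI voxel values so that the background mean becomes 1."""
    bg_mean = fmean(background)
    if bg_mean <= 0:
        raise ValueError("background mean must be positive")
    return [v / bg_mean for v in roi]


def discretize_fbn(roi: Sequence[float], n_bins: int = 64) -> List[int]:
    """Fixed bin number: split the ROI's own [min, max] range into n_bins bins
    (IBSI-style; the maximum is assigned to the top bin)."""
    lo, hi = min(roi), max(roi)
    if hi == lo:
        return [1] * len(roi)
    return [n_bins if v == hi else math.floor(n_bins * (v - lo) / (hi - lo)) + 1
            for v in roi]


def discretize_fbw(roi: Sequence[float], bin_width: float = 0.25,
                   lower_bound: Optional[float] = None) -> List[int]:
    """Fixed bin width: bins of bin_width start at lower_bound, or at the ROI
    minimum when no lower bound is given (IBSI-style)."""
    lo = min(roi) if lower_bound is None else lower_bound
    return [math.floor((v - lo) / bin_width) + 1 for v in roi]


# Hypothetical voxel values (e.g., kBq/mL) for a background region and an insert
background = [2.0, 2.0, 2.0]
roi_suv = normalize_to_background([4.0, 10.0, 20.0, 13.0], background)
print(roi_suv)
print(discretize_fbn(roi_suv, n_bins=64))
print(discretize_fbw(roi_suv, bin_width=0.25, lower_bound=0.0))
```

The two schemes behave differently across scans: with FBN, each ROI's own intensity range is stretched over 64 bins, so bin indices are relative to that ROI; with FBW and a common lower bound, a bin always corresponds to the same 0.25-SUV interval, so identical SUVs map to identical bins in every scan.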

Statistical Analysis

Data analysis was performed with Python, version 3.6.3, using the packages NumPy and SciPy, and matplotlib (16) for figure plotting. Statistical analysis was performed in R, called from Python through the rpy2 interface.

Feature Reliability, Repeatability, and Reproducibility

To measure feature consistency (i.e., reliability, repeatability, and reproducibility) for the 3 scan setups, the intraclass correlation coefficient (ICC) was calculated using the irr package (version 0.84), available from the Comprehensive R Archive Network (http://www.r-project.org). A 2-way single-measure model was used to evaluate the consistency of features across all scans. Each 3-dimensionally printed insert was regarded as a tumor in a patient, and each scan was regarded as 1 observer. The ICC is defined as the ratio of the intercluster variability to the sum of the intercluster and intracluster variabilities. ICCs therefore range in theory from 0 to 1, with 1 representing perfect agreement. Furthermore, a high ICC implies that the intracluster variability is low compared with the intercluster variability, so a feature with a high ICC distinguishes well between inserts. An ICC higher than 0.9 is regarded as excellent; values between 0.75 and 0.9 are regarded as good, values between 0.6 and 0.75 as moderate, and values below 0.6 as poor (17).
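A 2-way single-measure ICC can be computed from a 2-way ANOVA decomposition with inserts as rows (subjects) and scans as columns (observers). The sketch below implements the consistency form, ICC(C,1) = (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error), where k is the number of scans. That the study used the consistency rather than the absolute-agreement form is our reading of "consistency" in the text, and the function is a stand-alone illustration, not the irr package's code. Sample estimates can be negative when the between-insert variability is smaller than the residual variability, even though the ratio of variances it estimates lies between 0 and 1.

```python
from statistics import fmean
from typing import Sequence


def icc_consistency_single(data: Sequence[Sequence[float]]) -> float:
    """Two-way, single-measure, consistency ICC for an n x k table
    (rows = inserts/subjects, columns = scans/observers)."""
    n = len(data)
    k = len(data[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a rectangular table with >= 2 rows and columns")
    grand = fmean(v for row in data for v in row)
    row_means = [fmean(row) for row in data]
    col_means = [fmean(data[i][j] for i in range(n)) for j in range(k)]
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between inserts
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between scans
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_error
    if denom == 0:
        raise ValueError("ICC undefined: no variability in the table")
    return (ms_rows - ms_error) / denom


# Hypothetical values of one feature: 3 inserts (rows) x 4 scans (columns)
table = [
    [5.1, 5.3, 4.9, 5.2],
    [7.8, 8.1, 7.6, 8.0],
    [3.2, 3.0, 3.4, 3.1],
]
print(f"ICC(C,1) = {icc_consistency_single(table):.3f}")
```

Because the scan (column) effect is removed before the residual is formed, a constant offset of one scan relative to the others does not lower this ICC; an absolute-agreement ICC would penalize such offsets.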

ICCs were compared between reconstruction settings, discretization methods, and original versus resampled data using a nonparametric permutation test. A permutation test compares 2 groups by pooling their elements, reassigning them to 2 groups of the original sizes (here, for all possible combinations), and recomputing the test statistic for each reassignment. The P value is the proportion of reassignments yielding a statistic at least as extreme as the observed one; if the observed statistic is not unusual among the reassignments, the null hypothesis cannot be rejected. P values below 0.01 were considered statistically significant. A Benjamini–Hochberg procedure with a false discovery rate of 0.25 was applied to limit type I errors due to multiple comparisons. The permutation test was performed separately for each feature group using the R package perm (version 1.0-0.0).
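The 2 procedures described above can be sketched in plain Python: an exact 2-sided permutation test on the difference in mean ICC between 2 groups (enumerating every reassignment, which is feasible only for small groups) and the Benjamini–Hochberg step-up procedure. The choice of the absolute difference in means as test statistic and all example values are assumptions for illustration; the sketch does not reproduce the interface or internals of the R perm package.

```python
from itertools import combinations
from statistics import fmean
from typing import List, Sequence


def exact_permutation_p(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided permutation test on |mean(a) - mean(b)|.

    All ways of splitting the pooled values into groups of the original sizes
    are enumerated, so this is feasible only for small groups; larger problems
    would sample random permutations instead."""
    pooled = list(a) + list(b)
    observed = abs(fmean(a) - fmean(b))
    n_extreme = n_total = 0
    for chosen in combinations(range(len(pooled)), len(a)):
        in_a = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        # small tolerance so ties with the observed statistic count as extreme
        if abs(fmean(group_a) - fmean(group_b)) >= observed - 1e-12:
            n_extreme += 1
        n_total += 1
    return n_extreme / n_total


def benjamini_hochberg(p_values: Sequence[float], fdr: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure: True where the hypothesis is
    rejected while controlling the false discovery rate at fdr."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # largest rank k (1-based) whose sorted P value satisfies p_(k) <= k/m * fdr
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected


# Made-up ICCs of one feature group under 2 reconstruction settings
icc_setting_1 = [0.62, 0.71, 0.55, 0.68]
icc_setting_2 = [0.81, 0.88, 0.79, 0.92]
p = exact_permutation_p(icc_setting_1, icc_setting_2)
print(f"permutation P = {p:.4f}")
# Made-up P values from several feature groups, adjusted at FDR 0.25
print(benjamini_hochberg([p, 0.04, 0.30, 0.002], fdr=0.25))
```

With 4 values per group there are only 70 reassignments, so the smallest attainable 2-sided P value is 2/70; exact tests on very small groups therefore cannot reach arbitrarily low significance levels.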

RESULTS

All calculated radiomic features are listed in Supplemental Files 1, 2, and 3 (for EARL1, EARL2, and clinical reconstructions, respectively; supplemental materials are available at http://jnm.snmjournals.org), including their ICCs for each reconstruction setting and discretization method.

Figure 3 displays the percentage of features resulting in an excellent, good, moderate, or bad ICC, sorted by feature group, for the statistically equal replicates and both discretization methods. The total percentage of excellent, good, and moderate ICCs was comparable across all reconstruction settings, with the highest values for FBW discretization (96.7% for EARL1, 97.4% for EARL2, and 97.9% for the clinically preferred setting vs. 83.2%, 94.2%, and 94.7%, respectively, for FBN discretization) (Supplemental Table 1). The EARL1 setting yielded the lowest percentage of features with an excellent ICC. When the feature groups were compared, the differences in ICCs were significant only for gray-level run-length matrix features (P < 0.01). Discretization with FBW resulted in more reliable features than FBN discretization, but the ICCs differed significantly only for gray-level cooccurrence matrix features. Resampling to cubic voxels had almost no effect on reliability: it led to a slight increase in the number of reliable features (Supplemental Fig. 1), with no significant differences in ICCs.
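Percentages like those plotted per feature group can be obtained by mapping each feature's ICC to the categories defined in the Methods and counting per group. In this sketch, values falling exactly on a threshold (0.75 and 0.6) are assigned to the higher category, which is our assumption; "poor" corresponds to the "bad" label used in the figures; and the feature groups and ICC values are made up.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

CATEGORIES = ("excellent", "good", "moderate", "poor")


def icc_category(icc: float) -> str:
    """Categorize an ICC with the thresholds given in the Methods
    (ties at 0.75 and 0.6 assigned to the higher category: an assumption)."""
    if icc > 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.6:
        return "moderate"
    return "poor"


def percentages_by_group(
    features: Iterable[Tuple[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Map (feature_group, icc) pairs to {group: {category: percent}}."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, icc in features:
        counts[group][icc_category(icc)] += 1
    return {
        group: {c: 100.0 * tally[c] / sum(tally.values()) for c in CATEGORIES}
        for group, tally in counts.items()
    }


# Hypothetical ICCs for 2 feature groups
example = [("GLCM", 0.95), ("GLCM", 0.80), ("GLCM", 0.50), ("GLCM", 0.92),
           ("Morph", 0.99), ("Morph", 0.70)]
for group, pct in percentages_by_group(example).items():
    print(group, pct)
```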

FIGURE 3.

Percentage of features extracted from 5 statistically equal replicates yielding excellent, good, moderate, or bad ICC for FBN and FBW discretization for different feature groups. GLCM = gray-level cooccurrence matrix; GLRLM = gray-level run-length matrix; NGLDM = neighboring gray-level dependence matrix; GLSZM = gray-level size-zone matrix; GLDZM = gray-level distance-zone matrix; NGTDM = neighboring gray-tone difference matrix; Stat = intensity-based statistics; Morph = morphology; LocInt = local intensity; IntHist = intensity histogram; IntVol = intensity volume.

By comparison, Figure 4 displays the percentages of features yielding excellent, good, moderate, or bad ICCs for the 4 scans acquired on the same system. The number of features yielding an excellent ICC decreased compared with the 5 statistically equal replicates, but most features still yielded a good or moderate ICC. Again, discretization with FBW led to the highest percentage of features with a moderate or better ICC (87.8% for EARL1, 90.3% for EARL2, and 91.8% for the clinically preferred reconstruction vs. 78.2%, 82.1%, and 77.1%, respectively, for FBN discretization); this percentage increased slightly after resampling (Supplemental Table 2), and the ICCs of the 2 discretization methods differed significantly for gray-level cooccurrence matrix features (P < 0.01). The differences between clinically preferred and EARL-compliant reconstructions were also not significant, but the clinically preferred reconstruction yielded the highest percentage, and the EARL1 setting the lowest percentage, of repeatable features. The only feature group that was less repeatable after resampling was the morphologic features (Supplemental Fig. 2).

FIGURE 4.

Percentage of features extracted from 4 scans acquired on same PET/CT system yielding excellent, good, moderate, or bad ICC for FBN and FBW discretization. GLCM = gray-level cooccurrence matrix; GLRLM = gray-level run-length matrix; NGLDM = neighboring gray-level dependence matrix; GLSZM = gray-level size-zone matrix; GLDZM = gray-level distance-zone matrix; NGTDM = neighboring gray-tone difference matrix; Stat = intensity-based statistics; Morph = morphology; LocInt = local intensity; IntHist = intensity histogram; IntVol = intensity volume.

In the multicenter setting, the percentage of features yielding a moderate or better ICC was low compared with the other scan setups (Fig. 5). For the EARL-compliant reconstructions, discretization with FBW again led to the largest percentage of features with an ICC higher than 0.6 (71.7% for EARL1, 84.9% for EARL2, and 32.3% for the clinically preferred setting vs. 49.3%, 49.5%, and 38%, respectively, for FBN discretization). Significant differences in ICCs between the 2 discretization methods were found only for the EARL-compliant reconstructions and some textural feature groups (gray-level cooccurrence matrix and gray-level run-length matrix features for both EARL-compliant reconstructions; neighboring gray-level dependence matrix and gray-level size-zone matrix features for EARL2). For FBN discretization, only small, nonsignificant differences were observed between the reconstruction settings. For FBW discretization, however, the ICCs of EARL-compliant and clinically preferred reconstructions differed significantly for most textural feature groups. In the multicenter setting, the local clinically preferred reconstructions differed substantially between sites and scanners, whereas the single-scanner experiments used the same clinically preferred reconstruction throughout. Significant differences in ICCs between EARL1 and EARL2 were observed only for gray-level cooccurrence matrix and gray-level run-length matrix features discretized with FBW. Resampling to cubic voxels was beneficial, especially for textural feature groups, although the differences were not significant (Supplemental Fig. 3). The only feature group that was less reproducible after resampling was the morphologic features, for which the difference was significant (Supplemental Table 3).

FIGURE 5.

Percentage of features extracted from multicenter setting yielding excellent, good, moderate, or bad ICC for FBN and FBW discretization. GLCM = gray-level cooccurrence matrix; GLRLM = gray-level run-length matrix; NGLDM = neighboring gray-level dependence matrix; GLSZM = gray-level size-zone matrix; GLDZM = gray-level distance-zone matrix; NGTDM = neighboring gray-tone difference matrix; Stat = intensity-based statistics; Morph = morphology; LocInt = local intensity; IntHist = intensity histogram; IntVol = intensity volume.

DISCUSSION

To the best of our knowledge, this is the first multicenter and multivendor experimental study to investigate the impact of EARL-compliant reconstructions on the repeatability and reproducibility of radiomic features. Our results suggest that in a multicenter setting, EARL-compliant reconstructions yield a larger number of reproducible features. One reason might be that the clinically preferred reconstructions varied widely in spatial resolution and contrast recovery across PET/CT systems. Because radiomic features are sensitive to resolution and image noise, these variations may explain the greater variability of radiomic features with clinically preferred reconstructions (18). This explanation is consistent with the observation that differences in feature consistency between reconstruction settings were not visible for the 5 statistically equal replicates and the 4 scans acquired on the same scanner, for which the same local clinically preferred reconstruction was applied.

In the multicenter setting, EARL-compliant images yield comparable image quality. This might explain the small differences in reliability, repeatability, and reproducibility between these 2 reconstruction settings. This result is in line with the findings of Kaalep et al., who reported that harmonization of PET/CT systems using PSF reconstructions is feasible (11). Furthermore, our results support the findings of Lasnon et al., who showed that images reconstructed with PSF in line with the EARL standard can be used to harmonize radiomic features (19).

Whereas EARL-compliant reconstructions yield similar contrast recoveries, the amount of smoothing in the clinically preferred settings differed across PET/CT systems. The lower spatial resolution of EARL-compliant reconstructions seems beneficial for repeatability and reproducibility but might also eliminate important heterogeneity information that is visible in some of the clinically preferred reconstructions. This effect is smaller with the updated EARL standards (EARL2), which yield higher contrast recovery and spatial resolution and are therefore preferred for future multicenter studies. One limitation of this study is that we do not report the accuracy of feature values. Because radiomic features have been shown to be biased as a function of acquisition parameters, image reconstruction settings, and noise (18,20,21), there is an urgent need to standardize feature values to reduce the variability (in bias) of radiomic features across centers. We therefore focused on feature consistency and on the feasibility of using existing harmonization procedures to improve the reproducibility of radiomic features. Nonetheless, because a high ICC also indicates that features differentiate well between inserts, our results suggest that EARL-compliant reconstructions also yield more meaningful features, especially with the EARL2 settings. This is in line with the findings of Aide et al., who showed that higher-resolution reconstructions improved the characterization of breast tumors compared with EARL1 (22).

The use of physical phantoms also has limitations, as the 3-dimensionally printed inserts reflect only 3 coarse heterogeneity patterns. However, they provide a more realistic scenario than commercially available phantoms containing only spheres. Furthermore, phantoms provide a more reproducible setting than patient scans, because the activity concentrations in the spheres and background can be matched closely across experiments performed at different institutions.

Moreover, our study confirms previous findings on clinical datasets, such as the impact of image discretization on the reliability and repeatability of radiomic features. Previous studies reported better repeatability and lower sensitivity to differences in delineation for FBW discretization (7,10,23). Furthermore, Orlhac et al. demonstrated that discretization with FBW led to more meaningful features, that is, features that distinguish well between tumor types (23). Our results also confirm the benefit of discretization with FBW, as it resulted in more consistent features, especially for EARL-compliant reconstructions.

The impact of voxel size on radiomic feature values has also been studied before (24,25). Hatt et al. recommended the use of isotropic voxels with a voxel size of 2 mm (15). Our study supports this recommendation: especially in the multicenter setting, resampling to cubic voxels improved the reproducibility of radiomic features, although the differences were not significant. A possible explanation is that a common voxel size makes features more comparable, because many features are sensitive to differences in slice thickness and voxel size (26,27). The only feature group that did not benefit from resampling was the morphologic features. This effect was observed only in the scan setups in which each scan was segmented separately. A possible reason is that resampling the tumor segmentation can give different results depending on the initial position of the delineation in the image.
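The position dependence suggested above can be illustrated in 1 dimension. Trilinear interpolation is linear interpolation applied along each axis in turn, so a 1-dimensional profile shows the same effect along one axis. The sketch resamples a binary mask from a hypothetical 3-mm spacing to 2 mm and keeps voxels whose interpolated value is at least 0.5; both the spacing and the 0.5 threshold are assumptions for illustration, not settings reported in the study.

```python
import math
from typing import List, Sequence


def linear_resample(values: Sequence[float], in_spacing: float,
                    out_spacing: float, out_offset: float = 0.0) -> List[float]:
    """Resample a 1D profile whose samples sit at i * in_spacing onto a grid
    with samples at out_offset + j * out_spacing, using linear interpolation.
    Only output positions inside the span of the input samples are returned."""
    if len(values) < 2 or in_spacing <= 0 or out_spacing <= 0 or out_offset < 0:
        raise ValueError("need >= 2 samples, positive spacings, offset >= 0")
    extent = (len(values) - 1) * in_spacing
    n_out = int(math.floor((extent - out_offset) / out_spacing + 1e-9)) + 1
    resampled = []
    for j in range(max(n_out, 0)):
        pos = (out_offset + j * out_spacing) / in_spacing  # input index units
        i = min(int(math.floor(pos)), len(values) - 2)     # left neighbour
        frac = pos - i                                      # right-neighbour weight
        resampled.append((1.0 - frac) * values[i] + frac * values[i + 1])
    return resampled


def resampled_mask_size(mask: Sequence[int], in_spacing: float, out_spacing: float,
                        out_offset: float, threshold: float = 0.5) -> int:
    """Interpolate a binary mask and count output voxels at or above threshold."""
    interp = linear_resample(mask, in_spacing, out_spacing, out_offset)
    return sum(1 for v in interp if v >= threshold)


# Hypothetical 1D mask: object covering 3 voxels of 3 mm (centres at 3, 6, 9 mm)
mask = [0, 1, 1, 1, 0]
for offset in (0.0, 1.0):
    n = resampled_mask_size(mask, in_spacing=3.0, out_spacing=2.0, out_offset=offset)
    print(f"grid offset {offset} mm -> {n} voxels ({n * 2.0:.0f} mm)")
```

With these inputs, the grid aligned with the original voxel centers keeps 5 voxels (10 mm) and the grid shifted by 1 mm keeps 4 voxels (8 mm), although the underlying object is identical; morphologic features computed from the resampled mask inherit this difference.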

The impact of tumor delineation on radiomic features has also been reported previously (7,28,29). Our results confirm this finding, as the number of features yielding an excellent ICC decreased from the 5 statistically equal replicates to the 4 scans acquired on the same system (with repositioning and thus redelineation of the tumors). However, the differences in the number of features with a moderate or better ICC might also be caused by differences in phantom filling and positioning. Mansor et al. demonstrated that basic SUV metrics (SUVmax, SUVpeak, and SUVmean) are affected by phantom repositioning (30), so repositioning likely also affects more complex textural features. However, because patient repositioning and differences in tumor delineation across institutions are part of the routine clinical workflow, it is questionable whether features that are highly sensitive to these changes are suitable for clinical radiomic analysis.

CONCLUSION

This study reports on the impact of EARL-compliant reconstructions on the reliability, repeatability, and reproducibility of radiomic features in comparison with clinically preferred reconstructions. Our results show that the use of EARL-compliant reconstructions is beneficial and leads to a larger number of reliable, repeatable, and reproducible features. Discretization with FBW and resampling to cubic 2-mm voxels further increased the percentage of consistent features. These findings suggest that EARL-compliant reconstructions should be used for radiomic analysis, especially in a multicenter setting. The updated EARL2 standards are preferred because they offer higher contrast recovery and spatial resolution while providing radiomic performance similar to that of the EARL1 standards (11).

DISCLOSURE

This work is part of the STRaTeGy research program (project 14929), which is (partly) financed by The Netherlands Organisation for Scientific Research (NWO). This study was financed by the POINTING project of the Dutch Cancer Society (grant 10034). No other potential conflict of interest relevant to this article was reported.

KEY POINTS

  • QUESTION: Which reconstruction algorithm leads to the most stable radiomic features in a multicenter and multivendor setting?

  • PERTINENT FINDINGS: Harmonized (EARL-compliant) image reconstructions led to a larger number of reliable, repeatable, and reproducible radiomic features. This effect increased when images were discretized with a fixed bin width (FBW) and resampled to isotropic voxels before feature extraction.

  • IMPLICATIONS FOR PATIENT CARE: To make radiomic features comparable across centers, multicenter radiomic studies should use harmonized (EARL-compliant) reconstructions, and images should be discretized with an FBW and resampled to isotropic voxels.

Acknowledgments

We thank Hinke Schokker and Johan R. de Jong for help with the phantom scans.

Footnotes

  • Published online Aug. 16, 2019.

  • © 2020 by the Society of Nuclear Medicine and Molecular Imaging.

REFERENCES

  1. Aerts HJWL, Velazquez ER, Leijenaar RTH, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:4006.
  2. Avanzo M, Stancanello J, El Naqa I. Beyond imaging: the promise of radiomics. Phys Med. 2017;38:122–139.
  3. Lambin P, Rios-Velazquez E, Leijenaar R, et al. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer. 2012;48:441–446.
  4. Gillies RJ, Kinahan PE, Hricak H. Radiomics: images are more than pictures, they are data. Radiology. 2016;278:563–577.
  5. Zhang Y, Oikonomou A, Wong A, Haider MA, Khalvati F. Radiomics-based prognosis analysis for non-small cell lung cancer. Sci Rep. 2017;7:46349.
  6. Parmar C, Leijenaar RTH, Grossmann P, et al. Radiomic feature clusters and prognostic signatures specific for lung and head & neck cancer. Sci Rep. 2015;5:11044.
  7. van Velden FHP, Kramer GM, Frings V, et al. Repeatability of radiomic features in non-small-cell lung cancer [18F]FDG-PET/CT studies: impact of reconstruction and delineation. Mol Imaging Biol. 2016;18:788–795.
  8. Leijenaar RTH, Carvalho S, Velazquez ER, et al. Stability of FDG-PET radiomics features: an integrated analysis of test-retest and inter-observer variability. Acta Oncol. 2013;52:1391–1397.
  9. Desseroit M-C, Tixier F, Weber WA, et al. Reliability of PET/CT shape and heterogeneity features in functional and morphologic components of non–small cell lung cancer tumors: a repeatability analysis in a prospective multicenter cohort. J Nucl Med. 2017;58:406–411.
  10. Leijenaar RTH, Nalbantov G, Carvalho S, et al. The effect of SUV discretization in quantitative FDG-PET radiomics: the need for standardized methodology in tumor texture analysis. Sci Rep. 2015;5:11075.
  11. Kaalep A, Sera T, Rijnsdorp S, et al. Feasibility of state of the art PET/CT systems performance harmonisation. Eur J Nucl Med Mol Imaging. 2018;45:1344–1361.
  12. Kolinger GD, Vállez García D, Kramer GM, et al. Repeatability of [18F]FDG PET/CT total metabolic active tumour volume and total tumour burden in NSCLC patients. EJNMMI Res. 2019;9:14.
  13. Pfaehler E, Zwanenburg A, de Jong JR, Boellaard R. RaCaT: an open source and easy to use radiomics calculator tool. PLoS One. 2019;14:e0212223.
  14. Zwanenburg A, Leger S, Vallières M, Löck S. The image biomarker standardisation initiative. arXiv.org website. https://arxiv.org/pdf/1612.07003.pdf. Published 2016. Accessed October 16, 2019.
  15. Hatt M, Tixier F, Pierce L, et al. Characterization of PET/CT images using texture analysis: the past, the present…any future? Eur J Nucl Med Mol Imaging. 2017;44:151–165.
  16. Oliphant TE. Python for scientific computing. Comput Sci Eng. 2007;9:10–20.
  17. Koo TK, Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–163.
  18. Pfaehler E, Beukinga RJ, de Jong JR, et al. Repeatability of 18F-FDG PET radiomic features: a phantom study to explore sensitivity to image reconstruction settings, noise, and delineation method. Med Phys. 2019;46:665–678.
  19. Lasnon C, Majdoub M, Lavigne B, et al. 18F-FDG PET/CT heterogeneity quantification through textural features in the era of harmonisation programs: a focus on lung cancer. Eur J Nucl Med Mol Imaging. 2016;43:2324–2335.
  20. Nyflot MJ, Yang F, Byrd D, Bowen SR, Sandison GA, Kinahan PE. Quantitative radiomics: impact of stochastic effects on textural feature analysis implies the need for standards. J Med Imaging (Bellingham). 2015;2:041002.
  21. Yan J, Chu-Shern JL, Loi HY, et al. Impact of image reconstruction settings on texture features in 18F-FDG PET. J Nucl Med. 2015;56:1667–1673.
  22. Aide N, Salomon T, Blanc-Fournier C, Grellard J-M, Levy C, Lasnon C. Implications of reconstruction protocol for histo-biological characterisation of breast cancers using FDG-PET radiomics. EJNMMI Res. 2018;8:114.
  23. Orlhac F, Soussan M, Chouahnia K, Martinod E, Buvat I. 18F-FDG PET-derived textural indices reflect tissue-specific uptake pattern in non-small cell lung cancer. PLoS One. 2015;10:e0145063.
  24. Orlhac F, Nioche C, Soussan M, Buvat I. Understanding changes in tumor texture indices in PET: a comparison between visual assessment and index values in simulated and patient data. J Nucl Med. 2017;58:387–392.
  25. Orlhac F, Theze B, Soussan M, Boisgard R, Buvat I. Multiscale texture analysis: from 18F-FDG PET images to histologic images. J Nucl Med. 2016;57:1823–1828.
  26. Vallières M, Freeman CR, Skamene SR, El Naqa I. A radiomics model from joint FDG-PET and MRI texture features for the prediction of lung metastases in soft-tissue sarcomas of the extremities. Phys Med Biol. 2015;60:5471–5496.
  27. Papp L, Rausch I, Grahovac M, Hacker M, Beyer T. Optimized feature extraction for radiomics analysis of 18F-FDG-PET imaging. J Nucl Med. 2019;60:864–872.
  28. Bashir U, Azad G, Siddique MM, et al. The effects of segmentation algorithms on the measurement of 18F-FDG PET texture parameters in non-small cell lung cancer. EJNMMI Res. 2017;7:60.
  29. Altazi BA, Zhang GG, Fernandez DC, et al. Reproducibility of F18-FDG PET radiomic features for different cervical tumor segmentation methods, gray-level discretization, and reconstruction algorithms. J Appl Clin Med Phys. 2017;18:32–48.
  30. Mansor S, Pfaehler E, Heijtel D, Lodge MA, Boellaard R, Yaqub M. Impact of PET/CT system, reconstruction protocol, data analysis method, and repositioning on PET/CT precision: an experimental evaluation using an oncology and brain phantom. Med Phys. 2017;44:6413–6424.
  31. Sollini M, Cozzi L, Antunovic L, Chiti A, Kirienko M. PET radiomics in NSCLC: state of the art and a proposal for harmonization of methodology. Sci Rep. 2017;7:358.
  • Received for publication April 11, 2019.
  • Accepted for publication July 24, 2019.

Journal of Nuclear Medicine, Vol. 61, Issue 3, March 1, 2020
Experimental Multicenter and Multivendor Evaluation of the Performance of PET Radiomic Features Using 3-Dimensionally Printed Phantom Inserts
Elisabeth Pfaehler, Joyce van Sluis, Bram B.J. Merema, Peter van Ooijen, Ralph C.M. Berendsen, Floris H.P. van Velden, Ronald Boellaard
Journal of Nuclear Medicine Mar 2020, 61 (3) 469-476; DOI: 10.2967/jnumed.119.229724

Keywords

  • 18F-FDG PET/CT radiomic features
  • feature harmonization
  • Image Reconstruction