Quantitative Radiomics Features in Diffuse Large B-Cell Lymphoma: Does Segmentation Method Matter?

Jakoba J. Eertink; Elisabeth A.G. Pfaehler; Sanne E. Wiegers; Tim van de Brug; Pieternella J. Lugtenburg; Otto S. Hoekstra; Josée M. Zijlstra; Henrica C.W. de Vet; Ronald Boellaard

doi:10.2967/jnumed.121.262117


Abstract

Radiomics features may predict outcome in diffuse large B-cell lymphoma (DLBCL). Currently, multiple segmentation methods are used to calculate metabolic tumor volume (MTV). We assessed the influence of segmentation method on the discriminative power of radiomics features in DLBCL at the patient level and for the largest lesion. Methods: Fifty baseline ¹⁸F-FDG PET/CT scans of DLBCL patients with progression or relapse within 2 years after diagnosis were matched on uptake time and reconstruction method with 50 baseline PET/CT scans of DLBCL patients without progression. Scans were analyzed using 6 semiautomatic segmentation methods (SUV threshold of 4.0 [SUV4.0], SUV threshold of 2.5, 41% of SUV_max, 50% of SUV_peak, a majority vote segmenting voxels detected by ≥2 methods, and a majority vote segmenting voxels detected by ≥3 methods). On the basis of these segmentations, 490 radiomics features were extracted at the patient level, and 486 features were extracted for the largest lesion. To quantify the agreement between features extracted from different segmentation methods, the intraclass correlation (ICC) agreement was calculated for each method compared with SUV4.0. The feature space was reduced by deleting features that had high Pearson correlations (≥0.7) with the previously established predictors MTV or SUV_peak. Model performance was assessed using stratified repeated cross validation with 5 folds and 2,000 repeats, yielding the mean receiver-operating-characteristic curve integral for all segmentation methods using logistic regression with backward feature selection. Results: The percentage of features yielding an ICC of at least 0.75, compared with the SUV4.0 segmentation, was lowest for 50% of SUV_peak both at the patient level and for the largest lesion, with 77.3% and 66.7% of the features yielding an ICC of at least 0.75, respectively.
Features did not correlate strongly with MTV: for all segmentation methods, at least 435 features at the patient level and 409 features for the largest lesion had a correlation coefficient of less than 0.7. In contrast, many features correlated strongly with SUV_peak (for all segmentation methods, at least 190 features at the patient level and 134 features for the largest lesion were not strongly correlated with SUV_peak). Receiver-operating-characteristic curve integrals ranged between 0.69 ± 0.11 and 0.84 ± 0.09 at the patient level and between 0.69 ± 0.11 and 0.73 ± 0.10 at the lesion level. Conclusion: Even though the derived radiomics feature values and the selected features differ among segmentation methods, there is no substantial difference in the discriminative power of radiomics features among segmentation methods.

Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of non-Hodgkin lymphoma. To improve the outcome of patients with DLBCL, early identification of patients at risk of treatment failure is of the utmost importance, as 25%–40% of patients experience relapse or progression in the first years after diagnosis (1). Recent data suggest that baseline radiomics features are promising biomarkers to predict treatment outcome in DLBCL (2–4), as they can predict outcome beyond metabolic tumor volume (MTV) and the international prognostic index (5).

Radiomics features can be calculated from the baseline ¹⁸F-FDG PET/CT scans and capture detailed and quantitative information on, for example, texture, intensity, and shape of lesions. Currently, radiomics analyses in lymphoma are based on predefined tumor segmentations. Segmentations are usually performed using absolute SUV thresholds (6) or percentages of SUV_max or SUV_peak (2,7). For the calculation of radiomics features, some studies use the hottest lesion (4), whereas others use the largest lesion (3,8) or tumor segmentations at the patient level (2,9). The largest lesion and MTV at the patient level had the highest predictive value (9). Therefore, in this study we concentrated on the largest lesion and radiomics features extracted from tumor segmentations at the patient level.

One of the main problems with generating a multitude of features is the high false-detection rate caused by multiple testing. Moreover, several features may represent similar characteristics that are often highly correlated and therefore redundant (10). Redundant features may induce a correlation bias (11), and models become difficult to interpret (12).

Therefore, reducing the feature space to a degree feasible for clinical use without losing important information is essential. One method to reduce feature space is hierarchical clustering, based on correlation analysis or distance metrics (13).

Previous DLBCL studies showed that MTV measured with different segmentation methods, albeit at different cutoffs, showed comparable discriminative power to predict survival (6,7). However, it is unclear to what extent the discriminative power of other radiomics features is affected by the method used to segment the lesions. Therefore, our main objective was to assess the effects of applying 6 frequently used segmentation methods on the discriminative power for 2-year time to progression of baseline PET/CT radiomics features in DLBCL both at the patient level and for the largest lesion.

MATERIALS AND METHODS

Study Population

For this case-control study, 100 patients with newly diagnosed DLBCL from the HOVON-84 study (Haemato Oncology Foundation for Adults in the Netherlands; European Union Drug Regulating Authorities Clinical Trials Database identifier 2006-005174-42) with baseline PET/CT scans available were included. Fifty patients with progressive disease or relapse within 2 years after diagnosis were matched on scan interval and reconstruction method (European Association of Nuclear Medicine Research GmbH [EARL]/non-EARL) (14) with 50 patients without progression. For this analysis, we combined R-CHOP14 (14-d cycles of rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone) and RR-CHOP14 (rituximab-intensified R-CHOP14), because outcomes were similar between treatment arms (15). The HOVON-84 study was approved by the institutional review board, and all participants gave informed consent.

Quantitative Analysis

Quantitative PET/CT analysis was performed using the quantitative oncology molecular analysis suite (ACCURATE) (16). To meet quality criteria, PET and low-dose CT scans had to be complete, and liver SUV_mean and plasma glucose had to be within the ranges suggested by the European Association of Nuclear Medicine guidelines (14). If liver SUV_mean was outside the suggested ranges but total image activity was between 50% and 80% of the injected activity, the scans were still included. All scans were reviewed by nuclear medicine physicians, and delineations were performed under their supervision. The following frequently used semiautomatic segmentation methods were applied to delineate lesions: an SUV threshold of 2.5, an SUV threshold of 4.0 (SUV4.0), 50% of SUV_peak (17), 41% of SUV_max, a majority vote segmenting voxels detected by at least 2 methods, and a majority vote segmenting voxels detected by at least 3 methods (supplemental materials; available at http://jnm.snmjournals.org).
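As a minimal sketch of how the threshold rules and the two majority votes relate to one another, the Python snippet below applies them to an SUV array cropped around a single lesion. It is not the ACCURATE implementation: it assumes SUV_peak is already known (the hypothetical argument `suv_peak`), ignores region growing and connectivity, and assumes that the majority votes count agreement among the four threshold methods.

```python
import numpy as np


def segment_lesion(suv, suv_peak):
    """Boolean masks for each segmentation rule on an SUV array around one lesion.

    Assumptions (not from the study software): `suv` is already cropped around a
    single lesion, `suv_peak` is precomputed, and no region growing is applied.
    """
    suv = np.asarray(suv, dtype=float)
    masks = {
        "SUV2.5": suv >= 2.5,                  # fixed SUV threshold of 2.5
        "SUV4.0": suv >= 4.0,                  # fixed SUV threshold of 4.0
        "41%max": suv >= 0.41 * suv.max(),     # 41% of the lesion SUV_max
        "A50P": suv >= 0.50 * suv_peak,        # 50% of the lesion SUV_peak
    }
    # Assumption: the majority votes count agreement among the four rules above.
    votes = sum(m.astype(np.int8) for m in masks.values())
    masks["MV2"] = votes >= 2                  # voxel detected by >= 2 methods
    masks["MV3"] = votes >= 3                  # voxel detected by >= 3 methods
    return masks
```

For example, `segment_lesion(suv_crop, suv_peak=9.0)["MV3"]` would return the voxels on which at least 3 of the 4 rules agree. Because the relative thresholds scale with each lesion's own uptake while the fixed thresholds do not, the same voxel can be inside one mask and outside another.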

Lesions were delineated with a fully automated preselection of lesions with a volume threshold of at least 3 cm³. Lymphoma lesions smaller than 3 cm³ were added by observer selection, and nontumor regions were deleted with single mouse-clicks for all 6 segmentation methods (18). Lesions for which automatic segmentation was successful were added to the patient-level volume of interest. If lesion selection resulted in flooding (i.e., selection of large parts of nontumor regions, such as liver, spleen, or skeleton), the lesion was not added. Adjacent nontumor ¹⁸F-FDG–avid regions (e.g., bladder or kidney) were manually removed. For the fixed SUV4.0 method, we also generated segmentations with a volume threshold of at least 3 cm³. Two observers selected the method with the highest visual agreement (best method) for each patient, resolving initial discrepancies in consensus meetings.
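The 3-cm³ preselection can be illustrated by labeling connected components of a thresholded mask and discarding those below the volume cutoff. The sketch below is a plain-NumPy illustration under stated assumptions: 6-connectivity (face neighbors only) and a known voxel volume in milliliters (1 mL = 1 cm³). `label_components` and `preselect_lesions` are hypothetical helper names, not functions of the software used in the study.

```python
from collections import deque

import numpy as np

# Face-adjacent neighbors (6-connectivity); an assumption, 26-connectivity is also common.
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_components(mask):
    """Label connected components of a 3-D boolean mask with breadth-first search."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    n = 0
    for start in zip(*np.nonzero(mask)):
        if labels[start]:
            continue                      # voxel already belongs to a component
        n += 1
        labels[start] = n
        queue = deque([start])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = n
                    queue.append(nb)
    return labels, n


def preselect_lesions(mask, voxel_volume_ml, min_volume_ml=3.0):
    """Keep components with volume >= min_volume_ml (1 mL = 1 cm^3)."""
    labels, n = label_components(mask)
    keep = np.zeros(mask.shape, dtype=bool)
    for k in range(1, n + 1):
        component = labels == k
        if component.sum() * voxel_volume_ml >= min_volume_ml:
            keep |= component
    return keep
```

Components removed by this step correspond to the lesions smaller than 3 cm³ that, per the workflow described above, observers then added back manually.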

Feature Extraction

Four hundred eighty radiomics features (texture [n = 408], morphology [n = 22], intensity-based statistics [n = 18], intensity histogram [n = 24], intensity–volume histogram [n = 6], and local intensity [n = 2]) and 6 conventional PET uptake metrics before rebinning were extracted for both the patient level and the largest lesion for each segmentation method. The patient-level volume of interest included all segmented lesions and was generated by assigning all voxels within the individual lesions to one and all voxels outside any of the segmented individual lesions to zero. At the patient level, 4 additional dissemination features were calculated. All image-processing and feature calculations were performed using RaCat software (19), which complies with the imaging biomarker standardization initiative criteria (20). Details on feature calculation are presented in the supplemental materials.
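To make the two analysis levels concrete, the following sketch builds the binary patient-level volume of interest as the union of individual lesion masks and picks the lesion with the most voxels as the largest lesion. The uptake metrics shown (MTV, SUV_max, SUV_mean, and total lesion glycolysis) are common examples chosen for illustration only; the text does not list which 6 conventional metrics were extracted, and the study computed all features with RaCat, not with this code.

```python
import numpy as np


def patient_and_largest_lesion(lesion_masks):
    """Union of lesion masks (1 inside any lesion, 0 elsewhere) and the largest lesion."""
    patient_voi = np.zeros(lesion_masks[0].shape, dtype=np.uint8)
    for m in lesion_masks:
        patient_voi[m] = 1
    largest = max(lesion_masks, key=lambda m: int(m.sum()))
    return patient_voi, largest


def conventional_metrics(suv, voi, voxel_volume_ml):
    """Illustrative uptake metrics for a binary VOI (not necessarily the study's 6)."""
    values = suv[voi.astype(bool)]
    mtv = values.size * voxel_volume_ml
    return {
        "MTV_ml": mtv,
        "SUVmax": float(values.max()),
        "SUVmean": float(values.mean()),
        "TLG": float(values.mean() * mtv),  # total lesion glycolysis = SUV_mean x MTV
    }
```

The same metric function can be called on the patient-level VOI and on the largest-lesion mask, which mirrors extracting one feature set per level.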

Statistical Analysis

All statistical analyses were performed for radiomics features at the patient level and for the largest lesion using R (version 4.0.3). The paired Student t test was used to compare the MTV and SUV_peak of all segmentation methods with those of the best segmentation. On the basis of recent studies, the SUV4.0 segmentation was chosen as the reference (7,18). First, if the distribution of a radiomics feature had a skewness greater than 0.5 for the SUV4.0 segmentation, that feature was log-transformed using the natural logarithm for all segmentations. The agreement between radiomics features extracted from different segmentations was quantified by calculating the intraclass correlation (ICC) with respect to the SUV4.0 segmentation. ICCs were categorized as indicating poor (<0.5), moderate (0.5–0.74), good (0.75–0.89), or excellent (≥0.90) reliability (21). Two texture features at the patient level and 3 texture features at the lesion level did not show any variation and were therefore excluded.
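The preprocessing and agreement steps above can be sketched as follows. The study used R; this is a Python illustration that assumes the Fisher-Pearson skewness estimator (several definitions exist) and a two-way, absolute-agreement, single-measure ICC, because the text specifies agreement but not the exact ICC model. The log transform also assumes strictly positive feature values.

```python
import numpy as np


def skewness(x):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (one of several definitions)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)


def maybe_log(reference, others, threshold=0.5):
    """Log-transform every segmentation if the reference (SUV4.0) feature is skewed.

    Assumes strictly positive values; np.log would give nan/-inf otherwise.
    """
    if skewness(reference) > threshold:
        return np.log(reference), [np.log(o) for o in others]
    return np.asarray(reference, float), [np.asarray(o, float) for o in others]


def icc_a1(x, y):
    """Two-way, absolute-agreement, single-measure ICC between two methods."""
    data = np.column_stack([x, y]).astype(float)   # n patients x k methods
    n, k = data.shape
    grand = data.mean()
    row_means = data.mean(axis=1)
    col_means = data.mean(axis=0)
    msr = k * np.sum((row_means - grand) ** 2) / (n - 1)   # between patients
    msc = n * np.sum((col_means - grand) ** 2) / (k - 1)   # between methods
    resid = data - row_means[:, None] - col_means[None, :] + grand
    mse = np.sum(resid ** 2) / ((n - 1) * (k - 1))         # residual
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


def reliability(icc):
    """Categories as used in the text (poor/moderate/good/excellent)."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"
```

An absolute-agreement ICC penalizes systematic offsets between methods: a feature that is shifted by a constant between two segmentations keeps a perfect correlation but can still receive a poor ICC, which is why agreement rather than consistency is the stricter check of interchangeability.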

MTV and SUV_peak have been shown to be predictive in DLBCL (9). To avoid overfitting and to remove redundancy, the feature space was reduced by deleting features that correlated strongly with either MTV or SUV_peak. The Pearson correlation coefficient between MTV and other radiomics features, and between SUV_peak and other radiomics features, was calculated for each segmentation method. A correlation was considered high if the Pearson correlation coefficient was at least 0.7 (22).
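The correlation filter can be expressed in a few lines. This sketch assumes features are stored as a dictionary mapping a (hypothetical) feature name to its values across patients, and that the 0.7 cutoff applies to the absolute value of Pearson r, so strong negative correlations are also removed; the text does not state whether the sign was considered.

```python
import numpy as np


def uncorrelated_features(features, mtv, suv_peak, cutoff=0.7):
    """Names of features with |Pearson r| < cutoff against both MTV and SUV_peak.

    features: dict of feature name -> 1-D array over patients (hypothetical layout).
    Constant features give an undefined r (nan) and should be removed beforehand.
    """
    kept = []
    for name, values in features.items():
        r_mtv = np.corrcoef(values, mtv)[0, 1]
        r_peak = np.corrcoef(values, suv_peak)[0, 1]
        # Assumption: the absolute value of r is compared with the cutoff.
        if abs(r_mtv) < cutoff and abs(r_peak) < cutoff:
            kept.append(name)
    return kept
```

Running this once per segmentation method yields a different candidate set for each method, which is the starting point for the method-specific feature selections reported in the Results.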

For each segmentation method, the mutual correlations between features that correlated with neither MTV nor SUV_peak were calculated using Pearson correlation. For clusters of features with high mutual correlations, as identified with hierarchical clustering using Euclidean distance as a distance measure, the feature with the lowest correlation to MTV or SUV_peak was preserved.
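One way to implement this step is single-linkage hierarchical clustering cut at a fixed height. For features standardized to zero mean and unit (population) variance over n patients, the squared Euclidean distance between features i and j equals 2n(1 - r_ij), so cutting at a distance of sqrt(2n(1 - 0.7)) groups features linked by chains of correlations of at least 0.7. The sketch assumes single linkage (the text does not name the linkage) and uses the larger of a feature's absolute correlations with MTV and SUV_peak as the score to minimize; `cluster_and_select` is a hypothetical name. Note that Euclidean distance on z-scores treats strongly negatively correlated features as far apart.

```python
import numpy as np


def cluster_and_select(X, names, r_mtv, r_peak, r_cut=0.7):
    """Single-linkage clustering of features, keeping one representative per cluster.

    X: n_patients x n_features matrix of candidate features.
    r_mtv, r_peak: Pearson r of each feature with MTV and SUV_peak.
    """
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # z-scores (population SD)
    d_cut = np.sqrt(2 * n * (1 - r_cut))          # distance equivalent to r = r_cut
    parent = list(range(p))                       # union-find forest over features

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if np.linalg.norm(Z[:, i] - Z[:, j]) <= d_cut:
                parent[find(i)] = find(j)         # merge: single linkage at height d_cut

    clusters = {}
    for i in range(p):
        clusters.setdefault(find(i), []).append(i)
    # Representative: smallest max(|r_MTV|, |r_SUVpeak|) within each cluster.
    score = np.maximum(np.abs(r_mtv), np.abs(r_peak))
    keep = [min(members, key=lambda i: score[i]) for members in clusters.values()]
    return sorted(names[i] for i in keep)
```

Single linkage cut at a fixed height is equivalent to taking connected components of the graph whose edges join features within that distance, which is why a union-find structure is sufficient here instead of building the full dendrogram.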

Discriminative power (progression vs. nonprogression) was assessed using logistic regression with backward feature selection based on the Akaike information criterion (23). We included all independent features, MTV, and SUV_peak for all segmentations. Stratified repeated cross validation with 5 folds and 2,000 repeats was applied, yielding the mean receiver-operating-characteristic curve integral (CV-AUC) and the SD of AUCs between repeats. Comparing CV-AUCs is a known difficulty because of the inherent dependency between train–test iterations and the complex relations between the trained models (24). Currently, there is no valid statistical approach to compare CV-AUCs.
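A compact sketch of the modeling loop follows: logistic regression fitted by Newton-Raphson, backward elimination on AIC in the spirit of R's `step()`, and stratified 5-fold cross validation repeated many times. It rests on assumptions that the text does not settle: feature selection is redone inside each training fold, the AUC of a repeat is the mean over its folds, and the reported SD is taken across repeats. It uses plain NumPy rather than the R code of the study, and ties in the AUC count as one half.

```python
import numpy as np


def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson (IRLS); returns [intercept, coefs]."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        H = A.T @ (A * (p * (1 - p))[:, None]) + 1e-8 * np.eye(A.shape[1])
        beta = beta + np.linalg.solve(H, A.T @ (y - p))
    return beta


def aic(X, y, beta):
    """AIC = 2 * (number of parameters) - 2 * log-likelihood."""
    A = np.column_stack([np.ones(len(y)), X])
    p = np.clip(1.0 / (1.0 + np.exp(-A @ beta)), 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * len(beta) - 2 * loglik


def backward_aic(X, y):
    """Repeatedly drop the feature whose removal lowers AIC most; stop when none does."""
    cols = list(range(X.shape[1]))
    best = aic(X[:, cols], y, fit_logistic(X[:, cols], y))
    while cols:
        trials = []
        for c in cols:
            rest = [k for k in cols if k != c]
            trials.append((aic(X[:, rest], y, fit_logistic(X[:, rest], y)), c))
        score, drop = min(trials)
        if score >= best:
            break
        best = score
        cols.remove(drop)
    return cols


def auc(y, score):
    """Mann-Whitney estimate of the ROC area; ties count as one half."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def stratified_folds(y, k, rng):
    """Fold label per sample so every fold keeps roughly the class proportions."""
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    return folds


def repeated_cv_auc(X, y, k=5, repeats=2000, seed=0):
    """Mean and SD (across repeats) of the fold-averaged test AUC; needs repeats >= 2."""
    rng = np.random.default_rng(seed)
    per_repeat = []
    for _ in range(repeats):
        folds = stratified_folds(y, k, rng)
        fold_aucs = []
        for f in range(k):
            tr, te = folds != f, folds == f
            cols = backward_aic(X[tr], y[tr])         # selection inside training fold
            beta = fit_logistic(X[tr][:, cols], y[tr])
            A = np.column_stack([np.ones(te.sum()), X[te][:, cols]])
            fold_aucs.append(auc(y[te], A @ beta))     # linear predictor ranks like p
        per_repeat.append(np.mean(fold_aucs))
    return float(np.mean(per_repeat)), float(np.std(per_repeat, ddof=1))
```

Each class needs at least k members so that every test fold contains both outcomes. With 2,000 repeats and backward selection in every fold, this pure-Python loop is slow; it is written for readability rather than speed. Because the repeats reuse the same patients, the resulting SDs are not independent, which reflects the difficulty of formally comparing CV-AUCs noted above.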

As a sensitivity evaluation, all analyses were repeated for features that were reliable, repeatable, and reproducible in a multicenter setting (25).

RESULTS

Patient characteristics are summarized in Table 1. Sixty-four scans were semiautomatically analyzed and adapted with single mouse-clicks only. Thirty-six scans required manual editing because tumor and nontumor regions were adjacent. SUV4.0 was selected most frequently as the best method for both the patient level and the lesion level (49% and 64%, respectively).


TABLE 1

Characteristics of Included Patients

MTV Analysis

The method using an SUV threshold of 2.5 resulted in MTV flooding for 44 patients, leading to exclusion of this method for further analysis. At the patient and lesion levels, MTV was highest for the segmentation using a majority vote segmenting voxels detected by at least 2 methods and was lowest for the method using 50% of SUV_peak (Table 2). Using the best visual segmentation as a reference, MTV was significantly higher for the segmentation using a majority vote segmenting voxels detected by at least 2 methods and was significantly lower using all other segmentation methods (all P < 0.05; Table 2; Fig. 1). SUV_peak was comparable among segmentation methods (all P > 0.05).


TABLE 2

SUV_peak and MTV per Segmentation Method

FIGURE 1.

Maximum-intensity PET projections of patient with lesion segmentations indicated in red for all applied methods using SUV scale of 0–10. 41%max = 41% of SUV_max; A50P = 50% of SUV_peak; MV2 = majority vote segmenting voxels detected by ≥2 methods; MV3 = majority vote segmenting voxels detected by ≥3 methods; SUV2.5 = SUV threshold of 2.5.

Patient Level

Radiomics features based on an SUV4.0 preselection with a 3-cm³ volume threshold resembled the features of the SUV4.0 segmentation most, with excellent reliability for 414 features (84.8%), followed by the best segmentation. For the segmentation using 50% of SUV_peak, similarity was lowest, with only 218 features (44.7%) having excellent reliability (Fig. 2; Supplemental Table 1).

FIGURE 2.

Percentage of radiomics features yielding excellent, good, moderate, or poor ICC agreement between SUV4.0 segmentation and the other methods at the patient level. 41%max = 41% of SUV_max; A50P = 50% of SUV_peak; MV2 = majority vote segmenting voxels detected by ≥2 methods; MV3 = majority vote segmenting voxels detected by ≥3 methods.

For all segmentation methods, at least 435 features (89.3%) did not correlate strongly with MTV (Table 3), of which 433 (88.9%) did not correlate strongly with MTV for any segmentation method. At least 190 features (38.9%) did not correlate strongly with SUV_peak, of which 175 (35.9%) did not correlate strongly with SUV_peak for any segmentation method. One hundred ninety-seven features (40.5%) correlated strongly with neither MTV nor SUV_peak for at least 1 method, of which 125 (25.7%) correlated strongly with neither MTV nor SUV_peak for any segmentation method. For each segmentation method, at least 25 features (5.1%) did not show high mutual correlations and did not correlate with MTV or SUV_peak. After backward feature selection, the SUV4.0 segmentation method yielded a CV-AUC of 0.74 ± 0.10; 41% of SUV_max had the highest CV-AUC (0.84 ± 0.09), and the visually best segmentation method had the lowest (0.69 ± 0.11). The features selected after backward selection differed among segmentation methods, and their number varied between 4 and 20 (Table 3; Supplemental Table 2). For all segmentation methods, the morphologic feature “center of mass shift” and the texture feature “first measure of information correlation” were retained in the logistic regression model.


TABLE 3

Number of Independent Features per Segmentation Method, Number of Included Features, and Predictive Value at Patient Level for All Extracted Features (n = 488) and All Reliable, Repeatable, and Reproducible Features (n = 103)

Largest Lesion

Radiomics features of the segmentation using a majority vote segmenting voxels detected by at least 2 methods resembled those of the SUV4.0 method most, with excellent reliability for 389 features (80.5%). For the segmentation using 50% of SUV_peak, similarity was lowest, at only 83 features (17.2%) with excellent reliability (Fig. 3; Supplemental Table 3).

FIGURE 3.

Percentage of radiomics features yielding excellent, good, moderate, or poor ICC agreement between SUV4.0 segmentation and the other methods for the largest lesion. 41%max = 41% of SUV_max; A50P = 50% of SUV_peak; MV2 = majority vote segmenting voxels detected by ≥2 methods; MV3 = majority vote segmenting voxels detected by ≥3 methods.

For all segmentations, at least 409 features (84.9%) did not correlate strongly with MTV (Table 4), of which 404 (83.8%) did not correlate strongly with MTV for any segmentation method. At least 134 features (27.8%) did not correlate strongly with SUV_peak, of which 130 (27.0%) did not correlate strongly with SUV_peak for any segmentation method. One hundred forty-nine features (31.0%) correlated strongly with neither MTV nor SUV_peak for at least 1 method, of which 61 (12.7%) correlated strongly with neither MTV nor SUV_peak for any segmentation method. For each segmentation method, at least 19 features (4.0%) did not show high mutual correlations and did not correlate with MTV or SUV_peak. After backward feature selection, SUV4.0 had the highest CV-AUC (0.73 ± 0.10), whereas a majority vote segmenting voxels detected by at least 3 methods and the best segmentation method had the lowest CV-AUC (0.69 ± 0.11). The features selected after backward selection differed among segmentation methods, and their number varied between 5 and 11 (Table 4; Supplemental Table 4). For all segmentation methods, the texture feature “first measure of information correlation” was retained in the logistic regression model, and the intensity histogram feature “minimum histogram gradient” was retained in all models except that of the SUV4.0 segmentation method.


TABLE 4

Number of Independent Features per Segmentation Method, Number of Included Features, and Predictive Value for Largest Lesion for All Extracted Features (n = 483) and All Reliable, Repeatable, and Reproducible Features (n = 99)

When starting from a selection with reliable, repeatable, and reproducible features, similar results were found both at the patient level and for the largest lesion (Table 3; Table 4).

DISCUSSION

This study showed that the discriminative power is largely independent of segmentation method. However, there are large differences in radiomics feature values derived using different segmentation methods, as shown by ICC agreement values.

Both MTV and SUV_peak have been shown to be predictive in DLBCL (9). Our study showed that most radiomics features are independent of MTV both at the patient level and for the largest lesion. Hatt et al. (26) showed that textural features, which make up more than 80% of our radiomics features, already provide complementary clinical information beyond MTV in lesions larger than 10 cm³, with increasing complementary prognostic value for larger MTVs, challenging the previously proposed 45-cm³ threshold for texture features (27). Because only 4 patients had a largest lesion with an MTV smaller than 10 cm³, and only 1 patient had a patient-level MTV smaller than 10 cm³, it is to be expected that most features were independent of MTV. However, many features correlated strongly with SUV_peak and are therefore redundant with it.

Currently, there is no consensus on the best segmentation method for delineating lesions in DLBCL ¹⁸F-FDG PET/CT studies. Therefore, it is essential to study the sensitivity of radiomics features to segmentation method. In several solid cancers, radiomics features, especially morphologic and texture features, are influenced by the delineation method (28–31). The number of extracted features in these studies varied widely, between 9 and 480. We extend these findings by showing that for the largest lesion in DLBCL, up to 31% of the texture features and 68% of the morphologic features were highly sensitive to the segmentation method, as shown by the reliability of features compared with the SUV4.0 segmentation. DLBCL lesions are usually large, heterogeneous, and bulky. Larger lesions are known to exhibit more hypoxia, necrosis, and anatomic and physiologic complexity, characteristics that plausibly translate into a more complex spatial ¹⁸F-FDG distribution and hence greater sensitivity to segmentation method, leading to lower reliability of features among the applied methods. Furthermore, because variations in segmentation method strongly affect the outer contour and thus the shape of the segmentation, a high sensitivity of morphologic features to segmentation method could be expected. Because of the higher MTV, radiomics features at the patient level were less influenced by segmentation method, with up to 20% of the texture features and 32% of the morphologic features being sensitive to segmentation method. Because of the low similarity of some features between segmentations, it is not advisable to use regression coefficients from other studies that applied other segmentation methods.

However, even though values are not interchangeable, in our study the discriminative power at the lesion and patient levels was comparable among segmentations. Contrary to what we expected, choosing the segmentation method that visually best selected the tumors did not result in a higher CV-AUC. These results are in line with previous studies exploring the predictive value of radiomics features using different segmentations for other cancer types. None of these studies found significant differences in predicting outcome (28,32), metastasis, or lymph node invasion (30) using different segmentation methods. However, ICC agreement values, correlations with MTV, correlations with SUV_peak, and mutual correlations differed among segmentation methods, resulting in different preselections of features for the logistic regression model. Even though discriminative power is comparable, different features are predictive of outcome when applying different segmentation methods.

When only previously defined reliable, repeatable, and reproducible features were used, discriminative power was slightly lower for all segmentation methods. However, the CIs of CV-AUCs using only reproducible features overlapped with those using all features, suggesting that restricting the analysis to reproducible features does not substantially reduce discriminative power. In clinical practice and multicenter studies, variable image quality is encountered, so some features with high predictive value may in reality be difficult to measure reliably. It is thus advisable to use only reproducible features, especially in multicenter settings.

To our knowledge, this was the first study in DLBCL to assess the influence of segmentation method on PET radiomics features other than MTV and on their predictive power. By applying multiple frequently used methods to the same patients, we could directly compare the effect of segmentation method on quantitative PET radiomics features. We chose to calculate linear relations among radiomics features using Pearson correlation because we used logistic regression as a classifier, and the logistic regression model assumes a linear relation between the included features and the log-odds of the outcome. This probably led to fewer features being included in the logistic regression model than if Spearman correlation had been used for data reduction. One limitation of this study was that not all scans were acquired according to the EARL protocol; this inconsistency might affect the discriminative power and repeatability of features (25). Because we matched events and nonevents on reconstruction method, there was no difference in EARL compliance between groups. However, this matching does not preclude an effect of the reconstruction method on discriminative power. Harmonization methods such as ComBat have been shown to be worthwhile for retrospectively increasing uniformity in large datasets (33,34), so ComBat-based data alignment could be a suitable approach to harmonizing these differences. Unfortunately, in our study the number of patients per center was too small to allow application of ComBat. Moreover, given the equivalent discriminative power among segmentation methods in our data, ComBat-based data alignment might also be a suitable approach to harmonizing databases of radiomics features obtained with different segmentation methods. In our cohort, patients presented with high MTVs; therefore, these results need to be validated in other cohorts with smaller lesions.

CONCLUSION

This study found no substantial difference in the discriminative performance of radiomics features extracted using different segmentation methods. However, there are differences in the actual radiomics feature values derived and in the selected features among segmentation methods. Until consensus on a segmentation method for DLBCL is reached, it is advisable to use only prediction models that were built using data segmented with the same method.

DISCLOSURE

This work was financially supported by the Dutch Cancer Society (VU-2018-11648) and partially by the research program STRaTeGy (14929), which is financed by The Netherlands Organization for Scientific Research. No other potential conflict of interest relevant to this article was reported.

KEY POINTS

QUESTION: What is the influence of segmentation methods on the discriminative power of baseline radiomics features in DLBCL?

PERTINENT FINDINGS: There is no substantial difference in the discriminative power of radiomics features among segmentation methods. However, different features are selected when applying different segmentation methods.

IMPLICATIONS FOR PATIENT CARE: It is advisable to use only prediction models that were built using data segmented with the same method.

Footnotes

Published online July 16, 2021.

REFERENCES

1.↵
1. Crump M,
2. Neelapu SS,
3. Farooq U,
4. et al
. Outcomes in refractory diffuse large B-cell lymphoma: results from the international SCHOLAR-1 study. Blood. 2017;130:1800–1808.
OpenUrl Abstract/FREE Full Text
2.↵
1. Cottereau AS,
2. Nioche C,
3. Dirand AS,
4. et al
. ¹⁸F-FDG PET dissemination features in diffuse large B-cell lymphoma are predictive of outcome. J Nucl Med. 2020;61:40–45.
OpenUrl Abstract/FREE Full Text
3.↵
1. Aide N,
2. Fruchart C,
3. Nganoa C,
4. Gac AC,
5. Lasnon C.
Baseline ¹⁸F-FDG PET radiomic features as predictors of 2-year event-free survival in diffuse large B cell lymphomas treated with immunochemotherapy. Eur Radiol. 2020;30:4623–4632.
OpenUrl
4.↵
1. Ceriani L,
2. Gritti G,
3. Cascione L,
4. et al
. SAKK38/07 study: integration of baseline metabolic heterogeneity and metabolic tumor volume in DLBCL prognostic model. Blood Adv. 2020;4:1082–1092.
OpenUrl
5.↵
International Non-Hodgkin’s Lymphoma Prognostic Factors Project. A predictive model for aggressive non-Hodgkin’s lymphoma. N Engl J Med. 1993;329:987–994.
OpenUrl CrossRef PubMed
6.↵
1. Ilyas H,
2. Mikhaeel NG,
3. Dunn JT,
4. et al
. Defining the optimal method for measuring baseline metabolic tumour volume in diffuse large B cell lymphoma. Eur J Nucl Med Mol Imaging. 2018;45:1142–1154.
OpenUrl
7.↵
1. Barrington SF,
2. Zwezerijnen BG,
3. de Vet HC,
4. et al
. Automated segmentation of baseline metabolic total tumor burden in diffuse large B-cell lymphoma: which method is most successful? J Nucl Med. 2021;62:332–337.
OpenUrl Abstract/FREE Full Text
8.↵
1. Senjo H,
2. Hirata K,
3. Izumiyama K,
4. et al
. High metabolic heterogeneity on baseline ¹⁸FDG-PET/CT scan as a poor prognostic factor for newly diagnosed diffuse large B-cell lymphoma. Blood Adv. 2020;4:2286–2296.
OpenUrl
9.↵
1. Eertink JJ,
2. van de Brug T,
3. Wiegers SE,
4. et al
. ¹⁸F-FDG PET/CT baseline radiomics features are predictive of outcome in diffuse large B- cell lymphoma patients. Eur J Nucl Med Mol Imaging. August 18, 2021 [Epub ahead of print].
10.↵
1. Orlhac F,
2. Soussan M,
3. Maisonobe JA,
4. Garcia CA,
5. Vanderlinden B,
6. Buvat I.
Tumor texture analysis in ¹⁸F-FDG PET: relationships between texture parameters, histogram indices, standardized uptake values, metabolic volumes, and total lesion glycolysis. J Nucl Med. 2014;55:414–422.
OpenUrl Abstract/FREE Full Text
11.↵
1. Tolosi L,
2. Lengauer T.
Classification with correlated features: unreliability of feature ranking and solutions. Bioinformatics. 2011;27:1986–1994.
OpenUrl CrossRef PubMed
12.↵
1. Zwanenburg A.
Radiomics in nuclear medicine: robustness, reproducibility, standardization, and how to avoid data analysis traps and replication crisis. Eur J Nucl Med Mol Imaging. 2019;46:2638–2655.
OpenUrl
13.↵
1. Kumar V,
2. Gu Y,
3. Basu S,
4. et al
. Radiomics: the process and the challenges. Magn Reson Imaging. 2012;30:1234–1248.
OpenUrl CrossRef PubMed
14.↵
1. Boellaard R,
2. Delgado-Bolton R,
3. Oyen WJ,
4. et al
. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–354.
OpenUrl CrossRef PubMed
15.↵
1. Lugtenburg PJ,
2. de Nully Brown P,
3. van der Holt B,
4. et al
. Rituximab-CHOP with early rituximab intensification for diffuse large B-cell lymphoma: a randomized phase III trial of the HOVON and the Nordic Lymphoma Group (HOVON-84). J Clin Oncol. 2020;38:3377–3387.
OpenUrl
16.↵
1. Boellaard R.
Quantitative oncology molecular analysis suite: ACCURATE [abstract]. J Nucl Med. 2018;59(suppl 1):1753.
OpenUrl
17.↵
1. Frings V,
2. van Velden FH,
3. Velasquez LM,
4. et al
. Repeatability of metabolically active tumor volume measurements with FDG PET/CT in advanced gastrointestinal malignancies: a multicenter study. Radiology. 2014;273:539–548.
OpenUrl CrossRef PubMed
18.↵
1. Burggraaff CN,
2. Rahman F,
3. Kassner I,
4. et al
. Optimizing workflows for fast and reliable metabolic tumor volume measurements in diffuse large B cell lymphoma. Mol Imaging Biol. 2020;22:1102–1110.
OpenUrl
19.↵
1. Pfaehler E,
2. Zwanenburg A,
3. de Jong JR,
4. Boellaard R.
RaCaT: an open source and easy to use radiomics calculator tool. PLoS One. 2019;14:e0212223.
OpenUrl
20.↵
1. Zwanenburg A,
2. Vallieres M,
3. Abdalah MA,
4. et al
. The Image Biomarker Standardization Initiative: standardized quantitative radiomics for high-throughput image-based phenotyping. Radiology. 2020;295:328–338.
OpenUrl CrossRef PubMed
21.↵
1. Koo TK,
2. Li MY.
A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med. 2016;15:155–163.
OpenUrl CrossRef PubMed
22.↵
1. Mukaka MM.
Statistics corner: a guide to appropriate use of correlation coefficient in medical research. Malawi Med J. 2012;24:69–71.
OpenUrl PubMed
23.↵
1. Parzen E,
2. Tanabe K,
3. Kitagawa G
1. Akaike H.
Information theory and an extension of the maximum likelihood principle. In: Parzen E, Tanabe K, Kitagawa G, eds. Selected Papers of Hirotugu Akaike. Springer; 1998:199–213.
24.↵
1. Dietterich TG.
Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput. 1998;10:1895–1923.
OpenUrl CrossRef PubMed
25.↵
1. Pfaehler E,
2. van Sluis J,
3. Merema BBJ,
4. et al
. Experimental multicenter and multivendor evaluation of the performance of PET radiomic features using 3-dimensionally printed phantom inserts. J Nucl Med. 2020;61:469–476.
OpenUrl Abstract/FREE Full Text
26.↵
1. Hatt M,
2. Majdoub M,
3. Vallieres M,
4. et al
. ¹⁸F-FDG PET uptake characterization through texture analysis: investigating the complementary nature of heterogeneity and functional tumor volume in a multi-cancer site patient cohort. J Nucl Med. 2015;56:38–44.
OpenUrl Abstract/FREE Full Text
27. Brooks FJ, Grigsby PW. The effect of small tumor volumes on studies of intratumoral heterogeneity of tracer uptake. J Nucl Med. 2014;55:37–42.
28. Hatt M, Tixier F, Cheze Le Rest C, Pradier O, Visvikis D. Robustness of intratumour ¹⁸F-FDG PET uptake heterogeneity quantification for therapy response prediction in oesophageal carcinoma. Eur J Nucl Med Mol Imaging. 2013;40:1662–1671.
29. Belli ML, Mori M, Broggi S, et al. Quantifying the robustness of [¹⁸F]FDG-PET/CT radiomic features with respect to tumor delineation in head and neck and pancreatic cancer patients. Phys Med. 2018;49:105–111.
30. Cysouw MCF, Jansen BHE, van de Brug T, et al. Machine learning-based analysis of [¹⁸F]DCFPyL PET radiomics for risk stratification in primary prostate cancer. Eur J Nucl Med Mol Imaging. 2021;48:340–349.
31. Altazi BA, Zhang GG, Fernandez DC, et al. Reproducibility of F18-FDG PET radiomic features for different cervical tumor segmentation methods, gray-level discretization, and reconstruction algorithms. J Appl Clin Med Phys. 2017;18:32–48.
32. Bashir U, Azad G, Siddique MM, et al. The effects of segmentation algorithms on the measurement of ¹⁸F-FDG PET texture parameters in non-small cell lung cancer. EJNMMI Res. 2017;7:60.
33. Orlhac F, Boughdad S, Philippe C, et al. A postreconstruction harmonization method for multicenter radiomic studies in PET. J Nucl Med. 2018;59:1321–1328.
34. Dissaux G, Visvikis D, Da-Ano R, et al. Pretreatment ¹⁸F-FDG PET/CT radiomics predict local recurrence in patients treated with stereotactic body radiotherapy for early-stage non-small cell lung cancer: a multicentric study. J Nucl Med. 2020;61:814–820.

Received for publication February 12, 2021.
Revision received June 3, 2021.
