Research Article | Clinical Investigation

Validation of an Artificial Intelligence–Based Prediction Model Using 5 External PET/CT Datasets of Diffuse Large B-Cell Lymphoma

Maria C. Ferrández, Sandeep S.V. Golla, Jakoba J. Eertink, Sanne E. Wiegers, Gerben J.C. Zwezerijnen, Martijn W. Heymans, Pieternella J. Lugtenburg, Lars Kurch, Andreas Hüttmann, Christine Hanoun, Ulrich Dührsen, Sally F. Barrington, N. George Mikhaeel, Luca Ceriani, Emanuele Zucca, Sándor Czibor, Tamás Györke, Martine E.D. Chamuleau, Josée M. Zijlstra and Ronald Boellaard; on behalf of the PETRA Consortium
Journal of Nuclear Medicine November 2024, 65 (11) 1802-1807; DOI: https://doi.org/10.2967/jnumed.124.268191
Author affiliations: Maria C. Ferrández (1,2), Sandeep S.V. Golla (1,2), Jakoba J. Eertink (2,3), Sanne E. Wiegers (1,2), Gerben J.C. Zwezerijnen (1,2), Martijn W. Heymans (4), Pieternella J. Lugtenburg (5), Lars Kurch (6), Andreas Hüttmann (7), Christine Hanoun (7), Ulrich Dührsen (7), Sally F. Barrington (8), N. George Mikhaeel (9), Luca Ceriani (10,11), Emanuele Zucca (11,12,13), Sándor Czibor (14), Tamás Györke (14), Martine E.D. Chamuleau (2,3), Josée M. Zijlstra (2,3), and Ronald Boellaard (1,2).

1. Department of Radiology and Nuclear Medicine, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
2. Imaging and Biomarkers, Cancer Center Amsterdam, Amsterdam, The Netherlands
3. Department of Hematology, Cancer Center Amsterdam, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
4. Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
5. Department of Hematology, Erasmus MC Cancer Institute, University Medical Center Rotterdam, Rotterdam, The Netherlands
6. Clinic and Polyclinic for Nuclear Medicine, Department of Nuclear Medicine, University of Leipzig, Leipzig, Germany
7. Department of Hematology, West German Cancer Center, University Hospital Essen, University of Duisburg–Essen, Essen, Germany
8. School of Biomedical Engineering and Imaging Sciences, King’s College London and Guy’s and St Thomas’ PET Centre, King’s Health Partners, King’s College London, London, United Kingdom
9. Department of Clinical Oncology, Guy’s Cancer Centre and School of Cancer and Pharmaceutical Sciences, King’s College London, London, United Kingdom
10. Department of Nuclear Medicine and PET/CT Centre, Imaging Institute of Southern Switzerland–EOC, Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland
11. SAKK Swiss Group for Clinical Cancer Research, Bern, Switzerland
12. Department of Oncology, Oncology Institute of Southern Switzerland–EOC, Faculty of Biomedical Sciences, Università della Svizzera Italiana, Bellinzona, Switzerland
13. Faculty of Biomedical Sciences, Università della Svizzera Italiana, Lugano, Switzerland
14. Department of Nuclear Medicine, Medical Imaging Centre, Semmelweis University, Budapest, Hungary

Visual Abstract

[Visual abstract figure]

Abstract

The aim of this study was to validate a previously developed deep learning model in 5 independent clinical trials. The predictive performance of this model was compared with the International Prognostic Index (IPI) and 2 models incorporating radiomic PET/CT features (clinical PET and PET models). Methods: In total, 1,132 diffuse large B-cell lymphoma patients were included: 296 for training and 836 for external validation. The primary outcome was 2-y time to progression. The deep learning model was trained on maximum-intensity projections from PET/CT scans. The clinical PET model included metabolic tumor volume, maximum distance from the bulkiest lesion to another lesion, SUVpeak, age, and performance status. The PET model included metabolic tumor volume, maximum distance from the bulkiest lesion to another lesion, and SUVpeak. Model performance was assessed using the area under the curve (AUC) and Kaplan–Meier curves. Results: The IPI yielded an AUC of 0.60 on all external data. The deep learning model yielded a significantly higher AUC of 0.66 (P < 0.01) and was consistently better than the IPI in each individual clinical trial. The radiomic model AUCs remained higher than those of the deep learning model across all clinical trials. The deep learning and clinical PET models showed equivalent performance (AUC, 0.69; P > 0.05). The PET model yielded the highest AUC of all models (AUC, 0.71; P < 0.05). Conclusion: The deep learning model predicted outcome in all trials with a higher performance than the IPI and better survival curve separation. This model can predict treatment outcome in diffuse large B-cell lymphoma without tumor delineation, but at the cost of a lower prognostic performance than with radiomics.

  • diffuse large B-cell lymphoma
  • maximum-intensity projection
  • convolutional neural networks
  • time to progression
  • prediction

A combination of 18F-FDG PET with CT imaging is the preferred imaging modality for staging in diffuse large B-cell lymphoma (DLBCL) patients (1). Because of the heterogeneity of the disease, the current first-line treatment strategy in DLBCL results in relapse in one third of patients within the first 2 y (2). The International Prognostic Index (IPI) is used in the clinic to estimate patient prognosis, but it has suboptimal performance (3). Different PET-derived metrics are reported to be strong prognostic factors for DLBCL, especially metabolic tumor volume (MTV) (4). These metrics can be incorporated into prediction models for patient prognosis, also known as radiomic models. However, extraction of these metrics requires tumor delineation, which is labor-intensive and prone to intra- and interreader variability. Thus, there is a need to develop user-independent, effective, and reliable methods that can aid the identification of high-risk DLBCL patients.

Artificial intelligence and deep learning are promising technologies in the field of medical imaging. Their clinical applications are vast and include diagnostics, postprocessing techniques, tumor detection and delineation, prognosis, and clinical decision-making (5). One of the main drawbacks of deep learning applied to medical imaging is the large computational requirement for training the models. PET scans are large in terms of memory size, and analyzing them directly requires deep learning models with increasingly complex layers. The use of maximum-intensity projections (MIPs) can mitigate this: PET scans are projected onto 2-dimensional images, largely reducing the computational burden (6). The advantage of deep learning over radiomic models is that the former can learn directly from the images and predict disease progression without the need for lesion delineation, whereas radiomic models are based on tumor segmentations. There is increasing interest in the use of deep learning and convolutional neural networks (CNNs) in DLBCL, especially for automatic tumor segmentation (7). However, only a few studies have focused on the use of CNNs for prediction of tumor progression directly from segmentation-free PET images (8,9).

In a previous study, we developed a CNN for the prediction of 2-y time to progression (TTP) from coronal and sagittal MIP images of DLBCL baseline scans (i.e., MIP-CNN) (10). This model was trained on the clinical trial HOVON-84 (11) and was externally evaluated on an independent clinical trial, PETAL (12). Proper validation of such models is difficult, because predictive performance varies across populations and target settings and can also change over time, for example because of improvements in care (13). Therefore, external validation is an essential aspect of assessing model performance, especially for deep learning models, whose decision-making processes can be challenging to understand.

The aims of this study were to extend the validation of the MIP-CNN model to 5 other international clinical trials and to compare the predictive performance of the MIP-CNN model with the IPI score, which is the current clinical standard, and 2 other radiomic models that require tumor segmentation.

MATERIALS AND METHODS

Study Population

In total, 1,466 18F-FDG PET/CT baseline scans from newly diagnosed DLBCL patients were available, and after quality control, 836 scans were used in this study. Quality control exclusions are provided in the Results section. These patient scans were obtained from 6 independent clinical trials: PETAL (12), GSTT15 (14), IAEA (15), NCRI (16), SAKK (17), and HOVON-130 (18). Additionally, 296 patients from the HOVON-84 trial were used to train the models as reported previously (4,10). All patients were treated with rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone with a varying number of cycles, mostly 6 or 8. Individual trials were approved by the institutional review board, and all patients provided written informed consent. The institutional review board of the VU University Medical Center (JR/20140414) approved the use of these data. Details on the quality control of the scans can be found in Supplemental Data 1 (supplemental materials are available at http://jnm.snmjournals.org).

Prediction Models

In this study, 3 different models were implemented for the prediction of risk of progression within 2 y from the time of the baseline scan (i.e., 2-y TTP): the MIP-CNN model developed by Ferrández et al. (10), a deep learning model that uses coronal and sagittal MIP images as inputs; the clinical PET model developed by Eertink et al. (4); and a PET model that used PET features only. The MIP-CNN was previously trained on the HOVON-84 trial and initially tested on the PETAL trial (n = 340) (10). In this study, the MIP-CNN model was externally tested on an additional 5 clinical trials, accounting for 496 patients, whereas the newly proposed PET model had not previously been tested on the PETAL data and was therefore externally tested on all 836 scans. To allow direct comparison, we report the results for both the MIP-CNN and the new PET model for all external trial datasets. For the definition of TTP, patients who died within 2 y from the time of the baseline scan without signs of progression were excluded from the analysis. An illustration of the models’ designs can be found in Figure 1. The performance of the 3 models was compared with the IPI risk score. The IPI was established using low-, low-intermediate-, high-intermediate-, and high-risk groups (3).

FIGURE 1.

Flowchart of steps involved in design of models. (A) MIP-CNN: MIP images are obtained from baseline PET scans and used as input of model. (B) Radiomic models: tumors are delineated for each patient, and features are extracted from these delineations. Those features are used as predictors in machine learning model. Clinical PET model includes clinical parameters as predictors (age and WHO status). All models are designed to predict probability of 2-y TTP.

MIP-CNN Model

MIP images were generated using an in-house–developed preprocessing tool in Interactive Data Language (IDL; NV5 Geospatial Solutions, Inc.). This tool produces coronal and sagittal MIPs with dimensions of 275 × 200 × 1 and a pixel size of 4 × 4 mm. Examples of these coronal and sagittal MIPs are illustrated in Figure 1A. Details on the design of the MIP-CNN are described in Supplemental Data 2 (10). The model is available for download from the supplemental files of the article by Ferrández et al. (10).
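The authors’ IDL preprocessing tool is not reproduced here; the following is a minimal sketch in R of the underlying operation, namely taking the voxelwise maximum of the SUV volume along the anterior–posterior and left–right axes to obtain coronal and sagittal projections. The array orientation, the padding to a fixed 275 × 200 matrix, and all function and variable names are illustrative assumptions, not the authors’ implementation.

# Sketch only (not the authors' IDL tool): coronal and sagittal MIPs from a
# 3-dimensional PET SUV array, assumed ordered [x, y, z] at 4-mm voxels.
make_mips <- function(suv_vol, target_dim = c(275, 200)) {
  coronal  <- apply(suv_vol, c(3, 1), max)   # maximum along y (anterior-posterior)
  sagittal <- apply(suv_vol, c(3, 2), max)   # maximum along x (left-right)
  fit_to <- function(img, dims) {            # pad or crop to the fixed input size
    out <- matrix(0, nrow = dims[1], ncol = dims[2])
    rr <- seq_len(min(nrow(img), dims[1]))
    cc <- seq_len(min(ncol(img), dims[2]))
    out[rr, cc] <- img[rr, cc]
    out
  }
  list(coronal = fit_to(coronal, target_dim), sagittal = fit_to(sagittal, target_dim))
}

# Example with a synthetic volume:
# mips <- make_mips(array(runif(100 * 100 * 250), dim = c(100, 100, 250)))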

Radiomic Models

The tumors were delineated using an SUV threshold of 4.0, as recommended by Barrington et al. (19) and used previously by Eertink et al. (4). Additionally, any physiologic uptake that fell within the tumor regions because of its proximity to the tumor was manually removed. All delineations were performed using the ACCURATE tool (20). More details on the delineation process and the extraction of the PET features are given in a previous study (21).

The clinical PET model was first developed using the HOVON-84 trial by Eertink et al. (4). The clinical PET model includes the following features: MTV, SUVpeak, maximum distance from the bulkiest lesion to another lesion, age, and World Health Organization performance status. These features were used as predictors in a logistic regression model to predict the probability of 2-y TTP for each patient.

The PET model followed the same design as the clinical PET model but included only PET-extracted features as predictors: MTV, SUVpeak, and maximum distance from the bulkiest lesion to another lesion.
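For illustration, a minimal sketch in R of how 2 such logistic regression models can be fitted on training data and then applied with fixed coefficients to an external cohort is given below. The data frame layout and column names (mtv, suv_peak, dmax_bulk, age, who_status, ttp2y_event) are assumptions for the example, not the authors’ code.

# Sketch: fit the clinical PET and PET-only logistic regression models on the
# training cohort, then apply the frozen coefficients to external data.
fit_clinical_pet <- glm(ttp2y_event ~ mtv + suv_peak + dmax_bulk + age + who_status,
                        family = binomial, data = train)
fit_pet_only     <- glm(ttp2y_event ~ mtv + suv_peak + dmax_bulk,
                        family = binomial, data = train)

# External validation: coefficients stay fixed; only predicted probabilities of
# 2-y TTP are computed for the external patients.
ext$prob_clinical_pet <- predict(fit_clinical_pet, newdata = ext, type = "response")
ext$prob_pet_only     <- predict(fit_pet_only,     newdata = ext, type = "response")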

Statistical Analysis

The receiver operating characteristic curve and the area under the curve (AUC) were used to evaluate the models’ performance. The radiomic models were internally validated with stratified repeated 5-fold cross-validation (CV) (4). The internal validation of the MIP-CNN followed a 5-fold CV with a data-balancing scheme, which was explained in detail previously (10). A 2-sided DeLong test was used to assess differences between the models’ AUCs (22). To externally test the radiomic models, the model coefficients were fixed and used to calculate the 2-y TTP probabilities. In the case of the MIP-CNN, the model weights were saved during training and later used to calculate the probabilities.
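As an illustration of this discrimination analysis, the sketch below uses the pROC package in R to compute AUCs on the external data, compare 2 models with the 2-sided DeLong test, and derive sensitivity and specificity at the Youden-index threshold. The data frame and column names are assumptions.

library(pROC)

# ROC curves and AUCs for 2 models on the external data (column names assumed).
roc_cnn <- roc(ext$ttp2y_event, ext$prob_mip_cnn)
roc_pet <- roc(ext$ttp2y_event, ext$prob_pet_only)
auc(roc_cnn); auc(roc_pet)

# 2-sided DeLong test for the difference between correlated AUCs.
roc.test(roc_cnn, roc_pet, method = "delong")

# Sensitivity and specificity at the Youden-index optimal threshold.
coords(roc_cnn, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity"))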

Kaplan–Meier analysis was used to obtain the survival curves for each model, and these were compared against the IPI. Most events in this population occur during the first 2 y of treatment; therefore, data after this period were not included in the survival analysis. High-risk IPI is defined by the presence of 4–5 adverse factors. High-risk groups for the MIP-CNN, clinical PET, and PET prediction models were defined as the patients with the highest predicted probabilities. To facilitate comparison between the high-risk groups of the IPI and the prediction models, the high-risk patient cohorts for the prediction models were matched in size to the high-risk IPI group.
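A sketch of how the size-matched risk groups and 2-y Kaplan–Meier curves could be produced with the survival package in R follows; the columns for TTP in months, the progression event indicator, the IPI score, and the model probabilities are illustrative assumptions.

library(survival)

# Size of the IPI high-risk group (4-5 adverse factors).
n_high <- sum(ext$ipi_score >= 4)

# Model high-risk group: the n_high patients with the highest predicted
# probabilities, so that group sizes match the IPI high-risk group.
cutoff <- sort(ext$prob_mip_cnn, decreasing = TRUE)[n_high]
ext$risk_group <- ifelse(ext$prob_mip_cnn >= cutoff, "high", "low")

# Kaplan-Meier curves for 2-y TTP, censoring follow-up at 24 mo, with log-rank test.
ext$ttp_2y   <- pmin(ext$ttp_months, 24)
ext$event_2y <- ifelse(ext$ttp_months <= 24, ext$progression_event, 0)
km <- survfit(Surv(ttp_2y, event_2y) ~ risk_group, data = ext)
survdiff(Surv(ttp_2y, event_2y) ~ risk_group, data = ext)
plot(km, col = c("red", "blue"), xlab = "Months", ylab = "Progression-free probability")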

To further evaluate and quantify the models’ predictive performance in terms of overall fit, calibration, and discrimination, we report the calibration plot, the slope and intercept of the calibration, the Brier score (23), and the absolute average difference between observed and predicted probabilities for each model (Supplemental Data 3) (13).
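To make these measures concrete, the sketch below computes them in R for one model: the Brier score as the mean squared difference between the observed event indicator and the predicted probability, the calibration intercept from a logistic model with the linear predictor as offset, and the calibration slope as the coefficient of the linear predictor. The exact definitions used in the supplement and the column names are assumptions here.

# Calibration and overall-fit measures for one model (assumed column names).
y  <- ext$ttp2y_event        # observed 2-y TTP event (0/1)
p  <- ext$prob_clinical_pet  # predicted probability from the model
lp <- qlogis(p)              # linear predictor (logit of predicted probability)

brier    <- mean((p - y)^2)                                   # Brier score
abs_diff <- abs(mean(y) - mean(p))                            # observed vs. predicted, on average
cal_int  <- coef(glm(y ~ offset(lp), family = binomial))[1]   # calibration-in-the-large
cal_slp  <- coef(glm(y ~ lp, family = binomial))[2]           # calibration slope

# Simple calibration plot: observed event proportion per decile of predicted risk.
decile <- cut(p, quantile(p, probs = seq(0, 1, 0.1)), include.lowest = TRUE)
plot(tapply(p, decile, mean), tapply(y, decile, mean),
     xlab = "Predicted probability", ylab = "Observed proportion")
abline(0, 1)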

These analyses were conducted in R (version 4.3.2). A P value below 0.05 was considered statistically significant.

RELAINCE Artificial Intelligence Claim

In this paper, the externally validated artificial intelligence method was designed to predict 2-y TTP and was trained and validated on baseline 18F-FDG PET/CT studies in DLBCL patients. The method was validated and tested for baseline PET studies in DLBCL patients only; the PET data need to comply with the image quality standards previously outlined (10), and the method should preferably be applied to PET acquired following EARL standards 1. The performance of the method was evaluated using the AUC and Kaplan–Meier plots.

RESULTS

Study Population

The HOVON-84 dataset was used to train the models described in this study; in total, 296 HOVON-84 DLBCL patients were included for training. Details on inclusion and exclusion criteria were previously published (10).

In total, 1,466 scans were available for external validation in the PETRA database. Exclusion criteria for this study included the absence of baseline 18F-FDG PET imaging (n = 95), no follow-up data within 2 y (n = 88), missing World Health Organization performance status (n = 4), and age below 18 y (n = 1). Quality control procedures resulted in the exclusion of patients with incomplete scans (n = 235), essential Digital Imaging and Communications in Medicine information missing (n = 71), no 18F-FDG–avid lesions (n = 32), and scans outside the quality control range (n = 54). Additionally, 50 patients died without progression within 2 y. This resulted in a total of 836 patients included in the study and used for external validation of the model as shown in Figure 2. A description of patient characteristics for all clinical trials is given in Supplemental Table 2.

FIGURE 2.

Flowchart of selection of patients included in study for external validation. QC = quality control.

Prediction Models

Cross-validation on the HOVON-84 training data yielded a CV-AUC of 0.67 for the IPI model. The MIP-CNN yielded a CV-AUC of 0.72, outperforming the IPI. Both the clinical PET and the PET models also outperformed the IPI and the MIP-CNN, with CV-AUCs of 0.76 and 0.75, respectively. The SD, sensitivity, and specificity for each model are given in Table 1. The Youden index was used to establish the threshold for sensitivity and specificity.

TABLE 1.

SD, Sensitivity, and Specificity for Models’ AUCs

The external validation on all patients yielded an AUC of 0.60 for the IPI model, whereas the MIP-CNN achieved an AUC of 0.66, significantly higher than that of the IPI model (P < 0.01). This was also the case for the radiomic models, both of which had significantly higher AUCs than the IPI: 0.69 for the clinical PET model and 0.71 for the PET model (P < 0.001). This is illustrated in Figure 3. There was no statistically significant difference between the AUCs of the MIP-CNN and the clinical PET model; however, the PET model yielded a significantly higher AUC than the MIP-CNN and clinical PET models (P < 0.05). An overview of all AUCs for each individual clinical trial can be found in Figure 4 and Supplemental Table 3. Receiver operating characteristic curves considering the 5 additional clinical trials only are shown in Supplemental Figure 3. As an additional exploratory analysis, we also tested the models’ performance in predicting overall survival on all external data (Supplemental Fig. 4).

FIGURE 3.

Receiver operating characteristic curves for 2-y TTP for all external data for 3 prediction models compared with IPI score.

FIGURE 4.

AUCs of IPI, MIP-CNN, clinical PET, and PET prediction models for all 7 trials, including CV-AUC for training set (H84) and for all 6 external clinical trials together (ALL).

Patients classified as high-risk showed significantly reduced survival rates compared with those classified as low-risk across all prediction models (P < 0.0001). The Kaplan–Meier curves are illustrated in Figure 5. The survival rate for patients within the IPI high-risk group was 67.5%, with a 95% CI of 61.4–74.2. The survival rates for patients within the MIP-CNN, the PET, and the clinical PET high-risk groups were 61.1% (95% CI, 54.8–67.3), 57.1% (95% CI, 50.7–64.3), and 57.1% (95% CI, 50.7–64.3), respectively.

FIGURE 5.

Kaplan–Meier survival curves of 2-y TTP stratified into low- and high-risk groups for IPI and MIP-CNN prediction models (A), IPI and clinical PET prediction models (B), and IPI and PET prediction models (C).

DISCUSSION

In this study, we evaluated the predictive performance of the MIP-CNN model in 5 independent clinical trials: GSTT15, IAEA, SAKK, NCRI, and HOVON-130. This model was previously developed and trained using the HOVON-84 trial and initially tested only on the PETAL trial (10). We have shown that the model remains predictive of outcome in all 6 independent clinical trials. Moreover, the MIP-CNN outperformed the IPI score when evaluated on all 836 external patients, and its performance remained consistently better than that of the IPI for each individual clinical trial.

The use of U-nets and nnU-nets is growing rapidly, and their application to PET imaging for lymphoma is promising (24,25). Novel artificial intelligence–based methods in this field are mostly applied to tumor segmentation; there are only a few papers on the use of deep learning models for outcome prediction in DLBCL. To our knowledge, only 2 other studies have used CNNs with 18F-FDG PET images as the main input. Also using coronal MIPs, Rebaud et al. trained a multitask ranker neural network whose performance for progression-free survival prediction was equivalent to that of total MTV segmented by experienced nuclear medicine physicians (9). In contrast, Liu et al. (8) developed a 3-dimensional CNN for simultaneous automated lesion segmentation and prognosis prediction. These models show performance comparable to that of our MIP-CNN; however, unlike our study, they lack external validation and further assessment. Although there are similarities between our proposed CNN and the models used by Rebaud et al. and Liu et al., direct comparison of the 3 methods is hampered by the lack of information regarding model training and architecture in these 2 other studies. For this reason, we evaluated 2 radiomic models whose training procedures could be replicated on the HOVON-84 trial. The clinical PET model, developed by Eertink et al., includes MTV, maximum distance from the bulkiest lesion to another lesion, SUVpeak, age, and World Health Organization performance status (4). The PET model is a simplified version of the latter, including only PET-extracted features (MTV, SUVpeak, and maximum distance from the bulkiest lesion to another lesion).

Another reason to include radiomic models in this study is the fact that they both require delineation of the tumors to extract prognostic information. The motivation behind this study was to build a prognostic model that would not require any delineation procedure, thus solving the issues that come with such tasks. Tumor delineation is time-consuming, ranging from 3 to 6 min per patient and up to 20 min for complicated cases (26). Moreover, there are no consensus guidelines on tumor delineation, and as a result, many different semiautomated methods are being used, providing substantially different PET uptake values and total MTVs (27).

When comparing the 3 models included in this study, we found that the radiomic models were associated with increased AUCs for 2-y TTP in all individual trials compared with the MIP-CNN. However, it is important to note that the MIP-CNN performance for the external validation on all 836 patients remained statistically equivalent to that of the clinical PET model. The performance differences between the MIP-CNN and the radiomic models may be explained by several factors. First, radiomic model predictors are extracted from manually curated tumor delineations. An SUV threshold of 4.0 is initially used to generate a tumor mask, which is then supervised by a nuclear medicine physician who edits the region to complete the final delineation. This may result in more accurate tumor delineation, with an experienced user removing adjacent physiologic uptake included in the mask and uptake likely due to causes other than lymphoma such as normal variants and infection or inflammation. Second, the use of MIP images instead of fully 3-dimensional images to build our deep learning model can have some limitations. Even though MIPs are more manageable and memory-efficient, it is possible that some relevant information is lost when using MIPs, possibly resulting in less precise predictions. Nevertheless, compared with the radiomic models, the MIP-CNN is a model free of tumor segmentation; therefore, it is easier to apply, allowing routine use in clinical practice.

The 3 models follow a similar trend in performance across the datasets, as seen in Figure 4. This trend might indicate that there are case-mix differences in the data between the clinical trials, affecting model performance in a consistent way for all models. Those differences may arise from specific patient characteristics that are not considered when building the models. The HOVON-130 trial included only patients with MYC oncogene rearrangements, a well-known high-risk feature (18). This could explain the low AUCs achieved by the 3 prediction models, as well as the IPI, for this clinical trial. The IPI also performed poorly for NCRI, SAKK, and IAEA, which all have a relatively higher proportion of low-risk patients than the other clinical trials. Moreover, SAKK and IAEA include relatively younger populations, especially compared with HOVON-84. For these 2 clinical trials, the clinical PET model, which includes age as a predictor, yielded the highest AUC; however, for NCRI, GSTT15, and PETAL, the PET model, without clinical predictors, performed best. These results suggest that inclusion of age and World Health Organization status may have an impact on a model’s prognostic power, also considering that the PET model achieved a significantly higher AUC when tested on all external data.

There were some limitations in this study. We assessed only 2-y TTP as the outcome parameter. TTP was chosen because progression-free and overall survival are influenced by age: age-related comorbidities and life expectancy can affect outcome for older patients independently of their lymphoma. Moreover, this provided continuity with previous studies that have also reported TTP as the primary endpoint (4,10). Most patients included in this study received rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone treatment; however, variations in treatment regimens across studies were observed, including differences in the number of cycles and the degree of treatment intensification. Another limitation of the MIP-CNN is the lack of activation maps to explain the predictions, a consequence of the 2-branch design of the CNN, as discussed before (10). Yet, by digitally ablating the lesions from the MIPs, it is possible to demonstrate the impact of a lesion on the predictions. This, however, requires tumor segmentation, and to this end the use of artificial intelligence–based tumor segmentations is of interest and is currently being explored (24,28).

In summary, this is, to our knowledge, the first study to show the potential of CNNs for outcome prediction and their applicability to such an extensive cohort of baseline 18F-FDG PET DLBCL scans.

CONCLUSION

The MIP-CNN was predictive of outcome in 5 individual external DLBCL trials, with a higher performance than the IPI. The PET model had performance comparable to that of the clinical PET model, both of which rely on tumor delineation. Our MIP-CNN can predict treatment outcome in DLBCL without tumor delineation, but at the cost of a slightly decreased prognostic performance compared with the delineation-dependent models.

DISCLOSURE

This work was financially supported by the Hanarth Fonds and the Dutch Cancer Society (VU-2018-11648). The sponsor had no role in gathering, analyzing, or interpreting the data. Sally Barrington received departmental funding from Amgen, AstraZeneca, BMS, Novartis, Pfizer, and Takeda. Martine Chamuleau received financial support for clinical trials from Celgene, BMS, and Gilead. Josée Zijlstra received financial support for clinical trials from Roche, Gilead, and Takeda. Pieternella Lugtenburg received financial support for clinical trials from Takeda and Roche. No other potential conflict of interest relevant to this article was reported.

KEY POINTS

QUESTION: Can we use a deep learning model to predict outcome in DLBCL on multiple independent datasets?

PERTINENT FINDINGS: The deep learning model previously developed in the HOVON-84 dataset remained predictive of outcome in 5 independent external datasets.

IMPLICATIONS FOR PATIENT CARE: Implementation of deep learning could automate treatment outcome prediction, removing the need for tumor segmentation and reducing user dependence.

Footnotes

  • Published online Oct. 3, 2024.

  • © 2024 by the Society of Nuclear Medicine and Molecular Imaging.

REFERENCES

1. Boellaard R, Delgado-Bolton R, Oyen WJ, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging. 2015;42:328–354.
2. Sehn LH, Donaldson J, Chhanabhai M, et al. Introduction of combined CHOP plus rituximab therapy dramatically improved outcome of diffuse large B-cell lymphoma in British Columbia. J Clin Oncol. 2005;23:5027–5033.
3. International Non-Hodgkin’s Lymphoma Prognostic Factors Project. A predictive model for aggressive non-Hodgkin’s lymphoma. N Engl J Med. 1993;329:987–994.
4. Eertink JJ, van de Brug T, Wiegers SE, et al. 18F-FDG PET baseline radiomics features improve the prediction of treatment outcome in diffuse large B-cell lymphoma. Eur J Nucl Med Mol Imaging. 2022;49:932–942.
5. Yamashita R, Nishio M, Do RKG, Togashi K. Convolutional neural networks: an overview and application in radiology. Insights Imaging. 2018;9:611–629.
6. Fujiwara T, Watanuki S, Mejia MA, Itoh M, Fukuda H. Easy detection of tumor in oncologic whole-body PET by projection reconstruction images with maximum intensity projection algorithm. Ann Nucl Med. 1999;13:199–203.
7. Blanc-Durand P, Jegou S, Kanoun S, et al. Fully automatic segmentation of diffuse large B cell lymphoma lesions on 3D FDG-PET/CT for total metabolic tumour volume prediction using a convolutional neural network. Eur J Nucl Med Mol Imaging. 2021;48:1362–1370.
8. Liu P, Zhang M, Gao X, Li B, Zheng G. Joint lymphoma lesion segmentation and prognosis prediction from baseline FDG-PET images via multitask convolutional neural networks. IEEE Access. 2022;10:81612–81623.
9. Rebaud L, Capobianco N, Sibille L, et al. Multitask learning-to-rank neural network for predicting survival of diffuse large B-cell lymphoma patients from their unsegmented baseline [18F]FDG-PET/CT scans [abstract]. J Nucl Med. 2022;63(suppl 2):3250.
10. Ferrández MC, Golla SSV, Eertink JJ, et al. An artificial intelligence method using FDG PET to predict treatment outcome in diffuse large B cell lymphoma patients. Sci Rep. 2023;13:13111.
11. Lugtenburg PJ, de Nully Brown P, van der Holt B, et al. Rituximab-CHOP with early rituximab intensification for diffuse large B-cell lymphoma: a randomized phase III trial of the HOVON and the Nordic Lymphoma Group (HOVON-84). J Clin Oncol. 2020;38:3377–3387.
12. Dührsen U, Muller S, Hertenstein B, et al. Positron emission tomography-guided therapy of aggressive non-Hodgkin lymphomas (PETAL): a multicenter, randomized phase III trial. J Clin Oncol. 2018;36:2024–2034.
13. Riley RD, Archer L, Snell KIE, et al. Evaluation of clinical prediction models (part 2): how to undertake an external validation study. BMJ. 2024;384:e074820.
14. Mikhaeel NG, Smith D, Dunn JT, et al. Combination of baseline metabolic tumour volume and early response on PET/CT improves progression-free survival prediction in DLBCL. Eur J Nucl Med Mol Imaging. 2016;43:1209–1219.
15. Carr R, Fanti S, Paez D, et al. Prospective international cohort study demonstrates inability of interim PET to predict treatment failure in diffuse large B-cell lymphoma. J Nucl Med. 2014;55:1936–1944.
16. Mikhaeel NG, Cunningham D, Counsell N, et al. FDG-PET/CT after two cycles of R-CHOP in DLBCL predicts complete remission but has limited value in identifying patients with poor outcome: final result of a UK National Cancer Research Institute prospective study. Br J Haematol. 2021;192:504–513.
17. Mamot C, Klingbiel D, Hitz F, et al. Final results of a prospective evaluation of the predictive value of interim positron emission tomography in patients with diffuse large B-cell lymphoma treated with R-CHOP-14 (SAKK 38/07). J Clin Oncol. 2015;33:2523–2529.
18. Chamuleau MED, Burggraaff CN, Nijland M, et al. Treatment of patients with MYC rearrangement positive large B-cell lymphoma with R-CHOP plus lenalidomide: results of a multicenter HOVON phase II trial. Haematologica. 2020;105:2805–2812.
19. Barrington SF, Zwezerijnen B, de Vet HCW, et al. Automated segmentation of baseline metabolic total tumor burden in diffuse large B-cell lymphoma: which method is most successful? A study on behalf of the PETRA Consortium. J Nucl Med. 2021;62:332–337.
20. Boellaard R. Quantitative oncology molecular analysis suite: ACCURATE [abstract]. J Nucl Med. 2018;59(suppl 1):1753.
21. Eertink JJ, Zwezerijnen GJC, Heymans MW, et al. Baseline PET radiomics outperforms the IPI risk score for prediction of outcome in diffuse large B-cell lymphoma. Blood. 2023;141:3055–3064.
22. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44:837–845.
23. Gerds TA, Schumacher M. Consistent estimation of the expected Brier score in general survival models with right-censored event times. Biom J. 2006;48:1029–1040.
24. Yousefirizi F, Klyuzhin IS, O JH, et al. TMTV-Net: fully automated total metabolic tumor volume segmentation in lymphoma PET/CT images—a multi-center generalizability analysis. Eur J Nucl Med Mol Imaging. 2024;51:1937–1954.
25. Leung KH, Rowe SP, Sadaghiani MS, et al. Deep semisupervised transfer learning for fully automated whole-body tumor quantification and prognosis of cancer on PET/CT. J Nucl Med. 2024;65:643–650.
26. Ilyas H, Mikhaeel NG, Dunn JT, et al. Defining the optimal method for measuring baseline metabolic tumour volume in diffuse large B cell lymphoma. Eur J Nucl Med Mol Imaging. 2018;45:1142–1154.
27. Ferrández MC, Eertink JJ, Golla SSV, et al. Combatting the effect of image reconstruction settings on lymphoma [18F]FDG PET metabolic tumor volume assessment using various segmentation methods. EJNMMI Res. 2022;12:44.
28. LalithShiyam/LION: lionz-v.0.9.1. Zenodo website. https://zenodo.org/records/12626789. Published July 2, 2024. Accessed September 18, 2024.
  • Received for publication June 6, 2024.
  • Accepted for publication September 9, 2024.