Abstract
2588
Introduction: Total metabolic tumor volume (TMTV), a measure of metabolically active tumor burden, has shown value as a strong pre-therapy prognostic factor, e.g., in diffuse large B-cell lymphoma (DLBCL) and primary mediastinal B-cell lymphoma (PMBCL). However, computing TMTV in lymphoma is complex at both the individual-lesion and whole-body levels. Estimated segmentations are usually evaluated against a surrogate ground truth (GT), such as manual segmentations, using metrics such as the Dice similarity coefficient (DSC) and Hausdorff distance (HD). These metrics, however, are not directly correlated with the clinical task of computing TMTV (Jha et al. 2021). Typical loss functions minimize the distance between the estimated segmentation and a ground truth with well-defined hard edges. Yet no such hard edges exist between tumor and surrounding tissue, neither in the image (due to partial volume effects) nor, in fact, at the microscopic level. Thus, minimizing the distance from a hard-edged segmentation does not necessarily optimize TMTV quantification. Recognizing this issue, we propose a novel loss function for a 3D U-Net model aimed at improving TMTV quantification.
Methods: Our loss function is based on the mean square error (MSE). Instead of an elementwise MSE loss (which faces issues in segmentation tasks), the proposed loss computes the MSE between the intensity of a voxel of the segmented volume and all of its corresponding neighbors in the GT, which de-emphasizes the hardness of edges. Two values were considered for the size of the neighborhood search (m), namely m=16 and m=64 voxels, to account for different lesion sizes. This loss was coupled with an exponential term (Pezzano et al. 2021) to minimize false negatives. The loss function was thus given by $L(v,\hat{v}) = \frac{1}{m}\sum_{i=1}^{m} (\hat{v}_i - v_i)^2\,\beta^{(\hat{v}_i - v_i)}$, where $v_i$ and $\hat{v}_i$ are the intensity values of the i-th voxel of the predicted and GT volumes, respectively. After all differences with the neighbor voxels are calculated, the minimum of these losses is taken as the loss value for the current voxel. The hyperparameter β was chosen to favor over-segmentation over under-segmentation, with β=1.2 in this study. Our data come from three cohorts: 1) DLBCL cases from center A (n=80), 2) PMBCL cases from center A (n=125), and 3) DLBCL cases from center B (n=42). Data from cohorts 1 and 2 were used for training and validation, respectively, and data from cohort 3 served as the external “unseen” test set. Across the three cohorts, the average voxel size was 3.9×3.9×3.36 mm³ and the median lesion size was 3.48 cc. We trained the 3D U-Net model for both m values (16 and 64), and the ensemble of these two models was then applied to the external test data. Estimated and GT TMTV values were compared via Pearson correlation (coefficient of determination, R²), together with statistical analysis. The method was also compared to an approach that used Dice loss.
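For illustration, the neighborhood loss described in Methods can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study: the function name `neighborhood_loss`, the cubic search window set by a hypothetical `radius` parameter (the abstract gives only the neighborhood size m, not its shape), the edge padding, and the final averaging over voxels are all assumptions. The exponential term is taken as β raised to the GT-minus-prediction difference, so that false negatives cost more than false positives when β>1.

```python
import itertools

import numpy as np


def neighborhood_loss(pred, gt, radius=1, beta=1.2, reduce="min"):
    """Sketch of a neighborhood ("blurred") MSE loss with an exponential term.

    pred, gt : 3D arrays of the same shape (prediction and ground truth).
    radius   : ASSUMED cubic window half-width; the window holds
               (2 * radius + 1) ** 3 GT neighbors per predicted voxel.
    beta     : base of the exponential term; beta > 1 penalizes false
               negatives (gt > pred) more than false positives.
    reduce   : "min" keeps the smallest neighbor term per voxel,
               "mean" averages the terms over the window.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.ndim != 3 or pred.shape != gt.shape:
        raise ValueError("pred and gt must be 3D arrays of identical shape")
    if reduce not in ("min", "mean"):
        raise ValueError("reduce must be 'min' or 'mean'")

    # ASSUMPTION: replicate border voxels so every voxel has a full window.
    padded = np.pad(gt, radius, mode="edge")
    nz, ny, nx = gt.shape

    terms = []
    for dz, dy, dx in itertools.product(range(2 * radius + 1), repeat=3):
        neighbor = padded[dz:dz + nz, dy:dy + ny, dx:dx + nx]
        diff = neighbor - pred  # GT neighbor minus prediction
        terms.append(diff ** 2 * beta ** diff)

    stacked = np.stack(terms)
    per_voxel = stacked.min(axis=0) if reduce == "min" else stacked.mean(axis=0)
    # ASSUMPTION: the volume-level loss is the mean of the per-voxel values.
    return float(per_voxel.mean())
```

With `radius=0` the sketch reduces to an elementwise loss, which makes the effect of the neighborhood easy to see: a prediction displaced by one voxel from the GT is penalized elementwise, but incurs no penalty once the search window covers the displacement. For training, the same computation would have to be expressed in an automatic-differentiation framework.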
Results: On the “unseen” test data, the correlation between GT and segmented TMTV was higher for the proposed loss (R²=0.83, p<0.001) than for Dice loss (R²=0.73, p<0.001) (Figures 1&2). The evaluation metrics for the 3D U-Net were DSC=0.71±0.12 and HD=24.3±6.2 with Dice loss, and DSC=0.68±0.21 and HD=30.4±11.2 with the proposed loss. These results show that a method optimized with Dice loss may yield better DSC and HD without necessarily providing better performance on the clinical task of TMTV computation.
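The quantities compared above can be illustrated with a short NumPy sketch. It is an assumption-laden illustration rather than the study's evaluation code: the helper names `tmtv_ml`, `dice`, and `pearson_r2` are hypothetical, TMTV is taken as the count of segmented voxels times the voxel volume, and R² is taken as the squared Pearson correlation coefficient.

```python
import numpy as np


def tmtv_ml(mask, voxel_size_mm):
    """Volume of a binary mask in mL (1 mL = 1 cc = 1000 mm^3)."""
    voxel_volume_mm3 = float(np.prod(voxel_size_mm))
    return float(np.count_nonzero(mask)) * voxel_volume_mm3 / 1000.0


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def pearson_r2(x, y):
    """Squared Pearson correlation between two sequences of TMTV values."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return float(r ** 2)
```

Two masks of equal size that do not overlap have a DSC of 0 yet identical TMTV, which is one way to see why overlap metrics and volume agreement can diverge.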
Conclusions: The proposed deep-learning-based method, using a modified blurred-MSE loss, improved TMTV quantification compared to a method that used Dice loss. This suggests that accounting for soft edges at lesion boundaries may improve TMTV quantification. Further, it motivates the development of deep learning methods that optimize performance on clinical tasks.