Radiomics is defined as the high-throughput extraction of quantitative metrics from medical images (1). One of its main assumptions is that medical images are considered not merely pictures for visual assessment but rather minable quantitative data (2) that may not necessarily be captured by the human eye (3).
In this issue of The Journal of Nuclear Medicine, Orlhac et al. present a study comparing experts' visual assessment of uptake heterogeneity on PET images with a subset of radiomics metrics, namely textural features (4). They exploited both clinical and simple simulated PET images, going further than previous studies performed using clinical data only (5–7). Such studies are useful because they help clarify the visual meaning of quantitative metrics that cannot easily be explained to nonspecialists. These studies have focused on the PET component and on 18F-FDG uptake heterogeneity. Similar analyses have been performed with CT (8) and MRI (9).
See page 387
One important finding is that textural features calculated after an absolute quantization process (i.e., resampling the original image intensities into a variable number of bins of fixed width; e.g., 0.5 SUV (10)) correlate better with visual assessment than do those calculated after the usual quantization process (i.e., uniformly resampling the original intensities into a set number of bins; e.g., 64 or 128). This difference can be related to several factors, such as the very different correlative relationships between texture parameters and either SUVmax or the number of voxels involved (tumor volume), which have also been reported previously (10–13). Other quantization processes (histogram equalization, Max–Lloyd clustering, and others) can lead to yet further differences in distribution and associated clinical value (14).
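The two quantization schemes contrasted above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used by Orlhac et al.: the function names, the 1-based bin indices, the lower bound of 0 SUV, and the two toy regions are all assumptions made for the example.

```python
import math


def quantize_fixed_bin_width(values, bin_width=0.5, lower=0.0):
    """Fixed bin width (e.g., 0.5 SUV): the number of bins actually used
    depends on the intensity range of the region.
    Assumption: bins start at `lower` and are numbered from 1."""
    return [int(math.floor((v - lower) / bin_width)) + 1 for v in values]


def quantize_fixed_bin_number(values, n_bins=64):
    """Fixed number of bins: the region's own [min, max] range is split
    uniformly into `n_bins` bins, whatever the absolute uptake level."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    levels = []
    for v in values:
        b = int(math.floor(n_bins * (v - lo) / (hi - lo))) + 1
        levels.append(min(b, n_bins))  # the maximum falls into the last bin
    return levels


# Two hypothetical regions (SUV values); the second is the first scaled by 2.
low_uptake = [2.0, 2.5, 3.0, 4.0]
high_uptake = [4.0, 5.0, 6.0, 8.0]

print(quantize_fixed_bin_number(low_uptake, n_bins=4))
print(quantize_fixed_bin_number(high_uptake, n_bins=4))
print(quantize_fixed_bin_width(low_uptake))
print(quantize_fixed_bin_width(high_uptake))
```

In this toy case, a fixed number of bins maps both regions to identical gray levels, whereas a fixed bin width makes the higher-uptake region span more gray levels. This may help in seeing why features computed after the two schemes can relate so differently to SUVmax.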
The consensus among experts was also substantially higher than in earlier studies, mostly because only 2 categories (heterogeneous vs. homogeneous) were considered, compared with 3 (5), 4 (7), or even 5 (8) in previous studies. In one study, the visual assessment into 3 categories had limited prognostic value compared with textural features (5). Because there was no clinical endpoint (survival, outcome) in the study by Orlhac et al., we cannot draw conclusions about the clinical value of the features that correlated well with visual assessment, although it is reasonable to assume that these features will be useful in clinical applications for which there is a correlation between patient outcome and the level of uptake heterogeneity visually assessed (or SUVmax, given the observed correlations).
The primary goal of radiomics is to build clinical models using machine-learning techniques (15) to predict patient outcome, thereby allowing better patient management. These multiparametric models, which are likely to be unintelligible even to experts because they combine a large number of high-order multimodality image features (13,16), are expected to outperform visual analysis in terms of both accuracy and reproducibility. Associating a visual meaning with such models can be even more challenging because they can also incorporate information from other fields (demographics, histopathology, genomics). The human brain can take into account only a limited number of parameters in making a decision; therefore, these multiparametric models will not be easily grasped by end users. These models will clearly demand a high level of precision and robustness in order to be accepted and relied on to formulate a clinical decision. Within this context, a rigorous process of model development (proper training) and validation (independent large cohorts) is needed. Such a process is still far from standard practice, although some encouraging results have been published (17,18).
The current radiomics paradigm consists of adding quantitative information to the visual analysis by radiologists and nuclear medicine physicians, rather than replacing it entirely. For instance, it was recently shown that a set of semantic features obtained from visual assessment by radiologists could beneficially complement quantitative radiomics in determining epidermal growth factor receptor mutations in lung cancer (19). However, a recent trend in medical imaging is to exploit techniques from the field of deep learning (20), with examples in image segmentation (21) and radiomics-type studies (22). This will further complicate the issue of association with visual analysis. Indeed, on the one hand, the standard radiomics workflow relies on the extraction of carefully designed features based on domain expertise (e.g., a specific calculation in the intensity histogram or in a predesigned texture matrix), some of which are clearly inspired by the human visual system. On the other hand, deep learning methods automatically discover, from the data, the features and representations useful for the task at hand, using a general-purpose learning procedure such as that of convolutional neural networks. Such methods require substantial amounts of data that are not easily available in the field of medical imaging, particularly in PET/CT. Potential solutions include transfer learning, which consists of taking convolutional neural networks trained on large datasets for an unrelated task and adapting them to a different setting (23,24). If these tools were to advantageously replace the current workflow of radiomics, removing the need for tumor segmentation or the complex task of selecting relevant and reliable features (11,25), as well as improving the ability to handle standardization issues (26), the relationship with visual analysis by experts would be not simply more difficult to establish but unnecessary.
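As an illustration of what a handcrafted feature in the standard radiomics workflow can look like, the sketch below computes one histogram-based feature and one feature derived from a texture matrix. It is a simplified example under stated assumptions: the images are tiny hypothetical 2-dimensional arrays of already quantized gray levels, only horizontal neighbors are considered, and the function names are invented for the example rather than taken from any radiomics package.

```python
import math
from collections import Counter


def flatten(image):
    """Return all gray levels of a 2-D image as a flat list."""
    return [v for row in image for v in row]


def histogram_entropy(levels):
    """First-order feature: Shannon entropy (in bits) of the gray-level histogram."""
    counts = Counter(levels)
    n = len(levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def glcm_horizontal(image):
    """Symmetric, normalized co-occurrence matrix for horizontally adjacent
    pixels, stored as a dict {(i, j): probability}."""
    pairs = Counter()
    for row in image:
        for a, b in zip(row, row[1:]):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    total = sum(pairs.values())
    return {key: count / total for key, count in pairs.items()}


def glcm_homogeneity(glcm):
    """Second-order feature: sum over (i, j) of p(i, j) / (1 + |i - j|)."""
    return sum(p / (1 + abs(i - j)) for (i, j), p in glcm.items())


# Two hypothetical images with identical histograms but different layouts.
blocks = [[1, 1, 2, 2]] * 4
checker = [[1, 2, 1, 2], [2, 1, 2, 1], [1, 2, 1, 2], [2, 1, 2, 1]]

for name, img in (("blocks", blocks), ("checker", checker)):
    print(name, histogram_entropy(flatten(img)), glcm_homogeneity(glcm_horizontal(img)))
```

The histogram feature is identical for the two images because it ignores where the gray levels are located, whereas the texture-matrix feature separates them. Each step of such a feature can be traced by hand, which is not the case for features learned by a network.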
DISCLOSURE
No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Nov. 3, 2016.
- © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
- Received for publication October 4, 2016.
- Accepted for publication October 11, 2016.