It is now recognized that intratumoral heterogeneity is associated with more aggressive tumor phenotypes leading to poor patient outcomes (1). Medical imaging plays a central role in related investigations because radiologic images are routinely acquired during cancer management. Imaging modalities such as 18F-FDG PET, CT, and MRI are minimally invasive and constitute an immense source of potential data for decoding tumor phenotypes (2). Computer-aided diagnosis methods and systems exploiting medical images have been developed for decades, but their wide clinical implementation has been hampered by high false-positive rates (3). As a consequence, routine clinical exploitation of images still consists mostly of visual or manual assessments. Today, the development of machine-learning techniques and the rise of computational power allow for the exploitation of a large number of quantitative features (4). This ability has led to a new incarnation of computer-aided diagnosis, "radiomics": the characterization of tumor phenotypes via the extraction of high-dimensional mineable data (e.g., morphologic, intensity-based, fractal-based, and textural features) from medical images, whose subsequent analysis aims at supporting clinical decision making.
A first proof-of-concept study dedicated to the prediction of tumor outcomes using PET radiomics-based multivariable models built via machine learning was published in 2009 (5). The term radiomics was then first used in 2010 to describe how imaging features can reflect gene expression (6). Other early radiomics studies followed (7,8), including some that highlighted early on that feature reliability is affected by acquisition protocol, reconstruction, test–retest consistency, preprocessing, and segmentation (9–13). The overall framework of radiomics was then explicitly described in 2012 (14), and in the years that followed, this emerging field experienced exponential growth (15).
In the context of precision oncology, the radiomics workflow for the construction of predictive or prognostic models consists of 3 major steps (Fig. 1A): medical image acquisition, computation of radiomics features, and statistical analysis and machine learning. To apply the models to new patients for treatment personalization, a prospective model evaluation (preferably in a multicenter setup) is necessary.
Radiomics research has already shown great promise for supporting clinical decision making. However, the fact that radiomics-based strategies have not yet been translated to routine practice can be partly attributed to the low reproducibility of most current studies. The workflow for computing features is complex and involves many steps (Fig. 1B), often leading to incomplete reporting of methodologic information (e.g., texture matrix design choices and gray-level discretization methods). As a consequence, few radiomics studies in the current literature can be reproduced from start to finish. Other major issues include the limited number of patients available for radiomics research, the high false-positive rates (similar to those of analogous computer-aided diagnosis methods), and the reporting of overly optimistic results, all of which limit the generalizability of the conclusions reached in current studies.
Medical imaging journals are currently overwhelmed by a large volume of radiomics-related articles of variable quality and associated clinical value. The aim of this editorial is to present guidelines that we think can improve the reporting quality and therefore the reproducibility of radiomics studies, as well as the statistical quality of radiomics analyses. These guidelines can serve not only the authors of such studies but also the reviewers who assess their appropriateness for publication.
GUIDELINES FOR IMPROVING QUALITY OF RADIOMICS ANALYSES
The complexity of the radiomics workflow increases the need to standardize computation methods (16–19). Since September 2016, about 55 researchers from 19 institutions in 8 countries have participated in the Image Biomarker Standardization Initiative (IBSI), which aims at standardizing both the computation of features and the image-processing steps required before feature extraction (e.g., image interpolation and discretization). First, a simple digital phantom with few discrete image intensities was used to standardize the computation of 172 features from 11 categories. Then, a set of CT images of a lung cancer patient was used to standardize the image-processing steps. The initiative is now reaching completion, and a consensus on image processing and feature computation has been reached (20,21). However, more work is likely necessary to define and benchmark MRI- and PET-specific image-processing steps. Nonetheless, the standardized workflow (Fig. 1B), along with benchmark values, can serve as a calibration tool for future investigations. Ultimately, it may also lead to standardized software solutions available to the community, as the widespread use of standardized computation methods would greatly enhance the reproducibility potential of radiomics studies. It would also be desirable that the code of existing software be updated to conform with future standards to be established by the IBSI. Furthermore, it is essential to rely on supplementary material (allowed by most journals) to provide complete methodologic details, including the comprehensive description of image acquisition protocols, sequence of operations, image postacquisition processing, tumor segmentation, image interpolation, image resegmentation and discretization, formulas for the calculation of features, and benchmark calibrations. Table 1 provides guidelines on feature computation details to be reported in radiomics studies.
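To make concrete why discretization choices must be reported, the two common gray-level discretization schemes discussed in the standardization effort can be sketched as follows. This is an illustrative sketch only, not IBSI reference code; the function names are ours, and it assumes a nonconstant region of interest (ROI) already extracted as a NumPy array.

```python
import numpy as np

def discretize_fixed_bin_number(roi: np.ndarray, n_bins: int = 32) -> np.ndarray:
    """Fixed bin number: map ROI intensities to integer gray levels 1..n_bins.

    Assumes roi.max() > roi.min(); the bin count (not the bin width) is fixed,
    so the effective bin width adapts to each patient's intensity range.
    """
    lo, hi = roi.min(), roi.max()
    levels = np.floor(n_bins * (roi - lo) / (hi - lo)) + 1
    return np.clip(levels, 1, n_bins).astype(int)

def discretize_fixed_bin_size(roi: np.ndarray, bin_width: float, lo: float) -> np.ndarray:
    """Fixed bin size: map ROI intensities to gray levels of a set width,
    counted from a user-defined minimum intensity `lo` (e.g., a fixed SUV or HU).
    """
    return (np.floor((roi - lo) / bin_width) + 1).astype(int)
```

The two schemes generally yield different texture-matrix contents from the same image, which is precisely why omitting the discretization method (and its parameters) from a paper makes the reported feature values irreproducible.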
After feature extraction, statistical analysis relates features to clinical outcomes. No consensus exists about what defines "good" radiomics studies. For example, the demonstration that a newly designed feature is strongly associated with a given outcome, or that a novel radiomics method holds great potential, may be of interest if compared with the most reproducible and robust features or prognostic clinical information already used. Nonetheless, for the construction of prediction models via multivariable analysis, there are two basic requirements. First, all methodologic details and clinical information must be clearly reported or described to facilitate reproducibility and comparison with other studies and meta-analyses. Second, radiomics-based models must be tested in sufficiently large patient datasets distinct from teaching (training and validation) sets to statistically demonstrate their efficacy over conventional models (e.g., existing biomarkers, tumor volume, and cancer stage). Ideally, for optimal reproducibility potential, all data and programming code related to the study should also be made available to the community. Table 2 provides guidelines based on the "radiomics quality score" (www.radiomics.world), which can help evaluate the quality of radiomics studies. More guidelines on reproducible prognostic modeling can be found in the TRIPOD statement (transparent reporting of a multivariable prediction model for individual prognosis or diagnosis) (22).
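The second requirement above, evaluating a model only on patients never used for model building, can be sketched in a few lines. This is a minimal illustration on synthetic stand-in data (the features and outcome are simulated, not from any real cohort), using scikit-learn; a real study would instead hold out an independent, preferably external, patient set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 "patients", 5 hypothetical radiomics features.
X = rng.normal(size=(200, 5))
# Simulated binary outcome driven by the first two features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

# Hold out a test set that plays no role in model building or feature selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"held-out AUC: {auc:.2f}")
```

Reporting only the training-set performance of such a model is a common source of the overly optimistic results mentioned earlier; the held-out estimate is the one that speaks to generalizability.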
RESPONSIBLE RESEARCH IS THE KEY
Some guiding principles already exist to help radiomics scientists further implement the responsible research paradigm into their current practice. For one, the Responsible Research and Innovation website (www.rri-tools.eu) provides useful guidelines. Furthermore, a concise set of principles for better scientific data management and stewardship—the “FAIR guiding principles” (23)—has been defined, stating that all research objects should be findable, accessible, interoperable, and reusable. Implementation of the FAIR principles within the radiomics field can facilitate its faster clinical translation. Many research tools and online repositories already implement a variety of aspects of the FAIR principles (23), and we can add two other tools of interest: the Cancer Imaging Archive (www.cancerimagingarchive.net), a service that anonymizes and hosts medical images for public download, and the Radiomics Ontology (www.bioportal.bioontology.org/ontologies/RO), a repository on the National Center for Biomedical Ontology BioPortal aiming to improve the interoperability of radiomics analyses via consistent tagging of radiomics features, segmentation algorithms, and imaging filters. This ontology could provide a standardized way of reporting radiomics data and methods, and would more concisely summarize the implementation details of a given radiomics workflow (e.g., Table 1).
To conclude, pioneering studies in radiomics have paved the way to an exciting field and to promising methods for better personalizing cancer treatments. Yet, better standardization, transparency, and sharing practices in the radiomics community are required to improve the quality of published studies and to achieve a faster clinical translation. The best way to reach this goal is through responsible radiomics research, which can be summarized in three working principles that we should all try to follow as a research community: design and conduct high-quality radiomics research, write and present fully transparent radiomics research, and share data and methods.
DISCLOSURE
Alex Zwanenburg is supported by the German Federal Ministry of Education and Research (BMBF-0371N52). Martin Vallières is supported by the National Institute of Cancer (INCa project C14020NS). No other potential conflict of interest relevant to this article was reported.
Footnotes
- Published online Nov. 24, 2017.
- Received for publication August 31, 2017.
- Accepted for publication November 13, 2017.
- © 2018 by the Society of Nuclear Medicine and Molecular Imaging.