The State of the Art

The T.R.U.E. Checklist for Identifying Impactful Artificial Intelligence–Based Findings in Nuclear Medicine: Is It True? Is It Reproducible? Is It Useful? Is It Explainable?

Irène Buvat and Fanny Orlhac
Journal of Nuclear Medicine June 2021, 62 (6) 752-754; DOI: https://doi.org/10.2967/jnumed.120.261586
Laboratory of Translational Imaging in Oncology, U1288 INSERM, Institut Curie, Université Paris Saclay, Orsay, France

Dozens of articles describing artificial intelligence (AI) developments are submitted to medical imaging journals every month, including in the nuclear medicine field. Our mission, as a nuclear medicine community, is to contribute to a better understanding of normal and pathologic processes by probing molecular mechanisms with unparalleled sensitivity, ultimately with the goal of improving patient care. This mission calls for research in tracer development, instrumentation, data analysis, and clinical studies. It is becoming obvious that our mission will be greatly facilitated by AI-based tools. It is far too early to estimate the exact impact AI will have on nuclear medicine research and clinical practice. Still, we can already claim that AI will assist in the automation of many tasks, including image acquisition, image interpretation, and image quantification, hence increasing the reproducibility, overall quality, and usefulness of nuclear medicine scans (1–3).

Less clear is whether AI can also be used to further biomedical knowledge, such as through a better understanding of molecular mechanisms or the identification of new clinically useful biomarkers involving nuclear medicine data. So far, in nuclear medicine, no new biomarker involving sophisticated radiomic features or deep learning models has emerged from the thousands of articles already published. None of the published promising radiomic signatures, nomograms, or AI-based models has been convincingly demonstrated by independent groups, through large-scale evaluation, to be a must-have biomarker superior to existing practice. Yet, we trust that this goal is within reach. AI has demonstrated its ability to identify and reveal complex information hidden in images, and it should be possible to use this information to extract clinically useful biomarkers.

To get to this point, we have to be extremely demanding in terms of what is published so that the most promising findings can easily be identified by readers. This would allow the community to subsequently gather the large body of evidence needed to turn a promising result into an actionable biomarker, a testable assumption, or a widely used automated method. To facilitate the identification of contributions that might be ground-breaking, we encourage the authors and reviewers of AI-based manuscripts to carefully consider a simple checklist—the T.R.U.E. checklist—whose acronym comprises 4 questions: Is it true? Is it reproducible? Is it useful? Is it explainable? A “yes” answer to all 4 questions increases the likelihood that the reported findings will be impactful. In fact, these 4 questions should be part of every professional review of any scientific paper—whatever the research topic—and have long been used as such. Yet, they are of particular and critical relevance to papers using AI-based methods because of the specifics of AI. We now briefly elaborate on these questions to explain more precisely what they imply in the context of AI-based studies.

IS IT TRUE?

The question of truth is highly relevant because a large proportion of AI-based studies in medical imaging are still biased by issues well known to data scientists, such as bias in the training population (e.g., sex, ethnicity, and age), data leakage (i.e., test data used explicitly or implicitly during the training phase) (4), or overfitting. Such bias most often results in a lack of generalizability of the AI-based model, meaning that the results and reported level of performance will not hold on different datasets (5). By default, we should assume that the findings, especially when outstanding, are biased, and we should chase potential confounding factors by every means available. Control experiments (similar to experiments using a sham group or placebo arm in clinical trials) should be used and reported whenever relevant, to provide sufficient evidence that the findings are scientifically valid. For instance, the probability of false-positive findings can be estimated by repeating the entire model-building and model-evaluation process after randomly permuting the label associated with each patient, as sketched below. Expert data scientists should be called on to help identify bias or sources of data leakage, given that these can be subtle and difficult to detect. Medical experts of course remain essential to detect bias or possible confounding effects associated with the composition of the patient samples.
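
To make this control experiment concrete, the sketch below estimates an empirical p-value by rerunning a cross-validated pipeline on randomly permuted labels, here using scikit-learn’s permutation_test_score. This is a minimal sketch, assuming one tabular feature vector (e.g., radiomic features) and one binary label per patient; the synthetic data and the simple classifier are illustrative stand-ins for a real pipeline, not methods described in this article.

    # Minimal sketch of the label-permutation control experiment.
    # Assumed setting: one feature vector and one binary label per patient;
    # the synthetic data and simple classifier are illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import permutation_test_score

    X, y = make_classification(n_samples=100, n_features=20, random_state=0)
    model = LogisticRegression(max_iter=1000)

    # Rerun the whole cross-validated evaluation on randomly permuted labels;
    # the p-value estimates how often chance alone matches the observed score.
    score, perm_scores, p_value = permutation_test_score(
        model, X, y, scoring="roc_auc", cv=5, n_permutations=200, random_state=0
    )
    print(f"Observed AUC = {score:.2f}; permutation p-value = {p_value:.3f}")

If the observed performance falls within the distribution obtained with permuted labels, the finding is likely a false positive and should not be reported as a genuine signal.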

IS IT REPRODUCIBLE?

The reproducibility crisis affects many fields and has been extensively studied and debated (6), including in the field of radiology (7,8). There have been laudable efforts over the last few years to increase transparency, with the very positive trend of data and models being shared more frequently, resulting in an overall improvement in the quality of radiomic and AI-based imaging studies (9). Yet, even when authors share their models developed within well-known frameworks (e.g., TensorFlow or Caffe [Convolutional Architecture for Fast Feature Embedding]) using one of the many resource-sharing platforms (e.g., GitHub, SourceForge, GitLab, or Bitbucket), this sharing is often not sufficient to actually reproduce the findings, even when the data are also provided. One reason is that most AI-based models are complex and involve many steps and parameters, such as those relating to image preprocessing, data augmentation, and learning schemes, and these are usually not fully described despite significantly affecting the results. In AI, “the devil is in the details,” as the saying goes. To overcome this reproducibility challenge and move the field forward, we strongly encourage authors to carefully describe their methods and provide the data or code (either source code or executable code) needed to reproduce the investigation or test the model on independent data. In addition, similar to the current practice of calling on statistical expertise to validate the statistical methodology used in scientific manuscripts, we recommend calling on specific expertise to check, in practice, that the provided description or material makes it possible to reproduce the findings and test the models on external data. This extra workload on reviewers would greatly increase the value of published AI-based contributions. We expect contributions that report reproducible methods to have a much greater impact than those that do not.
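
One practical step in this direction is sketched below: record every preprocessing, augmentation, and training parameter, together with random seeds and library versions, in a machine-readable file shipped alongside the code and model. This is a minimal sketch; all field names and values are hypothetical placeholders, not a proposed standard.

    # Minimal sketch: fix random seeds and dump the full experimental
    # configuration to JSON so others can rerun the pipeline identically.
    # All parameter names and values below are hypothetical placeholders.
    import json
    import platform
    import random

    import numpy as np
    import sklearn

    SEED = 42
    random.seed(SEED)
    np.random.seed(SEED)

    config = {
        "seed": SEED,
        "preprocessing": {"voxel_resampling_mm": [2.0, 2.0, 2.0],
                          "intensity_normalization": "z-score"},
        "augmentation": {"rotation_deg": 10, "flip_axes": [0, 1]},
        "training": {"optimizer": "adam", "learning_rate": 1e-4,
                     "epochs": 50, "batch_size": 16},
        "environment": {"python": platform.python_version(),
                        "numpy": np.__version__,
                        "scikit-learn": sklearn.__version__},
    }

    with open("experiment_config.json", "w") as f:
        json.dump(config, f, indent=2)

Versioning such a file together with the code captures exactly the details that are “the devil” in AI studies and lets external groups rerun the pipeline under the same conditions.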

IS IT USEFUL?

Usefulness should be appreciated with respect to state-of-the-art knowledge and methods, and a comparison of results with previously published data is a good way to assess the usefulness of new findings. Such comparisons can be difficult when different methods are not assessed on the same dataset, because of many possible confounding factors. Sharing of datasets, which can then be used as benchmarks to compare different methods, as in medical imaging challenges (10), can facilitate fair comparison. It should always be demonstrated in what respect the new findings are superior to existing, often simpler, methods. Performance analysis should include metrics characterizing the robustness of the method with respect to potential perturbations (e.g., data of different quality) so as to properly assess the trade-off between complexity, accuracy, and robustness achieved by different models. Occam’s razor should remain the rule until well-supported evidence of the superiority of less intuitive and more complex models is obtained. Although AI is extremely powerful, its power is best reserved for situations in which conventional statistical approaches or signal-processing methods are insufficient. There can be different motivations for using an AI model: an AI-based method can save time while equaling human observer performance (11), it can equal human observer performance while reducing interobserver variability (12), it might outperform existing human-based performance (13) or algorithm-based performance (14) (although this will have to be proven in prospective studies), or it might even uncover unknown phenomena (15). Whatever the scenario, the added value of the AI model should be well substantiated.
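
The sketch below illustrates this comparison discipline under stated assumptions: a simple baseline and a more complex model are scored on identical cross-validation folds, and both are then probed on artificially degraded held-out data as a crude robustness check. The data, models, and noise level are illustrative choices, not recommendations from this article.

    # Minimal sketch: compare a simple baseline against a more complex model
    # on identical cross-validation folds, then probe robustness on degraded
    # data. Synthetic data and noise level are illustrative assumptions.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                         train_test_split)

    X, y = make_classification(n_samples=200, n_features=30, random_state=0)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

    models = {
        "logistic regression (simple baseline)": LogisticRegression(max_iter=1000),
        "random forest (more complex)": RandomForestClassifier(
            n_estimators=200, random_state=0),
    }

    # Identical folds for every model, so score differences reflect the
    # models rather than the data split.
    for name, model in models.items():
        aucs = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
        print(f"{name}: AUC = {aucs.mean():.2f} +/- {aucs.std():.2f}")

    # Crude robustness probe: degrade the held-out data with noise to mimic
    # images or features of poorer quality.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
    X_te_noisy = X_te + np.random.default_rng(0).normal(scale=0.5, size=X_te.shape)
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(f"{name}: accuracy {model.score(X_te, y_te):.2f} (clean) -> "
              f"{model.score(X_te_noisy, y_te):.2f} (degraded)")

If the complex model does not clearly beat the baseline, or loses its advantage on degraded data, Occam’s razor argues for reporting the simpler method.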

IS IT EXPLAINABLE?

AI is not a magic wand. It is a powerful set of algorithms that learn from examples and have the unique ability to identify structure in high-dimensional data. When AI is used to automate a task that humans can perform, the rules are deduced by the AI from many examples, and the performance to be expected depends on how representative the training set is of the cases that will be encountered in practice. A more challenging application is having the AI succeed in doing something that we, as humans, cannot do (yet). As an example, we are currently at a loss to predict why certain patients will respond to immunotherapy whereas others will not. For these applications, investigating what makes an AI algorithm successful is essential to avoid misinterpretation and prevent overestimation of the power of AI. For example, a misinterpretation of an AI decision-making process was published in a highly respected journal (16) before a reanalysis of the data elegantly demonstrated the incorrect understanding of the initial results (17). This error emphasizes the need for scrutiny of the key elements explaining the performance of an AI-based model. By better understanding the AI model and which specific information it uses, we might also gain knowledge of the biologic mechanisms involved. For this explanation step, speculation is still currently the rule. To use AI as a datascope that helps us better understand molecular mechanisms based on image content, we have to move from speculation to hypothesis formulation and then to hypothesis testing using appropriate in silico, in vitro, or in vivo experimental designs.

Explainable AI is currently an extremely active area of research, with the ongoing development of numerous methods for approaching explainability (18), although fully satisfactory explanations may not always be feasible because of the high complexity and dimensionality of the data (19). The “Is it explainable?” question is thus certainly the most difficult one to answer convincingly. Yet, it should not be avoided and should be addressed whenever possible so that AI can help us learn from the data.
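
As one concrete example of the model-agnostic explanation methods surveyed in reference 18, the sketch below computes permutation feature importance: held-out performance is remeasured after scrambling each input feature, and large drops flag the features the model actually relies on. The data and model are, again, illustrative assumptions rather than methods from this article.

    # Minimal sketch of permutation feature importance, one simple
    # model-agnostic explanation method. Data and model are illustrative.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    # Scramble each feature on the held-out set and record the AUC drop;
    # large drops flag the features the model actually relies on.
    result = permutation_importance(
        model, X_te, y_te, scoring="roc_auc", n_repeats=20, random_state=0
    )
    for i in result.importances_mean.argsort()[::-1]:
        print(f"feature {i}: AUC drop = {result.importances_mean[i]:.3f}"
              f" +/- {result.importances_std[i]:.3f}")

Such rankings do not by themselves explain a model, but they turn vague speculation into specific, testable hypotheses about which inputs drive the predictions.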

CONCLUSION

It is our conviction that articles in which all 4 T.R.U.E. questions are convincingly addressed have a much higher likelihood of yielding significant advances in our field than papers that do not meet this requirement. We thus encourage all investigators and authors to take the time to reflect on this easy-to-remember checklist before submitting to The Journal of Nuclear Medicine, to document well-supported evidence for their answers to these questions, and to adjust their claims accordingly. We also invite all the devoted reviewers of The Journal of Nuclear Medicine to keep this checklist in mind when reviewing articles involving AI algorithms. In addition, to further assist investigators in the development of sound and reproducible AI-based research, the AI task force of the Society of Nuclear Medicine and Molecular Imaging will soon release consensus recommendations addressing the specifics of nuclear medicine applications.

DISCLOSURE

No potential conflict of interest relevant to this article was reported.

NOTEWORTHY

  • AI algorithms are currently proposed for many different purposes in nuclear medicine.

  • The reporting of these algorithms poses special challenges that require appropriate transparency and a high level of scientific rigor.

  • Any report involving an AI-based method should carefully address and discuss the scientific validity, reproducibility, usefulness, and explainability of the findings.

ACKNOWLEDGMENTS

We thank the anonymous reviewers for their insightful comments and David Wallis for carefully proofreading the manuscript.

Footnotes

  • Published online Mar. 26, 2021.

  • © 2021 by the Society of Nuclear Medicine and Molecular Imaging.

REFERENCES

  1. Sibille L, Seifert R, Avramovic N, et al. 18F-FDG PET/CT uptake classification in lymphoma and lung cancer by using deep convolutional neural networks. Radiology. 2020;294:445–452.
  2. Betancur J, Commandeur F, Motlagh M, et al. Deep learning for prediction of obstructive disease from fast myocardial perfusion SPECT: a multicenter study. JACC Cardiovasc Imaging. 2018;11:1654–1663.
  3. Ding Y, Sohn JH, Kawczynski MG, et al. A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology. 2019;290:456–464.
  4. Wen J, Thibeau-Sutre E, Diaz-Melo M, et al. Convolutional neural networks for classification of Alzheimer’s disease: overview and reproducible evaluation. Med Image Anal. 2020;63:101694.
  5. Reuzé S, Orlhac F, Chargari C, et al. Prediction of cervical cancer recurrence using textural features extracted from 18F-FDG PET images acquired with different scanners. Oncotarget. 2017;8:43169–43179.
  6. Wallach JD, Boyack KW, Ioannidis JPA. Reproducible research practices, transparency, and open access data in the biomedical literature, 2015-2017. PLoS Biol. 2018;16:e2006930.
  7. Wright BD, Vo N, Nolan J, et al. An analysis of key indicators of reproducibility in radiology. Insights Imaging. 2020;11:65.
  8. Haibe-Kains B, Adam GA, Hosny A, et al. Transparency and reproducibility in artificial intelligence. Nature. 2020;586:E14–E16.
  9. Sollini M, Antunovic L, Chiti A, Kirienko M. Towards clinical application of image mining: a systematic review on artificial intelligence and radiomics. Eur J Nucl Med Mol Imaging. 2019;46:2656–2672.
  10. Challenges. Grand Challenge website. https://grand-challenge.org/challenges/. Accessed April 26, 2021.
  11. Liu X, Faes L, Kale AU, et al. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health. 2019;1:e271–e297.
  12. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115–118.
  13. Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med. 2019;25:954–961.
  14. van Dijk LV, Van den Bosch L, Aljabar P, et al. Improving automatic delineation for head and neck organs at risk by deep learning contouring. Radiother Oncol. 2020;142:115–123.
  15. Poplin R, Varadarajan AV, Blumer K, et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng. 2018;2:158–164.
  16. Aerts HJ, Velazquez ER, Leijenaar RT, et al. Decoding tumour phenotype by noninvasive imaging using a quantitative radiomics approach. Nat Commun. 2014;5:4006.
  17. Welch ML, McIntosh C, Haibe-Kains B, et al. Vulnerabilities of radiomic signature development: the need for safeguards. Radiother Oncol. 2019;130:2–9.
  18. Molnar C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub; 2021.
  19. Bathaee Y. The artificial intelligence black box and the failure of intent and causation. Harv J Law Technol. 2018;31:890–938.
  • Received for publication December 8, 2020.
  • Accepted for publication March 9, 2021.