At the recent second Prostate Cancer Theranostics and Imaging Centre of Excellence Preceptorship, held in Melbourne, Australia, in 2024, a presentation on artificial intelligence (AI) in prostate cancer imaging was delivered in which the Recommendations for Evaluation of AI for Nuclear Medicine (RELAINCE) guidelines from the Society of Nuclear Medicine and Molecular Imaging AI Task Force Evaluation team took center stage (1).
The AI Task Force makes a point of providing a conceptual understanding of AI in the cancer imaging process. Categorization can be more fine-grained than the specialist-AI, professional-facing, and clinical-utility dichotomies discussed for the general population. The cancer imaging process can be further broken down into patient-to-image/image generation (acquiring, reconstructing, enhancing), image-to-patient (obtaining crucial image-derived information for patient care), clinical workflow (system throughput, triaging, reporting), and radiopharmaceutical therapies (drug discovery, dosimetry) (2).
Trial and study design must incorporate precise technical steps while focusing on the clinical task applied to the specific cancer patient journey (1). In prostate cancer management and theranostics, standard definitions of clinical tasks and subtasks can be categorized into the following: diagnosis (analyze intraprostatic lesions [vs. the PRIMARY score]); staging (detect disease, standardize conclusions, proofread reports); restaging (monitor metastases); theranostics (compare FDG and prostate-specific membrane antigen expression, streamline quantification [metabolic tumor volume, SUVmean], assist with patient selection); and dosimetry (measure uptake in normal organs and tumor, track disease uptake over treatments).
For AI tools to be successfully translated into practice, their clinical relevance and effectiveness need to be clear. If aPROMISE (Pylarify AI) still reports 19.5 false-positive pelvic lesions per patient (3), the tool is not yet practical. There is an opportunity to learn from the longer-established computer-aided detection literature and conduct trials that combine AI tools with clinicians to affect clinical decision-making in nuclear medicine (4).
The RELAINCE framework helps categorize AI performance on these clinical tasks according to tiered evidence: proof-of-concept, technical, clinical, and postdeployment evaluation (1). Currently, most literature addresses the proof-of-concept and technical evaluation levels (5), including patient-level classification (accuracy), lesion-level detection (F1 score), lesion classification, and voxel-level segmentation (Dice coefficient).
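To make these technical-evaluation metrics concrete, the following is a minimal sketch of how the lesion-level F1 score and voxel-level Dice coefficient are commonly computed; the NumPy-based functions, array names, and edge-case handling are our illustrative assumptions and are not drawn from the cited studies.

```python
import numpy as np

def dice_coefficient(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    # Voxel-level segmentation overlap: 2|A intersect B| / (|A| + |B|).
    intersection = np.logical_and(pred_mask, true_mask).sum()
    denominator = pred_mask.sum() + true_mask.sum()
    # Convention assumed here: two empty masks count as perfect agreement.
    return 2.0 * intersection / denominator if denominator else 1.0

def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    # Lesion-level detection: harmonic mean of precision and recall,
    # equivalent to 2TP / (2TP + FP + FN).
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 0.0
```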
The classification of clinical tasks and subtasks is part of the greater challenge of standardization and reproducibility. PET/CT is inherently a semiquantitative study type, and standardization of image generation is crucial for moving toward true quantification (6). ArtNET is the Australian network for standardizing PET acquisition parameters to make quantification more reproducible. Similar regional standardization efforts are led by the Society of Nuclear Medicine and Molecular Imaging Clinical Trials Network and the European Association of Nuclear Medicine Research Ltd. However, maintaining each standard is costly, and regional differences may pose barriers to data comparability and medical image banking that can be overcome only by international standardization.
There was also discussion of reproducible methods that remain applicable even when data harmonization is not fully achieved (7). Lymphoma research demonstrated that metabolic tumor volume segmentation using an absolute threshold of SUV 4 was robust. At our institution, an absolute threshold of SUV 3 is applied to prostate-specific membrane antigen PET/CT on the basis of similar findings in prostate research. These insights must be considered when designing and evaluating image analysis pipelines if these are to be reproducible (6).
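As a concrete illustration of such an absolute-threshold approach, the sketch below computes metabolic tumor volume and SUVmean from a thresholded SUV map; the function signature, array names, and the assumption of an already-reconstructed SUV volume are ours and do not represent a published pipeline.

```python
import numpy as np

def threshold_mtv(suv_volume: np.ndarray, voxel_volume_ml: float,
                  suv_threshold: float = 4.0) -> tuple[float, float]:
    # Segment tumor voxels with an absolute SUV threshold
    # (SUV 4 in the lymphoma literature; SUV 3 for PSMA PET/CT as above),
    # then report metabolic tumor volume (mL) and SUVmean within the mask.
    tumor_mask = suv_volume >= suv_threshold
    mtv_ml = float(tumor_mask.sum()) * voxel_volume_ml
    suv_mean = float(suv_volume[tumor_mask].mean()) if tumor_mask.any() else 0.0
    return mtv_ml, suv_mean
```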
Much like the promise of driverless cars, the technology is not quite there. However, the aim of an optimized system can be achieved through rapid and continuous learning. Embedding the end-user clinician within the research environment and fostering multidisciplinary learning will be decisive for future systems to be successful and sustainable. Postdeployment monitoring (e.g., ISO/IEC 42001:2023) and adjustment need to be factored into the research and development of today, which should also include the development of AI copilots (8). Nevertheless, the clinician–patient relationship will remain central. AI systems should be developed to align with ideal values, but any AI tool might have inherent or unexpected flaws that must be rigorously tested for and validated in a supervised environment. Who better to teach AI these values than clinician trainers?
The key to clinically relevant guidelines is to place the patient, rather than the technology, first. Already there is evidence of accelerating technological change, including foundation models (9) and multimodality models, as well as further gains in performance, as seen in digital PET and whole-body imaging. In this context, streamlined data sharing will be important (10), and a multiparty ecosystem will ultimately be required to establish trustworthy AI (2).
The promise of AI is that it will scale access to expertise and improve patient outcomes while also reducing health-care costs. Current AI technology has not necessarily delivered on this promise, often shifting more work to the clinician through burdensome numbers of false positives. We are grateful for the concerted effort of the research and clinical community pushing toward a better, more humane future, and we seek to be part of the interdisciplinary conversation on reproducible research in medical imaging.
DISCLOSURE
No potential conflict of interest relevant to this article was reported.
Footnotes
Published online Aug. 29, 2024.
© 2024 by the Society of Nuclear Medicine and Molecular Imaging.