Abstract
Introduction: The development and deployment of artificial intelligence (AI) within nuclear medicine involves several ethically fraught components at different stages of the machine learning pipeline, including during data collection, model training and validation, and clinical use. The AI Task Force of the Society of Nuclear Medicine and Molecular Imaging (SNMMI) has identified ethical risks that developers and users of AI algorithms may encounter, providing recommendations to mitigate their impact on patients and populations.
Methods: Drawing upon the traditional principles of medical and research ethics, and highlighting the need to ensure health justice, we identify four major ethical risks: 1) privacy of data subjects, 2) fairness towards marginalized populations, 3) accountability of physicians, and 4) conflicts between stakeholders.
Results: With respect to subject privacy, we note that, while most training datasets are formally de-identified, the scope and scale of data required for AI training make them vulnerable to re-identification. We suggest privacy-preserving tools such as differential privacy and federated learning, and explore methods of obtaining consent for secondary data use that may mitigate the ethical risks of re-identification.
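To make the privacy-preserving tooling concrete, the following is a minimal, illustrative sketch of the Laplace mechanism from differential privacy applied to a simple cohort count. The query, epsilon value, and SUVmax data are hypothetical and chosen only for illustration; this is not a procedure prescribed by the Task Force.

```python
# Minimal sketch of the Laplace mechanism: release a noisy count so that the
# presence or absence of any single patient record is statistically masked.
import numpy as np

def dp_count(values, predicate, epsilon=1.0, rng=None):
    """Return an epsilon-differentially private count of records satisfying `predicate`."""
    rng = rng or np.random.default_rng()
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0  # adding or removing one record changes a count by at most 1
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Hypothetical usage: count scans with SUVmax above a threshold without
# exposing the exact count derived from individual records.
suv_max = [3.2, 7.8, 5.1, 9.4, 2.6]
print(dp_count(suv_max, lambda v: v > 5.0, epsilon=0.5))
```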
With respect to fairness, we show that issues of reproducibility and generalizability can have profound ethical impacts when systems are naively applied outside of their original training contexts (e.g., hospitals below Level 1, rural settings, or low- and middle-income countries) or to under-represented racial, ethnic, or gender minorities. Inappropriate application of AI may severely deepen health inequality. While technical methods to ensure “fairness” (e.g., error rate parity equalization, synthetic data construction, transparency) can quantify and (in some cases) mitigate bias, careful consultation with domain experts is necessary.
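As one illustration of how error-rate parity might be audited in practice, the sketch below computes false-positive and false-negative rates per subgroup and reports the gap. The group labels, predictions, and outcomes are hypothetical, and this is not a procedure endorsed in the text; it only shows the kind of quantification such methods provide.

```python
# Minimal sketch of an error-rate parity audit: compare FPR and FNR across subgroups.
from collections import defaultdict

def error_rates_by_group(y_true, y_pred, groups):
    stats = defaultdict(lambda: {"fp": 0, "fn": 0, "neg": 0, "pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        if t == 1:
            s["pos"] += 1
            s["fn"] += (p == 0)   # missed positive case
        else:
            s["neg"] += 1
            s["fp"] += (p == 1)   # false alarm on a negative case
    return {g: {"FPR": s["fp"] / max(s["neg"], 1),
                "FNR": s["fn"] / max(s["pos"], 1)} for g, s in stats.items()}

# Hypothetical audit data for two subgroups "A" and "B"
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = error_rates_by_group(y_true, y_pred, groups)
print(rates)
print("FNR gap:", abs(rates["A"]["FNR"] - rates["B"]["FNR"]))
```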
<level 1="" hospitals=""><level 1="" hospitals=""></level></level>
With respect to accountability, we argue that physicians bear ethical (and often legal) responsibility for the use of AI systems in clinical contexts. This places burdens on physicians to understand the capacities and limits of algorithms, but also requires developers to clearly specify the performance of the algorithm, its intended use case and the robustness of its model across populations. While “explainability” techniques may sometimes be useful - especially if they identify the population-level salience of different factors in AI decisions - a shared understanding of the performance and limits of the AI system is at least as important.
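As one example of a technique that surfaces the population-level salience of different factors, the sketch below implements permutation importance for an arbitrary classifier: the accuracy drop when a feature is shuffled indicates how much the model relies on it. The toy model and data are hypothetical stand-ins, not a method endorsed here.

```python
# Minimal sketch of permutation importance: mean accuracy drop per feature
# when that feature's values are shuffled, breaking its link to the outcome.
import numpy as np

def permutation_importance(predict, X, y, n_repeats=20, rng=None):
    rng = rng or np.random.default_rng(0)
    baseline = (predict(X) == y).mean()
    importances = []
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # permute one feature in place
            drops.append(baseline - (predict(X_perm) == y).mean())
        importances.append(float(np.mean(drops)))
    return importances  # larger drop => more salient feature

# Hypothetical usage: a toy classifier that depends only on feature 0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)
toy_predict = lambda X: (X[:, 0] > 0).astype(int)
print(permutation_importance(toy_predict, X, y))  # feature 0 should dominate
```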
Finally, with respect to governance, we argue that in a diverse, plural society, AI systems must necessarily grapple with conflicts between stakeholders. For example, stakeholders may disagree about the appropriate diagnostic threshold in a classification algorithm, since setting it requires balancing the costs of type I and type II errors. AI systems often require setting a single threshold for all, and developers must therefore identify a fair resolution to these conflicts. This is a hard problem; at a minimum it requires explicit identification of thresholds and model uncertainty, and ideally sustained engagement with stakeholders to establish the scope of disagreement.
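To illustrate why a single threshold forces such a trade-off, the sketch below counts false positives (type I errors) and false negatives (type II errors) at several candidate thresholds. The scores and labels are invented for illustration; in practice they would come from a validation cohort.

```python
# Minimal sketch of the threshold trade-off: raising the threshold reduces
# false positives but increases false negatives, and vice versa.
def confusion_at_threshold(scores, labels, threshold):
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return fp, fn

# Hypothetical model scores and ground-truth labels
scores = [0.10, 0.35, 0.40, 0.62, 0.70, 0.85, 0.90]
labels = [0,    0,    1,    0,    1,    1,    1   ]
for t in (0.3, 0.5, 0.7):
    fp, fn = confusion_at_threshold(scores, labels, t)
    print(f"threshold={t:.1f}  false positives={fp}  false negatives={fn}")
```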
Conclusions: The recent expansion of AI in nuclear medicine has led to many promising technologies but also ethical risks - to privacy, health equity, physician accountability and public governance. By being aware of these risks, researchers and clinicians can identify when and how to seek ethical expertise and stakeholder input, to fully realize the promise of AI.