Abstract
241017
Introduction: With the advent of disease-modifying therapies for cardiac amyloidosis (CA), early and reliable diagnosis has become paramount. The current diagnostic approach relies on visual interpretation of 99mTc-scintigraphy, which is difficult to standardize. In addition to the resulting variability in image readings, CA is sometimes an incidental finding on 99mTc-scintigraphy and is not always correctly recognized and reported. We developed an artificial intelligence (AI) system to reliably screen for CA on 99mTc-scintigraphy scans among patients undergoing bone scintigraphy. The robustness, prognostic value, safety, and clinical applicability of the AI system were assessed through multicenter validation, prognostic assessment, and a comparison with clinical experts in a multi-reader multi-case (MRMC) study.
Methods: In total, 22,796 scans from 19,636 patients originating from ten centers in Austria, the United Kingdom, China, and Italy were included in the study. Patients at six of the ten centers were consecutively recruited all-comers. The AI system was developed using data from a single center and validated on the remaining nine centers. Validation was performed on all four technetium-99m tracers (DPD, HMDP, PYP, and MDP) currently used for bone scintigraphy and capable of indicating CA. The AI system was trained to detect the presence of a CA-associated pattern (Perugini grade ≥ 2). The system's performance was compared to the diagnostic performance of five experienced nuclear medicine physicians in an MRMC study, which also assessed inter-rater variability. Outcome assessment was performed using two clinical endpoints, overall survival and heart failure, in univariate and multivariate Cox regression corrected for relevant confounders.
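As an illustration of how inter-rater variability among several readers can be quantified, the following is a minimal sketch of Fleiss' kappa in plain Python. It is not the study's code: the function name `fleiss_kappa`, the input layout, and the toy ratings are assumptions made for this example only.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a fixed number of raters per case.

    counts[i][j] is the number of raters who assigned case i to category j
    (hypothetical layout chosen for this sketch).
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Observed agreement: fraction of agreeing rater pairs, averaged over cases.
    per_case = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_case) / n_cases

    # Chance agreement from the overall proportion of ratings per category.
    proportions = [
        sum(row[j] for row in counts) / (n_cases * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (invented): five readers, two categories (CA pattern absent, present).
toy_counts = [[5, 0], [0, 5], [4, 1], [1, 4]]
kappa = fleiss_kappa(toy_counts)
```

For the toy ratings, two of the four cases are unanimous and two have a single dissenting reader, so kappa falls below 1 even though the majority reading is clear in every case.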
Results: The AI system achieved a 10-fold cross-validation performance of AUC 1.000 (95% CI [1.000, 1.000]) for the Austrian cohort and independent external validation AUCs of 0.997 (95% CI [0.993, 0.999]), 0.925 (95% CI [0.871, 0.971]), and 0.994 (95% CI [0.991, 0.997]) for the United Kingdom, China, and Italy cohorts, respectively (Figure 1A). The AI system's predictions were prognostic for overall mortality (HR 1.98; 95% CI [1.67, 2.34]; p < 0.0001; Figure 2A) and heart failure (HR 17.52; 95% CI [11.05, 27.76]; p < 0.0001; Figure 2B). Results remained significant after multivariate adjustment for demographic factors and comorbidities for both mortality (HR 1.44; 95% CI [1.19, 1.74]; p < 0.001) and heart failure (HR 3.16; 95% CI [1.90, 5.25]; p < 0.001). Median follow-up was 4.6 years (IQR 1.4-5.6), after which 25.7% of patients had died. In the MRMC study, the physicians disagreed in 11% of cases (Fleiss' kappa 0.88) and achieved a mean AUC of 0.945 (range 0.911-0.970), which was inferior to the AI system's AUC of 0.997 (p = 0.004; Figure 1B).
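To make the reported metric concrete, the sketch below computes an AUC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, together with a percentile bootstrap confidence interval. The abstract does not state how its confidence intervals were obtained, so the bootstrap is an assumption here, as are the function names `auc` and `bootstrap_auc_ci` and the toy data.

```python
import random


def auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5).

    Quadratic in the number of cases, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, not the study's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacks one class; AUC undefined, draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (invented): 1 = CA-associated pattern present, 0 = absent.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

In the toy data, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. With so few cases the bootstrap interval is very wide, which is why cohort size matters when interpreting interval width.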
Conclusions: The AI-based CA screening approach was reliable, eliminated inter-rater variability, and indicated prognostic value, with implications for clinical identification, referral, and management. The inclusion of consecutively enrolled all-comer referrals from multiple centers suggests that the system would perform strongly when deployed in a real-life setting. The AI system may be employed safely, in parallel with the imaging expert, to reduce the number of CA misdiagnoses in patients undergoing 99mTc-scintigraphy.