Abstract
Introduction: Transfer learning using large pretrained open-source image recognition networks offers an opportunity to build artificial intelligence (AI) models that can assist physicians in image diagnostics in cases where only small datasets are available. A pretrained convolutional neural network (CNN) is repurposed to recognize features not represented in the original network training. The aim of this study was to use this technique to develop an AI model to automatically detect and highlight hypofunctioning lesions found on thyroid scintigraphy and to evaluate the performance of this model against that of experienced nuclear physicians.
Methods: The study included 1,953 consecutive patients scanned during a 1.5-year period for thyrotoxicosis, thyroid nodules or goiter. Scans were acquired 15-30 minutes after injection of 170 MBq ±10% of 99mTc using a Solo Mobile (DDD Diagnostic) single-headed gamma camera with a pinhole collimator. A 256x256 matrix was used, and the stop condition was set at either 10 minutes or 200 kilocounts. The camera was positioned as close to the patient as possible, with the image of the thyroid gland expanded to fill the image field. The thyroid scintigraphies were subsequently evaluated by three experienced nuclear medicine physicians. A total of 229 images were excluded due to low image quality, primarily caused by low uptake. The remaining 1,724 images were labeled as 'detectable hypofunctioning lesion' or 'no detectable hypofunctioning lesion'. Labeling was based on standalone images without any further image details, acquisition data or patient history. When the three physicians disagreed on the initial label, the image was reevaluated and a consensus label was agreed upon. Images were randomly assigned to training, validation and test sets containing 1,206, 259 and 259 images, respectively. The model was developed in TensorFlow using the convolutional base of the VGG19 network appended with a dense classifier comprising a 256-unit dense layer, a dropout layer and a final dense layer with a single unit for binary classification. During training, the bottom 12 layers of the convolutional base were frozen, while the top four layers were fine-tuned. Training was performed and accelerated on a Nomad cluster delivering 1,500+ CPU cores, 3+ TB of RAM and 12,000+ CUDA cores, with >200 Gbit internal networking. The performance of the model was compared to the consensus of the evaluating physicians; similarly, each physician's initial evaluation was compared to the consensus.
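The transfer-learning setup described above can be sketched in Keras as follows. This is a minimal illustration, not the authors' implementation: the dropout rate, learning rate, input resolution mapping and the exact layer-freezing boundary are assumptions, and the abstract's "bottom 12 / top four" layer count is approximated here by unfreezing only the last four layers of the convolutional base.

```python
# Hedged sketch of the VGG19 transfer-learning architecture from the Methods.
# Hyperparameters not stated in the abstract (dropout rate, optimizer,
# learning rate) are assumptions for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

# VGG19 convolutional base pretrained on ImageNet, without the classifier head.
conv_base = tf.keras.applications.VGG19(
    weights="imagenet", include_top=False, input_shape=(256, 256, 3))

# Freeze all but the top four layers of the convolutional base
# (approximating the frozen-bottom / fine-tuned-top split in the abstract).
for layer in conv_base.layers[:-4]:
    layer.trainable = False

# Append the dense classifier: 256 units, dropout, single unit for
# binary classification ('lesion' vs. 'no lesion').
model = models.Sequential([
    conv_base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),                    # rate is an assumption
    layers.Dense(1, activation="sigmoid"),  # binary output
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # learning rate assumed
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Freezing the lower convolutional blocks preserves the generic edge and texture filters learned on ImageNet, while fine-tuning only the top layers adapts the most task-specific features to scintigraphic images without overfitting a dataset of fewer than two thousand images.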
Results: The developed VGG19-based AI model detected hypofunctioning lesions with an accuracy of 0.90, compared to 0.88-0.94 for the experienced physicians’ initial evaluations. Thus, there was no significant difference between the performance of the AI model and that of the physicians (p > 0.05). As expected, the performance of the model increased from 0.81 to 0.90 when low-quality images (11.7% of all images) were excluded, emphasizing the importance of checking image quality before training the AI model.
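The comparison against the physician consensus can be illustrated with a short sketch. The abstract does not state which significance test was used; an exact McNemar test on the discordant predictions is one common choice for paired classifier comparisons and is used here purely as an assumed example, with toy labels.

```python
# Hedged sketch: accuracy against the consensus label, plus an exact
# two-sided McNemar test on discordant pairs (the actual test used in
# the study is not stated in the abstract; McNemar is an assumption).
from math import comb

def accuracy(pred, truth):
    """Fraction of predictions that match the consensus labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def mcnemar_exact_p(pred_a, pred_b, truth):
    """Exact two-sided McNemar p-value comparing two raters to consensus."""
    # b: cases where A is correct and B is wrong; c: the reverse.
    b = sum(a == t and x != t for a, x, t in zip(pred_a, pred_b, truth))
    c = sum(a != t and x == t for a, x, t in zip(pred_a, pred_b, truth))
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    p = 2 * sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, p)

# Toy example with hypothetical binary labels (1 = lesion, 0 = no lesion).
truth      = [1, 1, 0, 0, 1, 0]
model_pred = [1, 0, 0, 0, 1, 0]
doc_pred   = [1, 1, 0, 1, 1, 0]
```

With equal numbers of discordant cases in each direction, the test returns a p-value of 1.0, i.e. no evidence that one rater outperforms the other.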
Conclusions: Using the VGG19 network on a small dataset of fewer than two thousand images, it was possible to develop an AI model that detected images with hypofunctioning lesions at an accuracy comparable to that of experienced nuclear medicine physicians. Pretrained generic image classification networks can shortcut the training process, accelerate the development of models for computer-aided diagnosis and potentially improve the accuracy, speed and certainty of thyroid imaging reports.