Abstract
P1463
Introduction: In 2021, the first commercially available text-to-image generative model with a graphical user interface, DALL-E, was released on an invite-only basis. This platform combines computer vision with natural language processing, allowing the user to quickly and easily create an image from a text description. In 2022 came Midjourney, a similar competing platform open to the public, followed shortly by Stable Diffusion, the main open-source text-to-image platform available. The power of these tools is that they allow the user to create images almost instantly using only simple natural language commands. While they were initially created to produce digital art, they could also be used to create medical images for machine learning and artificial intelligence research. This educational exhibit aims to introduce these novel technologies and guide nuclear medicine professionals toward a better understanding of how to use these artificial intelligence models.
Methods: All three platforms are available online with different methods of access. DALL-E requires an email account to sign up and is a fee-per-image service; images are generated via text entry into a web-based GUI, https://openai.com/dall-e-2/. Midjourney requires a Discord account to sign up and is a subscription-based service, https://discord.com/; to generate images, the user chats with an AI bot on the Discord server. Stable Diffusion is a bit more involved. There are third-party GUIs that allow you to enter text to generate images, such as Hugging Face, https://huggingface.co/spaces/stabilityai/stable-diffusion. However, Stable Diffusion is an open-source model that can also be downloaded onto your own computer and used for free to generate images.
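One common way to run Stable Diffusion locally is through the Hugging Face diffusers library. The sketch below is a minimal illustration, not part of the exhibit itself: the model checkpoint, prompt, and parameter values are assumptions chosen for demonstration, and actual generation is gated behind a flag because the first run downloads several gigabytes of weights and requires a CUDA-capable GPU.

```python
# Minimal local Stable Diffusion sketch, assuming the Hugging Face `diffusers`
# library is installed (pip install diffusers transformers torch).
# Generation is gated behind a flag: the first run downloads several GB of weights.
RUN_GENERATION = False

settings = {
    "model_id": "stabilityai/stable-diffusion-2-1",  # assumed checkpoint name
    "prompt": "whole body bone scintigraphy image, planar, anterior view",
    "num_inference_steps": 50,  # denoising steps; more steps, finer detail
    "guidance_scale": 7.5,      # how strongly the image follows the prompt
}

if RUN_GENERATION:
    import torch
    from diffusers import StableDiffusionPipeline  # lazy import: heavy dependency

    pipe = StableDiffusionPipeline.from_pretrained(
        settings["model_id"], torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(
        settings["prompt"],
        num_inference_steps=settings["num_inference_steps"],
        guidance_scale=settings["guidance_scale"],
    ).images[0]
    image.save("bone_scan.png")  # hypothetical output filename
```

Setting `RUN_GENERATION = True` on a machine with the weights downloaded would produce and save a single image for the given prompt.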
Results: These models are text-to-image diffusion models (Stable Diffusion, for example, is a latent diffusion model built around a variational autoencoder) accessed through GUIs. No code is needed to use the models; instead, natural language in the form of "prompts" acts as the code. A prompt is a set of text instructions sent to the model to generate a specific image output. In the case of nuclear medicine image generation, a prompt may look like this: "planar, anterior-posterior, black on white background, whole body bone scintigraphy image, with linear increase along the tibias, in the pattern of shin splints". It is important to use simple, clear, and direct language when prompting a model, because the AI tends to translate prompt language directly into image characteristics. Once an image is created, you have the option to create a variation of it with additional prompts, fine-tuning the next iteration to optimize your image generation. The process of image generation with these models is therefore iterative, evolving over successive generations. Once you have the desired final image, that comprehensive prompt can be reused to make an unlimited number of additional images in a series. While DALL-E and Midjourney do allow an input image to guide generation, they do not effectively allow a set of sample images to be used to train the model to create similar images. Stable Diffusion, on the other hand, can be fine-tuned on even small sets of only 5-10 images. This represents a powerful tool for the future of medical image generation at low cost, with minimal effort, and with essentially free processing power.
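The prompt structure described above (modality, view, display convention, findings, pattern) can be sketched as a small helper that assembles the comma-separated instruction string; the function and argument names below are hypothetical, chosen only to illustrate how a reusable prompt for an image series might be built.

```python
def build_prompt(modality, view, display, findings, pattern):
    """Assemble a scintigraphy text prompt from structured parts.

    Hypothetical helper: joins the components with commas, mirroring the
    simple, direct phrasing these models respond to best.
    """
    parts = [modality, view, display, findings, f"in the pattern of {pattern}"]
    return ", ".join(parts)


# Reproduces the example bone-scan prompt from the text.
prompt = build_prompt(
    modality="planar",
    view="anterior-posterior",
    display="black on white background, whole body bone scintigraphy image",
    findings="with linear increase along the tibias",
    pattern="shin splints",
)
print(prompt)
```

Because the comprehensive prompt is just a string, varying one field (e.g., the findings) while holding the rest fixed is one way to generate a consistent series of images.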
Conclusions: We outlined the three main AI text-to-image generative models available today, explained the fundamentals of image prompts, and proposed Stable Diffusion as a free, open-source method of generating nuclear medicine images for future machine learning research.