Abstract
Introduction: Developing a deep-learning based disease diagnosis system using medical images is a data-intensive task due to the large number of parameters that must be learned. Data sharing between institutions is one way to meet these data requirements, but it can be difficult due to privacy concerns. Furthermore, a disease condition may have varying sub-types, and data availability may be significantly imbalanced across them. The primary goal of our research is therefore to explore data augmentation: generating new synthetic data that approximates the real data distribution, thereby increasing the amount of data available to researchers.
Generative Adversarial Networks (GANs) are a technique capable of generating realistic medical images. They have been shown to be effective in 2-D, but current methods for generating 3-D PET images usually require conditioning on other 3-D images, which may not be available.
In this work, we show that we can generate realistic 3-D head and neck PET images using the Temporal GAN (TGAN) architecture. To evaluate the fidelity of the synthetic data, we train a segmentation model on real data for which ground truth masks are available and test its performance on synthetic data. We then evaluate image utility with a data augmentation experiment that examines how synthetic data can improve a segmentation model when a large real data set is not available.
Methods: The TGAN architecture, originally designed for video generation, is easily adapted to 3-D PET image generation by substituting the frames of a video with a series of 2-D slices of the 3-D PET volume. We can then generate specific tumours by conditioning the TGAN on tumour masks. Our TGAN model was trained for 5000 epochs using the RMSProp optimizer with a learning rate of 0.00005, a Wasserstein loss, and a batch size of 32.
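For concreteness, the training step below sketches what this configuration looks like in PyTorch (an assumption of ours; the abstract does not name a framework). `G` and `D` stand for the temporal generator and the critic, both taking a tumour mask as a conditioning input; their definitions, the latent dimension `z_dim`, and the weight-clipping constant `CLIP_VALUE` are illustrative assumptions not specified in the abstract.

```python
import torch

# Hyperparameters reported in the abstract. CLIP_VALUE is the standard
# weight-clipping constant from the original WGAN paper and is an
# assumption here; BATCH_SIZE would be passed to the DataLoader.
LR, BATCH_SIZE, EPOCHS, CLIP_VALUE = 5e-5, 32, 5000, 0.01

def train_step(G, D, real_volumes, masks, opt_G, opt_D, z_dim=100):
    """One Wasserstein-loss update with tumour-mask conditioning."""
    n = real_volumes.size(0)
    device = real_volumes.device

    # Critic update: estimate the Wasserstein distance D(real) - D(fake).
    z = torch.randn(n, z_dim, device=device)
    fake = G(z, masks).detach()
    loss_D = D(fake, masks).mean() - D(real_volumes, masks).mean()
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()
    for p in D.parameters():  # weight clipping enforces the Lipschitz constraint
        p.data.clamp_(-CLIP_VALUE, CLIP_VALUE)

    # Generator update: increase the critic's score on generated volumes.
    z = torch.randn(n, z_dim, device=device)
    loss_G = -D(G(z, masks), masks).mean()
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()

# Optimizers as reported in the abstract:
# opt_G = torch.optim.RMSprop(G.parameters(), lr=LR)
# opt_D = torch.optim.RMSprop(D.parameters(), lr=LR)
```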
We used the publicly available Head and Neck (HN) dataset in The Cancer Imaging Archive (TCIA), composed of 200 PET images from different centers. For our segmentation model, we used the challenge-winning architecture (a 3-D U-Net with squeeze-and-excitation modules) described by Iantsen et al. The segmentation models were trained for 150 epochs with a batch size of 2, and segmentation quality was evaluated with the Dice similarity coefficient (DSC).
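The DSC is the standard overlap measure between a predicted and a ground truth mask; a minimal NumPy sketch follows (the function name and the `eps` smoothing term are our own, not from the abstract).

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient between two binary masks.

    `pred` and `target` are binary arrays of the same shape
    (e.g. 3-D tumour masks); `eps` avoids division by zero
    when both masks are empty.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```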
A data augmentation experiment was performed by training a segmentation model, S, on real data from one center (CHGJ) in the HN TCIA dataset. A TGAN was then trained on the remaining HN data, and the synthetic data it generated was used to fine-tune S. The performance of S was evaluated on two validation sets: one comprising a small held-out subset of the CHGJ center's data, and another comprising a random selection of samples from outside the CHGJ center. The protocol is sketched below.
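This structural sketch summarizes the experiment. All names below (`build_unet`, the `.train`/`.val` splits, and the `.fit`, `.sample`, and `.evaluate` methods) are hypothetical placeholders; the abstract does not define an API.

```python
def augmentation_experiment(chgj, other_centers, tgan, build_unet):
    """Structural sketch of the data augmentation experiment.

    `chgj` and `other_centers` are assumed to expose `.train`/`.val`
    splits; `tgan` and the model returned by `build_unet` are assumed
    to expose `.fit`, `.sample`, and `.evaluate`. All hypothetical.
    """
    # 1. Train the baseline segmentation model S on one center's real data.
    S = build_unet()
    S.fit(chgj.train)

    # 2. Train the TGAN on the remaining HN data and draw synthetic
    #    image/mask pairs (the conditioning masks double as ground truth).
    tgan.fit(other_centers.train)
    synthetic = tgan.sample(n=200)

    # 3. Fine-tune S on the synthetic data.
    S.fit(synthetic)

    # 4. Evaluate on the in-center and out-of-center validation sets.
    return {
        "CHGJ": S.evaluate(chgj.val),
        "other": S.evaluate(other_centers.val),
    }
```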
Results: We show samples of images generated by the conditional TGAN in Figure 1. Each image is shown with its tumour mask overlaid in red, alongside the corresponding real PET image with its tumour mask overlaid for comparison.
The conditional TGAN reproduces lesions that match the input tumour mask and generalizes well: we observed no visible artifacts when generating images from unseen tumour masks.
We generated 200 synthetic images with the conditional TGAN and segmented them automatically using the model trained on real data. The average DSC was 0.70 for real data and 0.65 for synthetic data.
Figure 3 shows the results of our data augmentation experiment. DSC increased when segmentation models were trained with additional synthetic data: on samples outside the CHGJ subset, we observed an average DSC increase of 0.05 when synthetic data was used.
Conclusions: Temporal GANs can be adapted to generate 3-D medical images with high fidelity and utility, retaining important image features of the training data set. We have shown that our model generalizes well and can produce realistic images with ground-truth lesions based on user-input masks. We will further validate this model on other PET data sets in the future.