TY  - JOUR
T1  - Hybrid Machine Learning Methods and Ensemble Voting for Identification of Parkinson’s Disease Subtypes
JF  - Journal of Nuclear Medicine
JO  - J Nucl Med
SP  - 107
LP  - 107
VL  - 62
IS  - supplement 1
AU  - Mohammad R Salmanpour
AU  - Abdollah Saberi
AU  - Ghasem Hajianfar
AU  - Arman Rahmim
Y1  - 2021/05/01
UR  - http://jnm.snmjournals.org/content/62/supplement_1/107.abstract
N2  - Objectives: It is important to subdivide Parkinson’s disease (PD) into specific subtypes, since homogeneous groups of patients are more likely to share genetic and pathological features, potentially enabling earlier disease recognition and more tailored treatment strategies. We aim to identify PD subtypes using advanced hybrid machine learning (ML) methods followed by ensemble voting. Methods: A timeless dataset (i.e., pooled across time points) of 885 studies was derived from longitudinal data (years 0, 1, 2 and 4) of the Parkinson’s Progression Markers Initiative (PPMI). The dorsal striatum (DS) was segmented on DAT SPECT images using MRI, and radiomic features of the DS were extracted using our standardized SERA software. Hybrid ML systems were constructed by combining 16 feature reduction algorithms (FRAs), 14 clustering algorithms (CAs) and 16 classifiers (Cs). The C-index was first applied to each trajectory (hybrid system) to select the optimal number of clusters in the range 2 to 10. We then selected the optimal number of subtypes across all trajectories using both the Average of Classifier Performance (AOCP) and the Average of Correlation Factor (AOCF): AOCF assesses how well the results of different clustering methods correlate with one another, while AOCP assesses the accuracy of the final classification. Finally, ensemble voting across the different hybrid systems was used to assign each patient to a subtype.
To do this, we first applied t-distributed Stochastic Neighbor Embedding (t-SNE) to the subtype labels produced by the different trajectories, embedding these high-dimensional data in 2 dimensions such that, with high probability, similar data points are mapped to nearby points and dissimilar data points to distant points. Hierarchical agglomerative clustering of this embedding then identified 3 distinct sub-clusters. Results: Disease subtypes initially selected via the C-index were not consistent across hybrid ML methods, whereas AOCP and AOCF subsequently enabled selection of clusters that were more consistent across methods. Using non-imaging clinical information alone did not yield reproducible subtypes, whereas incorporating SPECT information enabled consistent generation of subtypes. Overall, we arrived at 3 distinct subtypes. Compared with the other PD sub-clusters, patients in Cluster I had milder scores in all domains, including motor symptoms, non-motor symptoms and imaging, although their scores were still higher than those of healthy control subjects. Patients in Cluster II mostly had higher scores than those in Cluster I. Cluster III showed the most severe clinical manifestations and feature values of all sub-clusters. The 3 subtypes were thus designated 1) mild, 2) intermediate, and 3) severe. Conclusions: An appropriate hybrid ML framework enabled identification of 3 distinct subtypes in PD subjects. This was achieved by combining clinical information with MRI-segmented SPECT images, through ensemble voting across multiple ML analysis trajectories. Overall, our ensemble voting framework, assisted by t-SNE analysis, may enable more comprehensive identification and analysis of disease subtypes.
ER  - 
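The C-index step described in the Methods can be illustrated with a minimal Python sketch. It assumes the Hubert and Levin definition C = (S - S_min) / (S_max - S_min), where S is the sum of within-cluster pairwise distances, n_w is the number of within-cluster pairs, and S_min and S_max are the sums of the n_w smallest and n_w largest distances over all pairs; lower values indicate more compact clusters. Euclidean distance, the function names c_index and best_k, and the toy data are illustrative assumptions, not the authors' or SERA's implementation.

```python
import math
from itertools import combinations


def c_index(points, labels):
    """Hubert-Levin C-index (assumed definition): lower is better."""
    pairs = list(combinations(range(len(points)), 2))
    # Euclidean distance for every unordered pair of points (an assumption).
    dists = [math.dist(points[i], points[j]) for i, j in pairs]
    # S: sum of distances between points that share a cluster label.
    within = [d for (i, j), d in zip(pairs, dists) if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0:
        raise ValueError("need at least one within-cluster pair")
    s = sum(within)
    ordered = sorted(dists)
    s_min = sum(ordered[:n_w])   # sum of the n_w smallest distances overall
    s_max = sum(ordered[-n_w:])  # sum of the n_w largest distances overall
    if s_max == s_min:
        return 0.0
    return (s - s_min) / (s_max - s_min)


def best_k(points, labelings):
    """Return the cluster count whose labeling has the lowest C-index.

    `labelings` is a hypothetical input format mapping k -> labels produced
    by one clustering algorithm run with k clusters (e.g. k in 2..10).
    """
    return min(labelings, key=lambda k: c_index(points, labelings[k]))


if __name__ == "__main__":
    # Toy data: two well-separated groups of three points each.
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
    print(best_k(pts, candidates))  # 2
```

Computing all pairwise distances scales quadratically with the number of studies, which is still tractable for a dataset of this size; in a hybrid pipeline the same function would simply be called once per trajectory and candidate cluster count.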
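AOCF is described only as an average of how well the clusterings produced by different methods correlate with one another. Because cluster labels are arbitrary (cluster 1 in one method may correspond to cluster 3 in another), a label-invariant agreement measure is needed. As an assumption, the sketch below uses the adjusted Rand index (ARI) and averages it over all pairs of methods; the abstract does not state which correlation factor was used, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same patients."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same patients")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two patients")
    # Contingency counts n_ij and marginal cluster sizes a_i and b_j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only occurs when both labelings are the same trivial partition.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def average_agreement(labelings):
    """Mean pairwise ARI across clustering methods (hypothetical AOCF stand-in)."""
    if len(labelings) < 2:
        raise ValueError("need at least two clustering results")
    scores = [adjusted_rand_index(a, b) for a, b in combinations(labelings, 2)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    m1 = [0, 0, 1, 1, 2, 2]
    m2 = [2, 2, 0, 0, 1, 1]  # same partition as m1 with renamed labels
    m3 = [0, 1, 0, 1, 2, 2]
    print(round(average_agreement([m1, m2, m3]), 3))  # 0.444
```

ARI equals 1 for identical partitions regardless of how clusters are named and is close to 0 for chance-level agreement, which makes its average a convenient consistency score across trajectories; a correlation computed directly on raw label values would not have this property.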