Abstract
328
Objectives: We compared two Machine Learning (ML) pipelines for building prognostic models from clinical variables and 18F-FDG PET/CT radiomics features in lung cancer patients.
Material and Methods: For the radiomics workflow, only the primary tumor volume was characterized, after (semi-)automated segmentation on both the PET and the low-dose CT components of the PET/CT. Three grey-level discretization methods (histogram equalization, linear re-sampling, and fixed bin width) were considered for second- and higher-order textural features. All available clinical variables and radiomics features were entered into the two ML pipelines under comparison, each combining feature selection with a classifier. The first was Support Vector Machines (SVM) with Recursive Feature Elimination (RFE). The second was Random Forests (RF) with 100 to 500 trees (in steps of 50) and an Embedded Wrapper (EW).
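The three grey-level discretization schemes named above can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation. It assumes that intensities arrive as a flat list of floats from the segmented volume and that bins are numbered from 1. It implements histogram equalization as equal-frequency (quantile) binning, which is one common way to realize it. Bin widths and bin counts are illustrative only.

```python
import math
from bisect import bisect_right


def fixed_bin_width(values, bin_width, min_value=None):
    """Fixed bin width: bin k covers [lo + (k-1)*w, lo + k*w), numbered from 1."""
    lo = min(values) if min_value is None else min_value
    return [int(math.floor((v - lo) / bin_width)) + 1 for v in values]


def linear_resampling(values, n_bins):
    """Linear re-sampling of the [min, max] range onto a fixed number of bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    # Scale to [0, n_bins), then clamp the maximum value into the last bin.
    return [min(int((v - lo) / (hi - lo) * n_bins) + 1, n_bins) for v in values]


def histogram_equalization(values, n_bins):
    """Equal-frequency binning: each bin receives roughly the same number of voxels."""
    ordered = sorted(values)
    n = len(ordered)
    # Bin edges at the empirical quantiles k / n_bins for k = 1 .. n_bins - 1.
    edges = [ordered[min(n - 1, (k * n) // n_bins)] for k in range(1, n_bins)]
    return [bisect_right(edges, v) + 1 for v in values]
```

For example, `histogram_equalization(list(range(1, 9)), 4)` places two values in each of the four bins. Fixed bin width keeps the intensity scale comparable across patients. The two other schemes instead adapt the bins to each tumor's own intensity range or distribution.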
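The SVM-RFE pipeline can be sketched as below. A linear SVM is refitted on the surviving features, and the feature with the smallest squared weight is then eliminated. This repeats until the requested number of features remains. The sketch is hedged and dependency-free, and it is not the study's code. The linear SVM is fitted with Pegasos-style stochastic subgradient descent and has no bias term. This solver is swapped in only to keep the example self-contained. `make_toy_cohort` is a hypothetical helper that generates synthetic data, not patient data.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Linear SVM (hinge loss, L2 penalty, no bias) fitted with Pegasos-style SGD.

    X: list of feature vectors; y: labels in {-1, +1}. Returns the weight vector.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge-loss subgradient step for a margin violation
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe(X, y, n_keep, **svm_kwargs):
    """Recursive Feature Elimination: refit, then drop the feature with smallest w_j**2."""
    remaining = list(range(len(X[0])))  # original column indices still in play
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[weakest]
    return remaining


def make_toy_cohort(n=60, n_noise=4, seed=42):
    """Hypothetical synthetic data: column 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([2.0 * label + rng.gauss(0, 0.3)]
                 + [rng.uniform(-1, 1) for _ in range(n_noise)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_cohort()
    print(svm_rfe(X, y, n_keep=2))
```

In practice, a library implementation would usually replace the hand-written solver, for example scikit-learn's `RFE` wrapped around `SVC(kernel="linear")`. The RF pipeline with an embedded wrapper is not sketched here.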
These two pipelines were compared on the classification task of identifying patients with overall survival shorter than 6 months, in a cohort of 101 patients with stage 2 or 3 non-small cell lung cancer (NSCLC). The cohort was split into two parts. The learning set (67%) was used to train the models with cross-validation. The testing set (33%) was used to evaluate performance (accuracy, sensitivity, specificity).
Results: In the training set, the best RF model with a small number of features (10) reached an accuracy of 82% (sensitivity 88%, specificity 76%). In the testing set, this model obtained an accuracy of 62% (sensitivity 71%, specificity 53%). Including more features (22) improved performance, with an accuracy of 71% (sensitivity 71%, specificity 71%) in the testing set. With SVM, the best model with a small number of features (11) obtained an accuracy of 84% (sensitivity 86%, specificity 82%). However, it performed poorly in the testing set (accuracy 53%, sensitivity 53%, specificity 53%). As with RF, including more features (25) yielded higher performance, with an accuracy of 91% (sensitivity 94%, specificity 88%) in the training set. Testing performance nonetheless remained limited (accuracy 59%, sensitivity 64%, specificity 53%).
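The learning/testing split and the cross-validation inside the learning set might look like the sketch below. It relies on several assumptions that the abstract does not state. The split is stratified by outcome class, cross-validation is stratified 5-fold, and the random seeds are arbitrary. The class sizes in the usage block are hypothetical placeholders, since the number of patients with survival shorter than 6 months is not reported. `TREE_GRID` simply lists the RF tree counts given in the abstract (100 to 500 in steps of 50, i.e. 9 values).

```python
import random


def stratified_split(labels, test_fraction=0.33, seed=0):
    """Split patient indices into learning/testing sets, preserving class proportions."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, indices, k=5, seed=0):
    """Yield (fit, validation) index lists for stratified k-fold CV within `indices`."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted({labels[i] for i in indices}):
        members = [i for i in indices if labels[i] == cls]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)  # deal each class round-robin over the folds
    for f in range(k):
        val = sorted(folds[f])
        fit = sorted(i for g in range(k) if g != f for i in folds[g])
        yield fit, val


# Candidate numbers of RF trees: 100 to 500 in steps of 50.
TREE_GRID = list(range(100, 501, 50))


if __name__ == "__main__":
    labels = [1] * 30 + [0] * 71  # hypothetical class sizes for 101 patients
    train, test = stratified_split(labels)
    print(len(train), len(test))
    for fit, val in stratified_kfold(labels, train):
        print(len(fit), len(val))
```

Each tree count in `TREE_GRID` would be scored on the validation folds, and only the selected configuration would then be evaluated once on the held-out testing set.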
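The reported metrics follow from the confusion matrix. The sketch below assumes that the positive class (labelled 1) is overall survival shorter than 6 months. It also makes explicit that accuracy is the prevalence-weighted mean of sensitivity and specificity: acc = π·sens + (1 − π)·spec, where π is the fraction of positive patients. This is why a model with 71% sensitivity and 53% specificity lands near 62% accuracy when the two classes are roughly balanced. The labels in the usage example are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical labels: 4 positives and 6 negatives.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    print(classification_metrics(y_true, y_pred))
```

With these labels, sensitivity is 3/4 and specificity is 4/6. The prevalence is π = 0.4, so accuracy is 0.4 × 0.75 + 0.6 × 4/6 = 0.7.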
Conclusions: Our results show that although SVM reached higher accuracy than RF in the training set, RF achieved the higher accuracy in the testing set (71% vs. 59%), in line with previous observations in other applications. In future work, we will extend this comparison to larger training and testing cohorts and to other cancer types.