Machine learning on high dimensional shape data from subcortical brain surfaces: A comparison of feature selection and classification methods

Article ID	Journal	Published Year	Pages	File Type
4969858	Pattern Recognition	2017	31 Pages	PDF

Abstract

High-dimensional shape descriptors (HDSD) are useful for modeling subcortical brain surface morphometry. Though HDSD are a useful basis for disease biomarkers, their high dimensionality requires careful treatment in machine learning applications to mitigate the curse of dimensionality. We explored the use of HDSD feature sets by comparing the performance of two feature selection approaches, Regularized Random Forest (RRF) and LASSO, to no feature selection (NFS). Each feature set was applied to three classifiers: Random Forest (RF), Support Vector Machines (SVM) and Naïve Bayes (NB). Each pairing of feature selection approach and classifier was evaluated with 10-fold cross-validation on two diagnostic contrasts, Alzheimer's disease and mild cognitive impairment, both relative to controls, across varying sample sizes to evaluate robustness. LASSO aided classification efficiency; however, RRF and NFS afforded more robust performance. Performance varied considerably by classifier, with RF being the most stable. We advise careful consideration of performance-efficiency tradeoffs when choosing feature selection strategies for HDSD.
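
The evaluation design described in the abstract (feature selection approaches crossed with classifiers, each pairing scored by 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes scikit-learn, uses synthetic data in place of real HDSD features, and every hyperparameter (the LASSO `alpha`, the selection threshold, the linear SVM kernel, the number of trees) is a placeholder chosen for the example. RRF is left out because the sketch relies only on estimators that ship with scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for HDSD: few subjects, many features (assumed sizes).
X, y = make_classification(
    n_samples=120, n_features=500, n_informative=15, random_state=0
)

# Feature selection arms: NFS passes every feature through; the LASSO arm
# keeps features whose LASSO coefficient is non-zero (alpha is a placeholder).
selectors = {
    "NFS": "passthrough",
    "LASSO": SelectFromModel(Lasso(alpha=0.02, max_iter=10000), threshold=1e-5),
}

# Classifier arms named in the abstract; settings are illustrative assumptions.
classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="linear"),
    "NB": GaussianNB(),
}

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for s_name, selector in selectors.items():
    for c_name, clf in classifiers.items():
        # Scaling and selection sit inside the pipeline so they are refit
        # on the training portion of every fold (no leakage into test folds).
        pipe = Pipeline(
            [("scale", StandardScaler()), ("select", selector), ("clf", clf)]
        )
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        results[(s_name, c_name)] = scores.mean()

for (s_name, c_name), acc in sorted(results.items()):
    print(f"{s_name:>5} + {c_name:<3}: mean 10-fold accuracy = {acc:.3f}")
```

Keeping the selector inside the `Pipeline` matters for high-dimensional data: selecting features on the full dataset before cross-validation would let test-fold information influence the selection and inflate the accuracy estimates. Repeating the loop on subsamples of different sizes would approximate the robustness comparison across sample sizes that the abstract mentions.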

Keywords

NFS; RRF; Naïve Bayes; Mild cognitive impairment; Feature selection; Alzheimer's disease; Biomarker; Shape analysis; Subcortical; Support vector machines; SVM; Brain; MCI