Machine learning methods in computational cancer biology

Article ID	Journal	Published Year	Pages	File Type
4999539	Annual Reviews in Control	2017	21 Pages	PDF

Abstract

Cancer is the second leading cause of death, after heart disease, in both developed and developing countries. A major source of difficulty in addressing cancer as a disease is its bewildering variety: no two manifestations of cancer are alike, even when they occur at the same site. This makes cancer an ideal candidate for “personalized medicine” (also known as “precision medicine”). At present there are several high-quality public databases containing both molecular measurements of tumors and clinical data on the patients. By applying machine learning methods to these databases, even non-experimenters can generate plausible hypotheses that are supported by the data and that can then be validated on one or more independent data sets. A characteristic of cancer databases is that the number of measured features is many orders of magnitude larger than the number of samples. Therefore, any machine learning algorithm must also perform feature selection, that is, identify the most relevant or most predictive features among the large number of measured features. In this paper, some algorithms for sparse regression and sparse classification are reviewed, and their applications to endometrial and ovarian cancer are discussed.
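The abstract does not say which sparse-regression algorithms the paper reviews. As one widely used example of regression that performs feature selection when features far outnumber samples, the lasso minimizes (1/2n)·||y − Xβ||² + λ·||β||₁, and the L1 penalty drives most coefficients to exactly zero. The following is a minimal sketch, assuming cyclic coordinate descent with soft-thresholding and purely synthetic data; the function names (`soft_threshold`, `lasso_cd`, `lasso_objective`), the data sizes, and the choice λ = 0.1·λ_max are hypothetical illustrations, not the paper's method or its cancer data sets.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_objective(X, y, beta, lam):
    """(1/2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n = len(y)
    rss = sum((y[i] - sum(xij * bj for xij, bj in zip(X[i], beta))) ** 2
              for i in range(n))
    return rss / (2 * n) + lam * sum(abs(b) for b in beta)


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for the lasso objective above (sketch)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta; beta starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            # Exact minimizer of the objective along coordinate j.
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Hypothetical synthetic data: many more features (p) than samples (n),
# with only features 0, 1, 2 actually driving the response.
rng = random.Random(0)
n, p = 30, 200
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [5 * row[0] - 4 * row[1] + 3 * row[2] + rng.gauss(0.0, 0.1) for row in X]

# Smallest lambda for which beta = 0 solves the lasso problem.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
lam = 0.1 * lam_max
beta = lasso_cd(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected features:", selected)
```

Only the features with nonzero coefficients are "selected," so feature selection happens as a by-product of fitting. The regularization weight λ controls how sparse the result is: for λ ≥ λ_max the solution is all zeros, and smaller values admit more features. In practice λ is usually chosen by cross-validation rather than fixed as here.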