Article ID: 5907762
Journal: Genomics
Published Year: 2016
Pages: 8 Pages
File Type: PDF
Abstract

• Gene interaction networks can improve the performance of clustering algorithms on gene expression data.
• Network-informed clustering identifies clinically distinct subgroups of smokers based on blood gene expression.
• Subtype-specific blood gene expression signatures include genes that are smoke-responsive in independent experiments.

One of the most common smoking-related diseases, chronic obstructive pulmonary disease (COPD), results from a dysregulated, multi-tissue inflammatory response to cigarette smoke. We hypothesized that systemic inflammatory signals in genome-wide blood gene expression can identify clinically important COPD-related disease subtypes, and we leveraged pre-existing gene interaction networks to guide unsupervised clustering of blood microarray expression data. Using network-informed non-negative matrix factorization, we analyzed genome-wide blood gene expression from 229 former smokers in the ECLIPSE Study, and we identified novel, clinically relevant molecular subtypes of COPD. These network-informed clusters were more stable and more strongly associated with measures of lung structure and function than clusters derived from a network-naïve approach, and they were associated with subtype-specific enrichment for inflammatory and protein catabolic pathways. These clusters were successfully reproduced in an independent sample of 135 smokers from the COPDGene Study.
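
The abstract does not spell out how the network enters the factorization, so the following is only a minimal sketch of one common way to make non-negative matrix factorization network-aware: graph-regularized NMF with multiplicative updates. It is not necessarily the procedure used in the study. The function name `network_regularized_nmf`, the penalty weight `lam`, and the toy expression matrix and adjacency matrix are all illustrative assumptions.

```python
import numpy as np

def network_regularized_nmf(X, A, k, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical sketch: factor X (genes x samples) as W @ H with W, H >= 0,
    penalizing gene loadings W that differ between genes linked in A.

    Objective (assumed form): ||X - W H||_F^2 + lam * trace(W.T @ (D - A) @ W),
    where A is a symmetric non-negative gene-gene adjacency matrix and D is
    its diagonal degree matrix.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    D = np.diag(A.sum(axis=1))
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Standard multiplicative update for the sample coefficients.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # Update for gene loadings, with the network penalty split into
        # its attractive part (A @ W) and its degree part (D @ W).
        W *= (X @ H.T + lam * (A @ W)) / (W @ (H @ H.T) + lam * (D @ W) + eps)
    # Assign each sample to the factor with the largest coefficient.
    labels = H.argmax(axis=0)
    return W, H, labels

# Toy data (made up for illustration): 6 genes, 8 samples, two gene modules.
toy_rng = np.random.default_rng(1)
X = toy_rng.random((6, 8)) * 0.1
X[:3, :4] += 1.0   # module 1 is high in samples 0-3
X[3:, 4:] += 1.0   # module 2 is high in samples 4-7

# Toy network: genes within the same module are connected.
A = np.zeros((6, 6))
A[:3, :3] = 1.0
A[3:, 3:] = 1.0
np.fill_diagonal(A, 0.0)

W, H, labels = network_regularized_nmf(X, A, k=2, lam=0.5)
```

In this sketch, setting `lam=0` reduces the updates to ordinary (network-naïve) NMF, which is one way to compare the two approaches on the same data.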
