Predicting overlapping protein complexes from weighted protein interaction graphs by gradually expanding dense neighborhoods

Article ID	Journal	Published Year	Pages	File Type
377550	Artificial Intelligence in Medicine	2016	8 Pages	PDF

Abstract

• Gradually Expanding Dense Neighborhoods (GENA) is proposed for the computational prediction of protein complexes from weighted protein interaction networks.
• GENA allows proteins to participate in multiple complexes, in agreement with the underlying cellular mechanisms.
• GENA outperformed three state-of-the-art algorithms for predicting protein complexes in experiments with yeast and human datasets.
• Downstream analysis of the resulting clusters revealed functional homogeneity among the proteins of the same cluster.
• Significantly altered network modules were detected when GENA was applied to two co-expression networks: one generated from Parkinson's disease patients and one from healthy individuals.

Objective

Proteins are vital biological molecules that drive many fundamental cellular processes. They rarely act alone; instead, they form interacting groups called protein complexes. The study of protein complexes is a key goal in systems biology. Recently, large protein–protein interaction (PPI) datasets have been published, and a plethora of computational methods offering new ideas for the prediction of protein complexes have been implemented. However, most of these methods suffer from two major limitations: first, they do not account for proteins participating in multiple functions; second, they are unable to handle weighted PPI graphs. Moreover, the problem remains open, as existing algorithms and tools perform insufficiently on predictive metrics.

Method

In the present paper, we propose gradually expanding neighborhoods with adjustment (GENA), a new algorithm that gradually expands neighborhoods in a graph starting from highly informative “seed” nodes. GENA treats proteins as multifunctional molecules, allowing them to participate in more than one protein complex. In addition, GENA accepts weighted PPI graphs by using a weighted evaluation function for each cluster.

Results

In experiments with datasets from Saccharomyces cerevisiae and human, GENA outperformed Markov clustering, restricted neighborhood search and clustering with overlapping neighborhood expansion, three state-of-the-art methods for computationally predicting protein complexes. Seven PPI networks and seven evaluation datasets were used in total. GENA outperformed existing methods in 16 out of 18 experiments, achieving an average improvement of 5.5% when the maximum matching ratio metric was used. Our method was able to discover functionally homogeneous protein clusters and uncover important network modules in a Parkinson's disease expression dataset. When used on the human networks, around 47% of the detected clusters were enriched in Gene Ontology (GO) terms with depth greater than five in the GO hierarchy.

Conclusions

In the present manuscript, we introduce a new method for the computational prediction of protein complexes by making the realistic assumption that proteins participate in multiple protein complexes and cellular functions. Our method can detect accurate and functionally homogeneous clusters.
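
The general idea described in the abstract (grow a cluster outward from a seed node, judge it with a weighted evaluation function, and let a protein end up in more than one cluster) can be illustrated with a small Python sketch. This is not the authors' GENA implementation: the abstract does not specify how seeds are chosen or how clusters are scored, so the sketch assumes seeds ordered by weighted degree and a cohesiveness-style score (internal edge weight divided by internal plus boundary edge weight). The function names `build_graph`, `score`, `expand` and `predict_clusters`, the `min_size` parameter, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency map {u: {v: weight}} from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def score(graph, cluster):
    """Assumed evaluation function (not taken from the paper):
    internal weight / (internal weight + boundary weight)."""
    w_in = w_out = 0.0
    for u in cluster:
        for v, w in graph[u].items():
            if v in cluster:
                w_in += w / 2.0  # each internal edge is visited from both ends
            else:
                w_out += w
    total = w_in + w_out
    return w_in / total if total else 0.0


def expand(graph, seed):
    """Greedily add the neighbor that most improves the score; stop when none does."""
    cluster = {seed}
    current = score(graph, cluster)
    while True:
        frontier = {v for u in cluster for v in graph[u]} - cluster
        best, best_score = None, current
        for v in sorted(frontier):  # sorted for deterministic tie-breaking
            s = score(graph, cluster | {v})
            if s > best_score:
                best, best_score = v, s
        if best is None:
            return frozenset(cluster)
        cluster.add(best)
        current = best_score


def predict_clusters(graph, min_size=3):
    """Expand from every node (assumed seed order: highest weighted degree first).
    Nodes are never removed from the graph, so clusters may overlap."""
    seeds = sorted(graph, key=lambda n: (-sum(graph[n].values()), n))
    clusters = []
    for seed in seeds:
        cluster = expand(graph, seed)
        if len(cluster) >= min_size and cluster not in clusters:
            clusters.append(cluster)
    return clusters


# Toy weighted graph: two 4-cliques that share the node "H".
clique_1 = ["A", "B", "C", "H"]
clique_2 = ["D", "E", "F", "H"]
edges = [
    (u, v, 1.0)
    for clique in (clique_1, clique_2)
    for i, u in enumerate(clique)
    for v in clique[i + 1:]
]
clusters = predict_clusters(build_graph(edges))
for cluster in clusters:
    print(sorted(cluster))
```

On this toy graph the sketch returns two clusters that both contain `H`, which mirrors the overlapping-complex assumption stated above. A real pipeline would read weights from a PPI confidence score and would likely add merging or filtering of near-duplicate clusters, which is omitted here.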