Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
377568 | 658795 | 2015 | 9-page PDF | Free download |
• A PPI identification approach using a relational similarity framework is proposed.
• We construct word similarity matrices that are sensitive to the PPI identification task.
• We develop a hybrid model to integrate word similarity with the basic RS model.
Background: Most existing systems that identify protein–protein interaction (PPI) in literature make decisions solely on evidence within a single sentence and ignore the rich context of PPI descriptions in large corpora. Moreover, they often suffer from the heavy burden of manual annotation.

Methods: To address these problems, a new relational-similarity (RS)-based approach exploiting context in large-scale text is proposed. A basic RS model is first established to make initial predictions. Then word similarity matrices that are sensitive to the PPI identification task are constructed using a corpus-based approach. Finally, a hybrid model is developed to integrate the word similarity model with the basic RS model.

Results: The experimental results show that the basic RS model achieves F-scores much higher than a baseline of random guessing on interactions (from 50.6% to 75.0%) and non-interactions (from 49.4% to 74.2%). The hybrid model further improves F-score by about 2% on interactions and 3% on non-interactions.

Conclusion: The experimental evaluations conducted with PPIs in well-known databases showed the effectiveness of our approach that explores context information in PPI identification. This investigation confirmed that within the framework of relational similarity, the word similarity model relieves the data sparseness problem in similarity calculation.
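The three-step pipeline named in the Methods (a basic RS model, task-sensitive word similarity, and a hybrid of the two) can be illustrated with a toy sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: bag-of-words context vectors, plain cosine standing in for the basic RS score, a soft cosine standing in for the word similarity model, a linear mix with a hypothetical weight `alpha`, and made-up sentences and similarity values. It is not the authors' implementation.

```python
import math
from collections import Counter


def context_vector(sentences):
    """Bag-of-words counts over the sentences that mention a protein pair."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def cosine(u, v):
    """Stand-in for a basic RS score: only identical words contribute."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def soft_cosine(u, v, word_sim):
    """Cosine in which related (not only identical) words partially match.

    word_sim is a hypothetical sparse word similarity matrix stored as
    {(word_a, word_b): similarity}; identical words have similarity 1.
    """

    def sim(a, b):
        if a == b:
            return 1.0
        return word_sim.get((a, b), word_sim.get((b, a), 0.0))

    def bilinear(x, y):
        return sum(x[a] * y[b] * sim(a, b) for a in x for b in y)

    denom = math.sqrt(bilinear(u, u) * bilinear(v, v))
    return bilinear(u, v) / denom if denom else 0.0


def hybrid_score(u, v, word_sim, alpha=0.5):
    """Assumed linear mix of the exact-match and word-similarity scores."""
    return alpha * cosine(u, v) + (1 - alpha) * soft_cosine(u, v, word_sim)


# Toy data (invented): context of a known interacting pair vs. a candidate pair.
known = context_vector(["PROT1 binds PROT2"])
candidate = context_vector(["PROT1 interacts with PROT2"])
word_sim = {("binds", "interacts"): 0.8}  # illustrative value only

basic = cosine(known, candidate)
soft = soft_cosine(known, candidate, word_sim)
hybrid = hybrid_score(known, candidate, word_sim)
print(basic, soft, hybrid)
```

In this toy case the word similarity term lets "binds" and "interacts" count as a partial match, so the soft score exceeds the exact-match score. That is one way to picture how a word similarity model could ease data sparseness when two contexts share few identical words.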
Journal: Artificial Intelligence in Medicine - Volume 64, Issue 3, July 2015, Pages 185–193