کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
531481 869844 2009 10 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Semi-structured document categorization with a semantic kernel
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر چشم انداز کامپیوتر و تشخیص الگو
پیش نمایش صفحه اول مقاله
Semi-structured document categorization with a semantic kernel
چکیده انگلیسی

Since a decade, text categorization has become an active field of research in the machine learning community. Most of the approaches are based on the term occurrence frequency. The performance of such surface-based methods can decrease when the texts are too complex, i.e., ambiguous. One alternative is to use the semantic-based approaches to process textual documents according to their meaning. Furthermore, research in text categorization has mainly focused on “flat texts” whereas many documents are now semi-structured and especially under the XML format. In this paper, we propose a semantic kernel for semi-structured biomedical documents. The semantic meanings of words are extracted using the unified medical language system (UMLS) framework. The kernel, with a SVM classifier, has been applied to a text categorization task on a medical corpus of free text documents. The results have shown that the semantic kernel outperforms the linear kernel and the naive Bayes classifier. Moreover, this kernel was ranked in the top 10 of the best algorithms among 44 classification methods at the 2007 Computational Medicine Center (CMC) Medical NLP International Challenge.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Pattern Recognition - Volume 42, Issue 9, September 2009, Pages 2067–2076
نویسندگان
, ,