Article code: 461147
Journal code: 696562
Publication year: 2013
English article: 27-page PDF
Full-text version: free download
English title of the ISI article
A novel semantic information retrieval system based on a three-level domain model
Related topics
Engineering and Basic Sciences > Computer Engineering > Computer Networks and Communications
English abstract

This paper presents a methodology and a prototype for extracting and indexing knowledge from natural language documents. The underlying domain model relies on a conceptual level (described by means of a domain ontology), which represents the domain knowledge, and a lexical level (based on WordNet), which represents the domain vocabulary. A stochastic model (the ME-2L-HMM2, which mixes – in a novel way – HMM and maximum entropy models) stores the mapping between such levels, taking into account the linguistic context of words. Not only does such a context contain the surrounding words; it also contains morphologic and syntactic information extracted using natural language processing tools. The stochastic model is then used, during the document indexing phase, to disambiguate word meanings. The semantic information retrieval engine we developed supports simple keyword-based queries, as well as natural language-based queries. The engine is also able to extend the domain knowledge, discovering new and relevant concepts to add to the domain model. The validation tests indicate that the system is able to disambiguate and extract concepts with good accuracy. A comparison between our prototype and a classic search engine shows that the proposed approach is effective in providing better accuracy.
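
To make the disambiguation step more concrete, the sketch below decodes the most probable sequence of senses for a short word sequence with the Viterbi algorithm. It is a plain first-order HMM, not the ME-2L-HMM2 described in the abstract: there is no maximum entropy component and no morphologic or syntactic context. The sense labels, the probability tables, and the function name `viterbi` are all hypothetical values chosen for illustration.

```python
import math

def viterbi(words, senses, start_p, trans_p, emit_p):
    """Return the most probable sense sequence for `words` under a first-order HMM."""
    if not words:
        return []
    floor = 1e-12  # stand-in probability for unseen events, avoids log(0)

    def lp(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in sense s at t, back-pointer)
    best = [{s: (lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(words[0], 0.0)), None)
             for s in senses}]
    for t in range(1, len(words)):
        column = {}
        for s in senses:
            # pick the predecessor sense that maximizes the path score into s
            prev, score = max(
                ((p, best[t - 1][p][0] + lp(trans_p[p].get(s, 0.0))) for p in senses),
                key=lambda item: item[1],
            )
            column[s] = (score + lp(emit_p[s].get(words[t], 0.0)), prev)
        best.append(column)

    # backtrack from the best final sense
    last = max(senses, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        last = best[t][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy model: two coarse senses and three words.
SENSES = ["FINANCE", "RIVER"]
START_P = {"FINANCE": 0.5, "RIVER": 0.5}
TRANS_P = {
    "FINANCE": {"FINANCE": 0.8, "RIVER": 0.2},
    "RIVER": {"FINANCE": 0.2, "RIVER": 0.8},
}
EMIT_P = {
    "FINANCE": {"bank": 0.4, "loan": 0.5, "water": 0.1},
    "RIVER": {"bank": 0.4, "loan": 0.05, "water": 0.55},
}

if __name__ == "__main__":
    print(viterbi(["bank", "loan"], SENSES, START_P, TRANS_P, EMIT_P))
    print(viterbi(["bank", "water"], SENSES, START_P, TRANS_P, EMIT_P))
```

In this toy model the word "bank" is equally likely under both senses, so the neighbouring word decides the reading. That is the role the abstract assigns to linguistic context, here reduced to a single adjacent word.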


► We present a methodology for knowledge extraction and indexing from natural language documents.
► We define a model composed of a conceptual level (ontology), a lexical level (WordNet), and a mapping level (stochastic model).
► The stochastic model mixes HMM and maximum entropy models.
► Validation tests indicate that the system is able to disambiguate words and extract concepts with good accuracy.
► A comparison between our prototype and a classic search engine shows that the proposed approach is effective in providing better precision and recall.
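
The last highlight compares the two systems on precision and recall. As a reminder of how these measures are computed for a single query, here is a minimal sketch. The document identifiers and the function name `precision_recall` are hypothetical and do not come from the paper's evaluation.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query from document-ID collections."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    # Hypothetical query: four documents returned, three documents truly relevant.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the fraction of returned documents that are relevant, and recall is the fraction of relevant documents that were returned.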

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Systems and Software - Volume 86, Issue 5, May 2013, Pages 1426–1452