کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
403948 677375 2009 9 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Discovering unexpected documents in corpora
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر هوش مصنوعی
پیش نمایش صفحه اول مقاله
Discovering unexpected documents in corpora
چکیده انگلیسی

Text mining is widely used to discover frequent patterns in large corpora of documents. Hence, many classical data mining techniques, that have been proven fruitful in the context of data stored in relational databases, are now successfully used in the context of textual data. Nevertheless, there are many situations where it is more valuable to discover unexpected information rather than frequent ones. In the context of technology watch for example, we may want to discover new trends in specific markets, or discover what competitors are planning in the near future, etc. This paper is related to that context of research. We have proposed several unexpectedness measures and implemented them in a prototype, called UnexpectedMiner, that can be used by watchers, in order to discover unexpected documents in large corpora of documents (patents, datasheets, advertisements, scientific papers, etc.). UnexpectedMiner is able to take into account the structure of documents during the discovery of unexpected information. Many experiments have been performed in order to validate our measures and show the interest of our system.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Knowledge-Based Systems - Volume 22, Issue 6, August 2009, Pages 421–429
نویسندگان
, ,