Article code: 6923573
Journal code: 1448362
Publication year: 2018
English article: 11-page PDF
Title of the ISI article
Industrial information extraction through multi-phase classification using ontology for unstructured documents
Keywords
Information extraction, Multi-phase classification, Ontology, Industrial unstructured documents, Feature transformation
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
Abstract
The increasing availability of unstructured text documents in industry, such as e-mails, office documents, and PDF files, has drawn many researchers to information extraction. This work aims to extract information from unstructured tender documents in the power plant industry. The extraction efficiency of recent approaches depends on linguistic structure and keyword taxonomy, which makes them unsuitable for domain-specific applications that require semantic and contextual taxonomy together. In this paper, a two-phase classification approach to information extraction with feature weighting is proposed: sentences are classified in the first phase, and words in the second. Because industries span multiple domains, a multi-domain layered industrial ontology is used for knowledge representation. The unstructured documents are transformed into DAG-based semi-structured text with enriched features. A novel feature transformation based on the categorical data type of each feature is proposed to handle heterogeneous textual features. The approach is evaluated on real-world documents obtained from power plant tenders. The results show a minimal loss of precision, which can be rectified by enriching the training data and customizing standard parser algorithms to suit the domain requirements.
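The two-phase idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The flat `ONTOLOGY` fragment, its keyword weights, the `UNITS` list, and the hand-set linear `WEIGHTS` are all hypothetical stand-ins for the trained classifiers and the layered ontology, and the DAG-based document representation is not modelled. Phase 1 (`classify_sentence`) assigns each sentence an ontology concept from weighted keywords. Phase 2 (`classify_word`) labels words only in the sentences that phase 1 kept. `transform` shows one plausible reading of a type-based feature transformation: booleans become 0/1, numbers pass through, and categorical values are one-hot encoded.

```python
import re

# Hypothetical flat ontology fragment (NOT the paper's layered ontology):
# concept -> {keyword: weight}
ONTOLOGY = {
    "boiler_capacity": {"boiler": 1.0, "capacity": 2.0, "steam": 0.5},
    "turbine_rating": {"turbine": 1.0, "rating": 2.0, "output": 1.0},
}
UNITS = {"mw", "tph", "kv", "bar"}      # hypothetical unit list
SHAPES = ["alpha", "digit", "mixed"]    # values of the categorical "shape" feature
TOKEN_RE = r"[a-z]+|\d+(?:\.\d+)?"      # lowercase words or (decimal) numbers

# Hypothetical hand-set weights over the transformed vector
# [in_unit_list, length, shape=alpha, shape=digit, shape=mixed]
WEIGHTS = {
    "VALUE": [0.0, 0.0, 0.0, 2.0, 0.0],
    "UNIT": [2.0, 0.0, 0.0, 0.0, 0.0],
    "OTHER": [0.0, 0.1, 1.0, 0.0, 0.5],
}


def classify_sentence(sentence, threshold=2.0):
    """Phase 1: best-scoring ontology concept, or None below the threshold."""
    tokens = re.findall(TOKEN_RE, sentence.lower())
    best, best_score = None, 0.0
    for concept, weights in ONTOLOGY.items():
        score = sum(weights.get(t, 0.0) for t in tokens)
        if score > best_score:
            best, best_score = concept, score
    return best if best_score >= threshold else None


def word_features(token):
    """Heterogeneous raw features: boolean, numeric and categorical."""
    if re.fullmatch(r"\d+(?:\.\d+)?", token):
        shape = "digit"
    elif token.isalpha():
        shape = "alpha"
    else:
        shape = "mixed"
    return {"in_unit_list": token in UNITS, "length": len(token), "shape": shape}


def transform(features):
    """Map each feature to numbers according to its data type."""
    vec = []
    for name in sorted(features):
        value = features[name]
        if isinstance(value, bool):            # check bool before int (bool subclasses int)
            vec.append(1.0 if value else 0.0)
        elif isinstance(value, (int, float)):  # numeric: pass through
            vec.append(float(value))
        else:                                  # categorical ("shape" here): one-hot over SHAPES
            vec.extend(1.0 if value == s else 0.0 for s in SHAPES)
    return vec


def classify_word(token):
    """Phase 2: linear score per label over the transformed features."""
    vec = transform(word_features(token))
    scores = {label: sum(w * x for w, x in zip(ws, vec))
              for label, ws in WEIGHTS.items()}
    return max(scores, key=scores.get)


def extract(document):
    """Run phase 1 on every sentence and phase 2 only on the relevant ones."""
    results = []
    for sentence in re.split(r"(?<=[.;])\s+", document.strip()):
        concept = classify_sentence(sentence)
        if concept is None:
            continue  # irrelevant sentence: skip word classification
        labels = [(t, classify_word(t))
                  for t in re.findall(TOKEN_RE, sentence.lower())]
        values = [t for t, label in labels if label == "VALUE"]
        units = [t for t, label in labels if label == "UNIT"]
        results.append((concept, values, units))
    return results


doc = ("The boiler capacity shall be 500 TPH. Bidders must submit two copies. "
       "The turbine rating is 660 MW.")
print(extract(doc))
# [('boiler_capacity', ['500'], ['tph']), ('turbine_rating', ['660'], ['mw'])]
```

Because phase 2 runs only on sentences accepted by phase 1, the bidding instruction never contributes values. In a real system both scorers would presumably be learned from annotated tender documents rather than set by hand.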
Publisher
Database: Elsevier ScienceDirect
Journal: Computers in Industry - Volume 100, September 2018, Pages 137-147