Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
488442 | 703898 | 2016 | 9-page PDF | Free download |
Document annotation and search are two important factors to consider in data sharing platforms. A large unstructured text document often contains a substantial amount of structured attribute information, yet the important information in such documents is very difficult to find. The ad hoc or predefined annotation currently applied to shared data leads to inadequate search, retrieval, and analysis capabilities. In this paper we propose a new approach that generates structured annotations, in the form of attribute name and attribute value pairs, from unstructured documents. We propose a new data sharing platform, DSPAA (Data Sharing Platform with Automated Annotation), in which annotation takes place when the author uploads a document and is based on a probabilistic framework that considers the attributes in the document content and in the query collection. The system also performs semantic annotation of documents using the WordNet database. When a user submits a search query, the system searches for documents in the annotation database and ranks the selected documents using the vector space model. The experimental results show that the system produces better results, and produces them faster, than traditional document retrieval strategies.
Journal: Procedia Computer Science - Volume 85, 2016, Pages 45–53
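
The ranking step mentioned in the abstract can be illustrated with a minimal sketch of the vector space model. The abstract does not specify the term weighting, the tokenization, or the layout of the annotation database, so everything below is an assumption: annotations are flattened into plain strings, terms are weighted with a smoothed TF-IDF, and the names `tokenize`, `build_idf`, `tfidf`, `cosine`, and `rank` are hypothetical helpers, not part of DSPAA.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for a sketch.
    return text.lower().split()


def build_idf(docs_tokens):
    # Smoothed inverse document frequency over the annotation collection.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(tokens, idf):
    # Sparse TF-IDF vector; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, annotations):
    # annotations: dict mapping a document id to its annotation text
    # (hypothetical flattening of attribute name/value pairs).
    ids = list(annotations)
    docs_tokens = [tokenize(annotations[i]) for i in ids]
    idf = build_idf(docs_tokens)
    query_vec = tfidf(tokenize(query), idf)
    scored = [
        (doc_id, cosine(query_vec, tfidf(tokens, idf)))
        for doc_id, tokens in zip(ids, docs_tokens)
    ]
    # Highest similarity first; ties keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank("smith databases", {"d1": "author smith topic databases", "d2": "author jones topic networks"})` places `d1` first, because it is the only annotation that shares terms with the query. A real system would likely score attribute names and values separately rather than as one bag of words; the flat representation here is chosen only to keep the sketch short.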