Patent document categorization based on semantic structural information

کد مقاله	کد نشریه	سال انتشار	مقاله انگلیسی	نسخه تمام متن
515619	867050	2007	16 صفحه PDF	دانلود رایگان

عنوان انگلیسی مقاله ISI

دانلود مقاله + سفارش ترجمه

دانلود مقاله ISI انگلیسی

رایگان برای ایرانیان

کلمات کلیدی

k-NN Structural information - اطلاعات ساختاری Text categorization - طبقه بندی متن

موضوعات مرتبط

مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر

پیش نمایش صفحه اول مقاله

Patent document categorization based on semantic structural information

چکیده انگلیسی

The number of patent documents is currently rising rapidly worldwide, creating the need for an automatic categorization system to replace time-consuming and labor-intensive manual categorization. Because accurate patent classification is crucial to search for relevant existing patents in a certain field, patent categorization is a very important and useful field. As patent documents are structural documents with their own characteristics distinguished from general documents, these unique traits should be considered in the patent categorization process. In this paper, we categorize Japanese patent documents automatically, focusing on their characteristics: patents are structured by claims, purposes, effects, embodiments of the invention, and so on. We propose a patent document categorization method that uses the k-NN (k-Nearest Neighbour) approach. In order to retrieve similar documents from a training document set, some specific components to denote the so-called semantic elements, such as claim, purpose, and application field, are compared instead of the whole texts. Because those specific components are identified by various user-defined tags, first all of the components are clustered into several semantic elements. Such semantically clustered structural components are the basic features of patent categorization. We can achieve a 74% improvement of categorization performance over a baseline system that does not use the structural information of the patent.

ناشر

Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 43, Issue 5, September 2007, Pages 1200–1215

نویسندگان

Jae-Ho Kim, Key-Sun Choi,

علوم انسانی و هنر

فنی، مهندسی و علوم پایه

پزشکی و سلامت

بیو تکنولوژی

پذیرش سفارش ترجمه

Patent document categorization based on semantic structural information

دسترسی سریع

ارتباط

English Website