A multi-layer text classification framework based on two-level representation model

Article code	Journal code	Publication year	English article	Full-text version
388103	660916	2012	12-page PDF	Free download

Keywords

Text classification; Semantics; Text representation; Wikipedia

Related subjects

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

Abstract

Text categorization is one of the most common tasks in data mining and machine learning. Unlike structured data, unstructured text is harder to analyze because it carries both complex syntactic and complex semantic information. In this paper, we propose a two-level representation model (2RM) for text data: one level represents syntactic information and the other represents semantic information. At the syntactic level, each document is represented as a term vector in which the value of each component is the term frequency-inverse document frequency (TF-IDF) weight of the term. At the semantic level, the document is represented by the Wikipedia concepts related to the terms of the syntactic level. We also design a multi-layer classification framework (MLCLA) to make use of the syntactic and semantic information represented in 2RM. The MLCLA framework contains three classifiers. Two of them are applied in parallel, one to the syntactic level and one to the semantic level. Their outputs are combined and fed to the third classifier, which produces the final result. Experimental results on benchmark data sets (20Newsgroups, Reuters-21578 and Classic3) show that 2RM combined with MLCLA improves text classification performance compared with existing flat text representation models (Term-based VSM, Term Semantic Kernel Model, Concept-based VSM, Concept Semantic Kernel Model and Term + Concept VSM) combined with existing classification methods.
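The two representation levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The TF-IDF variant (raw term count times log(N/df)), the toy corpus, and the TERM_TO_CONCEPTS table are assumptions made for illustration; in particular, the table is a hypothetical stand-in for the Wikipedia-based term-to-concept mapping, and its relatedness values are made up.

```python
import math
from collections import Counter

# Toy corpus (assumption); the paper evaluates on 20Newsgroups,
# Reuters-21578 and Classic3.
docs = [
    "apple banana fruit",
    "python java code",
    "apple python",
]

# HYPOTHETICAL term -> {concept: relatedness} table standing in for the
# Wikipedia-based mapping. The numbers are invented for illustration.
TERM_TO_CONCEPTS = {
    "apple": {"Fruit": 0.7, "Apple Inc.": 0.3},
    "banana": {"Fruit": 1.0},
    "fruit": {"Fruit": 1.0},
    "python": {"Programming language": 0.8, "Snake": 0.2},
    "java": {"Programming language": 1.0},
    "code": {"Programming language": 1.0},
}


def tfidf_vectors(texts):
    """Syntactic level: one sparse {term: tf * idf} dict per document."""
    tokenized = [text.split() for text in texts]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def concept_vector(term_vec, mapping):
    """Semantic level: spread each term weight over its related concepts."""
    concepts = {}
    for term, weight in term_vec.items():
        for concept, relatedness in mapping.get(term, {}).items():
            concepts[concept] = concepts.get(concept, 0.0) + weight * relatedness
    return concepts


term_level = tfidf_vectors(docs)
concept_level = [concept_vector(v, TERM_TO_CONCEPTS) for v in term_level]

# Each document is now a pair (term vector, concept vector).
two_level = list(zip(term_level, concept_level))
for terms, concepts in two_level:
    print(terms, concepts)
```

Keeping the two vectors separate, rather than concatenating them into one flat vector, is what lets a classifier be trained on each level independently.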

► The two-level representation model treats term vectors and concept vectors as two levels of text representation.
► The multi-layer classification framework handles large-scale data layer by layer.
► A context-based method is adopted to identify the relatedness between terms and concepts.
► A structure-based relatedness measure is designed to process collections of long documents quickly.
► The multi-layer classification framework can be executed in parallel to reduce running time.
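The layer-by-layer, parallel structure of the framework can be sketched as follows. This is only an illustration under stated assumptions: the source does not say which classifiers are used, so a simple nearest-centroid classifier with cosine similarity is swapped in for all three; the names CentroidClassifier, combine, fit_mlcla and predict_mlcla are hypothetical; and the toy vectors are invented. For brevity the third classifier is trained on scores computed from the same training documents, whereas a careful stacking setup would use held-out scores. The thread pool only expresses that the two first-layer classifiers are independent; it does not by itself guarantee a speed-up.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidClassifier:
    """Stand-in learner (assumption): nearest class centroid by cosine."""

    def fit(self, vectors, labels):
        self.centroids = {}
        for vec, label in zip(vectors, labels):
            centroid = self.centroids.setdefault(label, {})
            for key, weight in vec.items():
                centroid[key] = centroid.get(key, 0.0) + weight
        self.classes = sorted(self.centroids)
        return self

    def scores(self, vec):
        return {c: cosine(vec, self.centroids[c]) for c in self.classes}

    def predict(self, vec):
        s = self.scores(vec)
        return max(self.classes, key=lambda c: s[c])


def combine(clf_term, clf_concept, term_vec, concept_vec):
    """Merge the two first-layer outputs into one input for the third classifier."""
    merged = {("term", c): s for c, s in clf_term.scores(term_vec).items()}
    merged.update({("concept", c): s for c, s in clf_concept.scores(concept_vec).items()})
    return merged


def fit_mlcla(term_vecs, concept_vecs, labels):
    # Layer 1: the two classifiers are independent, so train them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_term = pool.submit(CentroidClassifier().fit, term_vecs, labels)
        f_concept = pool.submit(CentroidClassifier().fit, concept_vecs, labels)
        clf_term, clf_concept = f_term.result(), f_concept.result()
    # Layer 2: train the third classifier on the combined first-layer outputs.
    meta = [combine(clf_term, clf_concept, t, c) for t, c in zip(term_vecs, concept_vecs)]
    clf_final = CentroidClassifier().fit(meta, labels)
    return clf_term, clf_concept, clf_final


def predict_mlcla(models, term_vec, concept_vec):
    clf_term, clf_concept, clf_final = models
    return clf_final.predict(combine(clf_term, clf_concept, term_vec, concept_vec))


# Toy two-level training data (invented for illustration).
term_vecs = [{"goal": 1, "match": 1}, {"match": 1, "team": 1},
             {"cpu": 1, "code": 1}, {"code": 1, "bug": 1}]
concept_vecs = [{"Sport": 2}, {"Sport": 2}, {"Computing": 2}, {"Computing": 2}]
labels = ["sport", "sport", "tech", "tech"]

models = fit_mlcla(term_vecs, concept_vecs, labels)
print(predict_mlcla(models, {"team": 1, "goal": 1}, {"Sport": 1}))
print(predict_mlcla(models, {"bug": 1}, {"Computing": 1}))
```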

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 39, Issue 2, 1 February 2012, Pages 2035–2046

Authors

Jiali Yun, Liping Jing, Jian Yu, Houkuan Huang
