Learning aspect models with partially labeled data

Article code	Journal code	Publication year	English article
534863	870297	2011	8 pages (PDF)

Keywords

Document categorization; Semi-supervised learning

Related topics

Engineering and basic sciences > Computer engineering > Computer vision and pattern recognition

Abstract

In this paper, we address the problem of learning aspect models with partially labeled data for the task of document categorization. The motivation of this work is to exploit the available unlabeled data together with the set of labeled examples to learn latent models whose structure and underlying hypotheses account for the document generation process more accurately than other mixture-based generative models. We present a semi-supervised variant of the Probabilistic Latent Semantic Analysis (PLSA) model (Hofmann, 2001). In our approach, we try to capture the possible data mislabeling errors that occur during the training of our model. This is done by iteratively assigning class labels to unlabeled examples using the current aspect model and re-estimating the probabilities of the mislabeling errors. We perform experiments on the 20Newsgroups, WebKB and Reuters document collections, as well as on a real-world dataset from a Xerox business group, and show the effectiveness of our approach compared to a semi-supervised version of Naive Bayes, another semi-supervised version of PLSA, and transductive Support Vector Machines.
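For readers unfamiliar with the base model named in the abstract, the following is a minimal sketch of plain, unsupervised PLSA fitted by EM, where each document-word pair is modeled as P(w|d) = Σ_z P(w|z) P(z|d). It is not the semi-supervised variant or the mislabeling error model proposed in the paper, whose details are not given on this page. The function name `plsa`, its parameters, and the toy count matrix are assumptions made purely for illustration.

```python
import numpy as np


def plsa(counts, n_aspects, n_iter=50, seed=0):
    """Fit plain (unsupervised) PLSA by EM on a document-term count matrix.

    counts: array of shape (n_docs, n_words) with non-negative counts.
    Returns P(z|d) with shape (n_docs, n_aspects)
    and P(w|z) with shape (n_aspects, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initialisation, normalised so that each row is a distribution.
    p_z_d = rng.random((n_docs, n_aspects))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_aspects, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        # joint has shape (n_docs, n_aspects, n_words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp = joint / (joint.sum(axis=1, keepdims=True) + eps)

        # M-step: weight the responsibilities by the observed counts n(d,w).
        weighted = counts[:, None, :] * resp
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z


# Toy corpus (hypothetical): 4 documents over a 4-word vocabulary.
counts = np.array(
    [[5, 4, 0, 0],
     [4, 6, 0, 0],
     [0, 0, 5, 5],
     [0, 1, 6, 4]],
    dtype=float,
)
p_z_d, p_w_z = plsa(counts, n_aspects=2)
print(np.round(p_z_d, 3))
```

In a categorization setting, one common way to use such a model is to tie latent aspects to class labels; how the paper does this, and how it re-estimates mislabeling probabilities for the unlabeled examples, is described in the full text rather than here.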

Research highlights
► A new semi-supervised variant of the PLSA algorithm is proposed.
► A mislabeling error model is incorporated in the generative aspect model.
► Experiments on four different datasets were performed.
► The method is particularly effective when the ratio of annotated data is very low.
► The proposed method is compared with state-of-the-art algorithms.

Publisher

Database: Elsevier - ScienceDirect
Journal: Pattern Recognition Letters - Volume 32, Issue 2, 15 January 2011, Pages 297–304

Authors

Anastasia Krithara, Massih R. Amini, Cyril Goutte, Jean-Michel Renders
