Article code: 515498
Journal code: 867033
Publication year: 2013
English article: 13-page PDF
Full-text version: free download
English title of the ISI article
A link-bridged topic model for cross-domain document classification
Related topics
Engineering and Basic Sciences, Computer Engineering, Computer Science Applications
Abstract


• We propose a Link-Bridged Topic (LBT) model for cross-domain document classification.
• LBT utilizes an auxiliary link network to discover the co-citation relationship.
• LBT combines the content information and link structures into a graphical model.
• LBT outperforms both multi-view learning and single-view transfer baselines.

Transfer learning uses labeled data from a related domain (the source domain) to transfer knowledge effectively to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationships among them. In this paper, we propose a novel cross-domain document classification approach called the Link-Bridged Topic model (LBT). LBT consists of two key steps. First, LBT uses an auxiliary link network to discover direct or indirect co-citation relationships among documents by embedding this background knowledge into a graph kernel. The mined co-citation relationships are leveraged to bridge the gap between domains. Second, LBT combines content information and link structure in a unified latent topic model. The model assumes that documents in the source and target domains share some common topics, in terms of both content and link structure. By mapping the data of both domains into the latent topic space, LBT encodes knowledge about domain commonality and difference as shared topics with associated differential probabilities. The learned latent topics must be consistent with both the source and target data, and with both content and link statistics. The shared topics then act as a bridge that facilitates knowledge transfer from the source domain to the target domain. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.
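
The abstract does not say which graph kernel LBT uses. As a minimal sketch of the first step, assuming a von Neumann kernel over a co-citation matrix (a common choice for citation graphs, not necessarily the authors'), direct and indirect co-citation could be computed as follows. The function names, the toy link matrix, and the decay value are illustrative assumptions.

```python
import numpy as np


def cocitation_matrix(links):
    """Direct co-citation counts.

    links[k, i] = 1 means document k links to (cites) document i.
    Entry [i, j] of the result counts the documents that cite both i and j.
    """
    links = np.asarray(links, dtype=float)
    return links.T @ links


def von_neumann_kernel(cocite, decay):
    """Assumed kernel: sum over n >= 1 of decay**(n - 1) * C**n.

    The closed form is C (I - decay * C)^-1, which only converges when
    decay is below 1 / spectral radius of C.
    """
    cocite = np.asarray(cocite, dtype=float)
    radius = np.max(np.abs(np.linalg.eigvalsh(cocite)))
    if radius > 0 and decay >= 1.0 / radius:
        raise ValueError("decay must be below 1 / spectral radius of the co-citation matrix")
    identity = np.eye(cocite.shape[0])
    # C and (I - decay * C)^-1 commute, so solving on the left is equivalent.
    return np.linalg.solve(identity - decay * cocite, cocite)


# Toy link network (hypothetical): document 3 cites 0 and 1, document 4 cites 1 and 2.
links = np.zeros((5, 5))
links[3, [0, 1]] = 1
links[4, [1, 2]] = 1

direct = cocitation_matrix(links)
kernel = von_neumann_kernel(direct, decay=0.1)
print(direct[0, 2], kernel[0, 2])
```

In this toy graph, documents 0 and 2 are never cited together, so their direct co-citation count is zero. The kernel entry is still positive because both are co-cited with document 1, which is the kind of indirect relationship the abstract describes.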

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Processing & Management - Volume 49, Issue 6, November 2013, Pages 1181–1193