کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
10355212 867112 2005 14 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Adaptive sampling for thresholding in document filtering and classification
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نرم افزارهای علوم کامپیوتر
پیش نمایش صفحه اول مقاله
Adaptive sampling for thresholding in document filtering and classification
چکیده انگلیسی
Document filtering (DF) and document classification (DC) are often integrated together to classify suitable documents into suitable categories. A popular way to achieve integrated DF and DC is to associate each category with a threshold. A document d may be classified into a category c only if its degree of acceptance (DOA) with respect to c is higher than the threshold of c. Therefore, tuning a proper threshold for each category is essential. A threshold that is too high (low) may mislead the classifier to reject (accept) too many documents. Unfortunately, thresholding is often based on the classifier's DOA estimations, which cannot always be reliable, due to two common phenomena: (1) the DOA estimations made by the classifier cannot always be correct, and (2) not all documents may be classified without any controversy. Unreliable estimations are actually noises that may mislead the thresholding process. In this paper, we present an adaptive and parameter-free technique AS4T to sample reliable DOA estimations for thresholding. AS4T operates by adapting to the classifier's status, without needing to define any parameters. Experimental results show that, by helping to derive more proper thresholds, AS4T may guide various classifiers to achieve significantly better and more stable performances under different circumstances. The contributions are of practical significance for real-world integrated DF and DC.
ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Information Processing & Management - Volume 41, Issue 4, July 2005, Pages 745-758
نویسندگان
, ,