Article code    Journal code    Publication year    English article    Full-text version
552037    1451078    2014    11-page PDF    Free download
English title of the ISI article
Generating information for small data sets with a multi-modal distribution
Keywords
Multi-modal distribution, small data sets, virtual samples, virtual sample size
Related topics
Engineering and Basic Sciences, Computer Engineering, Information Systems
English abstract


• We propose a new method to assess the modality of small data.
• We construct a unique scheme to systematically generate multi-modal virtual samples.
• In this scheme, we suggest a criterion to control the size of virtual samples.
• We utilize six data sets to show the superiority of the proposed method.

Virtual sample generation approaches have been used with small data sets to enhance classification performance in a number of reports. Appropriate estimation of the data distribution plays an important role in this process, and performance is usually better for data sets that have a simple distribution than for those with a complex one. Mixed-type data sets often have a multi-modal distribution instead of a simple, uni-modal one. This study thus proposes a new approach to detect multi-modality in data sets, to avoid the problem of inappropriately using a uni-modal distribution. We utilize the common k-means clustering method to detect possible clusters, and, based on the clustered sample sets, a Weibull variate is developed for each cluster to produce multi-modal virtual data. In this approach, the degree of error variation in the Weibull skewness between the original and virtual data is measured and used as the criterion for determining the sizes of virtual samples. Six data sets with different training data sizes are employed to check the performance of the proposed method, and comparisons are made based on classification accuracy. The results of non-parametric testing show that the proposed method achieves better classification performance than the recently presented Mega-Trend-Diffusion method.
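
The pipeline outlined in the abstract (cluster, fit a Weibull per cluster, draw virtual samples, choose the sample size by skewness error) can be illustrated with a rough sketch. This is not the authors' implementation. The abstract does not give the estimator or the exact size criterion, so the code below assumes positive, one-dimensional data, fits a two-parameter Weibull by moment matching, and simply picks the candidate size whose sample skewness is closest to that of the original cluster. All function names, the candidate sizes, and the toy data are hypothetical.

```python
import math
import random
import statistics


def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's k-means on scalars; returns the non-empty clusters."""
    ordered = sorted(values)
    # Seed centroids at evenly spaced quantiles so the run is deterministic.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        updated = [statistics.fmean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def fit_weibull(cluster):
    """Moment-match a two-parameter Weibull (assumed estimator); returns (scale, shape)."""
    mean = statistics.fmean(cluster)
    cv = statistics.stdev(cluster) / mean

    def cv_of(shape):
        g1 = math.gamma(1 + 1 / shape)
        g2 = math.gamma(1 + 2 / shape)
        return math.sqrt(max(g2 / g1 ** 2 - 1, 0.0))

    # The coefficient of variation decreases as the shape grows, so bisect on it.
    lo, hi = 0.1, 200.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if cv_of(mid) > cv:
            lo = mid
        else:
            hi = mid
    shape = (lo + hi) / 2
    scale = mean / math.gamma(1 + 1 / shape)
    return scale, shape


def skewness(xs):
    """Population skewness; 0.0 when all values are identical."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    if s == 0:
        return 0.0
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def generate_virtual(values, k, candidate_sizes=(10, 20, 50, 100), seed=0):
    """Draw Weibull virtual samples per cluster, sized by smallest skewness error."""
    rng = random.Random(seed)
    virtual = []
    for cluster in kmeans_1d(values, k):
        if len(cluster) < 3:
            continue  # too few points for a skewness estimate; skipped in this sketch
        scale, shape = fit_weibull(cluster)
        target = skewness(cluster)
        best = None
        for n in candidate_sizes:
            # random.weibullvariate takes (scale, shape) in that order.
            draw = [rng.weibullvariate(scale, shape) for _ in range(n)]
            err = abs(skewness(draw) - target)
            if best is None or err < best[0]:
                best = (err, draw)
        virtual.extend(best[1])
    return virtual


# Toy bimodal sample (made up for illustration only).
values = [1.8, 2.1, 2.4, 2.0, 2.6, 9.5, 10.2, 10.8, 9.9, 11.1]
virtual = generate_virtual(values, k=2)
```

In practice the number of clusters would itself have to be determined, which is what the paper's modality assessment addresses; here `k` is simply passed in.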

Publisher
Database: Elsevier - ScienceDirect
Journal: Decision Support Systems - Volume 66, October 2014, Pages 71–81