Article code: 392125
Journal code: 664670
Publication year: 2015
English article: 20 pages
Full-text version: PDF
English title of the ISI article
Random Ordinality Ensembles: Ensemble methods for multi-valued categorical data
Keywords
Ensembles, Decision trees, Categorical data, Multi-way split, Binary split
Related subjects
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract

Data with multi-valued categorical attributes can cause major problems for decision trees. The high branching factor can lead to data fragmentation, where decisions have little or no statistical support. In this paper, we propose a new ensemble method, Random Ordinality Ensembles (ROE), that mitigates this problem and provides significantly improved accuracies over current ensemble methods. We perform a random projection of the categorical data into a continuous space. Because this transformation is a random process, each transformed dataset has a different imposed ordinality. A decision tree that learns on this new continuous space can use binary splits and therefore suffers less from data fragmentation. These binary trees are generally accurate, and the diversity of the training datasets ensures diversity among the decision trees in the ensemble. We created two variants of ROE. In the first, we used decision trees as the base models of the ensemble. In the second, we combined the attribute randomisation of Random Subspaces with Random Ordinality. These methods match or outperform other popular ensemble methods. We also studied several properties of these ensembles. The study suggests that random ordinality trees are generally more accurate and smaller than multi-way split decision trees. It is also shown that random ordinality attributes can be used to improve the Bagging and AdaBoost.M1 ensemble methods.
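The core idea can be illustrated with a minimal sketch. It assumes that the random projection amounts to giving the categories of each attribute a random order, coded as consecutive numbers, with a fresh order drawn for every ensemble member. The base learner is a small hypothetical CART-style tree (Gini impurity, binary threshold splits), not necessarily the decision-tree learner used in the paper, and the names `random_ordinality_map`, `train_roe` and `predict_roe` are illustrative only. Passing an attribute count as `subspace` sketches the second variant, in which each tree also sees a random subset of attributes as in Random Subspaces.

```python
import random
from collections import Counter


def random_ordinality_map(values, rng):
    """Hypothetical helper: give each distinct category a random position 0.0, 1.0, ..."""
    cats = sorted(set(values))
    rng.shuffle(cats)
    return {c: float(i) for i, c in enumerate(cats)}


def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def build_tree(X, y, depth, max_depth):
    # Leaf: pure node or depth limit reached, so predict the majority class.
    if depth >= max_depth or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        thresholds = sorted(set(row[f] for row in X))
        for t in thresholds[:-1]:  # binary split: value <= t goes left
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no attribute can separate these samples
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth))


def predict_tree(node, x):
    # Internal nodes are (attribute, threshold, left, right) tuples.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_roe(X, y, n_trees=10, max_depth=3, subspace=None, seed=0):
    rng = random.Random(seed)
    n_attr = len(X[0])
    models = []
    for _ in range(n_trees):
        # Optional Random Subspaces step: each tree sees `subspace` attributes.
        attrs = rng.sample(range(n_attr), subspace) if subspace else list(range(n_attr))
        # A new random ordinality per attribute gives each tree a different dataset.
        maps = {a: random_ordinality_map([row[a] for row in X], rng) for a in attrs}
        Xc = [[maps[a][row[a]] for a in attrs] for row in X]
        models.append((attrs, maps, build_tree(Xc, y, 0, max_depth)))
    return models


def predict_roe(models, x):
    votes = []
    for attrs, maps, tree in models:
        # Categories unseen in training are mapped to -1.0 (an assumption of this sketch).
        votes.append(predict_tree(tree, [maps[a].get(x[a], -1.0) for a in attrs]))
    return Counter(votes).most_common(1)[0][0]  # majority vote
```

For example, `train_roe(X, y, n_trees=5)` on rows such as `["red", "circle"]` fits five binary trees, each on a differently ordered encoding of the same categories, and `predict_roe` combines them by majority vote. A multi-way split tree would instead create one branch per category at a node, which is the source of the fragmentation described above.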

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 296, 1 March 2015, Pages 75–94
Authors
, ,