Free download: Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector

Article code	Journal code	Year	English article	Full text
554714	1451072	2015	10-page PDF	Free download

English title (ISI article)

Including high-cardinality attributes in predictive models: A case study in churn prediction in the energy sector



Keywords

Data mining, Predictive modeling, High-cardinality attributes, Churn prediction



Abstract

• This study addresses the problem of including high-cardinality data in predictive models
• Several possible transformation functions are introduced that allow high-cardinality data to be included
• Using a unique data set with more than 1 million customers, we show that adding high-cardinality (HC) data improves predictive performance
• We contribute to the area of big data analytics by empirically demonstrating that having more data leads to better models

High-cardinality attributes are categorical attributes that contain a very large number of distinct values, such as family names, ZIP codes or bank account numbers. In a predictive modeling setting, such features can be highly informative: it may be useful to know that two people live in the same village or pay with the same bank account number. Despite this notable and intuitive advantage, high-cardinality attributes are rarely used in predictive modeling. The main reason is that including them with traditional transformation methods is either impossible because the data are anonymized (when the values are grouped semantically) or vastly increases the dimensionality of the data set (when dummy encoding is used), which makes it difficult or even impossible for most classification techniques to build prediction models. The main contributions of this work are threefold. (1) We introduce several possible transformation functions, drawn from different domains and contexts, that allow high-cardinality features to be included in predictive models. (2) Using a unique data set from a large energy company with more than 1 million customers, we show that adding such features significantly improves the predictive performance of the model. (3) We empirically demonstrate that having more data leads to better prediction models, an effect that is not observed for "traditional" data. As such, we also contribute to the area of big data analytics.
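The abstract contrasts dummy encoding, which adds one column per distinct value, with transformations that compress a high-cardinality attribute into a small number of numeric features. The Python sketch below illustrates that contrast on a hypothetical toy data set of ZIP codes and churn labels. The ratio encoder is one common target-based transformation (the per-category churn rate, often called target or mean encoding) with an assumed smoothing parameter `prior_weight`; it is an illustration under these assumptions, not necessarily one of the transformation functions proposed by the authors. All function names are hypothetical.

```python
from collections import defaultdict


def dummy_encode(values):
    """Dummy (one-hot) encoding: one binary column per distinct value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def fit_ratio_encoder(values, labels, prior_weight=1.0):
    """Map each category to its smoothed share of positive (churn) labels.

    prior_weight is a hypothetical smoothing parameter: it shrinks rare
    categories towards the global churn rate (0 gives the raw ratio).
    """
    global_rate = sum(labels) / len(labels)
    counts = defaultdict(int)
    positives = defaultdict(int)
    for value, label in zip(values, labels):
        counts[value] += 1
        positives[value] += label
    mapping = {
        value: (positives[value] + prior_weight * global_rate)
        / (counts[value] + prior_weight)
        for value in counts
    }
    return mapping, global_rate


def apply_ratio_encoder(values, mapping, default):
    """Replace each category by one numeric feature; unseen ones get default."""
    return [mapping.get(value, default) for value in values]


if __name__ == "__main__":
    # Hypothetical toy data: customer ZIP codes and churn flags (1 = churned).
    zip_codes = ["1000", "1000", "2000", "2000", "2000", "3000"]
    churned = [1, 0, 1, 1, 0, 0]

    # Dummy encoding: the column count grows with the number of distinct ZIPs.
    categories, dummies = dummy_encode(zip_codes)
    print(len(categories), "dummy columns")  # prints "3 dummy columns"

    # Ratio encoding: always a single numeric column, however many ZIPs exist.
    mapping, global_rate = fit_ratio_encoder(zip_codes, churned)
    print(apply_ratio_encoder(["2000", "3000", "9999"], mapping, global_rate))
    # prints [0.625, 0.25, 0.5]
```

In practice such ratios should be fitted on training data only (for example within cross-validation folds), because computing them on the rows being scored leaks the target into the feature.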

Publisher

Database: Elsevier - ScienceDirect
Journal: Decision Support Systems - Volume 72, April 2015, Pages 72–81

Authors

Julie Moeyersoms, David Martens
