Article code: 382471
Journal code: 660763
Publication year: 2016
English article: 14-page PDF
Full-text version: free download
English title of the ISI article
Discretization of continuous attributes through low frequency numerical values and attribute interdependency
Persian translation of the title
تعریف ویژگی های پیوسته از طریق مقادیر عددی فرکانس پایین و ویژگی وابستگی متقابل
Keywords
Data discretization, data pre-processing, data cleansing, missing value imputation, corrupt data detection, data mining
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• A new discretization technique called LFD.
• Does not require any user input.
• Interval width, number, and frequency are determined automatically from the data.
• Minimizes information loss due to discretization by choosing low-frequency cut points.
• Categorical attributes are taken as reference points for discretization.
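
The low-frequency cut point idea in the highlights can be illustrated with a minimal sketch. It is not the published LFD procedure: the local-minimum rule, the function names `low_frequency_cut_points` and `discretize`, and the sample data are all assumptions made for illustration, and the sketch ignores the categorical reference attributes that LFD relies on.

```python
from bisect import bisect_right
from collections import Counter


def low_frequency_cut_points(values):
    """Return distinct values whose frequency is a strict local minimum.

    Assumption for illustration only: a value is a candidate cut point when
    it occurs less often than both of its neighbours in sorted order.
    """
    counts = Counter(values)
    distinct = sorted(counts)
    cuts = []
    # Only interior values are considered; the extremes cannot split the range.
    for i in range(1, len(distinct) - 1):
        here = counts[distinct[i]]
        if here < counts[distinct[i - 1]] and here < counts[distinct[i + 1]]:
            cuts.append(distinct[i])
    return cuts


def discretize(values, cuts):
    """Map each value to an interval index; a cut value opens the upper interval."""
    return [bisect_right(cuts, v) for v in values]


# Made-up sample data (hypothetical ages), not taken from the paper.
ages = [20, 20, 20, 21, 21, 22, 23, 23, 23, 23, 24, 24, 25, 26, 26, 26]
cuts = low_frequency_cut_points(ages)
print(cuts)                    # [22, 25]
print(discretize(ages, cuts))  # interval index per record
```

Cutting at rarely occurring values means few records sit right at an interval boundary, which is one plausible reading of why such cut points would lose less information.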

Discretization is the process of converting numerical values into categorical values. Many discretization techniques exist, but they have various limitations, such as requiring user input on the number of categories and the number of records in each category. We therefore propose a new discretization technique called low frequency discretizer (LFD) that does not require any user input. Some existing techniques also do not require user input, but they rely on assumptions, for example that the number of records in each interval is the same, or that the number of intervals equals the number of records in each interval. These assumptions are often difficult to justify. LFD does not require any such assumptions. In LFD the number of categories and the frequency of each category are not pre-defined but data driven. The other contributions of LFD are as follows. LFD uses low frequency values as cut points and thus reduces the information loss due to discretization. To discretize an attribute, it uses all categorical attributes and any numerical attribute that has already been categorized. It considers that the influence of an attribute on the discretization of another attribute depends on the strength of their relationship. We evaluate LFD by comparing it with six existing techniques on eight datasets for three types of evaluation, namely classification accuracy, imputation accuracy and noise detection accuracy. Our experimental results indicate a significant improvement based on sign test analysis.
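
The abstract reports significance via a sign test but gives no details of the setup. As a hedged sketch, the following computes a two-sided exact sign test p-value from win and loss counts with ties dropped. The two-sided form, the function name `sign_test_p_value`, and the example counts are assumptions made here, not figures from the paper.

```python
from math import comb


def sign_test_p_value(wins, losses):
    """Two-sided exact sign test p-value under H0: a win and a loss are equally likely.

    Ties are assumed to have been discarded before counting.
    """
    n = wins + losses
    if n == 0:
        return 1.0  # no informative comparisons
    k = min(wins, losses)
    # Probability of observing k or fewer of the rarer outcome under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration: 7 wins and 1 loss over 8 comparisons.
print(sign_test_p_value(7, 1))  # 0.0703125
```

Because the test uses only the direction of each difference, it makes no assumption about how the accuracy differences are distributed.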

Publisher
Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 45, 1 March 2016, Pages 410–423