Article code: 416196
Journal code: 681296
Publication year: 2007
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
Statistics to measure correlation for data mining applications
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract

Correlation is usually used in the context of real-valued sequences but, in data mining, the values of fields may be of various types—real, nominal or ordinal. Techniques for measuring correlation between any two sequences of data are reviewed, regardless of their type. In particular, a new technique for measuring the correlation between real-valued data and nominal data is proposed. The technique relies on the definition of an assignment of the nominal values to real values and hence is called A-correlation (A for assignment). The proposed assignment is defined to be the most favourable of all such assignments and can be efficiently computed. Moreover, it is shown that the resulting correlation coefficient has a natural interpretation independent of the assignment. With nominal/nominal data, Cramer's V-statistic can be used or, alternatively, a new statistic based on matching. These can also be used in the ordinal/nominal case. With just ordinal data, correlations based on rank are appropriate and these can also be used in the ordinal/ordinal and the ordinal/real cases.
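
The statistics named in the abstract can be illustrated with a short sketch. It is not the paper's implementation. The function names and the sample data are hypothetical. For the real/nominal case it assumes that each nominal value is assigned the mean of the real values observed with it, which is a standard choice that maximises the Pearson correlation over all assignments; whether this matches the paper's A-correlation in every detail is an assumption. Cramér's V and a rank (Spearman) correlation are written in their textbook forms, and the matching-based statistic is not shown because the abstract does not define it.

```python
import math
from collections import Counter, defaultdict


def pearson(x, y):
    """Pearson correlation of two equal-length real sequences.
    Undefined (division by zero) if either sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def group_mean_assignment(labels, values):
    """Assumed assignment: map each nominal label to the mean of its values."""
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    return {label: sum(vs) / len(vs) for label, vs in groups.items()}


def assignment_correlation(labels, values):
    """Real/nominal case: Pearson correlation after the assignment above."""
    assign = group_mean_assignment(labels, values)
    return pearson([assign[label] for label in labels], values)


def cramers_v(a, b):
    """Nominal/nominal case: Cramér's V from the chi-squared statistic.
    Requires at least two distinct values in each sequence."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    chi2 = 0.0
    for i in count_a:
        for j in count_b:
            expected = count_a[i] * count_b[j] / n
            chi2 += (count_ab[(i, j)] - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    return math.sqrt(chi2 / (n * k))


def ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    result = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Ordinal case: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical sample data, for illustration only.
colour = ["red", "red", "blue", "blue", "green", "green"]
height = [1.0, 2.0, 4.0, 5.0, 7.0, 9.0]
print(assignment_correlation(colour, height))
print(cramers_v(colour, ["a", "a", "b", "b", "a", "b"]))
print(spearman(height, [3, 1, 4, 1, 5, 9]))
```

Under the group-mean assumption, the squared coefficient equals the between-group sum of squares divided by the total sum of squares. That ratio depends only on the data and not on the particular numbers assigned, which fits the abstract's remark about an interpretation independent of the assignment.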

Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis - Volume 51, Issue 8, 1 May 2007, Pages 3968–3982