Article code | Journal code | Publication year | English article | Full-text version
8953552 | 1645948 | 2019 | 29 pages, PDF | Free download
English title of ISI article
A robust correlation analysis framework for imbalanced and dichotomous data with uncertainty
Keywords
Pearson product-moment correlation, imbalanced data, clearness index, dichotomous variable
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract
Correlation analysis is one of the fundamental mathematical tools for identifying dependence between classes. However, the accuracy of the analysis can be jeopardized by variance error in the data set. This paper provides a mathematical analysis of the impact of imbalanced data on Pearson Product Moment Correlation (PPMC) analysis. To alleviate this issue, a novel framework, the Robust Correlation Analysis Framework (RCAF), is proposed to improve the accuracy of correlation analysis. A review of the issues caused by imbalanced data and data uncertainty in machine learning is given. The proposed framework is tested through in-depth analysis of real-life solar irradiance and weather condition data from Johannesburg, South Africa. Additionally, correlation analysis is compared against prominent sampling techniques, namely the Synthetic Minority Over-Sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling. Finally, K-Means and Ward's agglomerative hierarchical clustering are performed to study the correlation results. Compared to traditional PPMC, RCAF can reduce the standard deviation of the correlation coefficient under imbalanced data by 32.5% to 93.02%.
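To make the abstract's concern concrete, the following minimal sketch simulates how class imbalance affects the Pearson correlation between a dichotomous variable and a continuous one. It is not the RCAF method, which this listing does not describe. The function names (`pearson_r`, `r_spread`), the sample size, the class shift, and the Gaussian noise model are all assumptions chosen for illustration.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def r_spread(minority_fraction, n=200, shift=1.0, trials=2000, seed=0):
    """Mean and standard deviation of r over repeated synthetic samples.

    Hypothetical setup: a 0/1 label with a fixed minority share, and a
    continuous value equal to unit Gaussian noise plus `shift` for label 1.
    """
    rng = np.random.default_rng(seed)
    n_min = max(2, int(round(minority_fraction * n)))
    labels = np.zeros(n)
    labels[:n_min] = 1.0  # the first n_min samples form the minority class
    rs = np.empty(trials)
    for t in range(trials):
        values = rng.normal(size=n) + shift * labels
        rs[t] = pearson_r(labels, values)
    return float(rs.mean()), float(rs.std(ddof=1))


# Same class separation in both cases; only the class balance differs.
balanced = r_spread(0.50)
imbalanced = r_spread(0.05)
print("balanced   (50/50): mean r = %.3f, std = %.3f" % balanced)
print("imbalanced (5/95):  mean r = %.3f, std = %.3f" % imbalanced)
```

Under these assumptions, the underlying difference between the two classes is identical in both runs, yet the imbalanced sample yields a smaller average coefficient with a larger spread across trials. This is the kind of instability the abstract attributes to imbalanced data.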
Publisher
Database: Elsevier - ScienceDirect
Journal: Information Sciences - Volume 470, January 2019, Pages 58-77