Air quality data clustering using EPLS method

Article ID	Journal	Published Year	Pages	File Type
4969131	Information Fusion	2017	28 Pages	PDF

Abstract

Air quality data can now be easily collected by sensors around the world, and analysis of these data is very useful for societal decision-making. Among the five major air pollutants used to calculate the AQI (Air Quality Index), PM2.5 is of the greatest public concern. PM2.5 data are also cross-influenced by other factors in the air, and they are non-linear and non-stationary, with high noise levels and outliers. Because of these inherent characteristics, traditional methods do not cluster PM2.5 data well. In this paper, a novel model-based feature extraction method, EPLS, is proposed to address this issue. The EPLS model includes: (1) Mode Decomposition, in which the EEMD algorithm is applied to the aggregation dataset; (2) Dimension Reduction, which is carried out to obtain a smaller set of more significant vectors; and (3) Least Squares Projection, in which all testing data are projected onto the obtained vectors. A synthetic dataset and an air quality dataset are evaluated with different clustering methods and similarity measures. Experimental results demonstrate that EPLS is efficient at clustering air quality data with high noise levels and outliers, and that it can be adapted to various clustering techniques and distance measures.
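The three EPLS steps can be illustrated with a minimal NumPy sketch, followed by k-means on the extracted features. This is not the authors' implementation; it rests on several assumptions. The "aggregation dataset" is treated as a set of training series whose EEMD components (IMFs) are pooled as candidate vectors. Dimension reduction is done with PCA, as the keywords suggest. EMD is simplified to linearly interpolated envelopes and a fixed number of sifting iterations. All function names (`emd`, `eemd`, `fit_basis`, `project`, `kmeans`) and parameter defaults are hypothetical.

```python
import numpy as np


def _envelope_mean(x):
    """Mean of upper/lower envelopes; None if x has too few interior extrema.
    Simplification (assumption): linear interpolation instead of cubic splines."""
    t = np.arange(x.size)
    d = np.diff(x)
    right, left = np.hstack([d, 0]), np.hstack([0, d])
    maxima = np.where((left > 0) & (right < 0))[0]
    minima = np.where((left < 0) & (right > 0))[0]
    if maxima.size < 2 or minima.size < 2:
        return None
    # Extend each envelope flat to both endpoints so it covers the whole series.
    upper = np.interp(t, np.r_[0, maxima, x.size - 1],
                      np.r_[x[maxima[0]], x[maxima], x[maxima[-1]]])
    lower = np.interp(t, np.r_[0, minima, x.size - 1],
                      np.r_[x[minima[0]], x[minima], x[minima[-1]]])
    return (upper + lower) / 2


def emd(x, max_imfs=4, sift_iters=10):
    """Simplified EMD: fixed sifting count; IMFs zero-padded to max_imfs rows."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((max_imfs, residue.size))
    for i in range(max_imfs):
        if _envelope_mean(residue) is None:  # residue is (near-)monotonic: stop
            break
        h = residue.copy()
        for _ in range(sift_iters):
            m = _envelope_mean(h)
            if m is None:
                break
            h = h - m
        imfs[i] = h
        residue = residue - h
    return imfs, residue  # x == imfs.sum(axis=0) + residue


def eemd(x, trials=20, noise_std=0.2, max_imfs=4, seed=0):
    """Ensemble EMD: average the IMFs of many noise-perturbed copies of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    scale = noise_std * x.std()
    acc = np.zeros((max_imfs, x.size))
    for _ in range(trials):
        imfs, _ = emd(x + rng.normal(0.0, scale, x.size), max_imfs)
        acc += imfs
    return acc / trials


def fit_basis(train, k=3, **eemd_kw):
    """Steps 1-2 (assumed reading): pool EEMD IMFs of all training series,
    then keep the top-k principal directions (PCA via SVD)."""
    cands = np.vstack([eemd(s, **eemd_kw) for s in train])
    cands = cands - cands.mean(axis=0)
    _, _, vt = np.linalg.svd(cands, full_matrices=False)
    return vt[:k]  # shape (k, T), orthonormal rows


def project(test, basis):
    """Step 3: least-squares coefficients c minimising ||basis.T @ c - x||."""
    coef, *_ = np.linalg.lstsq(basis.T, np.atleast_2d(test).T, rcond=None)
    return coef.T  # shape (n_test, k): one feature vector per test series


def kmeans(feats, n_clusters, iters=50, seed=0):
    """Plain Lloyd's k-means on the projected features (any clusterer could be used)."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), n_clusters, replace=False)]
    for _ in range(iters):
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = feats[labels == j].mean(axis=0)
    return labels


if __name__ == "__main__":
    t = np.arange(200)
    rng = np.random.default_rng(1)
    # Hypothetical synthetic data: two groups of noisy sinusoids with different periods.
    series = np.array([np.sin(2 * np.pi * t / p) + 0.3 * rng.normal(size=t.size)
                       for p in (10, 10, 10, 40, 40, 40)])
    basis = fit_basis(series[::2], k=3)  # subset used as the "aggregation" set
    print(kmeans(project(series, basis), n_clusters=2))
```

Because the projection coefficients are ordinary feature vectors, the final `kmeans` call can be swapped for another clustering method or distance measure, which matches the abstract's claim that EPLS adapts to various clustering techniques.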

Keywords

PM2.5; EEMD; PCA; Clustering; Air quality