Article ID: 4969131
Journal: Information Fusion
Published Year: 2017
Pages: 28
File Type: PDF
Abstract
Air quality data can now be collected easily by sensors around the world, and analyzing these data is valuable for societal decision-making. Of the five major pollutants used to calculate the AQI (Air Quality Index), PM2.5 is the one of greatest public concern. PM2.5 measurements are also influenced by interactions with other factors in the air, and they are non-linear and non-stationary, with high noise levels and outliers. Because of these inherent characteristics, traditional methods do not cluster PM2.5 data well. To address this issue, this paper proposes a novel model-based feature extraction method, EPLS, which consists of three stages: (1) Mode Decomposition, in which the EEMD (Ensemble Empirical Mode Decomposition) algorithm is applied to the aggregation dataset; (2) Dimension Reduction, which is carried out to obtain a smaller, more significant set of vectors; (3) Least Squares Projection, in which all testing data are projected onto the obtained vectors. A synthetic dataset and an air quality dataset are evaluated with different clustering methods and similarity measures. Experimental results demonstrate that EPLS handles air quality clustering problems with high noise levels and outliers efficiently, and that it can be adapted to various clustering techniques and distance measures.
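The abstract does not give the details of each stage, so the following is only a minimal NumPy sketch of how stages (2) and (3) could fit together, under stated assumptions. EEMD itself is not reimplemented: the mode matrix from stage (1) is taken as given, and the demo substitutes synthetic sinusoids for it. SVD is used as a hypothetical stand-in for the unspecified dimension-reduction step, and the names `reduce_modes`, `project_least_squares` and `assign_to_centroids` are illustrative, not the authors' code.

```python
import numpy as np


def reduce_modes(modes: np.ndarray, k: int) -> np.ndarray:
    """Stage (2), assumed SVD variant: return the k leading right-singular
    vectors of the mode matrix (rows = modes, columns = time steps)."""
    # Rows of vt are orthonormal and ordered by decreasing singular value,
    # so the first k rows give the best rank-k description of the modes.
    _, _, vt = np.linalg.svd(modes, full_matrices=False)
    return vt[:k]


def project_least_squares(basis: np.ndarray, series: np.ndarray) -> np.ndarray:
    """Stage (3): for each series x (one per row), find coefficients c that
    minimise ||basis.T @ c - x||^2; returns one coefficient row per series."""
    # lstsq solves A @ C ~= B column by column, so the series are passed
    # as columns and the solution is transposed back to one row per series.
    coeffs, *_ = np.linalg.lstsq(basis.T, series.T, rcond=None)
    return coeffs.T


def assign_to_centroids(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Toy clustering step: label each feature row with its nearest centroid
    under Euclidean distance (any other distance could be swapped in)."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 200)
    # Hypothetical stand-in for stage (1): in EPLS these rows would be modes
    # produced by EEMD on the aggregation dataset; here they are sinusoids.
    modes = np.vstack([a * np.sin(2 * np.pi * f * t)
                       for a, f in [(3.0, 1), (2.0, 3), (1.0, 5), (0.5, 9)]])
    basis = reduce_modes(modes, k=2)

    # Two synthetic groups of noisy test series dominated by different modes.
    group_a = modes[0] + rng.normal(scale=0.3, size=(5, t.size))
    group_b = modes[1] + rng.normal(scale=0.3, size=(5, t.size))
    features = project_least_squares(basis, np.vstack([group_a, group_b]))

    # Centroids taken from the known groups purely for illustration.
    centroids = np.vstack([features[:5].mean(axis=0), features[5:].mean(axis=0)])
    print(features.shape, assign_to_centroids(features, centroids))
```

Because the projection turns every series into a fixed-length coefficient vector, the features can be passed to any clustering algorithm or distance measure; replacing the Euclidean norm in `assign_to_centroids` with another measure is the kind of substitution the abstract says EPLS accommodates.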