Article ID: 4374762
Journal: Ecological Informatics
Published Year: 2016
Pages: 22 Pages
File Type: PDF
Abstract

• The paper provides a comparative study of various feature extraction methods applied to real-world data.
• We use 16 methods of dimensionality reduction and fractional distances.
• Fractional distances exhibit superior performance.
• Isomap, Landmark Isomap and Factor Analysis can be used to formulate universal mappings.

This paper proposes methods for handling the high-dimensional weather forecast data used to predict concentrations of PM10, PM2.5, SO2, NO, CO and O3. Predicting pollution normally requires historical data samples for a large number of points in time, in particular weather forecast data, actual weather data and pollution data. It also typically involves numerous features related to atmospheric conditions. Consequently, analysing such datasets to generate accurate forecasts becomes a very cumbersome task. The paper examines a variety of unsupervised dimensionality reduction methods aimed at obtaining a compact yet informative set of features. As an alternative, an approach using fractional distances for data analysis tasks is also considered. Both strategies were evaluated on real-world data obtained from the Institute of Meteorology and Water Management in Katowice (Poland), with the extended Air Pollution Forecast Model (e-APFM) used as the underlying prediction tool. Employing a fractional distance as the dissimilarity measure was found to give the best forecasting accuracy. Satisfactory results can also be obtained with Isomap, Landmark Isomap and Factor Analysis as dimensionality reduction techniques. These methods can also be used to formulate a universal mapping, ready to use for data gathered in different geographical areas.
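
To make the notion of a fractional distance concrete, the sketch below computes a Minkowski-type dissimilarity with an exponent 0 < p < 1 and uses it to pick the closest historical sample to a query. It is only an illustration under stated assumptions: the abstract does not give the exponent used or how the measure is wired into e-APFM, so the value p = 0.5, the function names and the toy feature vectors are all hypothetical. Note that for p < 1 the measure no longer satisfies the triangle inequality, so it is a dissimilarity rather than a true metric.

```python
from typing import Sequence


def fractional_distance(x: Sequence[float], y: Sequence[float], p: float = 0.5) -> float:
    """Minkowski-type dissimilarity with a fractional exponent 0 < p < 1.

    The default p = 0.5 is an assumption for illustration only.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # Sum the p-th powers of coordinate differences, then take the 1/p-th power.
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def nearest_neighbour(query: Sequence[float],
                      samples: Sequence[Sequence[float]],
                      p: float = 0.5) -> int:
    """Return the index of the sample closest to `query` under the fractional distance."""
    return min(range(len(samples)),
               key=lambda i: fractional_distance(query, samples[i], p))


# Made-up feature vectors standing in for historical forecast samples
# (hypothetical values, not data from the paper).
history = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [3.0, 0.0, 0.0]]
query = [0.9, 1.2, 1.0]
closest = nearest_neighbour(query, history)
```

With these toy values the query is matched to the second sample, `history[1]`. In a similarity-based forecasting setting, such a nearest-sample lookup would presumably be performed on the full high-dimensional feature vectors rather than on a reduced representation.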

Related Topics
Life Sciences > Agricultural and Biological Sciences > Ecology, Evolution, Behavior and Systematics