Article ID Journal Published Year Pages File Type
529197 Information Fusion 2008 14 Pages PDF
Abstract

We describe an ensemble approach to learning from arbitrarily partitioned data. The partitioning comes from the distributed processing requirements of a large scale simulation. The volume of the data is such that classifiers can train only on data local to a given partition. As a result of the partition reflecting the needs of the simulation, the class statistics can vary from partition to partition. Some classes will likely be missing from some partitions. We combine a fast ensemble learning algorithm with probabilistic majority voting in order to learn an accurate classifier from such data. Results from simulations of an impactor bar crushing a storage canister and from facial feature recognition show that regions of interest are successfully identified in spite of the class imbalance in the individual training sets.

Related Topics
Physical Sciences and Engineering Computer Science Computer Vision and Pattern Recognition
Authors
, , , , ,