Article ID: 529877
Journal: Pattern Recognition
Published Year: 2015
Pages: 12
File Type: PDF
Abstract

• A robust active learning method, called RDS, based on a priori data organization.
• RDS properly balances sample diversity and uncertainty for useful sample selection.
• It provides high classification accuracy for the automated diagnosis of parasites.
• Comparisons with different clustering, classification and other literature methods.
• RDS was evaluated by an experienced expert in parasitology using a realistic scenario.

We have developed an automated system for the diagnosis of intestinal parasites from optical microscopy images. The objects (species of parasites and impurities) segmented from these images form a large dataset. We are interested in the active learning problem of selecting a reasonably small number of objects to be labeled under an expert's supervision for use in training a pattern classifier. However, impurities are very numerous, form several clusters in the feature space, and can be quite similar to some species of parasites, which poses a significant challenge for active learning methods. We propose a technique that pre-organizes the data and then properly balances the selection of samples from all classes and of uncertain samples for training. Organizing the data early avoids reprocessing the large dataset at each learning iteration, so sample selection can halt once the desired number of samples per iteration is reached, which yields interactive response times. We validate our method by comparing it with state-of-the-art approaches, using a previously labeled dataset of almost 6000 objects. Moreover, we report results from experiments on a very realistic scenario, consisting of a dataset with over 140,000 unlabeled objects, with unbalanced classes, the absence of some classes, and the presence of a very large set of impurities.
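
The general idea of organizing the data once and then drawing a small batch per iteration can be illustrated with a minimal Python sketch. This is not the authors' RDS implementation: the abstract does not specify the clustering method, the classifier, or the uncertainty criterion, so everything here is an assumption made for illustration. The sketch uses plain k-means for the one-time organization, a 1-nearest-neighbour classifier as a stand-in, and treats a sample as uncertain when its predicted label differs from that of its cluster's most central sample (the "root"). The names `organize`, `predict`, and `select_batch` are hypothetical.

```python
import random
from itertools import zip_longest


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def organize(points, k, iters=10, seed=0):
    """One-time pre-organization (assumed here to be plain k-means).

    Returns one list of sample indices per cluster, sorted by distance
    to the cluster centroid, so the first index acts as the cluster root.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    dim = len(points[0])
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: dist2(p, centroids[j]))
            groups[nearest].append(i)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(
                    sum(points[i][d] for i in g) / len(g) for d in range(dim)
                )
    return [
        sorted(g, key=lambda i: dist2(points[i], centroids[j]))
        for j, g in enumerate(groups)
    ]


def predict(x, points, labels):
    """Stand-in classifier: 1-nearest neighbour over the labeled samples."""
    nearest = min(labels, key=lambda i: dist2(x, points[i]))
    return labels[nearest]


def select_batch(points, clusters, labels, batch_size):
    """Pick up to batch_size unlabeled samples for the expert to label.

    Clusters are visited in an interleaved order (diversity), and a
    non-root sample is picked only if the classifier disagrees with the
    prediction for its cluster root (assumed uncertainty criterion).
    The scan halts as soon as the batch is full, so the dataset is not
    reprocessed in full at every iteration.
    """
    picked = []
    for rank in zip_longest(*clusters):
        for j, i in enumerate(rank):
            if i is None or i in labels:
                continue
            root = clusters[j][0]
            uncertain = (
                not labels
                or i == root
                or predict(points[i], points, labels)
                != predict(points[root], points, labels)
            )
            if uncertain:
                picked.append(i)
                if len(picked) == batch_size:
                    return picked
    return picked
```

In an interactive loop, `organize` would be called once, and each iteration would call `select_batch`, ask the expert for the labels of the returned indices, and add them to the `labels` dictionary (index to class name) before the next call.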

Related Topics
Physical Sciences and Engineering > Computer Science > Computer Vision and Pattern Recognition