Adapting a classification rule to local and global shift when only unlabelled data are available

European Journal of Operational Research, 2015 (13 pages)

Abstract

• The model addresses binary classification of evolving populations.
• It bridges the gap between the observation of new features and their class labels.
• It helps to improve classification even when model re-estimation is impossible.
• The model addresses local and global drift.

For evolving populations, the training data and the test data need not follow the same distribution, so the performance of a prediction model tends to deteriorate over time and the model must eventually be re-estimated. However, in many applications, e.g., credit scoring, new labelled data are not available for re-estimation because of verification latency, i.e., label delay. Methods that enable a prediction model to adapt to distributional changes using only unlabelled data are therefore highly desirable.

A shift adaptation method for binary classification, based on mixture distributions, is presented here. The conditional feature distributions are determined at the time when labelled data are available, and the unconditional feature distribution is determined at the time when new unlabelled data become accessible. These mixture distributions provide information on the old and the new positions of subpopulations. A transition model then describes how the subpopulations of each class have drifted to form the new unconditional feature distribution.

Assuming that the conditional distributions are reorganised with a minimum of energy leads to a two-step estimation procedure. First, for a given class prior distribution, the transfer of probability mass is estimated such that the energy required to obtain the new unconditional distribution by a local transfer of the old conditional distributions is minimal. Since the optimal value of the resulting transportation problem measures the distance between the old and the new distributions, the new class prior distribution is found in a second step by solving the transportation problem for varying class prior distributions and selecting the prior for which the objective function is smallest. From the solution of the transportation problem and the component parameters of the unconditional feature distribution, the new conditional feature distributions can then be determined, which allows the classification rule to be adapted to the shift.

The performance of the proposed model is investigated using a large real-world dataset on default rates in Danish companies. The results show that the shift adaptation improves classification performance.
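The two-step procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions of our own: a single real-valued feature; squared distance between component means as the "energy" of moving mass; Gaussian components for the new unconditional mixture; and a new mixture that has already been fitted to the unlabelled data (e.g., by EM, not shown). All function names and the data layout are hypothetical. Under these assumptions the transportation problem needs no LP solver: on the real line, with a cost that is convex in the displacement, the monotone coupling (the north-west corner rule applied to locations sorted in ascending order) is optimal.

```python
import math

# Hypothetical data layout (an assumption, not the paper's notation):
#   old_cond[k] = [(weight, mean), ...]      old components of class k, weights sum to 1
#   new_uncond  = [(weight, mean, sd), ...]  Gaussian components fitted to unlabelled data


def monotone_transport(src, dst, tol=1e-12):
    """Optimal transport plan between two discrete distributions on the real line.

    src, dst: lists of (mass, location) with equal total mass; cost is (x - y)**2.
    For a cost convex in x - y, the monotone coupling (north-west corner rule on
    locations sorted ascending) is optimal. Returns (cost, {(i, j): mass}).
    """
    si = sorted(range(len(src)), key=lambda i: src[i][1])
    di = sorted(range(len(dst)), key=lambda j: dst[j][1])
    rem_s = [src[i][0] for i in si]
    rem_d = [dst[j][0] for j in di]
    a = b = 0
    cost, flows = 0.0, {}
    while a < len(si) and b < len(di):
        i, j = si[a], di[b]
        m = min(rem_s[a], rem_d[b])
        if m > 0:
            flows[(i, j)] = flows.get((i, j), 0.0) + m
            cost += m * (src[i][1] - dst[j][1]) ** 2
        rem_s[a] -= m
        rem_d[b] -= m
        if rem_s[a] <= tol:  # source atom fully shipped
            a += 1
        if rem_d[b] <= tol:  # destination atom fully supplied
            b += 1
    return cost, flows


def transport_for_prior(p, old_cond, new_uncond):
    """Step 1: minimal 'energy' for a fixed new prior p = P(class 1)."""
    priors = {0: 1.0 - p, 1: p}
    src, labels = [], []
    for k, comps in old_cond.items():
        for w, mu in comps:
            src.append((priors[k] * w, mu))  # old component mass under prior p
            labels.append(k)
    dst = [(v, mu) for v, mu, _sd in new_uncond]
    cost, flows = monotone_transport(src, dst)
    return cost, flows, labels


def estimate_prior(old_cond, new_uncond, grid=101):
    """Step 2: grid search over p, keeping the prior with the smallest cost."""
    best = None
    for g in range(grid):
        p = g / (grid - 1)
        cost, flows, labels = transport_for_prior(p, old_cond, new_uncond)
        if best is None or cost < best[0]:
            best = (cost, p, flows, labels)
    _cost, p, flows, labels = best
    return p, flows, labels


def new_conditionals(p, flows, labels, n_new):
    """Weights of each class's new conditional mixture over the new components."""
    priors = {0: 1.0 - p, 1: p}
    weights = {0: [0.0] * n_new, 1: [0.0] * n_new}
    for (i, j), m in flows.items():
        weights[labels[i]][j] += m  # mass of class labels[i] arriving at component j
    for k in (0, 1):
        if priors[k] > 0:
            weights[k] = [m / priors[k] for m in weights[k]]
    return weights


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_class1(x, p, weights, new_uncond):
    """Adapted rule: P(class 1 | x) under the new prior and new conditionals.

    The denominator is the new unconditional density: the plan's column sums
    equal the new component weights, so p*f1(x) + (1-p)*f0(x) reduces to it.
    """
    num = p * sum(w * normal_pdf(x, mu, sd)
                  for w, (_v, mu, sd) in zip(weights[1], new_uncond))
    den = sum(v * normal_pdf(x, mu, sd) for v, mu, sd in new_uncond)
    return num / den


if __name__ == "__main__":
    old_cond = {0: [(0.5, 0.0), (0.5, 1.0)], 1: [(1.0, 4.0)]}
    new_uncond = [(0.35, 0.5, 0.5), (0.35, 1.5, 0.5), (0.3, 4.5, 0.5)]
    p_hat, flows, labels = estimate_prior(old_cond, new_uncond)
    weights = new_conditionals(p_hat, flows, labels, len(new_uncond))
    print(round(p_hat, 3))  # 0.3
    print(weights[1])       # class 1 now sits entirely on the component at 4.5
```

In the toy example, the old class-0 components sit at 0 and 1 and the old class-1 component at 4, while the new unconditional mixture has weights 0.35, 0.35 and 0.3 at 0.5, 1.5 and 4.5. The grid search picks a prior of 0.3, because only then can every old component move by just 0.5; any other prior forces some mass across a much larger gap. With more than one feature the monotone shortcut no longer applies, and the transportation problem would have to be solved as a general linear program.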

Keywords

Dataset shift; Concept drift