Article ID Journal Published Year Pages File Type
403478 Knowledge-Based Systems 2015 12 Pages PDF
Abstract

•The “location difference of multiple distances” and a method LDMDBA are proposed.•LDMDBA has a time complexity of O(logdnlogn) and does not rely on tree structures.•Only LDMDBA can be efficiently applied to high dimensional data.•LDMDBA has a time complexity of (logdlogn) for predicting a new data point.•LDMDBA has very good stability and can be applied to large databases.

k-nearest neighbors (kNN) classifiers are commonly used in various applications due to their relative simplicity and the absence of necessary training. However, the time complexity of the basic algorithm is quadratic, which makes them inappropriate for large scale datasets. At the same time, the performance of most improved algorithms based on tree structures decreases rapidly with increase in dimensionality of dataset, and tree structures have different complexity in different datasets. In this paper, we introduce the concept of “location difference of multiple distances, and use it to measure the difference between different data points. In this way, location difference of multiple distances based nearest neighbors searching algorithm (LDMDBA) is proposed. LDMDBA has a time complexity of O(logdnlogn) and does not rely on a search tree. This makes LDMDBA the only kNN method that can be efficiently applied to high dimensional data and has very good stability on different datasets. In addition, most of the existing methods have a time complexity of O(n) to predict a data point outside the dataset. By contrast, LDMDBA has a time complexity of O(logdlogn) to predict a query point in datasets of different dimensions, and, therefore, can be applied in real systems and large scale databases. The effectiveness and efficiency of LDMDBA are demonstrated in experiments involving public and artificial datasets.

Related Topics
Physical Sciences and Engineering Computer Science Artificial Intelligence
Authors
, , , , ,