Article code: 4608798
Journal code: 1338382
Publication year: 2009
English article: 13 pages, PDF
Full-text version: free download
English title of the ISI article
When is ‘nearest neighbour’ meaningful: A converse theorem and implications
Related subjects
Engineering and Basic Sciences > Mathematics > Mathematical Analysis
English abstract

Beyer et al. gave a sufficient condition for the high dimensional phenomenon known as the concentration of distances. Their work has pinpointed serious problems due to nearest neighbours not being meaningful in high dimensions. Here we establish the converse of their result, in order to answer the question of when nearest neighbour is still meaningful in arbitrarily high dimensions. We then show, for a class of realistic data distributions having non-i.i.d. dimensions, namely the family of linear latent variable models, that the Euclidean distance will not concentrate as long as the number of ‘relevant’ dimensions grows no slower than the overall data dimensions. This condition is, of course, often not met in practice. After numerically validating our findings, we examine real data situations in two different areas (text-based document collections and gene expression arrays), which suggest that the presence or absence of distance concentration in high dimensional problems plays a role in making the data hard or easy to work with.
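The abstract refers to the concentration-of-distances phenomenon identified by Beyer et al. As an illustrative sketch (not code from the paper), the short Python snippet below estimates the relative contrast (D_max - D_min) / D_min between the farthest and nearest point from a random query among i.i.d. uniform data; the function name, sample sizes, and the uniform data model are assumptions chosen for illustration only.

# Minimal sketch of distance concentration: for i.i.d. dimensions, the
# relative contrast between the farthest and nearest neighbour of a query
# shrinks as the dimensionality grows.
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    """Relative contrast (D_max - D_min) / D_min of Euclidean distances
    from a random query to n_points i.i.d. uniform points in [0, 1]^dim."""
    data = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(data - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.3f}")

Under these assumptions, the printed contrast should fall by orders of magnitude as the dimension increases, which is the sense in which ‘nearest neighbour’ stops being meaningful for i.i.d. dimensions; the paper's result concerns when this shrinkage does not occur.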

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Complexity - Volume 25, Issue 4, August 2009, Pages 385–397
Authors