Article code: 4608798
Journal code: 1338382
Publication year: 2009
English article: 13 pages, PDF
Full-text version: free download
English title of the ISI article
When is ‘nearest neighbour’ meaningful: A converse theorem and implications
Related subjects
Engineering and Basic Sciences > Mathematics > Mathematical Analysis
English abstract

Beyer et al. gave a sufficient condition for the high dimensional phenomenon known as the concentration of distances. Their work has pinpointed serious problems due to nearest neighbours not being meaningful in high dimensions. Here we establish the converse of their result, in order to answer the question of when nearest neighbour is still meaningful in arbitrarily high dimensions. We then show, for a class of realistic data distributions having non-i.i.d. dimensions, namely the family of linear latent variable models, that the Euclidean distance will not concentrate as long as the number of ‘relevant’ dimensions grows no slower than the overall data dimensions. This condition is, of course, often not met in practice. After numerically validating our findings, we examine real data situations in two different areas (text-based document collections and gene expression arrays), which suggest that the presence or absence of distance concentration in high dimensional problems plays a role in making the data hard or easy to work with.
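The abstract refers to the concentration-of-distances phenomenon identified by Beyer et al. As an illustrative sketch (not code from the paper), the short Python snippet below estimates the relative contrast (D_max - D_min) / D_min between the farthest and nearest point from a random query among i.i.d. uniform data; the function name, sample sizes, and the uniform data model are assumptions chosen for illustration only.

# Minimal sketch of distance concentration: for i.i.d. dimensions, the
# relative contrast between the farthest and nearest neighbour of a query
# shrinks as the dimensionality grows.
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n_points=1000):
    """Relative contrast (D_max - D_min) / D_min of Euclidean distances
    from a random query to n_points i.i.d. uniform points in [0, 1]^dim."""
    data = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(data - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.3f}")

Under these assumptions, the printed contrast should fall by orders of magnitude as the dimension increases, which is the sense in which ‘nearest neighbour’ stops being meaningful for i.i.d. dimensions; the paper's result concerns when this shrinkage does not occur.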

Publisher
Database: Elsevier - ScienceDirect
Journal: Journal of Complexity - Volume 25, Issue 4, August 2009, Pages 385–397
Authors