Article code: 416789
Journal code: 681399
Publication year: 2013
English article: 16 pages, PDF
Full-text version: Free download
English title of the ISI article
An empirical study of tests for uniformity in multidimensional data
Related topics
Engineering and Basic Sciences, Computer Engineering, Computational Theory and Mathematics
English abstract


• A test of uniformity determines if data is uniform or possesses underlying structure.
• Tests of uniformity should be the first step in any pattern or exploratory data analysis.
• We develop new tests of uniformity and compare them with many from the literature.
• One test outperforms others when regularity (minimum spacings) exists in the data.
• Our other test is the only option for testing on arbitrary supports and dimensions.

An important problem in high-dimensional data analysis is determining whether sample points are uniformly distributed (i.e., exhibit complete spatial randomness) over some compact support, or rather possess some underlying structure (e.g., clusters or other nonhomogeneities). We propose two new graph-theoretic tests of uniformity which utilize the minimum spanning tree and a snake (a short non-branching acyclic path connecting each data point). We compare the powers of statistics based on these graphs with other statistics from the literature on an array of non-uniform alternatives in a variety of supports. For data in a hypercube, we find that test statistics based on the minimum spanning tree have superior power when the data displays regularity (e.g., results from an inhibition process). For arbitrarily shaped or unknown supports, we use run length statistics of the sequence of segment lengths along the snake’s path to test uniformity. The snake is particularly useful because no knowledge or estimation of the support is required to compute the test statistic, it can be computed quickly for any dimension, and it shows what kinds of non-uniformities are present. These properties make the snake unique among multivariate tests of uniformity since others only function on specific and known supports, have computational difficulties in high dimension, or have inconsistent type I error rates.
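The idea of testing uniformity with a minimum spanning tree can be illustrated with a small Monte Carlo sketch. This is not the authors' procedure: the abstract does not say which MST statistic they use, so the total MST edge length serves here as a stand-in, and the support is assumed to be the unit hypercube. The function names `mst_edge_lengths` and `mst_uniformity_pvalue` and the parameters `n_sim` and `seed` are hypothetical. Under these assumptions, an unusually short tree suggests clustering and an unusually long one suggests regularity, so the p-value is two-sided.

```python
import math
import random


def mst_edge_lengths(points):
    """Return the n-1 edge lengths of a Euclidean minimum spanning tree.

    Uses Prim's algorithm on the complete graph, O(n^2) time.
    """
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # distance from each point to the current tree
    best[0] = 0.0
    lengths = []
    for step in range(n):
        # Attach the point that is closest to the tree built so far.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:  # the first point starts the tree and adds no edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def mst_uniformity_pvalue(points, n_sim=199, seed=0):
    """Two-sided Monte Carlo p-value for uniformity on the unit hypercube.

    Assumption: the statistic is the total MST length, compared against
    samples of the same size drawn uniformly from [0, 1]^d.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    observed = sum(mst_edge_lengths(points))
    sims = []
    for _ in range(n_sim):
        sample = [[rng.random() for _ in range(dim)] for _ in range(n)]
        sims.append(sum(mst_edge_lengths(sample)))
    lower = (1 + sum(s <= observed for s in sims)) / (n_sim + 1)
    upper = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
    return observed, min(1.0, 2 * min(lower, upper))


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Thirty points packed into a small square: a clustered alternative.
    cluster = [(0.45 + 0.1 * demo_rng.random(), 0.45 + 0.1 * demo_rng.random())
               for _ in range(30)]
    total, p_value = mst_uniformity_pvalue(cluster, n_sim=99)
    print(f"total MST length = {total:.3f}, p-value = {p_value:.3f}")
```

The simulation step is what ties this sketch to a known support: the reference samples must come from the same region as the data. That dependence is the limitation the abstract attributes to such tests, and the reason it proposes the snake for arbitrary or unknown supports.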

Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis - Volume 64, August 2013, Pages 253–268