Using small random samples for the manual evaluation of statistical association measures

Article ID	Journal	Published Year	Pages	File Type
10369495	Computer Speech & Language	2005	17 Pages	PDF

Abstract

In this paper, we describe the empirical evaluation of statistical association measures for the extraction of lexical collocations from text corpora. We argue that the results of an evaluation experiment cannot easily be generalized to a different setting. Consequently, such experiments have to be carried out under conditions that are as similar as possible to the intended use of the measures. Finally, we show how an evaluation strategy based on random samples can reduce the amount of manual annotation work significantly, making it possible to perform many more evaluation experiments under specific conditions.
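
The sampling idea can be illustrated with a minimal sketch. It assumes that candidates are first ranked by an association measure, that a simple random sample is drawn from the resulting n-best list, and that only this sample is annotated by hand. The function names, the sample size, and the choice of a Wilson score interval for the precision estimate are assumptions made for illustration; the abstract does not specify them.

```python
import math
import random


def sample_candidates(n_best, sample_size, seed=0):
    """Draw a simple random sample (without replacement) from an n-best list.

    `n_best` is a hypothetical list of candidate pairs already ranked by
    some association measure; only the sampled items need manual annotation.
    """
    rng = random.Random(seed)
    return rng.sample(n_best, sample_size)


def estimate_precision(labels, z=1.96):
    """Estimate n-best precision from manually annotated sample labels.

    `labels` holds one boolean per sampled candidate (True = true collocation).
    Returns (point estimate, lower bound, upper bound), where the bounds come
    from a Wilson score interval (an assumption, not the paper's stated method).
    """
    n = len(labels)
    p = sum(labels) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half, center + half


if __name__ == "__main__":
    # Hypothetical n-best list of 1000 candidates; annotate only 100 of them.
    n_best = [f"pair_{i}" for i in range(1000)]
    sample = sample_candidates(n_best, 100)
    # Placeholder annotations standing in for a human annotator's judgments.
    labels = [True] * 30 + [False] * 70
    p, low, high = estimate_precision(labels)
    print(f"annotated {len(sample)} of {len(n_best)} candidates")
    print(f"estimated precision {p:.2f}, interval [{low:.2f}, {high:.2f}]")
```

In this sketch the annotation effort falls from 1000 candidates to 100, at the cost of an interval around the precision estimate rather than an exact value. Whether that interval is narrow enough to separate two measures depends on the sample size and on how far apart their true precisions are.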