Selecting level-specific specialized vocabulary using statistical measures

Journal	Published Year	Pages
System	2006	15 pages

Abstract

To find an easy-to-use, automated tool for identifying technical vocabulary suited to learners at various levels, we applied nine statistical measures to the 7.3-million-word ‘commerce and finance’ component of the British National Corpus. The resulting word lists showed that each measure extracted specialized vocabulary at a different level, as gauged by word length, vocabulary level, US native-speaker grade level, and coverage of Japanese school textbook vocabulary, and that the measures therefore produced level-specific word lists. Beginning-level basic business words were identified using the cosine measure and the complementary similarity measure; intermediate-level business words were extracted using log-likelihood, the chi-square test, and the chi-square test with Yates’s correction; and advanced-level business word lists were created using mutual information and McNemar’s test. We conclude that these statistical measures are effective tools for identifying multi-level specialized vocabulary for pedagogical purposes.
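Several of the measures named above score a word by comparing its frequency in a specialized corpus with its frequency in a reference corpus. The abstract gives neither the formulas nor the way the contingency table was built, so what follows is only a minimal sketch under stated assumptions: a hypothetical 2×2 table per word (target corpus vs. reference corpus, this word vs. all other tokens) and textbook forms of chi-square, Yates-corrected chi-square, log-likelihood (G²), pointwise mutual information, and cosine. All function names and the toy corpora are hypothetical. The complementary similarity measure and McNemar’s test are left out because their values depend on exactly how the cells are assigned, which the abstract does not specify.

```python
import math
from collections import Counter


def contingency(word, target, reference):
    """Hypothetical 2x2 table for `word` (an assumption, not the paper's layout).

    a: occurrences of `word` in the target (specialized) corpus
    b: all other tokens in the target corpus
    c: occurrences of `word` in the reference corpus
    d: all other tokens in the reference corpus
    """
    a = target[word]
    b = sum(target.values()) - a
    c = reference[word]
    d = sum(reference.values()) - c
    return a, b, c, d


def chi_square(a, b, c, d, yates=False):
    """Pearson chi-square for a 2x2 table, optionally with Yates's correction."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction, clamped at zero for very small tables
        diff = max(0.0, diff - n / 2)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denom if denom else 0.0


def log_likelihood(a, b, c, d):
    """Log-likelihood ratio G^2 = 2 * sum(O * ln(O / E)) over the four cells."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Zero cells contribute nothing (0 * ln 0 is taken as 0)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def mutual_information(a, b, c, d):
    """Pointwise mutual information (base 2) between the word and the target corpus."""
    if a == 0:
        return float("-inf")
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


def cosine(a, b, c, d):
    """Cosine (Ochiai) association between the word and the target corpus."""
    denom = math.sqrt((a + b) * (a + c))
    return a / denom if denom else 0.0


def is_overused(a, b, c, d):
    """True if the word's relative frequency is higher in the target corpus."""
    return a / (a + b) > c / (c + d)


# Toy corpora for illustration only (not BNC data)
target = Counter("the bank raised the interest rate on the loan".split())
reference = Counter("the cat sat on the mat and the dog sat on the rug".split())

measures = {
    "chi2": chi_square,
    "chi2_yates": lambda a, b, c, d: chi_square(a, b, c, d, yates=True),
    "log_likelihood": log_likelihood,
    "mutual_information": mutual_information,
    "cosine": cosine,
}

# The scores above are unsigned, so first keep only target-overused words
candidates = [w for w in target if is_overused(*contingency(w, target, reference))]

for name, score in measures.items():
    ranked = sorted(
        candidates,
        key=lambda w: score(*contingency(w, target, reference)),
        reverse=True,
    )
    print(name, ranked[:3])
```

Filtering for overused words before ranking is a design choice, not something the abstract states; with real data the toy Counters would be replaced by word-frequency lists from the specialized and reference corpora, and each measure would yield its own ranked list.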

Keywords

ESP; Vocabulary selection; Extraction; Statistical measures; Vocabulary; Multi-level corpus