Monitoring polysemy: Word space models as a tool for large-scale lexical semantic analysis

Article ID	Journal	Published Year	Pages	File Type
935616	Lingua	2015	20 Pages	PDF

Abstract

• We argue for Word Space Models as a tool for lexicological research on big corpora.
• We give a non-technical introduction to word space modelling at the token level.
• We integrate Word Space Models with visual analytics for in-depth human analyses.
• We present a case study of semantic structure finding for Dutch polysemous nouns.
• We set out a research programme to tailor the models further to lexicologists’ needs.

This paper demonstrates how token-level Word Space Models (a distributional semantic technique originally developed in statistical natural language processing) can be developed into a heuristic tool to support lexicological and lexicographical analyses of large amounts of corpus data. The paper provides a non-technical introduction to the statistical methods and, in a case study of the Dutch polysemous noun ‘monitor’, illustrates how token-level Word Space Models combined with visualisation techniques allow human analysts to identify semantic patterns in an unstructured set of attestations. Additionally, we show how the interactive features of the visualisation make it possible to explore the effect of different contextual factors on the distributional model.
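To make the idea of a token-level model concrete, the following is a minimal sketch and not the paper's actual procedure. It assumes a tiny invented English corpus, raw co-occurrence counts without any association weighting, and a second-order representation in which each attestation of the target noun is the sum of the type vectors of its context words. The names `CORPUS`, `STOPWORDS`, `type_vectors`, `token_vector` and `cosine` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Invented toy corpus (NOT data from the paper); "monitor" is the target noun.
CORPUS = [
    "the monitor shows a sharp image on the screen",
    "the screen of the monitor has a sharp image",
    "the monitor plays games with the children at camp",
    "children at camp like the monitor and the games",
]
TARGET = "monitor"
# Assumed stop list, kept tiny on purpose.
STOPWORDS = {"the", "a", "of", "on", "with", "at", "and", "has"}


def type_vectors(sentences):
    """Type level: for each word, count the content words sharing a sentence with it."""
    vectors = {}
    for sentence in sentences:
        words = [w for w in sentence.split() if w not in STOPWORDS]
        for i, word in enumerate(words):
            context = words[:i] + words[i + 1:]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def token_vector(sentence, types):
    """Token level: sum the type vectors of the context words of one attestation."""
    total = Counter()
    for word in sentence.split():
        if word != TARGET and word not in STOPWORDS:
            total.update(types[word])
    return total


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


types = type_vectors(CORPUS)
tokens = [token_vector(sentence, types) for sentence in CORPUS]

# Pairwise similarities between the four attestations of the target noun.
for i in range(len(tokens)):
    for j in range(i + 1, len(tokens)):
        print(i, j, round(cosine(tokens[i], tokens[j]), 3))
```

In this toy setting the two ‘screen’ attestations come out more similar to each other than to the two ‘camp’ attestations. A pairwise similarity matrix of this kind is what a visualisation could project into two dimensions for inspection by a human analyst.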

Keywords

Visual analytics; Statistical analysis; Lexical semantics