Article ID: 935616
Journal: Lingua
Published Year: 2015
Pages: 20
File Type: PDF
Abstract

• We argue for Word Space Models as a tool for lexicological research on big corpora.
• We give a non-technical introduction to word space modelling on the token-level.
• We integrate Word Space Models with visual analytics for in-depth human analyses.
• We present a case study of semantic structure finding for Dutch polysemous nouns.
• We set out a research programme to tailor the models further to lexicologists’ needs.

This paper demonstrates how token-level Word Space Models, a distributional semantic technique originally developed in statistical natural language processing, can be developed into a heuristic tool that supports lexicological and lexicographical analyses of large amounts of corpus data. The paper provides a non-technical introduction to the statistical methods. Using a case study of the Dutch polysemous noun ‘monitor’, it illustrates how token-level Word Space Models, combined with visualisation techniques, allow human analysts to identify semantic patterns in an unstructured set of attestations. Additionally, we show how the interactive features of the visualisation make it possible to explore the effect of different contextual factors on the distributional model.
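
To make the idea of a token-level model concrete, the following is a minimal sketch and not the authors' implementation. It assumes a deliberately simplified setup: each occurrence (token) of the target word is represented by a bag of the words in a fixed window around it, and two occurrences are compared by cosine similarity. The function names, the window size, the stop word list and the English toy sentences are all invented for illustration; a realistic model would typically weight context words by association strength and work on a large corpus.

```python
from collections import Counter
from math import sqrt

# Hypothetical stop word list, only for this toy example.
STOP_WORDS = {"the", "a", "with", "at", "has"}

def token_vectors(sentences, target, window=4, stop_words=STOP_WORDS):
    """Return one bag-of-words context vector per occurrence of `target`."""
    vectors = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            # Words within `window` positions to the left and right.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.append(Counter(w for w in context if w not in stop_words))
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented attestations: two "screen" uses and one "supervisor" use.
sentences = [
    "the new monitor has a sharp screen",
    "a monitor with a large screen",
    "the monitor supervised the children at camp",
]
vecs = token_vectors(sentences, "monitor")

# Pairwise similarities between the three occurrences.
sim_screen_screen = cosine(vecs[0], vecs[1])
sim_screen_supervisor = cosine(vecs[0], vecs[2])
print(sim_screen_screen, sim_screen_supervisor)
```

In this toy setting the two "screen" attestations share a context word and so come out more similar to each other than to the "supervisor" attestation. A matrix of such pairwise similarities is the kind of input that a visualisation could project into two dimensions for a human analyst to inspect.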

Related Topics
Social Sciences and Humanities > Arts and Humanities > Language and Linguistics