Article code: 1110945
Journal code: 1488361
Publication year: 2015
English article: 9 pages (PDF)
Full-text version: free download
English title of the ISI article
Syntactic Ngrams as Keystructures Reflecting Typical Syntactic Patterns of Corpora in Finnish
Related topics
Social Sciences and Humanities > Arts and Humanities > Arts and Humanities (General)
English abstract

This article studies syntactic ngrams, i.e. small subtrees of dependency syntax analyses, as keystructures reflecting syntactic characteristics of corpora. While traditional keywords are the statistically over- or underrepresented words of a corpus and are often informative about the corpus topic and style, the unlexicalized syntactic ngrams applied in this study extend the level of description beyond individual words to sequences of syntactic elements. The article analyzes the utility of these sequences in corpus description and gives first results on the structural characteristics they reflect in the studied texts, which include Finnish literature, Internet forum discussions from the major Finnish social networking website, and Internet discussions following the news and editorials on the website of the major Finnish newspaper. The syntactic ngrams are produced with the freely available Finnish Dependency Parser and Ngram Builder, and the keystructures are analyzed with a linear classifier. The results suggest that syntactic ngrams illustrate both topical features, such as names and Internet URLs discussed in the corpora, and structural characteristics, such as subject-verb combinations, negations and informal sentence structures. They thus both generalize the information given by traditional keywords from individual words to concepts and provide new knowledge about typical constructions not reached by lexemes.
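
To make the notion of an unlexicalized syntactic ngram concrete, the following is a minimal sketch of the smallest case, a bigram consisting of one dependency arc with the word forms left out. Everything in it is an assumption made for illustration: the toy English sentence, the tag and relation names, the tuple layout and the function name `unlexicalized_bigrams` are hypothetical and are not the output format of the Finnish Dependency Parser or the Ngram Builder.

```python
from collections import Counter

# Hypothetical toy parse of "The dog did not bark" (illustrative tags only):
# (token id, word form, POS tag, head id, dependency relation); head 0 marks the root.
TOY_PARSE = [
    (1, "The", "DET", 2, "det"),
    (2, "dog", "NOUN", 5, "nsubj"),
    (3, "did", "AUX", 5, "aux"),
    (4, "not", "PART", 5, "neg"),
    (5, "bark", "VERB", 0, "root"),
]


def unlexicalized_bigrams(parse):
    """Return one (head POS, relation, dependent POS) triple per dependency arc."""
    pos_by_id = {tok_id: pos for tok_id, _form, pos, _head, _rel in parse}
    bigrams = []
    for _tok_id, _form, pos, head, rel in parse:
        if head == 0:
            continue  # the root token has no governor, so it forms no arc
        # The word form is dropped on purpose: only syntactic labels remain.
        bigrams.append((pos_by_id[head], rel, pos))
    return bigrams


# Frequency table of the structures found in this (tiny) corpus.
counts = Counter(unlexicalized_bigrams(TOY_PARSE))
```

Counted over whole corpora, such frequency tables could serve as features for a linear classifier, whose highest-weighted features would then be candidate keystructures. Larger ngrams would combine several connected arcs in the same way.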

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia - Social and Behavioral Sciences - Volume 198, 24 July 2015, Pages 233-241