Procedia - Social and Behavioral Sciences, 2015, 8 pages (Article ID 1110943)
Abstract

Corpus-based dialectology of less-resourced and functionally limited native languages is a developing field of linguistics. In this paper we discuss the challenges of annotating dialect corpora for the Turkic languages of Russia, using the Mishar dialect of the Tatar language as an example. We examine the peculiarities of grammatical variability in the Mishar dialect from the point of view of automatic annotation, and we describe the search functionality of the corpus. The proposed annotation methodology can be used to create multilingual integrated resources and parallel corpora of closely related languages.
