Article code: 1118870
Journal code: 1488464
Publication year: 2013
English article: 8 pages, PDF
English title of the ISI article
esTenTen, a Vast Web Corpus of Peninsular and American Spanish
Related topics
Social Sciences and Humanities > Arts and Humanities > Arts and Humanities (General)
English abstract

Everyone working on general language would like their corpus to be bigger, wider-coverage, cleaner, duplicate-free, and with richer metadata. As a response to that wish, Lexical Computing Ltd. has a programme to develop very large ‘TenTen’ web corpora. In this paper we introduce the Spanish corpus, esTenTen, of 8 billion words and 19 different national varieties of Spanish. We investigate the distance between the national varieties as represented in the corpus, and examine in detail the keywords of Peninsular Spanish vs. American Spanish, finding a wide range of linguistic, cultural and political contrasts.

Publisher
Database: Elsevier - ScienceDirect
Journal: Procedia - Social and Behavioral Sciences - Volume 95, 25 October 2013, Pages 12-19