Article code: 506268
Journal code: 864883
Publication year: 2016
English article: 12-page PDF
Full-text version: free download
English title of the ISI article
Understanding U.S. regional linguistic variation with Twitter data analysis
Keywords
Social media; Linguistics; Twitter; American dialect; Region; U.S. regions; Spatial data mining
Related topics
Engineering and Basic Sciences; Computer Engineering; Computer Science Software
Abstract


• This paper seeks to understand U.S. regional linguistic variation by analyzing big data of geo-tagged tweets.
• This paper applies a hierarchical regionalization method to discover hierarchical dialect regions.
• This paper is among the very few papers that use social media data to study nation-wide linguistic variations.
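
The idea of hierarchical regionalization mentioned in the highlights can be illustrated with a minimal sketch. It assumes a generic contiguity-constrained agglomerative clustering, which is not necessarily the method used in the paper: only spatially adjacent regions may merge, and the sequence of merges forms the hierarchy. The function names (`centroid`, `adjacent`, `regionalize`), the feature vectors, and the neighbor map are all hypothetical.

```python
from itertools import combinations


def centroid(region, values):
    """Mean feature vector of the units in a region."""
    dims = len(next(iter(values.values())))
    return [sum(values[u][d] for u in region) / len(region) for d in range(dims)]


def adjacent(r1, r2, neighbors):
    """True if any unit in r1 borders any unit in r2."""
    return any(v in r2 for u in r1 for v in neighbors[u])


def regionalize(values, neighbors, k):
    """Merge the most similar adjacent regions until k regions remain.

    values:    unit id -> feature vector (assumed here: a tuple of floats)
    neighbors: unit id -> set of bordering unit ids
    Returns the list of partitions, one per level of the hierarchy.
    """
    regions = [frozenset([u]) for u in values]
    history = [list(regions)]
    while len(regions) > k:
        best = None
        for r1, r2 in combinations(regions, 2):
            if not adjacent(r1, r2, neighbors):
                continue  # contiguity constraint: skip non-bordering pairs
            c1, c2 = centroid(r1, values), centroid(r2, values)
            dist = sum((a - b) ** 2 for a, b in zip(c1, c2))
            if best is None or dist < best[0]:
                best = (dist, r1, r2)
        if best is None:
            break  # no adjacent pairs left (disconnected map)
        _, r1, r2 = best
        regions = [r for r in regions if r not in (r1, r2)] + [r1 | r2]
        history.append(list(regions))
    return history
```

Each entry of the returned history is one level of the hierarchy, so coarser or finer region sets can be read off without rerunning the procedure.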

We analyze a big data set of geo-tagged tweets covering one year (Oct. 2013–Oct. 2014) to understand regional linguistic variation in the U.S. Prior work on regional linguistic variation usually took a long time to collect data and focused on either rural or urban areas. Geo-tagged Twitter data offers an unprecedented database with rich linguistic representation of fine spatiotemporal resolution and continuity. From the one-year Twitter corpus, we extract lexical characteristics for Twitter users by summarizing the frequencies of a set of lexical alternations that each user has used. We spatially aggregate and smooth each lexical characteristic to derive county-based linguistic variables, from which orthogonal dimensions are extracted using principal component analysis (PCA). Finally, a regionalization method is used to discover hierarchical dialect regions from the PCA components. The regionalization results reveal interesting regional linguistic variations in the U.S. The discovered regions not only confirm past research findings in the literature but also provide new insights and a more detailed understanding of very recent linguistic patterns in the U.S.
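
The first two steps described in the abstract (per-user lexical alternation frequencies, then aggregation to counties) might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the alternation pairs in `ALTERNATIONS`, the whitespace tokenization, the function names, and the plain per-county mean are all hypothetical, and the spatial smoothing step is omitted.

```python
from collections import Counter, defaultdict

# Hypothetical alternation sets: a concept and its two competing variants.
ALTERNATIONS = {
    "soft_drink": ("soda", "pop"),
    "seating": ("couch", "sofa"),
}


def user_profile(tweets):
    """Per alternation, the share of a user's uses that went to the first variant."""
    counts = Counter(word for t in tweets for word in t.lower().split())
    profile = {}
    for name, (first, second) in ALTERNATIONS.items():
        total = counts[first] + counts[second]
        if total:  # skip alternations this user never used
            profile[name] = counts[first] / total
    return profile


def county_variables(users):
    """Average user shares within each county.

    users: iterable of (county_id, list_of_tweet_strings), one entry per user.
    Returns county_id -> {alternation name -> mean share of the first variant}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    ns = defaultdict(lambda: defaultdict(int))
    for county, tweets in users:
        for name, share in user_profile(tweets).items():
            sums[county][name] += share
            ns[county][name] += 1
    return {c: {n: sums[c][n] / ns[c][n] for n in sums[c]} for c in sums}
```

Averaging per-user shares, rather than pooling raw word counts, keeps a few very active users from dominating a county's value. The resulting county-by-alternation table is the kind of input on which PCA could then be run.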

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers, Environment and Urban Systems - Volume 59, September 2016, Pages 244–255