Article code: 4965158
Journal code: 1448226
Publication year: 2017
English article: 12 pages, PDF
Full-text version: free download
English title of the ISI article
A quantitative analysis of global gazetteers: Patterns of coverage for common feature types
Related subjects
Engineering and Basic Sciences > Computer Engineering > Computer Science Software
English abstract


- Global coverage maps presented for common feature types in two gazetteers.
- Wide discrepancies in coverage between two global gazetteers illustrated.
- The country unit is the main driver of variation in global gazetteer coverage.
- Natural feature type coverage is unbalanced, concentrated in few countries.
- Variation in gazetteer coverage has important implications for many applications.

Gazetteers are important tools used in a wide variety of workflows that depend on linking natural language text to geographical space. The spatial properties of these data sources, such as coverage, balance, and completeness, affect the performance of common tasks such as geoparsing and geocoding. However, little attention has focused on how these properties vary in global gazetteers, particularly across country boundaries and according to feature types. In this paper, we present a detailed investigation of the spatial properties of two open gazetteers with worldwide coverage: GeoNames, and the Getty Thesaurus of Geographic Names (TGN). Using point density maps, correlations, and linear regressions, we analyze the global spatial coverage of each data source for the full set of features and for top feature types: populated places, streams, mountains, and hills. Results show wide discrepancies in coverage between the two datasets, sharp changes in feature type coverage across country borders, and idiosyncratic patterns dominated by a few countries for the more sparsely covered natural features. As more and more systems rely on recognizing and grounding named places, these patterns can influence the analysis of growing amounts of online text content and reinforce or amplify existing inequalities.
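
The abstract mentions comparing the two gazetteers through point densities and correlations. As a rough illustration only, and not the authors' actual procedure, the sketch below bins point coordinates into a latitude/longitude grid and computes a Pearson correlation between the per-cell counts of two sources. The function names, the 1-degree cell size, and the toy coordinates are all assumptions made for this example; note also that equal-degree cells are not equal-area, which a real analysis would need to address.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=1.0):
    """Count (lat, lon) points per grid cell of cell_deg degrees (hypothetical helper)."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def coverage_correlation(points_a, points_b, cell_deg=1.0):
    """Correlate per-cell counts over every cell occupied in either source."""
    counts_a = grid_counts(points_a, cell_deg)
    counts_b = grid_counts(points_b, cell_deg)
    cells = sorted(set(counts_a) | set(counts_b))
    xs = [counts_a.get(cell, 0) for cell in cells]
    ys = [counts_b.get(cell, 0) for cell in cells]
    return pearson(xs, ys)


# Toy coordinates, invented for illustration; not real gazetteer records.
gazetteer_a = [(10.2, 20.1), (10.7, 20.9), (10.5, 20.5), (11.3, 20.4), (45.1, 7.6)]
gazetteer_b = [(10.4, 20.3), (11.8, 20.2), (11.1, 20.6), (45.9, 7.2)]

r = coverage_correlation(gazetteer_a, gazetteer_b)
print(f"cell-level correlation: {r:.2f}")
```

Only cells occupied in at least one source enter the comparison here; including the many empty cells of a global grid would change the coefficient, so that choice should be made deliberately.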

Publisher
Database: Elsevier - ScienceDirect
Journal: Computers, Environment and Urban Systems - Volume 64, July 2017, Pages 309-320