Article code | Journal code | Publication year | English article | Full-text version |
---|---|---|---|---|
486706 | 703390 | 2012 | 9-page PDF | Free download |
The rise of Web 2.0 democratized data generation. Most of this data takes the form of text, ranging from reports published by news portals in formal language to comments on blogs and micro-blogging applications that rely heavily on informal language. Addressing this heterogeneity is an essential preprocessing step before the data can be used by tools that aim to infer accurate information from it. This work therefore presents HASCH (High Performance Automatic Spell CHEcker), whose objective is to correct spelling in Portuguese texts collected from the Web. Because it is designed to handle large volumes of data, HASCH is fully parallelized in shared memory. In our evaluation, HASCH was extremely effective at correcting very large texts from different Web sources, with an almost superlinear speedup.
Journal: Procedia Computer Science - Volume 9, 2012, Pages 403-411
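
The abstract describes correcting spelling in parallel over shared memory but gives no implementation details. The following is only a minimal sketch of that general idea, not HASCH's actual method: the toy lexicon, the function names (`correct_word`, `correct_line`, `correct_corpus`), and the use of `difflib` similarity matching are all assumptions made for illustration. Worker threads share one read-only lexicon and each corrects a different line of the input.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import get_close_matches

# Hypothetical toy lexicon (assumption); a real system would load a full
# Portuguese dictionary instead.
LEXICON = ["você", "também", "porque", "não", "está", "hoje", "muito", "bom"]
LEXICON_SET = frozenset(LEXICON)


def correct_word(word: str) -> str:
    """Return the word if known, else the closest lexicon entry, else the word unchanged."""
    lowered = word.lower()
    if lowered in LEXICON_SET:
        return word
    # Similarity-based lookup is an illustrative stand-in, not the paper's technique.
    matches = get_close_matches(lowered, LEXICON, n=1, cutoff=0.6)
    return matches[0] if matches else word


def correct_line(line: str) -> str:
    """Correct each whitespace-separated token of one line."""
    return " ".join(correct_word(w) for w in line.split())


def correct_corpus(lines, workers=4):
    """Correct lines concurrently; all threads read the same shared lexicon."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the input order of the lines.
        return list(pool.map(correct_line, lines))
```

For example, `correct_corpus(["muinto bom", "hoje"], workers=2)` corrects each line in its own task. In CPython, threads illustrate the shared-memory structure but will not speed up CPU-bound work like this; a native shared-memory implementation would be needed to approach the speedups the abstract reports.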