Free article download: Mining language variation using word using and collocation characteristics

Article code	Journal code	Publication year	English article	Full-text version
382277	660754	2014	15 pages, PDF	Free download

English title of the ISI article

Mining language variation using word using and collocation characteristics

Persian translation of the title

تنوع زیستی معادن با استفاده از کلمه استفاده و ویژگی های همپوشانی

Keywords

Language variation, text mining, Frequency Rank Ratio, Overall Intimacy

Related topics

Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence

English abstract

• Two metrics are proposed for extracting language variation characteristics.
• Two textual features are derived by employing the two proposed textual metrics.
• Using our features, language variation cues can be visualized.
• Our method can display language changes when semantics and syntax are unknown.
• Both entropy-based analysis and simulations prove the feasibility of our algorithm.

Two textual metrics, “Frequency Rank” (FR) and “Intimacy”, are proposed in this paper to measure word usage and collocation characteristics, which are two important aspects of text style. FR, derived from the local index numbers of the terms in a sentence ordered by the global frequency of terms, provides single-term-level information. Intimacy models the relationship between a word and the others, i.e., how close a term is to the other terms in the same sentence. Two textual features for capturing language variation, “Frequency Rank Ratio” (FRR) and “Overall Intimacy” (OI), are derived from the two proposed metrics. Using the derived features, language variation among documents can be visualized in a text space. Three corpora consisting of documents of diverse topics, genres, regions, and dates of writing are designed and collected to evaluate the proposed algorithms. Extensive simulations are conducted to verify the feasibility and performance of our implementation. Both theoretical analyses based on entropy and the simulations demonstrate the feasibility of our method. We also show that the proposed algorithm can be used to visualize the closeness of several Western languages. Variation of modern English over time is also recognizable with our analysis method. Finally, our method is compared to conventional text classification implementations, and the comparative results indicate that it outperforms them.
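The abstract gives only a verbal description of FR, so the following Python sketch is one possible reading of it, not the authors' implementation. It assumes sentences are already tokenized, that "global frequency" means a raw term count over the whole corpus, that the "local index" is a term's 1-based position once the distinct terms of a sentence are sorted by descending global frequency, and that ties are broken alphabetically. The function names `global_frequencies` and `frequency_ranks` are hypothetical. Intimacy, FRR, and OI are not covered because the abstract does not define them precisely enough.

```python
from collections import Counter


def global_frequencies(sentences):
    """Count how often each term occurs across the whole corpus."""
    return Counter(term for sentence in sentences for term in sentence)


def frequency_ranks(sentence, freq):
    """Map each distinct term of one sentence to its local index (1-based).

    The terms of the sentence are ordered by descending global frequency.
    Alphabetical tie-breaking is an assumption of this sketch.
    """
    ordered = sorted(set(sentence), key=lambda t: (-freq[t], t))
    return {term: i for i, term in enumerate(ordered, start=1)}


# Toy corpus, invented purely for illustration.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat"],
    ["a", "cat", "and", "a", "dog"],
]

freq = global_frequencies(corpus)
ranks = frequency_ranks(corpus[1], freq)
print(ranks)  # {'the': 1, 'dog': 2, 'sat': 3}
```

In this toy corpus "the" occurs three times overall, while "dog" and "sat" occur twice each, so "the" gets local index 1 and the tie between "dog" and "sat" is resolved alphabetically.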

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 41, Issue 17, 1 December 2014, Pages 7805–7819

Authors

Peng Tang, Tommy W.S. Chow
