Genetic algorithm for text clustering based on latent semantic indexing

Article code	Journal code	Publication year	English article	Full-text version
469606	698334	2009	7-page PDF	Free download

Keywords

Genetic algorithm; Text clustering; Latent semantic indexing

Related topics

Engineering and Basic Sciences; Computer Engineering; Computer Science (General)

Abstract

In this paper, we develop a genetic algorithm method based on a latent semantic model (GAL) for text clustering. The main difficulty in applying genetic algorithms (GAs) to document clustering is that the feature space typical of textual data has thousands or even tens of thousands of dimensions. This is because the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary represents one dimension. Latent semantic indexing (LSI) is a successful technology in information retrieval that attempts to explore the latent semantics implied by a query or a document by representing them in a dimension-reduced space. LSI also takes into account the effects of synonymy and polysemy, thereby constructing a semantic structure in textual data. GAs are search techniques that can efficiently evolve the optimal solution in the reduced space. We propose a variable-string-length genetic algorithm that automatically evolves the proper number of clusters while providing a near-optimal clustering of the data set. Used in conjunction with the reduced latent semantic structure, the GA improves clustering efficiency and accuracy. The superiority of the GAL approach over a conventional GA applied in the VSM is demonstrated by good clustering results on Reuters documents.
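
The pipeline the abstract describes (term-document matrix, LSI reduction, then a variable-length GA over cluster centroids) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the fitness function, crossover, or mutation operators, so `fitness`, its per-cluster penalty of 0.5, the cut-and-splice crossover, and the Gaussian mutation are all assumptions, as are the function names and the toy 6-term by 6-document matrix.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    # term_doc has shape (n_terms, n_docs); keep the k largest singular triplets.
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each row of the result is one document's coordinates in the reduced space.
    return vt[:k, :].T * s[:k]

def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    dist = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

def fitness(docs, centroids):
    """Assumed fitness: low within-cluster scatter, small penalty per cluster."""
    labels = assign(docs, centroids)
    scatter = np.sum((docs - centroids[labels]) ** 2)
    return -(scatter + 0.5 * len(centroids))

def evolve(docs, rng, pop_size=20, generations=30, k_max=4):
    """Variable-length GA: a chromosome is an array of centroids whose row
    count (the number of clusters) is allowed to change during evolution."""
    def random_chromosome():
        k = rng.integers(2, k_max + 1)
        return docs[rng.choice(len(docs), size=k, replace=False)].copy()

    pop = [random_chromosome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(docs, c), reverse=True)
        survivors = pop[: pop_size // 2]  # elitist selection (assumption)
        children = []
        while len(survivors) + len(children) < pop_size:
            a = survivors[rng.integers(len(survivors))]
            b = survivors[rng.integers(len(survivors))]
            # Cut-and-splice crossover: the child's length can differ from both parents.
            cut_a = rng.integers(1, len(a) + 1)
            cut_b = rng.integers(0, len(b) + 1)
            child = np.vstack([a[:cut_a], b[cut_b:]])[:k_max]
            # Mutation: small Gaussian perturbation of every centroid.
            children.append(child + rng.normal(0.0, 0.05, child.shape))
        pop = survivors + children
    return max(pop, key=lambda c: fitness(docs, c))

# Toy term-document matrix (hypothetical counts): terms 0-2 occur only in
# documents 0-2, terms 3-5 only in documents 3-5.
term_doc = np.array([
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

docs = lsi_reduce(term_doc, k=2)            # 6 documents in 2 latent dimensions
best = evolve(docs, np.random.default_rng(0))
labels = assign(docs, best)
print(len(best), labels)
```

The GA here searches over 2-dimensional document vectors rather than the original 6-dimensional term space, which is the efficiency argument the abstract makes at a much larger scale. The number of clusters is not fixed in advance: it is whatever row count the fittest chromosome ends up with.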

Publisher

Database: Elsevier - ScienceDirect
Journal: Computers & Mathematics with Applications - Volume 57, Issues 11–12, June 2009, Pages 1901–1907

Authors

Wei Song, Soon Cheol Park
