Article code: 433483
Journal code: 1441719
Publication year: 2011
English article: 14 pages, PDF
Full-text version: free download
English title of the ISI article
Applying a dynamic threshold to improve cluster detection of LSI
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
English abstract

Latent Semantic Indexing (LSI) is a standard approach for extracting and representing the meaning of words in a large set of documents. Recently it has been shown that it is also useful for identifying concerns in source code. The tree cutting strategy plays an important role in obtaining the clusters, which identify the concerns. In this contribution the authors compare two tree cutting strategies: the Dynamic Hybrid cut and the commonly used fixed height threshold. Two case studies have been performed on the source code of Philips Healthcare to compare the results using both approaches. While some of the settings are particular to the Philips case, the results show that applying a dynamic threshold, implemented by the Dynamic Hybrid cut, is an improvement over the fixed height threshold in the detection of clusters representing relevant concerns. This makes the approach as a whole more usable in practice.

Research highlights
► We examine two dendrogram cutting algorithms for Latent Semantic Indexing.
► We discuss the limitations of the most used cutting algorithm, the fixed height cut.
► We present an alternative, the Dynamic Hybrid cut, which cuts at flexible heights.
► We present the results from two case studies performed at Philips Healthcare.
► From these case studies we conclude that the Dynamic Hybrid cut performs better.
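
To make the limitation of a fixed height cut concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation and it does not implement the Dynamic Hybrid cut. The 1-D values in `points` are hypothetical stand-ins for LSI document coordinates, and the helper names `average_linkage` and `fixed_height_cut` are introduced here for illustration only. The toy data contains two tight groups that lie close together and one loose group, so a single cut height either shatters the loose group or merges the two tight ones.

```python
from itertools import combinations

def average_linkage(points):
    """Agglomerative clustering; returns merges as (height, members_a, members_b)."""
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def dist(a, b):
        # Average pairwise distance between the members of two clusters.
        total = sum(abs(points[i] - points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((dist(a, b), a, b))
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
    return merges

def fixed_height_cut(n, merges, height):
    """Cut the dendrogram at one height: keep only merges at or below it."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h, a, b in merges:
        if h <= height:
            parent[find(a[0])] = find(b[0])

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical toy data: two tight groups close together, one loose group.
points = [0, 1, 2, 10, 11, 12, 100, 120, 140]
merges = average_linkage(points)

low_cut = fixed_height_cut(len(points), merges, 5)    # loose group falls apart
high_cut = fixed_height_cut(len(points), merges, 35)  # tight groups are merged
print(low_cut)
print(high_cut)
```

With this data the low cut keeps the two tight groups apart but leaves the three loose points as singletons, while the high cut recovers the loose group only after the two tight groups have already been merged. No single height yields all three intended groups, which is the kind of situation a cut at flexible heights is meant to address.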

Publisher
Database: Elsevier - ScienceDirect
Journal: Science of Computer Programming - Volume 76, Issue 12, 1 December 2011, Pages 1261–1274
Authors
, ,