Article code: 434318
Journal code: 1441700
Publication year: 2013
English article: 16 pages
Full-text version: PDF
English title of the ISI article
Using heuristics to estimate an appropriate number of latent topics in source code analysis
Related subjects
Engineering and Basic Sciences › Computer Engineering › Computational Theory and Mathematics
English abstract


• A method to estimate the appropriate number of latent topics for source code.
• Using pairwise relationships between code fragments for conceptual analysis.
• Estimating good latent topic counts based on experimental results.
• Verification of previous estimates for latent topic counts in source code.
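The abstract does not define its pairwise measures, so the following is only a minimal sketch of how pairwise relationships between code fragments can be scored once a topic model exists. It assumes each fragment already has a document-topic distribution from some LDA run, uses one minus the Hellinger distance as the topic similarity (a swapped-in choice, not necessarily the paper's measure), and treats "locality" as two fragments sharing an enclosing class. The names `hellinger_similarity` and `locality_gap` and the example data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def hellinger_similarity(p, q):
    """Return 1 - Hellinger distance between two discrete distributions.

    1.0 means identical topic mixtures; 0.0 means disjoint support.
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding error


def locality_gap(theta, container):
    """Mean similarity of same-container pairs minus that of cross-container pairs.

    theta: fragment name -> topic distribution (floats summing to 1).
    container: fragment name -> enclosing unit (assumed here: class name).
    Requires at least one same-container pair and one cross-container pair.
    """
    same, cross = [], []
    for a, b in combinations(sorted(theta), 2):
        s = hellinger_similarity(theta[a], theta[b])
        (same if container[a] == container[b] else cross).append(s)
    return mean(same) - mean(cross)


if __name__ == "__main__":
    # Hypothetical topic mixtures for four methods in two classes.
    theta = {
        "Parser.parse": [0.8, 0.1, 0.1],
        "Parser.peek": [0.7, 0.2, 0.1],
        "Socket.send": [0.1, 0.1, 0.8],
        "Socket.recv": [0.1, 0.2, 0.7],
    }
    container = {name: name.split(".")[0] for name in theta}
    print(f"locality gap: {locality_gap(theta, container):.3f}")
```

Under these assumptions, a positive gap means the topic model places co-located fragments closer together than unrelated ones, which is one way to judge whether a given topic count yields a meaningful structure.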

Latent Dirichlet Allocation (LDA) is a data clustering algorithm that performs especially well for text documents. In natural-language applications it automatically finds groups of related words (called “latent topics”) and clusters the documents into sets that are about the same “topic”. LDA has also been applied to source code, where the documents are natural source code units such as methods or classes, and the words are the keywords, operators, and programmer-defined names in the code. Determining the topic count that most appropriately describes a set of source code documents remains an open problem. We address it empirically by constructing clusterings with different numbers of topics for a large number of software systems, and then use a pair of measures, based on source code locality and topic model similarity, to assess how well the topic structure identifies related source code units. Results suggest that the required topic count can be closely approximated from the number of code fragments in the system. We extend these results to recommend appropriate topic counts for arbitrary software systems based on an analysis of a set of open-source systems.
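The setup described above (methods as documents; keywords, operators, and identifiers as words; one clustering per candidate topic count) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the regex tokenizer ignores comments, literals, and numbers; LDA is fitted with a textbook collapsed Gibbs sampler written from scratch rather than any particular library; and the method bodies, the names `tokenize` and `fit_lda`, and the hyperparameters `alpha=0.1`, `beta=0.01` are hypothetical choices. The paper's criterion for picking the best count is not reproduced here.

```python
import random
import re

# Keywords and identifiers, then multi-character operators, then single-character ones.
TOKEN = re.compile(r"[A-Za-z_]\w*|==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/%=<>!&|^~?:]")


def tokenize(source):
    """Split a code fragment into keyword/identifier and operator tokens."""
    return TOKEN.findall(source)


def fit_lda(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return one topic distribution per document."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    v = len(vocab)
    ndk = [[0] * k for _ in docs]          # topic counts per document
    nkw = [[0] * v for _ in range(k)]      # word counts per topic
    nk = [0] * k                           # total tokens per topic
    z = []                                 # topic assignment of every token
    for d, doc in enumerate(docs):         # random initial assignments
        zd = []
        for w in doc:
            t = rng.randrange(k)
            zd.append(t)
            ndk[d][t] += 1
            nkw[t][vocab[w]] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, t = vocab[w], z[d][i]
                ndk[d][t] -= 1             # remove the token's current assignment
                nkw[t][wi] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t                # add the token back under its resampled topic
                ndk[d][t] += 1
                nkw[t][wi] += 1
                nk[t] += 1
    return [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
            for d, doc in enumerate(docs)]


if __name__ == "__main__":
    # Hypothetical method bodies standing in for a real system's code units.
    methods = {
        "Stack.push": "void push(int x) { items[size++] = x; }",
        "Stack.pop": "int pop() { return items[--size]; }",
        "Net.send": "void send(Socket s, byte[] b) { s.write(b); }",
        "Net.recv": "int recv(Socket s, byte[] b) { return s.read(b); }",
    }
    docs = [tokenize(body) for body in methods.values()]
    for k in (2, 3, 4):  # one clustering per candidate topic count
        theta = fit_lda(docs, k)
        dominant = {name: max(range(k), key=row.__getitem__)
                    for name, row in zip(methods, theta)}
        print(k, dominant)
```

Each value of `k` yields a separate clustering; a measure such as topic similarity between co-located fragments could then be computed for every run to compare candidate topic counts.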

Publisher
Database: Elsevier - ScienceDirect
Journal: Science of Computer Programming - Volume 78, Issue 9, 1 September 2013, Pages 1663-1678