Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
6853933 | Data & Knowledge Engineering | 2018 | 14 | |
Abstract
We survey a range of machine learning algorithms on different types of program representations, including software metrics, sequences, and tree structures. The approaches are evaluated on the task of classifying 52,000 programs written in C into 104 target labels. The experiments show that tree-based classifiers achieve dramatically higher performance than metrics-based or sequence-based ones, and that the two proposed models, TBCNN + SVM and TBCNN + kNN, rank first and second, respectively. Pruning redundant AST branches not only substantially reduces execution time but also increases accuracy.
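The abstract does not spell out how the TBCNN + kNN combination is wired. One common reading is that a trained TBCNN serves as a feature extractor that maps each program's AST to a fixed-length vector, and a k-nearest-neighbour classifier then labels new programs from those vectors. The sketch below illustrates only that second stage, under stated assumptions: Euclidean distance, majority voting with ties broken by the closest neighbour, and hand-made 3-dimensional toy vectors standing in for TBCNN output. The function `knn_predict` and the labels `task_01`..`task_03` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train: List[Tuple[Vector, str]], query: Vector, k: int = 3) -> str:
    """Label `query` by majority vote among its k nearest training vectors.

    Assumption: ties in the vote go to the tied label whose member is
    closest to the query (the neighbour list is sorted by distance).
    """
    if not train:
        raise ValueError("training set is empty")
    # Stable sort by distance, then keep the k closest (vector, label) pairs.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    best = max(votes.values())
    # First label in distance order that reaches the top vote count.
    return next(label for _, label in neighbours if votes[label] == best)


# Hypothetical program vectors: in a TBCNN + kNN pipeline these would be
# produced by a trained TBCNN encoder; here they are made-up toy values.
train = [
    ([0.9, 0.1, 0.0], "task_01"),
    ([0.8, 0.2, 0.1], "task_01"),
    ([0.1, 0.9, 0.2], "task_02"),
    ([0.2, 0.8, 0.1], "task_02"),
    ([0.0, 0.1, 0.9], "task_03"),
]

print(knn_predict(train, [0.85, 0.15, 0.05], k=3))  # prints "task_01"
```

Keeping the encoder and the classifier separate means the same vectors can be fed to a different classifier; the TBCNN + SVM variant would follow the same pattern with an SVM in place of `knn_predict` (not shown here).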
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Anh Viet Phan, Phuong Ngoc Chau, Minh Le Nguyen, Lam Thu Bui