Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
535757 | Pattern Recognition Letters | 2013 | 5 Pages |
Automatic identification of source code authors has many applications in fields such as source code plagiarism detection and litigation. This paper presents a new source code author identification system based on an unsupervised feature learning technique. As a method of extracting features from high-dimensional data, unsupervised feature learning has achieved great success in fields such as character recognition and image classification. However, to our knowledge, it has not been applied to source code author identification. We therefore investigated an unsupervised feature learning technique called the sparse auto-encoder as a method of extracting features from source code files. Our system was evaluated on several datasets, and the results show that its performance is very close to that of state-of-the-art techniques in source code author identification.
► We designed and built a new source code author identification system. ► It is based on an unsupervised feature learning technique. ► Our system was evaluated on several datasets. ► Results show that our system outperforms some existing systems.
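The abstract does not say how source files are encoded, how large the network is, or which classifier consumes the learned features. As a rough illustration only, here is a minimal pure-Python sketch of a sparse auto-encoder used as a feature extractor. It assumes a hypothetical `featurize` encoding (normalized, hashed character-bigram frequencies), the common sigmoid auto-encoder with a KL-divergence sparsity penalty trained by batch gradient descent, and illustrative hyperparameters (`rho=0.05`, `beta=3.0`, `lam=1e-4`). None of these choices come from the paper, whose exact variant may differ. The hidden-layer activations returned by `encode` act as the feature vector for each file.

```python
import math
import random


def featurize(source: str, dims: int = 32) -> list[float]:
    """Hypothetical input encoding: normalized, hashed character-bigram counts."""
    counts = [0.0] * dims
    for a, b in zip(source, source[1:]):
        counts[(ord(a) * 31 + ord(b)) % dims] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SparseAutoencoder:
    """Illustrative sketch: one sigmoid hidden layer, KL sparsity penalty."""

    def __init__(self, n_in, n_hidden, rho=0.05, beta=3.0, lam=1e-4, seed=0):
        rng = random.Random(seed)
        r = math.sqrt(6.0 / (n_in + n_hidden + 1))
        self.W1 = [[rng.uniform(-r, r) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-r, r) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in
        self.rho, self.beta, self.lam = rho, beta, lam

    def encode(self, x):
        # Hidden activations: these are the learned features.
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def train_step(self, X, lr=0.5):
        """One batch gradient-descent step; returns the cost before the update."""
        m, n_in, n_hid = len(X), len(self.b2), len(self.b1)
        H = [self.encode(x) for x in X]
        Y = [self.decode(h) for h in H]
        # Average activation of each hidden unit, clamped away from 0 and 1.
        rho_hat = [min(max(sum(h[j] for h in H) / m, 1e-9), 1 - 1e-9) for j in range(n_hid)]
        rho, beta = self.rho, self.beta
        recon = sum(0.5 * sum((yk - xk) ** 2 for yk, xk in zip(y, x)) for y, x in zip(Y, X)) / m
        decay = 0.5 * self.lam * (sum(w * w for row in self.W1 for w in row)
                                  + sum(w * w for row in self.W2 for w in row))
        kl = sum(rho * math.log(rho / r) + (1 - rho) * math.log((1 - rho) / (1 - r))
                 for r in rho_hat)
        cost = recon + decay + beta * kl
        # Derivative of the sparsity penalty w.r.t. each unit's average activation.
        sparse = [beta * (-rho / r + (1 - rho) / (1 - r)) for r in rho_hat]
        gW1 = [[0.0] * n_in for _ in range(n_hid)]
        gW2 = [[0.0] * n_hid for _ in range(n_in)]
        gb1, gb2 = [0.0] * n_hid, [0.0] * n_in
        for x, h, y in zip(X, H, Y):
            d3 = [(yk - xk) * yk * (1 - yk) for yk, xk in zip(y, x)]
            d2 = [(sum(self.W2[k][j] * d3[k] for k in range(n_in)) + sparse[j]) * h[j] * (1 - h[j])
                  for j in range(n_hid)]
            for k in range(n_in):
                gb2[k] += d3[k]
                for j in range(n_hid):
                    gW2[k][j] += d3[k] * h[j]
            for j in range(n_hid):
                gb1[j] += d2[j]
                for i in range(n_in):
                    gW1[j][i] += d2[j] * x[i]
        # Average over the batch, add weight decay, and take the gradient step.
        for j in range(n_hid):
            self.b1[j] -= lr * gb1[j] / m
            for i in range(n_in):
                self.W1[j][i] -= lr * (gW1[j][i] / m + self.lam * self.W1[j][i])
        for k in range(n_in):
            self.b2[k] -= lr * gb2[k] / m
            for j in range(n_hid):
                self.W2[k][j] -= lr * (gW2[k][j] / m + self.lam * self.W2[k][j])
        return cost


# Toy snippets standing in for files by different authors (illustrative only).
snippets = [
    "for (int i = 0; i < n; i++) { sum += a[i]; }",
    "for(int i=0;i<n;++i){sum+=a[i];}",
    "while (it != end) { process(*it); ++it; }",
    "while(it!=end){process(*it);it++;}",
]
X = [featurize(s) for s in snippets]
model = SparseAutoencoder(n_in=32, n_hidden=8)
costs = [model.train_step(X) for _ in range(200)]
features = [model.encode(x) for x in X]  # one 8-dimensional feature vector per file
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}")
```

In a complete system, these feature vectors would then be passed to a supervised classifier trained on files with known authors; which classifier to use is a further assumption, since the abstract does not name one.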