Article code: 377567
Journal code: 658795
Publication year: 2015
English article: 12 pages, PDF
Full-text version: free download
English title of the ISI article
A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• We propose a new alignment-free method for the classification of DNA barcodes based on both a spectral representation and prototype-based unsupervised clustering.
• We investigate how closely the characteristics of different species are related to the spectral distribution of their DNA barcodes.
• We compare the proposed method with six state-of-the-art machine learning classifiers, and the results confirm that our method outperforms all the other classifiers when applied to short fragments.
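
The "spectral representation" mentioned in the highlights can be illustrated as a vector of k-mer frequencies. The following is a minimal sketch of that idea, not the authors' implementation: the function name `kmer_spectrum`, the choice of `k`, and the normalization to relative frequencies are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_spectrum(sequence, k=3, alphabet="ACGT"):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper: the paper's exact representation may differ.
    """
    sequence = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Fix the order of the len(alphabet)**k possible words so that
    # vectors computed from different sequences are comparable.
    words = ["".join(p) for p in product(alphabet, repeat=k)]
    # Windows containing other symbols (e.g. N) are ignored.
    total = sum(counts[w] for w in words)
    if total == 0:
        return [0.0] * len(words)
    return [counts[w] / total for w in words]


# Toy sequence; real barcode sequences are several hundred bp long.
spectrum = kmer_spectrum("ACGTACGTAC", k=2)
```

Because the vector has a fixed length regardless of sequence length, sequences can be compared without alignment, which is what makes the approach applicable to short fragments.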

Objectives: In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed.

Methods: In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms (support vector machine, random forest, RIPPER, naïve Bayes, Ridor, and classification tree), including on short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the online resource "Barcode of Life Database".

Results: The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when the classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method reaches an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best of the other classifiers (random forest) reaches an accuracy of 20.9%.

Conclusions: Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments.
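
The abstract does not describe how the neural gas network is configured, so the following is only a sketch of the standard rank-based neural gas update rule on toy 2-D points, not the authors' pipeline. The function name `train_neural_gas`, the decay schedules, and all parameter values are assumptions; in the setting described above, the inputs would presumably be k-mer frequency vectors rather than 2-D points.

```python
import math
import random


def train_neural_gas(data, init, epochs=50, eps=(0.5, 0.01),
                     lam=(1.0, 0.01), seed=0):
    """Fit prototypes with the standard neural gas rule (illustrative only)."""
    rng = random.Random(seed)
    prototypes = [list(p) for p in init]
    for epoch in range(epochs):
        # Exponentially decay the learning rate and neighbourhood range.
        frac = epoch / max(1, epochs - 1)
        eps_t = eps[0] * (eps[1] / eps[0]) ** frac
        lam_t = lam[0] * (lam[1] / lam[0]) ** frac
        order = list(data)
        rng.shuffle(order)
        for x in order:
            # Rank prototypes by distance to the input (rank 0 = closest).
            ranked = sorted(range(len(prototypes)),
                            key=lambda i: math.dist(prototypes[i], x))
            for rank, i in enumerate(ranked):
                # Every prototype moves toward x, weighted by its rank.
                h = math.exp(-rank / lam_t)
                prototypes[i] = [w + eps_t * h * (xi - w)
                                 for w, xi in zip(prototypes[i], x)]
    return prototypes


# Two well-separated toy clusters (assumed data, for illustration).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
prototypes = train_neural_gas(data, init=[(0.0, 0.0), (11.0, 11.0)])
```

Unlike k-means, every prototype is updated for every input, with a weight that falls off with its distance rank; as the neighbourhood range shrinks, the rule approaches online k-means. An unlabeled sequence could then be assigned to the category of its nearest prototype.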

Publisher
Database: Elsevier - ScienceDirect
Journal: Artificial Intelligence in Medicine - Volume 64, Issue 3, July 2015, Pages 173–184