Article code: 11018335
Journal code: 1720246
Publication year: 2018
English article: 12-page PDF
Full-text version: Free download
English title of the ISI article
LVQ-KNN: Composition-based DNA/RNA binning of short nucleotide sequences utilizing a prototype-based k-nearest neighbor approach
Related topics
Life Sciences and Biotechnology; Immunology and Microbiology; Virology
English abstract
Unbiased sequencing is an upcoming method to gain information about the microbiome in a sample and to detect unrecognized pathogens. Many software tools are available for the taxonomic classification of such metagenomic datasets. Many of them have satisfactory sensitivity and specificity for known organisms, but they fail if the sample contains unknown organisms, which cannot be detected by similarity-based classification employing available databases. However, recognition of unknowns is especially important for the detection of newly emerging pathogens, which are often RNA viruses. Here we present the composition-based analysis tool LVQ-KNN for binning unclassified nucleotide sequence reads into their provenance classes, DNA or RNA. With 5-fold cross-validation, LVQ-KNN reached correct classification rates (CCR) of up to 99.9% for the classification into DNA/RNA. On real datasets, CCRs of up to 94.5% were reached. Compared with another composition-based analysis tool, the method reached similar or better classification results. LVQ-KNN is a new tool for DNA/RNA classification of sequence reads from unbiased sequencing approaches that could be applicable for the detection of yet unknown RNA viruses in metagenomic samples. The source code, training data, and test data for LVQ-KNN are available on GitHub (https://github.com/ab1989/LVQ-KNN).
Publisher
Database: Elsevier - ScienceDirect
Journal: Virus Research - Volume 258, 15 October 2018, Pages 55-63