کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
456333 695696 2010 8 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
The Normalised Compression Distance as a file fragment classifier
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر شبکه های کامپیوتری و ارتباطات
پیش نمایش صفحه اول مقاله
The Normalised Compression Distance as a file fragment classifier
چکیده انگلیسی

We have applied the generalised and universal distance measure NCD—Normalised Compression Distance—to the problem of determining the type of file fragments. To enable later comparison of the results, the algorithm was applied to fragments of a publicly available corpus of files. The NCD algorithm in conjunction with the k-nearest-neighbour (k ranging from one to ten) as the classification algorithm was applied to a random selection of circa 3000 512-byte file fragments from 28 different file types. This procedure was then repeated ten times. While the overall accuracy of the n-valued classification only improved the prior probability from approximately 3.5% to circa 32–36%, the classifier reached accuracies of circa 70% for the most successful file types.A prototype of a file fragment classifier was then developed and evaluated on new set of data (from the same corpus). Some circa 3000 fragments were selected at random and the experiment repeated five times. This prototype classifier remained successful at classifying individual file types with accuracies ranging from only slightly lower than 70% for the best class, down to similar accuracies as in the prior experiment.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Digital Investigation - Volume 7, Supplement, August 2010, Pages S24–S31
نویسندگان
,