SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods

Article code	Journal code	Publication year	English article	Full text
382996	660799	2016	20-page PDF	Free download

Title of the ISI article

SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods

Keywords

Machine learning; Malware detection; Static analysis; Structural features; Microsoft Office Open XML; Document

Related topics

Engineering and basic sciences > Computer engineering > Artificial intelligence

Abstract

• SFEM is a novel structural feature extraction methodology for XML-based documents.
• SFEM is static, lightweight, and fast (37 ms for an average 250 KB file).
• SFEM is combined with machine learning for effective malicious document detection.
• Best configuration: Fisher Score, TF-IDF, Top 200 features, and Random Forest.
• The best configuration provided: TPR = 0.97, FPR = 0.049, AUC = 0.9912.
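
The best configuration listed above ranks features by Fisher Score and keeps the top 200. As a rough illustration of that ranking step only, the sketch below computes a two-class Fisher score per feature column and returns the indices of the highest-scoring features. The function names `fisher_scores` and `top_k_features`, the toy matrix, and the particular Fisher score formulation are assumptions made for illustration; the source does not say which variant the authors use.

```python
from statistics import fmean, pvariance


def fisher_scores(rows, labels):
    """Two-class Fisher score per feature column (labels are 0 or 1).

    Assumed variant: sum_k n_k * (mean_k - mean)^2 / sum_k n_k * var_k.
    """
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        overall_mean = fmean(column)
        between = 0.0  # spread of the class means around the overall mean
        within = 0.0   # spread of the values inside each class
        for cls in (0, 1):
            values = [v for v, y in zip(column, labels) if y == cls]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(scores, k):
    """Indices of the k highest-scoring features, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]


# Toy data (hypothetical): column 0 separates the classes, column 1 does not.
rows = [[1.0, 5.0], [2.0, 5.0], [1.0, 4.0], [9.0, 5.0], [10.0, 4.0], [9.0, 5.0]]
labels = [0, 0, 0, 1, 1, 1]
scores = fisher_scores(rows, labels)
selected = top_k_features(scores, 1)
```

With `k = 200` and a TF-IDF weighted feature matrix, the same selection would feed a classifier such as Random Forest; that step is omitted here.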

Office documents are used extensively by individuals and organizations. Most users consider these documents safe for use. Unfortunately, Office documents can contain malicious components and perform harmful operations. Attackers increasingly take advantage of naive users and leverage Office documents in order to launch sophisticated advanced persistent threat (APT) and ransomware attacks. Recently, targeted cyber-attacks against organizations have been initiated with emails containing malicious attachments. Since most email servers do not allow executable files to be attached to emails, attackers prefer to use non-executable files (e.g., documents) for malicious purposes. Existing anti-virus engines primarily use signature-based detection methods, and therefore fail to detect new, unknown malicious code that has been embedded in an Office document. Machine learning methods have been shown to be effective at detecting known and unknown malware in various domains; however, to the best of our knowledge, they have not been used for the detection of malicious XML-based Office documents (*.docx, *.xlsx, *.pptx, *.odt, *.ods, etc.). In this paper we present a novel structural feature extraction methodology (SFEM) for XML-based Office documents. SFEM extracts discriminative features from documents, based on their structure. We leveraged SFEM’s features with machine learning algorithms for effective detection of malicious *.docx documents. We extensively evaluated SFEM with machine learning classifiers using a representative collection (16,938 *.docx documents collected "from the wild") which contains ∼4.9% malicious and ∼95.1% benign documents. We examined 1,600 unique configurations based on different combinations of feature extraction, feature selection, feature representation, top-feature selection methods, and machine learning classifiers.
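
The abstract does not spell out which structural features SFEM extracts. As one plausible reading, offered purely as an assumption, a *.docx file is a ZIP archive of XML parts, so the paths inside the archive can be treated as static structural features without opening or running the document. The sketch below uses Python's standard `zipfile` module; `structural_paths` and `make_toy_docx` are hypothetical helper names, and the toy archive is not a valid Word document.

```python
import io
import zipfile


def structural_paths(docx_bytes):
    """Sorted, de-duplicated archive paths, including parent directories.

    Assumption: archive paths stand in for "structural features"; the
    source abstract does not define the exact feature set.
    """
    paths = set()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        for name in archive.namelist():
            parts = name.strip("/").split("/")
            # Add every prefix so directories count as features too.
            for depth in range(1, len(parts) + 1):
                paths.add("/".join(parts[:depth]))
    return sorted(paths)


def make_toy_docx():
    """Build a tiny in-memory ZIP that mimics a .docx layout (toy data)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<document/>")
        archive.writestr("word/media/image1.png", b"\x00")
    return buffer.getvalue()


features = structural_paths(make_toy_docx())
```

Because the archive is only listed and never executed, this kind of extraction stays static, in line with the static analysis described in the abstract.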
The results show that machine learning algorithms trained on features provided by SFEM successfully detect new, unknown malicious *.docx documents. The Random Forest classifier achieves the highest detection rates, with an AUC of 99.12% and a true positive rate (TPR) of 97%, accompanied by a false positive rate (FPR) of 4.9%. In comparison, the best anti-virus engine achieves a TPR that is ∼25% lower.
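
For readers unfamiliar with the reported metrics, TPR is the fraction of malicious documents that are flagged and FPR is the fraction of benign documents that are wrongly flagged. The minimal sketch below computes both from label lists; the function name `tpr_fpr` and the toy labels are illustrative assumptions and are not the paper's data. It assumes both classes are present in `y_true`.

```python
def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) for binary labels where 1 = malicious, 0 = benign."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malicious, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malicious, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, passed
    return tp / (tp + fn), fp / (fp + tn)


# Toy labels (hypothetical): 4 malicious and 6 benign documents.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr = tpr_fpr(y_true, y_pred)
```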

Publisher

Database: Elsevier - ScienceDirect
Journal: Expert Systems with Applications - Volume 63, 30 November 2016, Pages 324–343

Authors

Aviad Cohen, Nir Nissim, Lior Rokach, Yuval Elovici
