Article code: 246743 | Journal code: 502387 | Publication year: 2014 | English article: 14 pages (PDF)
English title of the ISI article
Automatic clustering of construction project documents based on textual similarity
Keywords
Document management, single-pass clustering, supervised/unsupervised learning methods
Related topics
Engineering and basic sciences > Other engineering disciplines > Civil and structural engineering
English abstract


• A hybrid approach for clustering semantically related project documents is proposed.
• For clustering, an inverse relationship exists between dimensionality and similarity threshold.
• The tf–idf weighting method yields high precision but only average recall.
• Refining the clustering outcome using supervised learning improves accuracy.
• Textual similarities can be used to reveal semantic relations between documents.
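
The tf–idf weighting and textual-similarity comparison named in the highlights can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the whitespace tokenizer, the smoothed idf variant, the use of cosine similarity as the similarity measure, and the function names `tfidf_vectors` and `cosine`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer (assumption)
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (assumption) keeps every weight strictly positive.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical document snippets, invented for this example.
docs = [
    "concrete pour schedule delay",
    "concrete pour delay notice",
    "elevator shop drawing submittal",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # documents sharing several terms
print(cosine(vecs[0], vecs[2]))  # documents sharing no terms
```

Two documents would then be treated as related when their similarity reaches a chosen threshold. The highlights indicate that the suitable threshold depends on the dimensionality of the vectors, which this sketch does not model.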

Text classifiers, as supervised learning methods, require a comprehensive training set covering all classes in order to classify new instances. This limits their use for organizing construction project documents, since sufficient samples are not guaranteed to be available for every possible document category. To overcome this all-inclusive requirement, an unsupervised learning method was used to automatically cluster documents based on textual similarity. Repeated evaluations using different randomizations of the dataset revealed a region of threshold/dimensionality values with consistently high precision and average recall. Accordingly, a hybrid approach was proposed that first uses an unsupervised method to develop core clusters and then trains a text classifier on those core clusters to classify outlier documents in a subsequent refinement step. Evaluation of the hybrid approach demonstrated a significant improvement in recall, resulting in an overall increase in F-measure scores.
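
The two-stage idea in the abstract (form core clusters without supervision, then use a classifier trained on them to place the outliers) can be sketched roughly as below. This is a sketch under stated assumptions, not the paper's procedure: a simple single-pass clustering scheme is assumed for the first stage, a nearest-centroid classifier stands in for the text classifier, raw term counts replace tf–idf weights for brevity, and `min_core_size` and all function names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for term, weight in vec.items():
            total[term] = total.get(term, 0.0) + weight
    return {term: weight / len(vectors) for term, weight in total.items()}


def single_pass_clusters(vectors, threshold):
    """Stage 1: add each document to its most similar cluster, or start a new one."""
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid([vectors[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def hybrid_cluster(vectors, threshold, min_core_size=2):
    """Stage 2: reassign documents from small clusters to the nearest core cluster."""
    clusters = single_pass_clusters(vectors, threshold)
    core = [c for c in clusters if len(c) >= min_core_size]
    outliers = [i for c in clusters if len(c) < min_core_size for i in c]
    if not core:
        return clusters  # nothing to train on; keep the stage-1 result
    # "Training" the nearest-centroid stand-in means computing one centroid per core cluster.
    centroids = [centroid([vectors[j] for j in c]) for c in core]
    refined = [list(c) for c in core]
    for i in outliers:
        sims = [cosine(vectors[i], c) for c in centroids]
        refined[sims.index(max(sims))].append(i)
    return refined


# Hypothetical document snippets, invented for this example.
docs = [
    "concrete pour schedule delay",
    "concrete pour delay notice",
    "elevator shop drawing submittal",
    "elevator drawing submittal review",
    "notice concrete strength test",
]
vectors = [Counter(d.split()) for d in docs]
print(single_pass_clusters(vectors, threshold=0.5))  # [[0, 1], [2, 3], [4]]
print(hybrid_cluster(vectors, threshold=0.5))        # [[0, 1, 4], [2, 3]]
```

In this toy run the last document falls just short of the threshold and is left alone after the first stage, which mirrors the high-precision, average-recall behavior the abstract reports. The refinement step then attaches it to the closest core cluster, which is the mechanism by which recall would be expected to improve.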

Publisher
Database: Elsevier - ScienceDirect
Journal: Automation in Construction - Volume 42, June 2014, Pages 36–49