Article code: 397356
Journal code: 671181
Publication year: 2014
English article: 19 pages, PDF
English title of the ISI article
Data preparation for KDD through automatic reasoning based on description logic
Related topics
Engineering and Basic Sciences, Computer Engineering, Artificial Intelligence
English abstract


• The success of KDD largely depends on ad hoc and costly data preparation tasks.
• We propose automatic data preparation mechanisms to support these tasks.
• The framework has been evaluated statistically in several real KDD projects.
• It has been shown to save an average of 30% of the time spent on data preparation.
• It can reduce preparation time in every KDD project that uses XML data.

Without data preparation, data mining algorithms cannot operate on data within the knowledge discovery in databases (KDD) process. In fact, the success of later KDD phases largely depends on the data preparation stage. Mechanisms that prepare data automatically save considerable time and resources within the KDD process. These resources are then available for later, less automatable stages, for example, results interpretation.

We have proposed a general-purpose mechanism, applicable to multiple domains, to improve the data preparation phase of the KDD process. This mechanism processes input data and automatically converts it to a format suitable for applying different data preparation techniques based on a known syntax. It is based on description logic. Taking a generic UML2 data model as a reference, the mechanism is able to check whether any XML data source can be transformed and modelled as a subsumption or instance of that UML2 model. It thus automatically identifies a consistent, unambiguous and finite set of XSLT transformations that prepare the data for the application of data mining techniques, obviating the need to expend resources on the preliminary preparation and formatting stage.

The proposed mechanism was applied to structurally complex data from four different domains. To test the validity of the proposal, we applied data mining techniques to extract knowledge from the prepared data. The sound results of applying our proposal to several different domains confirm that it is applicable to any XML data source, and that it is correct, computationally efficient and time-saving during the data preparation phase.
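
The following Python sketch illustrates, in a much simplified form, the idea of checking XML records against a reference model and then converting the conforming ones into a flat format for mining. It is not the authors' mechanism: it uses no description-logic reasoner and generates no XSLT. The reference model, the `patient` record type, the field names, the sample data and the functions `conforms` and `flatten` are all hypothetical, chosen only for illustration, and only the standard-library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference model: record tag -> required child fields.
# A toy stand-in for the generic UML2 model described in the abstract.
REFERENCE_MODEL = {"patient": {"id", "age", "diagnosis"}}


def conforms(element, model=REFERENCE_MODEL):
    """Return True if the element's tag is in the model and every
    required field is present as a direct child element."""
    required = model.get(element.tag)
    if required is None:
        return False
    present = {child.tag for child in element}
    return required <= present


def flatten(xml_text, model=REFERENCE_MODEL):
    """Convert conforming record elements into flat dict rows.
    Records that do not fit the model are skipped."""
    root = ET.fromstring(xml_text)
    rows = []
    for element in root:
        if conforms(element, model):
            rows.append({
                field: element.findtext(field).strip()
                for field in sorted(model[element.tag])
            })
    return rows


# Invented sample data: the second record lacks a diagnosis field.
SAMPLE = """
<records>
  <patient><id>1</id><age>54</age><diagnosis>A</diagnosis></patient>
  <patient><id>2</id><age>61</age></patient>
</records>
"""

rows = flatten(SAMPLE)
```

Here `rows` holds one flat dictionary per conforming record, which is the kind of tabular input a data mining tool typically expects. A simple required-fields check like this is far weaker than subsumption reasoning, which can also account for class hierarchies and relationships between model elements.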

Publisher
Database: Elsevier - ScienceDirect
Journal: Information Systems - Volume 44, August 2014, Pages 54–72