Article ID: 552175
Journal: Decision Support Systems
Published Year: 2013
Pages: 11
File Type: PDF
Abstract

An important task in multi-relational data mining is link-based classification, which exploits the attributes of links and linked entities to predict the class label. The relational Naive Bayes classifier achieves scalability by exploiting independence assumptions. We introduce a weaker independence assumption: information from different data tables is independent given the class label. This assumption entails a closed-form formula for combining probabilistic predictions based on decision trees learned on different database tables. Logistic regression learns different weights for the information from different tables and prunes irrelevant tables. In experiments, learning was very fast and accuracy was competitive.
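To make the combination scheme concrete, the following is a minimal sketch, not the authors' implementation: it fits one decision tree per database table, turns each tree's class-probability estimate into a log-odds feature (the log-linear combination implied by the independence assumption), and lets logistic regression learn a weight per table. The table names, the synthetic data, and the use of scikit-learn are illustrative assumptions.

```python
# Hedged sketch: per-table decision trees combined log-linearly via
# logistic regression. Tables and data are hypothetical stand-ins.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)                       # class label per entity

# Hypothetical per-entity feature blocks, one per database table
tables = {
    "customer":    y[:, None] * 0.8 + rng.normal(size=(n, 3)),
    "transaction": y[:, None] * 0.3 + rng.normal(size=(n, 5)),
    "noise_table": rng.normal(size=(n, 4)),           # irrelevant table
}

idx_train, idx_test = train_test_split(np.arange(n), test_size=0.3, random_state=0)

# Step 1: one decision tree per table, each estimating P(class | table attributes)
log_odds_train, log_odds_test = [], []
for name, X in tables.items():
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(X[idx_train], y[idx_train])
    p_tr = np.clip(tree.predict_proba(X[idx_train])[:, 1], 1e-6, 1 - 1e-6)
    p_te = np.clip(tree.predict_proba(X[idx_test])[:, 1], 1e-6, 1 - 1e-6)
    log_odds_train.append(np.log(p_tr / (1 - p_tr)))  # log-linear feature per table
    log_odds_test.append(np.log(p_te / (1 - p_te)))

Z_train = np.column_stack(log_odds_train)
Z_test = np.column_stack(log_odds_test)

# Step 2: logistic regression assigns a weight to each table's prediction;
# near-zero weights act as soft pruning of irrelevant tables.
combiner = LogisticRegression()
combiner.fit(Z_train, y[idx_train])
print("per-table weights:", dict(zip(tables, combiner.coef_[0].round(2))))
print("test accuracy:", combiner.score(Z_test, y[idx_test]).round(3))
```

In practice the combining weights would typically be estimated on predictions held out from tree training to avoid overfitting; the sketch skips this for brevity.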

► A new statistical independence assumption for knowledge discovery in databases.
► Decision tree learning is applied to different tables in a database.
► The assumption entails a log-linear formula for combining single-table predictions.
► Logistic regression adaptively assigns weights to information from different tables.
► The classification system is accurate and very fast even for complex databases.

Related Topics
Physical Sciences and Engineering › Computer Science › Information Systems