Article ID | Journal | Published Year | Pages | File Type
---|---|---|---|---
552175 | Decision Support Systems | 2013 | 11 Pages |
An important task in multi-relational data mining is link-based classification, which exploits the attributes of links and linked entities to predict the class label. The relational Naive Bayes classifier achieves scalability through strong independence assumptions. We introduce a weaker independence assumption: information from different data tables is independent given the class label. This assumption entails a closed-form formula for combining probabilistic predictions from decision trees learned on different database tables. Logistic regression then learns separate weights for the information from each table and prunes irrelevant tables. In experiments, learning was very fast and accuracy was competitive.
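The combining scheme described in the abstract can be sketched in a few lines: learn one decision tree per table, take per-table log-probabilities, and let logistic regression weight and prune them. This is a minimal illustrative sketch using scikit-learn with synthetic data; the table contents, tree depths, and feature names are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)  # shared class label

# Two hypothetical database tables whose features depend on the class.
table_a = rng.normal(loc=y[:, None], scale=1.0, size=(n, 3))
table_b = rng.normal(loc=2 * y[:, None], scale=2.0, size=(n, 2))

# Step 1: learn a decision tree on each table separately.
trees = [
    DecisionTreeClassifier(max_depth=3, random_state=0).fit(t, y)
    for t in (table_a, table_b)
]

# Step 2: per-table log-probabilities of the positive class
# (the log-linear form that the independence assumption entails).
eps = 1e-9
log_preds = np.column_stack([
    np.log(tree.predict_proba(t)[:, 1] + eps)
    for tree, t in zip(trees, (table_a, table_b))
])

# Step 3: logistic regression learns one weight per table; a weight
# near zero effectively prunes an uninformative table.
combiner = LogisticRegression().fit(log_preds, y)
accuracy = combiner.score(log_preds, y)
```

Because each base learner sees only one table, the trees can be trained independently (and in parallel), which is consistent with the fast learning times reported in the abstract.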
► A new statistical independence assumption for knowledge discovery in databases.
► Decision tree learning is applied to different tables in a database.
► The assumption entails a log-linear formula for combining single-table predictions.
► Logistic regression adaptively assigns weights to information from different tables.
► The classification system is accurate and very fast even for complex databases.