Article ID: 483484
Journal: Informatics in Medicine Unlocked
Published Year: 2015
Pages: 8
File Type: PDF
Abstract

• The Recursive-Rule eXtraction algorithm with J48graft (Re-RX with J48graft) was employed.
• Concise and interpretable classification rules were extracted for breast cancer diagnosis.
• The 10-fold CV accuracy, number of rules, and average number of antecedents were obtained for the WBCD.
• The characteristics of the rule set generated by Re-RX with J48graft were compared and investigated.
• The results are expected to greatly aid physicians in making accurate and concise diagnoses for patients.

To assist physicians in the diagnosis of breast cancer and thereby improve survival, a highly accurate computer-aided diagnostic system is necessary. Although various machine learning and data mining approaches have been devised to increase diagnostic accuracy, most current methods are inadequate. The recently developed Recursive-Rule eXtraction (Re-RX) algorithm provides a hierarchical, recursive consideration of discrete variables prior to analysis of continuous data, and can generate classification rules learned from both discrete and continuous attributes. The objective of this study was to extract highly accurate, concise, and interpretable classification rules for diagnosis using the Re-RX algorithm with J48graft, a class for generating a grafted C4.5 decision tree. We used the Wisconsin Breast Cancer Dataset (WBCD). Nine research groups have provided 10 kinds of highly accurate, concrete classification rules for the WBCD. We compared the accuracy and characteristics of the rule set for the WBCD generated using the Re-RX algorithm with J48graft with five rule sets obtained using 10-fold cross-validation (CV). We trained on the WBCD using the Re-RX algorithm with J48graft and obtained the average classification accuracies of 10 runs of 10-fold CV for the training and test datasets, the number of extracted rules, and the average number of antecedents. Compared with other rule extraction algorithms, the Re-RX algorithm with J48graft resulted in a lower average number of rules for diagnosing breast cancer, which is a substantial advantage. It also provided the lowest average number of antecedents per rule. These features are expected to greatly aid physicians in making accurate and concise diagnoses for patients with breast cancer.
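
The three quantities reported in the abstract (mean accuracy over repeated 10-fold CV, number of rules, and average number of antecedents per rule) can be illustrated with a minimal Python sketch. This is not the Re-RX algorithm or the authors' code. The `Rule` class, the attribute names `a1` and `a2`, the thresholds, and the toy rule set are all hypothetical and chosen only to show how such statistics might be computed for an if-then rule set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# One antecedent is a single condition: (attribute, operator, threshold).
Antecedent = Tuple[str, str, float]

@dataclass
class Rule:
    antecedents: List[Antecedent]  # conditions joined by logical AND
    label: str                     # class predicted when all conditions hold

# Supported comparison operators (an assumption of this sketch).
OPS = {"<=": lambda a, b: a <= b, ">": lambda a, b: a > b}

def matches(rule: Rule, sample: Dict[str, float]) -> bool:
    """Return True if the sample satisfies every antecedent of the rule."""
    return all(OPS[op](sample[attr], thr) for attr, op, thr in rule.antecedents)

def classify(rules: List[Rule], sample: Dict[str, float]) -> str:
    """Return the label of the first rule that matches the sample."""
    for rule in rules:
        if matches(rule, sample):
            return rule.label
    raise ValueError("no rule matched the sample")

def rule_set_stats(rules: List[Rule]) -> Tuple[int, float]:
    """Return (number of rules, average number of antecedents per rule)."""
    n_rules = len(rules)
    avg_antecedents = sum(len(r.antecedents) for r in rules) / n_rules
    return n_rules, avg_antecedents

def mean_accuracy(run_accuracies: List[List[float]]) -> float:
    """Average fold accuracies over all runs, e.g. 10 runs of 10 folds."""
    flat = [acc for run in run_accuracies for acc in run]
    return sum(flat) / len(flat)

# Hypothetical toy rule set; thresholds are illustrative, not from the paper.
TOY_RULES = [
    Rule([("a1", "<=", 2.0), ("a2", "<=", 3.0)], "benign"),
    Rule([("a1", "<=", 2.0), ("a2", ">", 3.0)], "malignant"),
    Rule([("a1", ">", 2.0)], "malignant"),
]
```

For the toy rule set, `rule_set_stats(TOY_RULES)` gives 3 rules with 5 antecedents in total. Fewer rules and fewer antecedents per rule mean a shorter rule set for a physician to read, which is the sense of "concise" used in the abstract.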
