Article code: 415336
Journal code: 681201
Publication year: 2016
English article: 15 pages, PDF
Full-text version: free download
English title of the ISI article
Bayesian network data imputation with application to survival tree analysis
Persian translation of the title
محاسبه داده های شبکه بیسین با استفاده از تجزیه و تحلیل درخت بقا
Related topics
Engineering and Basic Sciences > Computer Engineering > Computational Theory and Mathematics
Abstract


• Retrospective clinical datasets often have a small sample size and many missing data.
• We use Bayesian networks to impute missing data, enhancing survival tree analysis.
• The Bayesian network is learned from incomplete data and used for the imputation.
• Our method generally achieved more accurate predictions than widely used approaches.

Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way of handling the missingness is to discard patients with missing covariates from the analysis, which further reduces the sample size. Alternatively, if the mechanism that generated the missing data allows it, incomplete data can be imputed on the basis of the observed data, which avoids reducing the sample size and allows subsequent methods to work with complete data. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists of modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied to the completed dataset. The Bayesian network is learned directly from the incomplete data using a structural expectation–maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation, and the resulting imputation improved the stratification estimated by the survival tree (especially compared with using surrogate splits).
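
The idea of learning a Bayesian network from incomplete data and then using it for imputation can be illustrated with a deliberately small sketch. It is not the method of the paper: the structure is fixed in advance to a single edge A -> B over two binary variables, so only the parameters are learned by EM (no structural EM, no exact anytime maximization), and missing entries are then filled with their most probable value given the observed one. The function names (`joint`, `completions`, `em_fit`, `impute`) and the toy data are assumptions made for illustration.

```python
def joint(a, b, p_a, p_b):
    """P(A=a, B=b) for the fixed network A -> B with binary variables."""
    pa = p_a if a == 1 else 1.0 - p_a
    pb = p_b[a] if b == 1 else 1.0 - p_b[a]
    return pa * pb


def completions(a, b, p_a, p_b):
    """Return (a, b, weight) for every completion consistent with the
    observed values; None marks a missing entry. Weights sum to 1."""
    cands = [(x, y) for x in (0, 1) for y in (0, 1)
             if (a is None or x == a) and (b is None or y == b)]
    z = sum(joint(x, y, p_a, p_b) for x, y in cands)
    return [(x, y, joint(x, y, p_a, p_b) / z) for x, y in cands]


def em_fit(rows, n_iter=50):
    """Estimate P(A=1) and P(B=1 | A) from rows with missing entries."""
    p_a = 0.5                  # initial guess for P(A=1)
    p_b = {0: 0.5, 1: 0.5}     # initial guess for P(B=1 | A=a)
    for _ in range(n_iter):
        # E-step: accumulate expected counts over all completions.
        n_a = [0.0, 0.0]       # expected count of A=a
        n_ab = [0.0, 0.0]      # expected count of (A=a, B=1)
        for a, b in rows:
            for x, y, w in completions(a, b, p_a, p_b):
                n_a[x] += w
                n_ab[x] += w * y
        # M-step: maximum-likelihood parameters from the expected counts.
        p_a = n_a[1] / len(rows)
        p_b = {x: (n_ab[x] / n_a[x] if n_a[x] > 0 else 0.5) for x in (0, 1)}
    return p_a, p_b


def impute(rows, p_a, p_b):
    """Fill each missing entry with its most probable value given the rest."""
    out = []
    for a, b in rows:
        x, y, _ = max(completions(a, b, p_a, p_b), key=lambda t: t[2])
        out.append((x, y))
    return out


# Toy data: A and B mostly agree; the last four rows have a missing entry.
rows = ([(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 0)] * 8 + [(0, 1)] * 2
        + [(1, None), (0, None), (None, 1), (None, 0)])
p_a, p_b = em_fit(rows)
completed = impute(rows, p_a, p_b)
```

The completed rows could then be passed to any method that requires complete data, such as a survival tree. A realistic application would involve many variables, a learned structure, and a check that the missingness mechanism permits imputation.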

Publisher
Database: Elsevier - ScienceDirect
Journal: Computational Statistics & Data Analysis - Volume 93, January 2016, Pages 373–387