Fault diagnosis of chemical processes with incomplete observations: A comparative study

Journal: Computers & Chemical Engineering
Published: 2016
Length: 13 pages

Abstract

• A framework is proposed for data-driven fault diagnosis with incomplete observations.
• The contribution index is introduced for feature selection to reduce the computational burden.
• The advantages and limitations of different methods are reported and discussed.
• The redundancy ratio is proposed to assess the informative level of incomplete data and to generalize the study.
• Guidelines for the use of the most promising techniques are provided.

An important problem for diagnostic systems in industrial applications is the diagnosis of faults from incomplete observations. This work discusses different approaches for handling missing data and evaluates the performance of data-driven fault diagnosis schemes. An exploiting classifier and several combined methods were assessed on the Tennessee–Eastman process, for which diverse incomplete observations were generated. A set of performance indicators revealed trade-offs among the different schemes. Support vector machines (SVM) and C4.5, each combined with k-nearest neighbours (kNN), produced the highest robustness and the highest accuracy, respectively. Bayesian networks (BN) and the centroid classifier appear to be poor options in terms of accuracy, while Gaussian naïve Bayes (GNB) is sensitive to the imputed values. In addition, feature selection was explored to further improve performance, and the proposed contribution index showed promising results. Finally, an industrial case was studied to assess the informative level of incomplete data in terms of the redundancy ratio and to generalize the discussion.
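Reading the "combined methods" as an imputation step followed by a classifier (an interpretation; the abstract does not define them), the two-stage idea can be sketched in dependency-free Python. This is a minimal sketch, not the authors' implementation: it assumes values are missing completely at random, uses kNN imputation followed by a nearest-centroid classifier (the "centroid" option mentioned above, chosen only to keep the example self-contained), and every function name, the partial-distance rescaling, k = 3, the 20% missing rate and the synthetic two-class data are illustrative assumptions. It does not use Tennessee–Eastman data or reproduce the SVM/C4.5 results.

```python
# Hypothetical sketch: kNN imputation + nearest-centroid classification.
# Not the paper's code; names, parameters and data are illustrative assumptions.
import math
import random
from typing import Dict, List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing sensor reading


def mask_at_random(X: Sequence[Sequence[float]], rate: float, seed: int = 0) -> List[Row]:
    """Hide each entry independently with probability `rate` (missing completely at random)."""
    rng = random.Random(seed)
    return [[None if rng.random() < rate else v for v in row] for row in X]


def partial_distance(a: Row, b: Row) -> float:
    """Euclidean distance over co-observed features, rescaled to the full dimension."""
    shared = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    if not shared:
        return math.inf
    sq = sum((u - v) ** 2 for u, v in shared)
    return math.sqrt(sq * len(a) / len(shared))


def column_means(X: Sequence[Row]) -> List[float]:
    """Per-feature mean of observed values; used when no usable donor exists."""
    means = []
    for j in range(len(X[0])):
        vals = [row[j] for row in X if row[j] is not None]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means


def impute_row(row: Row, pool: Sequence[Row], k: int, fallback: List[float]) -> List[float]:
    """kNN imputation: fill each gap with the mean of that feature over the k nearest donors."""
    out = []
    for j, v in enumerate(row):
        if v is not None:
            out.append(v)
            continue
        donors = []
        for p in pool:
            if p[j] is None:
                continue
            d = partial_distance(row, p)
            if math.isfinite(d):  # skip donors sharing no observed feature with `row`
                donors.append((d, p[j]))
        donors.sort()
        nearest = [val for _, val in donors[:k]]
        out.append(sum(nearest) / len(nearest) if nearest else fallback[j])
    return out


def fit_centroids(X: Sequence[List[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    """Nearest-centroid classifier: one mean vector per class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}


def train(X_inc: List[Row], y: Sequence[str], k: int = 3) -> dict:
    """Impute the training set (each row uses the other rows as donors), then fit centroids."""
    fallback = column_means(X_inc)
    X_filled = [impute_row(r, X_inc[:i] + X_inc[i + 1:], k, fallback)
                for i, r in enumerate(X_inc)]
    return {"centroids": fit_centroids(X_filled, y), "pool": X_inc,
            "fallback": fallback, "k": k}


def diagnose(model: dict, row: Row) -> str:
    """Impute an incomplete sample against the training pool, then pick the closest centroid."""
    filled = impute_row(row, model["pool"], model["k"], model["fallback"])
    cents = model["centroids"]
    return min(cents, key=lambda c: math.dist(filled, cents[c]))


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = [], []
    for label, centre in (("normal", 0.0), ("fault", 3.0)):  # synthetic, not TE data
        for _ in range(20):
            X.append([rng.gauss(centre, 0.5) for _ in range(3)])
            y.append(label)
    model = train(mask_at_random(X, rate=0.2, seed=7), y)
    print(diagnose(model, [3.1, None, 2.8]))   # fault
    print(diagnose(model, [None, -0.2, 0.1]))  # normal
```

Replacing `fit_centroids` and `diagnose` with an SVM or a decision tree would give a pipeline closer to the SVM and C4.5 combinations highlighted above, while the imputation step would stay unchanged; separating the two stages this way is what lets different imputation and classification methods be compared independently.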

Keywords

Fault diagnosis; Missing data; Classification; Imputation; Machine learning