Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
233565 | Minerals Engineering | 2012 | 16 Pages | |
A better understanding of process phenomena depends on interpreting models that capture the relationships between the process variables. Although linear regression is used routinely in the mineral process industries for this purpose, it may not be useful where the relationships between variables are nonlinear or complex. Under these circumstances, nonlinear methods such as neural networks or decision trees can be used to develop reliable models, but they do not necessarily give any explicit insight into the relationships between the process variables and the target variables. This is a major drawback in situations where such information would be very important, such as fault identification or gaining a better understanding of the fundamentals of a process.

In this paper, the use of variable importance measures and partial dependency plots generated by random forest models is proposed as a practical tool for surmounting this problem. In particular, it is shown that important variables can be flagged by an appropriate threshold generated by including dummy variables in the system. Moreover, the results of the study indicate that random forest models can reliably identify the influence of individual variables, even in the presence of high levels of additive noise. This would make the approach a useful tool in continuous process improvement and root cause analysis of abnormal process behaviour.
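The dummy-variable idea can be illustrated with a small sketch. Everything below is an assumption made for illustration rather than the authors' procedure: the data are synthetic (two informative variables, one irrelevant variable and one random dummy, not the furnace data), the forest is a deliberately simplified pure-Python one, importance is measured as the increase in out-of-bag error after permuting a variable, and the threshold is taken as a one-sided 95% normal limit of the dummy's per-tree importance. The abstract does not state how its confidence limit is computed.

```python
import math
import random
import statistics

def sse(values):
    """Sum of squared deviations from the mean (the split criterion)."""
    total = sum(values)
    return sum(v * v for v in values) - total * total / len(values)

def grow(X, y, depth, rng):
    """Grow a regression tree: a leaf mean, or (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 10:
        return statistics.fmean(y)
    p, best = len(X[0]), None
    for j in rng.sample(range(p), max(1, p // 2)):       # random feature subset
        for t in rng.sample([row[j] for row in X], 8):   # a few candidate thresholds
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if left and right:
                score = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or score < best[0]:
                    best = (score, j, t, left, right)
    if best is None:
        return statistics.fmean(y)
    _, j, t, left, right = best
    return (j, t,
            grow([X[i] for i in left], [y[i] for i in left], depth - 1, rng),
            grow([X[i] for i in right], [y[i] for i in right], depth - 1, rng))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def mse(tree, X, y):
    return statistics.fmean((predict(tree, r) - yi) ** 2 for r, yi in zip(X, y))

def forest_importance(X, y, n_trees=40, depth=5, seed=0):
    """Per-tree permutation importance, evaluated on out-of-bag samples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    per_tree = [[] for _ in range(p)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]
        tree = grow([X[i] for i in boot], [y[i] for i in boot], depth, rng)
        X_oob, y_oob = [X[i] for i in oob], [y[i] for i in oob]
        base = mse(tree, X_oob, y_oob)
        for j in range(p):
            col = [r[j] for r in X_oob]
            rng.shuffle(col)                             # break the link to y
            X_perm = [r[:j] + [c] + r[j + 1:] for r, c in zip(X_oob, col)]
            per_tree[j].append(mse(tree, X_perm, y_oob) - base)
    return per_tree

# Synthetic data (assumed): x0 and x1 drive y, x2 is irrelevant, column 3 is the dummy.
data_rng = random.Random(1)
X, y = [], []
for _ in range(300):
    row = [data_rng.uniform(-2, 2) for _ in range(4)]
    X.append(row)
    y.append(3 * math.sin(row[0]) + 2 * row[1] ** 2 + data_rng.gauss(0, 0.5))

per_tree = forest_importance(X, y)
mean_imp = [statistics.fmean(v) for v in per_tree]
dummy = per_tree[3]
# Assumed threshold: one-sided 95% normal limit of the dummy's per-tree importance.
threshold = statistics.fmean(dummy) + 1.645 * statistics.stdev(dummy)
flagged = [j for j in range(3) if mean_imp[j] > threshold]
print("mean importance:", [round(m, 3) for m in mean_imp])
print("threshold:", round(threshold, 3), "flagged:", flagged)
```

In practice a library implementation of random forests would replace the hand-written tree code; the part that matters here is the comparison of each variable's importance against the limit derived from the dummy column.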
Graphical abstract
Variable importance measures derived from a random forest model of the throughput of a calcium carbide furnace depending on nine process variables. The dummy variable (No. 10) is shown in red, with the dashed red line indicating the upper 95% confidence limit of the significance of the process variables.

Highlights
► Random forest models can be used to interpret complex process or plant data.
► With dummy variables, the significance of explanatory variables can be assessed.
► Reliable analysis is possible, despite significant additive noise in the data.