Article ID: 4947397
Journal: Neurocomputing
Published Year: 2017
Pages: 15
File Type: PDF
Abstract
Although deep learning achieves high performance in pattern recognition and machine learning, the reasons for this remain unclear. To tackle this problem, we calculated information-theoretic quantities of the representations in the hidden layers and analyzed their relationship to performance. We found that entropy and mutual information, each of which decreases in a different manner as the layers deepen, are related to the generalization error after fine-tuning. This suggests that these information-theoretic quantities could serve as a criterion for determining the number of layers in deep learning without fine-tuning, which requires a high computational load.
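The abstract does not specify how the entropy and mutual information of hidden-layer representations were estimated, so the following is only a minimal illustrative sketch in Python, assuming a simple histogram-binning estimator and a toy random network with 2-unit layers (so the joint histogram stays tractable); the layer sizes, bin counts, and network weights are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def entropy(samples, bins=10):
    # Shannon entropy (in bits) estimated from a joint histogram
    # over all columns of `samples` (shape: n_samples x n_dims).
    counts, _ = np.histogramdd(samples, bins=bins)
    p = counts.ravel() / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=10):
    # I(X;Y) = H(X) + H(Y) - H(X,Y), each term estimated by binning.
    return entropy(x, bins) + entropy(y, bins) - entropy(np.hstack([x, y]), bins)

# Toy two-layer network (illustrative, not the paper's architecture):
# track how H(h) and I(X;h) change as the layer deepens.
x = rng.normal(size=(5000, 2))                    # "input" samples
W1 = rng.normal(size=(2, 2))
W2 = rng.normal(size=(2, 2))
h1 = np.tanh(x @ W1)                              # hidden layer 1
h2 = np.tanh(h1 @ W2)                             # hidden layer 2

for name, h in [("h1", h1), ("h2", h2)]:
    print(name,
          "H =", round(entropy(h), 3),
          "I(X;h) =", round(mutual_information(x, h), 3))

Binning estimators scale poorly with dimension, so for realistically wide layers one would substitute a dedicated mutual-information estimator; this sketch only illustrates the per-layer quantities the abstract refers to.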
Related Topics
Physical Sciences and Engineering › Computer Science › Artificial Intelligence