Article ID: 404661
Journal: Neural Networks
Published Year: 2008
Pages: 11
File Type: PDF
Abstract

Multi-stage feed-forward neural network (NN) learning with sigmoid-shaped hidden-node functions is implicitly a constrained optimization problem featuring negative curvature. Our analyses of the Hessian matrix H of the sum-squared-error measure highlight the following intriguing findings: at an early stage of learning, H tends to be indefinite and much better conditioned than the Gauss–Newton Hessian J^T J; the NN structure influences the indefiniteness and rank of H; and exploiting negative curvature leads to effective learning. All of these findings can be confirmed numerically thanks to our stagewise second-order backpropagation; this systematic procedure exploits the NN’s “layered symmetry” to compute H efficiently, making exact Hessian evaluation feasible for fairly large practical problems.
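As an illustration of the kind of comparison the abstract describes, the sketch below uses JAX automatic differentiation (not the authors' stagewise second-order backpropagation) to form the exact Hessian H of the sum-squared-error for a tiny sigmoidal network and to compare its eigenvalue spectrum with that of the Gauss–Newton Hessian J^T J. The network size, toy data, and initialization are illustrative assumptions.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def forward(params, x):
    # One sigmoid-shaped hidden layer, linear output node.
    W1, b1, W2, b2 = params
    h = jax.nn.sigmoid(x @ W1 + b1)
    return h @ W2 + b2

# Toy problem (illustrative assumptions): 20 samples, 2 inputs, 5 hidden nodes.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (20, 2))
y = jnp.sin(x[:, :1])
params = (0.1 * jax.random.normal(k2, (2, 5)), jnp.zeros(5),
          0.1 * jax.random.normal(k3, (5, 1)), jnp.zeros(1))
flat, unravel = ravel_pytree(params)  # flatten all weights into one vector

def sse(w):
    # Sum-squared-error measure E = 0.5 * sum(r^2).
    r = forward(unravel(w), x) - y
    return 0.5 * jnp.sum(r ** 2)

def residuals(w):
    return (forward(unravel(w), x) - y).ravel()

H = jax.hessian(sse)(flat)            # exact Hessian of E (via autodiff here)
J = jax.jacobian(residuals)(flat)     # residual Jacobian
GN = J.T @ J                          # Gauss-Newton Hessian J^T J

eig_H = jnp.linalg.eigvalsh(H)        # eigenvalues in ascending order
eig_GN = jnp.linalg.eigvalsh(GN)
print("H     min/max eigenvalue:", float(eig_H[0]), float(eig_H[-1]))
print("J^T J min/max eigenvalue:", float(eig_GN[0]), float(eig_GN[-1]))
# A negative smallest eigenvalue of H indicates indefiniteness at this early
# (random-initialization) stage; J^T J is positive semidefinite by construction,
# so its smallest eigenvalue sits near zero and its conditioning is much worse.
```

At random initialization the residuals are essentially arbitrary, which is what tends to make the exact H indefinite while J^T J, which discards the residual-weighted curvature term, stays positive semidefinite and near-singular.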

Related Topics
Physical Sciences and Engineering > Computer Science > Artificial Intelligence