The Normalized Risk-Averting Error Criterion for Avoiding Nonglobal Local Minima in Training Neural Networks
February 1, 2015
The convexification method for data fitting is capable of avoiding nonglobal local minima, but it suffers from two shortcomings: the risk-averting error (RAE) criterion grows exponentially as its risk-sensitivity index λ increases, and the existing method of determining λ is often ineffective. To eliminate these shortcomings, the normalized RAE (NRAE) is proposed herein. Because NRAE is a monotone increasing function of RAE, the region in which NRAE has no nonglobal local minimum expands as λ increases, just as that of RAE does. Unlike RAE, however, NRAE does not grow without bound. The performance of training with NRAE at a fixed λ is reported. Over a large range of the risk-sensitivity index, such training has a high rate of reaching a global or near-global minimum when started from different initial weight vectors of the neural network under training. It is observed that at a large λ the landscape of the NRAE is rather flat, which slows training to a halt. This observation motivates the development of the NRAE-MSE method, which exploits the large region of an NRAE without a nonglobal local minimum and takes excursions from time to time for training with the standard mean squared error (MSE) to zero in on a global or near-global minimum. A number of examples of approximating functions that involve fine features or unevenly sampled segments are used to test the method. Numerical experiments show that the NRAE-MSE training method has a success rate of 100% in all the testing trials for each example, all starting with randomly selected initial weights. The method is also applied to classifying numerals in the well-known MNIST dataset. The new training method outperforms other methods reported in the literature under the same operating conditions.
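The abstract does not state the criteria explicitly. As a minimal sketch, assume the forms commonly used in the convexification literature: RAE as J_λ = Σ_k exp(λ e_k) and NRAE as C_λ = (1/λ) ln(J_λ / K), where e_k is the squared error on the k-th of K training samples. The function names `rae` and `nrae`, and the log-sum-exp shift used for numerical stability, are illustrative choices and are not taken from the paper.

```python
import numpy as np

def rae(sq_errors, lam):
    """Assumed RAE form: sum_k exp(lam * e_k). Overflows for large lam."""
    return np.sum(np.exp(lam * np.asarray(sq_errors, dtype=float)))

def nrae(sq_errors, lam):
    """Assumed NRAE form: (1/lam) * ln((1/K) * sum_k exp(lam * e_k)).

    Computed with a log-sum-exp shift so that exp() never overflows.
    """
    z = lam * np.asarray(sq_errors, dtype=float)
    m = np.max(z)  # subtract the largest exponent before exponentiating
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

# Hypothetical per-sample squared errors, for illustration only.
sq_errors = np.array([0.1, 0.5, 2.0])
for lam in (1e-6, 1.0, 1e3):
    print(lam, nrae(sq_errors, lam))
```

Under these assumed definitions, NRAE is ln applied to a scaled RAE, which is why it is monotone increasing in RAE. For λ > 0 its value stays between the mean and the maximum of the squared errors, so it remains bounded while RAE itself grows exponentially in λ, and it approaches the MSE as λ tends to zero.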