Convexification and Deconvexification for Training Artificial Neural Networks
May 1, 2016
The purpose of this dissertation research is to overcome a fundamental problem in the theory and application of artificial neural networks (ANNs). The problem, called the local minimum problem in training ANNs, has plagued the ANN community since the mid-1980s. ANNs trained with backpropagation have been used extensively for decades to solve tasks across the fields of artificial intelligence. The computing power of ANNs derives from their distributed structure together with their ability to learn and to generalize. However, the application and further development of ANNs have been impeded by the local minimum problem, which has therefore attracted attention for a very long time.
To represent high-level abstractions, ANNs with deep architectures, called deep neural networks (DNNs), have recently been studied extensively. Training DNNs involves significant difficulties, prompting the development of unsupervised layer-wise pre-training and many ingenious heuristic techniques. Although such methods and techniques have produced impressive results on well-known machine learning tasks, serious problems rooted in the local minima of high-dimensional non-convex optimization still remain.
A primary difficulty in solving the local minimum problem lies in the intrinsic non-convexity of the training criteria of ANNs, which usually contain a large number of non-global local minima in the weight space. Because standard optimization methods perform local search in the parameter (e.g., weight) space, they cannot consistently guarantee satisfactory performance of the resulting ANN even after a large number of training sessions. Although an enormous number of methods have been developed to optimize the free parameters of the objective function toward a consistently better optimum, these methods and algorithms cannot fundamentally solve the local minimum problem in the presence of a non-convex objective function.
To alleviate the fundamental difficulty of the local minimum problem in training ANNs, this dissertation proposes a series of methodologies that apply convexification and deconvexification to avoid non-global local minima and to reach global or near-global minima with satisfactory optimization and generalization performance. These methodologies are developed based on a normalized risk-averting error (NRAE) criterion. This criterion removes the practical difficulties of computational overflow and ill-initialization that existed in its predecessor, the risk-averting error (RAE) criterion, while retaining the RAE criterion's benefit of handling non-global local minima effectively by convexifying the non-convex error space.
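The abstract does not reproduce the two criteria themselves. As a point of reference only, a formulation consistent with the description above is, with \epsilon_k(w) = \|y_k - \hat{f}(x_k, w)\|^2 denoting the squared error on the k-th of K training examples and \lambda > 0 the convexification parameter (the exact definitions used in the dissertation may differ):

J_\lambda(w) = \sum_{k=1}^{K} \exp\bigl(\lambda \, \epsilon_k(w)\bigr), \qquad C_\lambda(w) = \frac{1}{\lambda} \ln\!\left( \frac{1}{K} \sum_{k=1}^{K} \exp\bigl(\lambda \, \epsilon_k(w)\bigr) \right).

Here J_\lambda is the risk-averting error and C_\lambda its normalized counterpart; because C_\lambda is a log-sum-exp scaled by 1/\lambda, it can be evaluated stably even for the large values of \lambda that make J_\lambda overflow. A minimal numerical sketch of that evaluation, assuming the log-sum-exp form above (the function name nrae_loss and the array layout are illustrative, not taken from the dissertation):

import numpy as np

def nrae_loss(sq_errors, lam):
    # sq_errors: per-example squared errors e_k = ||y_k - f(x_k, w)||^2
    # lam: convexification parameter lambda > 0
    # Returns (1/lam) * log(mean_k exp(lam * e_k)), evaluated with the
    # log-sum-exp shift so that a large lam does not overflow.
    z = lam * np.asarray(sq_errors, dtype=float)
    m = z.max()
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

For example, nrae_loss(np.array([0.5, 2.0, 0.1]), lam=1e6) returns a value close to the largest squared error, whereas summing exp(lam * e_k) directly would overflow.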
The effectiveness of the NRAE-based method is evaluated by training multilayer perceptrons (MLPs) on function approximation tasks, demonstrating an optimization advantage over training with the standard mean squared error criterion. Moreover, numerical experiments show that NRAE-based training of DNNs, such as convolutional neural networks and deep MLPs, on recognizing handwritten digits in the MNIST dataset achieves better optimization and generalization results than many benchmark performances. Finally, to further enhance generalization, a statistical pruning method that removes redundant connections is implemented and confirmed to improve the generalization ability of ANNs trained with the NRAE criterion.
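The abstract does not describe the statistical pruning procedure itself. Purely to illustrate the general idea of removing redundant connections (this is generic magnitude pruning, not the dissertation's statistical method, and the threshold fraction is arbitrary):

import numpy as np

def prune_smallest(weights, fraction=0.2):
    # Zero out the given fraction of connections with the smallest absolute
    # weight in a weight matrix. Generic magnitude pruning shown only for
    # illustration; the dissertation's statistical pruning method differs.
    w = weights.copy()
    k = int(fraction * w.size)
    if k > 0:
        thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
        w[np.abs(w) <= thresh] = 0.0
    return w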
PhD Thesis
University of Maryland, Baltimore County