Abstract

T. Furlanello, Z. C. Lipton, L. Itti, A. Anandkumar, Born Again Neural Networks, In: Metalearn 2017 NIPS workshop, pp. 1-5, Dec 2017. (Cited by 1246)

Abstract: Knowledge distillation techniques seek to transfer knowledge acquired by a learned teacher model to a new student model. In prior work, the teacher typically is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student’s compactness while suffering only minimal degradation in performance. In this paper, we revisit knowledge distillation but with a different objective. Rather than compressing models, we train students that are parameterized identically to their parents. Surprisingly, these born again networks (BANs) tend to outperform their teacher models. Our experiments with born again dense networks demonstrate state-of-the-art performance on the CIFAR-100 dataset, reaching a validation error of 15.5% with a single model and 14.9% with our best ensemble. Additionally, we investigate knowledge transfer to architectures that differ from their teachers but have comparable capacity. In these experiments, we show that similar advantages can be achieved by transferring knowledge between dense networks and residual networks of similar capacity.
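
The idea of training a student on its teacher's outputs can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the student is fit by minimizing the cross-entropy between the teacher's and the student's softmax outputs, and the names `distillation_loss`, `born_again_chain`, and `train_fn` are hypothetical, introduced here only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def distillation_loss(student_logits, teacher_logits):
    """Cross-entropy between teacher and student output distributions.

    Assumption: the teacher's softmax output serves as a soft target
    for the student (one common form of knowledge distillation).
    """
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    student_log_probs = log_softmax(student_logits)
    return -sum(t * ls for t, ls in zip(teacher_probs, student_log_probs))


def born_again_chain(train_fn, generations):
    """Train a sequence of students, each taught by the previous model.

    `train_fn(teacher)` is a hypothetical callable: it trains a fresh model
    of the same architecture, using `teacher` as the source of targets when
    `teacher` is not None, and returns the trained model.
    """
    models = [train_fn(None)]  # generation 0: trained without a teacher
    for _ in range(generations):
        models.append(train_fn(models[-1]))
    return models
```

For a fixed teacher, this loss is smallest when the student's output distribution matches the teacher's, which is why it can serve as a training signal even when student and teacher share the same architecture.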

Themes: Machine Learning