Variational Information Distillation for Knowledge Transfer
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9155–9163, 2019.
Abstract
Transferring knowledge from a teacher neural network pretrained on
the same or a similar task to a student neural network can
significantly improve the performance of the student neural
network. Existing knowledge transfer approaches match the
activations or the corresponding hand-crafted features of the
teacher and the student networks. We propose an
information-theoretic framework for knowledge transfer which
formulates knowledge transfer as maximizing the mutual information
between the teacher and the student networks. We compare our method
with existing knowledge transfer methods on both knowledge
distillation and transfer learning tasks and show that our method
consistently outperforms existing methods. We further demonstrate
the strength of our method on knowledge transfer across
heterogeneous network architectures by transferring knowledge from a
convolutional neural network (CNN) to a multi-layer perceptron (MLP)
on CIFAR-10. The resulting MLP significantly outperforms
state-of-the-art methods and achieves performance similar to that of a
CNN with a single convolutional layer.
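As a rough sketch of how the mutual-information objective described above can be turned into a trainable loss, the snippet below shows one common variational formulation: the intractable conditional distribution of teacher features given student features is approximated by a Gaussian whose mean is predicted by a small learned regressor and whose per-channel variance is learned. The module name, the 1x1-convolution regressor, and the softplus variance parameterization are illustrative assumptions for this sketch, not details taken from the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VariationalMILoss(nn.Module):
    """Sketch of a variational lower bound on I(teacher; student) features.

    Assumes q(t | s) = N(mu(s), diag(sigma^2)), where mu is a small learned
    regressor from student features and sigma is a learned per-channel
    standard deviation. Minimizing the returned negative log-likelihood
    maximizes the variational lower bound on mutual information
    (up to an additive constant).
    """

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 convolution mapping student features to predicted teacher means
        # (an illustrative choice of regressor for this sketch)
        self.mu_net = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)
        # unconstrained parameter; softplus keeps the variance positive
        self.log_scale = nn.Parameter(torch.zeros(teacher_channels))

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        mu = self.mu_net(student_feat)                     # predicted teacher means
        var = F.softplus(self.log_scale).pow(2) + 1e-6     # per-channel variance
        var = var.view(1, -1, 1, 1)
        # negative Gaussian log-likelihood of teacher features under q(t | s)
        nll = 0.5 * (torch.log(var) + (teacher_feat - mu).pow(2) / var)
        return nll.mean()
```

In practice such a term would be added, with a weighting coefficient, to the student's ordinary task loss (e.g. cross-entropy), typically with one loss module attached per matched pair of teacher and student layers; the teacher's features are treated as fixed targets (no gradient flows into the teacher).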