Variational Information Distillation for Knowledge Transfer

Sungsoo Ahn; Shell Xu Hu; Andreas Damianou; Neil D. Lawrence; Zhenwen Dai

doi:10.1109/CVPR.2019.00938

Conference on Computer Vision and Pattern Recognition (CVPR):9155-9163, 2019.

Abstract

Transferring knowledge from a teacher neural network pretrained on the same or a similar task to a student neural network can significantly improve the performance of the student neural network. Existing knowledge transfer approaches match the activations or the corresponding hand-crafted features of the teacher and the student networks. We propose an information-theoretic framework for knowledge transfer that formulates knowledge transfer as maximizing the mutual information between the teacher and the student networks. We compare our method with existing knowledge transfer methods on both knowledge distillation and transfer learning tasks and show that our method consistently outperforms existing methods. We further demonstrate the strength of our method on knowledge transfer across heterogeneous network architectures by transferring knowledge from a convolutional neural network (CNN) to a multi-layer perceptron (MLP) on CIFAR-10. The resulting MLP significantly outperforms state-of-the-art methods and achieves similar performance to the CNN with a single convolutional layer.
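
As a rough illustration of the mutual-information objective mentioned in the abstract, the sketch below uses the standard variational lower bound I(t; s) = H(t) - H(t | s) >= H(t) + E[log q(t | s)], where t is a teacher feature, s is a student feature, and q is a variational distribution. Because H(t) does not depend on the student, maximizing the bound amounts to minimizing the negative log-likelihood of q. The Gaussian form of q, the softplus variance parametrization, and all names (softplus, gaussian_nll, alpha, eps) are assumptions made for this example and are not taken from the paper's code.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)); keeps the variance positive.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_nll(teacher, predicted_mean, alpha, eps=1e-5):
    """Average negative log-likelihood of teacher features under q(t | s).

    teacher:        list of rows, each a list of teacher feature values t
    predicted_mean: same shape; mean of q(t | s) computed from the student
    alpha:          one unconstrained parameter per feature dimension;
                    the variance is softplus(alpha) + eps (an assumption)
    """
    variances = [softplus(a) + eps for a in alpha]
    total = 0.0
    for t_row, m_row in zip(teacher, predicted_mean):
        for t, m, var in zip(t_row, m_row, variances):
            # Per-dimension Gaussian negative log-density.
            total += 0.5 * math.log(2.0 * math.pi * var) + (t - m) ** 2 / (2.0 * var)
    return total / len(teacher)


# Toy usage: a prediction that matches the teacher gives a lower loss.
teacher = [[1.0, 2.0], [3.0, 4.0]]
shifted = [[2.0, 3.0], [4.0, 5.0]]
alpha = [0.0, 0.0]
loss_matched = gaussian_nll(teacher, teacher, alpha)
loss_shifted = gaussian_nll(teacher, shifted, alpha)
```

In a training loop, a term like this would presumably be added to the student's task loss, with the predicted mean produced by a small regressor on the student's features; that wiring is left out here.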

Cite this Paper

BibTeX


@InProceedings{variational-information-distillation-for-knowledge-transfer,
  title = 	 {Variational Information Distillation for Knowledge Transfer},
  author = 	 {Ahn, Sungsoo and Hu, Shell Xu and Damianou, Andreas and Lawrence, Neil D. and Dai, Zhenwen},
  booktitle = 	 {Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = 	 {9155--9163},
  year = 	 {2019},
  address = 	 {Long Beach, CA},
  doi = 	 {10.1109/CVPR.2019.00938},
  pdf = 	 {https://openaccess.thecvf.com/content_CVPR_2019/papers/Ahn_Variational_Information_Distillation_for_Knowledge_Transfer_CVPR_2019_paper.pdf},
  url = 	 {/publications/variational-information-distillation-for-knowledge-transfer.html},
  abstract = 	 {Transferring knowledge from a teacher neural network pretrained on
the same or a similar task to a student neural network can
significantly improve the performance of the student neural
network. Existing knowledge transfer approaches match the
activations or the corresponding hand-crafted features of the
teacher and the student networks. We propose an
information-theoretic framework for knowledge transfer which
formulates knowledge transfer as maximizing the mutual information
between the teacher and the student networks. We compare our method
with existing knowledge transfer methods on both knowledge
distillation and transfer learning tasks and show that our method
consistently outperforms existing methods. We further demonstrate
the strength of our method on knowledge transfer across
heterogeneous network architectures by transferring knowledge from a
convolutional neural network (CNN) to a multi-layer perceptron (MLP)
on CIFAR-10. The resulting MLP significantly outperforms
state-of-the-art methods and it achieves similar performance to
the CNN with a single convolutional layer.
}
}

Endnote

%0 Conference Paper
%T Variational Information Distillation for Knowledge Transfer
%A Sungsoo Ahn
%A Shell Xu Hu
%A Andreas Damianou
%A Neil D. Lawrence
%A Zhenwen Dai
%B Conference on Computer Vision and Pattern Recognition (CVPR)
%D 2019
%F variational-information-distillation-for-knowledge-transfer
%P 9155--9163
%R 10.1109/CVPR.2019.00938
%U /publications/variational-information-distillation-for-knowledge-transfer.html
%X Transferring knowledge from a teacher neural network pretrained on
the same or a similar task to a student neural network can
significantly improve the performance of the student neural
network. Existing knowledge transfer approaches match the
activations or the corresponding hand-crafted features of the
teacher and the student networks. We propose an
information-theoretic framework for knowledge transfer which
formulates knowledge transfer as maximizing the mutual information
between the teacher and the student networks. We compare our method
with existing knowledge transfer methods on both knowledge
distillation and transfer learning tasks and show that our method
consistently outperforms existing methods. We further demonstrate
the strength of our method on knowledge transfer across
heterogeneous network architectures by transferring knowledge from a
convolutional neural network (CNN) to a multi-layer perceptron (MLP)
on CIFAR-10. The resulting MLP significantly outperforms
state-of-the-art methods and it achieves similar performance to
the CNN with a single convolutional layer.

RIS


TY  - CPAPER
TI  - Variational Information Distillation for Knowledge Transfer
AU  - Sungsoo Ahn
AU  - Shell Xu Hu
AU  - Andreas Damianou
AU  - Neil D. Lawrence
AU  - Zhenwen Dai
BT  - Conference on Computer Vision and Pattern Recognition (CVPR)
DA  - 2019/06/15
ID  - variational-information-distillation-for-knowledge-transfer
SP  - 9155
EP  - 9163
DO  - 10.1109/CVPR.2019.00938
L1  - https://openaccess.thecvf.com/content_CVPR_2019/papers/Ahn_Variational_Information_Distillation_for_Knowledge_Transfer_CVPR_2019_paper.pdf
UR  - /publications/variational-information-distillation-for-knowledge-transfer.html
AB  - Transferring knowledge from a teacher neural network pretrained on
the same or a similar task to a student neural network can
significantly improve the performance of the student neural
network. Existing knowledge transfer approaches match the
activations or the corresponding hand-crafted features of the
teacher and the student networks. We propose an
information-theoretic framework for knowledge transfer which
formulates knowledge transfer as maximizing the mutual information
between the teacher and the student networks. We compare our method
with existing knowledge transfer methods on both knowledge
distillation and transfer learning tasks and show that our method
consistently outperforms existing methods. We further demonstrate
the strength of our method on knowledge transfer across
heterogeneous network architectures by transferring knowledge from a
convolutional neural network (CNN) to a multi-layer perceptron (MLP)
on CIFAR-10. The resulting MLP significantly outperforms
state-of-the-art methods and it achieves similar performance to
the CNN with a single convolutional layer.

ER  -

APA


Ahn, S., Hu, S. X., Damianou, A., Lawrence, N. D., & Dai, Z. (2019). Variational Information Distillation for Knowledge Transfer. Conference on Computer Vision and Pattern Recognition (CVPR), 9155-9163. doi:10.1109/CVPR.2019.00938. Available from /publications/variational-information-distillation-for-knowledge-transfer.html.