Do Deep Nets Really Need To Be Deep?

Rich Caruana, Microsoft Research

Deep neural networks are the state of the art on problems such as speech recognition and computer vision. Using a method called model compression, we show that shallow nets can learn the complex functions previously learned by deep nets and reach accuracies previously achievable only with deep models, while using the same number of parameters as the original deep models. On the TIMIT phoneme recognition and CIFAR-10 image recognition tasks, shallow nets can be trained to perform similarly to complex, well-engineered, deeper convolutional architectures. The same model compression trick can also be used to compress impractically large deep models, and ensembles of large deep models, down to small- or medium-size deep models that run more efficiently on mobile devices or servers.
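As a rough illustration of the model compression idea described above, the sketch below trains a one-hidden-layer "student" net to mimic the outputs of a deeper "teacher" on unlabeled inputs. Everything in it is an assumption made for illustration and not taken from the talk: the teacher is a fixed random tanh network standing in for a trained deep model, the student regresses on the teacher's raw outputs with a squared-error loss, and the names `teacher_outputs` and `train_student` are hypothetical.

```python
import numpy as np


def teacher_outputs(x, weights):
    """Stand-in 'deep' teacher: a fixed multi-layer tanh net (assumed, not trained)."""
    h = x
    for w in weights[:-1]:
        h = np.tanh(h @ w)
    return h @ weights[-1]  # raw (pre-softmax) outputs


def train_student(x, targets, hidden=64, lr=0.05, steps=300, seed=0):
    """Fit a one-hidden-layer net to the teacher's outputs by gradient descent."""
    rng = np.random.default_rng(seed)
    d, k = x.shape[1], targets.shape[1]
    w1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, k))
    losses = []
    for _ in range(steps):
        h = np.tanh(x @ w1)           # student hidden layer
        err = h @ w2 - targets        # mismatch with the teacher
        losses.append(float(np.mean(err ** 2)))
        g_pred = 2.0 * err / err.size  # gradient of the mean squared error
        g_w2 = h.T @ g_pred
        g_w1 = x.T @ ((g_pred @ w2.T) * (1.0 - h ** 2))
        w1 -= lr * g_w1
        w2 -= lr * g_w2
    return (w1, w2), losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(256, 10))  # unlabeled inputs; only teacher outputs are used
    sizes = [10, 32, 32, 32, 3]     # hypothetical teacher layer widths
    teacher = [rng.normal(0.0, 1.0 / np.sqrt(a), (a, b))
               for a, b in zip(sizes[:-1], sizes[1:])]
    _, losses = train_student(x, teacher_outputs(x, teacher))
    print(f"mimic loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The point of the sketch is only that the student never sees ground-truth labels: its training signal is the function the teacher has already learned. A real setup would use a trained deep model or ensemble as the teacher and a much larger transfer set.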
