Princeton University

School of Engineering & Applied Science

Efficient Methods and Hardware for Deep Learning

Speaker: Song Han, Stanford University
Location: E-Quad, B205
Date/Time: Monday, April 3, 2017 - 4:30pm

Abstract:
Deep neural networks have become the state-of-the-art technique for machine-learning tasks. However, running such networks is both computationally intensive and memory intensive, which makes them difficult to deploy on embedded systems or in data centers with tight power budgets. To address this limitation, this talk presents an algorithm and hardware co-design methodology for improving the efficiency of deep learning.
 
Starting with the algorithm, this talk introduces “Deep Compression,” which compresses deep neural network models by 10x-49x without loss of prediction accuracy for a broad range of CNNs, RNNs, and LSTMs, saving both memory and computation. Next, to implement deep compression efficiently in hardware, this talk introduces EIE, the “Efficient Inference Engine,” which performs decompression and inference simultaneously and thereby significantly reduces memory bandwidth. By exploiting the compressed model and handling its irregular computation pattern efficiently, EIE achieves a 13x speedup and 3000x better energy efficiency compared with a GPU. Finally, the talk closes the loop by revisiting the inefficiencies of current learning algorithms, proposing DSD training, and discussing the challenges and future work in efficient methods and hardware for deep learning.
 
Bio:
Song Han is a Ph.D. candidate with Prof. Bill Dally at Stanford University. His research focuses on energy-efficient deep learning computing, at the intersection of machine learning and computer architecture. He proposed Deep Compression, which can compress state-of-the-art CNNs by 10x-49x while fully preserving prediction accuracy. He designed EIE: Efficient Inference Engine, a hardware accelerator that performs inference directly on the compressed sparse model, yielding significant speedup and energy savings. His work has been covered by TheNextPlatform, TechEmergence, Embedded Vision and O’Reilly. His work received the Best Paper Award at ICLR’16, a Best Paper Honorable Mention at the NIPS’16 EMDNN workshop, and the Best Paper Award at FPGA’17. Before joining Stanford, Song graduated from Tsinghua University.