Princeton University

School of Engineering & Applied Science

Hypothesis testing perspectives on generative adversarial networks and differential privacy

Sewoong Oh, UIUC
B205 Engineering Quadrangle
Thursday, March 15, 2018 - 4:30pm

We bring tools from Blackwell’s seminal 1953 result on comparing two stochastic experiments to shed new light on two modern applications of great interest: Generative Adversarial Networks (GANs) and Differential Privacy (DP). Binary hypothesis testing is at the center of both applications, and we propose new data processing inequalities that allow us to discover new algorithms, provide sharper analyses, and give simpler proofs.
In the case of training GANs, two neural networks (a generator and a critic) are trained with competing objectives (hence the name adversarial networks). The generator aims to generate realistic samples, while the critic aims to detect whether a sample is real or fake. By training jointly with the critic, the generator learns to fool the critic and generate realistic samples. One of the major challenges in GAN training is known as “mode collapse”: the lack of diversity in the generated samples. We address mode collapse by proposing a new training framework in which the critic is fed multiple samples jointly (which we call packing), as opposed to each sample separately as in standard GAN training. With this simple but fundamental departure from existing GANs, experimental results show that the diversity of the generated samples improves significantly. We analyze this practical gain by first providing a formal mathematical definition of mode collapse and then making a fundamental connection between the idea of packing and the intensity of mode collapse. The analyses rely critically on the operational interpretation of hypothesis testing and the corresponding data processing inequalities.
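As a rough illustration of why judging several samples jointly can expose missing modes, the toy sketch below compares a hypothetical "real" distribution that is uniform over 10 modes with a hypothetical "generated" distribution that has dropped one mode. It computes the total variation distance between the two when a critic sees m samples at a time. The distributions, function names, and the use of total variation as the measure are assumptions made for this example, not the construction used in the talk.

```python
from itertools import product


def total_variation(p, q):
    """Total variation distance between two discrete distributions (dicts)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def packed(dist, m):
    """Joint distribution of m i.i.d. draws from dist, viewed as one packed sample."""
    joint = {}
    for combo in product(dist.items(), repeat=m):
        outcome = tuple(x for x, _ in combo)
        prob = 1.0
        for _, weight in combo:
            prob *= weight
        joint[outcome] = prob
    return joint


# Hypothetical toy setup: the real data has 10 equally likely modes,
# while the generator has collapsed onto 9 of them.
real = {mode: 1 / 10 for mode in range(10)}
fake = {mode: 1 / 9 for mode in range(9)}

# Distance seen by a critic that receives m samples jointly.
tv_by_m = {m: total_variation(packed(real, m), packed(fake, m)) for m in (1, 2, 3)}

for m, tv in tv_by_m.items():
    print(f"m={m}: total variation = {tv:.3f}")
```

In this toy case the distance works out to 1 - 0.9^m, so the missing mode becomes easier to detect as more samples are packed together.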
In the case of DP, we address one of the most fundamental questions in differential privacy: how much privacy is lost after k differentially private queries to a database? Privacy degrades each time the same database is accessed, and it is of fundamental interest to characterize the level of this degradation. We introduce a new operational interpretation of DP that provides a complete answer to this question: an exact characterization of how privacy degrades as a function of the individual privacy guarantee of each query. The key innovation is the introduction of an operational interpretation of differential privacy in terms of hypothesis testing and the use of the corresponding data processing inequalities.
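The hypothesis testing view of DP is commonly stated as follows: an adversary tries to decide which of two neighboring databases produced an output, and an (ε, δ)-private mechanism forces every such test to satisfy P_FA + e^ε P_MD ≥ 1 - δ and e^ε P_FA + P_MD ≥ 1 - δ, where P_FA and P_MD are the false alarm and missed detection probabilities. The sketch below checks these inequalities for a single query answered by randomized response on one bit. The mechanism and all function names are assumptions chosen for illustration, and the sketch does not attempt the k-query characterization described in the talk.

```python
import math
from itertools import chain, combinations


def randomized_response(bit, eps):
    """Output distribution when the true bit is reported with prob e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}


def error_pairs(dist0, dist1):
    """(false alarm, missed detection) for every deterministic rejection region."""
    outcomes = sorted(set(dist0) | set(dist1))
    regions = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    pairs = []
    for region in regions:
        p_fa = sum(dist0.get(x, 0.0) for x in region)        # reject H0 although H0 holds
        p_md = 1.0 - sum(dist1.get(x, 0.0) for x in region)  # accept H0 although H1 holds
        pairs.append((p_fa, p_md))
    return pairs


def satisfies_dp_region(pairs, eps, delta=0.0, tol=1e-12):
    """Check both hypothesis testing inequalities for every error pair."""
    e = math.exp(eps)
    return all(
        p_fa + e * p_md >= 1.0 - delta - tol and e * p_fa + p_md >= 1.0 - delta - tol
        for p_fa, p_md in pairs
    )


EPS = 1.0  # hypothetical privacy level of the single query
pairs = error_pairs(randomized_response(0, EPS), randomized_response(1, EPS))

holds_at_eps = satisfies_dp_region(pairs, EPS)           # claimed guarantee
holds_at_half_eps = satisfies_dp_region(pairs, EPS / 2)  # stronger claim than the mechanism provides

print("inequalities hold at eps:", holds_at_eps)
print("inequalities hold at eps/2:", holds_at_half_eps)
```

For this mechanism the test that rejects on output 1 meets the first inequality with equality at ε, which is why the same check fails when a smaller ε is claimed.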
Sewoong Oh is an Assistant Professor of Industrial and Enterprise Systems Engineering at UIUC. He received his PhD from the Department of Electrical Engineering at Stanford University. Following his PhD, he worked as a postdoctoral researcher at the Laboratory for Information and Decision Systems (LIDS) at MIT. His research interests are in theoretical machine learning, including spectral methods, ranking, crowdsourcing, estimation of information measures, differential privacy, and generative adversarial networks. He was co-awarded the best paper award at SIGMETRICS in 2015 and received the NSF CAREER award in 2016 and a Google Faculty Research Award.