Princeton University

School of Engineering & Applied Science

Data Science with Universal Probability

Young-Han Kim, University of California, San Diego
B205 Engineering Quadrangle
Monday, November 12, 2018 - 4:30pm


As Laplace famously asked, “What is the probability that the sun will rise tomorrow?”, inferring the probabilities underlying a given data set is at the heart of many data science problems. In this talk, I will explore how to assign probabilities to data such that its unknown distribution is uniformly approximated without overfitting the data. I will develop the mathematical foundation of this theory of universal probability, which traces back to Rissanen, to Ziv and Lempel, and even to Laplace, and present a general framework for addressing many data science problems in a unified manner. To illustrate this framework, I will present two practical applications: nucleotide sequence classification and grayscale image denoising.



Young-Han Kim received his B.S. degree in Electrical Engineering from Seoul National University in 1996 and his Ph.D. degree in Electrical Engineering (along with M.S. degrees in Statistics and in Electrical Engineering) from Stanford University in 2006. Since then he has been on the faculty of the Department of Electrical and Computer Engineering at the University of California, San Diego, where he is currently a Professor. His research contributions have been in information theory, communication engineering, and data science. He has coauthored a highly cited textbook, “Network Information Theory” (Cambridge University Press, 2011), and a recent monograph, “Fundamentals of Index Coding” (Now Publishers, 2018). He has received several awards and honors for his contributions, including the NSF CAREER Award (2008), the US-Israel BSF Bergmann Memorial Award (2009), the IEEE Information Theory Paper Award (2012), and the first IEEE James L. Massey Research and Teaching Award (2015). He is a Fellow of the IEEE.



This seminar is supported with funds from the Korhammer Lecture Series.