Princeton University

School of Engineering & Applied Science

Rethinking the Science of Statistical Privacy

Changchang Liu
Prof. Mittal
Engineering Quadrangle J323
Monday, February 18, 2019 - 9:30am to 11:00am


Nowadays, increasing amounts of data, such as social network, mobility, business, and medical data, are shared or made public to enable real-world applications. Such data are likely to contain sensitive information and thus need to be obfuscated prior to release to protect privacy. However, existing statistical data privacy mechanisms in the security community have several weaknesses: 1) they protect sensitive information only in the static scenario and cannot be generally applied to accommodate temporal dynamics; with the rapid development of data science, large amounts of sensitive data, such as personal social relationships, are becoming public, making the privacy of a time series of data increasingly challenging; 2) these privacy mechanisms do not explicitly capture correlations, leaving open the possibility of inference attacks; in many real-world scenarios, dependence/correlation between data tuples arises naturally from social, behavioral, and genetic interactions between users, violating assumptions made in prior work; and 3) there are very few practical guidelines on how to apply existing statistical privacy notions in practice, and a key challenge is how to set appropriate values for the privacy parameters.
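To make the discussion concrete, below is a minimal sketch of the Laplace mechanism, the canonical statistical privacy mechanism whose static, independence-assuming guarantees are at issue here. This is illustrative background only, not code from the speaker's work; the function name and parameters are chosen for exposition.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
    """Release true_value with epsilon-differential privacy by adding
    Laplace noise with scale sensitivity/epsilon.

    This standard guarantee assumes independent records and a single
    static release; correlated or time-series data can weaken it.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_value + rng.laplace(0.0, scale)

# Example: privately release a count query (sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(42, sensitivity=1.0, epsilon=0.5)
```

Smaller epsilon means a larger noise scale and hence stronger privacy at the cost of accuracy, which is exactly why choosing epsilon well matters.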

In this talk, we aim to overcome these weaknesses and provide privacy guarantees for dynamic and dependent (correlated) data structures. We also aim to derive useful and interpretable guidelines for selecting proper parameter values in state-of-the-art privacy-preserving frameworks. Furthermore, we investigate how auxiliary information, in the form of a prior distribution over the database and correlation across records and time, influences the proper choice of the privacy parameters. Specifically, we 1) first propose the design of a privacy-preserving system called LinkMirage, which mediates access to dynamic social relationships in social networks while effectively supporting social-graph-based data analytics; 2) explicitly incorporate structural properties of data into current differential privacy metrics and mechanisms, to enable privacy-preserving data analytics for dependent/correlated data; and 3) finally provide a quantitative analysis of how hypothesis testing can guide the choice of the privacy parameters in an interpretable manner for differential privacy and other statistical privacy frameworks.
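As one concrete reading of point 3), epsilon-differential privacy has a standard hypothesis-testing interpretation: an adversary testing whether a particular record is in the database can achieve detection power at most e^epsilon times the false-alarm rate alpha. The sketch below shows how such a bound could translate a tolerated adversary strength into an epsilon value; it is an illustrative assumption based on this well-known bound, and the talk's actual guidelines may differ.

```python
import math

def max_epsilon(alpha, power_cap):
    """Largest epsilon for which epsilon-differential privacy still
    enforces adversary power <= exp(epsilon) * alpha, i.e. the epsilon
    satisfying exp(epsilon) = power_cap / alpha.

    alpha:     tolerated false-alarm (significance) rate of the adversary's test
    power_cap: maximum detection power we are willing to grant the adversary
    """
    return math.log(power_cap / alpha)

# E.g., to keep an adversary testing "is this record in the database?" at
# significance alpha = 0.05 below 50% detection power:
eps = max_epsilon(alpha=0.05, power_cap=0.5)  # ln(10), about 2.30
```

This framing makes the privacy parameter interpretable: instead of picking epsilon abstractly, one picks the worst inference attack one is willing to tolerate and solves for epsilon.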