Princeton University

School of Engineering & Applied Science

Optimization Techniques for Data Analysis

Ming Fai Felix Wong
Engineering Quadrangle B327
Friday, January 30, 2015 - 12:00pm to 1:30pm

As new sources of large-scale data grow in volume and complexity, finding scalable ways to gain insights from unstructured big data has become a major challenge. The challenge is exacerbated by data that are often noisy, sparse and heterogeneous. This dissertation illustrates, through three studies, the benefits of using an optimization framework to devise methods for big data analytics.

We study three problems in computational political science, finance and recommender systems, analyzing a wide range of data, including time series, text, ratings and social networks. First, we propose an inference technique to quantify the political leaning of Twitter users based on how they are retweeted. We apply the technique to Twitter data collected during the 2012 U.S. presidential election. Second, we propose a joint latent space model for stock price movements and word usage patterns in newspapers. We apply the model and develop an algorithm to predict a given day's closing stock prices using the full text of that morning's The Wall Street Journal. Finally, we study the fundamental question of evaluating the quality of a social recommender network. We propose a pair of metrics to quantify (a) a network’s efficiency in disseminating recommendations and (b) the quality of a user’s neighbors in the network. We then empirically study the tradeoff between these two metrics on Yelp data and devise an algorithm to improve a network’s quality through friend recommendation and news feed curation.