Recent years have witnessed growing empirical success in reinforcement learning (RL). However, many theoretical questions about RL remain poorly understood. For example, how many observations are necessary and sufficient to learn a good policy? How can an agent learn to control a system by exploiting structural information, with provable regret guarantees? In this talk, we discuss the statistical efficiency of RL, with and without structural information such as linear feature representations, and show how to learn an optimal policy algorithmically with nearly minimax-optimal sample complexity.
The complexity of RL algorithms largely depends on the dimension of the state features. Toward reducing the dimensionality of RL problems, we discuss a state embedding method that automatically learns state features and aggregation structures from trajectory data. We illustrate the method with an application to clinical decision optimization.
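As a rough illustration of what "learning state features from trajectory data" can mean, the sketch below uses one generic spectral approach: estimate an empirical transition matrix from observed transitions, then take a truncated SVD so that states with similar transition behavior receive nearby low-dimensional embeddings, which could then be aggregated. This is an assumption for illustration only, not necessarily the method presented in the talk; all function names and the toy two-block chain are hypothetical.

```python
import numpy as np


def empirical_transition_matrix(trajectory, n_states):
    """Count observed transitions s -> s' and row-normalize them (hypothetical helper)."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(trajectory[:-1], trajectory[1:]):
        counts[s, s_next] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows of never-visited states stay zero instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def spectral_state_embedding(trajectory, n_states, rank):
    """Embed each state as its row of U_r * S_r from a rank-r SVD of P_hat."""
    p_hat = empirical_transition_matrix(trajectory, n_states)
    u, s, _ = np.linalg.svd(p_hat)
    return u[:, :rank] * s[:rank]


def simulate_two_block_chain(n_steps=20000, seed=0):
    """Toy chain (hypothetical): states 0-2 jump uniformly to 3-5, and vice versa."""
    rng = np.random.default_rng(seed)
    group_a, group_b = [0, 1, 2], [3, 4, 5]
    state, trajectory = 0, [0]
    for _ in range(n_steps):
        state = int(rng.choice(group_b if state in group_a else group_a))
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    traj = simulate_two_block_chain()
    emb = spectral_state_embedding(traj, n_states=6, rank=2)
    # States in the same block should have nearly identical embeddings.
    print(np.round(emb, 3))
```

In this toy chain the true transition matrix has rank 2, so a 2-dimensional embedding recovers the two latent "meta-states": states 0 to 2 cluster together, as do states 3 to 5, which is the kind of aggregation structure a dimension-reduction step would aim to expose.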
Mengdi Wang is an associate professor in the Department of Operations Research and Financial Engineering at Princeton University. She is also affiliated with the Department of Computer Science and Princeton’s Center for Statistics and Machine Learning. Her research focuses on data-driven stochastic optimization and its applications in machine learning and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF CAREER Award in 2017, the Google Faculty Award in 2017, and the MIT Tech Review 35-Under-35 Innovation Award (China region) in 2018. She currently serves as an associate editor for Operations Research.