STAT9910-303: Large-Scale Optimization for Data Science

Yuxin Chen, University of Pennsylvania, Fall 2023

The term project can be either a literature review or original research:

  • Literature review. We will provide a list of related papers not covered in the lectures, and the literature review should involve in-depth summaries and exposition of one of these papers.

  • Original research. It can be theoretical or experimental (ideally a mix of the two), with approval from the instructor. If you choose this option, you can do it either individually or in groups of two. You are encouraged to combine your current research with your term project.

There are two milestones/deliverables to help you through the process.

  1. Proposal (due Nov. 12). Submit a short report (no more than 1 page) stating the papers you plan to survey or the research problems you plan to work on. Describe why they are important or interesting, and provide appropriate references. If you elect to do original research, please do not propose an overly ambitious project that cannot be completed by the end of the semester, and do not be overly drawn to generality. Focus on the simplest scenarios that capture the issues you’d like to address.

  2. Written report (due Dec. 15). You are expected to submit a final project report (up to 5 pages, with unlimited appendix) summarizing your findings.

A few suggested (theoretical) papers for literature review (to be updated)

  1. ‘‘On the theory of policy gradient methods: Optimality, approximation, and distribution shift,’’ A. Agarwal, S. Kakade, J. Lee, G. Mahajan, Journal of Machine Learning Research, 2021.

  2. ‘‘Fast global convergence of natural policy gradient methods with entropy regularization,’’ S. Cen, C. Cheng, Y. Chen, Y. Wei, Y. Chi, Operations Research, 2022.

  3. ‘‘Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes,’’ G. Lan, Mathematical Programming, 2023.

  4. ‘‘Optimization, learning, and games with predictable sequences,’’ A. Rakhlin, K. Sridharan, NeurIPS 2013.

  5. ‘‘On the convergence of FedAvg on non-IID data,’’ X. Li, K. Huang, W. Yang, S. Wang, Z. Zhang, ICLR 2020.

  6. ‘‘On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems,’’ T. Lin, C. Jin, and M. Jordan, 2019.

  7. ‘‘On the convergence of Adam and beyond,’’ S. Reddi, S. Kale, S. Kumar, 2019.

  8. ‘‘On-Demand Sampling: Learning Optimally from Multiple Distributions,’’ N. Haghtalab, M. Jordan, E. Zhao, NeurIPS 2022.

  9. ‘‘Minimax Regret Optimization for Robust Machine Learning under Distribution Shift,’’ A. Agarwal, T. Zhang, 2022.

  10. ‘‘FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data,’’ X. Zhang, M. Hong, S. Dhople, W. Yin, Y. Liu, IEEE Transactions on Signal Processing, 2021.

  11. ‘‘Bipartite Matching in Nearly-linear Time on Moderately Dense Graphs,’’ J. van den Brand, Y. T. Lee, D. Nanongkai, R. Peng, T. Saranurak, A. Sidford, Z. Song, D. Wang, FOCS 2020.

  12. ‘‘A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic,’’ M. Hong, H. Wai, Z. Wang, Z. Yang, SIAM Journal on Optimization, 2023.

  13. ‘‘Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent,’’ A. Dalalyan, Conference on Learning Theory, 2017.

  14. ‘‘Sharp analysis for nonconvex SGD escaping from saddle points,’’ C. Fang, Z. Lin, and T. Zhang, 2019.

  15. ‘‘Stochastic model-based minimization of weakly convex functions,’’ D. Davis, D. Drusvyatskiy, SIAM Journal on Optimization, 2019.

  16. ‘‘Stochastic methods for composite and weakly convex optimization problems,’’ J. Duchi, F. Ruan, SIAM Journal on Optimization, 2018.

  17. ‘‘EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization,’’ W. Shi, Q. Ling, G. Wu, W. Yin, 2014.

  18. ‘‘Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems,’’ M. Hong, Z.-Q. Luo, and M. Razaviyayn, 2016.

  19. ‘‘A Geometric Alternative to Nesterov's Accelerated Gradient Descent,’’ S. Bubeck, Y. T. Lee, M. Singh, 2015.

  20. ‘‘Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution,’’ C. Ma, K. Wang, Y. Chi, and Y. Chen, Foundations of Computational Mathematics, 2020.

  21. ‘‘Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization,’’ M. Hong, J. Lee, M. Razaviyayn, 2018.

  22. ‘‘Mirror descent in non-convex stochastic programming,’’ Z. Zhou, P. Mertikopoulos, N. Bambos, S. Boyd, P. Glynn, 2017.

  23. ‘‘Gradient Descent Can Take Exponential Time to Escape Saddle Points,’’ S. Du, C. Jin, J. Lee, M. Jordan, B. Poczos, A. Singh, 2017.

  24. ‘‘Accelerating Stochastic Gradient Descent,’’ P. Jain, S. Kakade, R. Kidambi, P. Netrapalli, A. Sidford, 2017.

You are free to select a paper of your own interest (especially more practical papers), subject to the instructor's approval.