ELE522: Large-Scale Optimization for Data Science

Yuxin Chen, Princeton University, Fall 2019

The term project can be either a literature review or original research:

  • Literature review. We will provide a list of related papers not covered in the lectures, and the review should give an in-depth summary and exposition of one of these papers.

  • Original research. It can be either theoretical or experimental (ideally a mix of the two), with approval from the instructor. If you choose this option, you may work either individually or in a group of two. You are encouraged to combine your current research with your term project.

There are two milestones/deliverables to help you through the process.

  1. Proposal (due Nov. 6). Submit a short report (no more than 1 page) stating the papers you plan to survey or the research problems you plan to work on. Describe why they are important or interesting, and provide some appropriate references. If you elect to do original research, please do not propose an overly ambitious project that cannot be completed by the end of the semester, and do not be lured too much by generality: focus on the simplest scenarios that capture the issues you'd like to address.

  2. A written report (due Jan. 13). You are expected to submit a final project report (up to 5 pages, with an unlimited appendix) summarizing your findings. You must email an electronic copy to both the TA and me.

A few suggested (theoretical) papers for literature review (to be updated)

  1. ‘‘On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems,’’ T. Lin, C. Jin, and M. Jordan, 2019.

  2. ‘‘Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent,’’ A. Dalalyan, Conference on Learning Theory, 2017.

  3. ‘‘Variance-reduced Q-learning is minimax optimal,’’ M. Wainwright, 2019.

  4. ‘‘Stochastic approximation with cone-contractive operators: Sharp ℓ∞-bounds for Q-learning,’’ M. Wainwright, 2019.

  5. ‘‘Sharp analysis for nonconvex SGD escaping from saddle points,’’ C. Fang, Z. Lin, and T. Zhang, 2019.

  6. ‘‘SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator,’’ C. Fang, C. Li, Z. Lin, T. Zhang, 2018.

  7. ‘‘Plug-and-Play Methods Provably Converge with Properly Trained Denoisers,’’ E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, W. Yin, 2019.

  8. ‘‘Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,’’ X. Chen, J. Liu, Z. Wang, W. Yin, 2018.

  9. ‘‘What Can ResNet Learn Efficiently, Going Beyond Kernels?’’ Z. Allen-Zhu, Y. Li, 2019.

  10. ‘‘Can SGD Learn Recurrent Neural Networks with Provable Generalization?’’ Z. Allen-Zhu, Y. Li, 2019.

  11. ‘‘Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence,’’ V. Charisopoulos, Y. Chen, D. Davis, M. Diaz, L. Ding, D. Drusvyatskiy, 2019.

  12. ‘‘Stochastic methods for composite and weakly convex optimization problems,’’ J. Duchi, F. Ruan, SIAM Journal on Optimization, 2018.

  13. ‘‘EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization,’’ W. Shi, Q. Ling, G. Wu, W. Yin, 2014.

  14. ‘‘Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems,’’ M. Hong, Z.-Q. Luo, and M. Razaviyayn, 2016.

  15. ‘‘How to Escape Saddle Points Efficiently,’’ C. Jin, R. Ge, P. Netrapalli, S. Kakade, M. Jordan, 2017.

  16. ‘‘Natasha 2: Faster Non-Convex Optimization Than SGD,’’ Z. Allen-Zhu, 2017.

  17. ‘‘A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights,’’ W. Su, S. Boyd, and E. J. Candes, 2015.

  18. ‘‘A Geometric Alternative to Nesterov's Accelerated Gradient Descent,’’ S. Bubeck, Y. T. Lee, M. Singh, 2015.

  19. ‘‘Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets,’’ D. Garber, E. Hazan, 2014.

  20. ‘‘The landscape of empirical risk for non-convex losses,’’ S. Mei, Y. Bai, and A. Montanari, 2016.

  21. ‘‘Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima,’’ P. Loh and M. Wainwright, 2013.

  22. ‘‘Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution,’’ C. Ma, K. Wang, Y. Chi, and Y. Chen, 2017.

  23. ‘‘Adaptive restart for accelerated gradient schemes,’’ B. O'Donoghue and E. Candes, 2015.

  24. ‘‘Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization,’’ M. Hong, J. Lee, M. Razaviyayn, 2018.

  25. ‘‘Mirror descent in non-convex stochastic programming,’’ Z. Zhou, P. Mertikopoulos, N. Bambos, S. Boyd, P. Glynn, 2017.

  26. ‘‘Gradient Descent Can Take Exponential Time to Escape Saddle Points,’’ S. Du, C. Jin, J. Lee, M. Jordan, B. Poczos, A. Singh, 2017.

  27. ‘‘Accelerating Stochastic Gradient Descent,’’ P. Jain, S. Kakade, R. Kidambi, P. Netrapalli, A. Sidford, 2017.

  28. ‘‘Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming,’’ S. Ghadimi, G. Lan, 2013.

  29. ‘‘On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition,’’ M. Mondelli, A. Montanari, 2018.

  30. ‘‘Learning One-hidden-layer Neural Networks with Landscape Design,’’ R. Ge, J. Lee, T. Ma, 2017.

  31. ‘‘Gradient Descent Learns Linear Dynamical Systems,’’ M. Hardt, T. Ma, B. Recht, 2016.

  32. ‘‘Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima,’’ J. Chen, X. Li, 2017.

  33. ‘‘On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions,’’ S. Lu, M. Hong, Z. Wang, 2018.

  34. ‘‘On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization,’’ S. Arora, N. Cohen, E. Hazan, 2018.

  35. ‘‘Characterizing Implicit Bias in Terms of Optimization Geometry,’’ S. Gunasekar, J. Lee, D. Soudry, N. Srebro, 2018.

  36. ‘‘An Alternative View: When Does SGD Escape Local Minima?’’ R. Kleinberg, Y. Li, Y. Yuan, 2018.

  37. ‘‘Stochastic Cubic Regularization for Fast Nonconvex Optimization,’’ N. Tripuraneni, M. Stern, C. Jin, J. Regier, M. Jordan, 2017.

You are also free to select a paper of your own interest (especially a more practical paper), subject to the instructor's approval.