ELE522: Large-Scale Optimization for Data Science
The term project can either be a literature review or include original research:
Literature review. We will provide a list of related papers not covered in the lectures, and the literature review should involve in-depth summaries and exposition of one of these papers.
Original research. It can be either theoretic or experimental (ideally a mix of the two), with approval from the instructor. If you choose this option, you can do it either individually or in groups of two. You are encouraged to combine your current research with your term project.
There are 2 milestones / deliverables to help you through the process.
Proposal (due Nov. 6). Submit a short report (no more than 1 page) stating the papers you plan to survey or the research problems that you plan to work on. Describe why they are important or interesting, and provide some appropriate references. If you elect to do original research, please do not propose an overly ambitious project that cannot be completed by the end of the semester, and do not be too lured by generality. Focus on the simplest scenarios that can capture the issues you’d like to address.
A written report (due Jan. 13). You are expected to submit a final project report – up to 5 pages with unlimited appendix—summarizing your findings. You must email an electronic copy to both the TA and me.
A few suggested (theoretical) papers for literature review (to be updated)
‘‘On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems,’’ T. Lin, J. Chi, and M. Jordan, 2019.
‘‘Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent,’’ A. Dalalyan, Conference on Learning Theory, 2017.
‘‘Variance-reduced Q-learning is minimax optimal,’’ M. Wainwright, 2019.
‘‘Stochastic approximation with cone-contractive operators: Sharp Linf-bounds for Q-learning,’’ M. Wainwright, 2019.
‘‘Sharp analysis for nonconvex sgd escaping from saddle points,’’ C. Fang, Z. Lin, and T. Zhang, 2019.
‘‘Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator,’’ C. Fang, C. Li, Z. Lin, T. Zhang, 2018.
‘‘Plug-and-Play Methods Provably Converge with Properly Trained Denoisers,’’ E. Ryu, J. Liu, S. Wang, X. Chen, Z. Wang, W. Yin, 2019.
‘‘Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds,’’ X. Chen, J. Liu, Z. Wang, W. Yin, 2018.
‘‘What Can ResNet Learn Efficiently, Going Beyond Kernels?’’ Z. Allen-Zhu, Y. Li, 2019.
‘‘Can SGD Learn Recurrent Neural Networks with Provable Generalization?’’ Z. Allen-Zhu, Y. Li, 2019.
‘‘Low-rank matrix recovery with composite optimization: good conditioning and rapid convergence,’’ V. Charisopoulos, Y. Chen, D. Davis, M. Diaz, L. Ding, D. Drusvyatskiy, 2019.
‘‘Stochastic methods for composite and weakly convex optimization problems,’’ J. Duchi, R. Feng, SIAM Journal on Optimization, 2018.
‘‘EXTRA: An Exact First-Order Algorithm for Decentralized Consensus Optimization,’’ W. Shi, Q. Ling, G. Wu, W. Yin, 2014.
‘‘Convergence Analysis of Alternating Direction Method of Multipliers for a Family of Nonconvex Problems, ’’ M. Hong, Z.Q. Luo, and M. Razaviyayn, 2016
‘‘How to Escape Saddle Points Efficiently, ’’ C. Jin, R. Ge, P. Netrapalli, S. Kakade, M. Jordan, 2017
‘‘Natasha 2: Faster Non-Convex Optimization Than SGD, ’’ Z. Allen-Zhu, 2017
‘‘A Differential Equation for Modeling Nesterov's Accelerated Gradient Method: Theory and Insights, ’’ W. Su, S. Boyd and E. J. Candes, 2015
‘‘A Geometric Alternative to Nesterov's Accelerated Gradient Descent, ’’ S. Bubeck, Y. T. Lee, M Singh, 2015
‘‘Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets, ’’ D. Garber, E. Hazan, 2014
‘‘The landscape of empirical risk for non-convex losses,’’ S. Mei, Y. Bai, and A. Montanari, 2016.
‘‘Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima, ’’ P. Loh and M. Wainwright, 2013.
‘‘Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval, Matrix Completion and Blind Deconvolution, ’’ C. Ma, K. Wang, Y. Chi, and Y. Chen, 2017
‘‘Adaptive restart for accelerated gradient schemes, ’’ B. O'Donoghue and E. Candes, 2015
‘‘Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solutions for Nonconvex Distributed Optimization, ’’ M. Hong, J. Lee, M Razaviyayn, 2018
‘‘Mirror descent in non-convex stochastic programming, ’’ Z. Zhou, P. Mertikopoulos, N. Bambos, S. Boyd, P. Glynn, 2017
‘‘Gradient Descent Can Take Exponential Time to Escape Saddle Points, ’’ S. Du, C. Jin, J. Lee, M. Jordan, B. Poczos, A. Singh, 2017
‘‘Accelerating Stochastic Gradient Descent, ’’ P. Jain, S. Kakade, R. Kidambi, P. Netrapalli, A. Sidford, 2017
‘‘Accelerated Gradient Methods for Nonconvex Nonlinear and Stochastic Programming, ’’ S. Ghadimi, G. Lan, 2013
‘‘On the Connection Between Learning Two-Layers Neural Networks and Tensor Decomposition, ’’ M. Mondelli, A. Montanari, 2018
‘‘Learning One-hidden-layer Neural Networks with Landscape Design, ’’ R. Ge, J. Lee, T. Ma, 2017
‘‘Gradient Descent Learns Linear Dynamical Systems, ’’ M. Hardt, T. Ma, B. Recht, 2016
‘‘Memory-efficient Kernel PCA via Partial Matrix Sampling and Nonconvex Optimization: a Model-free Analysis of Local Minima, ’’ J. Chen, X. Li, 2017
‘‘On the Sublinear Convergence of Randomly Perturbed Alternating Gradient Descent to Second Order Stationary Solutions, ’’ S. Lu, M. Hong, Z. Wang, 2018
‘‘On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization, ’’ S. Arora, N. Cohen, E. Hazan, 2018
‘‘Characterizing Implicit Bias in Terms of Optimization Geometry, ’’ S. Gunasekar, J. Lee, D. Soudry, N. Srebro, 2018
‘‘An Alternative View: When Does SGD Escape Local Minima? ’’ R. Kleinberg, Y. Li, Y. Yuan, 2018
‘‘Stochastic Cubic Regularization for Fast Nonconvex Optimization, ’’ N. Tripuraneni, M. Stern, C. Jin, J. Regier, M. Jordan
You have the freedom to select a paper of your own interest (especially more practical papers), upon the instructor's approval.
|