Keywords: Non-convex optimization, rank-constrained semidefinite programming, optimization on smooth manifolds

Overview: The so-called Burer-Monteiro method is an extremely popular heuristic for solving semidefinite programs (SDPs) in practice, and there has been much recent interest in obtaining a better theoretical understanding of its empirical success. The method, pioneered by Burer and Monteiro in the early 2000s [BM03, BM05], involves solving an SDP of the form

\[\begin{align} \label{eq:SDP} \tag{SDP} \begin{split} &\min_{X \in \mathbb{S}^{n \times n}} \langle C, X \rangle \\ \text{s.t. } &\langle B_i, X \rangle = b_i \text{ for all $1 \le i \le m$}, \\ &X \succeq 0, \end{split} \end{align}\]

by instead solving the factorized “reformulation”

\[\begin{align} \label{eq:BM} \tag{BM} \begin{split} &\min_{Y \in \mathbb{R}^{n \times p}} \langle C, YY^\top \rangle \\ \text{s.t. } &\langle B_i, YY^\top \rangle = b_i \text{ for all $1 \le i \le m$}. \\ \end{split} \end{align}\]

Here, \(C, B_i \in \mathbb{S}^{n \times n}\) are symmetric matrices and \(b_i \in \mathbb{R}\).

If \(\bar{Y} \in \mathbb{R}^{n \times p}\) is an optimum for (BM), the hope is that \(\bar{Y} \bar{Y}^\top \in \mathbb{S}^{n \times n}\) is an optimum for (SDP). Thus, the idea is to obtain \(\bar{Y}\) (and therefore \(\bar{Y} \bar{Y}^\top\)) by solving (BM) using nonlinear programming methods like the augmented Lagrangian method (ALM) or manifold-based algorithms like Riemannian gradient descent or a Riemannian trust-region method. The advantage of solving (BM) instead of (SDP) is that when \(p \ll n\), the search space of (BM) is much lower-dimensional than that of (SDP) — roughly \(np\) degrees of freedom instead of \(\binom{n+1}{2}\) — so solving (BM) yields significant time and memory savings. (Indeed, memory is a serious bottleneck for large-scale SDPs.)
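To make this concrete, consider the Max-Cut SDP discussed later, whose constraints are \(X_{ii} = 1\). Under \(X = YY^\top\) these become \(\|y_i\| = 1\) for each row \(y_i\) of \(Y\), so the feasible set of (BM) is a product of unit spheres and Riemannian gradient descent takes a particularly simple form. The following sketch (the helper name, step size, and iteration count are illustrative choices of ours, not from any reference implementation) shows one way it could look in NumPy:

```python
import numpy as np

def maxcut_bm_rgd(C, p, steps=3000, lr=0.05, seed=0):
    """Sketch: Riemannian gradient descent for min <C, Y Y^T> s.t. ||y_i|| = 1.

    This is the Burer-Monteiro factorization of the Max-Cut SDP: the
    constraints X_ii = 1 become unit-norm rows of Y, i.e. Y lives on a
    product of spheres (the "oblique manifold").
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    Y = rng.standard_normal((n, p))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # random feasible start
    for _ in range(steps):
        G = 2.0 * C @ Y                                # Euclidean gradient of <C, Y Y^T>
        G -= np.sum(G * Y, axis=1, keepdims=True) * Y  # project onto tangent space of each sphere
        Y -= lr * G                                    # Riemannian gradient step
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # retraction: renormalize rows
    return Y
```

Note that only the \(n \times p\) factor \(Y\) is ever stored, and each iteration costs a single matrix product with \(C\) (cheap when \(C\) is sparse) — this is the time and memory saving mentioned above.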

However, going from (SDP) to (BM) can be viewed as adding a rank constraint (\(\text{rank} (X) \le p\)) to (SDP), so it is not a priori clear whether

  1. the optimal value is preserved after introducing this rank constraint (in other words, whether there exists an optimal solution for (SDP) with rank at most \(p\)), and
  2. (BM) can be solved to global optimality, since it is a non-convex problem.

Regarding the first barrier, Barvinok and Pataki [Bar95, Pat98] showed that if the feasible region of (SDP) is compact, then there is a global optimum of rank at most \(\sqrt{2m}\). Thus, setting \(p \ge \sqrt{2m}\) is provably “safe” with respect to the first barrier (and \(\sqrt{2m} \ll n\) in the vast majority of applications). The second barrier is more serious, but it was shown in the seminal work of Boumal, Voroninski, and Bandeira [BVB16, BVB18] that when \(p \ge \sqrt{2m}\), all second-order critical points of (BM) are globally optimal for generic cost matrices \(C\). (This kind of result is often called “benign non-convexity.”)
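For reference, the precise statement behind the \(\sqrt{2m}\) figure is that a compact (SDP) admits an optimal solution of rank \(r\) with \(r(r+1)/2 \le m\); the smallest provably “safe” \(p\) is therefore the least integer with \(p(p+1)/2 \ge m\), which is at most \(\lceil \sqrt{2m} \rceil\). A minimal sketch (the helper name is ours):

```python
import math

def safe_rank(m):
    """Smallest p with p(p+1)/2 >= m. By the Barvinok-Pataki bound, a
    compact SDP with m constraints has an optimum of rank at most p,
    so this p is "safe" for the Burer-Monteiro factorization.
    """
    return math.ceil((math.sqrt(8 * m + 1) - 1) / 2)
```

For example, the Max-Cut SDP on \(n\) vertices has \(m = n\) constraints, so a factorization rank of roughly \(\sqrt{2n}\) already suffices.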

Combining these results, for generic cost matrices and \(p \ge \sqrt{2m}\), there are algorithms that provably converge to a global optimum \(\bar{Y}\) of (BM), for which \(\bar{Y} \bar{Y}^\top\) is a global optimum of (SDP). A line of follow-up work [BBJN18, PJB18, CM21] gives polynomial-time convergence guarantees for smoothed instances. However, it was unknown whether the “generic cost matrices” caveat was even necessary. Indeed, the authors of [BVB18] comment (p. 22) that resolving whether this restriction is necessary is key to gaining a deeper understanding of the relationship between (SDP) and (BM).

Our main result (Theorem 1) shows that this restriction is necessary by constructing instances of (BM) with spurious local minima when \(p\) is as large as \(\Theta(m)\). We use a novel technique involving Riemannian gradient descent to prove local minimality, and furthermore, our construction is for one of the simplest and most canonical instances of (BM): that arising from the famed Goemans-Williamson Max-Cut SDP [GW95]. Our result also complements a result of [BVB18] showing that when \(p\) is any larger than the value used in our main construction, there are provably no spurious second-order critical points for any cost matrix. That is to say, we “complete” the story of how benign the non-convexity of (BM) is for different ranges of the parameter \(p\).

At a high level, this work provides a more complete picture of the optimization landscape of (BM) (particularly the important instances arising from the Max-Cut SDP). Furthermore, it justifies the use of beyond worst-case paradigms like smoothed analysis to obtain global convergence guarantees for the Burer-Monteiro method.