Keywords: Non-convex optimization, rank-constrained semidefinite programming, optimization on smooth manifolds

Overview: The so-called Burer-Monteiro method is an extremely popular heuristic for solving semidefinite programs (SDPs) in practice, and there has been much recent interest in obtaining a better theoretical understanding of its empirical success. The method, pioneered by Burer and Monteiro in the early 2000s [BM03, BM05], involves solving an SDP of the form

\[\begin{align} \label{eq:SDP} \tag{SDP} \begin{split} &\min_{X \in \mathbb{S}^{n \times n}} \langle C, X \rangle \\ \text{s.t. } &\langle B_i, X \rangle = b_i \text{ for all $1 \le i \le m$}, \\ &X \succeq 0, \end{split} \end{align}\]

by instead solving the factorized “reformulation”

\[\begin{align} \label{eq:BM} \tag{BM} \begin{split} &\min_{Y \in \mathbb{R}^{n \times p}} \langle C, YY^\top \rangle \\ \text{s.t. } &\langle B_i, YY^\top \rangle = b_i \text{ for all $1 \le i \le m$}. \\ \end{split} \end{align}\]

Here, \(C, B_i \in \mathbb{S}^{n \times n}\) are symmetric matrices and \(b_i \in \mathbb{R}\).

If \(\bar{Y} \in \mathbb{R}^{n \times p}\) is an optimum for (BM), the hope is that \(\bar{Y} \bar{Y}^\top \in \mathbb{S}^{n \times n}\) is an optimum for (SDP). Thus, the idea is to obtain \(\bar{Y}\) (and therefore \(\bar{Y} \bar{Y}^\top\)) by solving (BM) using nonlinear programming methods like the augmented Lagrangian method (ALM) or manifold-based algorithms like Riemannian gradient descent or a Riemannian trust-region method. The advantage of solving (BM) instead of (SDP) is that when \(p \ll n\), the search space of (BM) is much lower-dimensional than that of (SDP) — roughly \(np\) degrees of freedom instead of \(\binom{n+1}{2}\) — so solving (BM) yields significant time and memory savings. (Indeed, memory is a serious bottleneck for large-scale SDPs.)
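To make this concrete, consider the Max-Cut SDP discussed later, whose constraints are \(X_{ii} = 1\). Under \(X = YY^\top\) these become \(\|y_i\| = 1\) for each row \(y_i\) of \(Y\), so the feasible set of (BM) is a product of unit spheres and Riemannian gradient descent takes a particularly simple form. The following sketch (the helper name, step size, and iteration count are illustrative choices of ours, not from any reference implementation) shows one way it could look in NumPy:

```python
import numpy as np

def maxcut_bm_rgd(C, p, steps=3000, lr=0.05, seed=0):
    """Sketch: Riemannian gradient descent for min <C, Y Y^T> s.t. ||y_i|| = 1.

    This is the Burer-Monteiro factorization of the Max-Cut SDP: the
    constraints X_ii = 1 become unit-norm rows of Y, i.e. Y lives on a
    product of spheres (the "oblique manifold").
    """
    rng = np.random.default_rng(seed)
    n = C.shape[0]
    Y = rng.standard_normal((n, p))
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)      # random feasible start
    for _ in range(steps):
        G = 2.0 * C @ Y                                # Euclidean gradient of <C, Y Y^T>
        G -= np.sum(G * Y, axis=1, keepdims=True) * Y  # project onto tangent space of each sphere
        Y -= lr * G                                    # Riemannian gradient step
        Y /= np.linalg.norm(Y, axis=1, keepdims=True)  # retraction: renormalize rows
    return Y
```

Note that only the \(n \times p\) factor \(Y\) is ever stored, and each iteration costs a single matrix product with \(C\) (cheap when \(C\) is sparse) — this is the time and memory saving mentioned above.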

However, going from (SDP) to (BM) can be viewed as adding a rank constraint (\(\text{rank} (X) \le p\)) to (SDP), so it is not a priori clear whether

  1. the optimal value is preserved after introducing this rank constraint (in other words, whether there exists an optimal solution for (SDP) with rank at most \(p\)), and
  2. (BM) can be solved to global optimality, since it is a non-convex problem.

Regarding the first barrier, Barvinok and Pataki [Bar95, Pat98] showed that if the feasible region of (SDP) is compact, then there is a global optimum of rank at most \(\sqrt{2m}\). Thus, setting \(p \ge \sqrt{2m}\) is provably “safe” with respect to the first barrier (and \(\sqrt{2m} \ll n\) in the vast majority of applications). The second barrier is more serious, but it was shown in the seminal work of Boumal, Voroninski, and Bandeira [BVB16, BVB18] that when \(p \ge \sqrt{2m}\), all second-order critical points of (BM) are globally optimal for generic cost matrices \(C\). (This kind of result is often called “benign non-convexity.”)
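For reference, the precise statement behind the \(\sqrt{2m}\) figure is that a compact (SDP) admits an optimal solution of rank \(r\) with \(r(r+1)/2 \le m\); the smallest provably “safe” \(p\) is therefore the least integer with \(p(p+1)/2 \ge m\), which is at most \(\lceil \sqrt{2m} \rceil\). A minimal sketch (the helper name is ours):

```python
import math

def safe_rank(m):
    """Smallest p with p(p+1)/2 >= m. By the Barvinok-Pataki bound, a
    compact SDP with m constraints has an optimum of rank at most p,
    so this p is "safe" for the Burer-Monteiro factorization.
    """
    return math.ceil((math.sqrt(8 * m + 1) - 1) / 2)
```

For example, the Max-Cut SDP on \(n\) vertices has \(m = n\) constraints, so a factorization rank of roughly \(\sqrt{2n}\) already suffices.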

Combining these results, for generic cost matrices and \(p \ge \sqrt{2m}\), there are algorithms that provably converge to a global optimum \(\bar{Y}\) of (BM), for which \(\bar{Y} \bar{Y}^\top\) is a global optimum of (SDP). A line of follow-up work [BBJN18, PJB18, CM21] gives polynomial-time convergence guarantees for smoothed instances. However, it was unknown whether the “generic cost matrices” caveat was even necessary. Indeed, the authors of [BVB18] comment (p. 22) that resolving whether this restriction is necessary is key to gaining a deeper understanding of the relationship between (SDP) and (BM).

Our main result (Theorem 1) shows that this restriction is necessary by constructing instances of (BM) with spurious local minima when \(p\) is as large as \(\Theta(m)\). We use a novel technique involving Riemannian gradient descent to prove local minimality, and furthermore, our construction is for one of the simplest and most canonical instances of (BM): that arising from the famed Goemans-Williamson Max-Cut SDP [GW95]. Our result also complements a result of [BVB18] showing that when \(p\) is any larger than the value used in our main construction, there are provably no spurious second-order critical points for any cost matrix. That is to say, we “complete” the story of how benign the non-convexity of (BM) is for different ranges of the parameter \(p\).

At a high level, this work provides a more complete picture of the optimization landscape of (BM) (particularly the important instances arising from the Max-Cut SDP). Furthermore, it justifies the use of beyond worst-case paradigms like smoothed analysis to obtain global convergence guarantees for the Burer-Monteiro method.