Least-squares with constraints#

In many applications, optimisation problems are posed with constraints. We consider here constrained least-squares problems of the form

\[\begin{split} \begin{array}{ll} \min_x & \| A x - a \|_2 \\ \text{subject to} &\\ & C x = c,\\ & D x \ge d. \end{array} \end{split}\]

where \(a \in \mathbb{R}^m\), \(c \in \mathbb{R}^p\), \(d \in \mathbb{R}^q\), \(C \in \mathbb{R}^{p \times n}\), \(D \in \mathbb{R}^{q \times n}\) and \(A \in \mathbb{R}^{m \times n}\). The expression \(D x \ge d\) is to be understood componentwise: for \(d, e \in \mathbb{R}^q\),

\[ d \ge e \qquad \Leftrightarrow \qquad d_i \ge e_i \quad \forall \; i \in \{1, \ldots, q\}. \]

Because of the constraints, the techniques from the previous section do not apply directly, as the problem is inherently nonlinear.

Setting \(B = A^\top A\) and \(b = - A^\top a\), we rewrite the above least-squares problem in the generic form of a quadratic programme.

Primal problem (quadratic programme)

\[\begin{split} \begin{array}{ll} \min_x & \frac{1}{2} x^\top B x + b^\top x\\ \text{subject to} &\\ & C x - c = 0,\\ & D x - d \ge 0. \end{array} \end{split}\]

Both problems have the same minimisers because

\[ \textstyle \frac{1}{2} \| A x - a \|_2^2 = \frac{1}{2} x^\top A^\top A x - a^\top A x + \frac{1}{2} a^\top a = \frac{1}{2} x^\top B x + b^\top x + \frac{1}{2} a^\top a, \]

noting that the constant term \(\frac{1}{2} a^\top a\) can be removed without changing the set of minimisers. Since \(B = A^\top A\), the matrix \(B\) is symmetric and positive semi-definite, and we shall assume this throughout the section. We refer to \(\frac{1}{2} x^\top B x + b^\top x\) as the objective function, to \(C x - c = 0\) as the equality constraint and to \(D x - d \ge 0\) as the inequality constraint. The value of the optimisation problem is the value of the objective function at an optimal solution.

Observe that the value of the initial problem in least-squares form differs from that of the quadratic programme because of the factor \(\frac{1}{2}\), the squaring of the norm and the constant term, even though the sets of optimal solutions coincide. However, given an optimal solution \(x\), it is simple to compute the value \(\| A x - a \|_2\).
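The snippet below is a quick numerical sanity check of this rewriting, with randomly generated data for illustration; the names A, a, B and b match the text.

import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n))
a = rng.standard_normal(m)
x = rng.standard_normal(n)  # an arbitrary point, not necessarily optimal

B = A.T @ A
b = -A.T @ a

lhs = 0.5 * np.linalg.norm(A @ x - a, 2) ** 2
rhs = 0.5 * x @ B @ x + b @ x + 0.5 * a @ a
print(np.isclose(lhs, rhs))  # True: the two objectives agree once the constant term is kept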

Certificates of optimality#

In this section, we learn about certificates of optimality, which address a fundamental point: the algorithms to solve constrained least-squares problems are outside the scope of this module. How, then, can you solve such problems with confidence? More generally, how can you use numerical methods that you have not fully understood mathematically?

This is a question you face now, as you are just beginning with computational mathematics and want to use a wide range of methods as quickly as possible. But even as a seasoned numerical analyst, you will need to think about deploying mathematical algorithms in application scenarios where users are not domain experts.

A key realisation is that verifying whether a candidate solution is indeed equal to (or at least a good approximation of) an exact solution is often much simpler than finding such a candidate in the first place.

You are familiar with this situation from other areas of mathematics: For

\[ F(t) = \int_0^t f(x) \, dx, \]

it can be hard to find \(F\) given \(f\), but once a candidate \(F\) is found, it is relatively easy to check that its derivative is \(f\). Similarly, various encryption methods rely on the fact that it is easier to multiply two prime numbers than to find the prime factors of a given product.

A certificate of optimality consists of a candidate solution together with additional information that makes it relatively easy to confirm the quality of the candidate, even if the method by which it was obtained is not examined. Similar concepts appear in other areas of numerical analysis and scientific computing; for example, a posteriori error bounds are used to appraise the quality of numerical solutions of differential equations.

Duality#

There are different types of certificates of optimality; many of them are based on the important concept of duality, which plays a key role throughout scientific computing; differential equations again serve as a good example.

The Lagrangian function associated with the primal problem is

\[ L(x, y, s) := \frac{1}{2} x^\top B x + b^\top x - y^\top (C x - c) - s^\top (D x - d). \]

The variables \(y\) and \(s\) are called Lagrange multipliers. All information about the primal problem is encoded within the Lagrangian function through the observation that for any given vector \(x\) (self-check question)

\[\begin{split} \max_{y, s \atop s \geq 0} L(x, y, s) = \begin{cases} \frac{1}{2} x^\top B x + b^\top x & \text{if $C x = c$ and $D x \geq d$,}\\ +\infty & \text{otherwise.} \end{cases} \end{split}\]
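To see this identity in action, the sketch below evaluates the Lagrangian on a small invented instance (all data here are our own choices for demonstration): for a feasible \(x\) the maximum over the multipliers is attained at \(s = 0\), while for an infeasible \(x\) scaling a multiplier drives \(L\) towards \(+\infty\).

import numpy as np

# Hypothetical small instance, chosen only to illustrate the identity
B = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([-1.0, -1.0])
C = np.array([[1.0, 1.0]]); c = np.array([1.0])
D = np.eye(2);              d = np.zeros(2)

def lagrangian(x, y, s):
    return 0.5 * x @ B @ x + b @ x - y @ (C @ x - c) - s @ (D @ x - d)

x_feas = np.array([0.5, 0.5])    # satisfies C x = c and D x >= d
x_infeas = np.array([2.0, 2.0])  # violates C x = c

# For feasible x the supremum is attained at s = 0 (and any y):
print(lagrangian(x_feas, np.zeros(1), np.zeros(2)))
# For infeasible x, scaling the multiplier y blows the Lagrangian up:
for t in [1, 10, 100]:
    print(lagrangian(x_infeas, -t * np.ones(1), np.zeros(2)))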

Therefore, the primal problem can be recast in a more abstract form.

Primal problem (abstract form)

\[ \min_x \max_{y, s \atop s \geq 0} L(x, y, s) \]

The dual problem is obtained by reversing the order of the min and max operations.

Dual problem (abstract form)

\[ \max_{y, s \atop s \geq 0} \min_x L(x, y, s) \]

Within mathematics, we speak of duality when we identify two structures \(\mathcal{A}\) and \(\mathcal{B}\) and a transformation \(\Phi\) with \(\Phi(\mathcal{A}) = \mathcal{B}\) and \(\Phi(\mathcal{B}) = \mathcal{A}\). Here \(\Phi\) corresponds to the reversal of \(\max\) and \(\min\). Now convince yourself (self-check question) that the dual problem can be written as follows:

Dual problem (quadratic programme)

\[\begin{split} \begin{array}{ll} \max_{x,y,s} & c^\top y + d^\top s - \frac{1}{2} x^\top B x\\ \text{subject to} &\\ & C^\top y + D^\top s - B x = b,\\ & s \ge 0. \end{array} \end{split}\]

We call \(x\) a feasible point of the primal problem if \(C x = c\) and \(D x \ge d\). Similarly, \((x,y,s)\) is feasible for the dual problem if \(C^\top y + D^\top s - B x = b\) and \(s \ge 0\). In short, feasible points are those that satisfy the constraints.

There is a fundamental connection between the primal and dual problems.

Theorem 7 (Weak duality)

Assume that \(\bar{x}\) and \((x,y,s)\) are feasible points of the primal and dual problems, respectively. Then

\[ c^\top y + d^\top s - \frac{1}{2} x^\top B x \leq \frac{1}{2} \bar{x}^\top B \bar{x} + b^\top \bar{x}. \]

In other words, among feasible points, the objective function of the dual problem attains values less than or equal to those of the primal objective function.

Proof. For feasible \(\bar{x}\) and \((x,y,s)\),

\[\begin{split} \begin{align*} c^\top y + d^\top s - \frac{1}{2} x^\top B x \, \leq \, & (C \bar{x})^\top y + (D \bar{x})^\top s - \frac{1}{2} x^\top B x\\ = \, & (C^\top y + D^\top s)^\top \bar{x} - \frac{1}{2} x^\top B x\\ = \, & (b + B x)^\top \bar{x} - \frac{1}{2} x^\top B x\\ = \, & \frac{1}{2} \bar{x}^\top B \bar{x} + b^\top \bar{x} - \frac{1}{2} (x - \bar{x})^\top B (x - \bar{x})\\ \leq \, & \frac{1}{2} \bar{x}^\top B \bar{x} + b^\top \bar{x}. \end{align*} \end{split}\]

Bearing the weak duality theorem in mind, we call a pair \((\bar{x}, (x,y,s))\) a certificate of optimality. To check that \(\bar{x}\) is indeed a solution to the primal problem, we need to do the following:

  1. Check that \(\bar{x}\) is a feasible point of the primal problem.

  2. Check that \((x,y,s)\) is a feasible point of the dual problem.

  3. Check that \(c^\top y + d^\top s - \frac{1}{2} x^\top B x = \frac{1}{2} \bar{x}^\top B \bar{x} + b^\top \bar{x}\).

All these steps only require the evaluation of simple matrix-vector and scalar products.
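Concretely, the three checks amount to a few lines of NumPy. The following is a minimal sketch; the helper name check_certificate and the tolerance argument tol are our own choices, and the tolerance should reflect the accuracy of the numbers being checked.

import numpy as np

def check_certificate(x_bar, x, y, s, B, b, C, c, D, d, tol=1e-8):
    """Verify a certificate of optimality (x_bar, (x, y, s)) up to tolerance tol."""
    # 1. Primal feasibility of x_bar
    primal_ok = np.allclose(C @ x_bar, c, atol=tol) and np.all(D @ x_bar >= d - tol)
    # 2. Dual feasibility of (x, y, s)
    dual_ok = np.allclose(C.T @ y + D.T @ s - B @ x, b, atol=tol) and np.all(s >= -tol)
    # 3. Equal objective values (zero duality gap)
    primal_val = 0.5 * x_bar @ B @ x_bar + b @ x_bar
    dual_val = c @ y + d @ s - 0.5 * x @ B @ x
    gap_ok = abs(primal_val - dual_val) <= tol
    return primal_ok and dual_ok and gap_ok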

Why does this work? Suppose that \(\bar{x}\) and \((x,y,s)\) passed the above checks 1, 2 and 3, but \(\bar{x}\) was not an optimal solution. Then there would exist another feasible point \(\tilde{x}\) of the primal problem with an even lower value:

\[ \frac{1}{2} \tilde{x}^\top B \tilde{x} + b^\top \tilde{x} < \frac{1}{2} \bar{x}^\top B \bar{x} + b^\top \bar{x} = c^\top y + d^\top s - \frac{1}{2} x^\top B x. \]

This is impossible by weak duality, and therefore \(\bar{x}\) must be optimal.

With \(\bar{x}\) and \((x,y,s)\) optimal, the difference

\[ \underbrace{\frac{1}{2} \bar{x}^\top B \bar{x} + b^\top \bar{x}}_{\text{optimal value of the primal problem}} - \underbrace{\left( c^\top y + d^\top s - \frac{1}{2} x^\top B x \right)}_{\text{optimal value of the dual problem}} \]

is called the duality gap.

One might wonder if certificates of the type \((\bar{x}, (x,y,s))\) are too restrictive: Could there be optimisation problems for which the duality gap exceeds 0? Then the above certificate of optimality would not work even if optimal solutions are found (self-check question). The following deep result, known as strong duality, shows that this cannot happen for constrained least-squares problems.

Theorem 8 (Strong duality)

Assume that either the primal or dual problem has a feasible point. Then this problem is bounded if and only if the other has a feasible point. In such cases, both problems have optimal solutions and share the same optimal value.

Proof. The proof is beyond the scope of the lecture notes. See [Nocedal and Wright, 2006] for details.

In contrast, many optimisation problems, for example those with non-convex objective functions or constraints, exhibit a positive duality gap, making it more difficult to certify optimality.

Python skills#

CVXPY is a Python library designed to make the construction and solving of quadratic and other convex optimisation problems straightforward. Its high-level syntax allows users to specify an objective and constraints in a manner similar to writing down the mathematical formulation. Under the hood, CVXPY automatically transforms the problem into a canonical form, selects an appropriate solver, and returns primal and dual solutions, all while handling domain restrictions, dual variables, and convergence tolerances. This streamlined workflow makes it an ideal tool for both instructional examples and real-world applications.

Least-squares problems#

The code below illustrates a template for solving least-squares problems of the type discussed above. Here, A, a, C, c, D and d are initialised randomly for demonstration. By substituting specific data, one can easily solve practical least-squares problems.

import cvxpy as cp
import numpy as np

# ------------------------------------------------
# Example data (adjust as appropriate).
# ------------------------------------------------
n, m, p, q = 5, 6, 2, 3
np.random.seed(0)

# Construct a least-squares setup:
A = np.random.randn(m, n)
a = np.random.randn(m)
B = A.T @ A
b_vec = -A.T @ a

# Constraints:
#   Cx - c = 0  ->  in CVXPY: C@x == c
#   Dx - d >= 0 ->  in CVXPY: D@x >= d
C = np.random.randn(p, n)
c = np.random.randn(p)
D = np.random.randn(q, n)
d = np.random.randn(q)

# Define the variable and constraints in CVXPY
x = cp.Variable(n)
constraints = [
    C @ x == c,  # equality
    D @ x >= d   # inequality
]

# Objective
objective = 0.5 * cp.quad_form(x, B) + b_vec @ x

# Formulate the problem
problem = cp.Problem(cp.Minimize(objective), constraints)

# Solve
primal_value = problem.solve(solver=cp.SCS)  # or another solver
print("Primal objective value:", primal_value)
print("Primal solution x:", x.value)

# Original least-squares problem
ls_value = np.linalg.norm(A @ x.value - a, 2)
print("Original least-squares objective:", ls_value)

This code returns the optimiser x, the value of the primal quadratic programme, and the value of the original least-squares problem. To check that CVXPY has indeed found an optimal solution, we also retrieve the dual variables.

# Obtain y and s from the solver.
# CVXPY stores in constraints[0].dual_value the variable -y instead of y
y = -constraints[0].dual_value  
s = constraints[1].dual_value
print("Dual variable y (equality):", y)
print("Dual variable s (inequality):", s)

# Compute x from the dual problem:
rhs = (C.T @ y) + (D.T @ s) - b_vec
xDual = np.linalg.pinv(B) @ rhs  # Use the pseudo-inverse if B is semidefinite.

# Evaluate the dual objective
dual_value = (c @ y) + (d @ s) - 0.5*(xDual.T @ B @ xDual)
print("Dual objective value:", dual_value)

# Duality gap
gap = primal_value - dual_value
print("Duality gap:", gap)

Note that CVXPY does not directly return the \(x\) of the dual problem; it merely provides \(y\) and \(s\). We recover the dual \(x\) from the dual equality constraint. The pseudo-inverse (pinv) is needed if B is only positive semidefinite. As always, a near-zero duality gap (up to solver tolerance) certifies that the computed solution is optimal.
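As a sketch, the full three-step verification can be run on the solver output, reusing the check_certificate helper introduced in the certificates section (our own construction); the tolerance is loosened here because first-order solvers such as SCS return solutions of moderate accuracy, and may need further adjusting.

# Hedged: assumes the check_certificate helper from above is in scope.
ok = check_certificate(x.value, xDual, y, s, B, b_vec, C, c, D, d, tol=1e-4)
print("Certificate valid:", ok)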

Self-check questions#

Question

Consider the least-squares problem in \(\mathbb{R}^2\)

\[\begin{split} \begin{array}{ll} \min_x & \| A x - a \|_2 \\ \text{subject to} &\\ & x_1 + x_2 = 1,\\ & x \ge 0 \end{array} \end{split}\]

with

\[\begin{split} A = \begin{pmatrix} 2 & 1\\ 1 & 2 \end{pmatrix}, \qquad a = \begin{pmatrix} 1\\ 2 \end{pmatrix}. \end{split}\]

Write down the primal and dual quadratic programme. What is the Lagrangian function?

You are provided with the certificates of optimality

\[\begin{split} \begin{align*} \bar{x}^{(1)} & = (0.5, 0.5)^\top,\\ (x^{(1)},y^{(1)},s^{(1)}) & = ((0.5, 0.5)^\top, 1, (0,0)^\top). \end{align*} \end{split}\]

and

\[\begin{split} \begin{align*} \bar{x}^{(2)} & = (0, 1)^\top,\\ (x^{(2)},y^{(2)},s^{(2)}) & = ((0, 1)^\top, 0, (0,0)^\top). \end{align*} \end{split}\]

Determine for each certificate whether it is a valid proof that \(\bar{x}^{(i)}\) is an optimal solution of the primal problem, for \(i = 1\) and \(i = 2\), respectively.

Answer

Step 1. Primal quadratic programme: First, we rewrite the objective \(\|A x - a\|_2\) in a quadratic form:

\[ \|A x - a\|_2^2 \;=\; (A x - a)^\top (A x - a) \;=\; x^\top (A^\top A)\,x \;-\; 2\,a^\top A\,x \;+\; a^\top a, \]

where \(A^\top = A\) since \(A\) is symmetric. Defining

\[ B := A^\top A \quad\text{and}\quad b := -\,A^\top a, \]

after halving (which does not affect the minimiser), the objective takes the usual form \(\tfrac12\,x^\top B x + b^\top x + \text{const}\). Concretely:

\[\begin{split} B = A^2 = \begin{pmatrix}5 & 4\\ 4 & 5\end{pmatrix}, \qquad b = -\,A a = \begin{pmatrix}-4 \\ -5\end{pmatrix}. \end{split}\]

Ignoring the constant \(a^\top a\) (which does not affect the minimiser), we get the primal quadratic programme:

\[\begin{split} \begin{array}{ll} \displaystyle \min_{x \in \mathbb{R}^2} & \tfrac12 \,x^\top B\,x \;+\; b^\top x \\[6pt] \text{subject to} & x_1 + x_2 = 1,\\ & x_1 \ge 0,\; x_2 \ge 0. \end{array} \end{split}\]

Step 2. Lagrangian function: We define the Lagrangian function with multipliers \(y \in \mathbb{R}\) for the equality and \(s \in \mathbb{R}^2_{\ge 0}\) for \(x \ge 0\):

\[ L(x,y,s) \;=\; \tfrac12\,x^\top B\,x \;+\; b^\top x \;-\; y\,\bigl(x_1 + x_2 \,-\,1\bigr) \;-\; s^\top x. \]

The term \(-\,y\,(x_1+x_2 -1)\) encodes \(x_1 + x_2 = 1\), while \(-\,s^\top x\) encodes \(x \ge 0\) with \(s \ge 0\).

Step 3. Dual quadratic programme: The dual of the least-squares problem is the following quadratic programme:

\[\begin{split} \begin{aligned} \max_{(x,y,s), \; s \ge 0} &&& y \;-\;\tfrac12\,x^\top B\,x \\ \text{subject to} &&& B\,x \;=\; y\,(1,1)^\top \;+\; s \;+\; (4,5)^\top,\quad s \ge 0. \end{aligned} \end{split}\]

where \(c^\top y = 1 \cdot y\) (because the equality constraint is \(x_1+x_2=1\)) and \(d^\top s = 0\) (because \(d = 0\) in the inequality constraints).

Step 4. Checking the first certificate of optimality: We are given a candidate solution \(\bar{x}^{(1)} = (0.5, 0.5)^\top\) and a dual triple \((\,x^{(1)},\,y^{(1)},\,s^{(1)}) = \bigl((0.5,0.5)^\top,\,1,\,(0,0)\bigr)\). We must check three conditions for a valid certificate of optimality:

  1. Primal feasibility of \(\bar{x}\): This holds because \(\bar{x}_1^{(1)} + \bar{x}_2^{(1)} = 0.5 + 0.5 = 1\) and \(\bar{x}_1^{(1)},\,\bar{x}_2^{(1)} \ge 0\).

  2. Dual feasibility of \((x,y,s)\): This does not hold. While we have \(s^{(1)} = (0,0) \ge 0\), the equality constraint is not satisfied:

\[\begin{split} B\,x^{(1)} = \begin{pmatrix}4.5\\4.5\end{pmatrix} \neq y \begin{pmatrix}1\\1\end{pmatrix} + s - b = \begin{pmatrix}1\\1\end{pmatrix} + \begin{pmatrix}0\\0\end{pmatrix} - \begin{pmatrix}-4\\-5\end{pmatrix} = \begin{pmatrix}5\\6\end{pmatrix}. \end{split}\]
  3. Equality of primal and dual objective values (only relevant once both are feasible).

Because the second check fails, this certificate does not ensure that \(\bar{x}^{(1)}\) is optimal. (Note, however, that the failure of certification does not prove that \(\bar{x}^{(1)}\) is not optimal.)

Step 5. Checking the second certificate of optimality: We are given a candidate solution \(\bar{x}^{(2)} = (0, 1)^\top\) and a dual triple \((\,x^{(2)},\,y^{(2)},\,s^{(2)}) = \bigl((0,1)^\top,\,0,\,(0,0)\bigr)\).

  1. Primal feasibility of \(\bar{x}\): This holds because \(\bar{x}_1 + \bar{x}_2 = 0 + 1 = 1\) and \(\bar{x}_1,\,\bar{x}_2 \ge 0\).

  2. Dual feasibility of \((x,y,s)\): This holds because \(s = (0,0)\) satisfies \(s \ge 0\) and

\[\begin{split} B\,x = \begin{pmatrix}4\\5\end{pmatrix} = y \begin{pmatrix}1\\1\end{pmatrix} + s - b = \begin{pmatrix}0\\0\end{pmatrix} + \begin{pmatrix}0\\0\end{pmatrix} - \begin{pmatrix}-4\\-5\end{pmatrix}. \end{split}\]
  3. Equality of primal and dual objective values: This holds because \(\tfrac12 \,\bar{x}^\top B\,\bar{x} \;+\; b^\top \bar{x} = -2.5 = y \;-\;\tfrac12\,x^\top B\,x\).

Therefore, the certificate passed all checks and \(\bar{x}^{(2)}\) is an optimal solution to the primal problem.
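For a numerical confirmation, the snippet below runs the feasibility and gap checks on both certificates with NumPy; the data match the question above.

import numpy as np

# Data of the self-check question
B = np.array([[5.0, 4.0], [4.0, 5.0]])
b = np.array([-4.0, -5.0])
C = np.array([[1.0, 1.0]]); c = np.array([1.0])
D = np.eye(2);              d = np.zeros(2)

certificates = [
    (np.array([0.5, 0.5]), np.array([0.5, 0.5]), np.array([1.0]), np.zeros(2)),
    (np.array([0.0, 1.0]), np.array([0.0, 1.0]), np.array([0.0]), np.zeros(2)),
]
for x_bar, x, y, s in certificates:
    primal_feas = np.isclose(C @ x_bar, c).all() and (D @ x_bar >= d).all()
    dual_feas = np.allclose(C.T @ y + D.T @ s - B @ x, b) and (s >= 0).all()
    gap = (0.5 * x_bar @ B @ x_bar + b @ x_bar) - (c @ y + d @ s - 0.5 * x @ B @ x)
    print(primal_feas, dual_feas, gap)
# The first certificate fails dual feasibility;
# the second passes all checks with zero gap.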

Question

The Lagrangian function associated with the primal problem is

\[ L(x, y, s) := \frac{1}{2} x^\top B x + b^\top x - y^\top (C x - c) - s^\top (D x - d). \]

Show that

\[\begin{split} \max_{y, s \atop s \geq 0} L(x, y, s) = \begin{cases} \frac{1}{2} x^\top B x + b^\top x & \text{if $C x = c$ and $D x \geq d$,}\\ +\infty & \text{otherwise.} \end{cases} \end{split}\]
Answer

If any component \((Cx - c)_i \neq 0\), we can send the corresponding \(y_i\) to \(+\infty\) or \(-\infty\) (depending on the sign of \((Cx - c)_i\)), causing the term \(-\,y_i\,(Cx - c)_i\) to grow without bound. Therefore, to keep the Lagrangian finite, we need \((Cx - c)_i = 0\) for all \(i\), i.e. \(Cx = c\).

Similarly, if \((Dx - d)_j < 0\) for some \(j\), choosing \(s_j \to +\infty\) makes \(-\,s_j\,(Dx - d)_j\) go to \(+\infty\). Hence, to avoid an unbounded Lagrangian, we require \(Dx \ge d\). Thus, the maximum is finite only if \(Cx = c\) and \(Dx \ge d\). In that scenario, we pick \(s = 0\) and any \(y\) to maximise \(L(x,y,s)\), yielding

\[ \max_{y,s \ge 0} L(x,y,s) = \tfrac{1}{2}\,x^\top B\,x + b^\top x. \]

Hence,

\[\begin{split} \max_{y, s \atop s \geq 0} L(x, y, s) = \begin{cases} \tfrac{1}{2} x^\top B x + b^\top x & \text{if } C x = c \text{ and } D x \ge d,\\ +\infty & \text{otherwise.} \end{cases} \end{split}\]

Question

Show that the dual problem (with \(B\) symmetric)

\[ \max_{y, s \atop s \geq 0} \min_x L(x, y, s) = \max_{y, s \atop s \geq 0} \min_x \Big[ \frac{1}{2} x^\top B x + b^\top x - y^\top (C x - c) - s^\top (D x - d) \Big] \]

can be written as

\[\begin{split} \begin{array}{ll} \max_{x,y,s} & c^\top y + d^\top s - \frac{1}{2} x^\top B x\\ \text{subject to} &\\ & C^\top y + D^\top s - B x = b,\\ & s \ge 0. \end{array} \end{split}\]
Answer

We first examine the inner minimisation \(\min_x L(x,y,s)\) for fixed \(y\) and \(s\). Differentiating \(L\) with respect to \(x\) and setting the gradient to zero gives

\[ 0 = \nabla_{\!x} \, L (x,y,s) = Bx + b - C^\top y - D^\top s. \]

If there are two solutions \(x\), \(\tilde{x}\) of \(B x = C^\top y + D^\top s - b\) then \(x^\top B\,x = \tilde{x}^\top B\,\tilde{x}\) because

\[ x^\top B\,x - \tilde{x}^\top B\,\tilde{x} = (x - \tilde{x})^\top (C^\top y + D^\top s - b) = (x - \tilde{x})^\top B\,x = 0 \]

using that the symmetry of \(B\) ensures that \(x - \tilde{x} \in \ker B = \ker B^\top\). Moreover, since \(B\) is positive semi-definite, \(x \mapsto L(x,y,s)\) is convex, so its stationary points are exactly its minimisers. Therefore the solutions \(\tilde{x}\) of \(B x = C^\top y + D^\top s - b\) are exactly the minimisers of the inner minimisation problem. Hence for such \(\tilde{x}\) we find

\[ \min_x L(x,y,s) \;=\; c^\top y \;+\; d^\top s \;-\; \tfrac12 \,\tilde{x}^\top B\,\tilde{x}, \]

using the simplification of \(L\) arising from \(B \tilde{x} = C^\top y + D^\top s - b\). If \(B\,x = C^\top y + D^\top s - b\) has no solution then \(x \mapsto L(x,y,s)\) has no stationary point and \(\min_x L(x,y,s) = - \infty\).

We turn to the outer maximisation over all \(y\) and \(s \ge 0\). If there exist \(x\), \(y\), \(s\) with \(s \ge 0\) such that \(Bx = C^\top y + D^\top s - b\), then the maximum is attained among those triples; pairs \((y,s)\) for which no such \(x\) exists contribute the value \(-\infty\) and can be discarded. If no such triple exists at all, the maximum is \(-\infty\), consistent with the convention \(\max \emptyset = -\infty\). The dual problem therefore becomes:

\[\begin{split} \begin{array}{ll} \displaystyle \max_{x,y,s} & c^\top y + d^\top s \;-\; \tfrac12\,x^\top B\,x \\[4pt] \text{subject to} & C^\top y + D^\top s - B\,x = b, \quad s \ge 0. \end{array} \end{split}\]
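The stationarity argument can be checked numerically. The sketch below, with random data for illustration, computes a stationary point \(\tilde{x}\) via the pseudo-inverse and confirms that the Lagrangian there equals the simplified dual objective.

import numpy as np

rng = np.random.default_rng(0)
n, m, p, q = 4, 5, 2, 3
A = rng.standard_normal((m, n)); a = rng.standard_normal(m)
B, b = A.T @ A, -A.T @ a
C = rng.standard_normal((p, n)); c = rng.standard_normal(p)
D = rng.standard_normal((q, n)); d = rng.standard_normal(q)

y = rng.standard_normal(p)
s = np.abs(rng.standard_normal(q))                 # multipliers with s >= 0
x_t = np.linalg.pinv(B) @ (C.T @ y + D.T @ s - b)  # a stationary point of L(., y, s)

L_val = 0.5 * x_t @ B @ x_t + b @ x_t - y @ (C @ x_t - c) - s @ (D @ x_t - d)
print(np.allclose(B @ x_t + b - C.T @ y - D.T @ s, 0))         # gradient vanishes
print(np.isclose(L_val, c @ y + d @ s - 0.5 * x_t @ B @ x_t))  # matches the dual objective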

Question

To examine optimisation problems without strong duality, we consider a possibly non-quadratic optimisation problem. Given \(f : \mathbb{R}^n \to \mathbb{R}\), \(h : \mathbb{R}^n \to \mathbb{R}^p\) and \(g : \mathbb{R}^n \to \mathbb{R}^q\), let the Lagrangian function be

\[ L(x, y, s) :=f(x) - y^\top h(x) - s^\top g(x), \]

giving rise to the primal problem

\[ v_p := \min_x \underbrace{\max_{y, s \atop s \geq 0} L(x, y, s)}_{=: P(x)} = \min_x P(x) \]

and the dual problem

\[ v_d := \max_{y, s \atop s \geq 0} \underbrace{\min_x L(x, y, s)}_{=: D(y,s)} = \max_{y, s \atop s \geq 0} D(y,s). \]

Suppose that a minimiser \(\bar{x}\) of the primal problem exists and that a maximiser \((y,s)\) of the dual problem exists. For that \((y,s)\) let \(x\) be a minimiser of \(\min_x L(x, y, s)\).

We say that strong duality does not hold if \(v_d < v_p\); note that the inequality is strict. Explain why in this case \((\bar{x}, (x,y,s))\) cannot be used as a certificate of optimality as above.

Answer

Strong duality provides an alternative way to identify optimal solutions: given candidate solutions \(\bar{x}\) and \((x, y, s)\), we plug them into the primal and dual problem, respectively. If they are feasible and return the same value, they are optimal. More formally, they are guaranteed to be optimal for the primal and dual problems, respectively, if

\[ L(\bar{x}, \bar{y}, \bar{s}) = L(x, y, s) \]

where \(\bar{y}\), \(\bar{s}\) are maximisers of the inner primal optimisation problem \(\max_{y, s \atop s \geq 0} L(\bar{x}, y, s)\).

But if strong duality does not hold, then this equality is not satisfied (\(v_d \neq v_p\)) and the alternative characterisation of optimal solutions fails.

Note: You may observe a slight asymmetry in the way we state the certificate of optimality: the primal problem \(\min_x P(x) = \min_x \max_{y, s \atop s \geq 0} L(x, y, s)\) has the variable \(x\) in the outer optimisation and \((y,s)\) in the inner optimisation. For the dual problem \(\max_{y, s \atop s \geq 0} D(y,s) = \max_{y, s \atop s \geq 0} \min_x L(x, y, s)\) the roles are reversed. The certificate \((\bar{x}, (x,y,s))\) refers only to the outer variable of the primal problem, but to both the inner and outer variables of the dual problem. This is because in the chapter above our starting point was the quadratic programme, in which \(\bar{y}\), \(\bar{s}\) do not appear explicitly. For the abstract theory we can always assume that \(\bar{y}\), \(\bar{s}\) are recovered by solving the inner primal optimisation problem, restoring symmetry.

Question

You have three food products, labelled 1, 2, and 3, with the following characteristics:

| Product | Price per kg (pounds) | Protein content (g per kg) |
|---------|-----------------------|----------------------------|
| 1       | 5.0                   | 150                        |
| 2       | 3.5                   | 100                        |
| 3       | 7.0                   | 220                        |

You want to blend these products (in nonnegative quantities \(x_1, x_2, x_3\) kg) to reach a target protein intake of \(R = 200\) grams, but you also want to cap the cost at 6 pounds. However, it may not be possible to match the target exactly without violating the cost limit, so you decide to solve a least-squares problem: minimise the deviation from the 200-gram protein target, i.e.

\[ \min_{x} \;\|A x - a\|_2, \]

where \(A = \begin{pmatrix} 150 & 100 & 220 \end{pmatrix}\), \(a = 200\), subject to the cost constraint \(5\,x_1 + 3.5\,x_2 + 7\,x_3 \le 6\) and the nonnegativity constraint \(x_1, x_2, x_3 \ge 0\).

  1. Rewrite the least-squares objective in the form

    \[ \frac12\,x^\top B\,x \;+\; b^\top x \quad (\text{ignore any constant term}). \]

    Identify the explicit matrices \(B\) and \(b\).

  2. Express all constraints in matrix form, i.e. \(D x \ge d\), and state the final quadratic programme

    \[ \min_{x} \quad \frac12\,x^\top B\,x + b^\top x \quad \text{subject to} \quad D x \ge d. \]
  3. Interpret the role of the cost constraint within this least-squares setup. Under what circumstance might you exactly meet the 200-gram protein target?

  4. Find the optimal solution with CVXPY.

Answer
  1. Formulating the least-squares problem:

    The (squared) least-squares objective is

    \[ \|A x - a\|_2^2 = \bigl(150\,x_1 + 100\,x_2 + 220\,x_3 - 200\bigr)^2. \]

    Halving (which does not affect the minimiser) and expanding, we can write this in the usual form \(\tfrac12\,x^\top B\,x + b^\top x + \text{const}\). Specifically:

    \[\begin{split} B = A^\top A = \begin{pmatrix} 150\\ 100\\ 220 \end{pmatrix} \begin{pmatrix} 150 & 100 & 220 \end{pmatrix} = \begin{pmatrix} 150^2 & 150\cdot 100 & 150\cdot 220\\ 100\cdot 150 & 100^2 & 100\cdot 220\\ 220\cdot 150 & 220\cdot 100 & 220^2 \end{pmatrix}. \end{split}\]

    and

    \[\begin{split} b = - A^\top a = - \begin{pmatrix} 200 \cdot 150\\ 200 \cdot 100\\ 200 \cdot 220 \end{pmatrix}. \end{split}\]

    Note that any constant term \(200^2\) does not affect the minimiser.

  2. Constraints in matrix form:

    We can write the cost constraint as

    \[\begin{split} \begin{pmatrix} 5 & 3.5 & 7 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \;\le\; 6. \end{split}\]

    The non-negativity constraint is

    \[\begin{split} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} x \;\ge\; \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \end{split}\]

    Putting all inequalities together, we can define

    \[\begin{split} D = \begin{pmatrix} -5 & -3.5 & -7 \\[4pt] 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{pmatrix}, \quad d = \begin{pmatrix} -6 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \end{split}\]

    The final quadratic programme is

    \[\begin{split} \begin{aligned} \min_{x} \quad & \tfrac12 \,x^\top B\,x \;+\; b^\top x \\ \text{subject to}\quad & D\,x \;\ge\; d. \end{aligned} \end{split}\]
  3. Interpretation:

    • Minimising \(\|A x - a\|_2\) steers \(x\) toward achieving exactly 200 grams of protein.

    • The constraint \(\sum_i \text{price}_i\,x_i \le 6\) caps the total cost. If 200 grams of protein cannot be reached within 6 pounds, the solution will yield a mix \(x\) that partially meets the target but does not exceed the cost.

    • If the cost limit is large (say \(\ge 10\)), the solver will likely achieve \(\|A x - a\|_2 = 0\) (i.e. exactly 200 grams) because there is no financial barrier to picking enough high-protein product.


  4. CVXPY: We can solve the problem with CVXPY, using the code below. The optimal solution achieves approximately 189 grams of protein at a cost of 6 pounds. This code implements the least-squares problem in its original form. If a certificate of optimality is needed, adapt the code of the Python skills section to solve this problem.

    import cvxpy as cp
    import numpy as np
    
    # Problem data
    A = np.array([[150, 100, 220]])  # 1x3
    a = 200.0
    price = np.array([5.0, 3.5, 7.0])
    cost_limit = 6
    
    # Define variable
    x = cp.Variable(3)
    
    # Objective: least-squares
    # Minimize ||A x - a||_2^2 = (A x - a)^2 in this 1D case
    objective = cp.Minimize(cp.sum_squares(A @ x - a))
    
    # Constraints
    constraints = [
        price @ x <= cost_limit,  # cost constraint
        x >= 0                    # nonnegativity
    ]
    
    # Form and solve
    prob = cp.Problem(objective, constraints)
    result = prob.solve(solver=cp.SCS)
    
    print("Optimal objective (squared deviation):", result)
    print("Solution x:", x.value)
    print("Check cost:", price @ x.value, "(<= 10.0)")
    print("Protein intake:", A @ x.value, "(target 200)")
    

Question

Let \(\ell(x,y) = 0\) if \(x \neq y\) and \(\ell(x,y) = 1\) otherwise, where \(x,y \in \mathbb{R}\). Show that

\[ \max_y \min_x \ell(x,y) < \min_x \max_y \ell(x,y). \]
Answer

We have

\[ \max_y \min_x \ell(x,y) = \max_y 0 = 0 < 1 = \min_x 1 = \min_x \max_y \ell(x,y). \]
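The gap can also be observed numerically. The sketch below restricts \(x\) and \(y\) to a small finite grid (our own choice for illustration; the statement itself is over all reals) and already exhibits the strict inequality.

import numpy as np

pts = np.linspace(-1.0, 1.0, 5)
ell = (pts[:, None] == pts[None, :]).astype(float)  # ell[i, j] = 1 iff x_i == y_j
print(ell.min(axis=0).max())  # max over y of min over x: 0.0
print(ell.max(axis=1).min())  # min over x of max over y: 1.0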