Least-squares problems#
Least-squares problems occur in optimisation, data fitting and other fields of the mathematical sciences.
Consider a matrix \(A \in \mathbb{R}^{m \times n}\) and a vector \(a \in \mathbb{R}^m\). The central idea of the least-squares problem is to find a vector \(\hat{x} \in \mathbb{R}^n\) that minimises the Euclidean norm (or \(2\)-norm) of the residual vector \(Ax - a\). This problem is formulated as
\[ \hat{x} \in \text{arg}\min_{x \in \mathbb{R}^n} \|Ax - a\|_2. \]
Here \(\text{arg}\min_{x \in \mathbb{R}^n} \|Ax - a\|_2\) is the set of all \(x\) which minimise \(\|Ax - a\|_2\). If the minimiser is unique, we also write (with a slight abuse of notation)
\[ \hat{x} = \text{arg}\min_{x \in \mathbb{R}^n} \|Ax - a\|_2. \]
Thus, in the least-squares sense, the optimisation problem generalises the concept of the solution of a system of linear equations: if \(A\) is invertible, then \(A \hat{x} = a\); if the system is overdetermined, then \(\hat{x}\) minimises the amount by which \(A \hat{x}\) fails to equal \(a\).
The normal equation#
The identity
\[ A^\top A \hat{x} = A^\top a \]
is called the normal equation of the least-squares problem.
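As a numerical illustration (a minimal sketch with arbitrary example data), the solution returned by NumPy's least-squares routine satisfies the normal equation up to rounding error:
import numpy as np

# Arbitrary overdetermined example: four equations, two unknowns
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
a = np.array([1.0, 2.0, 2.0, 4.0])

# Least-squares minimiser computed by NumPy
x_hat, *_ = np.linalg.lstsq(A, a, rcond=None)

# Residual of the normal equation A^T A x = A^T a (zero up to rounding)
print(np.linalg.norm(A.T @ A @ x_hat - A.T @ a))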
Theorem 6 (Normal equation of least-squares approximations)
Let \(A \in \mathbb{R}^{m \times n}\), \(a \in \mathbb{R}^m\). Then
\[ \hat{x} \in \text{arg}\min_{x \in \mathbb{R}^n} \|Ax - a\|_2 \]
if and only if \(\hat{x}\) satisfies the normal equation.
Proof. The vector \(\hat{x}\) minimises the least-squares problem if and only if for all \(\epsilon\in\mathbb{R}^n\):
Step 1: Suppose that \(\hat{x}\) does not satisfy the normal equation. Then \(\| A^\top A\hat{x} - A^\top a \|_2 > 0\). Suppose that \(\hat{x}\) minimises the least-squares problem. With \(\epsilon := \delta (A^\top A\hat{x} - A^\top a)\), \(\delta \in \mathbb{R}\), the above implies
This is a contradiction because \(\Phi(\delta) < 0\) for small negative \(\delta\).
Step 2: Suppose that \(\hat{x}\) satisfies the normal equation. Then, by the above,
Properties of the normal equation#
We assume in this section that \(m \ge n\) and that \(A\) has full rank. Let \(A = U\Sigma V^\top \in \mathbb{R}^{m \times n}\) be the SVD of \(A\). To generalise the relative condition number of a square matrix, we define the relative condition number for a rectangular \((m \times n)\)-matrix as
We also have
guaranteeing the non-singularity of \(A^\top A\) and, therefore, the existence of a unique minimiser of the least-squares problem.
Furthermore,
It follows that the linear system in the normal equation has a squared condition number compared to the original matrix \(A\).
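The squaring can be observed numerically; the following minimal sketch uses an arbitrary ill-conditioned test matrix and NumPy's condition-number routine.
import numpy as np

# An arbitrary tall test matrix with a wide spread of singular values
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5)) @ np.diag([1.0, 1e-1, 1e-2, 1e-3, 1e-4])

cond_A = np.linalg.cond(A)           # 2-norm condition number of A
cond_AtA = np.linalg.cond(A.T @ A)   # condition number of the normal-equation matrix

print(cond_A, cond_A**2, cond_AtA)   # cond_AtA is approximately cond_A**2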
Subsequently, we will explore orthogonalisation methods to circumvent this condition number squaring, offering a more stable approach to solving the least-squares problem.
Solving least-squares problems with the QR decomposition#
We assume in this section that \(m \ge n\) and that \(A\) has full rank. The full \(QR\) decomposition of \(A\) can be expressed as follows:
where \(Q = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}\) is orthogonal, the blocks \(Q_1 \in \mathbb{R}^{m \times n}\) and \(Q_2 \in \mathbb{R}^{m \times (m-n)}\) have orthonormal columns, and \(R \in \mathbb{R}^{n \times n}\) is upper triangular. Substituting this decomposition into the least-squares problem, we get:
Here, we observe that the choice of \(x\) influences only the first block-row. In the second block row, \(x\) is multiplied by zero, meaning no choice of \(x\) will reduce the contribution from this part. Therefore, the minimiser \(\hat{x}\) of the least-squares problem is found by solving:
Returning to the question of the \(2\)-norm condition number:
where we use the submultiplicativity of the matrix \(2\)-norm and that \(\| Q \|_2 = 1\) and \(Q^\top = Q^{-1}\).
Consequently, the \(QR\) decomposition approach effectively circumvents the condition number squaring issue associated with the normal equation method.
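This can be observed numerically (a small sketch with an arbitrary ill-conditioned test matrix): the triangular factor \(R\) of the reduced QR decomposition has the same \(2\)-norm condition number as \(A\), while \(A^\top A\) has roughly the square.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 4)) @ np.diag([1.0, 1e-2, 1e-4, 1e-6])

Q, R = np.linalg.qr(A)   # reduced QR decomposition: R is square and upper triangular

print(np.linalg.cond(A))         # cond_2(A)
print(np.linalg.cond(R))         # identical to cond_2(A) up to rounding
print(np.linalg.cond(A.T @ A))   # roughly cond_2(A)**2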
Solving least-squares problems with the SVD#
The SVD can also be used to solve least-squares problems. In this section, we assume that \(m \ge n\) but not that \(A\) has full rank. The full SVD of \(A\) is
with \(\hat{\Sigma}\) as in the previous chapter. Substitute into the least-squares problem to obtain
Let \([y_1, y_2]^\top = [V_1^\top x, V_2^\top x]^\top\) and factorise out \(U\) to obtain
Neither the second block row nor the choice of \(y_2\) can reduce the least-squares norm; hence \(y_1 = \hat{\Sigma}^{-1} U_1^\top a\), and therefore
is a (possibly non-unique) minimiser of the least-squares problem. This leads to the important concept of a pseudo-inverse of a matrix.
Definition 7 (Pseudo-inverse of a matrix)
Let \(U \Sigma V^\top\) be an SVD of \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\). Then the pseudo-inverse of \(A\) is
where \(\hat{\Sigma} \in \mathbb{R}^{r \times r}\) and \(r = \text{rank}(A)\).
The pseudo-inverse of \(A\) is a generalisation of the inverse of \(A\). In the same way that \(A^{-1}a\) solves the linear system \(Ax=a\) for an invertible square matrix \(A\), the pseudo-inverse applied to \(a\), \(A^{\dagger} a\), solves the least-squares problem \(\min_x \|Ax-a\|_2\) for rectangular \(A\).
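NumPy provides the pseudo-inverse as np.linalg.pinv. The following sketch (with random example data) checks that \(A^{\dagger} a\) coincides with the least-squares solution returned by np.linalg.lstsq for a full-rank matrix, where the minimiser is unique.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))   # a random matrix, full rank with probability 1
a = rng.standard_normal(6)

x_pinv = np.linalg.pinv(A) @ a                    # pseudo-inverse solution
x_lstsq, *_ = np.linalg.lstsq(A, a, rcond=None)   # least-squares solution

print(np.allclose(x_pinv, x_lstsq))   # True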
Python skills#
Solving a least-squares problem with QR decomposition#
The following example shows how to use NumPy’s QR decomposition to find the least-squares approximation \(\hat{x}\) of an overdetermined linear system.
import numpy as np
from scipy.linalg import solve_triangular
# Define matrix A and vector a
A = np.array([[1, 2], [3, 4], [5, 6]])
a = np.array([7, 8, 9])
# Perform QR decomposition
Q, R = np.linalg.qr(A)
# Compute Q^T * a
Q_T_a = np.dot(Q.T, a)
# Solve Rx = Q^T a for x
x = solve_triangular(R[:A.shape[1], :], Q_T_a[:A.shape[1]])
print("Solution x:", x)
Solving a least-squares problem with SVD#
The following example solves a least-squares problem by explicitly constructing the pseudo-inverse from the SVD.
import numpy as np
# Step 1: Define A and a
A = np.random.rand(5, 3) # A random 5x3 matrix
a = np.random.rand(5) # A random vector of length 5
# Step 2: Perform SVD
U, sigma, VT = np.linalg.svd(A)
# Step 3: Compute the pseudo-inverse
Sigma_inv = np.hstack((np.diag(1 / sigma), np.zeros((3, 2))))
A_pseudo_inverse = VT.T @ Sigma_inv @ U.T
# Step 4: Solve for x
x = A_pseudo_inverse @ a
print("Solution x:", x)
Self-check questions#
Question
Let’s consider three points: \((1, 2)\), \((2, 3)\), and \((3, 5)\). Find the best-fitting line \(y = m x + c\) in the least-squares sense:
Answer
The minimisation problem can be equivalently formulated as
We learned three methods to proceed: solving the normal equation, the QR reformulation or the SVD reformulation. The latter two are preferable for larger problems solved on a computer. However, for hand calculations with small problems, solving the normal equation directly is often the most convenient approach.
We have
and
Thus, the minimiser solves
giving \((\hat{c}, \hat{m}) = (1/3, 3/2)\).
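The hand calculation can be confirmed numerically, for instance by solving the normal equation with NumPy (a minimal sketch):
import numpy as np

# Design matrix for the model y = c + m*x at the points x = 1, 2, 3
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 5.0])

# Solve the normal equation A^T A (c, m)^T = A^T y
c_hat, m_hat = np.linalg.solve(A.T @ A, A.T @ y)
print(c_hat, m_hat)   # approximately 1/3 and 3/2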
Remark: One can plot an illustration of the calculation with Python.
import matplotlib.pyplot as plt
import numpy as np
# Given points
x_points = np.array([1, 2, 3])
y_points = np.array([2, 3, 5])
# Given least-squares solution
c = 1.0/3.0
m = 3.0/2.0
# Line of best fit
x_line = np.linspace(0, 4, 100)
y_line = m * x_line + c
# Plotting
plt.figure(figsize=(8, 6))
plt.plot(x_line, y_line, label=f'Best Fit Line: y = {m:.2f}x + {c:.2f}', color='red')
plt.scatter(x_points, y_points, label='Data Points', color='blue')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Least-squares Line Fit')
plt.legend()
plt.grid(True)
plt.show()
Question
Let \(x=(0,0.25,0.5,0.75,1)\).
Find a least-squares quadratic approximation to \(\{(x_i,\exp(x_i))\}\).
Find a least-squares quadratic approximation to \(\{(x_i,\cos(x_i))\}\).
Answer
The quadratic approximation is of the form \(p_2(x) = a \, x^2 + b \, x + c\). The minimisation problem can be equivalently formulated as
\[\begin{split} \text{arg}\min_{(a,b,c) \in \mathbb{R}^3} \left\| \begin{pmatrix} x_1^2 & x_1 & 1\\ x_2^2 & x_2 & 1\\ x_3^2 & x_3 & 1\\ x_4^2 & x_4 & 1\\ x_5^2 & x_5 & 1\end{pmatrix} \begin{pmatrix} a\\ b\\ c \end{pmatrix} - \begin{pmatrix} \exp(x_1)\\ \exp(x_2)\\ \exp(x_3)\\ \exp(x_4)\\ \exp(x_5) \end{pmatrix} \right\|^2. \end{split}\]
The normal equations in matrix form are
\[\begin{split} \begin{pmatrix} \sum_i x_i^4 & \sum_i x_i^3 & \sum_i x_i^2 \\ \sum_i x_i^3 & \sum_i x_i^2 & \sum_i x_i \\ \sum_i x_i^2 & \sum_i x_i & n \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum_i (x_i^2 y_i) \\ \sum_i (x_i y_i) \\ \sum_i y_i \end{pmatrix}, \end{split}\]
where \(y_i = \exp(x_i)\) and \(n\) is the number of data points. Therefore, the least-squares quadratic approximation to \(\{(x_i,\exp(x_i))\}\) is
\[ p_2(t)=1.005+0.8643t+0.8435t^2. \]
Analogously, the least-squares quadratic approximation to \(\{(x_i,\cos(x_i))\}\) is \(p_2(t)=1.001-0.03389t-0.4288t^2\).
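Both sets of coefficients can be reproduced numerically, for example with NumPy’s least-squares routine (a quick sketch):
import numpy as np

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
A = np.vander(x, 3)   # Vandermonde design matrix with columns x^2, x, 1

coeff_exp, *_ = np.linalg.lstsq(A, np.exp(x), rcond=None)
coeff_cos, *_ = np.linalg.lstsq(A, np.cos(x), rcond=None)

print(coeff_exp)   # approximately [0.8435, 0.8643, 1.005]
print(coeff_cos)   # approximately [-0.4288, -0.03389, 1.001]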
Question
For many applications, one must obtain polynomial least-squares approximations at the same nodes \(\{ x_i \}\) for many different sets of data \(\{y_i\}\). What is the best way to do this? Only try to answer this after having worked out the two previous questions.
Answer
The matrix \(A\) of the normal equation \(A^\top A\hat{x} = A^\top a\) depends on \(\{ x_i \}\), but not \(\{y_i\}\). Compute the QR or singular value decomposition of \(A\) once, and apply it for each of the different \(\{y_i\}\).
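A sketch of this idea in NumPy/SciPy, using the nodes from the previous question as example data: the QR decomposition of the design matrix is computed once and reused for each new data vector.
import numpy as np
from scipy.linalg import solve_triangular

# Fixed nodes and the corresponding quadratic design matrix
x_nodes = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
A = np.vander(x_nodes, 3)   # columns: x^2, x, 1

# Factorise once ...
Q, R = np.linalg.qr(A)

# ... and reuse the factorisation for several data sets y
for y in (np.exp(x_nodes), np.cos(x_nodes), np.sin(x_nodes)):
    coeffs = solve_triangular(R, Q.T @ y)
    print(coeffs)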
Question
Let \(A\in\mathbb{R}^{m\times n}\) with \(m\geq n\) and \(\text{rank}(A)=n\) and \(a\in\mathbb{R}^m\). How should one approach the following least-squares problem with the SVD? Find \(\hat{x}\in\mathbb{R}^n\), such that
Answer
The SVD of \(A\) is given as \(A=U\Sigma V^\top\), where \(U\in\mathbb{R}^{m\times m}\) and \(V\in\mathbb{R}^{n\times n}\) are orthogonal matrices and \(\Sigma \in\mathbb{R}^{m\times n}\) is a rectangular diagonal matrix with the singular values \(\sigma_1\geq\sigma_2\geq\dots\geq\sigma_n\) on its diagonal; since \(A\) has full rank, all of them are positive.
Substituting into the least-squares problem, we have
for \(y=V^\top x\) and \(\hat{a} = U^\top a\). Minimising the above expression gives \(y_j = \hat{a}_j/\sigma_j\) for \(j = 1, \dots, n\), and hence \(x = Vy = A^{\dagger} a\).
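In code, this component-wise division by the singular values can be carried out with NumPy’s reduced SVD (a minimal sketch with random example data):
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((7, 4))   # full rank with probability 1
a = rng.standard_normal(7)

# Reduced SVD: U is 7x4, sigma has length 4, Vt is 4x4
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

y = (U.T @ a) / sigma   # y_j = a_hat_j / sigma_j
x = Vt.T @ y            # x = V y

print(np.allclose(x, np.linalg.lstsq(A, a, rcond=None)[0]))   # True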
Question
We now consider the modified least-squares problem
\[ \hat{x} \in \text{arg}\min_{x \in \mathbb{R}^n} \left( \|Ax - a\|_2^2 + \|Lx\|_2^2 \right) \]
with \(L\in\mathbb{R}^{n\times n}\) nonsingular. Show that the solution of this least-squares problem is identical to the solution of the generalised normal equation
\[ (A^\top A + L^\top L)x = A^\top a. \]
Answer
The vector \(\hat{x}\) minimises the least-squares problem if and only if for all \(\epsilon\in\mathbb{R}^n\):
Step 1: Suppose that \(\hat{x}\) does not satisfy the generalised normal equation \((A^\top A+L^\top L)x = A^\top a\). Then \(\| (A^\top A + L^\top L) \hat{x} - A^\top a \|_2 > 0\). Suppose that \(\hat{x}\) minimises the least-squares problem. With \(\epsilon := \delta ((A^\top A + L^\top L) \hat{x} - A^\top a)\), \(\delta \in \mathbb{R}\), the above implies
This is a contradiction because \(\Phi(\delta) < 0\) for small negative \(\delta\).
Step 2: Suppose that \(\hat{x}\) satisfies the generalised normal equation. Then, by the above,
ensuring that \(\hat{x}\) is a minimiser.
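The equivalence can be checked numerically. The sketch below uses arbitrary random data and the simple choice \(L = I\); it compares the solution of the generalised normal equation with the solution of the stacked least-squares problem \(\min_x \left\| \begin{pmatrix} A \\ L \end{pmatrix} x - \begin{pmatrix} a \\ 0 \end{pmatrix} \right\|_2\), which is an equivalent way of writing \(\|Ax-a\|_2^2 + \|Lx\|_2^2\) as an ordinary least-squares problem.
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((8, 3))
a = rng.standard_normal(8)
L = np.eye(3)   # a simple nonsingular choice of L

# Solution of the generalised normal equation (A^T A + L^T L) x = A^T a
x_normal = np.linalg.solve(A.T @ A + L.T @ L, A.T @ a)

# The modified problem written as an ordinary least-squares problem
# min || [A; L] x - [a; 0] ||_2
A_stacked = np.vstack([A, L])
a_stacked = np.concatenate([a, np.zeros(3)])
x_stacked, *_ = np.linalg.lstsq(A_stacked, a_stacked, rcond=None)

print(np.allclose(x_normal, x_stacked))   # True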
Question
Let \(A = QR\) be the QR decomposition of a matrix \(A\in\mathbb{R}^{m\times n}\) with \(R\in\mathbb{R}^{n\times n}\). Show that the singular values of \(A\) are identical to those of \(R\).
Answer
We use the full QR decomposition; that is, let
Let \(R = U\Sigma V^\top\) be the SVD of \(R\). Plugging into the above equation gives
This is the singular value decomposition of \(A\) with singular values contained in \(\Sigma\).
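The claim is easy to check numerically (a minimal sketch with a random matrix):
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 3))

Q, R = np.linalg.qr(A)   # reduced QR decomposition, R is 3x3

print(np.linalg.svd(A, compute_uv=False))   # singular values of A
print(np.linalg.svd(R, compute_uv=False))   # identical up to rounding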
Question
Let
What is the pseudo-inverse of \(A\)? Describe all solutions of the least-squares problem \(\text{arg}\min_x \|Ax - a\|_2\).
Answer
The pseudo-inverse is
The pseudo-inverse selects the solution \((1,0)^\top = A^{\dagger} a\). Clearly, the \(x_2\) component does not affect the value of \(Ax\), so the solutions of the least-squares problem are of the form \((1, x_2)^\top\), \(x_2 \in \mathbb{R}\).
Optional material#
Principles of orthogonal approximation
Definition 8
Let \(V\) be a vector space over \(\mathbb{K} \in \{ \mathbb{R}, \mathbb{C} \}\). A mapping
is called a scalar product if the following conditions hold:
conjugate symmetry:
\[ \forall \, v, w \in V : s(v,w) = \overline{s(w,v)}. \]
linearity:
\[ \forall \, u,v,w \in V \; \forall \, \alpha, \beta \in \mathbb{K} : \; s(\alpha v + \beta w, u) = \alpha \, s(v,u) + \beta \, s(w,u). \]
positivity:
\[ \forall \, v \in V \setminus \{ 0 \} : \; s(v,v) > 0. \]
The positivity condition is meaningful because conjugate symmetry ensures that \(s(v,v) = \overline{s(v,v)}\) is real-valued.
Often, one denotes scalar products with angle brackets: \(s(v,w) = \langle v, w \rangle\). Scalar products are also called inner products, and vector spaces equipped with a scalar product are called inner product spaces.
Fact: Induced norm
Let \((V, \langle \cdot, \cdot \rangle)\) be an inner product space. Then, the mapping \(v \mapsto \sqrt{\langle v, v \rangle}\) defines a norm.
Example 10
(a) Let \(A \in \mathbb{R}^{n \times n}\) be a symmetric, positive definite matrix. Then the mapping
is a scalar product on \(\mathbb{R}^n\). The canonical scalar product is defined with \(A\) being the identity matrix \(I\).
(b) Let \(A \in \mathbb{C}^{n \times n}\) be a Hermitian (i.e. \(A = A^H\)), positive definite matrix. Then the mapping
is a scalar product on \(\mathbb{C}^n\).
(c) Let \(\Omega \subset \mathbb{R}^n\) be open and non-empty. Let \(\gamma \in C(\Omega, \mathbb{R})\) be positive, i.e. \(\gamma(x) > 0\) for all \(x \in \Omega\). The set of Riemann integrable functions \(v : \Omega \to \mathbb{K}^m\) for which
is finite is denoted \(L^2_\gamma(\Omega, \mathbb{K}^m)\). It is a vector space closed under pointwise addition and scalar multiplication. It is equipped with a scalar product.
If \(\gamma \equiv 1\), then \(L^2(\Omega, \mathbb{K}^m) := L^2_\gamma(\Omega, \mathbb{K}^m)\) and \(\langle \cdot, \cdot \rangle_{\Omega} := \langle \cdot, \cdot \rangle_{\Omega,\gamma}\).
Two vectors \(v,w \in V\) are called orthogonal if \(\langle v, w \rangle = 0\). A subspace \(P \subset V\) is called orthogonal to \(v \in V\) if \(\langle v, p \rangle = 0\) for all \(p \in P\).
Fact: Orthogonal approximation
Let \((V, \langle \cdot, \cdot \rangle)\) be an inner product space. Let \(P\) be a finite-dimensional subspace of \(V\) and \(v \in V\). Then there is a unique best approximation \(p \in P\) to \(v\) in \(P\), which is distinguished by the condition
In words, \(v-p\) is orthogonal to \(P\).
Suppose that \(P\) has the basis
i.e. each element \(q\) can be written as a linear combination
where the coefficients \(x_j \in \mathbb{K}\) are unique for each \(q\). In particular, there are unique coefficients \(\hat{x}_0, \ldots, \hat{x}_k \in \mathbb{K}\) of the best approximation \(p\):
It follows from the above fact that, for all \(\ell \in \{ 0, 1, \ldots, k \}\),
and thus that
Hence, the best approximation can be computed by solving
where \(A\) contains the entries \(a_{j\ell} = \langle p_j, p_\ell \rangle\) and \(a\) contains the entries \(a_\ell = \langle v, p_\ell \rangle\).
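As an illustration outside the running examples (a sketch, assuming the concrete choice \(V = L^2((-1,1),\mathbb{R})\), the monomial basis \(1, t, t^2\) and \(v = \exp\)), the linear system can be assembled with numerical quadrature:
import numpy as np
from scipy.integrate import quad

# Basis of P: the monomials 1, t, t^2 on (-1, 1); target function v = exp
basis = [lambda t: 1.0, lambda t: t, lambda t: t**2]
v = np.exp

# Assemble the Gram matrix a_{jl} = <p_j, p_l> and right-hand side a_l = <v, p_l>
A = np.array([[quad(lambda t, pj=pj, pl=pl: pj(t) * pl(t), -1, 1)[0]
               for pl in basis] for pj in basis])
a = np.array([quad(lambda t, pl=pl: v(t) * pl(t), -1, 1)[0] for pl in basis])

# Coefficients of the best L^2 approximation of exp in span{1, t, t^2}
x_hat = np.linalg.solve(A, a)
print(x_hat)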
Fact: Invertibility of the Gram matrix
The matrix \(A\) is positive definite and therefore invertible.
Example 11
Let \(\Omega = (-1,1)^2\) and \(V = L^2(\Omega, \mathbb{R})\). Let \(P = \mathcal{P}_{1,2}\) be the space of \(2\)-variate polynomials with degrees less than or equal to \(1\). The space \(\mathcal{P}_{1,2}\) has the basis
because for every \(q \in P\) there is exactly one triple \((x_{00}, x_{10}, x_{01}) \in \mathbb{R}^3\) such that
The double index notation is convenient for \(2\)-variate functions; however, one could equally well denote the basis by \(p_0\), \(p_1\), \(p_2\) and the coefficients by \(x_0\), \(x_1\), \(x_2\). In the double index notation, the first index marks the degree in \(t_1\) and the second in \(t_2\).
The best approximation to \(v \in L^2(\Omega)\) can be computed by solving a \(3 \times 3\) linear system. Using also double indices for the matrix entries, one has
The right-hand side depends on \(v\). Choosing \(v(t_1,t_2) = \sin (\pi t_1)\) one obtains
The resulting linear system is
The solution of the system is \((0,\frac{3 \pi}{16},0)\) and therefore the best approximation is
The functions \(v\) and \(p\) are depicted here:
A remarkable feature of the example is that all off-diagonal entries in \(A\) vanish. This means that the basis functions are mutually orthogonal. In general, one cannot expect this. For instance, one can show that the functions
are a basis of \(\mathcal{Q}_{2,2} := \text{span}(p_{00}, \ldots, p_{22})\). Computing, for example,
shows that orthogonality is lost and that the matrix \(A\), implementing \(L^2\) approximation in \(\mathcal{Q}_{2,2}\) with \(p_{00}, \ldots, p_{22}\) as the underlying basis, has nonzero off-diagonal entries.
One can already observe in the 1-dimensional case that the basis of the approximation space \(P\) needs to be chosen carefully. For instance, if \(P = \mathcal{P}_n\) with basis \(1, t, t^2, \ldots, t^n\), then the resulting matrix \(A\) is poorly conditioned even for moderate values of \(n\). However, the more orthogonal the basis functions are, the better conditioned the linear system is. In the ideal case, the basis is orthonormal, implying that \(A = I\).
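The effect on conditioning can be observed already in one dimension (a small sketch; the Gram entries \(\int_{-1}^1 t^{j+\ell}\,\mathrm{d}t\) of the monomial basis are computed in closed form):
import numpy as np

def monomial_gram(n):
    # Gram matrix of the monomial basis 1, t, ..., t^n in L^2((-1, 1))
    G = np.zeros((n + 1, n + 1))
    for j in range(n + 1):
        for l in range(n + 1):
            if (j + l) % 2 == 0:             # odd powers integrate to zero
                G[j, l] = 2.0 / (j + l + 1)  # integral of t^(j+l) over (-1, 1)
    return G

for n in (2, 5, 10):
    print(n, np.linalg.cond(monomial_gram(n)))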
There are different options to obtain an orthonormal basis in multivariate space. One can use the Gram-Schmidt method, which works for all (finite-dimensional) inner product spaces. For instance, one can start with the above \(p_{00}, \ldots, p_{22}\) to generate an orthonormal basis of \(\mathcal{Q}_{2,2}\) with this method.
However, assembling the multivariate basis from an orthonormal \(1\)-variate basis is often more economical. Let \(p_\alpha, p_\beta\) be orthogonal functions in \(L^2_\gamma((a,b),\mathbb{K}^m)\). Then the functions
are orthogonal in \(L_\eta^2((a,b)^2, \mathbb{K}^m)\) with \(\eta(t_1,t_2) = \gamma(t_1) \cdot \gamma(t_2)\) because
with \(i, j, k, \ell \in \{ \alpha, \beta \}\). In particular, if \(p_\alpha\) and \(p_\beta\) are orthonormal, then the products \(p_\alpha \cdot p_\beta\) are orthonormal.
Let us use this argument to construct an orthogonal basis of \(\mathcal{Q}_{2,2}\). Functions in \(\mathcal{Q}_{2,2}\) have the form
Recall that the \(t_1^j t_2^k\) are a basis of \(\mathcal{Q}_{2,2}\). Even without the (omitted) proof that they are, in fact, a basis, one can directly conclude that \(\mathcal{Q}_{2,2}\) is spanned by the \(9\) functions \(t_1^j t_2^k\), which means that \(\mathcal{Q}_{2,2}\) is at most \(9\)-dimensional.
We leave it as an exercise to check that the so-called Legendre polynomials
are orthogonal in \(L^2((-1,1), \mathbb{R})\). Consequently, the functions
are orthogonal in \(L^2((-1,1)^2, \mathbb{R})\).
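This can be checked numerically, for instance with NumPy’s Legendre polynomials and numerical quadrature (a minimal sketch for \(P_0, P_1, P_2\) and their products on the square):
import numpy as np
from numpy.polynomial.legendre import Legendre
from scipy.integrate import dblquad

# The Legendre polynomials P_0, P_1, P_2 as callable polynomials
P = [Legendre.basis(k) for k in range(3)]

# L^2((-1,1)^2) inner product of the product functions P_j(t1) P_k(t2)
def inner(j1, k1, j2, k2):
    f = lambda t2, t1: P[j1](t1) * P[k1](t2) * P[j2](t1) * P[k2](t2)
    return dblquad(f, -1, 1, lambda t1: -1, lambda t1: 1)[0]

print(inner(1, 2, 1, 2))   # nonzero: inner product of a basis function with itself
print(inner(1, 2, 2, 1))   # approximately zero: two different basis functions
print(inner(0, 0, 2, 2))   # approximately zero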
Fact: Independence of orthogonal vectors
Mutually orthogonal, nonzero vectors in an inner product space \((V, \langle \cdot, \cdot \rangle)\) are linearly independent.
Thus, the \(p_{jk}\) are linearly independent. Moreover, the \(p_{jk}\) belong to \(\mathcal{Q}_{2,2}\). For instance,
has the structure of a linear combination in the original span, with \(\alpha_{1,2} = \frac{3}{2}\) and \(\alpha_{0,0} = -\frac{1}{2}\).
Fact: Basis of orthogonal vectors
A selection of \(n\) linearly independent, nonzero vectors in an \(n\)-dimensional vector space \(V\), \(n \in \mathbb{N}\), is a basis of \(V\).
Therefore the \(p_{jk}\) are a basis of \(\mathcal{Q}_{2,2}\). Since they are mutually orthogonal, the matrix \(A\) above is diagonal when they are chosen as the basis.
Example 12
Let \(V = L^2((-1,1)^2, \mathbb{R})\). Compute the best approximation \(p\) to
in \(P = \mathcal{Q}_{2,2}\).
We use the Legendre product basis for the computation. Then the coefficients \(\hat{x}_{jk}\) in
are given by
taking the diagonal structure of the system into account. One calculates
Therefore, the system of equations turns into
Furthermore,
Therefore