The Singular Value Decomposition#

A matrix \(\Sigma \in \mathbb{R}^{m \times n}\) is called diagonal if \(\sigma_{ij} = 0\) for all \(i \neq j\).

Theorem 1 (Singular Value Decomposition)

For any matrix \(A \in \mathbb{R}^{m \times n}\), there exists a singular value decomposition (SVD)

\[ A = U\Sigma V^T, \]

where \(U \in \mathbb{R}^{m \times m}\) and \(V \in \mathbb{R}^{n \times n}\) are orthogonal matrices and \(\Sigma\) is diagonal with non-negative descending diagonal elements \(\sigma_i := \sigma_{ii}\):

\[ \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p \geq 0, \qquad p = \min(m, n). \]

The proof is given below in the optional materials section. For the remainder of the page, assume that \(m \geq n\). The matrix \(\Sigma \in \mathbb{R}^{m \times n}\) is a block matrix of the form

\[\begin{split} \Sigma = \begin{bmatrix} \tilde{\Sigma} \\ 0 \end{bmatrix}, \end{split}\]

where \(\tilde{\Sigma} = \text{diag}(\sigma_1, \sigma_2, \dots, \sigma_r, 0, \dots, 0) \in \mathbb{R}^{n \times n}\) is a diagonal matrix with the positive singular values \(\sigma_1, \sigma_2, \dots, \sigma_r\) on its diagonal, and \(r\) turns out to be the rank of the matrix \(A\) (see the first insight below).

These diagonal entries \(\sigma_i\) (for \(i = 1, 2, \dots, n\)) are known as the singular values of \(A\). The columns \(v_j\) of \(V\) are called right singular vectors. The columns \(u_j\) of \(U\) are called left singular vectors.

Insights

  1. Rank determination: The rank of matrix \(A\) is equal to \(r\), defined as the count of non-zero singular values in the decomposition.

  2. Kernel characterisation: The kernel (or null space) of \(A\), denoted as \(\text{ker}(A)\), is the span of the vectors \(\{v_{r+1}, \dots, v_n\}\).

  3. Range specification: The range of \(A\), represented as \(\text{ran}(A)\), is the span of the vectors \(\{u_1, \dots, u_r\}\).

  4. \(2\)-norm relation: The \(2\)-norm of \(A\) is equal to the largest singular value \(\sigma_1\). Hence, \(\|A\|_2 = \sigma_1\).

  5. Frobenius norm calculation: Utilising the property that \(\|Q A P\|_F = \|A\|_F\) for any orthogonal matrices \(Q\) and \(P\), we find that

    \[\|A\|_F =\left(\sum_{j=1}^r\sigma_j^2\right)^{1/2}.\]
  6. Norm inequalities: As a direct result, we deduce the inequality \(\|A\|_2 \leq \|A\|_F \leq \sqrt{r}\|A\|_2\).

  7. Inverse and condition number: For an invertible matrix \(A\), its inverse is \(A^{-1} = V\Sigma^{-1}U^T\), and \(\|A^{-1}\|_2 = \sigma_n^{-1}\). The relative condition number is thus \(\kappa_{rel}(A) = \frac{\sigma_1}{\sigma_n}\). The smallest perturbation \(\Delta A\) that makes \(A + \Delta A\) singular is \(\Delta A = -\sigma_n u_n v_n^T\), so the relative distance to singularity \(\|\Delta A\|_2 / \|A\|_2 = \sigma_n / \sigma_1\) is the reciprocal of the condition number.

See also the related self-check question below.
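As a quick numerical check of some of these insights, the following sketch uses NumPy on an example matrix chosen here purely for illustration.

import numpy as np

# Example matrix (chosen for illustration): rank 2, since the third
# column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

U, S, VT = np.linalg.svd(A)

# Insight 1: the rank equals the number of non-zero singular values.
r = int(np.sum(S > 1e-12 * S[0]))
print(r, np.linalg.matrix_rank(A))                         # 2 2

# Insights 4 and 5: the 2-norm is sigma_1, and the Frobenius norm is
# the square root of the sum of the squared singular values.
print(np.linalg.norm(A, 2), S[0])
print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(S[:r]**2)))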

SVD and low-rank approximation#

The singular value decomposition \(A = U\Sigma V^T\) can be expanded into a sum of rank-1 matrices: for the element \(a_{ij}\) we find

\[ a_{ij} = \sum_{k=1}^m \sum_{\ell = 1}^n u_{ik} \, \sigma_{k\ell} \, v_{j\ell} = \sum_{k = 1}^r u_{ik} \, \sigma_k \, v_{jk}. \]

Thus, for the matrix \(A\),

\[ A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_r u_r v_r^T, \]

where each term \(\sigma_j u_j v_j^T\) is a matrix of rank 1. Let us define a partial sum of these terms as \(A_\nu\), given by:

\[ A_\nu := \sum_{k=1}^\nu \sigma_k u_k v_k^T, \]

for any \(0 \leq \nu \leq r\). This partial sum captures the contribution of the first \(\nu\) singular values and their corresponding singular vectors to the matrix \(A\).

The spectral norm of the difference between \(A\) and \(A_\nu\) is equal to the \((\nu+1)\)-th singular value, \(\sigma_{\nu+1}\). Thus, we have:

\[ \|A - A_{\nu}\|_2 = \sigma_{\nu+1}. \]

This expression quantifies the significance of each singular value and its associated rank-1 component in approximating the original matrix \(A\).

Fact: Low-rank matrix approximation

Let \(A\) and \(A_{\nu}\) be defined as above. Then

\[ \|A - A_{\nu}\|_2 = \min_{B\in\mathbb{R}^{m\times n}, \text{ rank}(B)\leq\nu} \|A - B\|_2 = \sigma_{\nu+1}. \]
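As a quick illustration of this fact, the following sketch (with a small random test matrix and a truncation rank chosen purely for demonstration) builds \(A_\nu\) from the leading singular triplets and compares the spectral error with \(\sigma_{\nu+1}\).

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # illustrative test matrix

U, S, VT = np.linalg.svd(A)
nu = 2                                   # truncation rank (arbitrary choice)

# Best rank-nu approximation: keep the first nu singular triplets.
A_nu = sum(S[k] * np.outer(U[:, k], VT[k, :]) for k in range(nu))

# The spectral norm of the error equals the (nu+1)-th singular value.
print(np.linalg.norm(A - A_nu, 2))       # approximately S[nu]
print(S[nu])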

This formulation highlights the utility of the SVD in reducing computational costs in numerical linear algebra. Multiplying two \(N \times N\) matrices in the standard way, without advanced techniques such as Strassen’s algorithm, costs \(\mathcal{O}(N^3)\) operations. However, multiplying two rank-1 matrices, such as \(\sigma u v^T\) and \(\hat{\sigma} \hat{u} \hat{v}^T\), is much cheaper: the essential step is the scalar product \(\langle v, \hat{u}\rangle\), which is then scaled by \(\sigma \hat{\sigma}\). This yields a rank-1 representation of the product:

\[ (\sigma u v^T) (\hat{\sigma} \hat{u} \hat{v}^T) = (\sigma \hat{\sigma} \langle v, \hat{u}\rangle) u \hat{v}^T. \]

Building on this approach, the multiplication of rank-\(\nu\) approximations \(A_\nu\) and \(\hat{A}_\nu\) of two matrices \(A\) and \(\hat{A}\) can be executed with a computational cost of \(\mathcal{O}(n \cdot \nu^2)\):

\[ A_\nu \cdot \hat{A}_\nu = \sum_{k,\ell} (\sigma_k u_k v_k^T) (\hat{\sigma}_\ell \hat{u}_\ell \hat{v}_\ell^T) = \sum_{k,\ell} (\sigma_k \hat{\sigma}_\ell \langle v_k, \hat{u}_\ell\rangle) u_k \hat{v}^T_\ell. \]

This method is particularly advantageous in applications where the singular values diminish rapidly in magnitude. In such scenarios, using a rank-\(\nu\) approximation yields substantial computational savings, often with only a moderate or negligible compromise in accuracy. This balance between efficiency and precision makes SVD a powerful tool in large-scale numerical computations.
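Below is a minimal sketch of this rank-\(\nu\) product, assuming both matrices are supplied through their (truncated) SVD factors; the function name lowrank_product and the random test matrices are chosen here for illustration only. The key point is that only the small matrix of inner products \(\langle v_k, \hat{u}_\ell\rangle\) is formed, never the full matrices.

import numpy as np

def lowrank_product(U1, s1, V1, U2, s2, V2):
    # Multiply two matrices given in (truncated) SVD form
    #   A ~ U1 @ diag(s1) @ V1.T   and   A_hat ~ U2 @ diag(s2) @ V2.T.
    # Returns factors (L, R) with A @ A_hat ~ L @ R.T.
    # Coefficients sigma_k * sigma_hat_l * <v_k, u_hat_l>:
    C = (s1[:, None] * (V1.T @ U2)) * s2[None, :]
    return U1 @ C, V2

# Quick consistency check with full-rank factors (illustrative only).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A_hat = rng.standard_normal((5, 5))
Ua, sa, VaT = np.linalg.svd(A)
Uh, sh, VhT = np.linalg.svd(A_hat)
L, R = lowrank_product(Ua, sa, VaT.T, Uh, sh, VhT.T)
print(np.allclose(L @ R.T, A @ A_hat))   # True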

The reduced SVD#

The form of the SVD introduced above is known as the full SVD of \(A \in \mathbb{R}^{m \times n}\):

\[\begin{split} A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \hat{\Sigma} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T, \end{split}\]

where \(U = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \in \mathbb{R}^{m \times m}\) and \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \in \mathbb{R}^{n \times n}\) are orthogonal matrices derived from the SVD. Here,

\[\begin{split} \Sigma = \begin{bmatrix} \hat{\Sigma} & 0 \\ 0 & 0 \end{bmatrix} \end{split}\]

represents the diagonal matrix in the decomposition. The columns of \(U_2\) span the orthogonal complement of the range of \(A\), and the columns of \(V_2\) span the null space of \(A\). The matrix \(\hat{\Sigma} \in \mathbb{R}^{r \times r}\) is diagonal with the positive singular values \(\sigma_1, \dots, \sigma_r\) on its diagonal.

The reduced SVD of \(A\) is defined as:

\[ A = U_1\hat{\Sigma}V_1^T. \]

This reduced SVD is sufficient for many computational applications. It omits those parts of \(U\), \(\Sigma\) and \(V\) that do not affect the product \(U\Sigma V^T\).

Python skills#

We still need to develop more theory before we can state a numerical method for computing the SVD. Nonetheless, the SVD already serves as an essential and flexible tool for tackling many challenges of computational mathematics.

The code below illustrates how to compute the SVD of a matrix in NumPy.

import numpy as np

# Define a matrix A
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Compute the SVD
U, S, VT = np.linalg.svd(A)

# U and VT are the orthogonal matrices, and S contains the singular values.
# Since S is returned as a 1D array, let us convert it to a diagonal matrix
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, S)

# Print the results
print("Matrix A:")
print(A)
print("\nU matrix:")
print(U)
print("\nSigma matrix:")
print(Sigma)
print("\nV^T matrix:")
print(VT)

# Reconstruct the original matrix
A_reconstructed = U @ Sigma @ VT
print("\nReconstructed Matrix A:")
print(A_reconstructed)
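The reduced SVD of the previous section can be obtained in a similar way. A brief sketch: np.linalg.svd with full_matrices=False returns economy-size factors with \(\min(m, n)\) columns; dropping the columns that belong to (numerically) zero singular values then gives \(U_1\), \(\hat{\Sigma}\) and \(V_1\). The rank-1 example matrix below is chosen for illustration.

import numpy as np

# Reduced SVD: economy-size factors via full_matrices=False.
B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])             # rank 1: second column is twice the first

U1, S1, V1T = np.linalg.svd(B, full_matrices=False)
print(U1.shape, S1.shape, V1T.shape)   # (3, 2) (2,) (2, 2)

# Keep only the columns belonging to non-zero singular values.
r = int(np.sum(S1 > 1e-12 * S1[0]))
B_reduced = U1[:, :r] @ np.diag(S1[:r]) @ V1T[:r, :]
print(np.allclose(B, B_reduced))       # True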

Self-check questions#

Question

Find the SVD of

\[\begin{split} A = \begin{bmatrix} 0 & 2 & 0\\ -3 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}. \end{split}\]
Answer

While we have yet to discuss a general technique for finding the SVD, we can still determine it in simple enough settings using an ad hoc analysis.

The matrix has only one non-zero entry in each row and column, which lets us read off the singular values directly. We order the entries by modulus and place them on the diagonal by permuting the first two rows, which is achieved by multiplication with a permutation matrix from the left. Absorbing the signs into this factor ensures that all singular values are non-negative:

\[\begin{split} A = \begin{bmatrix} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 3 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} = U \Sigma V^T. \end{split}\]
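This factorisation is easy to check numerically, for instance with the following short sketch.

import numpy as np

A = np.array([[0, 2, 0],
              [-3, 0, 0],
              [0, 0, 1]])
U = np.array([[0, 1, 0],
              [-1, 0, 0],
              [0, 0, 1]])
Sigma = np.diag([3, 2, 1])
V = np.eye(3)

print(np.allclose(A, U @ Sigma @ V.T))      # True
print(np.linalg.svd(A, compute_uv=False))   # [3. 2. 1.]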

Question

Explain why each of the insights listed above (just below the theorem about existence) of the SVD holds.

Answer

To prove these insights, let us assume that the SVD of \(A\) is given by \(A = U \Sigma V^T\).

Rank determination: We first show that \(\text{rank}(A) \geq r\). This holds if one can find \(r\) linearly independent vectors in the range of \(A\). Because \(U\) is orthogonal, its first \(r\) columns \(\{u_1, \ldots, u_r\}\) are linearly independent. Moreover, \(u_j \in \text{ran}(A)\) for \(j \leq r\) because the \(j\)th column \(v_j\) of \(V\) satisfies

\[ A \, v_j = U \, \Sigma V^T \, v_j = U \, \Sigma \, e_j = \sigma_j \, U e_j = \sigma_j \, u_j. \]

To show that \(\text{rank}(A) \leq r\), recall the reduced SVD form \(A = U_1\hat{\Sigma}V_1^T\). This implies \(\text{rank}(A) \leq \text{rank}(U_1)\). Because \(U_1\) has only \(r\) columns, the result follows.

Kernel characterisation: \(Ax = 0\) implies \(U \Sigma V^T x = 0\). Multiplying both sides by \(U^T\), we get \(\Sigma V^T x = 0\). Since \(\sigma_1, \dots, \sigma_r > 0\), the first \(r\) entries of \(V^T x\) must vanish, i.e. \(x\) is orthogonal to \(v_1, \dots, v_r\) and therefore lies in the span of \(\{v_{r+1}, \dots, v_n\}\). Conversely, every vector in this span satisfies \(Ax = 0\) because \(\Sigma V^T v_j = \sigma_j e_j = 0\) for \(j > r\).

Range specification: The argument above showed that \(u_1, \dots, u_r \in \text{ran}(A)\), and the reduced SVD \(A = U_1\hat{\Sigma}V_1^T\) shows that every column of \(A\) lies in the span of the columns of \(U_1\). Hence, \(\text{ran}(A)\) is spanned by \(\{u_1, \dots, u_r\}\).

\(2\)-norm relation: The matrix \(2\)-norm is induced by the Euclidean vector norm:

\[\begin{split} \begin{align} \| A \|_2 & = \max_{x \neq 0} \frac{\| A x \|_2}{\| x \|_2} = \max_{x \neq 0} \frac{\| U \Sigma V^T x \|_2}{\| x \|_2} \stackrel{\| x \| = \| V^T x \|}{=} \max_{x \neq 0} \frac{\| U \Sigma V^T x \|_2}{\| V^T x \|_2}\\ & \stackrel{V^T x = y}{=} \max_{y \neq 0} \frac{\| U \Sigma y \|_2}{\| y \|_2} \stackrel{\| U \Sigma y \| = \| \Sigma y \|}{=} \max_{y \neq 0} \frac{\| \Sigma y \|_2}{\| y \|_2} \stackrel{(*)}{=} \sigma_1. \end{align} \end{split}\]

For \((*)\) we use that \(\sigma_1\) is attained by \(\| \Sigma y \| / \| y \|\) with \(y = e_1\) and that \(\| \Sigma y \|^2 \leq \sum_i (\sigma_1 y_i)^2 = \sigma_1^2 \| y \|^2\).

Frobenius norm calculation: To show that multiplication by orthogonal matrices from the left and right does not change the Frobenius norm, we use that multiplication by an orthogonal matrix does not change the \(2\)-norm of a vector:

\[\begin{split} \begin{align} \| Q A P \|_F^2 = \sum_{i,j} |(Q A P)_{ij}|^2 & = \sum_j \| (Q A P)_{:,j} \|_2^2 = \sum_j \| (A P)_{:,j} \|_2^2 = \| A P \|_F^2\\ & = \| P^T A^T \|_F^2 = \sum_j \| (P^T A^T)_{:,j} \|_2^2 = \sum_j \| (A^T)_{:,j} \|_2^2 = \| A^T \|_F^2 = \| A \|_F^2. \end{align} \end{split}\]

Therefore,

\[ \| A \|_F^2 = \| U \Sigma V^T \|_F^2 = \| \Sigma \|_F^2 = \sum_{i = 1}^r \sigma_i^2. \]

Norm inequalities: We note that \(\| A \|_2^2 = \sigma_1^2 \leq \sum_{i = 1}^r \sigma_i^2 = \| A \|_F^2 \leq r \sigma_1^2 = r \| A \|_2^2\).

Inverse and condition number: Let \(A\) be invertible. Because \((U \Sigma V^T) (V \Sigma^{-1} U^T) = I\), it is clear that \(A^{-1} = V \Sigma^{-1} U^T\). Up to reordering the diagonal entries, this is an SVD of \(A^{-1}\) with singular values \(\sigma_n^{-1} \geq \dots \geq \sigma_1^{-1}\), so \(\|A^{-1}\|_2 = \sigma_n^{-1}\), the reciprocal of the smallest singular value. The relative condition number for the \(2\)-norm is therefore \(\kappa_{rel}(A) = \| A \|_2 \cdot \| A^{-1} \|_2 = \sigma_1 / \sigma_n\).

By the low-rank approximation fact, the closest matrix of rank at most \(n-1\) is \(A_{n-1}\), at distance \(\sigma_n\) in the \(2\)-norm. Hence, the smallest perturbation \(\Delta A\) that makes \(A + \Delta A\) singular is \(\Delta A = A_{n-1} - A = -\sigma_n u_n v_n^T\), and the relative distance to singularity \(\|\Delta A\|_2 / \|A\|_2 = \sigma_n / \sigma_1\) is the reciprocal of the condition number.

Question

Let \(A \in \mathbb{R}^{m \times n}\), \(m \geq n\). Show that the singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\).

Answer

To show that the singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\), consider the singular value decomposition of \(A\), which is \(A = U\Sigma V^T\), where \(\Sigma\) is a diagonal matrix containing the singular values of \(A\).

Now, compute \(A^TA\):

\[ A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V(\Sigma^T\Sigma)V^T, \]

where we used that \(U\) is orthogonal, i.e. \(U^TU = I_m\). The matrix \(\Sigma^T\Sigma = \text{diag}(\sigma_1^2, \dots, \sigma_n^2)\) contains the squares of the singular values of \(A\).

Since \(V\) is orthogonal, \(A^TA = V(\Sigma^T\Sigma)V^T\) is an orthogonal diagonalisation of the symmetric matrix \(A^TA\): its eigenvalues are the diagonal entries of \(\Sigma^T\Sigma\), with the columns of \(V\) as eigenvectors. Thus, the singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\).
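A quick numerical check of this relation, with a random matrix chosen for illustration:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

singular_values = np.linalg.svd(A, compute_uv=False)   # descending
eigenvalues = np.linalg.eigvalsh(A.T @ A)              # ascending

print(np.sort(singular_values**2))   # matches the eigenvalues up to round-off
print(eigenvalues)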

Question

Let \(A\in\mathbb{R}^{m\times n}\) with \(m\geq n\). Show that the smallest singular value \(\sigma_n\) of \(A\) satisfies

\[ \sigma_n = \min_{\substack{x\in\mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|_2}{\|x\|_2}. \]
Answer

Let \(A = U\Sigma V^T\) be an SVD of \(A\). Since the \(2\)-norm is invariant under orthogonal transformations, \(\|Ax\|_2 = \|U\Sigma V^T x\|_2 = \|\Sigma V^T x\|_2\). Let \(y = V^T x\); then \(\|y\|_2 = \|x\|_2\) due to the orthogonality of \(V\), and \(y\) ranges over all of \(\mathbb{R}^n\) as \(x\) does. Therefore:

\[\begin{split} \min_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|_2}{\|x\|_2} = \min_{\substack{y \in \mathbb{R}^n \\ y \neq 0}} \frac{\|\Sigma y\|_2}{\|y\|_2}. \end{split}\]

Now, for any \(y\) with \(\|y\|_2=1\), we have:

\[ \|\Sigma y\|_2^2 = \sum_{j=1}^n \sigma_j^2 y_j^2 \geq \sigma_n^2 \sum_{j=1}^n y_j^2 = \sigma_n^2, \]

where the inequality arises because \(\sigma_n\) is the smallest singular value. This inequality becomes equality when \(y\) is chosen as the unit vector \(e_n\), corresponding to the direction of the smallest singular value in the transformed space.

Therefore, the smallest singular value \(\sigma_n\) of \(A\) is the minimum of the ratio \(\frac{\|Ax\|_2}{\|x\|_2}\) over all non-zero vectors \(x \in \mathbb{R}^n\); in other words, \(\sigma_n\) is the largest constant \(c\) such that \(\|Ax\|_2 \geq c\,\|x\|_2\) for all \(x\).

The exercise reveals symmetry in characterising singular values: maximising the ratio yields the largest singular value while minimising it results in the smallest singular value.
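The following short sketch checks this numerically for a random matrix (chosen here for illustration): the minimum is attained at the last right singular vector \(v_n\), and random directions never give a smaller ratio.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
U, S, VT = np.linalg.svd(A)

v_n = VT[-1, :]                          # last right singular vector
print(np.linalg.norm(A @ v_n), S[-1])    # both equal sigma_n

X = rng.standard_normal((3, 1000))       # random directions
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.min() >= S[-1] - 1e-12)     # True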

Question

Explain how the orthogonal complement of the range of a matrix \(A \in \mathbb{R}^{m \times n}\) can be computed with its SVD \(U \Sigma V^T\).

Answer

Let us denote the number of non-zero singular values by \(r\). As established above (third insight), the first \(r\) columns of \(U\) form a basis for \(\text{ran}(A)\). Since \(U\) is orthogonal, its columns are orthonormal. Thus, the remaining columns \(\{ u_{r+1}, u_{r+2}, \ldots, u_m \}\) of \(U\) are orthogonal to the first \(r\) columns of \(U\), and hence to all vectors in \(\text{ran}(A)\). Because the orthogonal complement of \(\text{ran}(A)\) has dimension \(m-r\), these \(m-r\) orthonormal vectors \(\{ u_{r+1}, u_{r+2}, \ldots, u_m \}\) form a basis of it.
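In NumPy, this amounts to keeping the trailing columns of \(U\); a minimal sketch with a rank-deficient example matrix chosen for illustration:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1

U, S, VT = np.linalg.svd(A)
r = int(np.sum(S > 1e-12 * S[0]))

U2 = U[:, r:]                            # orthonormal basis of the complement of ran(A)
print(U2.shape)                          # (3, 2)
print(np.allclose(A.T @ U2, 0))          # every column of U2 is orthogonal to ran(A)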

Question

Let \(A = QR\) be the QR decomposition of a matrix \(A\in\mathbb{R}^{m\times n}\) with \(R\in\mathbb{R}^{n\times n}\). Show that the singular values of \(A\) are identical to those of \(R\).

Answer

We represent \(A\) with its full QR decomposition:

\[\begin{split} A = \begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}, \end{split}\]

where the columns of \(Q^{\perp} \in \mathbb{R}^{m \times (m-n)}\) complete the columns of \(Q\) to an orthonormal basis of \(\mathbb{R}^m\). Now, let the singular value decomposition (SVD) of \(R\) be \(R = U\Sigma V^T\), where \(\Sigma\) contains the singular values of \(R\).

Substituting the SVD of \(R\) into the QR decomposition, we get:

\[\begin{split} A = \begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T. \end{split}\]

Here, \(I\) is the identity matrix of appropriate size. Simplifying, we obtain:

\[\begin{split} A = \left( \begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix} \right) \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T. \end{split}\]

Notice that the matrix product

\[\begin{split}\begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix}\end{split}\]

is orthogonal, being the product of two orthogonal matrices. Thus, \(A\) has been expressed as an orthogonal matrix times a diagonal matrix with non-negative, descending diagonal entries times the transpose of an orthogonal matrix, i.e. an SVD of \(A\) whose singular values are precisely those contained in \(\Sigma\). Therefore, the singular values of \(A\) are identical to those of \(R\).
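A numerical sanity check of this statement, using NumPy's reduced QR factorisation and a random test matrix chosen for illustration:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 4))

Q, R = np.linalg.qr(A)                        # reduced QR: R is 4 x 4
print(np.linalg.svd(A, compute_uv=False))
print(np.linalg.svd(R, compute_uv=False))     # same values up to round-off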

Question

Find all SVDs of the identity matrix \(I\).

Answer

Let \(I = U\Sigma V^T\) be an SVD of \(I\).

  1. To show that \(\Sigma = I\), we observe that \(1 = \| v_i \|_2 = \| (U\Sigma V^T) v_i \|_2 = \| \Sigma (V^T v_i) \|_2 = \| \Sigma e_i \|_2 = \sigma_i\) for the \(i\)th column vector \(v_i\) of \(V\) and singular value \(\sigma_i\). Here we used that \(U\) is orthogonal (so it preserves the \(2\)-norm) and that \(V\) is orthogonal (so \(V^T v_i = e_i\)).

  2. We now turn our attention to \(U\) and \(V\). Because of the first step, the SVD simplifies to \(I = U V^T\), meaning that \(V^T\) is the inverse of \(U\). As the inverse of an orthogonal matrix is its transpose, we conclude \(U = V\). Conversely, every orthogonal matrix \(Q\) satisfies \(I = Q I Q^T\). Hence, the SVDs of \(I\) are precisely the factorisations of the form

    \[ I = Q I Q^T, \qquad Q \text{ orthogonal.} \]

This example shows that the SVD of a matrix is generally not unique.

Question

Consider a full-rank, square linear system \(Ax = b\), where \(A \in \mathbb{R}^{n \times n}\) and \(b \in \mathbb{R}^n\). The singular value decomposition (SVD) of \(A\) is \(A = U \Sigma V^T\). The system can be solved using the SVD by setting up \(U z = b\), \(\Sigma y = z\), and \(V^T x = y\).

Your task is to analyse the condition numbers of each stage of this process:

  1. Determine the condition number of the matrix \(U\) and explain its significance in solving \(U z = b\).

  2. Calculate the condition number of the diagonal matrix \(\Sigma\) and discuss how it affects the solution of \(\Sigma y = z\).

  3. Evaluate the condition number of \(V^T\) and its impact on solving \(V^T x = y\).

  4. Finally, relate the condition numbers found in steps 1-3 to the overall condition number of the matrix \(A\) and explain how they influence the stability and accuracy of solving the linear system \(Ax = b\).

Answer

  1. Condition Number of \(U\): Since \(U\) is an orthogonal matrix, its condition number is 1. This implies that the solution of \(Uz = b\) is numerically stable and not sensitive to small changes in \(b\).

  2. Condition Number of \(\Sigma\): The condition number of the diagonal matrix \(\Sigma\) is the ratio of the largest to the smallest singular value of \(A\), i.e. \(\kappa(\Sigma) = \sigma_1 / \sigma_n\). Solving \(\Sigma y = z\) divides each component of \(z\) by a singular value, so components associated with small singular values can strongly amplify perturbations in \(z\).

  3. Condition Number of \(V^T\): Like \(U\), \(V^T\) is also an orthogonal matrix; thus, its condition number is 1. This ensures that solving \(V^T x = y\) is numerically stable.

  4. Overall Condition Number of \(A\): The condition number of \(A\) is the same as that of \(\Sigma\), as both \(U\) and \(V^T\) have condition numbers of 1. The condition number of \(A\) affects the sensitivity of the solution \(x\) to changes in \(b\). A high condition number indicates potential numerical instability and inaccuracy in the computed solution, especially when \(A\) is close to singular.
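The three stages can be carried out directly with the SVD factors; a minimal sketch, with a random test system chosen for illustration:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, S, VT = np.linalg.svd(A)

z = U.T @ b        # stage 1: U z = b       (condition number 1)
y = z / S          # stage 2: Sigma y = z   (condition number sigma_1 / sigma_n)
x = VT.T @ y       # stage 3: V^T x = y     (condition number 1)

print(np.allclose(A @ x, b))                 # True
print(S[0] / S[-1], np.linalg.cond(A, 2))    # kappa(A) computed two ways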

Optional material#

Existence of the Singular Value Decomposition

Theorem 2 (Existence of Singular Value Decomposition)

Every real matrix \(A \in \mathbb{R}^{m \times n}\) has a singular value decomposition (SVD).

Proof. We utilise an inductive argument to establish the existence of the SVD for any matrix \(A \in \mathbb{R}^{m \times n}\).

  1. Base Case: Define the largest singular value of \(A\) as \(\sigma_1 := \|A\|_2\). Because the unit sphere in \(\mathbb{R}^n\) is compact and \(v \mapsto \|Av\|_2\) is continuous, there exists a unit vector \(v_1 \in \mathbb{R}^n\) attaining the maximum of \(\| A v \|_2 / \|v\|_2\) over \(v \neq 0\), so that \(Av_1 = \sigma_1 u_1\) for some unit vector \(u_1 \in \mathbb{R}^m\).

  2. Formation of Orthogonal Matrices \(U_1\) and \(V_1\): Construct orthogonal matrices \(U_1\) and \(V_1\) in the form:

    \[ V_1 = \begin{bmatrix} v_1 & \hat{V}_1 \end{bmatrix}, \quad U_1 = \begin{bmatrix} u_1 & \hat{U}_1 \end{bmatrix} \]

    where \(\hat{V}_1\) and \(\hat{U}_1\) are matrices that complete \(v_1\) and \(u_1\) to orthonormal bases. Their existence is guaranteed by the Gram-Schmidt method.

  3. Transformed Matrix \(S\): Consider the matrix \(S := U_1^T A V_1\), which takes the form:

    \[\begin{split} S = \begin{bmatrix} \sigma_1 & w^T \\ 0 & B \end{bmatrix} \end{split}\]

    where \(w\) is a vector in \(\mathbb{R}^{n-1}\) and \(B\) is an \((m-1) \times (n-1)\) matrix.

  4. Proof that \(w = 0\): To show \(w = 0\), we use the properties of matrix norms:

    \[\begin{split} \|S\|_2 = \|U_1^T A V_1\|_2 \geq \left\| \begin{bmatrix} \sigma_1 & w^T \\ 0 & B \end{bmatrix} \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \right\|_2 \bigg/ \left\| \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \right\|_2 \geq \left( \sigma_1^2 + \|w\|^2 \right)^{1/2} \end{split}\]

    Here we used the inequality \(\|Sx\|_2 / \|x\|_2 \leq \|S\|_2\) with \(x = (\sigma_1, w^T)^T\) and dropped the non-negative contribution of \(Bw\) to the norm. Since orthogonal transformations preserve the \(2\)-norm, \(\|S\|_2 = \|U_1^T A V_1\|_2 = \|A\|_2 = \sigma_1\), so \(\sigma_1 \geq \left( \sigma_1^2 + \|w\|^2 \right)^{1/2}\), implying \(w = 0\).

  5. Inductive Step: Assume the SVD of \(B\) exists: \(B = \hat{U} \hat{\Sigma} \hat{V}^T\). Then \(A\) can be decomposed as:

    \[\begin{split} A = U_1 \begin{bmatrix} 1 & 0 \\ 0 & \hat{U} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \hat{\Sigma} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat{V}^T \end{bmatrix} V_1^T \end{split}\]

    Defining

    \[\begin{split} U = U_1 \begin{bmatrix} 1 & 0 \\ 0 & \hat{U} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \hat{\Sigma} \end{bmatrix}, \qquad V = V_1 \begin{bmatrix} 1 & 0 \\ 0 & \hat{V} \end{bmatrix}, \end{split}\]

    we obtain \(A = U \Sigma V^T\) with orthogonal \(U\) and \(V\), which completes the inductive step.

  6. Conclusion: Ultimately, the induction will reach a row or column vector, for which the existence of an SVD trivially holds.

The above proof does not inform us how to design an efficient numerical algorithm to construct the SVD of a matrix.