The Singular Value Decomposition#

A matrix \(\Sigma \in \mathbb{R}^{m \times n}\) is called diagonal if \(\sigma_{ij} = 0\) for all \(i \neq j\).

Theorem 1 (Singular Value Decomposition)

For any matrix \(A \in \mathbb{R}^{m \times n}\), there exists a singular value decomposition (SVD)

\[ A = U\Sigma V^T, \]

where \(U \in \mathbb{R}^{m \times m}\) and \(V \in \mathbb{R}^{n \times n}\) are orthogonal matrices and \(\Sigma\) is diagonal with non-negative descending diagonal elements \(\sigma_i := \sigma_{ii}\):

\[ \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p \geq 0, \qquad p = \min(m, n). \]

The proof is given below in the optional materials section. For the remainder of the page, assume that \(m \geq n\). The matrix \(\Sigma \in \mathbb{R}^{m \times n}\) is a block matrix of the form

\[\begin{split} \Sigma = \begin{bmatrix} \tilde{\Sigma} \\ 0 \end{bmatrix}, \end{split}\]

where \(\tilde{\Sigma} = \text{diag}(\sigma_1, \sigma_2, \dots, \sigma_r, 0, \dots, 0) \in \mathbb{R}^{n \times n}\) is a diagonal matrix with the positive singular values \(\sigma_1, \sigma_2, \dots, \sigma_r\) on its diagonal, and \(r\) turns out to be the rank of the matrix \(A\) (see the first insight below).

These diagonal entries \(\sigma_i\) (for \(i = 1, 2, \dots, n\)) are known as the singular values of \(A\). The columns \(v_j\) of \(V\) are called right singular vectors. The columns \(u_j\) of \(U\) are called left singular vectors.

Insights

  1. Rank determination: The rank of matrix \(A\) is equal to \(r\), defined as the count of non-zero singular values in the decomposition.

  2. Kernel characterisation: The kernel (or null space) of \(A\), denoted as \(\text{ker}(A)\), is the span of the vectors \(\{v_{r+1}, \dots, v_n\}\).

  3. Range specification: The range of \(A\), represented as \(\text{ran}(A)\), is the span of the vectors \(\{u_1, \dots, u_r\}\).

  4. \(2\)-norm relation: The \(2\)-norm of \(A\) is equal to the largest singular value \(\sigma_1\). Hence, \(\|A\|_2 = \sigma_1\).

  5. Frobenius norm calculation: Utilising the property that \(\|Q A P\|_F = \|A\|_F\) for any orthogonal matrices \(Q\) and \(P\), we find that

    \[\|A\|_F =\left(\sum_{j=1}^r\sigma_j^2\right)^{1/2}.\]
  6. Norm inequalities: As a direct result, we deduce the inequality \(\|A\|_2 \leq \|A\|_F \leq \sqrt{r}\|A\|_2\).

  7. Inverse and condition number: For an invertible matrix \(A\), its inverse is \(A^{-1} = V\Sigma^{-1}U^T\), and \(\|A^{-1}\|_2 = \sigma_n^{-1}\). The relative condition number is thus \(\kappa_{rel}(A) = \frac{\sigma_1}{\sigma_n}\). The smallest perturbation \(\Delta A\) that makes \(A + \Delta A\) singular is \(\Delta A = -\sigma_n u_n v_n^T\), so the relative distance to singularity \(\|\Delta A\|_2 / \|A\|_2 = \sigma_n / \sigma_1\) is the reciprocal of the condition number.

See also the related self-check question below.
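As a quick numerical check of some of these insights, the following sketch uses NumPy on an example matrix chosen here purely for illustration.

import numpy as np

# Example matrix (chosen for illustration): rank 2, since the third
# column is the sum of the first two.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])

U, S, VT = np.linalg.svd(A)

# Insight 1: the rank equals the number of non-zero singular values.
r = int(np.sum(S > 1e-12 * S[0]))
print(r, np.linalg.matrix_rank(A))                         # 2 2

# Insights 4 and 5: the 2-norm is sigma_1, and the Frobenius norm is
# the square root of the sum of the squared singular values.
print(np.linalg.norm(A, 2), S[0])
print(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(S[:r]**2)))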

SVD and low-rank approximation#

The singular value decomposition \(A = U\Sigma V^T\) can be expanded into a sum of rank-1 matrices: for the element \(a_{ij}\) we find

\[ a_{ij} = \sum_{k=1}^m \sum_{\ell = 1}^n u_{ik} \, \sigma_{k\ell} \, v_{j\ell} = \sum_{k = 1}^r u_{ik} \, \sigma_k \, v_{jk}. \]

Thus, for the matrix \(A\),

\[ A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_r u_r v_r^T, \]

where each term \(\sigma_j u_j v_j^T\) is a matrix of rank 1. Let us define a partial sum of these terms as \(A_\nu\), given by:

\[ A_\nu := \sum_{k=1}^\nu \sigma_k u_k v_k^T, \]

for any \(0 \leq \nu \leq r\). This partial sum captures the contribution of the first \(\nu\) singular values and their corresponding singular vectors to the matrix \(A\).

The spectral norm of the difference between \(A\) and \(A_\nu\) is equal to the \((\nu+1)\)-th singular value, \(\sigma_{\nu+1}\). Thus, we have:

\[ \|A - A_{\nu}\|_2 = \sigma_{\nu+1}. \]

This expression quantifies the significance of each singular value and its associated rank-1 component in approximating the original matrix \(A\).

Fact: Low-rank matrix approximation

Let \(A\) and \(A_{\nu}\) be defined as above. Then

\[ \|A - A_{\nu}\|_2 = \min_{B\in\mathbb{R}^{m\times n}, \text{ rank}(B)\leq\nu} \|A - B\|_2 = \sigma_{\nu+1}. \]
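As a quick illustration of this fact, the following sketch (with a small random test matrix and a truncation rank chosen purely for demonstration) builds \(A_\nu\) from the leading singular triplets and compares the spectral error with \(\sigma_{\nu+1}\).

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # illustrative test matrix

U, S, VT = np.linalg.svd(A)
nu = 2                                   # truncation rank (arbitrary choice)

# Best rank-nu approximation: keep the first nu singular triplets.
A_nu = sum(S[k] * np.outer(U[:, k], VT[k, :]) for k in range(nu))

# The spectral norm of the error equals the (nu+1)-th singular value.
print(np.linalg.norm(A - A_nu, 2))       # approximately S[nu]
print(S[nu])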

This formulation highlights the utility of the SVD in reducing computational costs in numerical linear algebra. Multiplying two \(N \times N\) matrices in the standard way, without advanced techniques such as Strassen’s algorithm, costs \(\mathcal{O}(N^3)\) operations. However, multiplying two rank-1 matrices, such as \(\sigma u v^T\) and \(\hat{\sigma} \hat{u} \hat{v}^T\), is much cheaper: the essential step is the scalar product \(\langle v, \hat{u}\rangle\), which is then scaled by \(\sigma \hat{\sigma}\). This yields a rank-1 representation of the product:

\[ (\sigma u v^T) (\hat{\sigma} \hat{u} \hat{v}^T) = (\sigma \hat{\sigma} \langle v, \hat{u}\rangle) u \hat{v}^T. \]

Building on this approach, the multiplication of rank-\(\nu\) approximations \(A_\nu\) and \(\hat{A}_\nu\) of two matrices \(A\) and \(\hat{A}\) can be executed with a computational cost of \(\mathcal{O}(n \cdot \nu^2)\):

\[ A_\nu \cdot \hat{A}_\nu = \sum_{k,\ell} (\sigma_k u_k v_k^T) (\hat{\sigma}_\ell \hat{u}_\ell \hat{v}_\ell^T) = \sum_{k,\ell} (\sigma_k \hat{\sigma}_\ell \langle v_k, \hat{u}_\ell\rangle) u_k \hat{v}^T_\ell. \]

This method is particularly advantageous in applications where the singular values diminish rapidly in magnitude. In such scenarios, using a rank-\(\nu\) approximation yields substantial computational savings, often with only a moderate or negligible compromise in accuracy. This balance between efficiency and precision makes SVD a powerful tool in large-scale numerical computations.
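Below is a minimal sketch of this rank-\(\nu\) product, assuming both matrices are supplied through their (truncated) SVD factors; the function name lowrank_product and the random test matrices are chosen here for illustration only. The key point is that only the small matrix of inner products \(\langle v_k, \hat{u}_\ell\rangle\) is formed, never the full matrices.

import numpy as np

def lowrank_product(U1, s1, V1, U2, s2, V2):
    # Multiply two matrices given in (truncated) SVD form
    #   A ~ U1 @ diag(s1) @ V1.T   and   A_hat ~ U2 @ diag(s2) @ V2.T.
    # Returns factors (L, R) with A @ A_hat ~ L @ R.T.
    # Coefficients sigma_k * sigma_hat_l * <v_k, u_hat_l>:
    C = (s1[:, None] * (V1.T @ U2)) * s2[None, :]
    return U1 @ C, V2

# Quick consistency check with full-rank factors (illustrative only).
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A_hat = rng.standard_normal((5, 5))
Ua, sa, VaT = np.linalg.svd(A)
Uh, sh, VhT = np.linalg.svd(A_hat)
L, R = lowrank_product(Ua, sa, VaT.T, Uh, sh, VhT.T)
print(np.allclose(L @ R.T, A @ A_hat))   # True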

The reduced SVD#

The form of the SVD introduced above is known as the full SVD of \(A \in \mathbb{R}^{m \times n}\):

\[\begin{split} A = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \begin{bmatrix} \hat{\Sigma} & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1 & V_2 \end{bmatrix}^T, \end{split}\]

where \(U = \begin{bmatrix} U_1 & U_2 \end{bmatrix} \in \mathbb{R}^{m \times m}\) and \(V = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \in \mathbb{R}^{n \times n}\) are orthogonal matrices derived from the SVD. Here,

\[\begin{split} \Sigma = \begin{bmatrix} \hat{\Sigma} & 0 \\ 0 & 0 \end{bmatrix} \end{split}\]

represents the diagonal matrix in the decomposition. The columns of \(U_2\) span the orthogonal complement of the range of \(A\), and the columns of \(V_2\) span the null space of \(A\). The matrix \(\hat{\Sigma} \in \mathbb{R}^{r \times r}\) is diagonal with the positive singular values \(\sigma_1, \dots, \sigma_r\) on its diagonal.

The reduced SVD of \(A\) is defined as:

\[ A = U_1\hat{\Sigma}V_1^T. \]

This reduced SVD is sufficient for many computational applications. It omits those parts of \(U\), \(\Sigma\) and \(V\) that do not affect the product \(U\Sigma V^T\).

Python skills#

We still need to develop more theory before we can state a numerical method for computing the SVD. Nonetheless, the SVD already serves as an essential and flexible tool for tackling many challenges of computational mathematics.

The code below illustrates how to compute the SVD of a matrix in NumPy.

import numpy as np

# Define a matrix A
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Compute the SVD
U, S, VT = np.linalg.svd(A)

# U and VT are the orthogonal matrices, and S contains the singular values.
# Since S is returned as a 1D array, let us convert it to a diagonal matrix
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, S)

# Print the results
print("Matrix A:")
print(A)
print("\nU matrix:")
print(U)
print("\nSigma matrix:")
print(Sigma)
print("\nV^T matrix:")
print(VT)

# Reconstruct the original matrix
A_reconstructed = U @ Sigma @ VT
print("\nReconstructed Matrix A:")
print(A_reconstructed)
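The reduced SVD of the previous section can be obtained in a similar way. A brief sketch: np.linalg.svd with full_matrices=False returns economy-size factors with \(\min(m, n)\) columns; dropping the columns that belong to (numerically) zero singular values then gives \(U_1\), \(\hat{\Sigma}\) and \(V_1\). The rank-1 example matrix below is chosen for illustration.

import numpy as np

# Reduced SVD: economy-size factors via full_matrices=False.
B = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])             # rank 1: second column is twice the first

U1, S1, V1T = np.linalg.svd(B, full_matrices=False)
print(U1.shape, S1.shape, V1T.shape)   # (3, 2) (2,) (2, 2)

# Keep only the columns belonging to non-zero singular values.
r = int(np.sum(S1 > 1e-12 * S1[0]))
B_reduced = U1[:, :r] @ np.diag(S1[:r]) @ V1T[:r, :]
print(np.allclose(B, B_reduced))       # True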

Self-check questions#

Question

Find the SVD of

\[\begin{split} A = \begin{bmatrix} 0 & 2 & 0\\ -3 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix}. \end{split}\]
Answer

While we have yet to discuss a general technique for finding the SVD, we can still determine it in simple enough settings using an ad hoc analysis.

The matrix has only one non-zero entry in each row and column, which lets us read off the singular values directly. We order the entries by modulus and place them on the diagonal by permuting the first two rows, which is achieved by multiplication with a permutation matrix from the left. Absorbing the signs into this factor ensures that all singular values are non-negative:

\[\begin{split} A = \begin{bmatrix} 0 & 1 & 0\\ -1 & 0 & 0\\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 3 & 0 & 0\\ 0 & 2 & 0\\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} = U \Sigma V^T. \end{split}\]
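This factorisation is easy to check numerically, for instance with the following short sketch.

import numpy as np

A = np.array([[0, 2, 0],
              [-3, 0, 0],
              [0, 0, 1]])
U = np.array([[0, 1, 0],
              [-1, 0, 0],
              [0, 0, 1]])
Sigma = np.diag([3, 2, 1])
V = np.eye(3)

print(np.allclose(A, U @ Sigma @ V.T))      # True
print(np.linalg.svd(A, compute_uv=False))   # [3. 2. 1.]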

Question

Explain why each of the insights listed above (just below the theorem about existence) of the SVD holds.

Answer

To prove these insights, let us assume that the SVD of \(A\) is given by \(A = U \Sigma V^T\).

Rank determination: We first show that \(\text{rank}(A) \geq r\). This holds if one can find \(r\) linearly independent vectors in the range of \(A\). Because \(U\) is orthogonal, its first \(r\) columns \(\{u_1, \ldots, u_r\}\) are linearly independent. Moreover, \(u_j \in \text{ran}(A)\) for \(j \leq r\) because the \(j\)th column \(v_j\) of \(V\) satisfies

\[ A \, v_j = U \, \Sigma V^T \, v_j = U \, \Sigma \, e_j = \sigma_j \, U e_j = \sigma_j \, u_j. \]

To show that \(\text{rank}(A) \leq r\), recall the reduced SVD form \(A = U_1\hat{\Sigma}V_1^T\). This implies \(\text{rank}(A) \leq \text{rank}(U_1)\). Because \(U_1\) has only \(r\) columns, the result follows.

Kernel characterisation: \(Ax = 0\) implies \(U \Sigma V^T x = 0\). Multiplying both sides by \(U^T\), we get \(\Sigma V^T x = 0\). Since \(\sigma_1, \dots, \sigma_r > 0\), the first \(r\) entries of \(V^T x\) must vanish, i.e. \(x\) is orthogonal to \(v_1, \dots, v_r\) and therefore lies in the span of \(\{v_{r+1}, \dots, v_n\}\). Conversely, every vector in this span satisfies \(Ax = 0\) because \(\Sigma V^T v_j = \sigma_j e_j = 0\) for \(j > r\).

Range specification: The argument above showed that \(u_1, \dots, u_r \in \text{ran}(A)\), and the reduced SVD \(A = U_1\hat{\Sigma}V_1^T\) shows that every column of \(A\) lies in the span of the columns of \(U_1\). Hence, \(\text{ran}(A)\) is spanned by \(\{u_1, \dots, u_r\}\).

\(2\)-norm relation: The matrix \(2\)-norm is induced by the Euclidean vector norm:

\[\begin{split} \begin{align} \| A \|_2 & = \max_{x \neq 0} \frac{\| A x \|_2}{\| x \|_2} = \max_{x \neq 0} \frac{\| U \Sigma V^T x \|_2}{\| x \|_2} \stackrel{\| x \| = \| V^T x \|}{=} \max_{x \neq 0} \frac{\| U \Sigma V^T x \|_2}{\| V^T x \|_2}\\ & \stackrel{V^T x = y}{=} \max_{y \neq 0} \frac{\| U \Sigma y \|_2}{\| y \|_2} \stackrel{\| U \Sigma y \| = \| \Sigma y \|}{=} \max_{y \neq 0} \frac{\| \Sigma y \|_2}{\| y \|_2} \stackrel{(*)}{=} \sigma_1. \end{align} \end{split}\]

For \((*)\) we use that \(\sigma_1\) is attained by \(\| \Sigma y \| / \| y \|\) with \(y = e_1\) and that \(\| \Sigma y \|^2 \leq \sum_i (\sigma_1 y_i)^2 = \sigma_1^2 \| y \|^2\).

Frobenius norm calculation: To show that multiplication by orthogonal matrices from the left and right does not change the Frobenius norm, we use that multiplication by an orthogonal matrix does not change the \(2\)-norm of a vector:

\[\begin{split} \begin{align} \| Q A P \|_F^2 = \sum_{i,j} |(Q A P)_{ij}|^2 & = \sum_j \| (Q A P)_{:,j} \|_2^2 = \sum_j \| (A P)_{:,j} \|_2^2 = \| A P \|_F^2\\ & = \| P^T A^T \|_F^2 = \sum_j \| (P^T A^T)_{:,j} \|_2^2 = \sum_j \| (A^T)_{:,j} \|_2^2 = \| A^T \|_F^2 = \| A \|_F^2. \end{align} \end{split}\]

Therefore,

\[ \| A \|_F^2 = \| U \Sigma V^T \|_F^2 = \| \Sigma \|_F^2 = \sum_{i = 1}^r \sigma_i^2. \]

Norm inequalities: We note that \(\| A \|_2^2 = \sigma_1^2 \leq \sum_{i = 1}^r \sigma_i^2 = \| A \|_F^2 \leq r \sigma_1^2 = r \| A \|_2^2\).

Inverse and condition number: Let \(A\) be invertible. Because \((U \Sigma V^T) (V \Sigma^{-1} U^T) = I\), it is clear that \(A^{-1} = V \Sigma^{-1} U^T\). Up to reordering the diagonal entries, this is an SVD of \(A^{-1}\) with singular values \(\sigma_n^{-1} \geq \dots \geq \sigma_1^{-1}\), so \(\|A^{-1}\|_2 = \sigma_n^{-1}\), the reciprocal of the smallest singular value. The relative condition number for the \(2\)-norm is therefore \(\kappa_{rel}(A) = \| A \|_2 \cdot \| A^{-1} \|_2 = \sigma_1 / \sigma_n\).

By the low-rank approximation fact, the closest matrix of rank at most \(n-1\) is \(A_{n-1}\), at distance \(\sigma_n\) in the \(2\)-norm. Hence, the smallest perturbation \(\Delta A\) that makes \(A + \Delta A\) singular is \(\Delta A = A_{n-1} - A = -\sigma_n u_n v_n^T\), and the relative distance to singularity \(\|\Delta A\|_2 / \|A\|_2 = \sigma_n / \sigma_1\) is the reciprocal of the condition number.

Question

Let \(A \in \mathbb{R}^{m \times n}\), \(m \geq n\). Show that the singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\).

Answer

To show that the singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\), consider the singular value decomposition of \(A\), which is \(A = U\Sigma V^T\), where \(\Sigma\) is a diagonal matrix containing the singular values of \(A\).

Now, compute \(A^TA\):

\[ A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^TU^TU\Sigma V^T = V(\Sigma^T\Sigma)V^T, \]

where we used that \(U\) is orthogonal, i.e. \(U^TU = I_m\). The matrix \(\Sigma^T\Sigma = \text{diag}(\sigma_1^2, \dots, \sigma_n^2)\) contains the squares of the singular values of \(A\).

Since \(V\) is orthogonal, \(A^TA = V(\Sigma^T\Sigma)V^T\) is an orthogonal diagonalisation of the symmetric matrix \(A^TA\): its eigenvalues are the diagonal entries of \(\Sigma^T\Sigma\), with the columns of \(V\) as eigenvectors. Thus, the singular values of \(A\) are the square roots of the eigenvalues of \(A^TA\).
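A quick numerical check of this relation, with a random matrix chosen for illustration:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))

singular_values = np.linalg.svd(A, compute_uv=False)   # descending
eigenvalues = np.linalg.eigvalsh(A.T @ A)              # ascending

print(np.sort(singular_values**2))   # matches the eigenvalues up to round-off
print(eigenvalues)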

Question

Let \(A\in\mathbb{R}^{m\times n}\) with \(m\geq n\). Show that the smallest singular value \(\sigma_n\) of \(A\) satisfies

\[ \sigma_n = \min_{\substack{x\in\mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|_2}{\|x\|_2}. \]
Answer

Let \(A = U\Sigma V^T\) be an SVD of \(A\). Since the \(2\)-norm is invariant under orthogonal transformations, \(\|Ax\|_2 = \|U\Sigma V^T x\|_2 = \|\Sigma V^T x\|_2\). Let \(y = V^T x\); then \(\|y\|_2 = \|x\|_2\) due to the orthogonality of \(V\), and \(y\) ranges over all of \(\mathbb{R}^n\) as \(x\) does. Therefore:

\[\begin{split} \min_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|_2}{\|x\|_2} = \min_{\substack{y \in \mathbb{R}^n \\ y \neq 0}} \frac{\|\Sigma y\|_2}{\|y\|_2}. \end{split}\]

Now, for any \(y\) with \(\|y\|_2=1\), we have:

\[ \|\Sigma y\|_2^2 = \sum_{j=1}^n \sigma_j^2 y_j^2 \geq \sigma_n^2 \sum_{j=1}^n y_j^2 = \sigma_n^2, \]

where the inequality arises because \(\sigma_n\) is the smallest singular value. This inequality becomes equality when \(y\) is chosen as the unit vector \(e_n\), corresponding to the direction of the smallest singular value in the transformed space.

Therefore, the smallest singular value \(\sigma_n\) of \(A\) is the minimum of the ratio \(\frac{\|Ax\|_2}{\|x\|_2}\) over all non-zero vectors \(x \in \mathbb{R}^n\); in other words, \(\sigma_n\) is the largest constant \(c\) such that \(\|Ax\|_2 \geq c\,\|x\|_2\) for all \(x\).

The exercise reveals symmetry in characterising singular values: maximising the ratio yields the largest singular value while minimising it results in the smallest singular value.
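The following short sketch checks this numerically for a random matrix (chosen here for illustration): the minimum is attained at the last right singular vector \(v_n\), and random directions never give a smaller ratio.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 3))
U, S, VT = np.linalg.svd(A)

v_n = VT[-1, :]                          # last right singular vector
print(np.linalg.norm(A @ v_n), S[-1])    # both equal sigma_n

X = rng.standard_normal((3, 1000))       # random directions
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(ratios.min() >= S[-1] - 1e-12)     # True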

Question

Explain how the orthogonal complement of the range of a matrix \(A \in \mathbb{R}^{m \times n}\) can be computed with its SVD \(U \Sigma V^T\).

Answer

Let us denote the number of non-zero singular values by \(r\). As established above (third insight), the first \(r\) columns of \(U\) form a basis for \(\text{ran}(A)\). Since \(U\) is orthogonal, its columns are orthonormal. Thus, the remaining columns \(\{ u_{r+1}, u_{r+2}, \ldots, u_m \}\) of \(U\) are orthogonal to the first \(r\) columns of \(U\), and hence to all vectors in \(\text{ran}(A)\). Because the orthogonal complement of \(\text{ran}(A)\) has dimension \(m-r\), these \(m-r\) orthonormal vectors \(\{ u_{r+1}, u_{r+2}, \ldots, u_m \}\) form a basis of it.
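In NumPy, this amounts to keeping the trailing columns of \(U\); a minimal sketch with a rank-deficient example matrix chosen for illustration:

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])               # rank 1

U, S, VT = np.linalg.svd(A)
r = int(np.sum(S > 1e-12 * S[0]))

U2 = U[:, r:]                            # orthonormal basis of the complement of ran(A)
print(U2.shape)                          # (3, 2)
print(np.allclose(A.T @ U2, 0))          # every column of U2 is orthogonal to ran(A)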

Question

Let \(A = QR\) be the QR decomposition of a matrix \(A\in\mathbb{R}^{m\times n}\) with \(R\in\mathbb{R}^{n\times n}\). Show that the singular values of \(A\) are identical to those of \(R\).

Answer

We represent \(A\) with its full QR decomposition:

\[\begin{split} A = \begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} R \\ 0 \end{bmatrix}, \end{split}\]

where the columns of \(Q^{\perp} \in \mathbb{R}^{m \times (m-n)}\) complete the columns of \(Q\) to an orthonormal basis of \(\mathbb{R}^m\). Now, let the singular value decomposition (SVD) of \(R\) be \(R = U\Sigma V^T\), where \(\Sigma\) contains the singular values of \(R\).

Substituting the SVD of \(R\) into the QR decomposition, we get:

\[\begin{split} A = \begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T. \end{split}\]

Here, \(I\) is the identity matrix of appropriate size. Simplifying, we obtain:

\[\begin{split} A = \left( \begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix} \right) \begin{bmatrix} \Sigma \\ 0 \end{bmatrix} V^T. \end{split}\]

Notice that the matrix product

\[\begin{split}\begin{bmatrix} Q & Q^{\perp} \end{bmatrix} \begin{bmatrix} U & 0 \\ 0 & I \end{bmatrix}\end{split}\]

is orthogonal, being the product of two orthogonal matrices. Thus, \(A\) has been expressed as an orthogonal matrix times a diagonal matrix with non-negative, descending diagonal entries times the transpose of an orthogonal matrix, i.e. an SVD of \(A\) whose singular values are precisely those contained in \(\Sigma\). Therefore, the singular values of \(A\) are identical to those of \(R\).
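A numerical sanity check of this statement, using NumPy's reduced QR factorisation and a random test matrix chosen for illustration:

import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((7, 4))

Q, R = np.linalg.qr(A)                        # reduced QR: R is 4 x 4
print(np.linalg.svd(A, compute_uv=False))
print(np.linalg.svd(R, compute_uv=False))     # same values up to round-off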

Question

Find all SVDs of the identity matrix \(I\).

Answer

Let \(I = U\Sigma V^T\) be an SVD of \(I\).

  1. To show that \(\Sigma = I\), we observe that \(1 = \| v_i \|_2 = \| (U\Sigma V^T) v_i \|_2 = \| \Sigma (V^T v_i) \|_2 = \| \Sigma e_i \|_2 = \sigma_i\) for the \(i\)th column vector \(v_i\) of \(V\) and singular value \(\sigma_i\). Here we used that \(U\) is orthogonal (so it preserves the \(2\)-norm) and that \(V\) is orthogonal (so \(V^T v_i = e_i\)).

  2. We now turn our attention to \(U\) and \(V\). Because of the first step, the SVD simplifies to \(I = U V^T\), meaning that \(V^T\) is the inverse of \(U\). As the inverse of an orthogonal matrix is its transpose, we conclude \(U = V\). Conversely, every orthogonal matrix \(Q\) satisfies \(I = Q I Q^T\). Hence, the SVDs of \(I\) are precisely the factorisations of the form

    \[ I = Q I Q^T, \qquad Q \text{ orthogonal.} \]

This example shows that the SVD of a matrix is generally not unique.

Question

Consider a full-rank, square linear system \(Ax = b\), where \(A \in \mathbb{R}^{n \times n}\) and \(b \in \mathbb{R}^n\). The singular value decomposition (SVD) of \(A\) is \(A = U \Sigma V^T\). The system can be solved using the SVD by setting up \(U z = b\), \(\Sigma y = z\), and \(V^T x = y\).

Your task is to analyse the condition numbers of each stage of this process:

  1. Determine the condition number of the matrix \(U\) and explain its significance in solving \(U z = b\).

  2. Calculate the condition number of the diagonal matrix \(\Sigma\) and discuss how it affects the solution of \(\Sigma y = z\).

  3. Evaluate the condition number of \(V^T\) and its impact on solving \(V^T x = y\).

  4. Finally, relate the condition numbers found in steps 1-3 to the overall condition number of the matrix \(A\) and explain how they influence the stability and accuracy of solving the linear system \(Ax = b\).

Answer

  1. Condition Number of \(U\): Since \(U\) is an orthogonal matrix, its condition number is 1. This implies that the solution of \(Uz = b\) is numerically stable and not sensitive to small changes in \(b\).

  2. Condition Number of \(\Sigma\): The condition number of the diagonal matrix \(\Sigma\) is the ratio of the largest to the smallest singular value of \(A\), i.e. \(\kappa(\Sigma) = \sigma_1 / \sigma_n\). Solving \(\Sigma y = z\) divides each component of \(z\) by a singular value, so components associated with small singular values can strongly amplify perturbations in \(z\).

  3. Condition Number of \(V^T\): Like \(U\), \(V^T\) is also an orthogonal matrix; thus, its condition number is 1. This ensures that solving \(V^T x = y\) is numerically stable.

  4. Overall Condition Number of \(A\): The condition number of \(A\) is the same as that of \(\Sigma\), as both \(U\) and \(V^T\) have condition numbers of 1. The condition number of \(A\) affects the sensitivity of the solution \(x\) to changes in \(b\). A high condition number indicates potential numerical instability and inaccuracy in the computed solution, especially when \(A\) is close to singular.
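The three stages can be carried out directly with the SVD factors; a minimal sketch, with a random test system chosen for illustration:

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, S, VT = np.linalg.svd(A)

z = U.T @ b        # stage 1: U z = b       (condition number 1)
y = z / S          # stage 2: Sigma y = z   (condition number sigma_1 / sigma_n)
x = VT.T @ y       # stage 3: V^T x = y     (condition number 1)

print(np.allclose(A @ x, b))                 # True
print(S[0] / S[-1], np.linalg.cond(A, 2))    # kappa(A) computed two ways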

Optional material#

Existence of the Singular Value Decomposition

Theorem 2 (Existence of Singular Value Decomposition)

Every real matrix \(A \in \mathbb{R}^{m \times n}\) has a singular value decomposition (SVD).

Proof. We utilise an inductive argument to establish the existence of the SVD for any matrix \(A \in \mathbb{R}^{m \times n}\).

  1. Base Case: Define the largest singular value of \(A\) as \(\sigma_1 := \|A\|_2\). Because the unit sphere in \(\mathbb{R}^n\) is compact and \(v \mapsto \|Av\|_2\) is continuous, there exists a unit vector \(v_1 \in \mathbb{R}^n\) attaining the maximum of \(\| A v \|_2 / \|v\|_2\) over \(v \neq 0\), so that \(Av_1 = \sigma_1 u_1\) for some unit vector \(u_1 \in \mathbb{R}^m\).

  2. Formation of Orthogonal Matrices \(U_1\) and \(V_1\): Construct orthogonal matrices \(U_1\) and \(V_1\) in the form:

    \[ V_1 = \begin{bmatrix} v_1 & \hat{V}_1 \end{bmatrix}, \quad U_1 = \begin{bmatrix} u_1 & \hat{U}_1 \end{bmatrix} \]

    where \(\hat{V}_1\) and \(\hat{U}_1\) are matrices that complete \(v_1\) and \(u_1\) to orthonormal bases. Their existence is guaranteed by the Gram-Schmidt method.

  3. Transformed Matrix \(S\): Consider the matrix \(S := U_1^T A V_1\), which takes the form:

    \[\begin{split} S = \begin{bmatrix} \sigma_1 & w^T \\ 0 & B \end{bmatrix} \end{split}\]

    where \(w\) is a vector in \(\mathbb{R}^{n-1}\) and \(B\) is an \((m-1) \times (n-1)\) matrix.

  4. Proof that \(w = 0\): To show \(w = 0\), we use the properties of matrix norms:

    \[\begin{split} \|S\|_2 = \|U_1^T A V_1\|_2 \geq \left\| \begin{bmatrix} \sigma_1 & w^T \\ 0 & B \end{bmatrix} \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \right\|_2 \bigg/ \left\| \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \right\|_2 \geq \left( \sigma_1^2 + \|w\|^2 \right)^{1/2} \end{split}\]

    Here we used the inequality \(\|Sx\|_2 / \|x\|_2 \leq \|S\|_2\) with \(x = (\sigma_1, w^T)^T\) and dropped the non-negative contribution of \(Bw\) to the norm. Since orthogonal transformations preserve the \(2\)-norm, \(\|S\|_2 = \|U_1^T A V_1\|_2 = \|A\|_2 = \sigma_1\), so \(\sigma_1 \geq \left( \sigma_1^2 + \|w\|^2 \right)^{1/2}\), implying \(w = 0\).

  5. Inductive Step: Assume the SVD of \(B\) exists: \(B = \hat{U} \hat{\Sigma} \hat{V}^T\). Then \(A\) can be decomposed as:

    \[\begin{split} A = U_1 \begin{bmatrix} 1 & 0 \\ 0 & \hat{U} \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 \\ 0 & \hat{\Sigma} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \hat{V}^T \end{bmatrix} V_1^T \end{split}\]

    Defining

    \[\begin{split} U = U_1 \begin{bmatrix} 1 & 0 \\ 0 & \hat{U} \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \hat{\Sigma} \end{bmatrix}, \qquad V = V_1 \begin{bmatrix} 1 & 0 \\ 0 & \hat{V} \end{bmatrix}, \end{split}\]

    we obtain \(A = U \Sigma V^T\) with orthogonal \(U\) and \(V\), which completes the inductive step.

  6. Conclusion: Ultimately, the induction will reach a row or column vector, for which the existence of an SVD trivially holds.

The above proof does not inform us how to design an efficient numerical algorithm to construct the SVD of a matrix.