The QR Decomposition#

Orthogonal matrices#

Let \(Q \in \mathbb{R}^{n \times n}\). The matrix \(Q\) is called orthogonal if \(Q^TQ = I\), where \(I\) is the identity matrix. This definition implies that all rows and columns of \(Q\) are of unit length and are mutually orthogonal.

The significance of orthogonal transformations lies in their ability to preserve angles and the 2-norm. To understand this, consider the standard Euclidean inner product, defined as \(\langle x, y \rangle := y^Tx\) for vectors \(x, y \in \mathbb{R}^n\).

Now, observe the effect of an orthogonal transformation \(Q\) on this inner product:

\[ \langle Qx, Qy \rangle = (Qx)^T(Qy) = x^TQ^TQy = x^TIy = \langle x, y \rangle. \]

This equation shows that the inner product (and thus the angle) between \(x\) and \(y\) is preserved after applying the orthogonal transformation \(Q\).

Furthermore, the preservation of the 2-norm under orthogonal transformations can be seen as follows:

\[ \|Qx\|_2^2 = \langle Qx, Qx \rangle = x^TQ^TQx = x^Tx = \|x\|_2^2. \]

This demonstrates that a vector’s 2-norm (or length) is unchanged when multiplied by an orthogonal matrix.
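
As a quick numerical illustration (a minimal sketch; here an orthogonal \(Q\) is obtained from the QR decomposition of a random matrix), both the inner product and the 2-norm are unchanged under multiplication by \(Q\):

import numpy as np

# Obtain an orthogonal matrix Q, here from the QR decomposition of a random matrix.
Q, _ = np.linalg.qr(np.random.rand(4, 4))

x = np.random.rand(4)
y = np.random.rand(4)

print(np.allclose(Q.T @ Q, np.eye(4)))                        # Q^T Q = I
print(np.allclose(np.dot(Q @ x, Q @ y), np.dot(x, y)))        # inner product preserved
print(np.allclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # 2-norm preserved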

QR Decomposition via Gram-Schmidt Orthogonalisation#

Consider a matrix \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\). The aim is to orthogonalise the columns of \(A\). This process is achieved through the Gram-Schmidt orthogonalisation method.

Algorithm 1 (Gram-Schmidt Orthogonalisation)

Inputs \(A \in \mathbb{R}^{m \times n}\)

Outputs \(Q \in \mathbb{R}^{m \times n}\), \(R \in \mathbb{R}^{n \times n}\)

  • Let \(\tilde{q}_1\) be the first column \(a_1\) of \(A\) and normalize \(\tilde{q}_1\) to obtain \(q_1\), defined as:

    \[\begin{split} \begin{aligned} \tilde{q}_1 &:= a_1,\\ q_1 &:= \frac{\tilde{q}_1}{\|\tilde{q}_1\|_2} \end{aligned} \end{split}\]
  • For \(k\) in \(\{2, \ldots, n\}\):

    • To compute column \(q_k\), orthogonalize column \(a_k\) against all previous \(q_j\) (for \(j = 1, \ldots, k-1\)):

    \[\begin{split} \begin{aligned} \tilde{q}_k &:= a_k - \sum_{j=1}^{k-1}\langle a_k, q_j \rangle q_j,\\ q_k &:= \frac{\tilde{q}_k}{\|\tilde{q}_k\|_2}. \end{aligned} \end{split}\]

Matrix Decomposition Representation#

The Gram-Schmidt procedure can be expressed as a matrix decomposition:

  • For each column \(a_k\) of \(A\), we have:

    \[ a_k = \|\tilde{q}_k\|_2 q_k + \sum_{j=1}^{k-1}\langle a_k, q_j \rangle q_j \]
  • Hence, the matrix \(A\) can be decomposed as \(A = QR\) where:

    • \(Q = \begin{bmatrix} q_1, \ldots, q_n \end{bmatrix}\) is a matrix with orthogonal columns.

    • \(R\) is an upper triangular matrix defined as:

      \[\begin{split} R = \begin{bmatrix} \|\tilde{q}_1\|_2 & \langle a_2, q_1 \rangle & \ldots & \langle a_n, q_1 \rangle \\ & \|\tilde{q}_2\|_2 & \ddots & \vdots \\ & & \ddots & \langle a_n, q_{n-1} \rangle \\ & & & \|\tilde{q}_n\|_2 \end{bmatrix} \end{split}\]

Fact: QR Decomposition

Given \(A \in \mathbb{R}^{m \times n}\) with \(m \geq n\), a QR decomposition \(A = QR\) exists, where \(Q\) is an \(m \times n\) orthogonal matrix satisfying \(Q^TQ = I\) and where \(R\) is an \(n \times n\) upper triangular matrix. If \(A\) has full rank, then the QR decomposition is unique if we require the diagonal elements of \(R\) to be positive.
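
Note that numerical routines do not necessarily return \(R\) with a positive diagonal. The following sketch shows one way to normalise a computed factorisation to this convention, assuming \(A\) has full rank so that the diagonal of \(R\) contains no zeros:

import numpy as np

A = np.random.rand(5, 3)
Q, R = np.linalg.qr(A)

# Flip the signs of columns of Q and rows of R so that diag(R) > 0;
# the product Q @ R is unchanged because the sign matrix squares to the identity.
signs = np.sign(np.diag(R))
Q = Q * signs            # scales column j of Q by signs[j]
R = signs[:, None] * R   # scales row j of R by signs[j]

print(np.all(np.diag(R) > 0))
print(np.allclose(A, Q @ R))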

Solving linear systems of equations with the QR decomposition#

To solve \(Ax = b\), we can employ the QR decomposition of \(A \in \mathbb{R}^{n \times n}\).

The linear system \(Ax = b\) can be rewritten using the QR decomposition:

  1. Original System: \(Ax = b\).

  2. Substituting \(A = QR\): \(QRx = b\).

  3. Left-multiplying by \(Q^T\) (since \(Q\) is orthogonal, \(Q^TQ = I\)): \(Q^TQRx = Q^Tb\).

  4. Simplifying (as \(Q^TQ = I\)): \(Rx = Q^Tb\).

The last equation, \(Rx = Q^Tb\), is an upper triangular system. It can be solved efficiently by backward substitution: the last row determines the last unknown, which is then substituted into the rows above to find the remaining unknowns one by one.
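
The sketch below carries out these steps for a small system, using NumPy's QR routine together with a hand-written backward substitution (scipy.linalg.solve_triangular could be used instead):

import numpy as np

def backward_substitution(R, y):
    """Solve R x = y for an upper triangular matrix R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - np.dot(R[i, i+1:], x[i+1:])) / R[i, i]
    return x

A = np.array([[3.0, 2.0, -1.0],
              [1.0, 4.0,  2.0],
              [0.0, 5.0,  1.0]])
b = np.array([1.0, 2.0, 3.0])

Q, R = np.linalg.qr(A)
x = backward_substitution(R, Q.T @ b)

print(x)
print(np.allclose(A @ x, b))  # check the residual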

  • Backward Stability: Solving linear systems via QR decomposition is known to be backward stable. Backward stability refers to the property where the computed solution is the exact solution to a slightly perturbed version of the original problem. This stability is particularly beneficial when dealing with ill-conditioned matrices or when high precision is required.

  • Comparison with LU Decomposition: Despite its stability, the QR decomposition method for solving linear systems is less commonly used than the LU decomposition in practice. This is primarily due to its computational cost. The QR decomposition typically requires about twice as many operations as the LU decomposition, making it less efficient for large-scale problems.

Reduced vs full QR decomposition#

Consider the QR decomposition \(A = QR\) with \(Q\in\mathbb{R}^{m\times n}\) and \(R\in\mathbb{R}^{n\times n}\). We call this QR decomposition a reduced QR decomposition. To define the full QR decomposition, let \(Q^{\bot}\in\mathbb{R}^{m\times (m-n)}\) be a matrix whose columns are orthonormal and satisfy \((Q^{\bot})^TQ = 0\) (i.e. they form a basis of the orthogonal complement of the range of \(A\)). We can reformulate the QR decomposition as

\[\begin{split} A = \begin{bmatrix}Q & Q^{\bot}\end{bmatrix}\begin{bmatrix}R\\ 0\end{bmatrix} =: \tilde{Q}\tilde{R}. \end{split}\]

The QR decomposition \(A = \tilde{Q}\tilde{R}\) is called the full QR decomposition. It is rarely used in practical computations as the columns of \(Q^{\bot}\) are usually not required. However, it is essential theoretically, as \(Q^{\bot}\) contains information about the range of \(A\) and its orthogonal complement.
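
In NumPy, both forms are available through the mode argument of np.linalg.qr; the following sketch compares the shapes of the factors:

import numpy as np

A = np.random.rand(6, 3)

# Reduced QR: Q is 6x3, R is 3x3.
Q_red, R_red = np.linalg.qr(A, mode='reduced')

# Full QR: Q is 6x6 (for full-rank A its last 3 columns span the orthogonal
# complement of the range of A), R is 6x3 with a zero block below the triangle.
Q_full, R_full = np.linalg.qr(A, mode='complete')

print(Q_red.shape, R_red.shape)    # (6, 3) (3, 3)
print(Q_full.shape, R_full.shape)  # (6, 6) (6, 3)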

Python skills#

Implementation from scratch#

Step 1: Import Numpy

import numpy as np

Step 2: Implement the Gram-Schmidt method

def gram_schmidt(A):
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))

    for k in range(n):
        Q[:, k] = A[:, k]
        # Orthogonalise column k against all previously computed columns of Q.
        for i in range(k):
            R[i, k] = np.dot(Q[:, i], A[:, k])
            Q[:, k] -= R[i, k] * Q[:, i]
        # Normalise and store the norm on the diagonal of R.
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
    return Q, R

Step 3: Test with the matrix of the first self-check question

# Define a matrix
A = np.array([[3, 2], [1, 4], [0, 5]])

# Perform QR Decomposition
Q, R = gram_schmidt(A)

print("Matrix Q:\n", Q)
print("Matrix R:\n", R)

Step 4: Verify the decomposition

We can verify the decomposition by checking if \(A \approx QR\).

# Check if A is approximately equal to QR
A_approx = np.dot(Q, R)
print("Matrix A reconstructed from Q and R:\n", A_approx)

Using Numpy’s implementation#

Here is an example using Numpy’s implementation:

import numpy as np

# Step 1: Create a random matrix
A = np.random.rand(3, 3)
print("Matrix A:")
print(A)

# Step 2: Perform QR decomposition
Q, R = np.linalg.qr(A)

# Step 3: Display results
print("Orthogonal Matrix Q:")
print(Q)
print("Upper Triangular Matrix R:")
print(R)

# Step 4: Verification
print("Product QR:")
print(np.dot(Q, R))

print("Product Q^TQ (should be close to identity):")
print(np.dot(Q.T, Q))

Self-check questions#

Question

Find the QR decomposition of the matrix

\[\begin{split} A = \begin{bmatrix} 3 & 2 \\ 1 & 4 \\ 0 & 5 \end{bmatrix}.\end{split}\]

using the Gram-Schmidt method.

Answer
  1. First Column (\(q_1\)):

\[\begin{split}q_1 = \begin{bmatrix} 0.9487 \\ 0.3162 \\ 0 \end{bmatrix}\end{split}\]
  2. Orthogonalized Second Column (\(\tilde{q}_2\)):

\[\begin{split}\tilde{q}_2 = \begin{bmatrix} -1 \\ 3 \\ 5 \end{bmatrix}\end{split}\]
  3. Normalized Second Column (\(q_2\)):

\[\begin{split}q_2 = \begin{bmatrix} -0.1690 \\ 0.5071 \\ 0.8452 \end{bmatrix}\end{split}\]
  4. Orthogonal Matrix \(Q\):

\[\begin{split}Q = \begin{bmatrix} 0.9487 & -0.1690 \\ 0.3162 & 0.5071 \\ 0 & 0.8452 \end{bmatrix}\end{split}\]
  5. Upper Triangular Matrix \(R\):

\[\begin{split}R = \begin{bmatrix} 3.1623 & 3.1623 \\ 0 & 5.9161 \end{bmatrix}\end{split}\]

The matrices \(Q\) and \(R\) form the QR decomposition of \(A\), such that \(A = QR\). The matrix \(Q\) contains orthonormal columns, and \(R\) is an upper triangular matrix.

Question

A matrix \(A\) is called tridiagonal if only its diagonal and first off-diagonals contain nonzero entries: \(a_{ij} = 0\) if \(|i - j| \geq 2\). Show that if \(A\) is tridiagonal, then the factor \(R\) of its \(QR\) factorisation is nonzero only on the diagonal and the first two upper off-diagonals.

Answer

Recall the structure of \(R\):

\[\begin{split} R = \begin{bmatrix} \|\tilde{q}_1\|_2 & \langle a_2, q_1 \rangle & \ldots & \langle a_n, q_1 \rangle \\ & \|\tilde{q}_2\|_2 & \ddots & \vdots \\ & & \ddots & \langle a_n, q_{n-1} \rangle \\ & & & \|\tilde{q}_n\|_2 \end{bmatrix} \end{split}\]

We need to show that \(r_{ij} = 0\) whenever \(j \geq i + 3\):

  • The result trivially holds for the first three columns because of the triangularity of \(R\). Thus, let \(j \geq 4\).

  • The first \(j - 2\) entries of the \(j\)th column \(a_j\) vanish because of the tridiagonal structure.

  • We know that \(\mathrm{span}(a_1, \ldots, a_i) = \mathrm{span}(q_1, \ldots, q_i)\) owing to the construction of the Gram-Schmidt method. Because of the tridiagonal structure, only the first \(i+1\) entries of \(q_k\), \(k \leq i\), can be nonzero.

  • Therefore, if \(j \geq i + 3\),

    \[ r_{ij} = \langle a_j, q_i \rangle = \sum_{k = 1}^n (a_j)_k (q_i)_k = 0 \]

    because in each term of the sum at least one factor equals \(0\).
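
This structure can also be observed numerically. The sketch below builds a random tridiagonal matrix, computes its QR decomposition with NumPy and checks that all entries of \(R\) more than two places above the diagonal vanish up to round-off:

import numpy as np

n = 8

# Build a random tridiagonal matrix from its three diagonals.
A = (np.diag(np.random.rand(n))
     + np.diag(np.random.rand(n - 1), 1)
     + np.diag(np.random.rand(n - 1), -1))

Q, R = np.linalg.qr(A)

# Entries more than two places above the diagonal should vanish (up to round-off).
outside_band = [abs(R[i, j]) for i in range(n) for j in range(i + 3, n)]
print(max(outside_band))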

Question

Given \(A \in \mathbb{R}^{n \times m}\) with \(m \geq n\), show that the factorisation \(A = LQ\) exists, where \(Q\) is an \(n \times m\) matrix with orthonormal rows and where \(L\) is an \(n \times n\) lower triangular matrix.

Answer

Using the \(QR\) decomposition \(A^T = \bar{Q} R\) of the transpose,

\[ A = (A^T)^T = (\bar{Q} R)^T = R^T \bar{Q}^T = L Q \]

with \(L := R^T\) and \(Q := \bar{Q}^T\).
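
The construction translates directly into code; a short sketch using np.linalg.qr on the transpose:

import numpy as np

A = np.random.rand(3, 5)       # n = 3 rows, m = 5 columns

Q_bar, R = np.linalg.qr(A.T)   # QR decomposition of the transpose
L, Q = R.T, Q_bar.T

print(np.allclose(A, L @ Q))             # A = L Q
print(np.allclose(np.tril(L), L))        # L is lower triangular
print(np.allclose(Q @ Q.T, np.eye(3)))   # rows of Q are orthonormal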

Question

Determine the \(2\times 2\) matrix \(Q\) that rotates a vector by an angle \(\theta\). Is this matrix orthogonal? Show that \(Q^{-1}\) is identical to the matrix that rotates vectors by an angle of \(-\theta\).

Answer

The rotation matrix is given as follows:

\[\begin{split} Q = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix} \end{split}\]

The matrix \(Q\) is orthogonal since \(Q^TQ=I\). The rotation matrix for the angle \(-\theta\) is obtained as

\[\begin{split} \hat{Q} = \begin{bmatrix} \cos\theta & +\sin\theta\\ -\sin\theta & \cos\theta \end{bmatrix}, \end{split}\]

which is just the transpose of \(Q\). Hence, as expected, the inverse of \(Q\) is just the rotation by \(-\theta\).
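
A short numerical check of these statements:

import numpy as np

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

theta = 0.3
Q = rotation(theta)

print(np.allclose(Q.T @ Q, np.eye(2)))                   # Q is orthogonal
print(np.allclose(np.linalg.inv(Q), rotation(-theta)))   # Q^{-1} rotates by -theta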

Question

Let \(u\in\mathbb{R}^n\) with \(\|u\|_2=1\). Define \(P=uu^T\). Show that \(P=P^2\). Is \(P\) an orthogonal matrix? Describe what \(P\) is doing. Matrices that satisfy \(P=P^2\) are also called projectors.

Answer

We have

\[ P^2 = uu^Tuu^T = uu^T \]

since \(u^Tu = 1\).

\(P\) is not orthogonal since it is singular. If \(v\bot u\) then \(Pv = 0\). \(Px\) is the orthogonal projection of a vector \(x\) onto the line spanned by \(u\), that is, it removes all components of \(x\) orthogonal to \(u\).
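
These properties are easy to confirm numerically; a small sketch:

import numpy as np

u = np.random.rand(4)
u /= np.linalg.norm(u)       # unit vector

P = np.outer(u, u)           # P = u u^T

x = np.random.rand(4)
v = x - np.dot(u, x) * u     # a vector orthogonal to u

print(np.allclose(P @ P, P))                  # P is a projector
print(np.allclose(P @ v, 0))                  # vectors orthogonal to u are mapped to zero
print(np.allclose(P @ x, np.dot(u, x) * u))   # P x is the component of x along u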

Question

Let \(P=P^2\) be a projector satisfying \(P=P^T\). Show that \(Q=I-2P\) is an orthogonal matrix. Give a geometric interpretation of \(Q\).

Answer

We have

\[ (I-2P)^T(I-2P) = (I -2P)^2 = I - 4P + 4P^2 = I \]

since \(P=P^T\) and \(P^2=P\). The matrix \(Q\) is a reflector. All components orthogonal to the range of \(P\) are left untouched, while all components in the range of \(P\) are subtracted twice and are therefore reflected. In the special case that \(P=uu^T\) for \(\|u\|_2=1\), the matrix \(Q\) is a Householder transformation.
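
For the special case \(P = uu^T\), the following sketch checks numerically that \(Q = I - 2P\) is orthogonal and acts as a reflection:

import numpy as np

u = np.random.rand(4)
u /= np.linalg.norm(u)

P = np.outer(u, u)
Q = np.eye(4) - 2 * P        # Householder transformation

x = np.random.rand(4)

print(np.allclose(Q.T @ Q, np.eye(4)))           # Q is orthogonal
print(np.allclose(Q @ (x - P @ x), x - P @ x))   # components orthogonal to u are untouched
print(np.allclose(Q @ (P @ x), -(P @ x)))        # components along u flip sign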

Question

In the following, we define two different ways of orthogonalising vectors.

import numpy as np

def gram_schmidt(A):
    """Returns an orthogonal basis for the columns in A."""
    m = A.shape[0]
    n = A.shape[1]

    Q = np.zeros((m, n), dtype=np.float64)
    Q[:, 0] = A[:, 0] / np.linalg.norm(A[:, 0])

    for col in range(1, n):
        t = A[:, col].copy()
        # Form all inner products against the previous columns at once ...
        inner_products = Q[:, :col].T @ t
        # ... and subtract them in one go.
        t -= Q[:, :col] @ inner_products
        Q[:, col] = t / np.linalg.norm(t)

    return Q

def modified_gram_schmidt(A):
    """Returns an orthogonal basis for the columns in A."""
    m = A.shape[0]
    n = A.shape[1]

    Q = np.zeros((m, n), dtype=np.float64)
    Q[:, 0] = A[:, 0] / np.linalg.norm(A[:, 0])

    for col in range(1, n):
        t = A[:, col].copy()
        # Subtract each projection immediately after computing it.
        for index in range(col):
            t -= Q[:, index] * np.inner(Q[:, index], t)
        Q[:, col] = t / np.linalg.norm(t)

    return Q

Describe the difference between the two formulations and convince yourself that they are algebraically equivalent. How can one numerically demonstrate that the modified formulation is more accurate in floating point arithmetic?

Answer

The Gram-Schmidt orthogonalisation first forms all inner products against the previous vectors and then subtracts them. In contrast, the modified Gram-Schmidt method subtracts immediately after forming the inner product with a previous column \(q\). Modified Gram-Schmidt is just a reordering of the orthogonalisation. However, it can be shown that modified Gram-Schmidt is numerically more stable.

In the example below, we orthogonalise a matrix \(A\) with both methods and then compare \(Q^TQ\) with the identity matrix. The relative residual of modified Gram-Schmidt is much closer to machine precision than that of Gram-Schmidt. In practice, therefore, modified Gram-Schmidt is usually preferred.

m = 1000  
n = 1000  
  
A = np.random.rand(m, n)  
Q_gs = gram_schmidt(A)  
Q_mgs = modified_gram_schmidt(A)  
  
ident = np.eye(n)  
res_gs = np.linalg.norm(Q_gs.T @ Q_gs - ident) / np.linalg.norm(ident)  
res_mgs = np.linalg.norm(Q_mgs.T @ Q_mgs - ident) / np.linalg.norm(ident)  
  
print(f"Residual for Gram-Schmidt: {res_gs}")  
print(f"Residual for modified Gram-Schmidt: {res_mgs}")

This results in the output

Residual for Gram-Schmidt: 4.610285906483318e-11
Residual for modified Gram-Schmidt: 1.1165488100397485e-15

Question

Equip the vector space \(\mathcal{P}_3\) of cubic polynomials

\[ \{ c_3 \, t^3 + c_2 \, t^2 + c_1 \, t + c_0 | \, (c_0, c_1, c_2, c_3) \in \mathbb{R}^4 \} \]

with the scalar product

\[ (p, q ) = \int_a^b p(t) \, q(t) \, w(t) \, dt. \]

Applying the Gram–Schmidt method to the basis \(\{1, t, t^2, t^3\}\), compute orthogonal bases for \(\mathcal{P}_3\) in three cases:

  1. \((a,b) = (-1,1)\) with \(w(t)=1\);

  2. \((a,b) = (0,1)\) with \(w(t)=1\);

  3. \((a,b) = (0,1)\) with \(w(t)=\exp^{-t}\).

Answer

In all cases, let \(\phi_j(t)=t^j\).

  1. Starting with \(\hat{\phi}_0(t)=1\), we compute

    \[\begin{split} \begin{align*} &\hat{\phi}_1(t) = \phi_1(t) - \frac{(\hat{\phi}_0,\phi_1)}{(\hat{\phi}_0,\hat{\phi}_0)} = t - 0,\\ &\hat{\phi}_2(t) = \phi_2(t) - \frac{(\hat{\phi}_1,\phi_2)}{(\hat{\phi}_1,\hat{\phi}_1)}\hat{\phi}_1(t) - \frac{(\hat{\phi}_0,\phi_2)}{(\hat{\phi}_0,\hat{\phi}_0)}\hat{\phi}_0(t) = t^2 - 0 - \frac13,\\ &\hat{\phi}_3(t) = \phi_3(t) - \frac{(\hat{\phi}_2,\phi_3)}{(\hat{\phi}_2,\hat{\phi}_2)}\hat{\phi}_2(t) - \frac{(\hat{\phi}_1,\phi_3)}{(\hat{\phi}_1,\hat{\phi}_1)}\hat{\phi}_1(t) - \frac{(\hat{\phi}_0,\phi_3)}{(\hat{\phi}_0,\hat{\phi}_0)}\hat{\phi}_0(t) = t^3 - 0 - \frac{3t}5 - 0. \end{align*} \end{split}\]

    One could reduce the amount of actual computation by noting that \((f, g)=0\) whenever the product \(f \cdot g\) is odd; using this, \(\hat{\phi}_j\) is odd for odd \(j\) and even for even \(j\).

  2. Starting with \(\hat{\phi}_0(t)=1\), we compute

    \[\begin{split} \begin{align*} \hat{\phi}_1(t) & = \phi_1(t) - \frac{(\hat{\phi}_0,\phi_1)}{(\hat{\phi}_0,\hat{\phi}_0)} = t - \frac12,\\ \hat{\phi}_2(t) & = \phi_2(t) - \frac{(\hat{\phi}_1,\phi_2)}{(\hat{\phi}_1,\hat{\phi}_1)}\hat{\phi}_1(t) - \frac{(\hat{\phi}_0,\phi_2)}{(\hat{\phi}_0,\hat{\phi}_0)}\hat{\phi}_0(t) = t^2 - 1\cdot\hat{\phi}_1(t) - \frac13\cdot\hat{\phi}_0(t) = t^2 - t + \frac16,\\ \hat{\phi}_3(t) & = \phi_3(t) - \frac{(\hat{\phi}_2,\phi_3)}{(\hat{\phi}_2,\hat{\phi}_2)}\hat{\phi}_2(t) - \frac{(\hat{\phi}_1,\phi_3)}{(\hat{\phi}_1,\hat{\phi}_1)}\hat{\phi}_1(t) - \frac{(\hat{\phi}_0,\phi_3)}{(\hat{\phi}_0,\hat{\phi}_0)}\hat{\phi}_0(t)\\ & = t^3 - \frac32\cdot\hat{\phi}_2(t) - \frac9{10}\cdot\hat{\phi}_1(t) - \frac14\cdot\hat{\phi}_0(t) = t^3 - \frac{3t^2}2 + \frac{3t}5 - \frac1{20}. \end{align*} \end{split}\]

    No trick as in (a) is possible here.

  3. Finally, starting with \(\hat{\phi}_0(t)=1\), we compute:

    \[\begin{split} \begin{align*} \hat{\phi}_1(t) & = \phi_1(t) - \frac{(\hat{\phi}_0,\phi_1)}{(\hat{\phi}_0,\hat{\phi}_0)} = t - \frac{\exp-2}{\exp-1} = t - 0.4180232931,\\ \hat{\phi}_2(t) & = \phi_2(t) - \frac{(\hat{\phi}_1,\phi_2)}{(\hat{\phi}_1,\hat{\phi}_1)}\hat{\phi}_1(t) - \frac{(\hat{\phi}_0,\phi_2)}{(\hat{\phi}_0,\hat{\phi}_0)}\hat{\phi}_0(t)\\ & = t^2 - \frac{4\exp^2-13\exp+6}{\exp^2-3\exp+1}\,\hat{\phi}_1(t) - \frac{2\exp-5}{\exp-1}\,\hat{\phi}_0(t) = t^2 - \frac{4\exp^2-13\exp+6}{\exp^2-3\exp+1}\,t + \frac{2\exp^2-8\exp+7}{\exp^2-3\exp+1}\\ & = t^2 - 0.93317985227\,t + 0.1360210355\\ \hat{\phi}_3(t) & = \phi_3(t) - \frac{(\hat{\phi}_2,\phi_3)}{(\hat{\phi}_2,\hat{\phi}_2)}\hat{\phi}_2(t) - \frac{(\hat{\phi}_1,\phi_3)}{(\hat{\phi}_1,\hat{\phi}_1)}\hat{\phi}_1(t) - \frac{(\hat{\phi}_0,\phi_3)}{(\hat{\phi}_0,\hat{\phi}_0)}\hat{\phi}_0(t)\\ & = t^3 - \frac{36\exp^3-194\exp^2+279\exp-48}{4\exp^3-21\exp^2+29\exp-4}\,\hat{\phi}_2(t) - \frac{18\exp^2-61\exp+33}{\exp^2-3\exp+1}\,\hat{\phi}_1(t) - \frac{6\exp-16}{\exp-1}\,\hat{\phi}_0(t)\\ & = t^3 - 1.435696868\,t^2 + 0.5378430946\,t - 0.040296542725. \end{align*} \end{split}\]
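
As a sanity check for case (1), the inner products of the computed basis can be evaluated with Gauss-Legendre quadrature (np.polynomial.legendre.leggauss), which is exact for the polynomial degrees involved; all off-diagonal inner products should vanish up to round-off:

import numpy as np

# Nodes and weights of a 10-point Gauss-Legendre rule on [-1, 1];
# it integrates polynomials of degree up to 19 exactly.
nodes, weights = np.polynomial.legendre.leggauss(10)

# The orthogonal basis computed in case (1).
phis = [
    lambda t: np.ones_like(t),
    lambda t: t,
    lambda t: t**2 - 1/3,
    lambda t: t**3 - 3*t/5,
]

for i in range(4):
    for j in range(i):
        inner = np.sum(weights * phis[i](nodes) * phis[j](nodes))
        print(f"(phi_{i}, phi_{j}) = {inner:+.1e}")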

Optional material#

QR Decomposition with Householder Transformations

While the Gram-Schmidt process is a common approach to obtain the QR decomposition of a matrix, the Householder transformation method provides a more stable and efficient alternative, especially for large matrices. In this section, we explore the QR decomposition using Householder transformations.

Householder Transformations

Given a vector \(v \in \mathbb{R}^n \setminus \{0\}\), the corresponding Householder transformation \(H\) of order \(n\) is defined as:

\[ H = I - 2 \frac{vv^T}{v^Tv} \]

Clearly,

\[ H x = x - 2 \frac{v^T x}{v^Tv} v \]

Split \(x = x_o + x_c\) into an orthogonal and a collinear part: \(v^T x_o = 0\) and \(x_c = \alpha v\) for some \(\alpha \in \mathbb{R}\). Then

\[ H x = \left(x_o - 2 \frac{v^T x_o}{v^Tv} v\right) + \left(x_c - 2 \frac{v^T x_c}{v^Tv} v\right) = x_o - x_c, \]

meaning that, geometrically, the Householder transformation is a reflection of \(x\) in the hyperplane perpendicular to \(v\).

Application in QR Decomposition

As with the Gram-Schmidt method above, the goal is to transform a given matrix \(A\) into an upper triangular matrix \(R\) using orthogonal transformations. This is achieved iteratively, column by column, using Householder matrices.

Step-by-Step Process:

  1. Target Vector: Consider the first column of \(A\) and denote it by \(x\). We want to transform \(x\) into a vector \(x'\) with the same norm as \(x\) but with all components equal to zero except the first.

  2. Constructing \(v\): To create the Householder matrix \(H\), we first need to construct the vector \(v\). This is done by taking \(v = x - \|x\|e_1\), where \(e_1\) is the first unit vector. With \(v\) determined, we form the Householder matrix \(H = I - 2 \frac{vv^T}{v^Tv}\).

  3. Applying \(H\): Multiply \(H\) by \(A\) (i.e., \(HA\)) to reflect the first column of \(A\) about the plane orthogonal to \(v\). This operation zeros out all elements below the diagonal in the first column; see Lecture 10 of [Trefethen and Bau, 1997].

  4. Iterative Process: Repeat the process for the submatrix of \(A\) excluding the already zeroed elements. Each step involves constructing a new Householder matrix to introduce zeros in the subsequent column below the diagonal.
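
The following is a compact sketch of this procedure using only NumPy. It applies one Householder reflection per column and accumulates \(Q\) explicitly; the sign in the construction of \(v\) is the standard choice that avoids cancellation (a slight variation of the formula in step 2 above):

import numpy as np

def householder_qr(A):
    """Full QR decomposition A = QR via Householder reflections (a sketch)."""
    m, n = A.shape
    R = A.astype(float)
    Q = np.eye(m)

    for k in range(min(n, m - 1)):
        x = R[k:, k]
        # Choose v so that H = I - 2 v v^T maps x to a multiple of e_1;
        # adding sign(x[0]) * ||x|| avoids cancellation. Assumes x is nonzero.
        v = x.copy()
        v[0] += np.linalg.norm(x) if x[0] >= 0 else -np.linalg.norm(x)
        v /= np.linalg.norm(v)
        # Apply H to the trailing submatrix of R and accumulate Q = Q H.
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.array([[3.0, 2.0], [1.0, 4.0], [0.0, 5.0]])
Q, R = householder_qr(A)
print(np.allclose(A, Q @ R))             # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))   # Q is orthogonal
print(np.allclose(R, np.triu(R)))        # R is upper triangular up to round-off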

Comparing with the Gram-Schmidt method

  • Numerical Stability: Householder transformations are generally more numerically stable than the classical (and modified) Gram-Schmidt processes.

  • Efficiency in Large Matrices: For large matrices, the increased stability of Householder transformations often outweighs its higher computational complexity.

  • Ease of Understanding: The Gram-Schmidt process is often easier to understand and implement.