Computing eigenspaces#

The Schur decomposition#

The eigenvalue decomposition \(A = X\Lambda X^{-1}\) is not always suitable for computational purposes. It might not exist, requiring a Jordan normal form instead. Even though small machine perturbations usually ensure diagonalizability in floating point arithmetic, the proximity to a Jordan form can introduce instabilities. The eigenvector matrix \(X\) might also be ill-conditioned, making its inverse computationally unreliable.

For practical computations, the Schur decomposition of \(A\) is more robust.

Theorem 13 (Existence of the Schur decomposition)

Every complex matrix \(A \in \mathbb{C}^{n \times n}\) has a Schur decomposition

\[ A = Q R Q^H \]

with unitary \(Q\in\mathbb{C}^{n\times n}\) and upper triangular \(R\in\mathbb{C}^{n\times n}\).

The proof is given in the optional material section below. Let’s explore some key properties of the Schur decomposition:

  • Eigenvalues: The eigenvalues of \(A\) are the diagonal elements of \(R\). This is evident from \(\det(\lambda I - A) = \det Q \,\det(\lambda I - R)\,\det Q^H = \det(\lambda I - R)\), and the determinant of the triangular matrix \(\lambda I - R\) is simply the product of its diagonal entries \(\lambda - r_{jj}\).

  • Hermitian Matrices: For Hermitian \(A\), the Schur decomposition coincides with the eigenvalue decomposition. Since \(A = A^H\), we have \(R = Q^HAQ = R^H\), so the triangular matrix \(R\) must be diagonal. Each column \(q_j\) of \(Q\) is then an eigenvector of \(A\) for the eigenvalue \(r_{jj}\).

  • Invariant Subspaces: Consider \(R_k\), the upper left \(k \times k\) principal submatrix of \(R\), and \(Q_k \in \mathbb{C}^{n \times k}\), the matrix of the first \(k\) columns of \(Q\). It holds that \(AQ_k = Q_kR_k\). This implies that for any vector \(v\) in the span of \(\{q_1, \ldots, q_k\}\), \(Av\) is also in this span:

    \[ v\in \text{span}\{q_1, \dots, q_k\} \implies Av\in \text{span}\{q_1, \dots, q_k\}. \]

    We say that the subspace is invariant under \(A\). The Schur decomposition thus reveals a sequence of enlarging invariant subspaces. The short numerical check below illustrates these properties.
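As a quick sanity check, here is a small sketch using scipy.linalg.schur; the matrix size, random seed, and block size k = 3 are arbitrary choices for illustration.

import numpy as np
from scipy.linalg import schur

rng = np.random.RandomState(0)
A = rng.randn(6, 6) + 1j * rng.randn(6, 6)

# output='complex' requests the complex Schur form with an upper triangular R
R, Q = schur(A, output='complex')

# The diagonal of R contains the eigenvalues of A (up to ordering)
print(np.allclose(np.sort_complex(np.diag(R)), np.sort_complex(np.linalg.eigvals(A))))

# The leading k columns of Q span an invariant subspace: A Q_k = Q_k R_k
k = 3
print(np.allclose(A @ Q[:, :k], Q[:, :k] @ R[:k, :k]))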

Subspace iteration#

So far we have always targeted a single eigenvalue at a time. This can be overcome by the subspace iteration, which computes a partial Schur decomposition and generalizes the power iteration to several vectors at once.

Here, \(Q^{(0)}\in\mathbb{C}^{n\times m}\) is a matrix with \(m\) orthonormal columns, satisfying \(\left[Q^{(0)}\right]^H Q^{(0)} = I\). The following algorithm produces matrices \(A^{(k)}\in\mathbb{C}^{m\times m}\) that converge to an upper triangular matrix.

Algorithm (Subspace Iteration)

for \(k=1,2,\dots\)

\(\quad Z^{(k)} = AQ^{(k-1)}\)

\(\quad Q^{(k)}R = Z^{(k)}\) (QR decomposition)

\(\quad A^{(k)} = [Q^{(k)}]^HAQ^{(k)}\)

end

%matplotlib inline  
import numpy as np  
from numpy.random import rand, randn  
from matplotlib import pyplot as plt  
from scipy.linalg import qr  
def subspace_iteration(A, k, m):
    """Evaluate k steps of the subspace iteration for the matrix A to find the largest m eigenvalues.

    Returns a vector of errors computed via the Frobenius norm of the strictly
    lower triangular part of the projected matrix.
    """
    # Random complex starting block, orthonormalized via an economic QR decomposition
    Z0 = np.random.rand(A.shape[0], m) + 1j * np.random.rand(A.shape[0], m)
    Q, R = qr(Z0, mode='economic')

    error = np.zeros(k, dtype=np.float64)

    for i in range(k):
        Z = np.dot(A, Q)                # Apply A to the current basis
        Q, R = qr(Z, mode='economic')   # Re-orthonormalize the columns
        Lambda = np.dot(np.conjugate(Q).T, np.dot(A, Q))  # Projected matrix Q^H A Q
        error[i] = np.linalg.norm(np.tril(Lambda, -1))    # Size of the strictly lower triangular part

    return error
  
n = 50  
k = 40  
m = 2  
A = rand(n,n)  
  
error = subspace_iteration(A,k,m)  
plt.semilogy(error)  
plt.ylabel('error')

The above plot shows the size of the strictly lower triangular part of \(A^{(k)}\). As it converges to zero, the elements on the diagonal converge to eigenvalues of \(A\).

The QR iteration#

The QR iteration for eigenvalue computations derives from subspace iteration and is the work-horse of modern eigenvalue solvers for dense matrices.

Convergence of the subspace iteration#

Before we can understand the QR iteration, let us take a closer look at the convergence of the subspace iteration.

Let \(S\subset\mathbb{C}^n\) be a subspace of dimension \(k\). Then, the subspace iteration produces subspaces

\[ S_m = A^{m}S. \]

Denote by \(Q^{(0)} = \begin{bmatrix}q_1^{(0)} & q_2^{(0)} & \dots & q_k^{(0)}\end{bmatrix}\) a unitary basis of \(S\), and let \(Q^{(m)}R^{(m)}\) be the QR decomposition of \(A^mQ^{(0)}\). To test convergence, we compute

\[\begin{split} \left[Q^{(m)}\right]^HAQ^{(m)} = \begin{bmatrix}B_{11} & B_{12}\\ B_{21} & B_{22} \end{bmatrix} \end{split}\]

for some matrices \(B_{ij}\), \(i, j=1,2\).

For each \(j\leq k\), the span of the first \(j\) columns of \(Q^{(m)}\) converges to an invariant subspace of \(A\). Hence, the subspace iteration produces a nested sequence of approximate invariant subspaces, one for each \(j\leq k\). It follows that

\[ \left[Q^{(m)}\right]^HAQ^{(m)}\rightarrow R \]

with \(R\) upper triangular. The upper triangularity of \(R\) follows from the property of nested invariant subspaces that we are converging to.

We can now ask what happens if we increase the number of basis vectors \(k\). We always converge to an upper triangular matrix. In particular, there is no reason not to choose \(k=n\), that is, to iterate on the whole vector space. We call this simultaneous iteration, and it follows that also in this case we converge to an upper triangular matrix.

The QR iteration#

Let \(A^{(1)} = A\). The basic form of the algorithm is exceedingly simple. Let \(A^{(m)}\) be our current iterate. Compute the QR decomposition \(A^{(m)} = Q_mR_m\) and let \(A^{(m+1)} = R_mQ_m\) (note that we use subscripts to distinguish from the QR factors in the simultaneous iteration, which have superscripts). That’s it. In each step, we compute a QR decomposition and multiply the factors in reverse order again. Why does this work? The magic lies in understanding the connection with subspace iteration.

In the QR iteration, \(A^{(m)} = Q_mR_m\) implies \(R_m = Q_m^HA^{(m)}\), and hence

\[ A^{(m+1)} = Q_m^HA^{(m)}Q_m. \]

This looks just like the testing from the simultaneous iteration, and with a little bit of algebra, one can show that simultaneous iteration with \(Q^{(0)} = I\) produces the same sequence \(A^{(m)}\).

Let’s demonstrate this. The QR iteration gives us

\[ A = Q_1R_1. \]

We hence have

\[\begin{split} \begin{aligned} A^2 &= Q_1R_1Q_1R_1\\ &= Q_1A^{(2)}R_1\\ &= Q_1Q_2R_2R_1. \end{aligned} \end{split}\]

In the next step, we have

\[\begin{split} \begin{aligned} A^3 &= Q_1R_1Q_1R_1Q_1R_1\\ &= Q_1Q_2R_2Q_2R_2R_1\\ &= Q_1Q_2Q_3R_3R_2R_1. \end{aligned} \end{split}\]

In each step we have used that \(R_{k-1}Q_{k-1} = A^{(k)} = Q_kR_k\).

Continuing this, it is easy to see that

\[ A^m = Q_1\dots Q_mR_m \dots R_1 = Q^{(m)}R^{(m)}, \]

where the \(Q^{(m)}\) and \(R^{(m)}\) are the QR factors from applying \(m\) steps of simultaneous iteration with \(Q^{(0)}=I\) as start basis.

Let us now do the testing step of the simultaneous iteration. We have

\[\begin{split} \begin{aligned} \left[Q^{(m)}\right]^HAQ^{(m)} &= Q_m^H\dots Q_1^HAQ_1\dots Q_m\\ &= Q_m^H\dots Q_2^H A^{(2)}Q_2\dots Q_m\\ &= Q_m^H\dots Q_3^HA^{(3)}Q_3\dots Q_m\\ &= \dots\\ &= A^{(m+1)} \end{aligned}. \end{split}\]

Hence, the iterates \(A^{(m)}\) from the QR iteration and the testing procedure \(\left[Q^{(m)}\right]^HAQ^{(m)}\) from the simultaneous iteration are related by

\[ A^{(m+1)} = \left[Q^{(m)}\right]^HAQ^{(m)}. \]

We know that the products \(\left[Q^{(m)}\right]^HAQ^{(m)}\) converge to an upper triangular matrix by virtue of convergence of nested invariant subspaces. It follows, therefore, that the QR iteration also converges to an upper triangular matrix, from which we can read off the eigenvalues of \(A\).
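The following short sketch verifies both relations numerically for a small random complex matrix; the size and number of steps are arbitrary choices for illustration.

import numpy as np
from scipy.linalg import qr

rng = np.random.RandomState(1)
n, nsteps = 6, 5
A = rng.randn(n, n) + 1j * rng.randn(n, n)

A_m = A.copy()      # QR iterate A^{(m)}
Q_acc = np.eye(n)   # Accumulated product Q_1 ... Q_m = Q^{(m)}
R_acc = np.eye(n)   # Accumulated product R_m ... R_1 = R^{(m)}

for _ in range(nsteps):
    Q, R = qr(A_m)
    A_m = R @ Q
    Q_acc = Q_acc @ Q
    R_acc = R @ R_acc

# A^m = Q^{(m)} R^{(m)}
print(np.allclose(np.linalg.matrix_power(A, nsteps), Q_acc @ R_acc))
# A^{(m+1)} = [Q^{(m)}]^H A Q^{(m)}
print(np.allclose(A_m, Q_acc.conj().T @ A @ Q_acc))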

Accelerating the QR iteration in practice#

The basic form of the QR iteration has some problems.

  • Computing the QR decomposition in each step is rather expensive.

  • Convergence can be extremely slow.

  • It is not suitable for finding complex eigenvalues of real matrices. We cannot converge in real arithmetic to an upper triangular matrix with complex entries.

All three problems can be overcome.

  • To make a QR iteration step cheaper, we preprocess the matrix \(A\) by an orthogonal transformation to the so-called upper Hessenberg form. This is a matrix in which only the upper triangular part and the first subdiagonal can be nonzero; everything below the first subdiagonal is zero. This transformation can be achieved in \(O(n^3)\) operations in a finite number of steps (i.e. it is not an iterative process). A QR iteration step preserves the upper Hessenberg form and can be implemented very cheaply.

  • To speed up convergence, we can use a shifting strategy that can be shown to be equivalent to inverse iteration.

  • To find complex eigenvalues, one uses a double-shift strategy that ends up not with an upper triangular matrix \(R\) but with a block upper triangular matrix, where the complex eigenvalues are computed from \(2\times 2\) real blocks on the diagonal.

The following computational section demonstrates the upper Hessenberg form and shift strategies. We will not go into details about double shifts for complex eigenvalues.

Numerical investigation of QR iteration steps#

import numpy as np  
from scipy.linalg import qr, hessenberg  
from matplotlib import pyplot as plt  
%matplotlib inline

We start with some random matrix \(A\).

rand = np.random.RandomState(0)  
n = 10  
A  = rand.randn(n, n) + 1j * rand.randn(n, n)

Let us first transform the matrix to upper Hessenberg form. This can be implemented as a similarity transformation that does not change the eigenvalues.

H = hessenberg(A)  
plt.spy(H)

We see that only the upper triangular part and the first subdiagonal of the matrix are nonzero. The upper Hessenberg structure is preserved by a QR iteration step. Let's check this.

Q, R = qr(H)  
H2 = R @ Q  
plt.spy(H2)

This structure preservation can be used to implement a QR iteration step in a highly efficient manner. We will not go into technical details about this here.

Let us now run a couple of iterations of the QR iteration, and let's see how quickly the subdiagonal element \(h_{n,n-1}\) (the second to last element in the last row of \(H\)) converges to zero. If it is zero, the bottom right diagonal element \(h_{n,n}\) is an eigenvalue of \(A\) (convince yourself that this is true).

nsteps = 300  
  
H = hessenberg(A)  
  
residuals = np.empty(nsteps, dtype='float64')  
  
for index in range(nsteps):  
    Q, R = qr(H)  
    H = R @ Q  
    residuals[index] = np.abs(H[-1, -2]) / np.abs(H[-1, -1])  
      
plt.semilogy(residuals)

The convergence is extremely slow. We can speed this up with a shift strategy. The idea is to modify the QR iteration step so that it reads

\[\begin{split} \begin{aligned} A^{(m)} -\alpha I &= Q_mR_m\\ A^{(m+1)} &= R_mQ_m + \alpha I. \end{aligned} \end{split}\]

Hence, we subtract the shift and do the QR decomposition, and then when we compute \(R_mQ_m\) we add the shift back in.

One can show that a QR step is equivalent to inverse iteration applied to the last vector in the simultaneous iteration. Hence, by applying the shift, we perform a shifted inverse iteration. What shall we use as shift?

We achieve quadratic convergence if we use the bottom right element of the Hessenberg matrix in each step as shift. This is similar to the Rayleigh quotient method for symmetric problems, where we adapted the shift in each step.

The following implements the shift strategy.

nsteps = 7  
  
H = hessenberg(A)  
  
residuals = np.empty(nsteps, dtype='float64')  
  
ident = np.eye(n)  
  
for index in range(nsteps):  
    shift = H[-1, -1]  
    Q, R = qr(H - shift * ident)  
    H = np.dot(R, Q) + shift * ident  
    residual = np.abs(H[-1, -2]) / np.abs(H[-1, -1])  
    print(f"Residual: {residual}")  
    residuals[index] = residual  
      
plt.semilogy(residuals)

We have converged in 7 iterations. Let us now reduce the matrix and continue with the next smaller matrix.

nsteps = 3  
  
H_reduced = H[:-1, :-1].copy() # Deflate: copy out the leading (n-1) x (n-1) block so the original H stays untouched
  
residuals = np.empty(nsteps, dtype='float64')  
  
ident = np.eye(n-1, n-1)  
  
for index in range(nsteps):  
    shift = H_reduced[-1, -1]  
    Q, R = qr(H_reduced - shift * ident)  
    H_reduced = R @ Q + shift * ident  
    residual = abs(H_reduced[-1, -2]) / abs(H_reduced[-1, -1])  
    print(f"Residual: {residual}")  
    residuals[index] = residual  
      
plt.semilogy(residuals)

We have now converged in 3 iterations. In practice, with more sophisticated shift variants and good deflation strategies, the QR iteration typically needs only 2 to 3 iterations per eigenvalue, and each iteration on an \(n\times n\) upper Hessenberg matrix costs \(O(n^2)\) operations. Hence, the overall algorithm usually runs in \(O(n^3)\) time. Therefore, even though the QR iteration is an iterative algorithm, we speak of a method with cubic complexity since this holds in almost all cases.
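To make the deflation idea concrete, the following is a minimal sketch of a complete shifted QR solver with deflation. It is only illustrative: the function name qr_eigvals, the tolerance, and the iteration cap are arbitrary choices, and a production code would exploit the Hessenberg structure instead of recomputing full QR decompositions in every step.

import numpy as np
from scipy.linalg import qr, hessenberg

def qr_eigvals(A, tol=1e-12, maxiter=1000):
    """Approximate all eigenvalues of A by the shifted QR iteration with deflation."""
    H = hessenberg(A).astype(np.complex128)  # work in complex arithmetic
    eigenvalues = []
    while H.shape[0] > 1:
        ident = np.eye(H.shape[0])
        for _ in range(maxiter):
            shift = H[-1, -1]
            Q, R = qr(H - shift * ident)
            H = R @ Q + shift * ident
            if np.abs(H[-1, -2]) < tol * np.abs(H[-1, -1]):
                break
        # Deflate: record the converged eigenvalue and continue with the smaller block
        eigenvalues.append(H[-1, -1])
        H = H[:-1, :-1]
    eigenvalues.append(H[0, 0])
    return np.array(eigenvalues)

rng = np.random.RandomState(0)
n = 10
A = rng.randn(n, n) + 1j * rng.randn(n, n)
print(np.allclose(np.sort_complex(qr_eigvals(A)), np.sort_complex(np.linalg.eigvals(A))))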

Python skills#

Schur decomposition#

Here is an example of computing the Schur decomposition with SciPy.

import numpy as np
import scipy.linalg

# Creating a random 4x4 matrix
A = np.random.rand(4, 4)

# Performing the Schur decomposition.
# For a real matrix the default is the real Schur form, which is only block upper
# triangular; output='complex' requests a genuinely upper triangular R.
R, Q = scipy.linalg.schur(A, output='complex')

# Verifying the decomposition: A = QRQ^H
A_reconstructed = Q @ R @ Q.conj().T

# Display results
print("Original Matrix A:\n", A)
print("\nUnitary Matrix Q:\n", Q)
print("\nUpper Triangular Matrix R:\n", R)
print("\nReconstructed Matrix (QRQ^H):\n", A_reconstructed)

# Check if the reconstruction is close to the original
print("\nIs the reconstruction close to the original? ", np.allclose(A, A_reconstructed))

Optional material#

Existence of the Schur decomposition

The proof of the Schur decomposition’s existence is surprisingly straightforward.

Proof. Let \(\lambda\) be an eigenvalue of \(A\) with a corresponding eigenvector \(x\) of unit norm. Construct \(Q\) as:

\[ Q = \begin{bmatrix}x & \hat{Q}\end{bmatrix}, \]

where \(\hat{Q}\) has orthonormal columns forming a basis of \(\text{span}\{x\}^\bot\). Calculating \(Q^HAQ\) yields:

\[\begin{split} Q^HAQ = \begin{bmatrix}\lambda & w \\ 0 & \tilde{A}\end{bmatrix}, \end{split}\]

with \(w\in\mathbb{C}^{1\times (n-1)}\) and \(\tilde{A}\in\mathbb{C}^{(n-1)\times(n-1)}\). By induction, assume \(\tilde{A}\) has a Schur decomposition \(\tilde{A} = \tilde{Q}\tilde{R}\tilde{Q}^H\).

Then,

\[\begin{split} A = Q\begin{bmatrix}1 & \\ & \tilde{Q}\end{bmatrix}\begin{bmatrix}\lambda & w \\ & \tilde{R}\end{bmatrix}\begin{bmatrix}1 & \\ & \tilde{Q}^H\end{bmatrix}Q^H \end{split}\]

is the Schur decomposition of \(A\). The base case of the induction, a \(1\times 1\) matrix \(\alpha\), is trivially \(\alpha = 1 \cdot \alpha \cdot 1\).