Backward error and pivoting

Backward error and pivoting#

Recall that we can bound the forward error from the condition number and the backward error. The condition number is a property of the linear system, not of the numerical method we employ (such as the LU decomposition), and we have established that \(\kappa(A) = | A | \cdot | A^{-1} |\).

The backward error of the LU decomposition remains a subject of significant interest. Despite the long-established nature of LU decomposition, several open questions persist regarding its backward error.

Fundamentally, LU decomposition is not backward stable, and we will examine specific examples illustrating this instability. In some cases, LU decomposition can lead to a catastrophic increase in backward error.

To address this issue, a widely used modification known as pivoting is introduced. This refinement substantially improves the stability of the algorithm in almost all cases. However, it is still possible to construct matrices for which the pivoted algorithm remains unstable. Such cases, however, are highly pathological and rarely arise in practical applications.

Backward error of the LU decomposition#

Let \(A = LU\) be the LU decomposition of a nonsingular matrix \(A\in\mathbb{R}^{n\times n}\), and let \(\tilde{L}, \tilde{U}\) be the computed LU factors obtained via Gaussian elimination in floating point arithmetic. These factors satisfy

\[ \tilde{L}\tilde{U} = A + \Delta A. \]

Recall the relative backward error, given \(A\), is

\[\begin{split} \begin{align*} \eta(\tilde{L}, \tilde{U}) & = \min \{ \epsilon : A + \Delta A = \tilde{L} \, \tilde{U} : \| \Delta A \| \leq \epsilon \| A \| \}\\ & = \| A - \tilde{L} \, \tilde{U} \| / \| A \|. \end{align*} \end{split}\]

One can show for \(\Delta A = A - \tilde{L} \, \tilde{U}\) that

\[ \|\Delta A\| = \|L\|\cdot \|U\| \, \mathcal{O}(\epsilon_{mach}). \]

One might naively conclude that this guarantees the backward stability of LU decomposition. However, backward stability requires \(\| \Delta A \| / \|A\| = \mathcal{O}(\epsilon_{mach})\) whereas in our bound, \(\|L\|\cdot \|U\|\) appears in the denominator.

So the question of backward stability of the \(LU\) decomposition is reduced to the problem of when we have that

\[ \|A\| \approx \|L\|\cdot \|U\|. \]

Example 8 (Failure of backward stability)

Let

\[\begin{split} A_{\epsilon} = \begin{pmatrix} \epsilon & 1\\ 1 & 1 \end{pmatrix}. \end{split}\]

We have

\[\begin{split} A_{\epsilon} = \begin{pmatrix} 1 & 0\\ \epsilon^{-1} & 1 \end{pmatrix} \begin{pmatrix} \epsilon & 1\\ 0 & 1 - \epsilon^{-1}. \end{pmatrix} \end{split}\]

Consider the \(1\)-norm. In this example \(\|A\| = 2\) but \(\|L\|\cdot \|U\|\approx \mathcal{O}(\epsilon^{-2})\).

The norms of \(L\) and \(U\) can differ wildly from the norm of \(A\). But there is an easy fix in this example. We can swap the first and second rows of \(A\) to obtain

\[\begin{split} \begin{pmatrix} 1 & 1\\ \epsilon & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0\\ \epsilon & 1 \end{pmatrix} \begin{pmatrix} 1 & 1\\ 0 & 1-\epsilon. \end{pmatrix} \end{split}\]

We now have \(\|L\|\cdot \|U\|\approx 1\).

LU decomposition with pivoting#

The above examples motivates a modification of the LU decomposition, called column-pivoted (or simply pivoted) LU. The idea is simple. Suppose that \(a_{31}\) is the element with the largest magnitude in the first column of \(A \in \mathbb{R}^{3 \times 3}\). We then exchange the third and the first row to get

\[\begin{split} A^{(1)} = \begin{pmatrix} a_{31} & a_{32} & a_{33}\\ a_{21} & a_{22} & a_{23}\\ a_{11} & a_{12} & a_{13} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0 \end{pmatrix} \, \begin{pmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix} = P^{(1)} A \end{split}\]

Eliminating in the first column of \(A^{(1)}\) as usual, we obtain a matrix

\[\begin{split} \begin{pmatrix} a_{31} & a_{32} & a_{33}\\ 0 & \tilde{a}_{22} & \tilde{a}_{23}\\ 0 & \tilde{a}_{12} & \tilde{a}_{13} \end{pmatrix} \end{split}\]

Before proceeding with the LU decomposition, we swap the second and third rows if the element at position \((3, 2)\) is larger by magnitude than the entry at position at \((2, 2)\), using

\[\begin{split} P^{(2)}\begin{pmatrix} 1 & 0 & 0\\ 0 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix}, \end{split}\]

giving

\[ F^{(2)} P^{(2)} F^{(1)} P^{(1)} A = U. \]

This pivoting strategy guarantees that \(|\ell_{i, j}|\leq 1\) for all \(i, j\).

Definition 6 (Permutation matrix)

\(P \in \mathbb{R}^{n \times n}\) is a permutation matrix if there exists a bijective mapping \(\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}\) (called permutation) such that

\[ P = \Big( e^{(\pi(1))} \Big| \cdots \Big| e^{(\pi(n))} \Big) \]

where \(e^{(k)} \in \mathbb{R}^n\) is the \(k\)th canonical basis vector.

In other words, a permutation matrix is obtained by re-arranging the rows of the identity matrix. As shown in a self-check question below, reordering rows is precisely the effect of multiplication by a permutation matrix from the left.

Fact: Pivoted LU decomposition

Let \(A \in \mathbb{R}^{n \times n}\). Then there are a lower triangular \(L\), upper triangular \(U\) and permutation matrix \(P\) in \(\mathbb{R}^{n \times n}\) such that

\[ PA = LU. \]

The vector \(x \in \mathbb{R}^n\) solves \(A x = b\) if and only if \(x\) solves \(U x = y\) with \(L y = P b\).

Proof. Generalising the principle of the above example gives

\[ F^{(n-1)} P^{(n-1)} \cdots F^{(2)} P^{(2)} F^{(1)} P^{(1)} A = U. \]

There is a commutation property between permutation and Frobenius matrices,

\[ P (I - f \, (e^{(i)})^\top) P = I - (P f) \, (e^{(i)})^\top \]

for elementary permutations \(P\), that ultimately allows to reorder the product so that

\[ \hat{F}_{n-1} \cdots \hat{F}_1 P A = U, \]

where the \(\hat{F}_i = I - \hat{f}_i e^{(i)}\) are Frobenius matrices of index \(i\), \(P\) is a permutation matrix and \(U\) is upper triangular. The details are beyond the scope of this lecture. See [Plato, 2003] for the full proof.

Growth factors#

Let us consider how pivoting influences the size of the factors in the LU decomposition. It is convenient to initially work with the norm \(\| A \|_* := \max_{ij} | A_{ij} |\).

Pivoting ensures that \(|\ell_{i, j}| \leq 1\) with \(\ell_{i,i} = 1\). Hence, \(\|L\|_* = 1\) and \(\|L\|_* \cdot \|U\|_* = \|U\|_*\). To measure how \(\|U\|_*\) grows the following definition of the growth factor \(\rho\) is useful.

\[ \rho(A) := \frac{\|U\|_*}{\|A\|_*} = \frac{\max_{i, j}|u_{i,j}|}{\max_{i, j}|a_{i,j}|}. \]

Hence, we have that

\[ \frac{\| L \|_* \, \| U \|_*}{\| A \|_*} = \rho(A) \]

Now let \(\| \cdot \|\) be our preferred matrix norm:

\[ C_a \| B \|_* \leq \| B \| \leq C_b \| B \|_* \qquad \forall B \in \mathbb{R}^{n \times n}. \]

Then

\[ \eta(\tilde{L}, \tilde{U}) = \frac{\| \Delta A \|}{\| A \|} = \frac{\| L \| \, \| U \| }{\| A \|} \, \mathcal{O}(\epsilon_{mach}) \leq \frac{C_b^2}{C_a} \, \rho(A) \, \mathcal{O}(\epsilon_{mach}) = \rho(A) \, \mathcal{O}(\epsilon_{mach}). \]

It follows that the LU decomposition with pivoting is backward stable if \(\rho\) is small.

Counterexample to backward stability#

For most matrices, the growth factor \(\rho\) remains small when pivoting is employed. However, one counterexample suffices to demonstrate that LU decomposition with pivoting is not backward stable.

Example 9 (Counterexample to backward stability)

Define \(A_n \in \mathbb{R}^{n \times n}\) as the matrix with ones on the diagonal and in the last column, and \(-1\) on all entries below the diagonal. For instance, \(A_8\) is:

\[\begin{split} \begin{pmatrix} \phantom{-}1 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}1 \\ -1 & \phantom{-}1 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}1 \\ -1 & -1 & \phantom{-}1 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}1 \\ -1 & -1 & -1 & \phantom{-}1 & \phantom{-}0 & \phantom{-}0 & \phantom{-}0 & \phantom{-}1 \\ -1 & -1 & -1 & -1 & \phantom{-}1 & \phantom{-}0 & \phantom{-}0 & \phantom{-}1 \\ -1 & -1 & -1 & -1 & -1 & \phantom{-}1 & \phantom{-}0 & \phantom{-}1 \\ -1 & -1 & -1 & -1 & -1 & -1 & \phantom{-}1 & \phantom{-}1 \\ -1 & -1 & -1 & -1 & -1 & -1 & -1 & \phantom{-}1 \end{pmatrix}. \end{split}\]

Since the diagonal and subdiagonal entries have modulus one, pivoting does not cause row interchanges. Let \(A_n = L_n U_n\). It can be shown that \(U_n\) is the identity matrix except in its last column, whose \(i\)th entry is \(2^{i-1}\):

\[\begin{split} U_8 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 2 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 4 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 8 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 16 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 32 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 64 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 128 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 256 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 512 \end{pmatrix} \end{split}\]

Hence, the growth factor is \(\rho(A) = 2^{n-1}\), which increases exponentially with \(n\). In double-precision arithmetic (\(\epsilon_{\mathrm{mach}} = 2^{-52}\)), taking \(n = 53\) leads to a backward error of order one. For \(n = 100\), the magnitude of the backward error is on the order of

\[ 2^{100} \cdot 2^{-52} = 2^{48} \approx 2.8 \cdot 10^{14}. \]

A system of dimension \(100\) is modest by modern computational standards, yet this example illustrates that pivoted LU decomposition can fail spectacularly in certain pathological cases.

Though rarely encountered in practice, such counterexamples highlight the importance of seeking alternative methods when numerical stability is critical.

Python skills#

Below are some Python examples that illustrate how to compute and investigate the backward error in LU decomposition.

We also wish to compare LU decomposition with and without pivoting. Since the Python library provides the decomposition with pivoting, we implement here only the LU decomposition without pivoting.

import numpy as np

def lu_no_pivot(A):
    """
    Performs LU decomposition without pivoting.
    Returns (L, U) where A = L @ U.
    
    NOTE: This function may fail or produce large errors if A has zeros or very small pivots on the main diagonal.
    """
    A = A.astype(float)
    n = A.shape[0]
    
    L = np.eye(n)
    U = A.copy()
    
    for k in range(n - 1):
        # If U[k, k] is zero (or extremely small), the decomposition will fail numerically
        if abs(U[k, k]) < 1e-15:
            raise ValueError(f"Zero (or tiny) pivot encountered at index {k}. "
                             "Pivoting would be required here.")
        
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] = U[i, k:] - L[i, k] * U[k, k:]
    
    return L, U

We begin the comparison by computing the reconstruction errors. Then recall the definition of the relative backward error for the computed factors \(\tilde{L}\) and \(\tilde{U}\):

\[ \eta(\tilde{L}, \tilde{U}) = \frac{\| A - \tilde{L}\,\tilde{U} \|}{\|A\|}. \]

from scipy.linalg import lu

# Example usage:
A = np.array([[1e-10, 1.3],
              [1.1,    1.2]], dtype=float)

P, L, U = lu(A)
L_no_pivot, U_no_pivot = lu_no_pivot(A)

print("Matrix A:")
print(A)
print("\nPermutation matrix P:")
print(P)
print("\nLower triangular matrix L:")
print(L)
print("\nUpper triangular matrix U:")
print(U)
print("\nLower triangular matrix without pivoting L:")
print(L_no_pivot)
print("\nUpper triangular matrix without pivoting U:")
print(U_no_pivot)

# Check that P @ A is approximately L @ U
reconstruction_error = np.linalg.norm(P @ A - L @ U)
reconstruction_error_no_pivot = np.linalg.norm(A - L_no_pivot @ U_no_pivot)
norm_A = np.linalg.norm(A, ord=2)

print(f"\nReconstruction error with pivoting (|| PA - LU||): {reconstruction_error:e}")
print(f"\nReconstruction error without pivoting (|| A - LU||): {reconstruction_error_no_pivot:e}")
print(f"Relative backward error with pivoting: (|| PA - LU|| / ||A||): {reconstruction_error / norm_A:e}")
print(f"Relative backward error without pivoting: (|| A - LU|| / ||A||): {reconstruction_error_no_pivot / norm_A:e}")

Finally, we investigate the counterexample for backward stability.

def build_counterexample_matrix(n):
    """
    Constructs the n x n counterexample matrix A described in the notes:
    1s on the diagonal and last column, and -1 below the diagonal.
    """
    A = np.zeros((n, n), dtype=float)
    for i in range(n):
        A[i, i] = 1.0  # diagonal entries
        A[i, -1] = 1.0 # last column entries
        for j in range(i):
            A[i, j] = -1.0
    return A

def test_counterexample(n):
    A = build_counterexample_matrix(n)
    P, L, U = lu(A)
    num = np.linalg.norm(P @ A - L @ U)
    den = np.linalg.norm(A)
    err = num / den
    growth_factor = np.linalg.norm(U, ord=1)  # Illustrative choice for growth factor
    print(f"n = {n} | Relative backward error: {err:e} | Growth factor (||U||_1): {growth_factor:e}")

# Example usage:
for size in [10, 20, 30, 40, 50, 53, 100, 1000]:
    test_counterexample(size)

Self-check exercises#

Question

Solve the linear systems

\[\begin{split} \begin{equation*} \left( \begin{array}{rrrr} 1 & 2 & \;\;3 & 6\\ 2 & 8 & 6 & 5\\ 0 & 12 & 9 & -6\\ -4 & -8 & 0 & 0 \end{array} \right) \left( \begin{array}{c} x_1 \\ x_2 \\ x_3 \\ x_4 \end{array} \right) = \left( \begin{array}{c} 1 \\ 2 \\ 3 \\ 4 \end{array} \right) \quad\textrm{and}\quad \left(\begin{array}{rrr} -1 &17 &-3\\ -1 &29 &-5\\ 3 &-70 &12\\ \end{array}\right) \left(\begin{array}{c} x_1\\ x_2\\ x_3\\ \end{array}\right) = \left(\begin{array}{c} 5\\ 6\\ 7\\ \end{array}\right) \end{equation*} \end{split}\]

using Gaussian elimination with pivoting and back substitution.

Question

Let \(P\) be a permutation matrix with permutation \(\pi\).

(a) Show that

\[\begin{split} P = \begin{pmatrix} & (e^{(\pi^{-1}(1))})^\top & \\ \hline & \vdots\\ \hline & (e^{(\pi^{-1}(n))})^\top & \end{pmatrix} \end{split}\]

(b) Let \(A\), \(B\) be matrices of compatible size so that the matrix product \(A \cdot B\) is defined. Show that

\[ A \cdot B = \sum_k a_k \otimes b_k \]

where \(a_k\) is the \(k\)th column of \(A\) and \((b_k)^\top\) the \(k\)th row of \(B\).

(c) Show for the vectors \(b_1, \ldots, b_n \in \mathbb{R}^n\) that

\[\begin{split} P \cdot \begin{pmatrix} & b_1^\top & \\ \hline &\vdots&\\ \hline &b_n^\top & \end{pmatrix} = \begin{pmatrix} & b_{\pi^{-1}(1)}^\top & \\ \hline & \vdots\\ \hline & b_{\pi^{-1}(n)}^\top & \end{pmatrix} \end{split}\]

Question

As in the counter example above, define \(A_n \in \mathbb{R}^{n \times n}\) as the matrix with ones on the diagonal and in the last column, and \(-1\) on all entries below the diagonal.

Show that the LU decomposition \(A_n = L_n U_n\) exists where \(U_n\) is the identity matrix except in its last column, whose \(i\)th entry is \(2^{i-1}\).

Question

As discussed in the computer practical on Frobenius matrices, suppose you have implemented the following three functions in Python:

permute(i,j,n) returns an n \(\times\) n permutation matrix that interchanges the ith and jth elements of a vector upon multiplication.
frob(s,f) returns the \(n \times n\) Frobenius matrix of index s whose subdiagonal entries are given by last \(n - \)s entries of f. Here \(n\) is the length of f.
ifrob(s,f) returns the inverse of frob(s,f).

Find the LU decomposition with pivoting of A, using the above Python functions for Frobenius and permutation matrices:

A = np.array([[3, 2, 1], [6, 4, 0], [0, 2, -4]])

Backward error and pivoting

Contents

Backward error and pivoting#

Backward error of the LU decomposition#

LU decomposition with pivoting#

Growth factors#

Counterexample to backward stability#

Python skills#

Self-check exercises#