The condition number of linear systems

The condition number of linear systems#

The condition number reflects how sensitive a function evaluation \(f(x + \Delta x)\) is to changes in \(\Delta x\). It doesn’t depend on the numerical method \(\tilde{f}\) and shows how hard it is to evaluate \(f\) numerically, no matter which method is used.

With this in mind, we now compute the condition number for linear systems of equations \(Ay = b\) with \(A\in\mathbb{R}^{n\times n}\) and \(b\in\mathbb{R}^n\). We assume that \(A\) is non-singular and \(b \neq 0\). We use \(\|\cdot\|\) to denote both the vector norm on \(\mathbb{R}^n\) and the induced matrix norm on \(\mathbb{R}^{n\times n}\).

We consider the following two problems:

  1. Given a fixed \(A\), what is the local relative condition number \(\kappa_1(b)\) of \(f_1 : \mathbb{R}^n \to \mathbb{R}^n\), defined by \(b \mapsto A^{-1} b\)?

  2. Given a fixed \(b\), what is the local relative condition number \(\kappa_2(A)\) of \(f_2 : \mathbb{R}^{n \times n} \to \mathbb{R}^n\), defined by \(A \mapsto A^{-1} b\)?

Conveniently, both questions can be answered with the same calculation. Let \(\Delta A\) and \(\Delta b\) be perturbations in \(A\) and \(b\), and let \(\Delta y\) satisfy

\[ (A + \Delta A)(y + \Delta y) = b + \Delta b. \]

Then

\[\begin{split} \begin{aligned} \Delta y &= (A + \Delta A)^{-1}(-\Delta A y + \Delta b)\\ &= \left[A(I + A^{-1}\Delta A)\right]^{-1}(-\Delta A y + \Delta b)\\ &= \left(I + A^{-1}\Delta A\right)^{-1}A^{-1}(-\Delta Ay+\Delta b). \end{aligned}. \end{split}\]

To continue, we use the following lemma, whose proof is given in the optional materials. It is a direct generalisation of a well-known result for scalars (i.e. \(n = 1\) and the norm is the modulus) to the matrix setting.

Lemma 1 (Matrix-valued von Neumann series)

Let \(\|\cdot\|\) be a submultiplicative matrix norm, and let \(X\in\mathbb{R}^{n\times n}\) with \(\|X\| < 1\). Then \(I - X\) is invertible and

\[ (I - X)^{-1} = \sum_{i=0}^{\infty}X^i \]

with

\[ \|(I - X)^{-1}\| \leq \frac{1}{1-\|X\|}. \]

Assuming that \(\Delta A\) is sufficiently small so that \(\|A^{-1}\Delta A\| < 1\), we conclude from the lemma that

\[\begin{split} \begin{align*} \frac{\|\Delta y\|}{\|y\|} & \leq \frac{\|A^{-1}\|}{1-\|A^{-1}\|\cdot \|\Delta A\|}\left(\|\Delta A\| + \frac{\|\Delta b\|}{\|y\|}\right)\\ &\leq \frac{\|A^{-1}\|\cdot \|A\|}{1-\|A^{-1}\| \cdot \| \Delta A\|}\left(\frac{\|\Delta A\|}{\|A\|} + \frac{\|\Delta b\|}{\|b\|}\right), \end{align*} \end{split}\]

where in the last step we used that \(\|b\| = \|Ay\| \leq \|A\|\cdot\|y\|\). To answer the first question above, let \(\Delta A = 0\). Then \(A\Delta y = \Delta b\), and thus

\[\begin{split} \begin{align*} \kappa_1(b) = & \lim_{\delta\rightarrow 0} \sup_{b,b+\Delta b \in \mathbb{R}^n \atop 0 < \| \Delta b \| \leq \delta} \frac{\| f_1(b+\Delta b) - f_1(b) \|_{\mathbb{R}^n} / \| f_1(b) \|_{\mathbb{R}^n}}{\| \Delta b \|_{\mathbb{R}^n} / \| b \|_{\mathbb{R}^n}} = \lim_{\delta\rightarrow 0} \sup_{b,\Delta b \in \mathbb{R}^n \atop 0 < \| \Delta b \| \leq \delta} \frac{\| \Delta y \|_{\mathbb{R}^n} / \| y \|_{\mathbb{R}^n}}{\| \Delta b \|_{\mathbb{R}^n} / \| b \|_{\mathbb{R}^n}}\\ \leq & \lim_{\delta\rightarrow 0} \sup_{b,\Delta b \in \mathbb{R}^n \atop 0 < \| \Delta b \| \leq \delta} \frac{\|A^{-1}\|\cdot \|A\|}{1-\|A^{-1}\| \cdot \| \Delta A\|} \leq \|A^{-1}\|\cdot \|A\|. \end{align*} \end{split}\]

Similarly, to answer the second question, let \(\Delta b = 0\). Then \((A + \Delta A)\Delta y = \Delta A y\), and so

\[\begin{split} \begin{align*} \kappa_2(A) = & \lim_{\delta\rightarrow 0} \sup_{A,A+\Delta A \in \mathbb{R}^{n \times n} \atop 0 < \| \Delta A \| \leq \delta} \frac{\| f_2(A+\Delta A) - f_2(A) \|_{\mathbb{R}^{n \times n}} / \| f_2(A) \|_{\mathbb{R}^{n \times n}}}{\| \Delta A \|_{\mathbb{R}^{n \times n}} / \| A \|_{\mathbb{R}^{n \times n}}} \\ = & \lim_{\delta\rightarrow 0} \sup_{A,A+\Delta A \in \mathbb{R}^{n \times n} \atop 0 < \| \Delta A \| \leq \delta} \frac{\| \Delta y \|_{\mathbb{R}^{n \times n}} / \| y \|_{\mathbb{R}^{n \times n}}}{\| \Delta A \|_{\mathbb{R}^{n \times n}} / \| A \|_{\mathbb{R}^{n \times n}}} \leq \|A^{-1}\|\cdot \|A\|. \end{align*} \end{split}\]

Therefore, both condition numbers \(A \mapsto A^{-1}b\) and \(b \mapsto A^{-1}b\) are bounded above by \(\|A^{-1}\|\cdot \|A\|\). This bound is sharp, as recorded by the following fact.

Fact: Condition number of linear systems

Assume that \(A\in\mathbb{R}^{n\times n}\) is non-singular and \(b\in\mathbb{R}^n\) is non-zero. Then

\[ \kappa_1(b) = \kappa_2(A) = \|A^{-1}\|\cdot \|A\|. \]

The condition number \(\kappa_{rel}(A)\) has an important interpretation. It measures how close the linear system is to being singular. More precisely,

\[ \min\left\{\frac{\|\Delta A\|_2}{\|A\|_2}: A + \Delta A \text{ singular} \right\} = \frac{1}{\kappa_{rel}(A)}. \]

We do not prove this result here.

Python skills#

You can compute the condition number of a matrix using the numpy.linalg.cond function. Here are a few examples:

import numpy as np

# Example 1: 2x2 matrix
A = np.array([[1, 1], [0, 0.01]])
cond_number = np.linalg.cond(A)
print(f"Condition number of A: {cond_number}")

# Example 2: 3x3 matrix
B = np.array([[1, 2, 3], [0, 5, 6], [7, 0, 9]])
cond_number_B = np.linalg.cond(B)
print(f"Condition number of B: {cond_number_B}")

# Example 3: Using different norms
C = np.array([[2, 3], [-1, 1]])
cond_number_C_1 = np.linalg.cond(C, 1)  # 1-norm
cond_number_C_inf = np.linalg.cond(C, np.inf)  # Infinity norm
cond_number_C_2 = np.linalg.cond(C, 2)  # 2-norm
print(f"Condition number of C (1-norm): {cond_number_C_1}")
print(f"Condition number of C (Infinity norm): {cond_number_C_inf}")
print(f"Condition number of C (2-norm): {cond_number_C_2}")

Self-check questions#

Question

Compute the \(1-\)norm condition number of

\[\begin{split} A = \begin{pmatrix}1 & 1\\ 0 & \epsilon\end{pmatrix} \end{split}\]

as a function of \(\epsilon\). What happens as \(\epsilon\rightarrow 0\)?

Answer

We have

\[\begin{split} A^{-1} = \begin{pmatrix}1 & -\epsilon^{-1}\\ 0 & \epsilon^{-1}\end{pmatrix}. \end{split}\]

Hence,

\[ \kappa(A) = \|A\|_1\cdot \|A^{-1}\|_1 = (1+\epsilon) \cdot \frac{2}{\epsilon}. \]

and \(\kappa(A)\rightarrow\infty\) as \(\epsilon\rightarrow 0\). The reason is simple: For \(\epsilon=0\) the matrix is not invertible and we expect the condition number to become unbounded as we reach this limit case.

Question (Sharp condition number)

Let

\[\begin{split} A = \left(\begin{array}{rr} 5 &-18\\ -5 &19\end{array}\right). \end{split}\]

Compute local relative condition number \(\kappa_1\) and find vectors \(b\), \(\Delta b\) such that \(A(x+\Delta x)=b+\Delta b\) and

\[ \frac{\|\Delta x\|_1^{}}{\|x\|_1^{}} = \kappa_1 \frac{\|\Delta b\|_1^{}}{\|b\|_1^{}}. \]

Here \(\| \cdot \|_1^{}\) denotes the \(1\)-norm. Normalise your vectors so that \(\|x\|_1^{}=1\) and \(\|\Delta b\|_1^{}=0.01\).

Answer

We have \(\|A\|_1^{}=\|A\|_\textrm{col}^{}=37\) and

\[\begin{split} A^{-1} = \left(\begin{array}{rr} \frac{19}{5} & \frac{18}{5}\\ 1 & 1\end{array}\right) \qquad\Rightarrow\qquad \|A^{-1}\|_1^{}=\|A^{-1}\|_\textrm{col}^{}=\frac{24}{5} \quad\textrm{and}\quad \kappa_1 = 177 \frac{3}{5}. \end{split}\]

To satisfy \(\|b\|_1^{}=\|A\|_1^{}\|x\|_1^{}\), we could take \(x=(0,1)^\top\) which gives \(b=(-18,19)^\top\). And to satisfy \(\| \Delta x\|_1^{}=\|A^{-1}\|_1^{}\|\Delta b\|_1^{}\), we could take \(\Delta b=(0.01,0)^\top\). By construction,

\[ \frac{\|\Delta x\|_1^{}}{\|x\|_1^{}} = \kappa_1 \frac{\|\Delta b\|_1^{}}{\|b\|_1^{}}. \]

Question (Sharp condition number)

Repeat for

\[\begin{split} A = \left(\begin{array}{rrr} -4 & 8 & -7\\ -5 & 3 & 2\\ -3 & 3 & -9\end{array}\right), \qquad A^{-1} = \left( \begin{array}{rrr} \frac{11}{78} & -\frac{17}{78} & -\frac{37}{234} \\ \frac{17}{78} & -\frac{5}{78} & -\frac{43}{234} \\ \frac{1}{39} & \frac{2}{39} & -\frac{14}{117} \\ \end{array} \right). \end{split}\]
Answer

We have \(\|A\|_1^{}=\|A\|_\textrm{col}=18\) and \(\|A^{-1}\|_1^{} = \frac{108}{234} = \frac6{13}\), so \(\kappa_1(A) = 108/13\). In the derivation of the error formula, there are only two inequalities: \(\|\Delta x\|_1^{}\le\|A^{-1}\|_1^{}\|\Delta b\|_1^{}\) and \(\|b\|_1^{}\le \|A\|_1^{}\,\|x\|_1^{}\). We therefore choose \(x\) such that \(\|b\|_1^{}=\|A\|_1^{}\,\|x\|_1^{}\), namely \(x=(0,0,1)^\top\) which gives \(b=(-7,2,-9)^\top\), and \(\Delta b\) such that \(\|\Delta x\|_1^{}=\|A^{-1}\|_1^{}\|\Delta b\|_1^{}\), e.g. \(\Delta b=(0,0,0.01)^\top\).

Question

Let

\[\begin{split} A = \begin{pmatrix}2 & 1 \\ 1 & 2\end{pmatrix}. \end{split}\]

Compute \(\kappa_1(A)\), \(\kappa_\infty(A)\), \(\kappa_2(A)\), which denote here the condition numbers with respect to the \(1\)-, \(\infty\)- and \(2\)-norm. Which is largest?

Answer

First compute \(A^{-1}\):

\[\begin{split} A^{-1} = \frac{1}{3}\begin{pmatrix}2 & -1 \\ -1 & 2\end{pmatrix}. \end{split}\]

For the \(1\)-norm and \(\infty\)-norm:

\[ \|A\|_1 = \|A\|_\infty = \max_{\textrm{row or col}} (2+1) = 3. \]

Similarly,

\[ \|A^{-1}\|_1 = \|A^{-1}\|_\infty = 1. \]

Thus,

\[ \kappa_1(A) = \kappa_\infty(A) = 3. \]

For the \(2\)-norm, the eigenvalues of \(A\) are \(3\) and \(1\), so

\[ \|A\|_2 = 3,\quad \|A^{-1}\|_2 = \frac{1}{1} = 1, \]

hence

\[ \kappa_2(A) = 3. \]

All three condition numbers coincide and equal \(3\) in this case.

Question

Consider

\[\begin{split} A = \begin{pmatrix}1 & \tfrac{1}{2} \\[6pt] \tfrac{1}{2} & 1\end{pmatrix}. \end{split}\]

Its eigenvalues are \(\lambda_{\max}=1.5\) and \(\lambda_{\min}=0.5\). Suppose we add a small perturbation \(\Delta A\) and consider no perturbation in \(b\) (i.e. \(\Delta b=0\)). Estimate, using the \(2\)-norm condition number, how large the relative perturbation \(\frac{\|\Delta A\|_2}{\|A\|_2}\) can be before \(A+\Delta A\) becomes close to singular.

Answer

Since \(\|A\|_2 = 1.5\) and \(\|A^{-1}\|_2 = 1/\lambda_{\min} = 2\), the condition number is

\[ \kappa_2(A) = \|A\|_2\|A^{-1}\|_2 = 1.5 \cdot 2 = 3. \]

Recall

\[ \min\left\{\frac{\|\Delta A\|_2}{\|A\|_2} : A+\Delta A \text{ singular}\right\} = \frac{1}{\kappa_2(A)} = \frac{1}{3}. \]

Hence, a relative perturbation of about \(1/3\) in norm is sufficient for \(A+\Delta A\) to become close to singular. Since \(\|A\|_2=1.5\), this corresponds to \(\|\Delta A\|_2 = 0.5\).

Optional material#

Proof of the convergence lemma for the matrix-valued von Neumann series

Here is the proof of the convergence lemma for matrix-valued von Neumann series.

Proof. Let \(S_n = \sum_{i=0}^n X^i\). By norm equivalence and submultiplicativity we have for each matrix element \((S_n)_{\ell, t}\) that

\[ |(S_n)_{\ell, t}| \leq \sum_{i=0}^{n}\left|\left(X^i\right)_{\ell, t}\right|\leq \sum_{i=0}^n \underbrace{\max_{\ell, t}\left|\left( X^i\right)_{\ell, t}\right|}_{\text{this is a norm}} \leq C\sum_{i=0}^{\infty}\|X^{i}\|\leq C\sum_{i=0}^\infty \|X\|^{i}= \frac{C}{1 - \|X\|}. \]

for some \(C> 0\). Hence, every component of \(S_n\) is absolutely convergent and therefore the sum \(S_n\) converges with \(X^{n}\rightarrow 0\) as \(n\rightarrow\infty\). We conclude the \(S := \sum_{i=0}^{\infty}X^{i}\) exists.

We find \((I - X)S_n = \sum_{i = 0}^n X^i - \sum_{i = 1}^{n + 1} X^i = I - X^{n+1}\), cancelling common terms in the sums. Taking the limit as \(n \to \infty\) we have \((I-X)S = I\),so \((I-X)\) is non-singular with \((I-X)^{-1} = S\). Finally,

\[ \|(I-X)^{-1}\| = \|S\| \leq \sum_{i=0}^{\infty}\|X\|^{i} = \frac{1}{1-\|X\|}. \]