LU decomposition#
Solving linear systems of equations is among the most fundamental operations performed on a computer. Many problems (even nonlinear ones) eventually reduce to sequences of linear system solves.
Most students learn about Gaussian elimination, also known as row reduction, in school. The LU decomposition is simply a more organised way to perform Gaussian elimination.
Basic idea#
We consider the linear system of equations

\[ A x = b \]
with \(A \in \mathbb{R}^{n \times n}\). Throughout, assume that \(A\) is nonsingular. The LU decomposition applies similarly to complex matrices.
Many techniques in numerical linear algebra rely on factoring a matrix into products of simpler matrices. The LU decomposition is one such factorisation:

\[ A = L U. \]
This represents a decomposition of \(A\) into the product of two matrices, \(L\) and \(U\). Substituting this into the system gives

\[ L U x = b. \]
To solve this system, we now have to solve two matrix problems:

\[ L y = b, \qquad U x = y. \]
This might seem cumbersome. Instead of one linear system, we now have two. However, the trick is to choose a decomposition that makes solving linear systems with \(L\) and \(U\) particularly easy.
In LU decomposition, \(L\) is a lower triangular matrix, meaning its nonzero elements are confined to the diagonal and the entries below it. Similarly, \(U\) is an upper triangular matrix, where nonzero elements appear on the diagonal and above it.
Solving an upper triangular system#
Consider an upper triangular system \(Ux = y\). Its equations can be written as

\[ \sum_{j=i}^n u_{i,j} x_j = y_i \]
for \(i = n, n-1, \dots, 1\). This leads to the well-known back-substitution procedure:
First, find \(x_n = y_n / u_{n,n}\).
Then, for each \(i\) from \(n-1\) down to \(1\),
\[ x_i = \frac{y_i - \sum_{j=i+1}^n u_{i,j} x_j}{u_{i,i}}. \]
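In Python, this procedure might be sketched as follows (a minimal illustration with NumPy, not part of the original notes; the function name back_substitution is our own):

import numpy as np

def back_substitution(U, y):
    # Solve U x = y for an invertible upper triangular U.
    n = U.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        # Subtract the contributions of the already computed x_{i+1}, ..., x_n,
        # then divide by the diagonal entry u_{i,i}.
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

U = np.array([[2., 1.], [0., 3.]])
print(back_substitution(U, np.array([4., 6.])))  # [1. 2.]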
The following fact ensures that there is no division by zero in the back-substitution procedure if \(U\) is invertible.
Fact: Invertible triangular matrices
A triangular matrix is invertible if and only if its diagonal elements are non-zero.
What is the cost of solving this triangular system? We count the number of additions, subtractions, multiplications, and divisions.
For each \(i\), there are \(n - i\) multiplications and \(n - i - 1\) additions, plus one subtraction and one division.
Hence, the overall number of operations (\(ops\)) is
\[ ops = \sum_{i=1}^n [(n-i) + (n-i-1) + 1 + 1] = \sum_{i=1}^n [2 (n-i) + 1] = n^2. \]
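As a quick sanity check, the sum can be evaluated directly and compared with \(n^2\) (a hypothetical snippet added for illustration):

def triangular_solve_ops(n):
    # one division + one subtraction + (n - i) multiplications + (n - i - 1) additions per row
    return sum(2 * (n - i) + 1 for i in range(1, n + 1))

print([(n, triangular_solve_ops(n), n ** 2) for n in (3, 10, 50)])
# prints [(3, 9, 9), (10, 100, 100), (50, 2500, 2500)]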
Thus, triangular solves are efficient and form the backbone of methods like LU decomposition for solving large linear systems.
Gaussian elimination in matrix notation#
To derive the LU decomposition, we express Gaussian elimination operations as matrix multiplications.
At the beginning, we have \(A^{(1)} = A\) and \(b^{(1)} = b\). At each stage of Gaussian elimination, we eliminate the entries of a column below the diagonal. Let \(A^{(s)}\) and \(b^{(s)}\) be the matrix and right-hand side we obtain after eliminating the first \(s-1\) columns.
Schematically, we have the picture

\[
A^{(s)} =
\begin{pmatrix}
* & * & \cdots & * & \cdots & * \\
  & \ddots &  & \vdots &  & \vdots \\
  &  & * & * & \cdots & * \\
  &  &  & * & \cdots & * \\
  &  &  & \vdots &  & \vdots \\
  &  &  & * & \cdots & *
\end{pmatrix},
\]

where the first \(s-1\) columns already have zeros below the diagonal and elimination continues in column \(s\).
To be more precise, given \(A^{(s)}\), subtract \(A^{(s)}_{is} / A^{(s)}_{ss}\) times row \(s\) from row \(i \in \{s+1, \ldots, n\}\):

\[
A^{(s+1)}_{ij} = A^{(s)}_{ij} - \frac{A^{(s)}_{is}}{A^{(s)}_{ss}} A^{(s)}_{sj},
\qquad
b^{(s+1)}_i = b^{(s)}_i - \frac{A^{(s)}_{is}}{A^{(s)}_{ss}} b^{(s)}_s.
\]
Note that on a computer, the computation only needs to be carried out for \(j \in \{ s+1, \ldots, n\}\): for \(j \in \{1, \ldots, s-1\}\) both \(A^{(s)}_{ij}\) and \(A^{(s)}_{sj}\) vanish; for \(j = s\) the formula is precisely chosen to guarantee the cancellation of terms. Through successive elimination of columns, we reduce \(A\) to an upper triangular matrix \(A^{(n)}\) after \(n-1\) steps. We now rewrite the elimination step as matrix multiplication

\[ A^{(s+1)} = F^{(s)} \, A^{(s)}, \qquad b^{(s+1)} = F^{(s)} \, b^{(s)}, \]

where

\[
F^{(s)} =
\begin{pmatrix}
1 & & & & & \\
& \ddots & & & & \\
& & 1 & & & \\
& & -\ell_{s+1,s} & 1 & & \\
& & \vdots & & \ddots & \\
& & -\ell_{n,s} & & & 1
\end{pmatrix},
\qquad
\ell_{is} = \frac{A^{(s)}_{is}}{A^{(s)}_{ss}}.
\]
Definition 3
A matrix whose diagonal elements are all \(1\) and whose remaining entries are zero, except possibly in column \(s\) below the diagonal, is called a Frobenius matrix of index \(s\).
We denote the \(s\)th unit vector by \(e^{(s)}\), thus \(e_k^{(s)} = 1\) if \(s = k\) and \(e_k^{(s)} = 0\) otherwise. Any Frobenius matrix of index \(s\) can be written by choosing a vector

\[ f = (0, \ldots, 0, f_{s+1}, \ldots, f_n)^\top \]

containing the subdiagonal elements:

\[ F = I - f \otimes e^{(s)}. \]
Here \(I\) is the identity matrix and \(\otimes\) is the outer (or tensor) product: \(a \cdot b^\top = a \otimes b\) for \(a \in \mathbb{R}^m, b \in \mathbb{R}^n\). Observe that \(a \otimes b \in \mathbb{R}^{m \times n}\) and distinguish it from the inner product \(a^\top \cdot b = \langle a, b\rangle \in \mathbb{R}\).
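To make the outer-product representation concrete, here is a small NumPy sketch (an added illustration, not part of the original notes) that builds \(F^{(1)} = I - f \otimes e^{(1)}\) for a \(3 \times 3\) example and checks that multiplying by it eliminates the first column:

import numpy as np

A = np.array([[2., 1., 1.],
              [4., 3., 3.],
              [8., 7., 9.]])

# Multipliers for column 1; the first entry of f is zero
f = np.array([0., A[1, 0] / A[0, 0], A[2, 0] / A[0, 0]])
e1 = np.array([1., 0., 0.])

# Frobenius matrix F = I - f ⊗ e^(1); np.outer is the outer product
F = np.eye(3) - np.outer(f, e1)
print(F @ A)  # the first column becomes (2, 0, 0): entries below the diagonal vanish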
Before continuing, make sure you fully understand
why \(A^{(s+1)} = F^{(s)} \, A^{(s)}\) precisely implements the Gaussian elimination in column \(s\),
why all Frobenius matrices \(F\) of index \(s\) are of the form \(I - f \otimes e^{(s)}\) where the first \(s\) elements of \(f\) vanish.
Also look at the self-check exercises below.
Deriving the LU decomposition#
So far, we have changed the description of Gaussian elimination, but have not added any substantially new mathematics. That is going to change now: the inverses of Frobenius matrices are known and efficiently computable. This is a rare property!
Lemma 2
Let \(F = I - f \otimes e^{(s)}\) be as above. Then

\[ G = I + f \otimes e^{(s)} \]

is the inverse of \(F\).
Proof. To show \(F \, G = I\), we compute the entries of the product:

\[
(F G)_{ij} = \sum_{k=1}^n \left(\delta_{ik} - f_i e^{(s)}_k\right)\left(\delta_{kj} + f_k e^{(s)}_j\right)
= \delta_{ij} + f_i e^{(s)}_j - f_i e^{(s)}_j - f_i f_s e^{(s)}_j
= \delta_{ij},
\]

where the last term vanishes because \(f_s = 0\). Here \(\delta_{ij}\) is the Kronecker delta.
Note that the inverse of a Frobenius matrix is also a Frobenius matrix. We now learn that products of Frobenius matrices can be computed with great ease.
Lemma 3
For \(s = 1, \ldots, n-1\) let \(G^{(s)} = I + f^{(s)} \otimes e^{(s)}\) be a Frobenius matrix of index \(s\). Then

\[ G^{(1)} G^{(2)} \cdots G^{(s)} = I + \sum_{k=1}^{s} f^{(k)} \otimes e^{(k)}. \]
Proof. We begin induction in \(s\) with \(s = 1\): the claim reads \(G^{(1)} = I + f^{(1)} \otimes e^{(1)}\), which holds by definition.

Suppose the statement of the lemma holds for \(s\). Then for the element in row \(i\) and column \(j\) of the product we find

\[
\Big[\Big(I + \sum_{k=1}^{s} f^{(k)} \otimes e^{(k)}\Big) G^{(s+1)}\Big]_{ij}
= \delta_{ij} + \sum_{k=1}^{s} f^{(k)}_i e^{(k)}_j + f^{(s+1)}_i e^{(s+1)}_j
+ \sum_{k=1}^{s} f^{(k)}_i f^{(s+1)}_k e^{(s+1)}_j,
\]

and the last sum vanishes because \(f^{(s+1)}_k = 0\) for \(k \leq s\). Hence the statement holds for \(s+1\).
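Both lemmas are easy to verify numerically. The following snippet (added for illustration; the vectors are arbitrary) checks Lemma 2 and Lemma 3 for \(3 \times 3\) Frobenius matrices:

import numpy as np

e1, e2 = np.eye(3)[0], np.eye(3)[1]
f1 = np.array([0., 2., 4.])   # first element zero (index 1)
f2 = np.array([0., 0., 3.])   # first two elements zero (index 2)

F1 = np.eye(3) - np.outer(f1, e1)
G1 = np.eye(3) + np.outer(f1, e1)
print(F1 @ G1)                # Lemma 2: the identity matrix

G2 = np.eye(3) + np.outer(f2, e2)
print(G1 @ G2)                # Lemma 3: equals I + f1 ⊗ e1 + f2 ⊗ e2
print(np.eye(3) + np.outer(f1, e1) + np.outer(f2, e2))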
Now everything is in place to return to Gaussian elimination:

\[
U = A^{(n)} = F^{(n-1)} \cdots F^{(1)} A
\quad \Longrightarrow \quad
A = \big(F^{(1)}\big)^{-1} \cdots \big(F^{(n-1)}\big)^{-1} U = G^{(1)} \cdots G^{(n-1)} U.
\]

With \(L = G^{(1)} \cdots G^{(n-1)}\), we conclude

\[ A = L U, \]

where, by Lemma 3, \(L\) is lower triangular with unit diagonal and carries the elimination multipliers below the diagonal.
We showed that if Gaussian elimination works (without causing division by 0 or needing row or column permutations, called pivoting) then we can find an LU decomposition.
Theorem 3
Suppose \(A \in \mathbb{R}^{n \times n}\) can be transformed into an upper triangular \(U \in \mathbb{R}^{n \times n}\) by Gaussian elimination (without pivoting). Then there exists a lower triangular \(L \in \mathbb{R}^{n \times n}\) such that

\[ A = L U. \]
The complete algorithm#
Suppose we want to solve the linear system

\[ A x = b. \]

First, we compute the LU decomposition

\[ A = L U \]

using Gaussian elimination, storing the \(\ell_{ij}\) multipliers for each row in the corresponding positions of the \(L\) matrix.
Next, we solve \(Ly = b\) by forward substitution. This involves solving for \(y_1\) first, then \(y_2\), and so on. The algorithm is similar to the triangular solve discussed earlier, but we proceed from the first entry to the last entry of \(y\) since the matrix \(L\) is lower triangular. Additionally, since all diagonal entries of \(L\) are \(1\), we do not need to divide by \(L_{ii}\).
Finally, we solve \(Ux = y\) by backward substitution.
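Putting everything together, the complete algorithm admits a compact from-scratch sketch (our own illustration, assuming no pivoting is needed; it reuses the back_substitution function from the earlier sketch):

import numpy as np

def lu_decompose(A):
    # Gaussian elimination without pivoting; assumes no zero pivots occur.
    n = A.shape[0]
    L = np.eye(n)
    U = A.astype(float).copy()
    for s in range(n - 1):
        for i in range(s + 1, n):
            L[i, s] = U[i, s] / U[s, s]     # store the multiplier in L
            U[i, s:] -= L[i, s] * U[s, s:]  # eliminate entry (i, s)
    return L, U

def forward_substitution(L, b):
    # Solve L y = b; the diagonal of L is 1, so no division is needed.
    n = L.shape[0]
    y = np.zeros(n)
    for i in range(n):
        y[i] = b[i] - L[i, :i] @ y[:i]
    return y

A = np.array([[2., 1., 1.], [4., 3., 3.], [8., 7., 9.]])
b = np.array([4., 10., 24.])
L, U = lu_decompose(A)
y = forward_substitution(L, b)
x = back_substitution(U, y)   # back_substitution as defined earlier
print(x, A @ x - b)           # the residual should be (numerically) zero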
Python skills#
To perform LU decomposition in Python, we can use the scipy.linalg.lu function from the SciPy library. This function returns a permutation matrix P, a lower triangular matrix L, and an upper triangular matrix U such that \(A = PLU\), or equivalently \(P^\top A = LU\). We shall investigate the use of P in the next section.
import numpy as np
from scipy.linalg import lu, solve_triangular
# Example matrix
A = np.array([[3, 1, 6], [2, 1, 3], [1, 1, 1]])
# Perform LU decomposition
P, L, U = lu(A)
print("P:\n", P)
print("L:\n", L)
print("U:\n", U)
We can use the above factors to solve the linear system with different right-hand sides. Since P is a permutation matrix, its inverse is its transpose.
# Example right-hand sides
b1 = np.array([1, 2, 3])
b2 = np.array([1, -2, 3])
# Check that the inverse of P is its transpose
print(f"P @ P.T =\n {P @ P.T}\n")
def trig_solve(P, L, U, b):
    # A = P L U, so A x = b becomes L U x = P^T b
    y = solve_triangular(L, P.T @ b, lower=True)  # forward substitution
    x = solve_triangular(U, y)                    # backward substitution
    return x
print(f"Solution x1: {trig_solve(P, L, U, b1)}")
print(f"Solution x2: {trig_solve(P, L, U, b2)}")
Self-check questions#
Question
Write down how to solve the general \(3 \times 3\) system below

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
\]
with Gaussian elimination. You may assume that the system is invertible.
Answer
We want to transform this system into an upper triangular system. In the first step, we multiply the first row by \(\ell_{21} = \frac{a_{21}}{a_{11}}\) and subtract it from the second row to obtain the modified system

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & \tilde{a}_{22} & \tilde{a}_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ \tilde{b}_2 \\ b_3 \end{pmatrix}
\]

with

\[ \tilde{a}_{2j} = a_{2j} - \ell_{21} a_{1j}, \qquad j = 2, 3, \]

and \(b_2\) modified correspondingly. We then define \(\ell_{31} = \frac{a_{31}}{a_{11}}\) and proceed in a similar way with the third row to obtain

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & \tilde{a}_{22} & \tilde{a}_{23} \\ 0 & \tilde{a}_{32} & \tilde{a}_{33} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ \tilde{b}_2 \\ \tilde{b}_3 \end{pmatrix}.
\]

We now continue in the same way with the bottom right matrix. Define the element \(\ell_{32} = \frac{\tilde{a}_{32}}{\tilde{a}_{22}}\), multiply the second row by it and subtract from the third row, giving at the end an upper triangular system of the form

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & \tilde{a}_{22} & \tilde{a}_{23} \\ 0 & 0 & \hat{a}_{33} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ \tilde{b}_2 \\ \hat{b}_3 \end{pmatrix}.
\]
We can now easily solve this system by backward substitution to compute first \(x_3\), then \(x_2\), and finally \(x_1\).
Question
Perform the \(LU\) decomposition of
Answer
We have
Question
Prove that a triangular matrix is invertible if and only if its diagonal elements are non-zero.
Answer
(⇒) Assume that a triangular matrix \(A \in \mathbb{R}^{n \times n}\) is invertible. An invertible matrix has a non-zero determinant. For a triangular matrix, whether upper or lower, the determinant is the product of its diagonal elements. Thus, if \(A\) is invertible, its determinant is non-zero. This implies that each diagonal element must be non-zero.
(⇐) Now assume that all diagonal elements of a triangular matrix \(A\) are non-zero. The determinant of a triangular matrix is the product of its diagonal entries. Since none of these entries is zero, their product is also non-zero. Hence, \(\det(A) \neq 0\). A non-zero determinant implies that \(A\) is invertible.
Comment: There are other valid ways to prove the result. For example, one can argue via the linear independence of the columns.
Question
Eliminate the elements below the diagonal in the first column of the matrix
using multiplication with a Frobenius matrix.
Answer
We want to eliminate the elements in the second and third rows, first column. The multipliers for the second and third rows are:
The Frobenius matrix \(F^{(1)}\) for this step is therefore:
Now, we compute
As required, the multiplication eliminated two elements.
Question
The Frobenius matrix
has the form \(F = I - f \cdot g^\top\). What are \(f, g \in \mathbb{R}^3\)?
Answer
The vector \(f\) containing the subdiagonal elements is:
The vector \(g\) is the first unit vector:

\[ g = e^{(1)} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}. \]
The Frobenius matrix \(F\) is then constructed as:
Thus, the Frobenius matrix has the form \(I - f \cdot g^\top\).
Question
Prove the Sherman-Morrison formula

\[
\left(A + u v^\top\right)^{-1} = A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}
\]
if \(1+v^\top A^{-1}u\neq 0\).
Answer
We have

\[
\left(A + u v^\top\right)\left(A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}\right)
= I + u v^\top A^{-1} - \frac{u v^\top A^{-1} + u \left(v^\top A^{-1} u\right) v^\top A^{-1}}{1 + v^\top A^{-1} u}
= I + u v^\top A^{-1} - \frac{\left(1 + v^\top A^{-1} u\right) u v^\top A^{-1}}{1 + v^\top A^{-1} u}
= I,
\]

using that \(v^\top A^{-1} u\) is a scalar. Similarly, one can show that

\[
\left(A^{-1} - \frac{A^{-1} u v^\top A^{-1}}{1 + v^\top A^{-1} u}\right)\left(A + u v^\top\right) = I.
\]

Hence, the formula holds.
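The formula is also easy to verify numerically. The following NumPy snippet (added for illustration; the test matrix and vectors are arbitrary) checks that both sides agree:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)  # a well-conditioned test matrix
u = rng.standard_normal(4)
v = rng.standard_normal(4)

Ainv = np.linalg.inv(A)
denom = 1 + v @ Ainv @ u                          # must be nonzero
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / denom
print(np.allclose(lhs, rhs))                      # True up to rounding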
Question
Compute the LU decomposition of the following matrix:
Answer
Column 1: The multipliers for the second and third rows are:
The Frobenius matrix \(F^{(1)}\) for this step is:
Now, compute \(A^{(2)} = F^{(1)} \, A\):
Column 2: The multiplier for the third row is:
The Frobenius matrix \(F^{(2)}\) for this step is:
Now, compute \(A^{(3)} = F^{(2)} \, A^{(2)}\):
Constructing \(L\) and \(U\): The matrix \(U\) is the final upper triangular matrix \(A^{(3)}\):
The matrix \(L\) is constructed from the multipliers used in the elimination steps:
Thus, the LU decomposition of \(A\) is:
Question
Suppose that \(A\) is symmetric and \(A = L U\) is its \(LU\) decomposition. Is \(U = L^\top\)?
Answer
No, here is a counterexample:

\[
A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}
= \underbrace{\begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} 1 & 2 \\ 0 & -3 \end{pmatrix}}_{U},
\]

where \(U \neq L^\top\).
Question
Given a non-singular \(A \in \mathbb{R}^{3 \times 3}\), is its \(LU\) decomposition unique?
Answer
If \(L\) is only required to be lower triangular, it is clear that the \(LU\) decomposition is not unique: setting \(L'=\alpha L\) and \(U'=(1/\alpha)U\) for some scalar \(\alpha \neq 0, 1\), one has \(L'U'=LU=A\).
So let us consider our standard form where \(\ell_{ii}=1\). Setting

\[
\begin{pmatrix} 1 & 0 & 0 \\ \ell_{21} & 1 & 0 \\ \ell_{31} & \ell_{32} & 1 \end{pmatrix}
\begin{pmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \end{pmatrix}
=
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix},
\]

we find:

\[
\begin{aligned}
a_{11} &= u_{11} &&\Rightarrow u_{11}, &
a_{12} &= u_{12} &&\Rightarrow u_{12}, &
a_{13} &= u_{13} &&\Rightarrow u_{13}, \\
a_{21} &= \ell_{21} u_{11} &&\Rightarrow \ell_{21}, &
a_{22} &= \ell_{21} u_{12} + u_{22} &&\Rightarrow u_{22}, &
a_{23} &= \ell_{21} u_{13} + u_{23} &&\Rightarrow u_{23}, \\
a_{31} &= \ell_{31} u_{11} &&\Rightarrow \ell_{31}, &
a_{32} &= \ell_{31} u_{12} + \ell_{32} u_{22} &&\Rightarrow \ell_{32}, &
a_{33} &= \ell_{31} u_{13} + \ell_{32} u_{23} + u_{33} &&\Rightarrow u_{33}.
\end{aligned}
\]

Here the \(\Rightarrow\) arrow points to a variable that is fully determined by the preceding equation. Note that \(u_{ii} \neq 0\) for all \(i\) because \(A\) and thus also \(U\) are non-singular.
This proves the uniqueness of \(L\) and \(U\). In fact, we can extend this argument to general \(n \times n\) matrices.
Question
Let \(A \in \mathbb{R}^{n \times n}\). Prove that an LU decomposition of \(A\) with nonsingular factors \(L\) and \(U\) exists if and only if all leading principal minors of \(A\) (i.e. the determinants of the \(m \times m\) submatrices \(A_{:m, :m}\) for \(m = 1, \ldots, n\)) are nonzero. Determine whether such an LU decomposition exists for the matrix
Justify your conclusion.
Answer
Necessity: Suppose \(A\) admits an LU factorisation \(A = LU\) with nonsingular matrices \(L\) and \(U\). Since \(L\) is lower and \(U\) is upper triangular, each leading principal submatrix factors as \(A_{:m, :m} = L_{:m, :m} \, U_{:m, :m}\). The diagonal entries of the nonsingular triangular factors are nonzero, so \(\det(A_{:m, :m}) = \det(L_{:m, :m}) \, \det(U_{:m, :m}) \neq 0\).
Sufficiency: Assume all leading principal minors of \(A\) are nonzero. We use induction on \(n\). The base case \(n = 1\) is trivial: \(A \in \mathbb{R}\), \(U = A\) and \(L = 1\). For the \((n-1)\times(n-1)\) case, the statement holds by the induction hypothesis. Now write

\[
A = \begin{pmatrix} \hat{A} & b \\ c^\top & d \end{pmatrix},
\]

where \(\hat{A}\) is an \((n-1)\times(n-1)\) matrix, \(b, c \in \mathbb{R}^{n-1}\) and \(d \in \mathbb{R}\). By the induction assumption, \(\hat{A}\) admits an LU factorisation \(\hat{A} = \hat{L}\hat{U}\) with nonsingular factors. Extend this to \(n \times n\) by setting

\[
L = \begin{pmatrix} \hat{L} & 0 \\ c^\top \hat{U}^{-1} & 1 \end{pmatrix},
\qquad
U = \begin{pmatrix} \hat{U} & \hat{L}^{-1} b \\ 0 & d - c^\top \hat{U}^{-1} \hat{L}^{-1} b \end{pmatrix},
\]

so that \(L U = A\) by direct multiplication. Because \(A\) is nonsingular and \(L\) has unit diagonal, \(\det U = \det A / \det L \neq 0\), so both factors are nonsingular. In fact, the above proof ensures the existence of a decomposition with factor \(L\) whose diagonal elements are normalised to \(1\).
Specific Matrix: Consider
Its leading \(2\times 2\) principal submatrix is

\[ \begin{pmatrix} 0.5 & -1 \\ -2 & 4 \end{pmatrix}, \]
whose determinant is \(0.5 \times 4 - (-1) \times (-2) = 2 - 2 = 0\). Since this minor has determinant zero, no LU decomposition with nonsingular factors exists for \(A\).
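This criterion is straightforward to check programmatically. The helper below is our own illustration; the \(3 \times 3\) example matrix is hypothetical, chosen so that its leading \(2 \times 2\) block matches the one above:

import numpy as np

def has_lu_with_nonsingular_factors(A, tol=1e-12):
    # LU with nonsingular factors exists iff all leading principal minors are nonzero
    n = A.shape[0]
    return all(abs(np.linalg.det(A[:m, :m])) > tol for m in range(1, n + 1))

# Hypothetical 3x3 matrix whose leading 2x2 block matches the one above
A = np.array([[0.5, -1.0, 2.0],
              [-2.0, 4.0, 1.0],
              [1.0, 0.0, 3.0]])
print(has_lu_with_nonsingular_factors(A))  # False: the 2x2 minor vanishes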