Python implementation#
In the following, we demonstrate a simple LU decomposition in Python. We also present timing comparisons against the implementation from SciPy to show that one should not use a handwritten LU decomposition in practice, but instead rely on existing NumPy/SciPy routines.
import numpy as np
from scipy.linalg import lu as scipy_lu
def lu(A):
    """A simple LU decomposition without pivoting."""
    if A.shape[0] != A.shape[1]:
        raise ValueError("Input matrix must be square.")
    n = A.shape[0]
    L = np.zeros((n, n), dtype=np.float64)
    U = np.zeros((n, n), dtype=np.float64)
    U[:] = A
    np.fill_diagonal(L, 1)
    # Eliminate the entries below the diagonal column by column.
    for col in range(n - 1):
        for row in range(col + 1, n):
            L[row, col] = U[row, col] / U[col, col]
            U[row, col:] -= L[row, col] * U[col, col:]
            U[row, col] = 0
    return L, U
This simple implementation works only if no zero pivot is encountered during elimination; a production implementation would use row pivoting to avoid this failure mode. The highly optimised code in SciPy is also significantly faster. Here is a test case.
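To illustrate the pivoting issue, here is a small sketch: the matrix below has a zero in the (0, 0) position, so the elimination step above would divide by zero, while SciPy's `lu` performs row pivoting and returns a factorisation A = P @ L @ U with a permutation matrix P.

```python
import numpy as np
from scipy.linalg import lu as scipy_lu

# A perfectly well-conditioned matrix whose (0, 0) entry is zero.
# The naive algorithm divides by U[0, 0] = 0 in its first step.
A = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# SciPy's pivoted LU swaps the rows first and factorises A without issue.
P, L, U = scipy_lu(A)
residual = np.linalg.norm(A - P @ L @ U)
```

The residual is zero up to rounding, confirming that pivoting sidesteps the problem entirely.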
n = 200
A = np.random.randn(n, n)
L, U = lu(A)
residual = np.linalg.norm(A - L @ U) / np.linalg.norm(A)
print(f"The residual is: {residual}")
runtime_my_lu = %timeit -o lu(A)
runtime_scipy_lu = %timeit -o scipy_lu(A)
print(f"Runtime of our LU (s): {runtime_my_lu.best}")
print(f"Runtime of SciPy LU (s): {runtime_scipy_lu.best}")
print(f"The speed ratio between the two implementations is: {runtime_my_lu.best / runtime_scipy_lu.best}")
A typical output is below, with some variation to be expected between different computers.
The residual is: 2.285256427880218e-13
50.6 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.26 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Runtime of our LU (s): 0.05018079660003423
Runtime of SciPy LU (s): 0.0018813237100403057
The speed ratio between the two implementations is: 26.673132503581297
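In practice one rarely needs the factors L and U explicitly; the usual goal is to solve linear systems. The sketch below uses SciPy's `lu_factor`/`lu_solve` pair, which stores the pivoted factorisation compactly so it can be reused for many right-hand sides (the random test problem here is an assumption for illustration).

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Factorise once; lu_factor returns the packed LU factors and pivot indices.
lu_piv = lu_factor(A)

# Reuse the factorisation to solve A @ x = b (and any further right-hand sides).
x = lu_solve(lu_piv, b)
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
```

Since factorisation costs O(n^3) but each solve only O(n^2), reusing the factors in this way is much cheaper than calling a dense solver repeatedly.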