Python implementation

Python implementation#

In the following, we demonstrate a simple LU decomposition in Python. We also present timing comparisons against the Python implementation from Scipy to show that one should never use one’s own implementation of the LU decomposition but always use existing NumPy/Scipy routines.

import numpy as np
from scipy.linalg import lu as scipy_lu

def lu(A):
    """Our cool LU implementation."""

    if not A.shape[0] == A.shape[1]:
        raise ValueError("Input matrix must be square.")

    n = A.shape[0]

    L = np.zeros((n, n), dtype=np.float64)
    U = np.zeros((n, n), dtype=np.float64)

    U[:] = A

    np.fill_diagonal(L, 1)

    for col in range(n-1):
        for row in range(col + 1, n):
            L[row, col] = U[row, col] / U[col, col]
            U[row, col:] = U[row, col:] - L[row, col] * U[col, col:]
            U[row, col] = 0

    return (L, U)

The optimised code of the SciPy library is significantly faster. Here is a test case.

n = 200
A = np.random.randn(n, n)
L, U = lu(A)

residual = np.linalg.norm(A - L @ U) / np.linalg.norm(A)
print(f"The residual is: {residual}")

runtime_my_lu = %timeit -o lu(A)
runtime_scipy_lu = %timeit -o scipy_lu(A)

print(f"Runtime of our LU (s): {runtime_my_lu.best}")
print(f"Runtime of Scipy LU (s): {runtime_scipy_lu.best}")

print(f"The speed ratio between the two implementations is: {runtime_my_lu.best / runtime_scipy_lu.best}")

A typical output is below, with some variation to be expected between different computers.

The residual is: 2.285256427880218e-13
50.6 ms ± 474 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
2.26 ms ± 461 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Runtime of our LU (s): 0.05018079660003423
Runtime of Scipy LU (s): 0.0018813237100403057
The speed ratio between the two implementations is: 26.673132503581297