Measuring distances#

Much of computational mathematics concerns the question of how to approximate the infinite by the finite. The reason is obvious: In mathematics infinity is ubiquitous, while computers are limited to the finite.

For this reason, many problems in computational mathematics fall into one of the following categories:

Infinite-dimensional problems: Solving an ordinary differential equation involves finding a solution within an infinite-dimensional space, like the vector space \(C^1[a, b]\) of continuously differentiable functions. In other words, the problem generally has an infinite number of degrees of freedom. While mathematical theory may provide analytic solutions for special ODEs, for challenging real-world problems, it is often only possible to make general qualitative statements about the solution. The question arises whether a computer can provide quantitative information about the solution. Since a computer can only handle a finite number of degrees of freedom, any numerical approach inevitably introduces an approximation error, resulting in a discrepancy between the numerical and exact solutions.

Infinite sets: Even finite-dimensional vector spaces, like the real number line \(\mathbb{R}\), are infinite sets that require finite representation on computers. This necessitates the use of floating-point numbers (discussed later) and introduces approximation errors.

Iterative problems: The solution to many mathematical problems can be expressed as the limit of a sequence, where each sequence element can be computed in a finite number of steps. For example, consider a Riemann integral as the limit of increasingly refined Riemann sums or the root of a function as the limit of a Newton iteration. While computers can calculate sequence elements in finite time, it’s crucial to consider the error, i.e., the difference between a computed element and the true limit.
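As a small illustration of this last category (a minimal sketch, not needed for the rest of the discussion), a Newton iteration for \(\sqrt{2}\) shows the error between each computed iterate and the true limit:

import math

# Newton iteration for f(x) = x^2 - 2, whose positive root is sqrt(2).
# Each iterate is computed in finitely many steps; the quantity of
# interest is the error |x_n - sqrt(2)|, which shrinks rapidly.
x = 1.0
for n in range(6):
    x = x - (x * x - 2.0) / (2.0 * x)   # Newton update
    print(n, abs(x - math.sqrt(2.0)))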

When the objects of interest, such as errors, belong to a vector space, a norm is often the most appropriate tool for quantifying their magnitude.

Definition 1 (Norm axioms)

Let \(V\) be a real vector space, \(\mathbb{R}^{+} := [0, \infty)\), and \(\rho: V \rightarrow \mathbb{R}^{+}\) a mapping from \(V\) into the non-negative numbers. \(\rho\) is called a norm if it satisfies the following properties:

  1. \(\rho(\alpha x) = |\alpha| \rho(x)~\forall \alpha \in \mathbb{R},~x \in V\)

  2. \(\rho(x + y) \leq \rho(x) + \rho(y)~\forall x, y \in V\) (triangle inequality)

  3. \(\rho(x) = 0\) if and only if \(x = 0\).

These axioms already imply non-negativity: using the first axiom and the triangle inequality,

\[ 0 = \rho(0) = \rho(x - x) \leq \rho(x) + \rho(-x) = 2 \rho(x), \]

and therefore \(\rho(x) \geq 0\) for all \(x \in V\). Hence, non-negativity does not need to be imposed as a separate axiom.

The first and third norm axioms are usually straightforward to verify; the triangle inequality is typically the hardest to establish.
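The axioms can also be explored numerically. The following sketch (randomly drawn vectors, purely illustrative) checks absolute homogeneity and the triangle inequality for the \(2\)-norm:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)
alpha = -3.7

# Axiom 1 (absolute homogeneity), up to floating-point rounding
print(np.isclose(np.linalg.norm(alpha * x), abs(alpha) * np.linalg.norm(x)))

# Axiom 2 (triangle inequality)
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))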

Frequently used vector norms#

Let \(V\) be the real vector space \(\mathbb{R}^n\). Let \(x \in V\) and denote by \(x_j\) the \(j\)-th component of \(x\).

The most frequently used vector norms are:

  • The \(1\)-norm: \(\|x\|_1 = \sum_{j=1}^n |x_j|\).

  • The \(2\)-norm: \(\|x\|_2 = \left(\sum_{j=1}^n |x_j|^2\right)^{1/2}\).

  • The \(\infty\)-norm: \(\|x\|_{\infty} = \max_j |x_j|\).

The first two are special cases of the \(\ell_p\)-norm, defined for \(p \geq 1\) as

\[ \|x\|_p = \left(\sum_{j=1}^n |x_j|^p\right)^{1/p}. \]

The third is the limit as \(p \to \infty\): \(\|x\|_{\infty} = \lim_{p \to \infty} \|x\|_p\) for all \(x\).
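This limit can be observed numerically; the following small sketch (with an arbitrarily chosen vector) prints \(\|x\|_p\) for growing \(p\) alongside \(\|x\|_{\infty}\):

import numpy as np
from numpy.linalg import norm

x = np.array([1.0, -4.0, 2.5])
for p in [1, 2, 4, 8, 16, 64]:
    print(p, norm(x, ord=p))        # approaches 4.0 as p grows
print("inf", norm(x, ord=np.inf))   # max_j |x_j| = 4.0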

Norm equivalence#

Which norm should we use? This usually depends on the application. A New York taxi driver, for instance, prefers the \(1\)-norm when calculating distances: they have to follow the rectangular street grid and cannot travel diagonally between locations.

In Numerical Analysis, we often want to prove whether a given numerical approximation converges. We consider several computations indexed by \(n\) and have a corresponding error quantity \(e_n\). We want to show that the sequence of errors \(\left\{e_n\right\}\) converges to \(0\) as \(n \to \infty\). Can it happen that an error converges to \(0\) in one norm but not in another norm? In finite-dimensional vector spaces, the answer is no. If a sequence converges in one norm, it also converges in all others. This is due to the following norm equivalence theorem.

Theorem 1 (Norm equivalence in finite-dimensional vector spaces)

Let \(V\) be a finite-dimensional vector space. Consider two norms \(\|\cdot\|_a\) and \(\|\cdot\|_b\) on \(V\). Then, there exist positive constants \(C_1\), \(C_2\) such that for all \(x\in V\)

\[ C_1\|x\|_a\leq \|x\|_b\leq C_2\|x\|_a \]

Proof. The proof requires some analysis and is not part of the lecture. A good reference for those interested in how it works is the following note by Steve Johnson at MIT.

We note that norm equivalence is a property only of finite-dimensional vector spaces. For infinite-dimensional spaces (e.g., spaces of functions), it is easy to construct sequences that converge with respect to one norm but not another. For example, on the space of continuous functions on \([0, 1]\), the sequence \(f_n(x) = x^n\) converges to zero with respect to \(\|f\|_1 = \int_0^1 |f(x)|\,\mathrm{d}x\) (since \(\|f_n\|_1 = 1/(n+1)\)), but not with respect to \(\|f\|_{\infty} = \max_{x \in [0,1]} |f(x)|\) (since \(\|f_n\|_{\infty} = 1\) for all \(n\)).
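In the finite-dimensional case, norm equivalence is easy to observe numerically. The following sketch (randomly drawn vectors, purely illustrative) confirms that the ratio \(\|x\|_1 / \|x\|_{\infty}\) always lies between the equivalence constants \(1\) and \(n\):

import numpy as np
from numpy.linalg import norm

n = 10
rng = np.random.default_rng(0)
ratios = [norm(x, 1) / norm(x, np.inf)
          for x in rng.standard_normal((1000, n))]

# Every ratio lies in [1, n], consistent with ||x||_inf <= ||x||_1 <= n * ||x||_inf.
print(min(ratios), max(ratios))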

Matrix norms#

Let \(A \in \mathbb{R}^{m \times n}\). Consider vector norms \(\|x\|_a\) for \(x \in \mathbb{R}^n\) and \(\|y\|_b\) for \(y \in \mathbb{R}^m\). Then, an induced matrix norm can be defined as follows:

\[ \|A\|_{b, a} := \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_b}{\|x\|_a} \]

If the same norm is used for both the domain and co-domain, we abbreviate \(\|A\|_{a, a}\) as \(\|A\|_a\).

The following three matrix norms derive from the corresponding vector norms (where \(\lambda_{\max}\) denotes the largest eigenvalue):

  • \(\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^m |a_{ij}| = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_1}{\|x\|_1}\),

  • \(\|A\|_2 = \sqrt{\lambda_{\max}(A^\top A)} = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_2}{\|x\|_2}\),

  • \(\|A\|_{\infty} = \max_{1 \leq i \leq m} \sum_{j=1}^n |a_{ij}| = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_\infty}{\|x\|_\infty}\).
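To connect these formulas with the defining maximization, the following sketch (using a randomly drawn matrix; random sampling only yields a lower bound for the maximum) compares a sampled estimate of \(\max_{x \neq 0} \|Ax\|_1 / \|x\|_1\) with the maximum absolute column sum:

import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Random sampling only explores a subset of directions, so this
# estimate is a lower bound for the true maximum.
estimate = max(norm(A @ x, 1) / norm(x, 1)
               for x in rng.standard_normal((10000, 3)))

exact = np.abs(A).sum(axis=0).max()   # maximum absolute column sum

print(estimate, exact, norm(A, 1))    # estimate <= exact == norm(A, 1)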

Observe that \(\|A\|_1 = \|A^\top\|_{\infty}\). Calculating the \(\|A\|_2\) matrix norm is computationally expensive, as it requires determining the largest eigenvalue of \(A^\top A\). Therefore, the Frobenius norm is often used as an alternative:

\[ \|A\|_F := \left(\sum_{ij} |a_{ij}|^2 \right)^{1/2}. \]

One can show that

\[ \|A\|_2 \leq \|A\|_F \leq \sqrt{n} \|A\|_2, \qquad A \in \mathbb{R}^{n \times n}. \]
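Both the eigenvalue characterization of \(\|A\|_2\) and this inequality can be checked numerically; here is a small sketch for a randomly drawn square matrix:

import numpy as np
from numpy.linalg import norm, eigvalsh

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

# ||A||_2 is the square root of the largest eigenvalue of A^T A
print(np.isclose(np.sqrt(eigvalsh(A.T @ A).max()), norm(A, 2)))

# ||A||_2 <= ||A||_F <= sqrt(n) * ||A||_2
print(norm(A, 2) <= norm(A, 'fro') <= np.sqrt(n) * norm(A, 2))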

Submultiplicativity#

Let \(A \in \mathbb{R}^{m \times n}\) and \(B \in \mathbb{R}^{n \times k}\), and consider the matrix norms induced by a corresponding family of vector norms, all denoted by \(\|\cdot\|\). Directly from the definition of an induced norm, \(\| A w \| \leq \| A \| \, \| w \|\) for every \(w \in \mathbb{R}^n\). Hence, for any \(v \in \mathbb{R}^k\),

\[ \|ABv\| \leq \|A\|\cdot\|Bv\|\leq \|A\|\cdot \|B\| \cdot \|v\| \]

and therefore

\[ \|A \cdot B\|\leq \|A\|\cdot \|B\|. \]

This property is called submultiplicativity. We have just proved it for any vector-induced matrix norm.
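A quick numerical check of submultiplicativity for the induced \(1\)-, \(2\)-, and \(\infty\)-norms (randomly drawn matrices, purely illustrative):

import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

for p in [1, 2, np.inf]:
    # Each induced norm of the product is bounded by the product of norms
    print(p, norm(A @ B, p) <= norm(A, p) * norm(B, p))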

Relative and absolute errors#

Let \(V\) be a real vector space such as \(\mathbb{R}^n\) or \(\mathbb{R}^{m \times n}\). Let \(x \in V\) be a quantity of interest and \(\tilde{x} \in V\) be its approximation. Given a norm \(\| \cdot \|\) of \(V\), we call \(\| x - \tilde{x} \|\) the absolute error of the approximation (with respect to \(\| \cdot \|\)). If \(x \neq 0\) then \(\| x - \tilde{x} \| / \| x \|\) is the relative error.
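As a small illustration (with arbitrarily chosen numbers), the following sketch computes both error measures in the \(2\)-norm:

import numpy as np
from numpy.linalg import norm

x = np.array([1.0, 2.0, 3.0])          # quantity of interest
x_tilde = np.array([1.01, 1.98, 3.0])  # its approximation

abs_err = norm(x - x_tilde)            # absolute error in the 2-norm
rel_err = abs_err / norm(x)            # relative error (requires x != 0)
print(abs_err, rel_err)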

Python skills#

Defining vectors and matrices with NumPy#

To work with vectors and matrices in Python, we can use the numpy library. Here is how you can define them:

import numpy as np

# Define a vector
x = np.array([1, 2, 3])

# Define a matrix
A = np.array([[2, 3], 
              [0, 1]])

Computing vector norms#

NumPy provides built-in functions to compute various norms of vectors:

from numpy.linalg import norm

# Compute 1-norm, 2-norm, and infinity-norm of a vector
norm_1 = norm(x, ord=1)       # 1-norm
norm_2 = norm(x)              # 2-norm (default)
norm_inf = norm(x, ord=np.inf)  # Infinity-norm

print(f"1-norm: {norm_1}, 2-norm: {norm_2}, Infinity-norm: {norm_inf}")

Computing matrix norms#

Similarly, matrix norms can also be computed:

# Compute Frobenius norm of a matrix
norm_F = norm(A, ord='fro')   # Frobenius norm

# Compute 1-norm, 2-norm, and infinity-norm of a matrix
norm_1 = norm(A, ord=1)       # 1-norm
norm_2 = norm(A, ord=2)       # 2-norm
norm_inf = norm(A, ord=np.inf)  # Infinity-norm

print(f"Frobenius norm: {norm_F}, 1-norm: {norm_1}, 2-norm: {norm_2}, Infinity-norm: {norm_inf}")

Self-check questions#

Question

Let \(A\in\mathbb{R}^{m\times n}\). Prove that

  • \(\|A\|_1 = \max_j \sum_i|a_{ij}|\).

  • \(\|A\|_{\infty} = \max_i \sum_j|a_{ij}|\).

Answer

From the definition of a vector-induced matrix norm, we have that

\[\begin{split} \begin{aligned} \|A\|_1 &= \max_{\|x\|_1 = 1} \sum_i |\left[Ax\right]_i|\\ &\leq \max_{\|x\|_1 = 1} \sum_{i} \sum_j |a_{ij}| |x_j|\\ &= \max_{\|x\|_1 = 1} \sum_j |x_j|\sum_i |a_{ij}|\\ &\leq \Big( \max_j\sum_i |a_{ij}| \Big) \, \Big( \max_{\|x\|_1 = 1} \sum_j|x_j| \Big)\\ &=\max_j\sum_i |a_{ij}|. \end{aligned} \end{split}\]

It remains to show that this upper bound is attainable. Let \(\ell = \text{argmax}_j \sum_i |a_{ij}|\) be an index \(\ell\) associated with the largest column sum. We then set \(y_j = 0\) for \(j\neq \ell\) and \(y_\ell = 1\). It follows that \(\| y \|_1 = 1\) and

\[ \frac{\| A y \|_1}{\| y \|_1} = \max_j\sum_i |a_{ij}|. \]

For \(\|A\|_{\infty}\) we obtain

\[ \|A\|_{\infty} = \max_{\|x\|_{\infty} = 1} \max_i |\left[Ax\right]_i| \leq \max_{\|x\|_{\infty} = 1} \max_i \sum_{j}|a_{ij}|\,|x_j| \leq \max_i \sum_{j}|a_{ij}|. \]

Let \(k\) be the row index for which the upper bound is attained. By choosing \(x_j = \text{sign}~a_{kj}\) we have \(\|x\|_{\infty} = 1\) (assuming \(A \neq 0\); otherwise the claim is trivial) and \(\|Ax\|_{\infty} = \max_i \sum_{j}|a_{ij}|\), which confirms that the upper bound can be attained.

Question

For the matrix \(A = \begin{pmatrix} 2 & 3 \\ 0 & 1\end{pmatrix}\), compute \(\|A\|_p\) for \(p=1, 2, \infty, F\).

Answer

We have \(\|A\|_1 = 4\), \(\|A\|_{\infty} = 5\), and \(\|A\|_F = \sqrt{14}\). For \(\|A\|_2\), we compute the eigenvalues of

\[\begin{split} A^\top A = \begin{pmatrix}4 & 6\\ 6 & 10\end{pmatrix}, \end{split}\]

giving us \(\lambda_{1, 2} = 7 \pm 3\sqrt{5}\). Hence, \(\|A\|_2 = \sqrt{7 + 3\sqrt{5}}\).
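These values can be verified quickly with NumPy (a small check, not required for the derivation):

import numpy as np
from numpy.linalg import norm

A = np.array([[2.0, 3.0],
              [0.0, 1.0]])

print(norm(A, 1), norm(A, np.inf), norm(A, 'fro'))  # 4.0, 5.0, sqrt(14)
print(norm(A, 2), np.sqrt(7 + 3 * np.sqrt(5)))      # both approx 3.702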

Question

Show that \(\|x\|_{\infty} = \lim_{p\rightarrow\infty} \|x\|_p\) for \(x\in\mathbb{R}^n\).

Answer

For \(x = 0\), the result holds because \(\| x \|_\infty = \|x\|_p = 0\) for all \(p \in [1,\infty)\). Thus, assume for the remainder that \(x \in \mathbb{R}^n \setminus \{ 0 \}\). Let \(k\) be an index of the largest element by magnitude in \(x\). We have that

\[ \lim_{p\rightarrow\infty} \|x\|_p = \lim_{p\rightarrow\infty} \Bigl[ \sum_{j}|x_j|^p\Bigr]^{1/p} = |x_k|\lim_{p\rightarrow\infty} \Bigl[\underbrace{1 + \sum_{j\neq k}|x_j/x_k|^p}_{=: d_p}\Bigr]^{1/p} = |x_k|. \]

For the last equality, we used that the term \(d_p\) is bounded from below by \(1\) and from above by \(n\) (accounting for the possibility that \(x\) can have multiple elements of the largest magnitude), so that \(\lim_{p \to \infty} \sqrt[p]{d_p} = 1\).

Question

Show that the Frobenius norm is submultiplicative: For matrices \(A\) and \(B\) with compatible dimensions such that the product \(AB\) exists, it holds that

\[ \|AB\|_F\leq \|A\|_F\cdot \|B\|_F. \]
Answer

First of all, since \(\|A\|_2\leq \|A\|_F\), we have that

\[ \|Ax\|_2 \leq \|A\|_2\|x\|_2 \leq \|A\|_F\|x\|_2. \]

We denote by \(B_{:, j}\) the \(j\)-th column of the matrix \(B\). We can now write

\[ \|AB\|_F^2 = \sum_{j}\|AB_{:, j}\|_2^2\leq \|A\|_F^2\sum_{j}\|B_{:, j}\|_2^2 = \|A\|_F^2\cdot \|B\|_F^2, \]

from which the result follows.