Measuring distances#

Much of computational mathematics concerns the question of how to approximate the infinite by the finite. The reason is obvious: In mathematics infinity is ubiquitous, while computers are limited to the finite.

For this reason, many problems in computational mathematics fall into one of the following categories:

Infinite-dimensional problems: Solving an ordinary differential equation involves finding a solution within an infinite-dimensional space, like the vector space \(C^1[a, b]\) of continuously differentiable functions. In other words, the problem generally has an infinite number of degrees of freedom. While mathematical theory may provide analytic solutions for special ODEs, for challenging real-world problems, it is often only possible to make general qualitative statements about the solution. The question arises whether a computer can provide quantitative information about the solution. Since a computer can only handle a finite number of degrees of freedom, any numerical approach inevitably introduces an approximation error, resulting in a discrepancy between the numerical and exact solutions.

Infinite sets: Even finite-dimensional vector spaces, like the real number line \(\mathbb{R}\), are infinite sets that require finite representation on computers. This necessitates the use of floating-point numbers (discussed later) and introduces approximation errors.

Iterative problems: The solution to many mathematical problems can be expressed as the limit of a sequence, where each sequence element can be computed in a finite number of steps. For example, consider a Riemann integral as the limit of increasingly refined Riemann sums or the root of a function as the limit of a Newton iteration. While computers can calculate sequence elements in finite time, it’s crucial to consider the error, i.e., the difference between a computed element and the true limit.
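As a small illustration of this last category (a minimal sketch, not needed for the rest of the discussion), a Newton iteration for \(\sqrt{2}\) shows the error between each computed iterate and the true limit:

import math

# Newton iteration for f(x) = x^2 - 2, whose positive root is sqrt(2).
# Each iterate is computed in finitely many steps; the quantity of
# interest is the error |x_n - sqrt(2)|, which shrinks rapidly.
x = 1.0
for n in range(6):
    x = x - (x * x - 2.0) / (2.0 * x)   # Newton update
    print(n, abs(x - math.sqrt(2.0)))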

When the objects of interest, such as errors, belong to a vector space, a norm is often the most appropriate tool for quantifying their magnitude.

Definition 1 (Norm axioms)

Let \(V\) be a real vector space, \(\mathbb{R}^{+} := [0, \infty)\), and \(\rho: V \rightarrow \mathbb{R}^{+}\) a mapping from \(V\) into the non-negative numbers. \(\rho\) is called a norm if it satisfies the following properties:

  1. \(\rho(\alpha x) = |\alpha| \rho(x)~\forall \alpha \in \mathbb{R},~x \in V\)

  2. \(\rho(x + y) \leq \rho(x) + \rho(y)~\forall x, y \in V\) (triangle inequality)

  3. \(\rho(x) = 0\) if and only if \(x = 0\).

These axioms already imply non-negativity: using the first axiom and the triangle inequality,

\[ 0 = \rho(0) = \rho(x - x) \leq \rho(x) + \rho(-x) = 2 \rho(x), \]

and therefore \(\rho(x) \geq 0\) for all \(x \in V\). Hence, non-negativity does not need to be imposed as a separate axiom.

The first and third norm axioms are usually straightforward to verify; the triangle inequality is typically the hardest to establish.
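The axioms can also be explored numerically. The following sketch (randomly drawn vectors, purely illustrative) checks absolute homogeneity and the triangle inequality for the \(2\)-norm:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)
alpha = -3.7

# Axiom 1 (absolute homogeneity), up to floating-point rounding
print(np.isclose(np.linalg.norm(alpha * x), abs(alpha) * np.linalg.norm(x)))

# Axiom 2 (triangle inequality)
print(np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y))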

Frequently used vector norms#

Let \(V\) be the real vector space \(\mathbb{R}^n\). Let \(x \in V\) and denote by \(x_j\) the \(j\)-th component of \(x\).

The most frequently used vector norms are:

  • The \(1\)-norm: \(\|x\|_1 = \sum_{j=1}^n |x_j|\).

  • The \(2\)-norm: \(\|x\|_2 = \left(\sum_{j=1}^n |x_j|^2\right)^{1/2}\).

  • The \(\infty\)-norm: \(\|x\|_{\infty} = \max_j |x_j|\).

The first two are special cases of the \(\ell_p\)-norm, defined for \(p \geq 1\) as

\[ \|x\|_p = \left(\sum_{j=1}^n |x_j|^p\right)^{1/p}. \]

The third is the limit as \(p \to \infty\): \(\|x\|_{\infty} = \lim_{p \to \infty} \|x\|_p\) for all \(x\).
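This limit can be observed numerically; the following small sketch (with an arbitrarily chosen vector) prints \(\|x\|_p\) for growing \(p\) alongside \(\|x\|_{\infty}\):

import numpy as np
from numpy.linalg import norm

x = np.array([1.0, -4.0, 2.5])
for p in [1, 2, 4, 8, 16, 64]:
    print(p, norm(x, ord=p))        # approaches 4.0 as p grows
print("inf", norm(x, ord=np.inf))   # max_j |x_j| = 4.0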

Norm equivalence#

Which norm should we use? This usually depends on the application. A New York taxi driver, for instance, prefers the \(1\)-norm when calculating distances: they have to follow the rectangular street grid and cannot travel diagonally between locations.

In Numerical Analysis, we often want to prove whether a given numerical approximation converges. We consider several computations indexed by \(n\) and have a corresponding error quantity \(e_n\). We want to show that the sequence of errors \(\left\{e_n\right\}\) converges to \(0\) as \(n \to \infty\). Can it happen that an error converges to \(0\) in one norm but not in another norm? In finite-dimensional vector spaces, the answer is no. If a sequence converges in one norm, it also converges in all others. This is due to the following norm equivalence theorem.

Theorem 1 (Norm equivalence in finite-dimensional vector spaces)

Let \(V\) be a finite-dimensional vector space. Consider two norms \(\|\cdot\|_a\) and \(\|\cdot\|_b\) on \(V\). Then, there exist positive constants \(C_1\), \(C_2\) such that for all \(x\in V\)

\[ C_1\|x\|_a\leq \|x\|_b\leq C_2\|x\|_a \]

Proof. The proof requires some analysis and is not part of the lecture. A good reference for those interested in how it works is the following note by Steve Johnson at MIT.

We note that norm equivalence is a property only of finite-dimensional vector spaces. For infinite-dimensional spaces (e.g., spaces of functions), it is easy to construct sequences that converge with respect to one norm but not another. For example, on the space of continuous functions on \([0, 1]\), the sequence \(f_n(x) = x^n\) converges to zero with respect to \(\|f\|_1 = \int_0^1 |f(x)|\,\mathrm{d}x\) (since \(\|f_n\|_1 = 1/(n+1)\)), but not with respect to \(\|f\|_{\infty} = \max_{x \in [0,1]} |f(x)|\) (since \(\|f_n\|_{\infty} = 1\) for all \(n\)).
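In the finite-dimensional case, norm equivalence is easy to observe numerically. The following sketch (randomly drawn vectors, purely illustrative) confirms that the ratio \(\|x\|_1 / \|x\|_{\infty}\) always lies between the equivalence constants \(1\) and \(n\):

import numpy as np
from numpy.linalg import norm

n = 10
rng = np.random.default_rng(0)
ratios = [norm(x, 1) / norm(x, np.inf)
          for x in rng.standard_normal((1000, n))]

# Every ratio lies in [1, n], consistent with ||x||_inf <= ||x||_1 <= n * ||x||_inf.
print(min(ratios), max(ratios))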

Matrix norms#

Let \(A \in \mathbb{R}^{m \times n}\). Consider vector norms \(\|x\|_a\) for \(x \in \mathbb{R}^n\) and \(\|y\|_b\) for \(y \in \mathbb{R}^m\). Then, an induced matrix norm can be defined as follows:

\[ \|A\|_{b, a} := \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_b}{\|x\|_a} \]

If the same norm is used for both the domain and co-domain, we abbreviate \(\|A\|_{a, a}\) as \(\|A\|_a\).

The following three matrix norms derive from the corresponding vector norms (where \(\lambda_{\max}\) denotes the largest eigenvalue):

  • \(\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^m |a_{ij}| = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_1}{\|x\|_1}\),

  • \(\|A\|_2 = \sqrt{\lambda_{\max}(A^\top A)} = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_2}{\|x\|_2}\),

  • \(\|A\|_{\infty} = \max_{1 \leq i \leq m} \sum_{j=1}^n |a_{ij}| = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_\infty}{\|x\|_\infty}\).
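To connect these formulas with the defining maximization, the following sketch (using a randomly drawn matrix; random sampling only yields a lower bound for the maximum) compares a sampled estimate of \(\max_{x \neq 0} \|Ax\|_1 / \|x\|_1\) with the maximum absolute column sum:

import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# Random sampling only explores a subset of directions, so this
# estimate is a lower bound for the true maximum.
estimate = max(norm(A @ x, 1) / norm(x, 1)
               for x in rng.standard_normal((10000, 3)))

exact = np.abs(A).sum(axis=0).max()   # maximum absolute column sum

print(estimate, exact, norm(A, 1))    # estimate <= exact == norm(A, 1)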

Observe that \(\|A\|_1 = \|A^\top\|_{\infty}\). Calculating the \(\|A\|_2\) matrix norm is computationally expensive, as it requires determining the largest eigenvalue of \(A^\top A\). Therefore, the Frobenius norm is often used as an alternative:

\[ \|A\|_F := \left(\sum_{ij} |a_{ij}|^2 \right)^{1/2}. \]

One can show that

\[ \|A\|_2 \leq \|A\|_F \leq \sqrt{n} \|A\|_2, \qquad A \in \mathbb{R}^{n \times n}. \]
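Both the eigenvalue characterization of \(\|A\|_2\) and this inequality can be checked numerically; here is a small sketch for a randomly drawn square matrix:

import numpy as np
from numpy.linalg import norm, eigvalsh

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

# ||A||_2 is the square root of the largest eigenvalue of A^T A
print(np.isclose(np.sqrt(eigvalsh(A.T @ A).max()), norm(A, 2)))

# ||A||_2 <= ||A||_F <= sqrt(n) * ||A||_2
print(norm(A, 2) <= norm(A, 'fro') <= np.sqrt(n) * norm(A, 2))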

Submultiplicativity#

Let \(A \in \mathbb{R}^{m \times n}\) and \(B \in \mathbb{R}^{n \times k}\), and consider the matrix norms induced by a corresponding family of vector norms, all denoted by \(\|\cdot\|\). Directly from the definition of an induced norm, \(\| A w \| \leq \| A \| \, \| w \|\) for every \(w \in \mathbb{R}^n\). Hence, for any \(v \in \mathbb{R}^k\),

\[ \|ABv\| \leq \|A\|\cdot\|Bv\|\leq \|A\|\cdot \|B\| \cdot \|v\| \]

and therefore

\[ \|A \cdot B\|\leq \|A\|\cdot \|B\|. \]

This property is called submultiplicativity. We have just proved it for any vector-induced matrix norm.
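A quick numerical check of submultiplicativity for the induced \(1\)-, \(2\)-, and \(\infty\)-norms (randomly drawn matrices, purely illustrative):

import numpy as np
from numpy.linalg import norm

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

for p in [1, 2, np.inf]:
    # Each induced norm of the product is bounded by the product of norms
    print(p, norm(A @ B, p) <= norm(A, p) * norm(B, p))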

Relative and absolute errors#

Let \(V\) be a real vector space such as \(\mathbb{R}^n\) or \(\mathbb{R}^{m \times n}\). Let \(x \in V\) be a quantity of interest and \(\tilde{x} \in V\) be its approximation. Given a norm \(\| \cdot \|\) of \(V\), we call \(\| x - \tilde{x} \|\) the absolute error of the approximation (with respect to \(\| \cdot \|\)). If \(x \neq 0\) then \(\| x - \tilde{x} \| / \| x \|\) is the relative error.
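As a small illustration (with arbitrarily chosen numbers), the following sketch computes both error measures in the \(2\)-norm:

import numpy as np
from numpy.linalg import norm

x = np.array([1.0, 2.0, 3.0])          # quantity of interest
x_tilde = np.array([1.01, 1.98, 3.0])  # its approximation

abs_err = norm(x - x_tilde)            # absolute error in the 2-norm
rel_err = abs_err / norm(x)            # relative error (requires x != 0)
print(abs_err, rel_err)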

Python skills#

Defining vectors and matrices with NumPy#

To work with vectors and matrices in Python, we can use the numpy library. Here is how you can define them:

import numpy as np

# Define a vector
x = np.array([1, 2, 3])

# Define a matrix
A = np.array([[2, 3], 
              [0, 1]])

Computing vector norms#

NumPy provides built-in functions to compute various norms of vectors:

from numpy.linalg import norm

# Compute 1-norm, 2-norm, and infinity-norm of a vector
norm_1 = norm(x, ord=1)       # 1-norm
norm_2 = norm(x)              # 2-norm (default)
norm_inf = norm(x, ord=np.inf)  # Infinity-norm

print(f"1-norm: {norm_1}, 2-norm: {norm_2}, Infinity-norm: {norm_inf}")

Computing matrix norms#

Similarly, matrix norms can also be computed:

# Compute Frobenius norm of a matrix
norm_F = norm(A, ord='fro')   # Frobenius norm

# Compute 1-norm, 2-norm, and infinity-norm of a matrix
norm_1 = norm(A, ord=1)       # 1-norm
norm_2 = norm(A, ord=2)       # 2-norm
norm_inf = norm(A, ord=np.inf)  # Infinity-norm

print(f"Frobenius norm: {norm_F}, 1-norm: {norm_1}, 2-norm: {norm_2}, Infinity-norm: {norm_inf}")

Self-check questions#

Question

Let \(A\in\mathbb{R}^{m\times n}\). Prove that

  • \(\|A\|_1 = \max_j \sum_i|a_{ij}|\).

  • \(\|A\|_{\infty} = \max_i \sum_j|a_{ij}|\).

Answer

From the definition of a vector-induced matrix norm, we have that

\[\begin{split} \begin{aligned} \|A\|_1 &= \max_{\|x\|_1 = 1} \sum_i |\left[Ax\right]_i|\\ &\leq \max_{\|x\|_1 = 1} \sum_{i} \sum_j |a_{ij}| |x_j|\\ &= \max_{\|x\|_1 = 1} \sum_j |x_j|\sum_i |a_{ij}|\\ &\leq \Big( \max_j\sum_i |a_{ij}| \Big) \, \Big( \max_{\|x\|_1 = 1} \sum_j|x_j| \Big)\\ &=\max_j\sum_i |a_{ij}|. \end{aligned} \end{split}\]

It remains to show that this upper bound is attainable. Let \(\ell = \text{argmax}_j \sum_i |a_{ij}|\) be an index \(\ell\) associated with the largest column sum. We then set \(y_j = 0\) for \(j\neq \ell\) and \(y_\ell = 1\). It follows that \(\| y \|_1 = 1\) and

\[ \frac{\| A y \|_1}{\| y \|_1} = \max_j\sum_i |a_{ij}|. \]

For \(\|A\|_{\infty}\) we obtain

\[ \|A\|_{\infty} = \max_{\|x\|_{\infty} = 1} \max_i |\left[Ax\right]_i| \leq \max_{\|x\|_{\infty} = 1} \max_i \sum_{j}|a_{ij}|\,|x_j| \leq \max_i \sum_{j}|a_{ij}|. \]

Let \(k\) be the row index for which the upper bound is attained. By choosing \(x_j = \text{sign}~a_{kj}\) we have \(\|x\|_{\infty} = 1\) (assuming \(A \neq 0\); otherwise the claim is trivial) and \(\|Ax\|_{\infty} = \max_i \sum_{j}|a_{ij}|\), which confirms that the upper bound can be attained.

Question

For the matrix \(A = \begin{pmatrix} 2 & 3 \\ 0 & 1\end{pmatrix}\), compute \(\|A\|_p\) for \(p=1, 2, \infty, F\).

Answer

We have \(\|A\|_1 = 4\), \(\|A\|_{\infty} = 5\), and \(\|A\|_F = \sqrt{14}\). For \(\|A\|_2\), we compute the eigenvalues of

\[\begin{split} A^\top A = \begin{pmatrix}4 & 6\\ 6 & 10\end{pmatrix}, \end{split}\]

giving us \(\lambda_{1, 2} = 7 \pm 3\sqrt{5}\). Hence, \(\|A\|_2 = \sqrt{7 + 3\sqrt{5}}\).
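These values can be verified quickly with NumPy (a small check, not required for the derivation):

import numpy as np
from numpy.linalg import norm

A = np.array([[2.0, 3.0],
              [0.0, 1.0]])

print(norm(A, 1), norm(A, np.inf), norm(A, 'fro'))  # 4.0, 5.0, sqrt(14)
print(norm(A, 2), np.sqrt(7 + 3 * np.sqrt(5)))      # both approx 3.702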

Question

Show that \(\|x\|_{\infty} = \lim_{p\rightarrow\infty} \|x\|_p\) for \(x\in\mathbb{R}^n\).

Answer

For \(x = 0\), the result holds because \(\| x \|_\infty = \|x\|_p = 0\) for all \(p \in [1,\infty)\). Thus, assume for the remainder that \(x \in \mathbb{R}^n \setminus \{ 0 \}\). Let \(k\) be an index of the largest element by magnitude in \(x\). We have that

\[ \lim_{p\rightarrow\infty} \|x\|_p = \lim_{p\rightarrow\infty} \Bigl[ \sum_{j}|x_j|^p\Bigr]^{1/p} = |x_k|\lim_{p\rightarrow\infty} \Bigl[\underbrace{1 + \sum_{j\neq k}|x_j/x_k|^p}_{=: d_p}\Bigr]^{1/p} = |x_k|. \]

For the last equality, we used that the term \(d_p\) is bounded from below by \(1\) and from above by \(n\) (accounting for the possibility that \(x\) can have multiple elements of the largest magnitude), so that \(\lim_{p \to \infty} \sqrt[p]{d_p} = 1\).

Question

Show that the Frobenius norm is submultiplicative: For matrices \(A\) and \(B\) with compatible dimensions such that the product \(AB\) exists, it holds that

\[ \|AB\|_F\leq \|A\|_F\cdot \|B\|_F. \]
Answer

First of all, since \(\|A\|_2\leq \|A\|_F\), we have that

\[ \|Ax\|_2 \leq \|A\|_2\|x\|_2 \leq \|A\|_F\|x\|_2. \]

We denote by \(B_{:, j}\) the \(j\)-th column of the matrix \(B\). We can now write

\[ \|AB\|_F^2 = \sum_{j}\|AB_{:, j}\|_2^2\leq \|A\|_F^2\sum_{j}\|B_{:, j}\|_2^2 = \|A\|_F^2\cdot \|B\|_F^2, \]

from which the result follows.