Measuring distances#
Much of computational mathematics concerns the question of how to approximate the infinite by the finite. The reason is obvious: In mathematics infinity is ubiquitous, while computers are limited to the finite.
For this reason, many problems in computational mathematics fall into one of the following categories:
Infinite-dimensional problems: Solving an ordinary differential equation involves finding a solution within an infinite-dimensional space, like the vector space \(C^1[a, b]\) of continuously differentiable functions. In other words, the problem generally has an infinite number of degrees of freedom. While mathematical theory may provide analytic solutions for special ODEs, for challenging real-world problems, it is often only possible to make general qualitative statements about the solution. The question arises whether a computer can provide quantitative information about the solution. Since a computer can only handle a finite number of degrees of freedom, any numerical approach inevitably introduces an approximation error, resulting in a discrepancy between the numerical and exact solutions.
Infinite sets: Even finite-dimensional vector spaces, like the real number line \(\mathbb{R}\), are infinite sets that require finite representation on computers. This necessitates the use of floating-point numbers (discussed later) and introduces approximation errors.
Iterative problems: The solution to many mathematical problems can be expressed as the limit of a sequence, where each sequence element can be computed in a finite number of steps. For example, consider a Riemann integral as the limit of increasingly refined Riemann sums or the root of a function as the limit of a Newton iteration. While computers can calculate sequence elements in finite time, it’s crucial to consider the error, i.e., the difference between a computed element and the true limit.
When the objects of interest, such as errors, belong to a vector space, a norm is often the most appropriate tool for quantifying their magnitude.
Definition 1 (Norm axioms)
Let \(V\) be a real vector space, \(\mathbb{R}^{+} := [0, \infty)\), and \(\rho: V \rightarrow \mathbb{R}^{+}\) a mapping from \(V\) into the non-negative numbers. \(\rho\) is called a norm if it satisfies the following properties:
\(\rho(\alpha x) = |\alpha| \rho(x)~\forall \alpha \in \mathbb{R};~x \in V\)
\(\rho(x + y) \leq \rho(x) + \rho(y)~\forall x, y \in V\) (triangle inequality)
\(\rho(x) = 0\) if and only if \(x = 0\).
These axioms imply that
\[
0 = \rho(0) = \rho(x + (-x)) \leq \rho(x) + \rho(-x) = 2\rho(x),
\]
and therefore \(\rho(x) \geq 0\) for all \(x \in V\). Hence, a norm is always non-negative.
The first and third norm axioms are usually straightforward to verify. The triangle inequality is typically the most challenging axiom to establish.
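The axioms are also easy to check numerically for concrete inputs. The sketch below uses NumPy's norm function to illustrate homogeneity, the triangle inequality, and definiteness for the \(2\)-norm on a few randomly chosen vectors; it is only an illustration with arbitrary test data, not a proof.
import numpy as np
from numpy.linalg import norm
rng = np.random.default_rng(0)  # arbitrary seed for reproducible test vectors
x = rng.standard_normal(5)
y = rng.standard_normal(5)
alpha = -3.7
print(np.isclose(norm(alpha * x), abs(alpha) * norm(x)))   # homogeneity
print(norm(x + y) <= norm(x) + norm(y))                    # triangle inequality
print(norm(np.zeros(5)) == 0.0)                            # rho(0) = 0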
Frequently used vector norms#
Let \(V\) be the real vector space \(\mathbb{R}^n\). Let \(x \in V\) and denote by \(x_j\) the \(j\)-th component of \(x\).
The most frequently used vector norms are:
The \(1\)-norm: \(\|x\|_1 = \sum_{j=1}^n |x_j|\).
The \(2\)-norm: \(\|x\|_2 = \left(\sum_{j=1}^n |x_j|^2\right)^{1/2}\).
The \(\infty\)-norm: \(\|x\|_{\infty} = \max_j |x_j|\).
The first two are special cases of the \(\ell_p\)-norm, defined for \(p \geq 1\) as
\[
\|x\|_p = \left(\sum_{j=1}^n |x_j|^p\right)^{1/p}.
\]
The third is the limit as \(p \to \infty\): \(\|x\|_{\infty} = \lim_{p \to \infty} \|x\|_p\) for all \(x \in \mathbb{R}^n\).
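The following short NumPy sketch illustrates this limit: for a fixed, illustrative vector, the \(p\)-norms decrease towards the \(\infty\)-norm as \(p\) grows.
import numpy as np
from numpy.linalg import norm
x = np.array([3.0, -1.0, 2.0, 3.0])  # illustrative vector with two entries of maximal magnitude
for p in [1, 2, 4, 8, 16, 64]:
    print(p, norm(x, ord=p))         # decreases towards 3
print("inf", norm(x, ord=np.inf))    # the infinity-norm equals 3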
Norm equivalence#
Which norm should we use? This usually depends on the application. A New York taxi driver will prefer the \(1\)-norm when calculating distances, since they must follow the rectangular street grid and cannot cut diagonally between locations.
In Numerical Analysis, we often want to prove whether a given numerical approximation converges. We consider several computations indexed by \(n\) and have a corresponding error quantity \(e_n\). We want to show that the sequence of errors \(\left\{e_n\right\}\) converges to \(0\) as \(n \to \infty\). Can it happen that an error converges to \(0\) in one norm but not in another norm? In finite-dimensional vector spaces, the answer is no. If a sequence converges in one norm, it also converges in all others. This is due to the following norm equivalence theorem.
Theorem 1 (Norm equivalence in finite-dimensional vector spaces)
Let \(V\) be a finite-dimensional vector space. Consider two norms \(\|\cdot\|_a\) and \(\|\cdot\|_b\) on \(V\). Then, there exist positive constants \(C_1\), \(C_2\) such that for all \(x\in V\)
\[
C_1 \|x\|_b \leq \|x\|_a \leq C_2 \|x\|_b.
\]
Proof. The proof requires some analysis and is not part of the lecture. A good reference for those interested in how it works is the following note by Steve Johnson at MIT.
We note that norm equivalence is only a property of finite-dimensional vector spaces. For infinite-dimensional spaces (e.g., spaces of functions), it is easy to construct examples of sequences converging with respect to one norm but not another.
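As a concrete instance of the theorem, the \(1\)-norm and \(2\)-norm on \(\mathbb{R}^n\) satisfy \(\|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\,\|x\|_2\). The sketch below checks numerically, for randomly drawn vectors, that the ratio of the two norms indeed stays within these constants.
import numpy as np
from numpy.linalg import norm
rng = np.random.default_rng(0)  # arbitrary seed
n = 10
ratios = [norm(x, 1) / norm(x, 2) for x in rng.standard_normal((1000, n))]
print(min(ratios), max(ratios), np.sqrt(n))  # all ratios lie in [1, sqrt(n)]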
Matrix norms#
Let \(A \in \mathbb{R}^{m \times n}\). Consider vector norms \(\|x\|_a\) for \(x \in \mathbb{R}^n\) and \(\|y\|_b\) for \(y \in \mathbb{R}^m\). Then, an induced matrix norm can be defined as follows:
\[
\|A\|_{a, b} := \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_b}{\|x\|_a}.
\]
If the same norm is used for both the domain and co-domain, we abbreviate \(\|A\|_{a, a}\) as \(\|A\|_a\).
The following three matrix norms derive from the corresponding vector norms (where \(\lambda_{\max}\) denotes the largest eigenvalue):
\(\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^m |a_{ij}| = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_1}{\|x\|_1}\),
\(\|A\|_2 = \sqrt{\lambda_{\max}(A^\top A)} = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_2}{\|x\|_2}\),
\(\|A\|_{\infty} = \max_{1 \leq i \leq m} \sum_{j=1}^n |a_{ij}| = \max_{x \in \mathbb{R}^n \setminus \{0\}} \frac{\|Ax\|_\infty}{\|x\|_\infty}\).
Observe that \(\|A\|_1 = \|A^\top\|_{\infty}\). Calculating the matrix norm \(\|A\|_2\) is computationally expensive, as it requires determining the largest eigenvalue of \(A^\top A\). Therefore, the Frobenius norm is often used as an alternative:
\[
\|A\|_F = \left(\sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2\right)^{1/2}.
\]
One can show that
\[
\|A\|_2 \leq \|A\|_F \leq \sqrt{\min\{m, n\}}\, \|A\|_2,
\]
so the Frobenius norm provides an easily computable upper bound for \(\|A\|_2\).
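These characterisations can be verified numerically. The following sketch, using an arbitrary random matrix, compares NumPy's matrix norms against the column-sum, row-sum, and eigenvalue formulas, and checks the bound \(\|A\|_2 \leq \|A\|_F\).
import numpy as np
from numpy.linalg import norm, eigvalsh
rng = np.random.default_rng(0)   # arbitrary seed
A = rng.standard_normal((4, 3))  # illustrative 4 x 3 matrix
print(np.isclose(norm(A, 1), np.abs(A).sum(axis=0).max()))        # maximum column sum
print(np.isclose(norm(A, np.inf), np.abs(A).sum(axis=1).max()))   # maximum row sum
print(np.isclose(norm(A, 2), np.sqrt(eigvalsh(A.T @ A).max())))   # largest eigenvalue of A^T A
print(norm(A, 2) <= norm(A, 'fro'))                               # 2-norm bounded by Frobenius norm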
Submultiplicativity#
Let \(A \in \mathbb{R}^{m \times n}\) and \(B \in \mathbb{R}^{n \times k}\) with a corresponding vector-induced matrix norm \(\|\cdot\|\). For any \(w \in \mathbb{R}^n\), it holds that \(\| A w \| \leq \| A \| \, \| w \|\). Thus, for any \(v \in \mathbb{R}^k\),
\[
\|ABv\| \leq \|A\| \, \|Bv\| \leq \|A\| \, \|B\| \, \|v\|,
\]
and therefore
\[
\|AB\| \leq \|A\| \, \|B\|.
\]
This property is called submultiplicativity. We have just proved it for any vector-induced matrix norm.
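A quick numerical check of submultiplicativity for two arbitrary random matrices, using the induced norms from above as well as the Frobenius norm (whose submultiplicativity is treated in the self-check questions below):
import numpy as np
from numpy.linalg import norm
rng = np.random.default_rng(0)   # arbitrary seed
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))
for ord_ in [1, 2, np.inf, 'fro']:
    print(ord_, norm(A @ B, ord_) <= norm(A, ord_) * norm(B, ord_))  # expect True for each norm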
Relative and absolute errors#
Let \(V\) be a real vector space such as \(\mathbb{R}^n\) or \(\mathbb{R}^{m \times n}\). Let \(x \in V\) be a quantity of interest and \(\tilde{x} \in V\) be its approximation. Given a norm \(\| \cdot \|\) of \(V\), we call \(\| x - \tilde{x} \|\) the absolute error of the approximation (with respect to \(\| \cdot \|\)). If \(x \neq 0\) then \(\| x - \tilde{x} \| / \| x \|\) is the relative error.
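For example, with the \(2\)-norm on \(\mathbb{R}^3\), the absolute and relative errors of an illustrative approximation can be computed as follows:
import numpy as np
from numpy.linalg import norm
x = np.array([1.0, 2.0, 3.0])           # quantity of interest
x_tilde = np.array([1.01, 1.98, 3.02])  # its approximation (illustrative values)
abs_err = norm(x - x_tilde)             # absolute error in the 2-norm
rel_err = abs_err / norm(x)             # relative error, valid since x != 0
print(f"absolute error: {abs_err:.3e}, relative error: {rel_err:.3e}")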
Python skills#
Defining vectors and matrices with NumPy#
To work with vectors and matrices in Python, we can use the numpy
library. Here is how you can define them:
import numpy as np
# Define a vector
x = np.array([1, 2, 3])
# Define a matrix
A = np.array([[2, 3],
[0, 1]])
Computing vector norms#
NumPy provides built-in functions to compute various norms of vectors:
from numpy.linalg import norm
# Compute 1-norm, 2-norm, and infinity-norm of a vector
norm_1 = norm(x, ord=1) # 1-norm
norm_2 = norm(x) # 2-norm (default)
norm_inf = norm(x, ord=np.inf) # Infinity-norm
print(f"1-norm: {norm_1}, 2-norm: {norm_2}, Infinity-norm: {norm_inf}")
Computing matrix norms#
Similarly, matrix norms can also be computed:
# Compute Frobenius norm of a matrix
norm_F = norm(A, ord='fro') # Frobenius norm
# Compute 1-norm, 2-norm, and infinity-norm of a matrix
norm_1 = norm(A, ord=1) # 1-norm
norm_2 = norm(A, ord=2) # 2-norm
norm_inf = norm(A, ord=np.inf) # Infinity-norm
print(f"Frobenius norm: {norm_F}, 1-norm: {norm_1}, 2-norm: {norm_2}, Infinity-norm: {norm_inf}")
Self-check questions#
Question
Let \(A\in\mathbb{R}^{m\times n}\). Prove that
\(\|A\|_1 = \max_j \sum_i|a_{ij}|\).
\(\|A\|_{\infty} = \max_i \sum_j|a_{ij}|\).
Answer
From the definition of a vector-induced matrix norm, we have that
\[
\|Ax\|_1 = \sum_{i=1}^m \left|\sum_{j=1}^n a_{ij} x_j\right| \leq \sum_{j=1}^n |x_j| \sum_{i=1}^m |a_{ij}| \leq \left(\max_{1\leq j\leq n} \sum_{i=1}^m |a_{ij}|\right) \|x\|_1.
\]
It remains to show that this upper bound is attainable. Let \(\ell = \text{argmax}_j \sum_i |a_{ij}|\) be an index \(\ell\) associated with the largest column sum. We then set \(y_j = 0\) for \(j\neq \ell\) and \(y_\ell = 1\). It follows that \(\| y \|_1 = 1\) and
\[
\|Ay\|_1 = \sum_{i=1}^m |a_{i\ell}| = \max_{1\leq j\leq n} \sum_{i=1}^m |a_{ij}|.
\]
For \(\|A\|_{\infty}\) we obtain
\[
\|Ax\|_{\infty} = \max_{1\leq i\leq m} \left|\sum_{j=1}^n a_{ij} x_j\right| \leq \left(\max_{1\leq i\leq m} \sum_{j=1}^n |a_{ij}|\right) \|x\|_{\infty}.
\]
Let \(k\) be the row index for which the upper bound is attained. By choosing \(x_j = \text{sign}~a_{kj}\) we have that \(\|Ax\|_{\infty} = \max_i \sum_{j}|a_{ij}|\), which confirms that the upper bound can be attained.
Question
For the matrix \(A = \begin{pmatrix} 2 & 3 \\ 0 & 1\end{pmatrix}\), compute \(\|A\|_p\) for \(p=1, 2, \infty, F\).
Answer
We have \(\|A\|_1 = 4\), \(\|A\|_{\infty} = 5\), and \(\|A\|_F = \sqrt{14}\). For \(\|A\|_2\), we compute the eigenvalues of
\[
A^\top A = \begin{pmatrix} 4 & 6 \\ 6 & 10 \end{pmatrix},
\]
giving us \(\lambda_{1, 2} = 7 \pm 3\sqrt{5}\). Hence, \(\|A\|_2 = \sqrt{7 + 3\sqrt{5}}\).
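These values can be confirmed numerically with NumPy:
import numpy as np
from numpy.linalg import norm
A = np.array([[2.0, 3.0],
              [0.0, 1.0]])
print(norm(A, 1), norm(A, np.inf), norm(A, 'fro'), norm(A, 2))
print(4, 5, np.sqrt(14), np.sqrt(7 + 3 * np.sqrt(5)))  # values derived above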
Question
Show that \(\|x\|_{\infty} = \lim_{p\rightarrow\infty} \|x\|_p\) for \(x\in\mathbb{R}^n\).
Answer
For \(x = 0\), the result holds because \(\| x \|_\infty = \|x\|_p = 0\) for all \(p \in [1,\infty)\). Thus, assume for the remainder that \(x \in \mathbb{R}^n \setminus \{ 0 \}\). Let \(k\) be an index of the largest element by magnitude in \(x\). We have that
\[
\|x\|_p = \left(\sum_{j=1}^n |x_j|^p\right)^{1/p} = |x_k| \left(\sum_{j=1}^n \left(\frac{|x_j|}{|x_k|}\right)^p\right)^{1/p} = \|x\|_\infty \sqrt[p]{d_p} \;\xrightarrow{\,p\to\infty\,}\; \|x\|_\infty, \qquad d_p := \sum_{j=1}^n \left(\frac{|x_j|}{|x_k|}\right)^p.
\]
For the final step, we used that the term \(d_p\) is bounded from below by \(1\) and from above by \(n\) (accounting for the possibility that \(x\) can have multiple elements of the largest magnitude), so that \(\lim_{p \to \infty} \sqrt[p]{d_p} = 1\).
Question
Show that the Frobenius norm is submultiplicative: For matrices \(A\) and \(B\) with compatible dimensions such that the product \(AB\) exists, it holds that
\[
\|AB\|_F \leq \|A\|_F \, \|B\|_F.
\]
Answer
First of all, since \(\|A\|_2\leq \|A\|_F\), we have that
\[
\|Ax\|_2 \leq \|A\|_2 \, \|x\|_2 \leq \|A\|_F \, \|x\|_2 \quad \text{for all } x.
\]
We denote by \(B_{:, j}\) the \(j\)-th column of the matrix \(B\). We can now write
\[
\|AB\|_F^2 = \sum_{j} \|A B_{:, j}\|_2^2 \leq \|A\|_F^2 \sum_j \|B_{:, j}\|_2^2 = \|A\|_F^2 \, \|B\|_F^2,
\]
from which the result follows.