During the first lockdown, with more free time on my hands, I began to think a little more philosophically about the foundations of mathematics. In particular: how can we know for sure that maths is correct? I discovered that at the start of the 20th century an enormous amount of time and effort was devoted to answering exactly this question. Of the solutions proposed, the work of Zermelo and Fraenkel alongside the axiom of choice, commonly abbreviated to ZFC, remains one of the most complete answers to date. In this post I look at why ZFC is necessary, what it is, and how it might be used to ground many of the important foundations of mathematics today.
The foundations of mathematics lie in a set of statements that we are happy to accept as true. Let’s start with a fundamental observation:
“the world is full of different things and that different things can be grouped together”
In mathematics, this observation is encoded in our idea of sets. In defining the notion of a mathematical set, let's try to make as few assumptions as possible:
These are axioms: a set of assumptions that we accept as true and from which everything else follows. In this case:
Different things can belong to more than one group
We can comprehend an apple both as satisfying the condition “comes from a plant” and “is a food” (as well as “being a thing”), therefore both groups must exist and it follows that each thing can belong to more than one group.
All things that are not A must be something else
We could quite legitimately think about the set of all things that are not in $A$. This is a perfectly valid statement, and it implies that everything that is not in $A$ must form its own set. We refer to this set $A’$ as the complement of $A$.
A set can contain no things
Let's think about defining a set $C$ which contains all of the elements of $A$ which are also in the set $B$. In the case when $B$ does not contain any things found in $A$, $C$ must contain no things. In this case $C$ is an empty set.
All things are things
This implies that there must be a group containing all things which is referred to as the universal set.
On the surface, naive set theory appears to provide a powerful framework for reasoning logically about sets, relying on two very intuitive axioms that most people would be happy to accept as true.
Unfortunately, several contradictions arise from the formulation above. One of the most famous was discovered by Bertrand Russell and arises if we consider “the set of all sets that are not members of themselves”. This is a perfectly legitimate definition satisfying both axioms (1) and (2). But here is the problem…
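The self-referential knot at the heart of Russell's set can be loosely mimicked in Python. This is an analogy only, with predicates standing in for sets: asking whether the “Russell predicate” applies to itself sends the interpreter into an infinite regress.

```python
# Loose analogy: model a "set" as a predicate (a membership test).
# The Russell predicate holds of x exactly when x does not hold of itself.
russell = lambda x: not x(x)

# Asking "is russell a member of itself?" unfolds forever:
# russell(russell) = not russell(russell) = not not russell(russell) = ...
try:
    russell(russell)
except RecursionError:
    print("no consistent answer: true exactly when false")
```

Just as in the set-theoretic version, there is no consistent truth value to assign, and the computation never settles.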
This results in a paradox: a statement that appears to be both true and false at the same time. Yet any statement written in terms of our base assumptions should be either true or false.
So why is this a problem? In logic, the principle of explosion says that from a contradiction any statement whatsoever can be proven true, which destroys the foundations of logical reasoning. Paradoxes are worrying precisely because they are contradictions: statements that can be seen as neither simply true nor simply false. In order to avoid explosion, a system needs to be consistent.
The existence of paradoxes like the one above calls into question the very meaning of true and false for the system of sets we have defined above, and everything else we derive from it. As much of the foundation of maths rests on the existence of sets, this is very worrying indeed! It sparked a crisis in mathematics at the turn of the 20th century and led to the Hilbert programme: a challenge to the mathematical world to shore up the foundations of modern mathematics.
Zermelo–Fraenkel set theory along with the axiom of choice (ZFC) is one of the most complete approaches proposed to meet this challenge. The central observation is that paradoxes like Russell's arise from too general a definition of sets. In shoring up the foundations of modern mathematics, we must forgo the notion of the universal set, instead insisting that only sets satisfying particular properties are allowed to exist.
For this to work, we now need to define the set of rules that dictate which sets are allowed, ideally using as few and as unrestrictive rules as possible, so as to maintain a theory rich enough to describe everything in modern mathematics without contradiction or paradox.
In ZFC the following axioms are proposed to this end:
Note, the axiom of replacement is similar to another axiom often included in ZF set theory referred to as the Axiom Schema of Comprehension. The Wikipedia article gives a good discussion of the differences between the two. It turns out that the axiom schema of comprehension can be derived from the axiom schema of replacement alongside the existence of the empty set, and it is often omitted from modern statements of ZF set theory as a result.
The axiom of choice was somewhat controversial and took a while to be accepted by the mathematics community. It is perhaps better understood through an analogy coined by Bertrand Russell:
“for any (even infinite) collection of pairs of shoes, one can pick out the left shoe from each pair to obtain an appropriate selection; this makes it possible to directly define a choice function. For an infinite collection of pairs of socks (assumed to have no distinguishing features), there is no obvious way to make a function that selects one sock from each pair. At least, not without the axiom of choice, which always allows you to say ‘choose one sock from each pair’”
When combined with ZF, the resulting theory is referred to as ZFC and remains the most widely accepted, complete, and well-studied foundation to date.
So far these axioms are pretty abstract. To see how and why they might be useful, in this section several important mathematical objects will be constructed using only the axioms proposed above. We will start with the definition of the ordered pair. This will allow us to define the natural numbers, the integers, the rationals and finally the reals. These sets and their properties represent key foundational elements which underpin much of modern mathematics. Throughout, the approach to construction is similar: we begin with a set of simpler entities and use these to generate the next level of complexity.
Perhaps one of the most basic ideas we rely on in mathematics is the idea of ordering; that one thing comes before another. Using the axioms above we are able to formally define this notion through the definition of an ordered pair written as,
\[(x, y) = \{x, \{x, y\}\}\]To see which element of the pair comes first, we check whether each element of the set contains the other: the first element is the one that belongs to the other.
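This check can be sketched in Python using `frozenset` (immutable, hashable sets). The helper names `pair` and `first` are my own, and the sketch assumes the coordinates are atoms like strings rather than sets themselves:

```python
# Encode the ordered pair (x, y) as the set {x, {x, y}}.
def pair(x, y):
    return frozenset({x, frozenset({x, y})})

# The first coordinate is the element contained in the other element.
def first(p):
    a, b = tuple(p)
    if isinstance(b, frozenset) and a in b:
        return a
    return b

print(first(pair("x", "y")))  # -> x
print(first(pair("y", "x")))  # -> y
```

Note that even though `frozenset` is unordered, the nesting alone is enough to recover which coordinate came first.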
What is so special about the natural numbers $0, 1, 2, 3, \ldots$? Perhaps the most glaringly obvious property of the natural numbers is that they are ordered. That is, we can say $0$ is less than $1$ is less than $2$ and so on! We write this (and similar statements) more succinctly by defining the comparison operators greater than $>$ and less than $<$. This allows us to write $0 < 1 < 2$.
The axioms defined above provide us with an elegant way of defining the natural numbers, alongside their order. We represent each number as a particular set. We start by considering $0$ to be the empty set $0 = \varnothing$. After this we make each number the set of all the numbers that came before it. This gives:
\[0 = \varnothing\] \[1 = \{0\} = \{\varnothing\}\] \[2 = \{0, 1\} = \{\varnothing, \{\varnothing\}\}\] \[3 = \{0, 1, 2\} = \{\varnothing, \{\varnothing\}, \{\varnothing, \{\varnothing\}\}\}\] \[\vdots\]As a result of the axiom of infinity we can also group these numbers together into a single set, the set of natural numbers $\mathbb{N} = \{0, 1, 2, \ldots\}$.
So why is this definition useful? Just as for the ordered pair, it allows us to concretely define what we mean by ordering. We can check whether one number is less than another by seeing if the first is a member of the second. For example, if we want to check whether $3$ is less than $100$ we simply check whether $3$ is in $100 = \{0, 1, 2, \ldots, 99\}$.
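This encoding is easy to play with in Python, again using `frozenset`; the function name `nat` is illustrative, not standard:

```python
# Build the von Neumann encoding of n: each number is the set of all
# numbers before it, starting from 0 = {} (the empty set).
def nat(n):
    s = frozenset()                # 0 = the empty set
    for _ in range(n):
        s = s | frozenset([s])     # successor: x union {x}
    return s

print(len(nat(5)))        # 5 = {0, 1, 2, 3, 4} has five elements
print(nat(3) in nat(10))  # 3 < 10 because 3 is a member of 10
print(nat(7) in nat(3))   # 7 < 3 is False
```

Membership really does play the role of “less than”: `nat(3) in nat(10)` is `True` while `nat(7) in nat(3)` is `False`.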
Through the inherent ordering of the natural numbers we can see that every number follows another. The successor $s(x)$ of each number $x$ is the set $x \cup \{x \}$. For example the successor of $1 = \{0\} = \{\varnothing \}$ is given as
\[s(1) = 1 \cup \{1 \} = \{\varnothing \} \cup \{\{\varnothing \}\} = \{\varnothing,\{\varnothing\} \} = 2\]Using the idea of a successor we have just defined we are able to define the basic arithmetic operations:
To see that these rules do indeed give us the answers that we would expect, let's consider adding $3$ to $4$. We do this by following rule (2) recursively until we end up at rule (1), so that,
\[\begin{aligned} 4+3 &=4+s(2) \\ &=s(4+2) \\ &=s(4+s(1)) \\ &=s(s(4+1)) \\ &=s(s(4+s(0))) \\ &=s(s(s(4+0))) \\ &=s(s(s(4))) \\ &=s(s(5)) \\ &=s(6) \\ &=7 \end{aligned}\]which we can see gives us the right answer. A key property of addition is that the order in which we add two numbers does not matter. To check that this is the case, consider computing $3 + 4$ using the same rules. This will still give us $7$! For additional examples for multiplication and subtraction see here.
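The recursion above can be sketched directly on the set encoding. The helper names `nat`, `succ`, `pred` and `add` are my own; `pred` recovers the number before $x$ by picking the largest member of $x$:

```python
def nat(n):
    """von Neumann encoding of the natural number n."""
    s = frozenset()
    for _ in range(n):
        s = s | frozenset([s])
    return s

def succ(x):
    return x | frozenset([x])          # s(x) = x union {x}

def pred(x):
    return max(x, key=len)             # the largest member of x is x - 1

def add(a, b):
    if b == frozenset():               # rule (1): a + 0 = a
        return a
    return succ(add(a, pred(b)))       # rule (2): a + s(b) = s(a + b)

print(add(nat(4), nat(3)) == nat(7))   # -> True
print(add(nat(3), nat(4)) == nat(7))   # commutativity: also True
```

Both orders of addition produce the very same set, which is exactly the commutativity observed above.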
Unfortunately, for the natural numbers $a - b$ is only defined when $a \geq b$ (taking $a \geq b$ to mean $a \notin b$). We will revisit this shortly!
The set $\omega = \{0, 1, 2, \ldots\}$ is referred to as the first infinite ordinal. Intuitively the arithmetic operations do not stop once we reach $\omega$. We can still think of doing something like $\omega + 1$. The result is still infinite but we would expect it to be bigger than $\omega$. Likewise we could think about $\omega \times \omega$ and $\omega^\omega$, and we are able to write down similar relations for each of these terms. These results can be thought of as generalising the idea of performing operations on ordered sets to infinite sets. Collectively they are referred to as ordinal arithmetic. However, these operations do not always work as we might expect. For example $\omega + 1 \neq 1 + \omega$. A story for another time!
The set of integers $\mathbb{I}=\{…, -\mathsf{2}, -\mathsf{1}, \mathsf{0}, \mathsf{1}, \mathsf{2}, …\}$ allows us to write $\mathsf{3} - \mathsf{7} = -\mathsf{4}$. Commonly we think of the integers $\mathbb{I}$ as the natural numbers and their negatives, so that the integers contain the natural numbers. However, in defining the integers we are better off thinking about them as an entirely new set of things. To make this perfectly clear, $3$ will represent the natural number $3$ whilst $\mathsf{3}$ (in sans-serif font) will represent an integer. After this we can go back to thinking of the positive integers (and zero) as the natural numbers.
As for the natural numbers, we now need to define the integers in terms of sets. Armed with the idea of an ordered pair, each integer is represented as a pair of natural numbers $(a, b)$ whose difference $a - b$ can be thought of as representing the value of the integer. For example we write $\mathsf{4} = (7, 3)$ and $-\mathsf{4} = (3, 7)$.
As \(4 = 8 - 4 = 6 - 2 = 10 - 6 = \ldots\), in defining integers in this way each integer is now equivalently represented by multiple sets. For example $\mathsf{4} = (8, 4)$, $\mathsf{4} = (6, 2)$ and $\mathsf{4} = (10, 6)$. Whilst the sets $(8, 4)$ and $(6, 2)$ are equivalent in the sense of integers, they are not in the sense of sets: $(8, 4)$ and $(6, 2)$ have different elements. In order to treat such sets as equivalent we say $(a, b) \sim (c, d)$ if $a + d = c + b$.
So far our choice to represent integers in this way seems fairly arbitrary. So why bother? It allows us to define integer arithmetic in terms of only natural number arithmetic. Given integers $\mathsf{x} = (a, b)$ and $\mathsf{y} = (c, d)$ we can define the integer arithmetic operations as:
Ordering $\mathsf{x} > \mathsf{y}$
\[(a, b) > (c, d) \text{ if } a + d > c + b\]Addition $\mathsf{x} + \mathsf{y}$
\[(a, b) + (c, d) = (a + c, b + d)\]Subtraction $\mathsf{x} - \mathsf{y}$
\[(a, b) - (c, d) = (a, b) + (d, c) = (a + d, b +c)\]Multiplication $\mathsf{x} \times \mathsf{y}$
\[(a, b) \times (c, d) = \big((a \times c) + (b \times d), (a \times d) + (b \times c)\big)\]For example, $\mathsf{3} - \mathsf{7} = (3, 0) - (7, 0) = (3, 0) + (0, 7) = (3, 7) = -\mathsf{4}$.
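These rules translate directly into Python, with an integer represented as a plain tuple of naturals and the equivalence $(a, b) \sim (c, d)$ iff $a + d = c + b$ used to compare results (the function names are my own):

```python
# An integer is a pair (a, b) of naturals standing for the value a - b.
def int_eq(x, y):                       # (a,b) ~ (c,d) iff a + d == c + b
    return x[0] + y[1] == y[0] + x[1]

def int_add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def int_sub(x, y):                      # subtract by adding the swapped pair
    return int_add(x, (y[1], y[0]))

def int_mul(x, y):
    (a, b), (c, d) = x, y
    return (a * c + b * d, a * d + b * c)

print(int_sub((3, 0), (7, 0)))                   # 3 - 7 -> (3, 7), i.e. -4
print(int_eq(int_mul((7, 3), (0, 2)), (0, 8)))   # 4 x -2 ~ -8 -> True
```

Notice that subtraction never needs a negative natural number: it is just addition with the pair swapped, which is exactly what the pair representation buys us.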
For more see here
In the same way the integers allowed us to answer questions like “What must we add to $7$ to give us $4$?”, the rational numbers allow us to ask “How many lots of $7$ do we need to make $4$?”. They provide a multiplicative inverse for all the integers. We do this by considering ordered pairs, this time of integers instead of natural numbers. We can think of $(\mathsf{7}, \mathsf{3})$ as representing the fraction $\frac{7}{3}$. As \(\frac{7}{3} = \frac{70}{30} = \frac{21}{9} = \ldots\), we say that two rational numbers are equivalent, $(a, b) \sim (c, d)$, if $a \times d = c \times b$. Using this representation, all of the arithmetic operations can be defined in terms of integer arithmetic:
Ordering $\mathsf{x} > \mathsf{y}$
\[(a, b) > (c, d) \text{ if } a \times d > c \times b\]Addition $\mathsf{x} + \mathsf{y}$
\[(a, b) + (c, d) = (a \times d + c \times b, b \times d)\]Subtraction $\mathsf{x} - \mathsf{y}$
\[(a, b) - (c, d) = (a \times d - c \times b, b \times d)\]Multiplication $\mathsf{x} \times \mathsf{y}$
\[(a, b) \times (c, d) = (a \times c, b \times d)\]Division $\mathsf{x} / \mathsf{y}$
\[(a, b) / (c, d) = (a \times d, c \times b)\]Numbers are not very useful unless we can do something with them. In defining addition and subtraction for the natural numbers, we defined a set of rules which, when followed, take two numbers as input and generate a third. This is an example of what we refer to as a function.
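The rational-number rules above translate just as directly: a rational is a tuple `(a, b)` standing for $\frac{a}{b}$, and the helper names are again my own:

```python
# A rational is a pair (a, b) of integers standing for a / b.
def rat_eq(x, y):                       # (a,b) ~ (c,d) iff a*d == c*b
    return x[0] * y[1] == y[0] * x[1]

def rat_add(x, y):
    (a, b), (c, d) = x, y
    return (a * d + c * b, b * d)

def rat_mul(x, y):
    return (x[0] * y[0], x[1] * y[1])

def rat_div(x, y):                      # (a,b) / (c,d) = (a*d, c*b)
    return (x[0] * y[1], x[1] * y[0])

print(rat_add((1, 2), (1, 3)))                   # 1/2 + 1/3 -> (5, 6)
print(rat_eq(rat_mul((2, 3), (3, 2)), (1, 1)))   # 2/3 x 3/2 ~ 1 -> True
```

Division simply multiplies by the flipped pair, mirroring how subtraction of integers added the swapped pair.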
By using the idea of ordered pairs we are able to define what a function is in terms of only sets. A function which takes in two inputs is thought of as taking in an ordered pair. This allows us to define which input is first and which input is second. Of course for addition we don’t care about the order but for other functions this might not be the case.
We can think of addition as taking the ordered pair $(3, 4)$ as input and producing $7$ from it. It also produces $2$ from $(1, 1)$ and $101$ from $(99, 2)$ as well as from $(2, 99)$. We can represent each of these cases also as an ordered pair where the first element is always the input and the second element is always the output for example, $((3, 4), 7)$ represents the case $3 + 4 = 7$. In this sense the function addition is thought of as the set of all ordered pairs of its inputs and outputs. For addition we have,
\[f_{+} = \{\big((0, 0), 0\big), \big((0, 1), 1\big), \big((0, 2), 2\big), ..., \big((1, 0), 1\big), \big((1, 1), 2\big), \big((1, 2), 3\big), ...\}\]Any two functions which produce the exact same outputs from all of the exact same inputs are the same function; in set terms, $f_1 = f_2$ exactly when they are equal as sets of input–output pairs.
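This extensional view can be illustrated in Python by materialising a finite fragment of $f_{+}$ as a literal set of (input, output) pairs:

```python
# A finite fragment of addition as a set of ((input pair), output) pairs.
f_plus = {((a, b), a + b) for a in range(3) for b in range(3)}

# A differently-written function producing the same input/output pairs.
g = {((b, a), b + a) for a in range(3) for b in range(3)}

print(((1, 2), 3) in f_plus)   # the fact 1 + 2 = 3, as a pair -> True
print(f_plus == g)             # same pairs, so the same function -> True
```

Even though `f_plus` and `g` are written differently, as sets of pairs they are literally equal, which is exactly what function equality means here.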
As a result of the axiom schema of replacement some particularly useful sets can be constructed:
Not all numbers can be expressed as fractions. To see that this is the case, let's consider representing fractions as decimals. We can think of decimal numbers as a short-hand way of writing
\[0.078125 = 0 \frac{1}{1} + 0 \frac{1}{10} + 7 \frac{1}{100} + 8 \frac{1}{1000} + 1 \frac{1}{10000} + 2 \frac{1}{100000} + 5 \frac{1}{1000000}\]Two cases arise for fractions expressed in this way. Either the sequence terminates after a finite number of terms (as above), or there will be finitely many terms followed by a pattern that repeats itself indefinitely, for example $1/3 = 0.333\ldots$ or $1/99 = 0.0101\ldots$. But what happens if we consider adding digits to a decimal indefinitely, never forming a repeated pattern? For example,
\[0.078125136729420100357399387401377382...\]Each time we add a decimal digit we can think of adding an ever smaller part to the number $0.078125$. Expressing this slightly differently, we can think of the number $0.07812513\ldots$ as corresponding to a sequence $0, 0.0, 0.07, 0.078, 0.0781, 0.07812, \ldots$ where the difference between elements becomes arbitrarily small as the sequence progresses. Sequences of this kind are referred to as Cauchy sequences and in the limit can be thought of as converging to a single fixed value. In this way we can think of each real number as a Cauchy sequence (strictly, as an equivalence class of Cauchy sequences converging to the same value)!
In order to represent real numbers in this way we need to construct a sequence from sets. We can think of a sequence as a set of pairs between the natural numbers $(0, 1, 2, \ldots)$ and the rational numbers $\mathbb{Q}$. For example the sequence $0, 0.0, 0.07, 0.078, 0.0781, \ldots$ can be thought of as the set \(\{(0, 0), (1, 0.0), (2, 0.07), (3, 0.078), (4, 0.0781), ... \}\). Sets of this kind can be generated from the axioms we have defined so far as follows:
Steps 2 and 3 can be thought of as a function mapping the set $\mathbb{N} \times \mathbb{Q}$ onto the set of real numbers. We know that $\mathbb{N} \times \mathbb{Q}$ already exists, and steps 2 and 3 can be carried out using natural number and integer arithmetic to check which pairs are valid. By the axiom of replacement, the real numbers are therefore defined!
Once again we can define arithmetic operations on the real numbers by defining functions which take in Cauchy sequences and produce another. It can be shown that the sum, product and difference of Cauchy sequences all result in Cauchy sequences. In each case the operators are defined by combining corresponding elements of the sequences. For example, if $a = (1, 1.01, 1.0105, 1.01052, \ldots)$ and $b = (2, 2.7, 2.75, 2.758, \ldots)$ then $a + b = (3, 3.71, 3.7605, 3.76852, \ldots)$.
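Using exact rationals from Python's standard `fractions` module, the element-wise sum of the two (truncated, illustrative) sequences from the example can be checked directly:

```python
from fractions import Fraction

# Truncated Cauchy sequences of rationals approximating two reals.
a = [Fraction(s) for s in ("1", "1.01", "1.0105", "1.01052")]
b = [Fraction(s) for s in ("2", "2.7", "2.75", "2.758")]

# Addition of reals: add the sequences element by element.
a_plus_b = [x + y for x, y in zip(a, b)]

expected = [Fraction(s) for s in ("3", "3.71", "3.7605", "3.76852")]
print(a_plus_b == expected)    # -> True
```

`Fraction` keeps every term exact, so the element-wise sums match the sequence for $a + b$ term by term with no rounding error.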
We have seen that ZFC set theory allows us to build increasingly complex mathematical ideas: starting with the idea of sets we were able to define the natural numbers, the integers, the rationals and the reals. Continuing, we might also derive imaginary numbers, continuity, vectors and in fact nearly all of modern day mathematics. All while avoiding paradoxes like Russell's, putting the foundations of mathematics on a much surer footing. Turn-of-the-century mathematicians began to breathe a sigh of relief…
Unfortunately, this relief was short lived. In 1931 Gödel published his famous incompleteness theorems, which ensure that in any consistent formal axiomatic system capable of describing the arithmetic of the natural numbers there will be statements which are impossible to either prove or refute. This means that for ZFC there will be statements that are neither provable nor disprovable. An approach to overcoming this limitation is to add in whatever axioms are needed to make an unprovable statement provable. But this just kicks the can down the road; eventually, much like a game of whack-a-mole, another unprovable statement will appear.
The first unprovable statement discovered for ZFC was the continuum hypothesis, and several others have been discovered since (e.g. incompleteness from finite trees and incompleteness from functions). Perhaps the one upside is that the unprovable statements discovered so far all seem to fall a long way from what we might consider useful maths: the maths that we rely on to ensure the world continues to function as expected. For example, the continuum hypothesis is concerned with whether there is an infinity strictly between that of the natural numbers and that of the real numbers. Whilst it would be reassuring to have an answer to such a question, not having one does not seem to have limited our understanding of the real world. And if we get to a point where we think it has become a limitation, we can always add in another axiom to see where that gets us.
In some ways Gödel's theorems do not change much. All the results which were previously provable from the set of axioms we were willing to accept as true remain valid; this still hinges on the axioms themselves being correct in the first place. Regardless, ZFC is a truly remarkable achievement of human perseverance and ingenuity.