Machine Logic is Lawrence Paulson's blog on Isabelle/HOL and related topics.
https://lawrencecpaulson.github.io/
Two Small Examples by Fields Medallists
<p>A couple of weeks ago, Tim Gowers posted on Twitter an unusual characterisation of bijective functions: that they preserve set complements.
Alex Kontorovich re-tweeted that post accompanied by a Lean proof detailing Gowers’ argument.
I took a look, and lo and behold! Isabelle can prove it with a single sledgehammer call.
(That one-line proof isn’t necessarily the best proof, however.
Remember, we want proofs that are easy to read and maintain.)
And Terence Tao published a small example on Mastodon; let’s look at that one too.</p>
<h3 id="gowers-original-tweet">Gowers’ original tweet</h3>
<p>Here is the original tweet, a thread in classical Twitter style:</p>
<blockquote>
<p>I’ve just noticed that a function f:X->Y is a bijection if and only if it preserves complements, that is, if and only if f(X\A)=Y\f(A) for every subset A of X. Is this a standard fact that has somehow passed me by for four decades? Simple proof in rest of (short) thread. 1/3</p>
</blockquote>
<blockquote>
<p>If f is a bijection and B=X\A, then f preserves unions and intersections and f(X)=Y, so f(A) and f(B) are disjoint and have union equal to Y. Conversely, if f preserves complements, then setting A = emptyset, we see that f(X)=Y, so f is a surjection. 2/3</p>
</blockquote>
<blockquote>
<p>And for every x we also have that f(X\{x})=Y\{f(x)}. Therefore, if x and y are distinct, then so are f(x) and f(y). So f is an injection as well. 3/3</p>
</blockquote>
<p>In standard mathematical notation, the claim is that if a function $f:X\to Y$ is given,
then $f$ is a bijection from $X$ to $Y$ if and only if it preserves complements, i.e.
if $f[X\setminus A] = Y \setminus f[A]$ for all $A\subseteq X$.
Incidentally, there are various ways of writing the image of a set under a function;
here I use square brackets, while Lean and Isabelle provide their own image operators.</p>
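<p>The injectivity step in the last tweet is terse. One way to spell it out, as a sketch in the notation just introduced: taking $A=\{x\}$ in the complement property gives
$$f[X\setminus\{x\}] = Y\setminus\{f(x)\}.$$
If $y\in X$ and $y\ne x$, then $y\in X\setminus\{x\}$, so
$$f(y)\in f[X\setminus\{x\}] = Y\setminus\{f(x)\},$$
and therefore $f(y)\ne f(x)$.</p>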
<h3 id="the-lean-formalisation">The Lean formalisation</h3>
<p>Kontorovich posted his version as an image:</p>
<p><img src="/images/Gowers-example.jpeg" alt="Formalisation of the bijection proof in Lean by Alex Kontorovich" /></p>
<p>Note that he has written out the argument in detail,
with plenty of comments to explain what is going on.</p>
<h3 id="investigating-the-problem-in-isabelle">Investigating the problem in Isabelle</h3>
<p>This problem looked intriguing, so I tried it with Isabelle.
The brute force way to tackle such a proof is</p>
<ol>
<li>try the <code class="language-plaintext highlighter-rouge">auto</code> proof method to solve or at least break up the problem</li>
<li>invoke <a href="https://isabelle.in.tum.de/dist/doc/sledgehammer.pdf">sledgehammer</a>
on the subgoals that are produced.</li>
</ol>
<p>The proof you get this way is likely to be horrible.
However, once you have your first proof, it’s easy to get a nicer proof.
(And you should take the trouble.)
For the current problem, if you type <code class="language-plaintext highlighter-rouge">auto</code>, you get four ugly subgoals, each of which sledgehammer proves automatically.
I don’t want to show this, but you can try it for yourself.
The Isabelle theory file is <a href="/Isabelle-Examples/Gowers_Bijection.thy">here</a>.</p>
<p>What I actually tried, first, was to split the logical equivalence into its two directions.
I was pleased to see that sledgehammer could prove both.
Then I thought, let’s see if it can prove the whole claim at once, and indeed it could!</p>
<h3 id="the-isabelle-proofs">The Isabelle proofs</h3>
<p>To begin, a technicality about notation.
In Isabelle, <em>set difference</em> is written with a minus sign, $A-B$,
because the standard backslash character is reserved for other purposes.
The usual set difference symbol can be selected from the Symbols palette
or typed as <code class="language-plaintext highlighter-rouge">\setminus</code> (autocomplete will help).
So let’s begin by setting that up, allowing us to use the conventional symbol.
It will be accepted for input and used to display results.</p>
<pre class="source">
<span class="keyword1 command">abbreviation</span> <span class="entity">set_difference</span> <span class="main">::</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">[</span><span class="tfree">'a</span> set</span><span class="main">,</span><span class="tfree">'a</span> set</span><span class="main">]</span> <span class="main">⇒</span> <span class="tfree">'a</span> set<span>"</span> <span class="main">(</span><span class="keyword2 keyword">infixl</span> <span class="quoted"><span>"</span><span class="keyword1">∖</span><span>"</span></span> 65<span class="main">)</span><span>
</span><span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free bound entity">A</span> <span class="main free">∖</span> <span class="free bound entity">B</span> <span class="main">≡</span> <span class="free bound entity">A</span><span class="main">-</span></span><span class="free bound entity">B</span><span>"</span></span>
</pre>
<p>The following is the nicest of the one-shot proofs found by sledgehammer.
This problem turned out to be relatively easy; three of the constituent provers
solved it.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>bij_betw</span> <span class="free">f</span> <span class="free">X</span> <span class="free">Y</span> <span class="main">⟷</span></span> <span class="main">(</span><span class="main">∀</span><span class="bound">A</span><span class="main">.</span> <span class="bound">A</span><span class="main">⊆</span><span class="free">X</span> <span class="main">⟶</span> <span class="free">f</span> <span class="main">`</span> <span class="main">(</span><span class="free">X</span><span class="main">∖</span><span class="bound">A</span><span class="main">)</span> <span class="main">=</span> <span class="free">Y</span> <span class="main">∖</span> <span class="free">f</span><span class="main">`</span><span class="bound">A</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">metis</span> Diff_empty Diff_eq_empty_iff Diff_subset bij_betw_def image_is_empty<span>
</span>inj_on_image_set_diff subset_antisym subset_image_inj<span class="main">)</span>
</pre>
<p>I don’t actually recommend that you allow proofs of this sort
to accumulate in your development.
It leaves us completely in the dark as to why the claim holds.
Moreover, if you want your development to be maintainable,
it needs to be resilient in the presence of change.
I’m always having to make corrections and adjustments (because I’m always making mistakes),
and while rerunning the proofs can be an anxious moment, usually they all work fine.
At worst, they can be fixed by another sledgehammer call.
Opaque proofs like the one above will be hard to fix when they break.</p>
<p>The simplest way to get a clearer proof for this particular problem
is by separately treating the left-to-right and right-to-left directions.
This is also an opportunity to see the <code class="language-plaintext highlighter-rouge">is</code> mechanism for matching a pattern to a formula.
An arbitrary pattern is permitted, and here we set up <code class="language-plaintext highlighter-rouge">?L</code> and <code class="language-plaintext highlighter-rouge">?R</code>
to denote the left and right hand sides.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>bij_betw</span> <span class="free">f</span> <span class="free">X</span> <span class="free">Y</span> <span class="main">⟷</span></span> <span class="main">(</span><span class="main">∀</span><span class="bound">A</span><span class="main">.</span> <span class="bound">A</span><span class="main">⊆</span><span class="free">X</span> <span class="main">⟶</span> <span class="free">f</span> <span class="main">`</span> <span class="main">(</span><span class="free">X</span><span class="main">∖</span><span class="bound">A</span><span class="main">)</span> <span class="main">=</span> <span class="free">Y</span> <span class="main">∖</span> <span class="free">f</span><span class="main">`</span><span class="bound">A</span><span class="main">)</span><span>"</span> <span class="main">(</span><span class="keyword2 keyword">is</span> <span class="quoted"><span class="quoted"><span>"</span><span class="var">?L</span><span class="main">=</span></span><span class="var">?R</span><span>"</span></span><span class="main">)</span><span>
</span><span class="keyword1 command">proof</span><span>
</span><span class="keyword3 command">show</span> <span class="quoted quoted"><span>"</span><span class="var">?L</span> <span class="main">⟹</span> <span class="var">?R</span><span>"</span></span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">metis</span> Diff_subset bij_betw_def inj_on_image_set_diff<span class="main">)</span><span>
</span><span class="keyword3 command">assume</span> <span class="var quoted var">?R</span><span>
</span><span class="keyword1 command">then</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>inj_on</span> <span class="free">f</span> <span class="free">X</span><span>"</span></span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">f</span> <span class="main">`</span></span> <span class="free">X</span> <span class="main">=</span></span> <span class="free">Y</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> inj_on_def<span class="main">)</span><span>
</span><span class="keyword1 command">then</span> <span class="keyword3 command">show</span> <span class="var quoted var">?L</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> bij_betw_def<span class="main">)</span><span>
</span><span class="keyword1 command">qed</span>
</pre>
<p>This proof is much clearer. The left-to-right proof requires only three previous facts.
The right-to-left proof is practically automatic.
You might argue that even here, the actual reasoning is still opaque.
However, this proof tells us that the right-to-left direction
is essentially a calculation from the definitions,
while the opposite direction is the consequence of three facts (rather than eight, as before).
This sort of proof will be much easier to maintain.</p>
<p>A further Isabelle bonus: note that both the Lean proof and Gowers’ informal argument
begin by assuming $f:X\to Y$.
The Isabelle version states unconditionally that
$f$ is a bijection from $X$ to $Y$ if and only if it preserves complements.
The implicit typing of $f$ ensures only that it is a function,
over arbitrary types that we don’t even mention.</p>
<h3 id="taos-example">Tao’s example</h3>
<p>Unfortunately, I wasn’t able to locate Tao’s original post.
But he stated a nice little problem and gave a formalisation using Lean, and again I couldn’t help trying it out in Isabelle. I liked my proof more.</p>
<p>We are given a decreasing real-valued sequence $\{a_k\}$
and a family of non-negative reals $\{D_k\}$
such that $a_k\le D_k - D_{k+1}$ for all $k$.
The task is to prove $a_k \le \frac{D_0}{k+1}$.</p>
<pre class="source">
<span class="keyword1 command">lemma</span><span>
</span><span class="keyword2 keyword">fixes</span> <span class="free">a</span> <span class="main">::</span> <span class="quoted"><span class="quoted"><span>"</span>nat</span> <span class="main">⇒</span> real</span><span>"</span><span>
</span><span class="keyword2 keyword">assumes</span> a<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>decseq</span> <span class="free">a</span><span>"</span></span> <span class="keyword2 keyword">and</span> D<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">⋀</span><span class="bound">k</span><span class="main">.</span> <span class="free">D</span> <span class="bound">k</span> <span class="main">≥</span></span> <span class="main">0</span></span><span>"</span> <span class="keyword2 keyword">and</span> aD<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">⋀</span><span class="bound">k</span><span class="main">.</span> <span class="free">a</span> <span class="bound">k</span> <span class="main">≤</span></span> <span class="free">D</span> <span class="bound">k</span> <span class="main">-</span></span> <span class="free">D</span><span class="main">(</span>Suc <span class="bound">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">a</span> <span class="free">k</span> <span class="main">≤</span></span> <span class="free">D</span> <span class="main">0</span></span> <span class="main">/</span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">proof</span> <span class="operator">-</span><span>
</span><span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">a</span> <span class="free">k</span> <span class="main">=</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">i</span><span class="main">≤</span><span class="free">k</span><span class="main">.</span> <span class="free">a</span> <span class="free">k</span><span class="main">)</span> <span class="main">/</span></span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="operator">simp</span><span>
</span><span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">i</span><span class="main">≤</span><span class="free">k</span><span class="main">.</span> <span class="free">a</span> <span class="bound">i</span><span class="main">)</span> <span class="main">/</span></span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">using</span> a sum_mono<span class="main">[</span><span class="operator">of</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">{..</span></span><span class="free">k</span><span class="main">}</span></span><span>"</span> <span class="quoted quoted"><span>"</span><span class="main">λ</span><span class="bound">i</span><span class="main">.</span> <span class="free">a</span> <span class="free">k</span><span>"</span></span> <span class="quoted free">a</span><span class="main">]</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> monotone_def <span class="dynamic dynamic">divide_simps</span> mult.commute<span class="main">)</span><span>
</span><span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">i</span><span class="main">≤</span><span class="free">k</span><span class="main">.</span> <span class="free">D</span> <span class="bound">i</span> <span class="main">-</span></span> <span class="free">D</span><span class="main">(</span>Suc <span class="bound">i</span><span class="main">)</span><span class="main">)</span> <span class="main">/</span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> aD divide_right_mono sum_mono<span class="main">)</span><span>
</span><span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="free">D</span> <span class="main">0</span></span> <span class="main">/</span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> sum_telescope D divide_right_mono<span class="main">)</span><span>
</span><span class="keyword1 command">finally</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span> <span class="keyword1 command">.</span><span>
</span><span class="keyword1 command">qed</span>
</pre>
<p>Isabelle’s calculational style is perfect for this sort of inequality chain.</p>
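<p>For comparison, the same chain can be written out on paper as follows. This is a restatement of the steps in the Isabelle proof above, except that the telescoping identity is shown as a separate step:
$$a_k = \frac{1}{k+1}\sum_{i\le k} a_k \;\le\; \frac{1}{k+1}\sum_{i\le k} a_i \;\le\; \frac{1}{k+1}\sum_{i\le k} (D_i - D_{i+1}) \;=\; \frac{D_0 - D_{k+1}}{k+1} \;\le\; \frac{D_0}{k+1}.$$
The first inequality uses $a_k\le a_i$ for $i\le k$ (the sequence is decreasing), the second uses the hypothesis $a_i\le D_i - D_{i+1}$, the equality is the telescoping sum, and the last step uses $D_{k+1}\ge 0$.</p>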
<h3 id="final-remarks">Final remarks</h3>
<p><strong>Always</strong> break up your problem
into its constituents – probably by calling <code class="language-plaintext highlighter-rouge">auto</code> – before calling sledgehammer.
The effort needed to prove all the separate parts
is generally much less than that needed to prove the whole in one go.
Besides which, part of your problem may simply be too difficult for sledgehammer.
Better to isolate that part to work on later, while disposing of the easier bits.</p>
<p>The Isabelle theory file is available to <a href="/Isabelle-Examples/Gowers_Bijection.thy">download</a>.</p>
Wed, 28 Feb 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/02/28/Gowers_bijection_example.html
Contradictions and the Principle of Explosion
<p>That logic should be <a href="https://plato.stanford.edu/entries/contradiction/#">free from contradiction</a> is probably its most fundamental principle,
dating back to Aristotle.
As described <a href="/2024/01/31/Russells_Paradox.html">last time</a>,
the emergence of a contradiction in set theory – in the form of Russell’s paradox – was catastrophic. Few question the claim that no statement can be both true and false
at the same time.
But the law of contradiction is widely associated with something else,
the <a href="https://plato.stanford.edu/entries/logic-paraconsistent/#BrieHistExContQuod"><em>principle of explosion</em></a>:
<em>ex contradictione quodlibet</em>, a contradiction implies everything.
This principle has been disputed. One can formulate predicate logic without it:
<em>minimal logic</em>.
And allegedly a student challenged Bertrand Russell
by saying “suppose 1=0; prove that you are the Pope”.
Russell is said to have replied that if 1=0 then 2=1 and therefore
the 2-element set consisting of himself and the Pope actually contains only one element.
It’s an amusing tale, but is the argument rigorous?</p>
<h3 id="origins">Origins</h3>
<p>A 12th-century Parisian logician named
<a href="https://en.wikipedia.org/wiki/William_of_Soissons">William of Soissons</a>
is said to have been the first to derive the principle of explosion.
There is a simple logical proof of an arbitrary conclusion $Q$
from the two assumptions $P$ and $\neg P$.
For if we know $P$ then surely $P\lor Q$ follows by the meaning of logical OR.
So either $P$ or $Q$ holds, but the former is impossible by $\neg P$.
Hence, we have derived $Q$.</p>
<p>Unfortunately, this argument cannot be carried out in a typical natural deduction calculus.
The proof turns out to rely on the principle of explosion itself,
which is built into most formalisms: the reasoning would be circular.
I think the informal version of the proof is pretty convincing,
but we can look for other evidence.
(And yes, <strong>evidence</strong> is what we should be looking for when trying to justify a principle
too fundamental to be proved.)
In many specific contexts, a contradicting fact leads to an explosion by calculation.</p>
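<p>The informal argument can be transcribed into Isabelle’s Isar language. The following is a sketch that has not been machine-checked here; <code class="language-plaintext highlighter-rouge">disjI1</code> and <code class="language-plaintext highlighter-rouge">notE</code> are standard Isabelle/HOL rules. The point to notice is the step that disposes of the case $P$: it appeals to <code class="language-plaintext highlighter-rouge">notE</code>, which already allows any conclusion to follow from $P$ and $\neg P$. That is where the circularity shows up.</p>
<pre class="source">
lemma
  assumes "P" and "¬ P"
  shows "Q"
proof -
  from ‹P› have "P ∨ Q" by (rule disjI1)   (* P gives P ∨ Q *)
  then show "Q"
  proof                                     (* case split on the disjunction *)
    assume "P"
    with ‹¬ P› show "Q" by (rule notE)      (* explosion enters here *)
  next
    assume "Q"
    then show "Q" .
  qed
qed
</pre>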
<h3 id="the-explosion-in-arithmetic">The explosion in arithmetic</h3>
<p>As we saw in the argument attributed to Russell, 1=0 in an arithmetic setting
allows other identities to be derived by adding a constant to both sides or multiplying both sides by a constant.
It’s trivial to obtain $m=n$ for all pairs of numbers.
Conversely, the assumption $m=n$ for distinct $m$ and $n$ can be transformed by subtraction and division into 1=0.
On the other hand, it is possible to postulate something like 5=0
if the other axioms are weak enough, and then you have simply supplied the axioms for
a version of modular arithmetic.</p>
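<p>As a worked illustration, assuming only the usual ring laws: multiplying both sides of $1=0$ by $m$, and separately by $n$, gives
$$m = m\cdot 1 = m\cdot 0 = 0 \qquad\text{and}\qquad n = n\cdot 1 = n\cdot 0 = 0,$$
so $m=n$. Russell’s step is the special case obtained by adding 1 to both sides: $2 = 1+1 = 0+1 = 1$. By contrast, postulating $5=0$ on top of the ring laws alone yields arithmetic modulo 5, where for example $3+4 = 7 = 2$ and nothing collapses.</p>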
<h3 id="the-explosion-in-the-λ-calculus">The explosion in the λ-calculus</h3>
<p>The λ-calculus is an extremely simple formalism in which a great many
computational notions can be encoded.
Familiar data types such as the Booleans, natural numbers, integers, lists and trees
can be represented, as well as algorithms operating on them.
We can even have infinite lists and trees operated on by “lazy” algorithms.
The standard representations of true and false are
$\lambda x y.x$ and $\lambda x y.y$, respectively.
So what happens if we are given that true equals false? Then
$M = (\lambda x y.x)MN = (\lambda x y.y)MN = N$. Therefore we can show $M=N$
for any two given λ-terms, $M$ and $N$.
The same sort of thing happens given 1=0 and the standard representation of natural numbers,
though the details are complicated.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
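<p>The calculation in the footnote can be unpacked as follows, assuming that the standard representation meant is the Church numerals, $0 = \lambda f x.x$ and $1 = \lambda f x.f x$, and writing $\textbf{K}$ for $\lambda x y.x$:
$$0\,(\textbf{K}N)\,M = (\lambda f x.x)(\textbf{K}N)M = M, \qquad 1\,(\textbf{K}N)\,M = (\lambda f x.f x)(\textbf{K}N)M = \textbf{K}NM = N.$$
If $0=1$, the two left-hand sides are equal, and hence $M=N$.</p>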
<h3 id="the-explosion-in-axiomatic-set-theory">The explosion in axiomatic set theory</h3>
<p>Here things get a little more technical. And with all due respect to Bertrand Russell,
he is not a set, and neither is the Pope.
In set theory, 0 is the empty set and 1 is $\{0\}$, which implies $0\in 1$.
So if 1=0 then we have big problems: $0\in 1$ is both true and false
(because nothing can belong to the empty set).
And so, for any given set $A$, the set $\{x\in A\mid 0\in 1\}$ equals $A$
if we take $0\in 1$ to be true, but otherwise the resulting set is empty.
It follows that $A$ equals the empty set for all $A$, so all sets are equal.</p>
<h3 id="deriving-the-explosion-in-natural-deduction-logic">Deriving the explosion in natural deduction logic</h3>
<p>The rule of disjunction elimination in natural deduction allows us to derive
an arbitrary conclusion $R$ from the following three premises:</p>
<ul>
<li>$P\lor Q$</li>
<li>a proof of $R$ that may assume $P$</li>
<li>a proof of $R$ that may assume $Q$</li>
</ul>
<p>The idea behind this rule is that one of $P$ or $Q$ must be true, and therefore,
$R$ is derivable using the corresponding premise.
The rule incorporates the key idea of natural deduction,
namely permission to make specified assumptions locally
that are <em>discharged</em> (“paid off”, so to speak) further on.</p>
<p>This rule can obviously be generalised to an $n$-ary disjunction. We may derive $R$
from the following $n+1$ premises:</p>
<ul>
<li>$P_1\lor \cdots \lor P_n$</li>
<li>a proof of $R$ that may assume $P_i$, for $i=1$, …, $n$</li>
</ul>
<p>Obviously, if $n=2$, we get the same rule as before.
If $n=1$, it degenerates to a tautology.
And what happens if $n=0$?
Then the rule says that $R$ follows from the empty disjunction alone.
The empty disjunction is falsity.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
If our calculus can derive falsity from $P$ and $\neg P$,
then it has the principle of explosion built in.</p>
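<p>The two ends of this family of rules can be sketched in Isabelle/HOL. The ternary lemma below is an illustration written for this passage, not a library fact, and neither proof has been machine-checked here; <code class="language-plaintext highlighter-rouge">FalseE</code> is the standard name of the falsity elimination rule.</p>
<pre class="source">
(* n = 3: one case proof per disjunct *)
lemma
  assumes "P1 ∨ P2 ∨ P3"
    and "P1 ⟹ R" and "P2 ⟹ R" and "P3 ⟹ R"
  shows "R"
  using assms by blast

(* n = 0: no disjuncts, so no case proofs are required *)
lemma
  assumes "False"
  shows "R"
  using assms by (rule FalseE)
</pre>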
<h3 id="final-remarks">Final remarks</h3>
<p>As promised, in specific formal systems, the principle of explosion arises all by itself.
It doesn’t have to be assumed. Taking it as a general logical principle
is then simply a form of abstraction.
But it also arises naturally in logical formalisms from the basic principles of natural deduction.</p>
<p><a href="https://plato.stanford.edu/entries/logic-paraconsistent/">Paraconsistent logics</a>
are formal systems in which the impact of a contradiction is contained.
I can’t comment on the value of such work to philosophy,
but they have also been studied in the context of artificial intelligence.
There, the point is that it’s easy for the facts in an inexact real-world situation
to be inconsistent, and you don’t want everything to collapse.
I would argue however that you should never be using formal logic
to reason directly about real-world situations.
And indeed, the symbolic/logical tendency that was so prominent in early AI work
has pretty much vanished in favour of essentially statistical techniques
based on neural networks.
There, the problem doesn’t arise because nothing is being proved.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>$M = 0(\textbf{K}N)M = 1(\textbf{K}N)M = \textbf{K}NM = N$ <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I hope you can see this: $P_1\lor \cdots \lor P_n$ is true precisely if some $P_i$ is true, $i=1$, …, $n$. If $n=0$ then it must always be false. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 14 Feb 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/02/14/Contradiction.html
Russell's Paradox: Myth and Fact
<p>The story of Russell’s paradox is well known.
<a href="https://plato.stanford.edu/entries/frege/">Gottlob Frege</a> had written a treatise
on the foundations of arithmetic, presenting a formal logic and a
development of elementary mathematics from first principles.
Bertrand Russell, upon reading it, wrote to Frege mostly to praise the work, but asking a critical question. Frege replied to express his devastation at seeing his life’s work ruined.
Some later commentators went further, saying that Russell’s paradox refuted the entire <a href="https://plato.stanford.edu/entries/logicism/#">logicist approach</a>
to the foundations of mathematics (the idea that mathematics can be reduced to logic).
Much is wrong with this story.
The impact of Russell’s paradox is less than many believe, and greater.</p>
<h3 id="what-is-russells-paradox">What is Russell’s paradox?</h3>
<p>The paradox can be expressed quite simply in English.
Let $R$ denote the set of all sets that are not members of themselves.
Then $R$ is a member of itself if and only if it is not a member of itself.
(In symbols, define $R$ as $\{x \mid x\not\in x\}$; then $R\in R$ iff $R\not\in R$.)
Both possibilities lead to a contradiction.</p>
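<p>The logical core of the paradox does not depend on sets at all. In Isabelle/HOL the formula $x\in x$ is not even well typed, so the sketch below replaces membership by an arbitrary binary relation <code class="language-plaintext highlighter-rouge">E</code> (a free variable chosen for this illustration). It states that no $R$ can relate to exactly those $x$ that do not relate to themselves; the <code class="language-plaintext highlighter-rouge">blast</code> call is expected to succeed by instantiating $x$ with $R$, though it has not been machine-checked here.</p>
<pre class="source">
(* E x y stands in for "x is a member of y" *)
lemma "¬ (∃R. ∀x. E x R ⟷ ¬ E x x)"
  by blast
</pre>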
<p>Faced with such a situation, we need to locate the problem: to scrutinise every assumption.
At the centre is clearly the notion of a <em>set</em>.
It is an abstraction of various collective nouns in common use,
such as nation, clan, family. Each of these is a unit composed of smaller units,
and we even have a hierarchy: a nation can be a collection of clans,
each of which is a collection of families.
For further examples we have herds, and armies (with their hierarchy of divisions, regiments, etc.).
None of these collections can be members of themselves.
But if we accept the <em>universal set</em> $V$, to which everything belongs,
then surely $V\in V$: it belongs to itself.
So maybe the universal set is the root of the problem.
However, this insight does not show us the way out.
Two very different solutions emerged:</p>
<ul>
<li>
<p><em>Axiomatic set theory</em>. Unrestricted set comprehension is replaced by the <em>separation axiom</em>.
You cannot just form the set $\{x\mid\phi(x)\}$ from an arbitrary property $\phi(x)$;
you can only form <strong>subsets</strong> of some existing set $A$ as $\{x\in A\mid\phi(x)\}$.
Axioms are provided allowing the construction of sets according to certain specific principles.
Now $R$ cannot be constructed, and there is no universal set.
For technical purposes, a further axiom is usually assumed: to forbid
nonterminating membership chains,
such as $x\in y\in x$ and $\cdots \in x_3\in x_2\in x_1$.
Thus, no set can be a member of itself.
This route is due to <a href="https://plato.stanford.edu/entries/zermelo-set-theory/">Zermelo</a>
and Fraenkel.</p>
</li>
<li>
<p><em>Type theory</em>.
A type hierarchy is introduced to classify all values, and $x\in y$
can only be written if the type of $y$ is higher than that of $x$.
It is thus forbidden to write $x\in x$.
With types there is no universal set either, but there are universal sets for each type.
This route is due to Whitehead and Russell, who further complicated their type theory
to enforce the “<a href="https://plato.stanford.edu/entries/russell-paradox/#ERP">vicious circle principle</a>”,
which they saw as the root of all paradoxes.
Their <a href="https://plato.stanford.edu/entries/type-theory/#RamiHierImprPrin">ramified type theory</a>
turned out to be unworkable.
Simplified by <a href="https://plato.stanford.edu/entries/ramsey/">Frank Ramsey</a>
and formalised by Alonzo Church,
it became <em>higher-order logic</em> as used today.</p>
</li>
</ul>
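<p>To see how the separation axiom defuses the paradox, repeat Russell’s construction inside an arbitrary set $A$ by putting $R_A = \{x\in A\mid x\not\in x\}$. Then
$$R_A\in R_A \iff R_A\in A \;\land\; R_A\not\in R_A.$$
If $R_A\in A$, this reduces to the contradictory $R_A\in R_A \iff R_A\not\in R_A$. So the only conclusion is $R_A\not\in A$: every set $A$ omits something, which is another way of saying that there is no universal set.</p>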
<p>Modern constructive type theories, such as Martin-Löf’s,
amalgamate ideas from both approaches,
providing a richer language of types and giving them a prominent role.</p>
<p>Of other approaches, one must mention
Quine’s <a href="https://plato.stanford.edu/entries/quine-nf/">New Foundations</a>.
He aimed to have a universal set containing itself as an element.
In order to prevent Russell’s paradox, he introduced the notion of a <em>stratified</em> formula,
a kind of local type checking that prohibited $\{x \mid x\not\in x\}$.
The problem was, nobody was sure for decades whether NF was consistent or not,
making it a rather scurvy candidate for the foundations of mathematics.</p>
<h3 id="what-was-its-impact">What was its impact?</h3>
<p>Russell’s paradox comes from Cantor’s theorem, which states that
there is no injection from the powerset ${\cal P}(A)$ of a given $A$ into $A$ itself.
(There is no way to assign each element of ${\cal P}(A)$
a unique element of $A$.) But if $V$ is the universal set,
then ${\cal P}(V)$ is actually a subset of $V$, contradiction.
Or, to quote Gödel:</p>
<blockquote>
<p>By analyzing the paradoxes to which Cantor’s set theory had led, he freed them from all mathematical technicalities, thus bringing to light the amazing fact that our logical intuitions (i.e., intuitions concerning such notions as: truth, concept, being, class, etc.) are self-contradictory.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>This was huge. Mathematicians and philosophers had taken for granted that a <em>concept</em>
(i.e., property)
and the corresponding <em>class</em> (i.e., set) were more or less interchangeable.
The concept of red could be identified with the class of all red things;
the concept of an even number could be identified with the class of even numbers.
The universal class could be defined as the class of all $x$ satisfying $x=x$.
In fact, we can still do this. But people took for granted that these classes
were entities in themselves, and in particular, could belong to other classes.
That is what had to be sacrificed.</p>
<p>As late as the early 20th century, when Whitehead and Russell were writing
<a href="https://plato.stanford.edu/entries/principia-mathematica/">Principia Mathematica</a>,
the words <em>class</em> and <em>set</em> were synonymous.
Today, especially in the context of set theory,
a <em>class</em> is the collection of everything satisfying a specific property,
but only sets actually <strong>exist</strong>.
A reference to some proper class – say the universal class, or the class of ordinals –
is invariably described using the phrase “only a <em>façon de parler</em>”:
a manner of speaking, nothing more.</p>
<p>George Boolos <a href="https://www.jstor.org/stable/4545060">has pointed out</a>
that Russell’s paradox did not impact
Frege’s work in any significant way. Frege had indeed assumed
unrestricted set comprehension, the fatal principle that leads to Russell’s paradox.
But he used it only once, to derive a much weaker consequence
that he could have taken as an axiom instead. The paradox
did not damage Frege’s work, which survives today as the predicate calculus.
However, it laid waste to his intellectual worldview.</p>
<h3 id="wider-ramifications">Wider ramifications</h3>
<p>Russell’s paradox was about as welcome as a bomb at a wedding.
Decades later, the dust still had not settled.
Russell collected a list of other paradoxes, the most serious being
Burali-Forti’s: the set $\Omega$ of ordinal numbers is itself an ordinal number,
and therefore $\Omega\in \Omega$, which implies $\Omega<\Omega$.</p>
<p>The first part of the 20th century saw the publication of Zermelo’s
axioms for set theory, in which he introduced his separation axiom, and much more controversially, his <em>axiom of choice</em>.
In that febrile time, many had no appetite for further risk-taking
in the form of this radical new axiom.
Whitehead and Russell formalised a significant chunk of mathematics using their type theory.
Hilbert announced his programme for proving the consistency of mathematics,
but the incompleteness and undecidability results of the 1930s put an end to such ideas.
By the 1960s, we had learned that fundamental questions – such as the status of the axiom of choice and the <a href="https://plato.stanford.edu/entries/continuum-hypothesis/">continuum hypothesis</a> – could not be settled
using the axioms of Zermelo-Fraenkel set theory.</p>
<p>Russell’s paradox also made its appearance in Alonzo Church’s $\lambda$-calculus.
Church originally conceived his system as a new approach to logic,
in which sets were encoded by their characteristic functions: $MN$
meant that $N$ was an element of $M$,
while $\lambda x. M$ denoted unrestricted set comprehension over the predicate $M$.
Church devised techniques for encoding Boolean values and operations
within the $\lambda$-calculus.
However, Haskell Curry noticed that the Russell set $R$
could be expressed as $\lambda x. \neg (x x)$.
He thereby obtained a contradiction, $RR = \neg(RR)$.
Generalising from negation to an arbitrary function symbol,
Curry obtained his famous $Y$-combinator.
This gave Church’s $\lambda$-calculus tremendous expressivity,
but rendered his logic inconsistent.</p>
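<p>Spelled out, with $R = \lambda x. \neg (x x)$, a single $\beta$-step gives
$RR = (\lambda x. \neg (x x))R = \neg(RR)$.
Replacing $\neg$ by an arbitrary $f$ yields the fixed-point combinator in its usual modern form,
$Y = \lambda f. (\lambda x. f(x x))(\lambda x. f(x x))$,
for which $Yf = (\lambda x. f(x x))(\lambda x. f(x x)) = f((\lambda x. f(x x))(\lambda x. f(x x))) = f(Yf)$.
Every $f$ thus has a fixed point, and taking $f = \neg$ recovers the contradiction.</p>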
<h3 id="however">However</h3>
<p>Ludwig Wittgenstein wasn’t much bothered by contradictions. He wrote, with his usual lucidity,</p>
<blockquote>
<p>If a contradiction were now actually found in arithmetic that would only prove that an arithmetic with such a contradiction in it could render very good service; and it will be better for us to modify our concept of the certainty required, than to say that it would really not yet have been a proper arithmetic.</p>
</blockquote>
<p>This means apparently that what you don’t know can’t hurt you.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Kurt Gödel, <a href="https://doi.org/10.1017/CBO9781139171519.024">Russell’s mathematical logic</a>. <em>In</em>: P Benacerraf, H Putnam (eds), <em>Philosophy of Mathematics: Selected Readings</em> (CUP, 1984), 447–469. I have already posted this quotation twice before (on <a href="/2023/04/12/Wittgenstein.html">Wittgenstein</a> and then on the <a href="/2023/11/01/Foundations.html">foundations of mathematics</a>). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 31 Jan 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/01/31/Russells_Paradox.html
https://lawrencecpaulson.github.io//2024/01/31/Russells_Paradox.html
Coinductive puzzle, by Jasmin Blanchette and Dmitriy Traytel
<p>Coinduction has a reputation for being esoteric, but there are some situations where it is close to
indispensable. One such scenario arose during an Isabelle refinement proof for a verified automatic
theorem prover for first-order logic, <a href="https://doi.org/10.1145/3293880.3294100">a proof that also involved Anders Schlichtkrull</a>. In that
prover, execution traces can be finite or infinite, reflecting the undecidability of first-order
logic.
The refinement proof involves a simulation argument between two layers: an abstract specification
and a concrete theorem prover, both given as transition systems (i.e., binary relations over
states). A single “big” step of the concrete prover represents an entire iteration of the prover’s
main loop and may therefore correspond to multiple “small” steps of the abstract prover.</p>
<p>The simulation proof requires relating the concrete layer with the abstract layer. The concrete
“big-step” sequence is of the form $St_0 \leadsto^+ St_1 \leadsto^+ St_2 \leadsto^+ \cdots$, where the
$St_i$’s are states and $\leadsto^+$ is the transitive closure of the abstract transition system.
However, to complete the refinement, we must obtain a “small-step” sequence $St_0 \leadsto \cdots
\leadsto St_1 \leadsto \cdots \leadsto St_2 \leadsto \cdots$.</p>
<p>If the big-step sequence is finite, the existence of the small-step sequence can be proved using
induction. But in our semidecidable scenario, sequences may be infinite. One way to cope with this
is to use coinductive methods. This blog entry presents a solution to this coinductive puzzle.</p>
<h3 id="preliminaries">Preliminaries</h3>
<p>To represent possibly infinite sequences of states, we use the coinductive datatype of lazy lists:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codatatype 'a llist = LNil | LCons 'a "'a llist"
</code></pre></div></div>
<p>Intuitively, lazy lists are like ordinary finite lists, except that they also allow infinite values
such as <code class="language-plaintext highlighter-rouge">LCons 0 (LCons 1 (LCons 2 ...))</code>. However, the reasoning principles for
coinductive types and predicates are rather different from their inductive counterparts, as we will
see in a moment.</p>
<p>Let us review some useful vocabulary for lazy lists. First, the selectors <code class="language-plaintext highlighter-rouge">lhd : 'a llist -> 'a</code> and
<code class="language-plaintext highlighter-rouge">ltl : 'a llist -> 'a llist</code> return the head and the tail, respectively, of an <code class="language-plaintext highlighter-rouge">LCons</code> value. For an
<code class="language-plaintext highlighter-rouge">LNil</code> value, <code class="language-plaintext highlighter-rouge">lhd</code> returns a <a href="https://lawrencecpaulson.github.io/2021/12/01/Undefined.html">fixed arbitrary value</a> and <code class="language-plaintext highlighter-rouge">ltl</code> returns <code class="language-plaintext highlighter-rouge">LNil</code>. Then
the function <code class="language-plaintext highlighter-rouge">llast : 'a llist -> 'a</code> returns the last value of a finite lazy list. If there is no
such value, because the lazy list is either empty or infinite, <code class="language-plaintext highlighter-rouge">llast</code> returns a fixed arbitrary
value. Next, the function <code class="language-plaintext highlighter-rouge">prepend</code> concatenates a finite list and a lazy list:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fun prepend :: "'a list ⇒ 'a llist ⇒ 'a llist" where
"prepend [] ys = ys"
| "prepend (x # xs) ys = LCons x (prepend xs ys)"
</code></pre></div></div>
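<p>As a quick sanity check, here is a sketch of two facts about <code class="language-plaintext highlighter-rouge">prepend</code>. It assumes only the <code class="language-plaintext highlighter-rouge">llist</code> codatatype and the <code class="language-plaintext highlighter-rouge">prepend</code> function as defined above. The lemmas are left unnamed because they are illustrations, not necessarily the statements used in the formal development.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* prepend unfolds by its two defining equations *)
lemma "prepend [a, b] (LCons c LNil) = LCons a (LCons b (LCons c LNil))"
  by simp

(* prepending a concatenation is the same as prepending twice *)
lemma "prepend (xs @ ys) zs = prepend xs (prepend ys zs)"
  by (induct xs) simp_all
</code></pre></div></div>
<p>The second fact is the reason two adjacent finite prefixes can always be merged into a single one, which is convenient whenever several <code class="language-plaintext highlighter-rouge">prepend</code>s pile up in a goal.</p>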
<p>In the simulation proof, we do not work with arbitrary lazy lists but with nonempty lazy lists
whose consecutive elements are related by the small-step or big-step transition relation. To
capture this restriction, we use a coinductive predicate that characterizes such chains:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>coinductive chain :: "('a ⇒ 'a ⇒ bool) ⇒ 'a llist ⇒ bool" for R :: "'a ⇒ 'a ⇒ bool" where
"chain R (LCons x LNil)"
| "chain R xs ⟹ R x (lhd xs) ⟹ chain R (LCons x xs)"
</code></pre></div></div>
<p>The predicate has two introduction rules, one for singleton chains and one for longer chains. Had
we worked with finite lists instead of lazy lists, we would have written the same definition
replacing the <code class="language-plaintext highlighter-rouge">coinductive</code> keyword with <code class="language-plaintext highlighter-rouge">inductive</code>. The magic of coinduction allows us to apply
the second introduction rule infinitely often. This is necessary when proving that an infinite lazy
list forms a chain.</p>
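<p>For a finite chain, finitely many applications of the introduction rules suffice. The following sketch assumes the <code class="language-plaintext highlighter-rouge">chain</code> predicate as defined above; the successor relation is chosen purely for illustration, and the proof method is a plausible guess that may need adjusting.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* the lazy list 0, 1, 2 is a chain for the successor relation:
   apply the second rule twice, then the singleton rule *)
lemma "chain (λx y. y = Suc x) (LCons 0 (LCons (Suc 0) (LCons (Suc (Suc 0)) LNil)))"
  by (auto intro: chain.intros)
</code></pre></div></div>
<p>Neither rule concludes <code class="language-plaintext highlighter-rouge">chain R LNil</code>, which is how the definition captures the restriction to nonempty lazy lists.</p>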
<p>The big-step sequence should be a subsequence of the small-step sequence. We formalize coinductive
subsequences via the predicate <code class="language-plaintext highlighter-rouge">emb</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>coinductive emb :: "'a llist ⇒ 'a llist ⇒ bool" where
"lfinite xs ⟹ emb LNil xs"
| "emb xs ys ⟹ emb (LCons x xs) (prepend zs (LCons x ys))"
</code></pre></div></div>
<p>Our definition ensures that a finite lazy list cannot be embedded in an infinite one. In our
application, this matters because we want only finite small-step sequences to
simulate finite big-step sequences.</p>
<p>In Isabelle, a coinductive predicate <code class="language-plaintext highlighter-rouge">P</code> is accompanied by corresponding coinduction principles that
allow us to prove positive statements of the form <code class="language-plaintext highlighter-rouge">P ...</code>. For <code class="language-plaintext highlighter-rouge">chain</code> and <code class="language-plaintext highlighter-rouge">emb</code> we obtain the
following principles:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X xs ⟹
(⋀xs'. X xs' ⟹
  (∃x. xs' = LCons x LNil) ∨
  (∃xs x. xs' = LCons x xs ∧ (X xs ∨ chain R xs) ∧ R x (lhd xs))) ⟹
chain R xs

X xs ys ⟹
(⋀xs' ys'.
  X xs' ys' ⟹
  (∃ys. xs' = LNil ∧ ys' = ys ∧ lfinite ys) ∨
  (∃xs ys x zs. xs' = LCons x xs ∧ ys' = prepend zs (LCons x ys) ∧ (X xs ys ∨ emb xs ys))) ⟹
emb xs ys
</code></pre></div></div>
<p>These principles embody the fact that <code class="language-plaintext highlighter-rouge">chain</code> and <code class="language-plaintext highlighter-rouge">emb</code> are the greatest (“most true”) predicates
stable under the application of their respective introduction rules. For example, for <code class="language-plaintext highlighter-rouge">emb</code>, given a
binary relation <code class="language-plaintext highlighter-rouge">X</code> stable under <code class="language-plaintext highlighter-rouge">emb</code>’s introduction rules, any arguments satisfying <code class="language-plaintext highlighter-rouge">X</code> also
satisfy <code class="language-plaintext highlighter-rouge">emb</code>. Stability under introduction rules means that for any arguments <code class="language-plaintext highlighter-rouge">xs'</code> and <code class="language-plaintext highlighter-rouge">ys'</code>
satisfying <code class="language-plaintext highlighter-rouge">X</code> that correspond to the arguments of <code class="language-plaintext highlighter-rouge">emb</code> in either one of <code class="language-plaintext highlighter-rouge">emb</code>’s two introduction
rules, the arguments of the self-calls also satisfy <code class="language-plaintext highlighter-rouge">X</code>.</p>
<h3 id="the-main-theorem">The main theorem</h3>
<p>We are now ready to state our desired theorem:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lemma "chain R⇧+⇧+ xs ⟹ ∃ys. chain R ys ∧ emb xs ys ∧ lhd ys = lhd xs ∧ llast ys = llast xs"
</code></pre></div></div>
<p>In words, given a big-step sequence <code class="language-plaintext highlighter-rouge">xs</code> whose consecutive elements are related by the transitive
closure <code class="language-plaintext highlighter-rouge">R⇧+⇧+</code> of a relation <code class="language-plaintext highlighter-rouge">R</code>, there exists a small-step sequence <code class="language-plaintext highlighter-rouge">ys</code> whose consecutive
elements are related by <code class="language-plaintext highlighter-rouge">R</code>. The small-step sequence must embed, using <code class="language-plaintext highlighter-rouge">emb</code>, the big-step
sequence. In addition, the sequences’ first and last elements must coincide. If both sequences are
infinite, their last elements are equal by definition to the same fixed arbitrary value, as
explained above.</p>
<h3 id="the-proof">The proof</h3>
<p>To prove the theorem, we instantiate the existential quantifier with a witness, which we define
corecursively:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>corec wit :: "('a ⇒ 'a ⇒ bool) ⇒ 'a llist ⇒ 'a llist" where
"wit R xs = (case xs of LCons x (LCons y xs) ⇒
LCons x (prepend (pick R x y) (wit R (LCons y xs))) | _ ⇒ xs)"
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">wit</code> function fills the gaps between consecutive values of the big-step sequence with
arbitrarily chosen intermediate values that form finite chains. We use Hilbert’s choice operator to
construct these chains:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>definition pick :: "('a ⇒ 'a ⇒ bool) ⇒ 'a ⇒ 'a ⇒ 'a list" where
"pick R x y = (SOME xs. chain R (llist_of (x # xs @ [y])))"
</code></pre></div></div>
<p>Here, <code class="language-plaintext highlighter-rouge">llist_of</code> converts finite lists to lazy lists, which allows us to reuse the <code class="language-plaintext highlighter-rouge">chain</code>
predicate. The <code class="language-plaintext highlighter-rouge">pick</code> function is characterized by the following property:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lemma "R⇧+⇧+ x y ⟹ chain R (llist_of (x # pick R x y @ [y]))"
</code></pre></div></div>
<p>Going back to <code class="language-plaintext highlighter-rouge">wit</code>’s definition, we may wonder why Isabelle accepts it in the first place. The
definition is not obviously productive, a requirement that would ensure its totality. Productive
definitions generate at least one constructor after calling themselves. In our case, <code class="language-plaintext highlighter-rouge">LCons</code> is
that constructor, but <code class="language-plaintext highlighter-rouge">prepend</code> stands in the way and could potentially destroy constructors
produced by the self-call to <code class="language-plaintext highlighter-rouge">wit</code>. However, we know that <code class="language-plaintext highlighter-rouge">prepend</code> is friendly enough to only add
constructors and not to remove them. In Isabelle, we can register it as a <a href="https://doi.org/10.1007/978-3-662-54434-1_5">“friend”</a>, which
convinces our favorite proof assistant to accept the above definition.</p>
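<p>For concreteness, a registration along the following lines is what the paragraph above has in mind. This is only a sketch: it assumes the <code class="language-plaintext highlighter-rouge">friend_of_corec</code> command from Isabelle's <code class="language-plaintext highlighter-rouge">BNF_Corec</code> library, and both the exact shape of the equation and the closing proof steps are assumptions that may need adjusting.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(* restate prepend so that every branch visibly starts with a constructor;
   the two proof obligations are the equation itself and parametricity *)
friend_of_corec prepend where
  "prepend xs ys = (case xs of
      [] ⇒ (case ys of LNil ⇒ LNil | LCons y ys' ⇒ LCons y ys')
    | x # xs' ⇒ LCons x (prepend xs' ys))"
  by (simp split: list.split llist.split) transfer_prover
</code></pre></div></div>
<p>In a theory file, such a registration has to come before the <code class="language-plaintext highlighter-rouge">corec</code> definition of <code class="language-plaintext highlighter-rouge">wit</code>, since that is where the corecursive call is wrapped in <code class="language-plaintext highlighter-rouge">prepend</code>.</p>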
<p>It remains to prove the four conjuncts of our main theorem, taking <code class="language-plaintext highlighter-rouge">ys</code> to be <code class="language-plaintext highlighter-rouge">wit R xs</code>.</p>
<p>First, we prove <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ lhd (wit R xs) = lhd xs</code> by simple rewriting.</p>
<p>Second, we attempt to show <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ emb xs (wit R xs)</code> using <code class="language-plaintext highlighter-rouge">emb</code>’s coinduction principle.
To this end, Isabelle’s coinduction proof method instantiates <code class="language-plaintext highlighter-rouge">X</code> with the canonical relation
<code class="language-plaintext highlighter-rouge">λxs ys. ys = wit R xs ∧ chain R⇧+⇧+ xs</code>. After some simplification, we arrive at a goal requiring us to prove
<code class="language-plaintext highlighter-rouge">(∃zs. LCons x (prepend (pick R x y) (wit R (LCons y xs))) = prepend zs (LCons x (wit R (LCons y xs))))</code>,
whose two sides have the <code class="language-plaintext highlighter-rouge">prepend</code> in different positions (on the left-hand side after the <code class="language-plaintext highlighter-rouge">x</code>, on the right-hand side before it). We would like to insert a second <code class="language-plaintext highlighter-rouge">prepend zs'</code> (where <code class="language-plaintext highlighter-rouge">zs'</code> would be existentially quantified) on the right-hand side, so that we can instantiate <code class="language-plaintext highlighter-rouge">zs</code> with the empty list and <code class="language-plaintext highlighter-rouge">zs'</code> with <code class="language-plaintext highlighter-rouge">pick R x y</code>, making both sides equal.
We can achieve this by modifying <code class="language-plaintext highlighter-rouge">X</code> to be <code class="language-plaintext highlighter-rouge">λxs ys. ∃zs'. ys = prepend zs' (wit R xs) ∧ chain R⇧+⇧+ xs</code>.
A more principled alternative is to manually derive the following generalized coinduction principle, which inserts <code class="language-plaintext highlighter-rouge">prepend zs'</code> at the right place:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X xs ys ⟹
(⋀xs' ys'.
X xs' ys' ⟹
(∃ys. xs' = LNil ∧ ys' = ys ∧ lfinite ys) ∨
(∃xs ys x zs zs'. xs' = LCons x xs ∧ ys' = prepend zs (LCons x (prepend zs' ys)) ∧ (X xs ys ∨ emb xs ys))) ⟹
emb xs ys
</code></pre></div></div>
<p>This approach is an instance of a general technique called <a href="https://doi.org/10.1017/CBO9780511792588.007">coinduction up to</a>.</p>
<p>Third, we need to prove <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ llast (wit R xs) = llast xs</code>. Since we now know that
<code class="language-plaintext highlighter-rouge">emb xs (wit R xs)</code> holds, by definition of <code class="language-plaintext highlighter-rouge">emb</code> only two cases are possible. Either both
<code class="language-plaintext highlighter-rouge">wit R xs</code> and <code class="language-plaintext highlighter-rouge">xs</code> are finite lazy lists, in which case the property follows by induction, or both
are infinite, in which case their last elements are equal to the notorious fixed arbitrary value.</p>
<p>Fourth, when attempting to prove <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ chain R (wit R xs)</code>, we run into a similar issue
as in the proof of the second conjunct. The resolution is also similar. We manually derive a
coinduction-up-to principle for <code class="language-plaintext highlighter-rouge">chain</code> with respect to <code class="language-plaintext highlighter-rouge">prepend</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X xs ⟹
(⋀xs'. X xs' ⟹
(∃x. xs' = LCons x LNil) ∨
(∃xs x zs'. xs' = LCons x (prepend zs' xs) ∧ (X xs ∨ chain R xs) ∧ chain R (llist_of (x # zs' @ [lhd xs])))) ⟹
chain R xs
</code></pre></div></div>
<p>This principle additionally involves a generalization of the side condition <code class="language-plaintext highlighter-rouge">R x (lhd xs)</code> to
<code class="language-plaintext highlighter-rouge">chain R (llist_of (x # zs' @ [lhd xs]))</code> to incorporate <code class="language-plaintext highlighter-rouge">zs'</code>.</p>
<h3 id="conclusion">Conclusion</h3>
<p>We successfully solved the coinductive puzzle that arose during our verification of an automatic
theorem prover. At its core, the puzzle has little to do with theorem proving; instead, it is about
refinement of possibly nonterminating transition systems. Our proof can be found in the <a href="https://devel.isa-afp.org/sessions/ordered_resolution_prover/#Lazy_List_Chain.html#Lazy_List_Chain.chain_tranclp_imp_exists_chain|fact">AFP</a>. On the plus side, Isabelle conveniently
allowed us to define all the functions and predicates we needed to carry out the proof, including
functions whose productivity relied on “friends”. On the minus side, the proof of an easy-looking
theorem required some ingenuity. In particular, we found ourselves deriving coinduction-up-to
principles for coinductive predicates manually to use them for definitions involving “friends”. An
avenue for future work would be to derive such principles automatically.</p>
Wed, 08 Nov 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/11/08/CoinductivePuzzle.html
https://lawrencecpaulson.github.io//2023/11/08/CoinductivePuzzle.html
What do we mean by "the foundations of mathematics"?
<p>The phrase “foundations of mathematics” is bandied about frequently these days,
but it’s clear that there is widespread confusion about what it means.
Some say that a proof assistant must be based on a foundation of mathematics,
and therefore that the foundations of mathematics refers to some sort of formal system.
And yet, while set theory is frequently regarded as <em>the</em> foundation of mathematics,
none of the mainstream proof assistants are based on set theory.
These days we see everything from category theory to homotopy type theory
described as a possible foundation of mathematics.
There is a lot of wrongness here.</p>
<h3 id="what-we-mean-by-the-foundations-of-mathematics">What do we mean by the foundations of mathematics?</h3>
<p>N. G. de Bruijn made the remarkable claim</p>
<blockquote>
<p>We do not possess a workable definition of the word “mathematics”. (AUT001, p. 4)</p>
</blockquote>
<p>He seemed to be referring primarily to the difficulty of defining
<em>mathematical reasoning</em>, but the dictionary definition – “the abstract science of number, quantity, and space” – does not begin to scratch the surface
of the topics studied by mathematicians, such as groups, abstract topologies,
graphs or even finite sets. If we can’t define mathematics, neither can we define
the notion of mathematical foundations.</p>
<p>One solution to this difficulty is to say, “I can’t define it but I know what it is
when I see it”. This has famously been applied to pornography and even there does not
settle the question in the case of something like
Titian’s <a href="https://en.wikipedia.org/wiki/Venus_of_Urbino">Venus of Urbino</a>.
Mathematical reasoning can be wrong or doubtful while still being great mathematics;
Newton and Euler used infinitesimals and other methods generally rejected today.
Crank attempts to square the circle or prove the Riemann hypothesis
often look like mathematics while saying nothing.</p>
<p>The foundations of mathematics is concerned with questions of the form
“does this even make sense”? It seems to be triggered by periodic crises:</p>
<ul>
<li>the existence of irrational numbers</li>
<li>Berkeley’s <a href="https://plato.stanford.edu/entries/continuity/">criticism of infinitesimals</a></li>
<li>the infinite</li>
<li>the discovery of non-Euclidean geometries</li>
<li>Russell’s paradox (1901) and many others</li>
</ul>
<p>The story of Pythagoras trying to suppress the shocking
discovery of irrational numbers such as $\sqrt2$,
the ratio of the diagonal of a square to its side, is probably mythical.
But it seems that <a href="https://plato.stanford.edu/entries/dedekind-foundations/">they noticed</a>:</p>
<blockquote>
<p>The Greeks’ response to this startling discovery culminated in Eudoxos’ theory of ratios and proportionality, presented in Chapter V of Euclid’s Elements.</p>
</blockquote>
<p>The nature of the real numbers was still not clear in the 19th century.
Richard Dedekind devoted himself to this problem,
inventing the famous <a href="https://en.wikipedia.org/wiki/Dedekind_cut">Dedekind cuts</a>:
downwards-closed sets of rational numbers. Cantor independently chose to define
real numbers as equivalence classes of Cauchy sequences.
The point is not that a real number <em>is</em> either of those things, but simply that
we can present specific constructions exhibiting the behaviour expected of the real numbers.</p>
<p>Cantor’s work on set theory is well known. Dedekind also made major contributions, in his
<a href="https://plato.stanford.edu/entries/dedekind-foundations/"><em>Was Sind und Was Sollen die Zahlen</em></a>.
Their aim was finally to pin down the precise meaning of concepts such as function,
relation and class, and how to make sense of infinite collections and infinite constructions.</p>
<p>Berkeley’s attack on infinitesimals resulted in a concerted effort to banish them
in favour of $\epsilon$-$\delta$ arguments (hated by many), which remind me of
challenge-response protocols in computer science. As I’ve <a href="/2022/08/10/Nonstandard_Analysis.html">noted previously</a> on this blog,
today – thanks to set theory – we have the theoretical tools to place infinitesimals
on a completely rigorous basis.</p>
<h3 id="the-paradoxes-and-the-solutions">The paradoxes and the solutions</h3>
<p>The <a href="https://plato.stanford.edu/entries/settheory-early/#CritPeri">paradoxes of set theory</a>,
discovered around the turn of the 20th century, aroused huge disquiet. Although I have
posted this exact quote <a href="/2023/04/12/Wittgenstein.html">previously</a>, there is no better description than Gödel’s:</p>
<blockquote>
<p>By analyzing the paradoxes to which Cantor’s set theory had led, he freed them from all mathematical technicalities, thus bringing to light the amazing fact that our logical intuitions (i.e., intuitions concerning such notions as: truth, concept, being, class, etc.) are self-contradictory.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>Russell’s paradox was seen as a potentially fatal blow to much of the 19th century
foundational work, including that of Frege, Dedekind and Cantor.
Russell (and Whitehead) decided to continue in the spirit of Frege’s <em>Logicist</em>
programme of reducing mathematics to logic. But does this make sense? I can imagine
de Bruijn saying</p>
<blockquote>
<p>We do not possess a workable definition of the word “logic”.</p>
</blockquote>
<p>The system they created in the multivolume
<a href="https://plato.stanford.edu/entries/principia-mathematica/"><em>Principia Mathematica</em></a>
was needlessly complicated and in some respects curiously imprecise.
But it led to today’s first-order logic and especially higher-order logic.
Russell and Whitehead formalised some chunks of mathematics in great detail
and with immense tedium, but they could not have predicted how powerful
their system would turn out to be.</p>
<p>Many philosophers had contemplated the essence of mathematics in prior centuries,
but the crisis gave the issues urgency. Roughly speaking, there are three main schools of thought:</p>
<ul>
<li>The <em>Platonist</em> or <em>realist</em> viewpoint: ideal mathematical objects, such as the complex plane, exist objectively and independently of us, though we may deduce their properties. Gödel held this view.</li>
<li>The <em>formalist</em> viewpoint: mathematics is concerned with symbols. For Hilbert,
I think that <a href="https://plato.stanford.edu/entries/hilbert-program/">his programme</a>
was a technical approach to abolish the paradoxes rather than
an expression of his true beliefs. How can one person adhere to
a <a href="https://plato.stanford.edu/entries/hilbert-program/#2">finitary point of view</a>
and simultaneously describe Cantor’s world of transfinite ordinals and cardinals
as a paradise? But it seems that others, such as Curry, regarded mathematics
as nothing but a symbolic game.</li>
<li>The <a href="https://plato.stanford.edu/entries/intuitionism/"><em>intuitionists</em></a> held that mathematical objects were nothing but creations of the human mind.
This gave them a radical attitude to proof and the wholesale rejection of many techniques and concepts regarded by others as indispensable.
Their rejection of the reality of mathematical objects and their stance against
symbolic formulas (other than as a means of communicating ideas)
set them firmly against the other schools.</li>
</ul>
<p>It seems clear from the reactions of Frege, Russell, Hilbert, Brouwer and many others
that the paradoxes constituted an emergency. Russell’s “vicious circle principle”
and his solution, namely ramified type theory, Brouwer’s intuitionism and Hilbert’s formalism
– these were the equivalent of burning all your clothes and furniture upon the discovery of bedbugs.
That the solution could lie in something as simple as
<a href="https://plato.stanford.edu/entries/zermelo-set-theory/">Zermelo’s separation axiom</a>
and the conception of the <a href="/papers/Boolos-iterative-concept-of-set.pdf">cumulative hierarchy of sets</a><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
was seemingly not anticipated.
It was a miracle.</p>
<h3 id="modern-foundations-of-mathematics">Modern foundations of mathematics</h3>
<p>Today one commonly sees all kinds of things described as “foundations of mathematics”,
especially category theory and type theory. Foundational work has definitely been done
within the framework of category theory, but that is not the same thing as saying that
category theory itself is foundational. The objects in category theory are equipped with
structure and the morphisms between objects are structure preserving, just as we have homomorphisms between groups and continuous maps between topological spaces.
By contrast, classical sets have no notion of structure beyond the membership relation,
which we might regard as bare metal.
Since a large part of mathematics is concerned with structure,
category theory is a natural fit.
That does not mean, however, that it addresses foundational issues.
It tends rather to introduce new ones, especially because of its unfortunate and
needless habit of assuming the existence of proper classes everywhere.
Far from replacing set theory, it relies on it.</p>
<p>As to whether type theory is foundational, we need to ask which type theory you are talking about:</p>
<ul>
<li>Principia Mathematica: of course, that was its precise purpose. Gödel’s essay, <a href="/papers/Russells-mathematical-logic.pdf">Russell’s mathematical logic</a>,
is an indispensable source on this and related topics.</li>
<li>Church’s simple type theory: the granddaughter of PM, it is equally expressive and a lot simpler.</li>
<li>Automath: absolutely not. De Bruijn consistently referred to it as “a <em>language</em> for mathematics”. He moreover said it was “like a big restaurant that serves all sorts of food: vegetarian, kosher, or anything else the customer wants”.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> Automath was, by design, neutral to foundational choices. (Isabelle/Pure is in the same spirit.)</li>
<li>Martin-Löf type theory: he himself said it was intended as a vehicle for formalising Bishop-style analysis, clearly a foundational claim. But one that rejects the vast majority of modern mathematics.</li>
<li>Calculus of inductive constructions (Coq, Lean): the original paper (describing a weaker system) begins “The calculus of constructions is a higher-order formalism for constructive proofs in natural deduction style,” and the paper makes no foundational claims.
Coquand’s <a href="https://www.cse.chalmers.se/~coquand/v1.pdf">retrospective paper</a> makes no such claims either.
Since it turns out to be significantly stronger than ZF set theory, one could even say it makes foundational assumptions.</li>
</ul>
<p>The world has moved on. People no longer worry about the issues that were
critical in the 19th century: the role of the real numbers, the role of infinity,
the status of infinitesimals, the very consistency of mathematics.
And the reason is simple: Herculean work in the 19th and 20th centuries
largely banished those issues from our minds.</p>
<p>This achievement doesn’t seem to be much appreciated today.
Instead of “each real number can be understood as a set of rational numbers, and more generally, the most sophisticated mathematical constructions can be reduced
to a handful of simple principles”
people say “we are asked to believe that everything is a set”
and even “set theory is just another formal system”.</p>
<p>I give up.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Kurt Gödel, <a href="https://doi.org/10.1017/CBO9781139171519.024">Russell’s mathematical logic</a>. <em>In</em>: P Benacerraf, H Putnam (eds), <em>Philosophy of Mathematics: Selected Readings</em> (CUP, 1984), 447–469 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>George Boolos, <a href="https://doi.org/10.1017/CBO9781139171519.026">The iterative concept of set</a>. <em>Ibid</em>, 486–502 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>N.G. de Bruijn. A Survey of the Project Automath. <em>In</em>: R. P. Nederpelt, J. H. Geuvers, & R. C. de Vrijer (Eds.), <em>Selected Papers on Automath</em> (North-Holland, 1994), 144 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 01 Nov 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/11/01/Foundations.html
The concept of proof within the context of machine mathematics<p>This post is prompted by a preprint, <a href="https://doi.org/10.48550/arXiv.2309.11457">Automated Mathematics and the Reconfiguration of Proof and Labor</a>,
recently uploaded by <a href="https://ochigame.org">Rodrigo Ochigame</a>.
It begins by contrasting two opposing ideals of proof (what any computer scientist
would call <em>top-down</em> versus <em>bottom-up</em>) and then asks how they might have to be modified
in a possible future in which mathematics is automated.
To my way of thinking, his outlook is too negative.</p>
<h3 id="the-ideals-of-proof">The ideals of proof</h3>
<p>The two ideals, which Ochigame attributes to <a href="https://www.pet.cam.ac.uk/news/professor-ian-macdougall-hacking-1936-2023">Ian Hacking</a>,
are as follows:</p>
<ul>
<li><em>Cartesian ideal of proof</em>: “after some reflection and study, one totally understands the proof, and can get it in one’s mind ‘all at once’”</li>
<li><em>Leibnizian ideal of proof</em>: “every step is meticulously laid out, and can be
checked, line by line, in a mechanical way”</li>
</ul>
<p>I feel divided, because I seldom feel capable of understanding a proof all at once,
and yet, having instead checked a lengthy proof line by line and got to QED,
I feel no more enlightened than before. Perhaps many people feel this way,
and look for some compromise where they have a good idea about the mathematical tools
that were deployed in the proof, and just to be careful, meticulously verify
certain tricky or suspect calculations.</p>
<p>Ochigame himself explores a number of variations of these ideals in order to take into account
modern day complications such as phenomenally long, complex or specialised proofs.
He then outlines the history of the mechanisation of mathematical proof,
beginning with <a href="/2021/11/03/AUTOMATH.html">AUTOMATH</a>
and Mizar, and concluding with today’s systems, such as Lean and Isabelle.
Regarding these as <em>proof checkers</em> (where we are “verifying existing results”),
he then briefly outlines the history of automated theorem proving,
beginning with the work of Newell and Simon and mentioning <a href="https://lawrencecpaulson.github.io/tag/Hao_Wang">Hao Wang</a>.
And now I feel obliged to mention again that while Newell and Simon got all the glory
as AI pioneers, Wang’s system was on another planet when it came to capability.
That’s because Wang actually understood logic.
The AI world has often been driven by motives quite different from
the actual competence of a particular AI system (see also <a href="https://en.wikipedia.org/wiki/SHRDLU">SHRDLU</a>: the importance of having a demo).</p>
<h3 id="the-role-of-computer-encoded-proofs">The role of computer-encoded proofs</h3>
<p>Since most theorem provers work by reducing every claim
to a string of low-level inferences in some built-in calculus,
and since they don’t understand anything, we expect them to be firmly on the Leibnizian side.
Ochigame proposes the following:</p>
<ul>
<li><em>Practical standard of computer-encoded proofs</em>: every step can be checked by a computer program and derived from the axiomatic foundations of the program; and after some study, one understands or trusts the encoding of the proven statement.</li>
</ul>
<p>This formulation is natural enough, but I can imagine that mathematicians would be
dissatisfied: it gives them no way to survey the proof themselves.
They are forced to trust the computer program, its axiomatic foundations
and even the underlying hardware, and realistically, they are going to have
to trust the encoding of the proven statement as well.</p>
<p>Isabelle has supported
legibility since Makarius Wenzel introduced
his <a href="https://rdcu.be/dngL4">Isar structured language</a> in 1999.
Through this blog I have published <a href="https://lawrencecpaulson.github.io/tag/examples">numerous examples</a>
to demonstrate how much legibility you can obtain if you try.
Too often, people don’t try. Incidentally, there is nothing about Isar inherently
specific to Isabelle/HOL: it works for all of Isabelle’s incarnations,
and I believe it could be adopted by Lean or Coq without modifying the underlying formalism.
The chief difficulty is that a more sophisticated user interface would be required;
an Isar proof is not simply a series of tactic invocations.</p>
<p>My ALEXANDRIA colleagues and I have formalised an enormous amount
of advanced mathematics, but we were never satisfied with formalisation alone;
we wanted our proofs to be legible. A mathematician still has to learn
the Isabelle notation, but then should be able to read the proof
without the aid of a computer. With existing automation, the computer
seldom sees further than a mathematician, rather the opposite:
we have to spell out many things
that a mathematician would find obvious.
At the moment, the chief exceptions are lengthy calculations and, occasionally, large case analyses. If the time ever came that automation could find truly deep proofs,
we would have to insist that it delivered intelligible justifications.</p>
<h3 id="the-future-of-formalised-mathematics">The future of formalised mathematics</h3>
<p>Ochigame presents a bleak future in which formalisation becomes obligatory
for mathematicians, with formalisers distinct from the mathematicians themselves
and forming an underclass. The military origins of formal verification
are also mentioned, in a vaguely ominous way.</p>
<p>I see the future differently. As proof assistants become more useful,
and as more mathematicians become aware of them, their use will grow organically.
Journals may eventually start to request formalisations of some material,
but it’s likely that there will always be mathematics not easily formalisable
in any existing system.</p>
<h3 id="and-another-thing-why-is-it-always-about-proofs">And another thing: why is it always about proofs?</h3>
<p>Mathematics is too often presented as a discipline in which axioms
are laid down and theorems proved from them. Sometimes, axioms are even conflated
with beliefs, but I’m not going there today. Instead I would like to remark
(as I have <a href="/2023/04/12/Wittgenstein.html">done before</a>)
that the genius in mathematics typically lies in the definitions, not in the proofs.
For example, <a href="https://en.wikipedia.org/wiki/Szemerédi_regularity_lemma">Szemerédi’s regularity lemma</a>
has a straightforward proof (some calculations and an induction)
relying on an extraordinary string of definitions.
Why should we care about edge density? How did he come up with ε-regular pairs of sets,
ε-regular partitions, the energy of a partition?
How did he come up with the theorem statement?
His genius was grasping the importance of these concepts.</p>
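<p>For readers who have not met these notions, here is a brief sketch of the definitions in one common notation. The symbols $e$, $d$, $q$ and $V_1,\ldots,V_k$ are assumptions made for this illustration, and textbooks differ on details such as strict versus non-strict inequalities. For disjoint nonempty vertex sets $A$ and $B$ of a finite graph with vertex set $V$, writing $e(A,B)$ for the number of edges between them, the edge density is</p>
\[ d(A,B) = \frac{e(A,B)}{|A|\,|B|}. \]
<p>The pair $(A,B)$ is ε-regular if, for all $X\subseteq A$ and $Y\subseteq B$ with $|X|\ge\varepsilon|A|$ and $|Y|\ge\varepsilon|B|$,</p>
\[ |d(X,Y) - d(A,B)| \le \varepsilon. \]
<p>A partition $V_1,\ldots,V_k$ of $V$ is ε-regular if all but a small proportion of its pairs of parts (measured by $\varepsilon$) are ε-regular, and its energy is the weighted mean square density</p>
\[ q(V_1,\ldots,V_k) = \sum_{i,j} \frac{|V_i|\,|V_j|}{|V|^2}\, d(V_i,V_j)^2, \]
<p>where conventions vary on which pairs $(i,j)$ the sum ranges over. Since every density lies in $[0,1]$ and the weights sum to at most 1, the energy is bounded by 1. That bound is what lets an induction over successive refinements terminate.</p>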
<p>The central importance of definitions like these gives something of a pass
to those proof assistants (most of them) that don’t support legible proofs:
if the definitions are there, you must be on the right track.</p>
<h3 id="postscript">Postscript</h3>
<p>I have a distant memory of NG de Bruijn (visiting Caltech in 1977) describing the “mathematics assembly-line”. He wrote down the word Genius, then an arrow pointing to “first-rate mathematician”, then I believe a further arrow pointing to “student”, a further arrow pointing to “journal” and there he drew a little tombstone. To my mind this conjures up the genius who has the ideas and the junior colleagues who fill in the details to make the work publishable.
(And yet, he himself seems to have published almost exclusively as <a href="https://www.semanticscholar.org/author/de-Ng-Dick-Bruijn/66031417/">sole author</a>.)
Conceivably, formalisation will begin to play some role on the journey from the genius to the grave.</p>
Wed, 04 Oct 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/10/04/Ochigame.html
The End (?) of the ALEXANDRIA project<p>Today marks the final day of the <a href="https://www.cl.cam.ac.uk/~lp15/Grants/Alexandria/">ALEXANDRIA</a> project.
I outlined a brief history of the project
<a href="/2023/04/27/ALEXANDRIA_outcomes.html">not long ago</a>.
It is nevertheless right to take a moment to thank
the European Research Council
<a href="https://cordis.europa.eu/project/id/742178">for funding it</a>
and to state, yet again, what the outcomes were.
Six years on, what have we learned?</p>
<h3 id="how-it-started">How it started</h3>
<p>A milestone for the start of the project is the <a href="https://www.newton.ac.uk/event/bpr/"><em>Big Proof</em> programme</a>,
organised by the Newton Institute in Cambridge. Its theme mentioned two then recent
and widely-admired achievements:</p>
<blockquote>
<p>Interactive proof assistants have been used to check complicated mathematical proofs such as those for the Kepler’s conjecture and the Feit-Thompson odd order theorem.</p>
</blockquote>
<p>It then refers to</p>
<blockquote>
<p>the challenges of bringing proof technology into mainstream mathematical practice</p>
</blockquote>
<p>and it lists specifically</p>
<ol>
<li>Novel pragmatic foundations for representing mathematical knowledge and vernacular inspired by set theory, category theory, and type theory.</li>
<li>Large-scale formal mathematical libraries that capture background knowledge spanning a range of domains</li>
</ol>
<p>A proposal for a programme
devoted entirely to <a href="https://homotopytypetheory.org">homotopy type theory</a> (HoTT)
had been rejected, but people from that community were invited to <em>Big Proof</em>.
Dependent type theory, whether HoTT or the already established type theory of Coq,
was widely assumed to be the future of the formalisation of mathematics.
I felt very lucky to get funding for a project involving simple type theory
and <a href="https://isabelle.in.tum.de">Isabelle/HOL</a>.</p>
<p>During the programme, prior formalisation efforts were criticised as lacking sophistication.
As Kevin Buzzard pointedly noted,
researchers had formalised long proofs about simple objects, but no one had formalised
<em>even the definitions</em> of more complicated objects used every day,
such as Grothendieck schemes.
Much existing work formalised 19th-century mathematics.</p>
<p>These complaints would have to be tackled.</p>
<h3 id="how-it-went">How it went</h3>
<p>I chronicled the project in my <a href="/2023/04/27/ALEXANDRIA_outcomes.html">previous post</a>.
Briefly: we formalised heaps of mathematics.
We also did groundbreaking work on applications of information retrieval and machine learning
to formalisation.
A longer and more formal account can be found <a href="https://arxiv.org/abs/2305.14407">on arXiv</a>.</p>
<h3 id="how-it-ended-formalisation-of-mathematics">How it ended (formalisation of mathematics)</h3>
<p>The sheer amount of new formalised material is impressive (and the quality is also high):</p>
<ul>
<li>formalisations of advanced mathematics, including the first ever on topics such as additive combinatorics, combinatorial block designs and ordinal partition theory</li>
<li>showing that dependent types aren’t necessary to have sophisticated objects like Grothendieck schemes or ω-categories</li>
<li>tens of thousands of lines of more basic but necessary library material, e.g. on metric and topological spaces (imported from HOL Light)</li>
<li>we formalised advanced work from some of the leading mathematicians of the age: Erdős, Gowers, Roth, Szemerédi</li>
</ul>
<p>We developed some highly fruitful techniques:</p>
<ul>
<li><a href="https://rdcu.be/dkoEr">locales</a> work exceptionally well for <a href="https://www.tandfonline.com/doi/full/10.1080/10586458.2022.2062073">structuring complicated hierarchies of definitions</a></li>
<li>“dependent” constructions can typically be formalised as families of (typed) sets</li>
</ul>
<p>We arrived at some surprising conclusions:</p>
<ul>
<li>Formalising even advanced mathematics is largely a matter of perseverance.</li>
<li>Combining material from different branches of mathematics, say probability theory and graph theory or complex analysis and set theory, works fine.</li>
<li>Dependent types aren’t necessary and probably aren’t even advantageous. We aren’t the ones fighting our formalism.</li>
</ul>
<p>To be fair, <a href="https://xenaproject.wordpress.com/2020/12/05/liquid-tensor-experiment/">astonishing progress</a> has also been made by the <a href="https://leanprover.github.io">Lean</a> community.
They have been extremely active over the same period
and formalised <a href="https://leanprover-community.github.io">mountains of material</a>.</p>
<p><strong>We can safely conclude that proof assistants already offer value to mathematicians.</strong>
Although full formalisation is still not really affordable,
neither is it necessary.
You can forgo proving the results that you feel confident about,
focusing your formalisation efforts on the problematical parts.</p>
<h3 id="how-it-ended-ai-techniques">How it ended (AI techniques)</h3>
<p>The proposal included a lot of speculative ideas about search
and auto completion, in particular by somehow mining
the existing libraries for “proof idioms”.
Writing the proposal in 2016, I had no idea how such things could be done.
I was lucky to attract people who were prepared to apply their specialised knowledge.
That’s how we got</p>
<ul>
<li>the <a href="https://behemoth.cl.cam.ac.uk/search/">SErAPIS search engine</a>, a one-of-a-kind tool to search the libraries even on the basis of abstract mathematical concepts</li>
<li>a tremendous amount of infrastructure to analyse the Isabelle libraries and extract information</li>
<li>a string of advanced papers on proof synthesis, auto-formalisation, an Isabelle parallel corpus and more</li>
</ul>
<p>These projects are still at the research stage, but show great promise!</p>
<h3 id="spreading-the-word">Spreading the word</h3>
<p>For more detail and links relating to everything described above,
you can visit the <a href="https://www.cl.cam.ac.uk/~lp15/Grants/Alexandria/">ALEXANDRIA</a> webpage
or read the <a href="https://arxiv.org/abs/2305.14407">project summary</a>.</p>
<p>The team has worked hard to share the knowledge we discovered. We have written</p>
<ul>
<li>13 journal articles, including half (3 out of 6) of a special issue of <em>Experimental Mathematics</em></li>
<li>15 articles in conference proceedings</li>
<li>2 refereed chapters in a <a href="https://link.springer.com/book/10.1007/978-3-030-15655-8"><em>Synthese Library</em> volume</a></li>
<li>33 formal proof developments accepted to Isabelle’s <a href="https://www.isa-afp.org"><em>Archive of Formal Proofs</em></a></li>
</ul>
<p>More are forthcoming.
In addition, we’ve worked on formalisation projects with about two dozen interns and students,
many of whom have gone on to do PhD research. We’ve given dozens of talks at a variety of venues. We are open to collaboration to take our work forward.</p>
Thu, 31 Aug 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/08/31/ALEXANDRIA_finished.html
Propositions as types: explained (and debunked)<p>The principle of <em>propositions as types</em> (a.k.a. <a href="https://en.wikipedia.org/wiki/Curry–Howard_correspondence">Curry-Howard isomorphism</a>)
is much discussed, but there’s a lot of confusion and misinformation.
For example, it is widely believed that propositions as types is the basis of most modern proof assistants;
even, that it is necessary for any computer implementation of logic.
In fact, propositions as types was found to be unworkable
as the basis for conducting actual proofs
the first time it was tried, in the earliest days of the AUTOMATH system.
All of the main proof assistants in use today maintain a clear distinction
between propositions and types.
The principle is nevertheless elegant, beautiful and theoretically fruitful.</p>
<h3 id="material-implication-versus-intuitionistic-truth">Material implication versus intuitionistic truth</h3>
<p>The most natural route to propositions as types runs through <em>material implication</em>.
“If it rained then the path will be muddy” sounds like a reasonable instance
of logical implication.
“If Caesar was a chain-smoker then mice kill cats” does not sound reasonable, and yet it is deemed to be true,
at least in classical logic, where $A\to B$ is simply an abbreviation for
$\neg A\lor B$.</p>
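<p>The classical reading can be made concrete with a small Lean 4 sketch. Lean and the theorem names <code>orToImp</code> and <code>impToOr</code> are assumptions chosen for illustration; only the core library is used. One direction of the equivalence between $A\to B$ and $\neg A\lor B$ needs no classical axiom, while the other appeals to the excluded middle:</p>
<pre><code class="language-lean">-- From ¬A ∨ B we obtain A → B without any classical axiom.
theorem orToImp (A B : Prop) : (¬A ∨ B) → (A → B) :=
  fun h a => h.elim (fun na => absurd a na) (fun b => b)

-- The converse uses the excluded middle (Classical.em).
theorem impToOr (A B : Prop) : (A → B) → (¬A ∨ B) :=
  fun f => (Classical.em A).elim
    (fun a => Or.inr (f a))
    (fun na => Or.inl na)
</code></pre>
<p>So it is only under classical assumptions that implication collapses into the disjunction.</p>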
<p>Many people have thought that $A\to B$ should hold only if there is some sort
of connection between $A$ and $B$, and many different interpretations of $\to$ have been tried.
The most convincing interpretation comes from the intuitionists,
specifically, from Heyting’s
<a href="https://plato.stanford.edu/entries/intuitionistic-logic-development/#ProoInte">conception of mathematical truth</a> itself:</p>
<blockquote>
<p>Here, then, is the Brouwerian assertion of $p$: It is known how to prove $p$. We will denote this by $\vdash p$. The words “to prove” must be taken in the sense of “to prove by construction”. … $\vdash \neg p$ will mean: “It is known how to reduce $p$ to a contradiction”.</p>
</blockquote>
<p>Propositions as types is already contained in this principle: we identify
each proposition with the set of the mathematical constructions that make it true.
The word <em>proof</em> is often used in place of <em>construction</em>,
but these constructions are not proofs in some formal calculus.</p>
<p>In the case of implication, we now have</p>
<ul>
<li>a construction of $A\to B$ is a function that effectively transforms a construction of $A$ into a construction of $B$</li>
</ul>
<p>This function surely is the sought-for connection between $A$ and $B$.</p>
<h3 id="prositions-as-types-in-action">Propositions as types in action</h3>
<p>We can codify the principle above by asserting a rule of inference that derives
\(\lambda x. b(x) : A\to B\)
provided $b(x):B$ for arbitrary $x:A$.
If we regard $A\to B$ as a type, then this is one of the typing rules
for the <a href="https://en.wikipedia.org/wiki/Simply_typed_lambda_calculus">λ-calculus</a>.
And if we regard $A\to B$ as a formula, then
(ignoring the constructions) this is the introduction rule for implication
in a standard system of <a href="https://plato.stanford.edu/entries/natural-deduction/">natural deduction</a>,
proving $A\to B$ provided that $B$ can be proved assuming $A$.</p>
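<p>As a small illustration of this double reading, here is a Lean 4 sketch (core library only; the names <code>composeFn</code> and <code>chainImp</code> are hypothetical). The same λ-term is accepted once as a program, composing two functions, and once as a proof that implication is transitive:</p>
<pre><code class="language-lean">-- Read as a typing judgement: a program composing two functions.
def composeFn (A B C : Type) : (A → B) → (B → C) → (A → C) :=
  fun f g x => g (f x)

-- Read as natural deduction: assume A → B, B → C and A, then conclude C.
theorem chainImp (A B C : Prop) : (A → B) → (B → C) → (A → C) :=
  fun f g x => g (f x)
</code></pre>
<p>Each <code>fun</code> corresponds to one use of the implication introduction rule, and each application to one use of modus ponens.</p>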
<p>Setting aside natural deduction for the moment, we can codify
the intuitionistic idea of implication rather differently.
A simple proof system for intuitionistic propositional logic has just two axioms:</p>
<ul>
<li>axiom K: $\quad A\to(B\to A)$</li>
<li>axiom S: $\quad(A\to(B\to C))\to ((A\to B)\to(A\to C))$</li>
</ul>
<p>And it has one inference rule, <em>modus ponens</em>, which from $A\to B$ and $A$
infers $B$. Here is a proof of $A\to A$:</p>
\[\begin{align}
(A\to((D\to A)\to A))\to{} & \\
((A\to (D\to A))\to(A\to A)) & \quad\text{by S}\notag \\[1ex]
A\to((D\to A)\to A) & \quad\text{by K} \\
(A\to (D\to A))\to(A\to A) & \quad\text{by MP, (1), (2)} \\
A\to (D\to A) & \quad\text{by K} \\
A\to A & \quad\text{by MP, (3), (4)}
\end{align}\]
<p>As a proof system, it sucks. But the propositions as types principle holds: this is essentially the same as the <strong>S</strong>-<strong>K</strong> <a href="https://en.wikipedia.org/wiki/Combinatory_logic">system of combinators</a>.
Function application corresponds to modus ponens,
the combinators correspond to the axioms (which give their types),
and the derivation of the identity combinator
as <strong>SKK</strong> corresponds to the proof above (with $A\to A$ as the type of <strong>I</strong>). The system of combinators also sucks:
it can be used to translate any λ-calculus term into combinators, but the blowup is exponential (exactly as with the proof system).
These observations are Curry’s—except he thought combinators were rather good—and Howard would not come along for a couple of decades.</p>
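<p>The correspondence can be checked mechanically. The following Lean 4 sketch (core library only; the namespace and the names <code>axK</code>, <code>axS</code> and <code>identity</code> are assumptions for illustration) states the two axioms as types, inhabits them with the usual combinators, and derives $A\to A$ with the same instantiations as the proof above:</p>
<pre><code class="language-lean">namespace SKSketch

-- Axiom K, inhabited by the combinator K.
theorem axK (A B : Prop) : A → (B → A) :=
  fun a _ => a

-- Axiom S, inhabited by the combinator S.
theorem axS (A B C : Prop) : (A → (B → C)) → ((A → B) → (A → C)) :=
  fun f g a => f a (g a)

-- A → A as S K K. The instance of S takes B := D → A and C := A;
-- each of the two applications is one use of modus ponens.
theorem identity (A D : Prop) : A → A :=
  axS A (D → A) A (axK A (D → A)) (axK A D)

end SKSketch
</code></pre>
<p>The proposition $D$ is arbitrary, just as in the five-line derivation, which is why it has to be supplied explicitly here.</p>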
<p>Note by the way that we have not used dependent types. They are only needed if we want to have quantifiers.
In a <a href="/2021/11/24/Intuitionism.html">prior post</a> I have described how other logical symbols are rendered as types, in the context of Martin-Löf type theory.
In particular, the type $(\Pi x:A) B(x)$ consists of functions $\lambda x. b(x)$ where $b(x):B(x)$ for all $x:A$. The function space $A\to B$ is the special case where $B$ does not depend on $x$.</p>
<p>We need further types, namely $(\Sigma x:A) B(x)$ and $A+B$,
to get the full intuitionistic predicate calculus.
AUTOMATH provided the $\Pi$ type alone,
and de Bruijn even <a href="https://pure.tue.nl/ws/files/4428179/597611.pdf">wrote a paper</a>
cautioning against building too much into the framework itself.</p>
<h3 id="automath-and-irrelevance-of-proofs">AUTOMATH and irrelevance of proofs</h3>
<p>AUTOMATH, which I have
<a href="/2021/11/03/AUTOMATH.html">written about earlier</a>,
is the first proof checker to actually implement propositions as types.
It did this in the literal sense of providing symbols TYPE and PROP,
which internally were synonymous, at first. However:</p>
<blockquote>
<p>One of the forms of the logical double negation axiom, written by means of “prop”, turns into the axiom about Hilbert’s $\epsilon$-operator if we replace prop by type. So if we want to do classical logic and do not want to accept the axiom of choice, we need some distinction.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>It’s not surprising that a primitive DN for double-negation,
mapping $\neg\neg A \to A$, would also map a proof that $A$
was nonempty into $A$ itself.
This is a converse of <a href="https://doi.org/10.2307/2039868">Diaconescu’s result</a> that
the axiom of choice implies the excluded middle (and therefore DN).</p>
<p>De Bruijn mentions another solution to this problem: to declare a type of Booleans and to set up the entire system of predicate logic for this new type BOOL, rather than at the level of propositions.
It’s like how predicate logic is
formalised <a href="/2022/07/13/Isabelle_influences.html">in Isabelle</a>:
separately from the logical framework.
This solution allows PROP and TYPE to be identified,
except that propositions then have type BOOL.</p>
<p>A more compelling reason to distinguish PROP from TYPE
is <em>irrelevance of proofs</em>:</p>
<blockquote>
<p>If $x$ is a real number, then $P(x)$ stands for “proof of $x > 0$”. Now we define “$\log$” (the logarithm) in the context [x : real] [y : P(x)], and if we want to talk about $\log 3$ we have to write $\log(3,p)$, where $p$ is some proof for $3 > 0$. Now the $p$ is relevant, and we have some trouble in saying that $\log(3,p)$ does not depend on $p$. … Some time and some annoyance can be saved if we extend the language by proclaiming that proofs of one and the same proposition are always definitionally equal.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
</blockquote>
<p>As de Bruijn and others comment, irrelevance of proofs is
mainly pertinent to classical reasoning. For constructivists, it
utterly destroys Heyting’s conception of intuitionistic truth.
But even proof assistants that are mostly used constructively, such as Agda and Coq, provide
<em>definitionally proof-irrelevant propositions</em>.</p>
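<p>De Bruijn’s logarithm example can be imitated in Lean 4, where every proposition in <code>Prop</code> is definitionally proof-irrelevant. In this sketch, <code>logStub</code> is a hypothetical stand-in whose body is a placeholder; only the proof argument in its signature matters:</p>
<pre><code class="language-lean">-- Hypothetical stand-in for a logarithm that demands a proof of positivity.
-- The body is a placeholder; only the shape of the signature matters.
def logStub (x : Nat) (_h : x > 0) : Nat := x

-- Any two proofs p and q of 3 > 0 are definitionally equal,
-- so the two applications are equal by reflexivity alone.
example (p q : 3 > 0) : logStub 3 p = logStub 3 q := rfl
</code></pre>
<p>Without proof irrelevance, the final line would need a separate argument that the result does not depend on the proof supplied.</p>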
<h3 id="intuitionistic-predicate-logic-continued">Intuitionistic predicate logic, continued</h3>
<p>Other logical connectives are easily represented by types.
First, the intuitionistic interpretation:</p>
<ul>
<li>a construction of $A\land B$ consists of a construction of $A$ paired with a construction of $B$</li>
<li>a construction of $\exists x. B(x)$ consists of a specific witnessing value $a$, paired with a construction of $B(a)$.</li>
<li>a construction of $A\lor B$ consists of a construction of $A$ or a construction of $B$ <em>along with an indication of which</em>. (So, we don’t have $A\lor\neg A$ when we don’t know which one holds.)</li>
</ul>
<p>The first two cases are handled by type $(\Sigma x:A) B(x)$,
which consists of pairs $\langle a,b \rangle$ where $a:A$ and $b:B(a)$, generalising the binary Cartesian product. The third case
is handled by type $A+B$, the binary disjoint sum.
The most faithful realisation of this scheme is
<a href="https://lawrencecpaulson.github.io/tag/Martin-Löf_type_theory">Martin-Löf type theory</a>.</p>
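<p>The three clauses can be sketched with ordinary data types. The Lean 4 fragment below (core library only; the function names are hypothetical) works in <code>Type</code> rather than <code>Prop</code>, so that constructions remain distinguishable, as the intuitionistic interpretation requires:</p>
<pre><code class="language-lean">-- A ∧ B: a construction of A paired with a construction of B.
def andIntro {A B : Type} (a : A) (b : B) : A × B := (a, b)

-- ∃x. B(x): a witness a paired with a construction of B(a).
def existsIntro {A : Type} {B : A → Type} (a : A) (b : B a) : Sigma B :=
  ⟨a, b⟩

-- A ∨ B: a construction of one disjunct, tagged with the side it came from.
def orIntroLeft {A B : Type} (a : A) : Sum A B := Sum.inl a
def orIntroRight {A B : Type} (b : B) : Sum A B := Sum.inr b
</code></pre>
<p>The product <code>A × B</code> is the special case of <code>Sigma B</code> in which <code>B</code> ignores its argument, matching the remark that Σ generalises the binary Cartesian product.</p>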
<p>As soon as we impose irrelevance of proofs, this beautiful scheme falls apart. The point of the intuitionist interpretation is to capture the structure of the constructions;
with irrelevance, all constructions are identical and even $A+B$ can have at most one element.</p>
<p>Proof assistants do not actually use propositions as types
for the same reason that functional programming languages do not
actually use the λ-calculus: because something that is beautiful in theory need not have any practical value whatever.
It is still possible to take inspiration from the theory.</p>
<h3 id="postscript">Postscript</h3>
<p>Two conclusions:</p>
<ol>
<li>You can have propositions as types without dependent types, but only for propositional logic.</li>
<li>You can have dependent types without propositions as types.</li>
</ol>
<p>And maybe a third: propositions as types can render type checking undecidable
unless you adopt a strict system of type uniqueness,
but then you can no longer infer $p(y)$ from $p(x)$ and $x=y$.
A decent notion of proposition ought to respect the substitution of equals for equals.</p>
<p>Phil Wadler has written a hagiographic but still useful
<a href="https://homepages.inf.ed.ac.uk/wadler/papers/propositions-as-types/propositions-as-types.pdf">article</a>
about the principle. See in particular the appendix
for its informative discussion with William Howard,
whose name is attached to the principle.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>NG de Bruijn, <a href="https://pure.tue.nl/ws/files/1892191/597622.pdf">A Survey of the Project Automath</a>, in: Seldin, J.P. and Hindley, J.R., eds., To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism (Academic Press, 1980), 152. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Ibid, p. 159. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 23 Aug 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/08/23/Propositions_as_Types.html
When is a computer proof a proof?<p>In 1976, Appel and Haken caused delight mixed with consternation by proving
the celebrated four colour theorem, but with heavy reliance on a computer calculation.
An assembly language program was used to check 1936 “reducible configurations”;
people were rightly concerned about errors in the code.
However, the Appel–Haken proof also required “the investigation by hand
about ten thousand neighbourhoods of countries with positive charge”,<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>
much of which was done by Appel’s 13-year-old daughter,
and nobody seemed bothered about the possibility of errors there.
Computer algebra systems were emerging around that time and would eventually
become widely used in many branches of science and engineering,
although I can’t imagine a computer algebra calculation being uncritically accepted
by any mathematics journal.
Today, I and others hope to see mathematicians using proof assistants in their work.
Whether machine proofs will be accepted as valid requires serious thought
about the nature of mathematical proof.
We have come a long way since 1976, but many mathematicians still distrust computers,
and many struggle to agree on the computer’s precise role.</p>
<h3 id="the-idea-of-mathematical-certainty">The idea of mathematical certainty</h3>
<p>In a <a href="/2022/07/27/Truth_Models.html">previous post</a>, I’ve discussed the difference
between scientific knowledge, which sometimes needs to be revised
in the light of new evidence, and mathematical truth,
which is not evidence-based. I also mentioned
the work of <a href="https://plato.stanford.edu/entries/lakatos/">Imre Lakatos</a>, a philosopher
who studied a particular theorem
(<a href="https://www.ams.org/publicoutreach/feature-column/fcarc-eulers-formula">Euler’s polyhedron formula</a>) for which counterexamples
were actually discovered. Lakatos’ discussion is largely focused on
strategies for dealing with counterexamples, e.g.</p>
<ul>
<li><em>monster-barring</em>: modifying your definition to exclude the counterexamples, regardless of how ad hoc it looks</li>
<li><em>exception-barring</em>: retreating to an overly conservative but completely safe definition</li>
</ul>
<p>And I have to remark, when you have one theorem and one definition that you are allowed to change at will, it looks like cheating. More commonly, errors are found in proofs (rather than in definitions) but can be fixed, with the theorem statement at worst marginally affected.</p>
<p>Perhaps we still have in mind a group of students watching Archimedes draw circles
in the sand and all agreeing that his proof is valid.
We don’t have that immediacy any more.
Today, doing mathematics necessarily requires trusting
tens of thousands of pages of other people’s work.
So, how about trusting some software?</p>
<h3 id="some-fundamental-desiderata-for-proofs">Some fundamental desiderata for proofs</h3>
<p>A <a href="https://www.degruyter.com/document/doi/10.1515/krt-2022-0015/html">recent paper</a><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
discusses some widely accepted characteristics of mathematical proofs:</p>
<ol>
<li>Proofs are convincing.</li>
<li>Proofs are surveyable.</li>
<li>Proofs are formalisable.</li>
</ol>
<p>I like these because, ipso facto, any proof conducted using a proof assistant
is not merely formalisable but has literally just been formalised,
both in the sense of being expressed in some sort of high-level logical language
and in the sense of having been reduced to primitive logical inferences.
Not all formal proofs are surveyable but some certainly are,
with both Mizar and Isabelle/HOL’s Isar language specifically designed for legibility.
As for convincing: there can be no handwaving in a machine proof.
As a rule, machines are much harder to convince than a knowledgeable mathematician,
and a machine proof contains not gaps but rather excessive detail.</p>
<p>Incidentally, and remarkably, the paper explains “convincing” as meaning
“convincing for mathematicians”, and holds that mere acceptance by the mathematical community
is sufficient (even if its members cannot survey the proof itself).
It sounds a bit like “justification by faith”: all we need is for mathematicians to believe
in proof assistants!</p>
<p>In short, formal proofs, provided that they are written to be legible, easily satisfy
all three criteria. Unfortunately, the great majority of formal proofs are not legible.
But as I have shown by <a href="https://lawrencecpaulson.github.io/tag/examples">numerous examples</a>,
they can be.</p>
<h3 id="what-do-we-do-about-super-long-proofs">What do we do about super-long proofs?</h3>
<p>The Internet tells me that the millionth digit of Pi is 1, and I’m certain
there is no simple proof of that
(though similar claims would be trivial for rational numbers).
Similarly, <a href="https://en.wikipedia.org/wiki/Largest_known_prime_number">Wikipedia claims</a></p>
<blockquote>
<p>The largest known prime number (as of June 2023) is $2^{82,589,933} − 1$, a number which has 24,862,048 digits.</p>
</blockquote>
<p>To assert that this number is prime is essentially no different from asserting that
2 is prime, but the proof is rather longer.
It relies on the theory of Mersenne primes
and on an enormous computer calculation.
Any fact obtained by a calculation is a mathematical statement and it seems clear
that the vast majority of these do not have short proofs.
Fortunately, few of these claims have weighty implications, so it’s okay
if we can’t survey their proofs.</p>
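<p>To give a flavour of what such a calculation involves, here is a minimal sketch
of the Lucas–Lehmer test, the standard primality criterion from the theory of Mersenne primes.
The function name and the list of small exponents are my own choices for illustration;
the real computation applies the same recurrence to numbers with millions of digits,
using far more sophisticated arithmetic.</p>

```python
def lucas_lehmer(p: int) -> bool:
    """For a prime exponent p, return True iff 2**p - 1 is prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the recurrence below needs p > 2
    m = 2 ** p - 1   # the Mersenne number under test
    s = 4
    # Iterate s -> s*s - 2 (mod m) exactly p - 2 times.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    # 2**p - 1 is prime exactly when the final residue is zero.
    return s == 0


small_prime_exponents = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
mersenne_exponents = [p for p in small_prime_exponents if lucas_lehmer(p)]
print(mersenne_exponents)  # [2, 3, 5, 7, 13, 17, 19, 31]
```

<p>The loop is trivially short to read, yet for an exponent in the tens of millions
nobody could hope to check the intermediate values by hand.</p>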
<p>But not every fact obtained by a monster computation is mathematically trivial.</p>
<p>The <a href="https://www.cs.utexas.edu/~marijn/ptn/">largest ever proof</a>
is that of the Boolean Pythagorean Triples problem, which weighs in at 200 terabytes.
(Read the <a href="https://arxiv.org/abs/1605.00723">gory details</a>.)
This proof was generated by a SAT solver, a piece of software capable of
finding a model of a set of assertions written in Boolean logic,
or if no such model exists, proving that fact.
It is without doubt a mathematical proof, one that is overwhelmingly too large
for a human being to survey.
At most we can survey tiny but arbitrary parts of it,
which may be a means of establishing confidence in its correctness.</p>
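<p>The problem itself is easy to state in code: can the numbers 1 to <em>n</em> be split
into two classes so that no class contains a Pythagorean triple?
The sketch below is a naive backtracking search, not a SAT solver and certainly not the
method used for the real proof; the function names are hypothetical.
It is only practical for small <em>n</em>, whereas the published result concerns
<em>n</em> = 7825, the first value for which no such colouring exists.</p>

```python
def pythagorean_triples(n):
    """All (a, b, c) with a < b < c <= n and a*a + b*b == c*c."""
    squares = {k * k: k for k in range(1, n + 1)}
    triples = []
    for a in range(1, n + 1):
        for b in range(a + 1, n + 1):
            c = squares.get(a * a + b * b)
            if c is not None:
                triples.append((a, b, c))
    return triples


def two_colour(n):
    """Search for a 2-colouring of 1..n with no monochromatic triple.

    Returns a dict mapping each number to 0 or 1, or None if none exists.
    Uses plain recursion, so keep n well below Python's recursion limit.
    """
    by_hypotenuse = {}
    for a, b, c in pythagorean_triples(n):
        by_hypotenuse.setdefault(c, []).append((a, b))
    colour = {}

    def extend(k):
        if k > n:
            return True
        for value in (0, 1):
            # k is the largest member of every triple filed under it,
            # so both legs have already been coloured.
            if all(not (colour[a] == colour[b] == value)
                   for a, b in by_hypotenuse.get(k, [])):
                colour[k] = value
                if extend(k + 1):
                    return True
        colour.pop(k, None)  # undo before backtracking
        return False

    return dict(colour) if extend(1) else None


colouring = two_colour(30)
```

<p>A returned colouring is a small, surveyable certificate: anybody can check it against the list of triples.
The difficulty lies entirely in the negative case, where the proof must rule out every possible colouring.</p>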
<p>My favourite example though is the non-existence of a
<a href="/papers/Lam_finite_Proj_plane_order_10.pdf">projective plane of order 10</a>.
When I was an undergraduate at Caltech, Prof Herbert Ryser stressed the importance of
settling this question. What is strange is that such a plane could be represented
by a 111×111 incidence matrix of zeros and ones, so deciding its existence requires only a huge but finite search.
Such a search was carried out and the question settled negatively in 1989,
too late for Ryser, who had died in 1985.
The result has major implications for combinatorics despite the absence
of a surveyable proof.
It was <a href="https://arxiv.org/abs/2012.04715">confirmed recently</a> with the help of a SAT solver,
and therefore has been proved in logic, even though the proof is colossal.</p>
<p>People are legitimately uneasy about being wholly dependent on
a piece of software. This cannot be compared to astronomers using a powerful telescope
to observe stars far too faint for the human eye, because observational error
has always been an inherent part of all empirical science.
Mathematics is supposed to be different.
We have to take this proof at least somewhat on faith, and yet the theorem statement
cannot be dismissed as trivial.</p>
<p>It seems we are forced to be pragmatic and accept the amplification
of human reasoning by machine. This does not mean that mathematics has become empirical:
extensive numerical calculations suggest that the <a href="https://en.wikipedia.org/wiki/Riemann_hypothesis">Riemann hypothesis</a> is true,
but absolutely nobody accepts that as a proof.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Robin Wilson, <em>Four Colours Suffice: How the Problem Was Solved</em> (Princeton, 2002) <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Parshina, Katia. “Philosophical assumptions behind the rejection of computer-based proofs” <em>KRITERION – Journal of Philosophy</em>, 2023 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 09 Aug 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/08/09/computer_proof.html
Hao Wang on the formalisation of mathematics<p>Since I have already devoted a blog post to a
<a href="/2023/04/12/Wittgenstein.html">wildly overrated philosopher</a>
who barely understood logic, it’s time to pay tribute
to an underrated philosopher who wrote thoughtfully and presciently
on the formalisation of mathematics:
<a href="https://en.wikipedia.org/wiki/Hao_Wang_(academic)">Hao Wang</a>.
Wang must be seen primarily as a philosopher in the traditional sense,
who wrote essays and books, but he also wrote code.
He is the creator of the first truly powerful automatic theorem prover,
using what we now know as a <a href="https://en.wikipedia.org/wiki/Method_of_analytic_tableaux">tableau method</a>.
Indeed, most of today’s technology for automated deduction has its origins
in the work of philosophers (we can add <a href="https://en.wikipedia.org/wiki/Hilary_Putnam">Hilary Putnam</a> and <a href="https://en.wikipedia.org/wiki/John_Alan_Robinson">J Alan Robinson</a>)
who decided to write more than just words.</p>
<h3 id="wang-on-formalisation">Wang on formalisation</h3>
<p>Here is Hao Wang, writing in 1955 (!):<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
<blockquote>
<p>We are led to believe that there is a fairly simple axiom system from which it is possible to derive almost all mathematical theorems and truths mechanically. This is at present merely a theoretical possibility, for no serious attempts seem to have been made to prove, for instance, all the theorems, of an elementary textbook of calculus.</p>
</blockquote>
<p>Of course, we are now well past that stage.</p>
<blockquote>
<p>Nevertheless, we seem to get a feeling of grandeur from the realization that a simple axiom system which we can quite easily memorize by heart embodies, in a sense, practically all the mathematical truths. It is not very hard to get to know the axiom system so well that people would say you understood the system.</p>
</blockquote>
<p>He was doubtless thinking of Zermelo-Fraenkel set theory. Modern type theories are possibly a little too difficult for most people to memorise.</p>
<blockquote>
<p>Unfortunately just to be able thus to understand the system neither gives you very deep insight into the nature of mathematics nor makes you a very good mathematician.</p>
</blockquote>
<p>Very true. But we find ourselves asking again, how was he thinking about this back in 1955?
And his points are absolutely topical today. He considers what level of formality
is best for communication: an intuitive abstract, a more detailed exposition, and so on
leading ultimately to a formalisation in the style of Russell and Whitehead.
It’s clear that the increasing precision is not always beneficial!
But he continues:</p>
<blockquote>
<p>To put thoughts on physics into mathematical symbols is one way of formalization. Through accumulation and tradition this way of formalization has also become a powerful way of communication: for those who understand the language, a short formula may express more precisely thought which could only be explained by many pages of ordinary words, and much less satisfactorily.</p>
</blockquote>
<p>The use of symbolism can be powerful and effective, but it needs to be done right.</p>
<p>Of course, he wrote much more. He described formalisation as a continuing process
dating back to Euclid, whose famous axiomatic system provided a common framework
for many schools of ancient Greek mathematics.
It continued with the 19th-century “arithmetisation of analysis”:</p>
<blockquote>
<p>There is the long story of how Lagrange, Cauchy, Weierstrass, and others strove to formalize exactly the basic notions of limits, continuity, derivatives, etc., providing thereby rigorous (though not necessarily reliable) foundations for mathematical analysis.</p>
</blockquote>
<p>And this continued into the 20th century with the origins of set theory, the paradoxes
and the promulgation of the seemingly consistent Zermelo-Fraenkel axioms.</p>
<p>Wang’s paper continues with insightful and relatively informal observations on many aspects
of formalisation and precision that are still relevant today.
He also wrote a more technical paper<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
covering similar themes, which goes on to sketch an actual axiomatic framework
specifically designed for formalisation.</p>
<p>It begins in his usual lively style:</p>
<blockquote>
<p>Zest for both system and objectivity is the formal logician’s original sin. He pays for it by constant frustrations and by living ofttimes the life of an intellectual outcaste. The task of squeezing a large body of stubborn facts into a more or less rigid system can be a painful one, especially since the facts of mathematics are among the most stubborn of all facts. Moreover, the more general and abstract we get, the farther removed we are from the raw mathematical experience. As intuition ceases to operate effectively, we fall into many unexpected traps.</p>
</blockquote>
<p>And it continues with remarks that are historically informed while at the same time intriguing:</p>
<blockquote>
<p>A field has often to be developed very thoroughly before it is ripe for a systematic and rigorous organization. The history of the calculus illustrates this point clearly: founded in the seventeenth century, rapidly expanded in the eighteenth, the calculus got acceptable foundations only in the nineteenth century and even today logicians generally have misgivings on the matter.</p>
</blockquote>
<p>This foundational development, Wang argues, also required a coherent treatment of set theory.
He notes how claims about “arbitrary curves” or “arbitrary functions” required elucidation.</p>
<blockquote>
<p>The problem is that of representing functions by trigonometric series which interested many a mathematician when Cantor began his research career around 1870. In trying to extend the uniqueness of representation to certain functions with infinitely many singular points, he was led to the notion of a derived set which not only marked the beginning of his study of the theory of point sets but led him later on to the construction of transfinite ordinal numbers.
Such historical facts ought to help combat the erroneous impression that Cantor invented, by one stroke of genius, a whole theory of sets which was entirely isolated from the main stream of mathematics at his time.</p>
</blockquote>
<p>The article goes on to discuss various aspects of Cantor’s set theory
and then starts to speculate on how the formalisation of mathematics might be undertaken.
From today’s perspective, his focus looks excessively set-theoretic, with only hints
of a phenomenon that might have influenced today’s type theories:</p>
<blockquote>
<p>There are… numerous attempts to construct artificial systems which are both demonstrably consistent and also adequate to the development of the “principal” (or “useful”) parts of mathematics. Most of these systems modify our basic logical principles such as the law of excluded middle and the principle of extensionality (for sets), and it is not easy to become familiar with them. So far as I know, none of these has been accepted widely.</p>
</blockquote>
<p>Wang then embarks on the development of a system he calls Σ, which he advocates as a basis
for formalising mathematics. He calls it “constructive” but this appears to be in the sense
of avoiding <a href="https://en.wikipedia.org/wiki/Impredicativity">impredicative definitions</a>
rather than banishing the law of excluded middle.
So it is a ramified type theory in the sense of Principia Mathematica.
I confess to having skipped the sequel at this point;
such a formalism is not attractive today.</p>
<p>Nevertheless, I am amazed at how Wang could see so far ahead, writing in the 1950s.
He wrote with fluency, clarity and wit, although his native language was Chinese.
Such a contrast with
<a href="/2023/04/12/Wittgenstein.html">that other person</a>,
whose gnomic writings on logic offer no insights and don’t appear to be informed
by any sort of reflection, knowledge, or even familiarity with the basic concepts of logic.</p>
<h3 id="wang-writing-actual-code">Wang writing actual code</h3>
<p>Wang appears to have been a true polymath, combining knowledge of the
history and philosophy of mathematics
and of the technical details of set theory (on which he wrote at length)
with the practical skills of a coder.
Programming an <a href="https://en.wikipedia.org/wiki/IBM_704">IBM 704</a> could not have been easy.
Fortran had been invented by 1960, but Wang does not mention it.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup>
He almost certainly wrote in assembly language.</p>
<p>Because I have mentioned this work in previous posts
(<a href="/2022/02/02/Formalising_Math_Set_theory.html">this one</a>
and <a href="/2023/01/11/AI_at_Stanford.html">that one</a>),
I will resist the temptation to repeat myself and instead refer you to
Martin Davis’s <a href="/papers/Early-History-of-ATP.pdf">chapter</a>
in the <em>Handbook of Automated Reasoning</em>
entitled “The Early History of Automated Deduction”, which is dedicated to Hao Wang.
There you will find many more technical details about Wang’s implementation of
Gentzen-style proof calculi and decision procedures for fragments of first-order logic.</p>
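<p>For readers who have never seen a Gentzen-style calculus in action, here is a minimal sketch
of a decision procedure for classical propositional logic based on sequents.
It is in the spirit of what textbooks sometimes call Wang’s algorithm, but it is not
a reconstruction of his program: the tuple encoding of formulas and the name
<code>provable</code> are assumptions made purely for illustration.</p>

```python
# Formulas: an atom is a string; compound formulas are tuples
# ('not', A), ('and', A, B), ('or', A, B) or ('imp', A, B).

def provable(left, right):
    """Decide the sequent left |- right, where both sides are lists of formulas."""
    # Decompose the first compound formula on the left, if any.
    for i, f in enumerate(left):
        if not isinstance(f, str):
            rest = left[:i] + left[i + 1:]
            op = f[0]
            if op == 'not':
                return provable(rest, right + [f[1]])
            if op == 'and':
                return provable(rest + [f[1], f[2]], right)
            if op == 'or':
                return (provable(rest + [f[1]], right)
                        and provable(rest + [f[2]], right))
            if op == 'imp':
                return (provable(rest, right + [f[1]])
                        and provable(rest + [f[2]], right))
            raise ValueError(f"unknown connective: {op!r}")
    # Otherwise decompose the first compound formula on the right.
    for i, f in enumerate(right):
        if not isinstance(f, str):
            rest = right[:i] + right[i + 1:]
            op = f[0]
            if op == 'not':
                return provable(left + [f[1]], rest)
            if op == 'and':
                return (provable(left, rest + [f[1]])
                        and provable(left, rest + [f[2]]))
            if op == 'or':
                return provable(left, rest + [f[1], f[2]])
            if op == 'imp':
                return provable(left + [f[1]], rest + [f[2]])
            raise ValueError(f"unknown connective: {op!r}")
    # Only atoms remain: the sequent holds iff the two sides share an atom.
    return any(a in right for a in left)


peirce = ('imp', ('imp', ('imp', 'p', 'q'), 'p'), 'p')
print(provable([], [peirce]))             # True: Peirce's law is a classical tautology
print(provable([], [('imp', 'p', 'q')]))  # False: p does not imply q in general
```

<p>Every rule removes one connective and loses no information, so the procedure terminates
without any need for backtracking.</p>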
<p>In this post, I’ve only been able to touch on a few of Wang’s many contributions.
I can’t go into his discussions with Gödel and his communication of Gödel’s
philosophical views, for example. Fortunately, much can be found on-line and elsewhere.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Hao Wang. <a href="https://www.jstor.org/stable/2251469">On Formalization</a>. <em>Mind</em> <strong>64</strong> (1955), 226–238 (also <a href="/papers/Wang-Formalisation.pdf">here</a>) <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Hao Wang. <a href="https://doi.org/10.2307/2267732">The Formalization of Mathematics</a>. <em>Journal of Symbolic Logic</em> <strong>19</strong> (1954), 241–266 (also <a href="/papers/Wang-Orginal-Sin.pdf">here</a>) <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>Hao Wang. <a href="https://doi.org/10.1147/rd.41.0002">Toward Mechanical Mathematics</a>. <em>IBM Journal of Research and Development</em> <strong>4</strong>:1 (1960), 15. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 26 Jul 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/07/26/Wang.html