Machine Logic is Lawrence Paulson's blog on Isabelle/HOL and related topics.
https://lawrencecpaulson.github.io/
<h2 id="probabilistic-reasoning-and-formal-proof">Probabilistic reasoning and formal proof</h2>

<p>Many are the pitfalls awaiting anybody trying to formalise a proof,
but the worst are appeals to intuition.
These are typically a sign that the author can’t be bothered to outline
a calculation or argument. Perhaps the claim is obvious
(to them, not to you).
Probabilistic claims, say about drawing coloured balls from a sack, may look particularly dubious.
But as <a href="https://www.scientificamerican.com/article/this-nomadic-eccentric-was-the-most-prolific-mathematician-in-history/">Paul Erdős</a> has shown, such arguments can yield short, intuitive proofs that are absolutely rigorous.
To formalise them simply requires a bit of measure theory boilerplate.</p>
<h3 id="a-simple-example">A simple example</h3>
<p>Let’s consider an example from the website <a href="https://www.cut-the-knot.org/Probability/ProbabilisticMethod.shtml">Cut the Knot</a>,
created by Alexander Bogomolny.
He credits the example to a 1963 paper by Erdős
(I could not work out which one; your hint would be welcome).
It goes as follows:</p>
<blockquote>
<p>Let $A_k$, for $k = 1, \ldots, m$,
be a family of $n$-element subsets of a set $X$. If $m < 2^{n-1}$,
then there exists a bichromatic colouring<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> of $X$ such that no $A_k$ is monochromatic.</p>
</blockquote>
<p>And here’s the proof, as presented by Bogomolny:</p>
<blockquote>
<p>Let $\cal F$ be a collection of $n$-sets (sets with exactly $n$ elements), and assume that $\vert\cal F\vert = m < 2^{n-1}$. Colour $X$ randomly with two colours, all colourings being equally likely. For $A\in \cal F$ let $E_A$ be the event that $A$ is monochromatic. Since there are two such colourings and $|A| = n$, probability $P(E_A)$ of the event $E_A$ is given by $P(E_A) = 2\times 2^{-n} = 2^{1-n}$.</p>
<p>Since the events $E_A$ are not necessarily disjoint, $P(\bigcup_{A\in\cal F} E_A) \le \sum_{A\in\cal F} P(E_A) = m\times2^{1-n} < 1$.</p>
<p>So the probability that at least one $A\in \cal F$ is monochromatic is less than 1. Thus there must be a bichromatic colouring of $X$ with no monochromatic $A\in\cal F$. QED.</p>
</blockquote>
<p>This example is clearly a simplified version
of <a href="https://theoremoftheweek.wordpress.com/2010/05/02/theorem-25-erdoss-lower-bound-for-the-ramsey-numbers/">Erdős’s celebrated proof</a> of a lower bound for Ramsey numbers,
which is often claimed to be the first application of the probabilistic method.
Note that the existence claim is nonconstructive:
we have shown that the probability of a certain outcome is less than one.
So the opposite outcome has nonzero probability
and therefore forms a non-empty set.</p>
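<p>As a quick sanity check (my own illustration, not part of the formal development), a few lines of Python can search all $2^{\vert X\vert}$ colourings of a small instance and exhibit a colouring of the kind whose existence the argument guarantees:</p>

```python
from itertools import product

def good_colouring(X, family):
    """Search every 2-colouring of X for one that leaves no set in
    the family monochromatic; return it as a dict, or None."""
    xs = sorted(X)
    for colours in product((0, 1), repeat=len(xs)):
        f = dict(zip(xs, colours))
        # a set A is bichromatic iff f takes both colours on it
        if all(len({f[x] for x in A}) == 2 for A in family):
            return f
    return None

# m = 3 three-element sets, and 3 < 2^(3-1) = 4, so a colouring must exist
family = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
f = good_colouring({1, 2, 3, 4, 5}, family)
assert f is not None
assert all(len({f[x] for x in A}) == 2 for A in family)
```

<p>Of course, the exhaustive search sheds no light on <em>why</em> such a colouring must exist; that is what the probabilistic argument supplies.</p>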
<h3 id="formalising-the-probability-space">Formalising the probability space</h3>
<p>The theorem statement assumes the family $\cal F$ of $n$-sets
of the finite set $X$. The family has cardinality
$\vert \cal F \vert = m<2^{n-1}$.
The constraint $0<n\le\vert X\vert$ is also necessary,
though it is omitted from the problem statement.
As for the conclusion, the required 2-colouring is expressed
as a function from $X$ to the set $\{0,1\}$.
The <em>extensional</em> function space
<span class="keyword1">→<span class="hidden">⇩</span><sub>E</sub></span>
is required: by constraining the functions outside their domain ($X$)
to some arbitrary fixed value,
this operator accurately represents the set $X\to\{0,1\}$.
This matters because we are actually counting these functions:
pinning down the irrelevant values outside $X$ keeps the count correct.</p>
<pre class="source">
<span class="keyword1 command">theorem</span> Erdos_1963<span class="main">:</span>
<span class="keyword2 keyword">assumes</span> X<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">𝓕</span> <span class="main">⊆</span></span> nsets</span> <span class="free">X</span> <span class="free">n</span><span>"</span> <span class="quoted"><span class="quoted"><span>"</span>finite</span> <span class="free">X</span><span>"</span></span>
<span class="keyword2 keyword">and</span> <span class="quoted"><span class="quoted"><span>"</span>card</span> <span class="free">𝓕</span> <span class="main">=</span></span> <span class="free">m</span><span>"</span> <span class="keyword2 keyword">and</span> m<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">m</span> <span class="main"><</span></span> <span class="numeral">2</span><span class="main">^</span></span><span class="main">(</span><span class="free">n</span><span class="main">-</span><span class="main">1</span><span class="main">)</span><span>"</span> <span class="keyword2 keyword">and</span> n<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">0</span></span> <span class="main"><</span></span> <span class="free">n</span><span>"</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">n</span> <span class="main">≤</span></span> card</span> <span class="free">X</span><span>"</span>
<span class="keyword2 keyword">obtains</span> <span class="free">f</span><span class="main">::</span><span class="quoted"><span class="quoted"><span>"</span><span class="tfree">'a</span><span class="main">⇒</span>nat</span><span>"</span></span> <span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">f</span> <span class="main">∈</span></span> <span class="free">X</span> <span class="keyword1">→<span class="hidden">⇩</span><sub>E</sub></span></span> <span class="main">{..<</span><span class="numeral">2</span><span class="main">}</span><span>"</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">⋀</span><span class="bound">F</span> <span class="bound">c</span><span class="main">.</span> <span class="main">⟦</span><span class="bound">F</span> <span class="main">∈</span></span> <span class="free">𝓕</span><span class="main">;</span> <span class="bound">c</span><span class="main"><</span></span><span class="numeral">2</span><span class="main">⟧</span> <span class="main">⟹</span> <span class="main">¬</span> <span class="free">f</span> <span class="main">`</span> <span class="bound">F</span> <span class="main">⊆</span> <span class="main">{</span><span class="bound">c</span><span class="main">}</span><span>"</span>
<span class="keyword1 command">proof</span> <span class="operator">-</span>
</pre>
<p>Now we have to set up the probabilities.
Our “colours” are actually 0 and 1, constrained to type <code class="language-plaintext highlighter-rouge">nat</code>.
The <em>sample space</em> $\Omega$ is the set of all 2-colourings of $X$.
Then the <em>probability space</em> $M$ is the corresponding measure space
where all colourings have the same probability.
A non-uniform probability distribution would be a little more work,
e.g. we’d have to show that the probabilities summed to 1.</p>
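<p>In plain Python terms (a sketch of the mathematical construction only, not of how <code class="language-plaintext highlighter-rouge">uniform_count_measure</code> is implemented), the setup amounts to this:</p>

```python
from itertools import product
from fractions import Fraction

X = [0, 1, 2, 3]
# Omega: every function X -> {0,1}, represented as a tuple of colours
Omega = list(product((0, 1), repeat=len(X)))
assert len(Omega) == 2 ** len(X)            # card Omega = 2 ^ card X

# the uniform measure: every colouring has probability 1 / card Omega
prob = {w: Fraction(1, len(Omega)) for w in Omega}
assert sum(prob.values()) == 1              # a genuine probability space
```

<p>With a uniform distribution the final assertion is automatic; with a non-uniform one it would be a proof obligation.</p>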
<pre class="source">
<span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>finite</span> <span class="free">𝓕</span><span>"</span></span>
<span class="keyword1 command">using</span> X finite_imp_finite_nsets finite_subset <span class="keyword1 command">by</span> <span class="operator">blast</span>
<span class="keyword1 command">let</span> <span class="var quoted var">?two</span> <span class="main">=</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">{..<</span></span><span class="numeral">2</span><span class="main">::</span>nat</span><span class="main">}</span><span>"</span>
<span class="keyword3 command">define</span> <span class="skolem skolem">Ω</span> <span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">Ω</span> <span class="main">≡</span> <span class="free">X</span> <span class="keyword1">→<span class="hidden">⇩</span><sub>E</sub></span></span> <span class="var">?two</span><span>"</span></span>
<span class="keyword3 command">define</span> <span class="skolem skolem">M</span> <span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">M</span> <span class="main">≡</span> uniform_count_measure</span> <span class="skolem">Ω</span><span>"</span></span>
</pre>
<p>Next comes some boilerplate relating $\Omega$ and $M$,
allowing the interpretation of the <code class="language-plaintext highlighter-rouge">prob_space</code> locale.
Isabelle/HOL’s tools for probabilistic reasoning are now at our disposal.</p>
<pre class="source">
<span class="keyword1 command">have</span> space_eq<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>space</span> <span class="skolem">M</span> <span class="main">=</span></span> <span class="skolem">Ω</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> M_def space_uniform_count_measure<span class="main">)</span>
<span class="keyword1 command">have</span> sets_eq<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>sets</span> <span class="skolem">M</span> <span class="main">=</span></span> Pow <span class="skolem">Ω</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> M_def sets_uniform_count_measure<span class="main">)</span>
<span class="keyword1 command">have</span> cardΩ<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>card</span> <span class="skolem">Ω</span> <span class="main">=</span></span> <span class="numeral">2</span> <span class="main">^</span> card <span class="free">X</span><span>"</span>
<span class="keyword1 command">using</span> <span class="quoted"><span class="quoted"><span>‹</span>finite</span> <span class="free">X</span><span>›</span></span> <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> Ω_def card_funcsetE<span class="main">)</span>
<span class="keyword1 command">have</span> Ω<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>finite</span> <span class="skolem">Ω</span><span>"</span></span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">Ω</span> <span class="main">≠</span></span> <span class="main">{}</span></span><span>"</span>
<span class="keyword1 command">using</span> cardΩ less_irrefl <span class="keyword1 command">by</span> <span class="operator">fastforce</span><span class="main keyword3">+</span>
<span class="keyword1 command">interpret</span> P<span class="main">:</span> prob_space <span class="quoted skolem">M</span>
<span class="keyword1 command">unfolding</span> M_def <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">intro</span> prob_space_uniform_count_measure Ω<span class="main">)</span>
</pre>
<p>The idea of a colouring being monochromatic on a set is easily expressed in terms of set image.
For any given colour $c$ and set $F$,
the set of monochromatic maps is an <em>event</em> of the probability space.</p>
<pre class="source">
<span class="keyword3 command">define</span> <span class="skolem skolem">mchrome</span> <span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">mchrome</span> <span class="main">≡</span> <span class="main">λ</span><span class="bound">c</span> <span class="bound">F</span><span class="main">.</span> <span class="main">{</span><span class="bound bound">f</span> <span class="main">∈</span> <span class="skolem">Ω</span><span class="main">.</span> <span class="bound">f</span> <span class="main">`</span></span> <span class="bound">F</span> <span class="main">⊆</span></span> <span class="main">{</span><span class="bound">c</span><span class="main">}</span><span class="main">}</span><span>"</span>
<span class="keyword1 command">have</span> mchrome<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span> <span class="main">∈</span></span> P.events</span><span>"</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span> <span class="main">⊆</span></span> <span class="skolem">Ω</span><span>"</span></span> <span class="keyword2 keyword">for</span> <span class="skolem">F</span> <span class="skolem">c</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> sets_eq mchrome_def Ω_def<span class="main">)</span>
</pre>
<h3 id="the-probability-that-a-map-is-monochrome-on-some-fincal-f">The probability that a map is monochrome on some $F\in\cal F$</h3>
<p>Given $F\in\cal F$ and any fixed colour $c$,
the number of maps monochrome on $F$ (with colour $c$)
is $2^{\vert X\vert-n}$.
That’s because each element of $X$ not in $F$ could
be given either colour.
The proof defines a bijection between colourings mapping
the whole of $F$ to $c$ and those that don’t colour $F$ at all.
This sort of calculation can get quite a bit more complicated
when the probability distribution is nonuniform.</p>
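<p>The counting fact itself is easy to confirm by brute force in Python (again my own illustration, with an arbitrary small example):</p>

```python
from itertools import product

X = list(range(6))          # card X = 6
F = [0, 1, 2]               # an n-set, here n = 3
c = 0                       # a fixed colour

# colourings of X (tuples indexed by X) monochromatic with colour c on F
mono = [f for f in product((0, 1), repeat=len(X))
        if all(f[x] == c for x in F)]

# only the elements of X outside F are free, giving 2^(card X - n)
assert len(mono) == 2 ** (len(X) - len(F))
```

<p>The list comprehension makes the bijection visible: a monochrome colouring is determined entirely by its restriction to $X\setminus F$.</p>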
<pre class="source">
<span class="keyword1 command">have</span> card_mchrome<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>card</span> <span class="main">(</span><span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span><span class="main">)</span> <span class="main">=</span></span> <span class="numeral">2</span> <span class="main">^</span> <span class="main">(</span>card <span class="free">X</span> <span class="main">-</span> <span class="free">n</span><span class="main">)</span><span>"</span> <span class="keyword2 keyword">if</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">F</span> <span class="main">∈</span></span> <span class="free">𝓕</span><span>"</span></span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">c</span><span class="main"><</span></span><span class="numeral">2</span><span>"</span></span> <span class="keyword2 keyword">for</span> <span class="skolem">F</span> <span class="skolem">c</span>
<span class="keyword1 command">proof</span> <span class="operator">-</span>
<span class="keyword1 command">have</span> F<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>finite</span> <span class="skolem">F</span><span>"</span></span> <span class="quoted"><span class="quoted"><span>"</span>card</span> <span class="skolem">F</span> <span class="main">=</span></span> <span class="free">n</span><span>"</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">F</span> <span class="main">⊆</span></span> <span class="free">X</span><span>"</span></span>
<span class="keyword1 command">using</span> assms that <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> nsets_def<span class="main">)</span>
<span class="keyword1 command">with</span> F <span class="quoted"><span class="quoted"><span>‹</span>finite</span> <span class="free">X</span><span>›</span></span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>card</span> <span class="main">(</span><span class="free">X</span><span class="main">-</span></span><span class="skolem">F</span></span> <span class="keyword1">→<span class="hidden">⇩</span><sub>E</sub></span> <span class="var">?two</span><span class="main">)</span> <span class="main">=</span> <span class="numeral">2</span> <span class="main">^</span> <span class="main">(</span>card <span class="free">X</span> <span class="main">-</span> <span class="free">n</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> card_funcsetE card_Diff_subset<span class="main">)</span>
<span class="keyword1 command">moreover</span>
<span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>bij_betw</span> <span class="main">(</span><span class="main">λ</span><span class="bound">f</span><span class="main">.</span> restrict</span> <span class="bound">f</span> <span class="main">(</span><span class="free">X</span><span class="main">-</span><span class="skolem">F</span><span class="main">)</span><span class="main">)</span> <span class="main">(</span><span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span><span class="main">)</span> <span class="main">(</span><span class="free">X</span><span class="main">-</span><span class="skolem">F</span> <span class="keyword1">→<span class="hidden">⇩</span><sub>E</sub></span> <span class="var">?two</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">proof</span> <span class="main">(</span><span class="operator">intro</span> bij_betwI<span class="main">)</span>
<span class="keyword3 command">show</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">(</span><span class="main">λ</span><span class="bound">g</span> <span class="bound">x</span><span class="main">.</span> <span class="keyword1">if</span></span> <span class="bound">x</span><span class="main">∈</span></span><span class="skolem">F</span> <span class="keyword1">then</span> <span class="skolem">c</span> <span class="keyword1">else</span> <span class="bound">g</span> <span class="bound">x</span><span class="main">)</span> <span class="main">∈</span> <span class="main">(</span><span class="free">X</span><span class="main">-</span><span class="skolem">F</span> <span class="keyword1">→<span class="hidden">⇩</span><sub>E</sub></span> <span class="var">?two</span><span class="main">)</span> <span class="main">→</span> <span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span><span>"</span>
<span class="keyword1 command">using</span> that <span class="quoted"><span class="quoted"><span>‹</span><span class="skolem">F</span> <span class="main">⊆</span></span> <span class="free">X</span><span>›</span></span> <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> mchrome_def Ω_def<span class="main">)</span>
<span class="keyword1 command">qed</span> <span class="main">(</span><span class="operator">fastforce</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> mchrome_def Ω_def<span class="main">)</span><span class="main keyword3">+</span>
<span class="keyword1 command">ultimately</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">metis</span> bij_betw_same_card<span class="main">)</span>
<span class="keyword1 command">qed</span>
</pre>
<p>The probability calculation is simply $2^{\vert X\vert-n} / 2^{\vert X\vert} = 1 / 2^n$.</p>
<pre class="source">
<span class="keyword1 command">have</span> prob_mchrome<span class="main">:</span> <span class="quoted quoted"><span>"</span>P.prob</span> <span class="main">(</span><span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span><span class="main">)</span> <span class="main">=</span> <span class="main">1</span> <span class="main">/</span> <span class="numeral">2</span><span class="main">^</span><span class="free">n</span><span>"</span>
<span class="keyword2 keyword">if</span> <span class="quoted quoted"><span>"</span><span class="skolem">F</span> <span class="main">∈</span></span> <span class="free">𝓕</span><span>"</span> <span class="quoted quoted"><span>"</span><span class="skolem">c</span><span class="main"><</span></span><span class="numeral">2</span><span>"</span> <span class="keyword2 keyword">for</span> <span class="skolem">F</span> <span class="skolem">c</span>
<span class="keyword1 command">proof</span> <span class="operator">-</span>
<span class="keyword1 command">have</span> emeasure_eq<span class="main">:</span> <span class="quoted quoted"><span>"</span>emeasure</span> <span class="skolem">M</span> <span class="skolem">U</span> <span class="main">=</span> <span class="main">(</span><span class="keyword1">if</span> <span class="skolem">U</span><span class="main">⊆</span><span class="skolem">Ω</span> <span class="keyword1">then</span> ennreal<span class="main">(</span>card <span class="skolem">U</span> <span class="main">/</span> card <span class="skolem">Ω</span><span class="main">)</span> <span class="keyword1">else</span> <span class="main">0</span><span class="main">)</span><span>"</span> <span class="keyword2 keyword">for</span> <span class="skolem">U</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> M_def emeasure_uniform_count_measure_if <span class="quoted quoted"><span>‹</span>finite</span> <span class="skolem">Ω</span><span>›</span><span class="main">)</span>
<span class="keyword1 command">have</span> <span class="quoted quoted"><span>"</span>emeasure</span> <span class="skolem">M</span> <span class="main">(</span><span class="skolem">mchrome</span> <span class="skolem">c</span> <span class="skolem">F</span><span class="main">)</span> <span class="main">=</span> ennreal <span class="main">(</span><span class="numeral">2</span> <span class="main">^</span> <span class="main">(</span>card <span class="free">X</span> <span class="main">-</span> <span class="free">n</span><span class="main">)</span> <span class="main">/</span> card <span class="skolem">Ω</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">using</span> that mchrome <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> emeasure_eq card_mchrome<span class="main">)</span>
<span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted quoted"><span>"</span><span class="main">…</span> <span class="main">=</span></span> ennreal <span class="main">(</span><span class="main">1</span> <span class="main">/</span> <span class="numeral">2</span><span class="main">^</span><span class="free">n</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> n power_diff cardΩ<span class="main">)</span>
<span class="keyword1 command">finally</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> P.emeasure_eq_measure<span class="main">)</span>
<span class="keyword1 command">qed</span>
</pre>
<h3 id="finishing-up-the-argument">Finishing up the argument</h3>
<p>The rest of the proof should be straightforward,
but needs to be annoyingly detailed in Isabelle.
We begin by showing that the union is a subset of $\Omega$,
and therefore an event.</p>
<pre class="source">
<span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">(</span><span class="main">⋃</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> <span class="main">⋃</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> <span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span> <span class="main">⊆</span></span> <span class="skolem">Ω</span><span>"</span></span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> mchrome_def Ω_def<span class="main">)</span>
</pre>
<p>Showing that the union is actually a <strong>strict</strong> subset
involves formalising the proof that $P(\bigcup_{A\in\cal F} E_A) < 1$.</p>
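<p>The chain of inequalities in the Isabelle text below mirrors Bogomolny's calculation: union bound, then $m \times 2^{1-n} < 1$. On a concrete instance (my own hypothetical example) the numbers can be checked exactly with rational arithmetic:</p>

```python
from itertools import product
from fractions import Fraction

X = list(range(5))
family = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]   # m = 3 < 2^(n-1) = 4, n = 3
n, m = 3, len(family)

Omega = list(product((0, 1), repeat=len(X)))
# "bad" colourings: monochromatic on at least one member of the family
bad = [f for f in Omega
       if any(len({f[x] for x in A}) == 1 for A in family)]

p_bad = Fraction(len(bad), len(Omega))
assert p_bad <= Fraction(2 * m, 2 ** n)   # union bound: at most m * 2^(1-n)
assert p_bad < 1                          # so a good colouring exists
```

<p>Here the union bound gives $3/4$, while the true probability (overlaps included) is smaller still; either way it is below 1, which is all the argument needs.</p>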
<pre class="source">
<span class="keyword1 command">moreover</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">(</span><span class="main">⋃</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> <span class="main">⋃</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> <span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span> <span class="main">≠</span></span> <span class="skolem">Ω</span><span>"</span></span>
<span class="keyword1 command">proof</span> <span class="operator">-</span>
<span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>P.prob</span> <span class="main">(</span><span class="main">⋃</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> <span class="main">⋃</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> <span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span> <span class="main">≤</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> P.prob <span class="main">(</span><span class="main">⋃</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> <span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">intro</span> measure_UNION_le<span class="main">)</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> countable_Un_Int mchrome <span class="quoted"><span class="quoted"><span>‹</span>finite</span> <span class="free">𝓕</span><span>›</span></span><span class="main">)</span>
<span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> <span class="main">∑</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> P.prob</span> <span class="main">(</span><span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">intro</span> sum_mono measure_UNION_le<span class="main">)</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> mchrome<span class="main">)</span>
<span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">=</span></span> <span class="free">m</span> <span class="main">*</span></span> <span class="numeral">2</span> <span class="main">*</span> <span class="main">(</span><span class="main">1</span> <span class="main">/</span> <span class="numeral">2</span><span class="main">^</span><span class="free">n</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> prob_mchrome <span class="quoted"><span class="quoted"><span>‹</span>card</span> <span class="free">𝓕</span> <span class="main">=</span></span> <span class="free">m</span><span>›</span><span class="main">)</span>
<span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main"><</span></span> <span class="main">1</span></span><span>"</span>
<span class="keyword1 command">proof</span> <span class="operator">-</span>
<span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>real</span> <span class="main">(</span><span class="free">m</span> <span class="main">*</span></span> <span class="numeral">2</span><span class="main">)</span> <span class="main"><</span> <span class="numeral">2</span> <span class="main">^</span> <span class="free">n</span><span>"</span>
<span class="keyword1 command">using</span> mult_strict_right_mono <span class="main">[</span><span class="operator">OF</span> m<span class="main">,</span> <span class="operator">of</span> <span class="quoted numeral">2</span><span class="main">]</span> <span class="quoted"><span class="quoted"><span>‹</span><span class="free">n</span><span class="main">></span></span><span class="main">0</span></span><span>›</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">metis</span> of_nat_less_numeral_power_cancel_iff pos2 power_minus_mult<span class="main">)</span>
<span class="keyword1 command">then</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> <span class="dynamic dynamic">divide_simps</span><span class="main">)</span>
<span class="keyword1 command">qed</span>
<span class="keyword1 command">finally</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>P.prob</span> <span class="main">(</span><span class="main">⋃</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> <span class="main">⋃</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> <span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span> <span class="main"><</span></span> <span class="main">1</span><span>"</span> <span class="keyword1 command">.</span>
<span class="keyword1 command">then</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span>
<span class="keyword1 command">using</span> P.prob_space space_eq <span class="keyword1 command">by</span> <span class="operator">force</span>
<span class="keyword1 command">qed</span>
</pre>
<p>The conclusion of the theorem is now immediate.
Recall that <span class="keyword1 command">moreover</span>
accumulates the facts proved so far,
which <span class="keyword1 command">ultimately</span>
then makes available to the next proof.</p>
<pre class="source">
<span class="keyword1 command">ultimately</span> <span class="keyword3 command">obtain</span> <span class="skolem skolem">f</span> <span class="keyword2 keyword">where</span> f<span class="main">:</span><span class="quoted"><span class="quoted"><span>"</span><span class="skolem">f</span> <span class="main">∈</span></span> <span class="skolem">Ω</span> <span class="main">-</span></span> <span class="main">(</span><span class="main">⋃</span><span class="bound">F</span><span class="main">∈</span><span class="free">𝓕</span><span class="main">.</span> <span class="main">⋃</span><span class="bound">c</span><span class="main"><</span><span class="numeral">2</span><span class="main">.</span> <span class="skolem">mchrome</span> <span class="bound">c</span> <span class="bound">F</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="operator">blast</span>
<span class="keyword1 command">with</span> that <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">fastforce</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> mchrome_def Ω_def<span class="main">)</span>
<span class="keyword1 command">qed</span>
</pre>
<h3 id="postscript">Postscript</h3>
<p>The probabilistic method is simply a more intuitive way of presenting
a proof by counting. The original example of such a proof
is claimed to be Erdős’s “<a href="https://www.ams.org/journals/bull/1947-53-04/S0002-9904-1947-08785-1/S0002-9904-1947-08785-1.pdf">Some remarks on the theory of graphs</a>” (1947).
This paper indeed presents a proof of a lower bound for Ramsey numbers,
but it makes no reference to probability and instead
enumerates the total number of graphs satisfying certain properties.
Presumably he published a probabilistic version of that proof
at a later date.</p>
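<p>To see the counting behind the method in miniature, here is a small Python sketch (mine, not part of any formal development): for a random family of $m = 7 < 2^{3} = 8$ four-element sets, an exhaustive search over all $2$-colourings of the ground set always finds one with no monochromatic set, just as the theorem guarantees.</p>

```python
from itertools import combinations, product
import random

def has_good_colouring(X, family):
    """Try every 2-colouring of X; succeed if some colouring
    leaves no set in the family monochromatic."""
    for colours in product([0, 1], repeat=len(X)):
        colouring = dict(zip(X, colours))
        if all(len({colouring[x] for x in A}) == 2 for A in family):
            return True
    return False

# Erdős's bound: m < 2^(n-1) guarantees a good colouring exists.
# Here n = 4 and m = 7 < 8 = 2^3, over a ground set of 8 points.
random.seed(0)
X = list(range(8))
family = random.sample(list(combinations(X, 4)), 7)
assert has_good_colouring(X, family)
```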
<p>A recent <a href="/papers/Edmonds-CPP2024.pdf">paper</a> by Chelsea Edmonds
describes the formalisation of probabilistic proofs in
considerably more detail.</p>
<p>The examples for this post are online <a href="/Isabelle-Examples/Probabilistic_Example_Erdos.thy">here</a>.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>A <em>bichromatic colouring</em> of $X$ is a map taking each element of $X$ to either red or blue, and $Y\subseteq X$ is <em>monochromatic</em> if all its elements have the same colour. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 21 Aug 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/08/21/Probabilistic_Example.html
https://lawrencecpaulson.github.io//2024/08/21/Probabilistic_Example.htmlA tricky lower bound proof<p>The <a href="/2024/07/25/Numeric_types.html">previous post</a> concerned exact numerical calculations, culminating in an example of establishing – automatically –
a numerical lower bound for a simple mathematical formula.
Although automation is the key to the success of formal verification,
a numerical approach is not always good enough. In that example,
we could get three significant digits quickly, four significant digits slowly,
and the exact lower bound never.
As every calculus student knows,
to locate a minimum or maximum you take the derivative
and solve for the point at which it vanishes.
The desired property can then be shown using the mean value theorem.
Let’s do it!</p>
<h3 id="a-simple-problem-with-surprising-complications">A simple problem with surprising complications</h3>
<p>Our task is simply to find the minimum of the function $x\ln x$
for $x\ge0$. And the first question is whether $x\ln x$ is even <strong>defined</strong>
when $x=0$. A purist would say that it is not, because $\ln 0$ is undefined
and multiplying $\ln 0$ by $0$ does not help matters.
However, most mathematicians would say that $x\ln x$ has a <em>removable singularity</em> at zero.
That means that we can define it at the singularity so that it is continuous.
The function certainly looks continuous:</p>
<p><img src="/images/plot_x_ln_x.png" alt="graph of the function x ln x" width="400" /></p>
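<p>A quick numerical probe (Python, outside the formal development) confirms what the graph suggests: $x\ln x$ tends to $0$ as $x\to0^{+}$, so defining the value at $x=0$ to be $0$ does remove the singularity.</p>

```python
import math

# numerically, x * ln x tends to 0 as x -> 0+, so setting the value
# at x = 0 to 0 removes the singularity
values = [x * math.log(x) for x in (1e-2, 1e-4, 1e-6, 1e-8)]
assert all(v < 0 for v in values)              # approaching 0 from below
assert all(abs(b) < abs(a) for a, b in zip(values, values[1:]))
assert abs(values[-1]) < 1e-6
```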
<p>The derivative of $x\ln x$ is $\ln x+1$, which vanishes when
$x=1/e$. At that point, $x\ln x$ achieves its minimum, namely $-1/e$.
Proving this fact formally, for all $x\ge0$, is tricky because
that derivative is certainly undefined when $x=0$.
The way around the problem involves proving that $x\ln x$ is continuous for all $x\ge0$.</p>
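<p>Before formalising anything, the claim is easy to check numerically (a Python sketch under floating-point arithmetic, not a proof): the derivative vanishes at $1/e$ and the value there is $-1/e$.</p>

```python
import math

def f(x):
    return x * math.log(x)

xmin = 1 / math.e

# the derivative ln x + 1 vanishes at x = 1/e: check via central difference
h = 1e-7
assert abs((f(xmin + h) - f(xmin - h)) / (2 * h)) < 1e-6

# the value there is the claimed minimum -1/e, and nearby points lie above it
assert abs(f(xmin) - (-1 / math.e)) < 1e-12
assert all(f(x) >= f(xmin) for x in [0.01, 0.1, 0.2, 0.5, 1.0, 2.0])
```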
<h3 id="proving-continuity">Proving continuity</h3>
<p>We prove the continuity of $x\ln x$ in separate stages: first, simply for $x=0$.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> continuous_at_0<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>continuous</span> <span class="main">(</span>at_right</span> <span class="main">0</span><span class="main">)</span> <span class="main">(</span><span class="main">λ</span><span class="bound">x</span><span class="main">::</span>real<span class="main">.</span> <span class="bound">x</span> <span class="main">*</span> ln <span class="bound">x</span><span class="main">)"
</span><span class="keyword1 command">unfolding</span> continuous_within <span class="keyword1 command">by</span> <span class="operator">real_asymp</span>
</pre>
<p>Unfolding <code class="language-plaintext highlighter-rouge">continuous_within</code> reduces the continuity claim to the limit claim
$x\ln x \longrightarrow 0\ln 0$, which of course is simply
$x\ln x\longrightarrow0$. Such limits are trivially proved by
Manuel Eberl’s wonderful <a href="http://cl-informatik.uibk.ac.at/users/meberl//pubs/real_asymp.html"><code class="language-plaintext highlighter-rouge">real_asymp</code> proof method</a>,
which will surely be the subject
of a future blogpost.
From this result, we can quickly prove continuity for all $x\ge0$.
Incorporating the zero case requires a little black magic (the topological definition of continuity), while the non-zero case is straightforward.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> continuous_nonneg<span class="main">:
</span><span class="keyword2 keyword">fixes</span> <span class="free">x</span><span class="main">::</span><span class="quoted">real
</span><span class="keyword2 keyword">assumes</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">≥</span></span> <span class="main">0</span>"
</span><span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span>continuous</span> <span class="main">(</span><span class="keyword1">at</span></span> <span class="free">x</span> <span class="keyword1">within</span> <span class="main">{</span><span class="main">0</span><span class="main">..}</span><span class="main">)</span> <span class="main">(</span><span class="main">λ</span><span class="bound">x</span><span class="main">.</span> <span class="bound">x</span> <span class="main">*</span> ln <span class="bound">x</span><span class="main">)"
</span><span class="keyword1 command">proof</span> <span class="main">(</span><span class="operator">cases</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">=</span></span> <span class="main">0</span>"</span><span class="main">)
</span><span class="keyword3 command">case</span> True <span class="keyword1 command">with</span> continuous_at_0 <span class="keyword3 command">show</span> <span class="var quoted var">?thesis
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">force</span> <span class="quasi_keyword">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> continuous_within_topological less_eq_real_def<span class="main">)
</span><span class="keyword1 command">qed</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">intro</span><span class="main main">!</span><span class="main main">:</span> <span class="dynamic dynamic">continuous_intros</span><span class="main">)</span>
</pre>
<p>This result is then repackaged in a more convenient form for later use,
replacing the “at-within” filter by the obvious closed half-interval.
It would be preferable to incorporate the two lemmas above into the body
of <code class="language-plaintext highlighter-rouge">continuous_on_x_ln</code>; I have split them up here simply to ease the presentation.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> continuous_on_x_ln<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>continuous_on</span> <span class="main">{</span></span><span class="main">0</span><span class="main">..}</span> <span class="main">(</span><span class="main">λ</span><span class="bound">x</span><span class="main">::</span>real<span class="main">.</span> <span class="bound">x</span> <span class="main">*</span> ln <span class="bound">x</span><span class="main">)"
</span><span class="keyword1 command">unfolding</span> continuous_on_eq_continuous_within
<span class="keyword1 command">using</span> continuous_nonneg <span class="keyword1 command">by</span> <span class="operator">blast</span>
</pre>
<p><em>Remark</em>: the identical proof works for the continuity of the function $x\sin(1/x)$,
but not of $x\exp(1/x)$, even though all three functions
have a singularity at $x=0$. What is the essential difference?</p>
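<p>A numerical hint (Python, and only a hint, so as not to spoil the puzzle): $\lvert\sin\rvert$ is bounded, which pins $x\sin(1/x)$ down near $0$, whereas $\exp(1/x)$ is not.</p>

```python
import math

# |sin| is bounded by 1, so |x * sin(1/x)| <= |x|, forcing a limit of 0;
# exp(1/x), by contrast, outgrows the vanishing factor x as x -> 0+
for x in [1e-1, 1e-2, 1e-3, 1e-4]:
    assert abs(x * math.sin(1 / x)) <= x
assert 0.01 * math.exp(1 / 0.01) > 1e40   # x * exp(1/x) blows up instead
```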
<h3 id="proving-the-lower-bound-claim">Proving the lower bound claim</h3>
<p>The first step is to prove that the derivative of $x\ln x$
is indeed $\ln x+1$.
In Isabelle/HOL, calculating a derivative is easy.
If you know the derivative already (perhaps from a computer algebra system),
then verifying the result is even easier.
The technique, already illustrated in an <a href="/2022/02/16/Irrationals.html">earlier post</a>,
relies on Isabelle’s inbuilt Prolog-like proof calculus.
Repeated application of rules from the list <code class="language-plaintext highlighter-rouge">derivative_eq_intros</code>
constructs the derivative and sometimes can simplify it
or prove that it equals a derivative already supplied.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> xln_deriv<span class="main">:
</span><span class="keyword2 keyword">fixes</span> <span class="free">x</span><span class="main">::</span><span class="quoted">real
</span><span class="keyword2 keyword">assumes</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">></span></span> <span class="main">0</span>"
</span><span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">(</span><span class="main">(</span><span class="main">λ</span><span class="bound">u</span><span class="main">.</span> <span class="bound">u</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="bound">u</span><span class="main">)</span><span class="main">)</span> <span class="keyword1">has_real_derivative</span> ln <span class="free">x</span> <span class="main">+</span> <span class="main">1</span><span class="main">)</span> <span class="main">(</span><span class="keyword1">at</span> <span class="free">x</span><span class="main">)"
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">rule</span> <span class="dynamic dynamic">derivative_eq_intros</span> refl <span class="main keyword3">|</span> <span class="operator">use</span> assms <span class="keyword2 keyword quasi_keyword">in</span> <span class="operator">force</span><span class="main">)</span><span class="main keyword3">+</span>
</pre>
<p>We will use this derivative in the main proof,
which for clarity is presented in chunks.
First, here’s the theorem statement itself:</p>
<pre class="source">
<span class="keyword1 command">theorem</span> x_ln_lowerbound<span class="main">:
</span><span class="keyword2 keyword">fixes</span> <span class="free">x</span><span class="main">::</span><span class="quoted">real
</span><span class="keyword2 keyword">assumes</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">≥</span></span> <span class="main">0</span>"
</span><span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="free">x</span><span class="main">)</span> <span class="main">≥</span> <span class="main">-</span><span class="main">1</span> <span class="main">/</span> exp <span class="main">1"
</span><span class="keyword1 command">proof</span> <span class="operator">-</span>
</pre>
<p>Next, define <code class="language-plaintext highlighter-rouge">xmin</code> (the $x$-value of the minimum) to be $1/e$,
then show that it is positive, a fact needed later.</p>
<pre class="source">
<span class="keyword3 command">define</span> <span class="skolem skolem">xmin</span><span class="main">::</span><span class="quoted">real</span> <span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">xmin</span> <span class="main">≡</span> <span class="main">1</span></span> <span class="main">/</span></span> exp <span class="main">1"
</span><span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">xmin</span> <span class="main">></span></span> <span class="main">0</span>"
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> xmin_def<span class="main">)</span>
</pre>
<p>The claimed result requires a case analysis.
The nontrivial cases are $0\le x<1/e$ and $1/e<x$.
In the first case, the point is that $\ln x+1$ is negative,
with $x\ln x$ decreasing in value to its minimum.</p>
<pre class="source">
<span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="free">x</span><span class="main">)</span> <span class="main">></span> <span class="skolem">xmin</span> <span class="main">*</span> ln<span class="main">(</span><span class="skolem">xmin</span><span class="main">)"</span> <span class="keyword2 keyword">if</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main"><</span></span> <span class="skolem">xmin"</span>
</span><span class="keyword1 command">proof</span> <span class="main">(</span><span class="operator">intro</span> DERIV_neg_imp_decreasing_open <span class="main main">[</span><span class="operator">OF</span> that<span class="main main">]</span> exI conjI<span class="main">)
</span><span class="keyword3 command">fix</span> <span class="skolem">u</span> <span class="main">::</span> <span class="quoted">real
</span><span class="keyword3 command">assume</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main"><</span></span> <span class="skolem">u"</span></span> <span class="keyword2 keyword">and</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">u</span> <span class="main"><</span></span> <span class="skolem">xmin"</span>
</span><span class="keyword1 command">then</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>ln</span> <span class="skolem">u</span> <span class="main">+</span></span> <span class="main">1</span> <span class="main"><</span> ln <span class="main">1"
</span><span class="keyword1 command">unfolding</span> xmin_def
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">smt</span> <span class="main main">(</span>verit<span class="main main">,</span> del_insts<span class="main main">)</span> assms exp_diff exp_less_cancel_iff exp_ln_iff<span class="main">)
</span><span class="keyword1 command">then</span> <span class="keyword3 command">show</span> <span class="quoted"><span class="quoted"><span>"</span>ln</span> <span class="skolem">u</span> <span class="main">+</span></span> <span class="main">1</span> <span class="main"><</span> <span class="main">0"
</span><span class="keyword1 command">by</span> <span class="operator">simp
</span><span class="keyword1 command">next
</span><span class="keyword3 command">show</span> <span class="quoted"><span class="quoted"><span>"</span>continuous_on</span> <span class="main">{</span></span><span class="free">x</span><span class="main">..</span><span class="skolem">xmin</span><span class="main">}</span> <span class="main">(</span><span class="main">λ</span><span class="bound">u</span><span class="main">.</span> <span class="bound">u</span> <span class="main">*</span> ln <span class="bound">u</span><span class="main">)"
</span><span class="keyword1 command">using</span> continuous_on_x_ln continuous_on_subset assms <span class="keyword1 command">by</span> <span class="operator">fastforce
</span><span class="keyword1 command">qed</span> <span class="main">(</span><span class="operator">use</span> assms xln_deriv <span class="keyword2 keyword quasi_keyword">in</span> <span class="operator">auto</span><span class="main">)</span>
</pre>
<p>The second case is symmetric. The derivative is positive,
with $x\ln x$ increasing in value from its minimum.
In both cases, continuity is a requirement.</p>
<pre class="source">
<span class="keyword1 command">moreover
</span><span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="free">x</span><span class="main">)</span> <span class="main">></span> <span class="skolem">xmin</span> <span class="main">*</span> ln<span class="main">(</span><span class="skolem">xmin</span><span class="main">)"</span> <span class="keyword2 keyword">if</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">></span></span> <span class="skolem">xmin"</span>
</span><span class="keyword1 command">proof</span> <span class="main">(</span><span class="operator">intro</span> DERIV_pos_imp_increasing_open <span class="main main">[</span><span class="operator">OF</span> that<span class="main main">]</span> exI conjI<span class="main">)
</span><span class="keyword3 command">fix</span> <span class="skolem">u
</span><span class="keyword3 command">assume</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">></span></span> <span class="skolem">u"</span></span> <span class="keyword2 keyword">and</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">u</span> <span class="main">></span></span> <span class="skolem">xmin"</span>
</span><span class="keyword1 command">then</span> <span class="keyword3 command">show</span> <span class="quoted"><span class="quoted"><span>"</span>ln</span> <span class="skolem">u</span> <span class="main">+</span></span> <span class="main">1</span> <span class="main">></span> <span class="main">0"
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">smt</span> <span class="main main">(</span>verit<span class="main main">,</span> del_insts<span class="main main">)</span> <span class="quoted"><span class="quoted"><span>‹</span><span class="main">0</span></span> <span class="main"><</span></span> <span class="skolem">xmin›</span> exp_minus inverse_eq_divide
ln_less_cancel_iff ln_unique xmin_def<span class="main">)
</span><span class="keyword1 command">next
</span><span class="keyword3 command">show</span> <span class="quoted"><span class="quoted"><span>"</span>continuous_on</span> <span class="main">{</span></span><span class="skolem">xmin</span><span class="main">..</span><span class="free">x</span><span class="main">}</span> <span class="main">(</span><span class="main">λ</span><span class="bound">u</span><span class="main">.</span> <span class="bound">u</span> <span class="main">*</span> ln <span class="bound">u</span><span class="main">)"
</span><span class="keyword1 command">using</span> continuous_on_x_ln continuous_on_subset xmin_def <span class="keyword1 command">by</span> <span class="operator">fastforce
</span><span class="keyword1 command">qed</span> <span class="main">(</span><span class="operator">use</span> <span class="quoted"><span class="quoted"><span>‹</span><span class="main">0</span></span> <span class="main"><</span></span> <span class="skolem">xmin›</span> xln_deriv <span class="keyword2 keyword quasi_keyword">in</span> <span class="operator">auto</span><span class="main">)</span>
</pre>
<p>If $x=1/e$, then the minimum value equals $-1/e$.
Note how the previous results are collected using the keyword <span class="keyword1 command">moreover</span>.</p>
<pre class="source">
<span class="keyword1 command">moreover</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="skolem">xmin</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="skolem">xmin</span><span class="main">)</span> <span class="main">=</span> <span class="main">-</span><span class="main">1</span> <span class="main">/</span> exp <span class="main">1</span><span>"</span>
<span class="keyword1 command">using</span> assms <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> xmin_def ln_div<span class="main">)</span>
</pre>
<p>The keyword <span class="keyword1 command">ultimately</span> takes the previous result and the earlier two saved by <code class="language-plaintext highlighter-rouge">moreover</code>, delivering them to the following proof.
As we have covered all the cases for $x$, the conclusion follows immediately.</p>
<pre class="source">
<span class="keyword1 command">ultimately</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis
</span><span class="keyword1 command">using</span> eq <span class="keyword1 command">by</span> <span class="operator">fastforce
</span><span class="keyword1 command">qed</span>
</pre>
<h3 id="finally-a-numerical-conclusion">Finally, a numerical conclusion</h3>
<p>In the <a href="/2024/07/25/Numeric_types.html">previous post</a>,
we used the <code class="language-plaintext highlighter-rouge">approximation</code> proof method (which operates by interval arithmetic) to calculate the minimum as -0.3679,
and it was slow (19 seconds on my zippy laptop).
Now we can get 17 significant figures more or less instantaneously.</p>
<pre class="source">
<span class="keyword1 command">corollary
</span><span class="keyword2 keyword">fixes</span> <span class="free">x</span><span class="main">::</span><span class="quoted">real
</span><span class="keyword2 keyword">assumes</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">≥</span></span> <span class="main">0</span>"
</span><span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="free">x</span><span class="main">)</span> <span class="main">≥</span> <span class="main">-</span><span class="numeral">0.36787944117144233"</span> <span class="main">(</span><span class="keyword2 keyword">is</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">_</span> <span class="main">≥</span></span> <span class="var">?rhs"</span></span><span class="main">)
</span><span class="keyword1 command">proof</span> <span class="operator">-
</span><span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">(</span><span class="main">-</span></span><span class="main">1</span></span><span class="main">::</span>real<span class="main">)</span> <span class="main">/</span> exp <span class="main">1</span> <span class="main">≥</span> <span class="var">?rhs"
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">approximation</span> 60<span class="main">)
</span><span class="keyword1 command">with</span> x_ln_lowerbound <span class="keyword3 command">show</span> <span class="var quoted var">?thesis
</span><span class="keyword1 command">using</span> assms <span class="keyword1 command">by</span> <span class="operator">force
</span><span class="keyword1 command">qed</span>
</pre>
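<p>As an informal cross-check (a Python one-liner, nothing to do with the Isabelle proof), the 17-digit constant in the corollary is indeed $-1/e$ to within floating-point precision.</p>

```python
import math

# -1/e agrees with the corollary's 17-digit constant
assert abs(-1 / math.e - (-0.36787944117144233)) < 1e-15
```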
<p>If you fancy a challenge, try the same exercise with the function $x\sin(1/x)$.
In many ways it’s similar, but the derivative hits zero infinitely often
and the formula, $\sin(1/x) - \cos(1/x)/x$, doesn’t look easy to work with.
If any of you tackle this problem, send it to me:
the first nice solution will be posted here.</p>
<p>The examples for this post are online <a href="/Isabelle-Examples/Ln_lower_bound.thy">here</a>.</p>
<p>Many thanks to Manuel Eberl for the continuity proof!</p>
Thu, 08 Aug 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/08/08/Ln_lower_bound.html
https://lawrencecpaulson.github.io//2024/08/08/Ln_lower_bound.htmlThe mysteries and frustrations of numerical proofs<p>Sometimes the smallest details are the worst. What do we mean by 2+2=4? There are many different kinds of numbers: integers, rationals, reals, etc. It should not affect the answer, and mathematical writing takes advantage of that fact, but formalised mathematics has to be unambiguous.
A further complication is that the various kinds of numbers are used in distinctive ways, e.g. recursion is only for the natural numbers.
How do we reconcile the zero-versus-successor conception of the naturals
with the need to work with decimal constants or large integers? Let’s see how it’s done in Isabelle/HOL.</p>
<h3 id="simple-arithmetic-with-symbolic-binary-notation">Simple arithmetic with symbolic binary notation</h3>
<p>If you ask Isabelle to prove 2+2=4, thankfully it will do it.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="numeral">2</span><span class="main">+</span></span><span class="numeral">2</span><span class="main">=</span></span><span class="numeral">4</span><span>"</span>
<span class="keyword1 command">by</span> <span class="operator">auto</span>
</pre>
<p>What kind of numbers are we talking about here? <a href="/2022/05/11/jEdit-tricks.html">Remember</a>, you can inspect any Isabelle syntactic element using CTRL-hover (CMD-hover on a Mac).
Hover over any of the numbers and Isabelle will display the type <code class="language-plaintext highlighter-rouge">'a</code>. Hover over that type variable to inspect its type class, which is <code class="language-plaintext highlighter-rouge">numeral</code>.
This is the class of all types for which numerals work, and Isabelle knows how to add numerals. It works just as well for large numerals, say to calculate 123456789 + 987654321 = 1111111110. But such super abstract calculations do not work for 2*3=6: this time, if we inspect the type it is again <code class="language-plaintext highlighter-rouge">'a</code>, but the type class is <code class="language-plaintext highlighter-rouge">{times,numeral}</code>.
That means Isabelle has detected that multiplication is involved but nothing more: it does not assume any ring laws.
Another example that fails is 0+2=2; here the relevant type class is <code class="language-plaintext highlighter-rouge">{zero,numeral}</code>, which does not include the identity law for 0.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="numeral">2</span><span class="main">*</span></span><span class="numeral">3</span><span class="main">=</span></span><span class="numeral">6</span><span>"</span>
<span class="keyword1 command">oops</span>
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">0</span></span><span class="main">+</span></span><span class="numeral">2</span><span class="main">=</span><span class="numeral">2</span><span>"</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">1</span></span><span class="main">*</span></span><span class="numeral">3</span><span class="main">=</span><span class="numeral">3</span><span>"</span>
<span class="keyword1 command">oops</span>
</pre>
<p>The way to avoid all such problems is simply to ensure that the type is constrained somehow. In most cases, the use of a variable declared previously will be enough. Sometimes a variable declaration will need a type constraint: <code class="language-plaintext highlighter-rouge">fix x :: real</code>. But you can always write an explicit type constraint within an expression, as in 123456789 * (987654321::int).
Isabelle will happily simplify that to 121932631112635269.
Also, the function <code class="language-plaintext highlighter-rouge">Suc</code> implies type <code class="language-plaintext highlighter-rouge">nat</code>.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="numeral">123456789</span> <span class="main">*</span></span> <span class="main">(</span><span class="numeral">987654321</span><span class="main">::</span>int</span><span class="main">)</span> <span class="main">=</span> <span class="numeral">121932631112635269</span><span>"</span>
<span class="keyword1 command">by</span> <span class="operator">simp</span>
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>Suc</span> <span class="main">(</span>Suc</span> <span class="main">0</span><span class="main">)</span> <span class="main">*</span> <span class="free">n</span> <span class="main">=</span> <span class="free">n</span><span class="main">*</span><span class="numeral">2</span><span>"</span>
<span class="keyword1 command">by</span> <span class="operator">simp</span>
</pre>
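<p>The constants in the two lemmas above are easy to confirm independently; Python's arbitrary-precision integers make a direct check trivial.</p>

```python
# Python integers are arbitrary precision, so the constants in the text
# can be confirmed directly
assert 123456789 + 987654321 == 1111111110
assert 123456789 * 987654321 == 121932631112635269
```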
<p>Isabelle can perform arithmetic on constants efficiently thanks to its internal representation: a form of symbolic binary notation. Positional notation is necessary to handle large integers; binary (as opposed to, say, decimal) keeps the formal theory of arithmetic compact.
Already in the 1990s I had devised a symbolic representation of numerals that worked well; the one used today is a clever refinement of that, but I have no idea whom to credit. <em>If anybody is aware of a publication on this topic, I would be happy to cite it here.</em>
The original version is still used for Isabelle/ZF; it is a two’s complement format, eliminating the ugly case-analyses associated with signed arithmetic.</p>
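<p>To illustrate the idea (a toy Python mimic, not Isabelle's actual implementation), positive numerals can be built from three constructors, <code class="language-plaintext highlighter-rouge">One</code>, <code class="language-plaintext highlighter-rouge">Bit0</code> and <code class="language-plaintext highlighter-rouge">Bit1</code>, denoting 1, doubling, and doubling-plus-one respectively:</p>

```python
# Positive numerals from three constructors: One (= 1), Bit0 n (= 2n)
# and Bit1 n (= 2n + 1): binary notation, least significant bit outermost.
# A toy mimic using nested tuples:

def value(num):
    """Interpret a numeral term as a Python integer."""
    if num == "One":
        return 1
    tag, n = num
    return 2 * value(n) + (1 if tag == "Bit1" else 0)

def numeral(k):
    """Encode a positive integer as a One/Bit0/Bit1 term."""
    if k == 1:
        return "One"
    return ("Bit1" if k % 2 else "Bit0", numeral(k // 2))

assert numeral(6) == ("Bit0", ("Bit1", "One"))   # 6 = 2 * (2*1 + 1)
assert all(value(numeral(k)) == k for k in range(1, 500))
```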
<p>Isabelle/HOL has over the years accumulated some capabilities that are cool although perhaps of little practical value. For example, <code class="language-plaintext highlighter-rouge">root 100 1267650600228229401496703205376</code> is quickly and automatically simplified to 2.</p>
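<p>The constant in that root example is no mystery: it is exactly $2^{100}$, as a quick Python check confirms.</p>

```python
# the constant in the root example is exactly 2^100
assert 2 ** 100 == 1267650600228229401496703205376
```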
<h3 id="unary-notation-and-the-horror-of-1">Unary notation and the horror of 1</h3>
<p>The idea of representing natural numbers using the two symbols <code class="language-plaintext highlighter-rouge">0</code> and <code class="language-plaintext highlighter-rouge">Suc</code> (for successor) comes from logic. Unary notation, e.g. <code class="language-plaintext highlighter-rouge">Suc(Suc(Suc 0))</code> for 3, is convenient for expressing mathematical induction rules symbolically and when defining the concept of a primitive recursive function.
It is not convenient for anything else, taking exponentially more space to represent an integer compared with positional notation.
So it is surprising that one of the most prominent proof assistants uses it even for large integers!</p>
<p>When working with natural numbers (type <code class="language-plaintext highlighter-rouge">nat</code> in Isabelle/HOL), we sometimes have to translate between unary and binary notation. In particular, we may occasionally need to transform something like 100 into <code class="language-plaintext highlighter-rouge">Suc 99</code>. If one is prepared to write the transformation as an explicit proof step using the keyword <strong>have</strong>, then Isabelle can prove such identities automatically. Some lower level tricks are available to do the transformation more compactly.
Simplest is to rewrite with <code class="language-plaintext highlighter-rouge">eval_nat_numeral</code>:</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span><span class="main">^</span></span><span class="numeral">5</span> <span class="main">=</span></span> <span class="free">x</span><span class="main">*</span><span class="free">x</span><span class="main">*</span><span class="free">x</span><span class="main">*</span><span class="free">x</span><span class="main">*</span><span class="main">(</span><span class="free">x</span><span class="main">::</span>real<span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> eval_nat_numeral<span class="main">)</span>
</pre>
<p>Please note: 0 and 1 are constants, overloaded on all numeric types. They are separate from the system of symbolic binary notation mentioned above. The constant 1, when it has type <code class="language-plaintext highlighter-rouge">nat</code>, will automatically be simplified to <code class="language-plaintext highlighter-rouge">Suc 0</code>. We would much have preferred to treat the two expressions as symbolically identical, but couldn’t think of a clean way to do it. If you are working with natural numbers, especially as a beginner, it would be best to avoid the symbol 1 altogether.</p>
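<p>A tiny check of that behaviour (my own example, not essential to the discussion): the simplifier rewrites the natural number 1 to <code class="language-plaintext highlighter-rouge">Suc 0</code>, so this identity is immediate.</p>
<pre class="source">
lemma "Suc 0 = (1::nat)"
  by simp
</pre>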
<h3 id="real-constants-and-interval-arithmetic">Real constants and interval arithmetic</h3>
<p>You can also write real constants such as 3.1416.
This expands to the fraction 31416 / 10^4.
Under simplification, the fraction is reduced and the denominator may get multiplied out; you may not recognise your real constant anymore.</p>
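<p>For instance, here is a trivial identity (a sketch of my own) showing a decimal constant agreeing with its fractional value after simplification:</p>
<pre class="source">
lemma "(0.25::real) = 1/4"
  by simp
</pre>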
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">(</span><span class="main">1</span></span> <span class="main">-</span></span> <span class="numeral">0.3</span><span class="main">*</span><span class="keyword1">𝗂</span><span class="main">)</span> <span class="main">*</span> <span class="main">(</span><span class="numeral">2.7</span> <span class="main">+</span> <span class="numeral">5</span><span class="main">*</span><span class="keyword1">𝗂</span><span class="main">)</span> <span class="main">=</span> <span class="numeral">4.2</span> <span class="main">+</span> <span class="numeral">4.19</span><span class="main">*</span><span class="keyword1">𝗂</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> <span class="dynamic dynamic">algebra_simps</span><span class="main">)</span>
</pre>
<p>For complex numbers, the imaginary constant <code class="language-plaintext highlighter-rouge">𝗂</code> can be chosen from the Symbols palette under Letters, or typed directly as <code class="language-plaintext highlighter-rouge">\&lt;i&gt;</code>.
Please note that the <code class="language-plaintext highlighter-rouge">^</code> operator requires the exponent to be a natural number; for integer or real exponents, the corresponding operators are <code class="language-plaintext highlighter-rouge">powi</code> and <code class="language-plaintext highlighter-rouge">powr</code>.</p>
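<p>Here is a small use of <code class="language-plaintext highlighter-rouge">powr</code> with a fractional exponent (again an example of my own; <code class="language-plaintext highlighter-rouge">powr_half_sqrt</code> is the library fact that $x^{1/2}=\sqrt x$ for $x\ge 0$):</p>
<pre class="source">
lemma "(2::real) powr (1/2) = sqrt 2"
  by (simp add: powr_half_sqrt)
</pre>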
<h3 id="large-integer-computations-using-code-generation">Large integer computations using code generation</h3>
<p>Evaluating functions like the factorial, which are recursive but generate large integers, requires special treatment.
The recursion requires the argument to be given in unary notation, but the results must be given in symbolic binary notation.
The most efficient way to achieve this is to make use of computational reflection, where executable Standard ML code is extracted from the function definition and run directly on the computer.
We can compute factorials and test primality with reasonable efficiency.
The proof method <code class="language-plaintext highlighter-rouge">eval</code> calls for the computational evaluation of expressions in the goal statement. Such goals cannot be proved using <code class="language-plaintext highlighter-rouge">auto</code>!</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>fact</span> <span class="numeral">20</span> <span class="main"><</span></span> <span class="main">(</span><span class="numeral">2432902008176640001</span><span class="main">::</span>nat<span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="operator">eval</span>
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>prime</span> <span class="main">(</span><span class="numeral">179424673</span><span class="main">::</span>nat</span><span class="main">)</span><span>"</span>
<span class="keyword1 command">by</span> <span class="operator">eval</span>
</pre>
<p>It must be noted that computational reflection provides a weaker standard of proof, as we are required to trust the translation from higher-order logic into Standard ML and then to machine code for execution.
In mitigation, we can remark that the function being translated into Standard ML is trivial,
and that we already have to trust the Standard ML system to run Isabelle itself.</p>
<h3 id="precise-real-number-computations-using-interval-arithmetic">Precise real-number computations using interval arithmetic</h3>
<p>The <code class="language-plaintext highlighter-rouge">approximation</code> proof method, developed by Johannes Hölzl, provides arbitrary-precision real arithmetic via computational reflection.
It can perform calculations that otherwise would be beyond tedious.
Its argument specifies the required precision.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">¦</span></span>pi</span> <span class="main">-</span> <span class="numeral">355</span><span class="main">/</span><span class="numeral">113</span><span class="main">¦</span> <span class="main"><</span> <span class="main">1</span><span class="main">/</span><span class="numeral">10</span><span class="main">^</span><span class="numeral">6</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">approximation</span> 25<span class="main">)</span>
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">¦</span></span>sqrt</span> <span class="numeral">2</span> <span class="main">-</span> <span class="numeral">1.4142135624</span><span class="main">¦</span> <span class="main"><</span> <span class="main">1</span><span class="main">/</span><span class="numeral">10</span><span class="main">^</span><span class="numeral">10</span><span>"</span>
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">approximation</span> 35<span class="main">)</span>
</pre>
<p>Remarkably, <code class="language-plaintext highlighter-rouge">approximation</code> is not confined to single values but can also prove numerical results on closed intervals, via the <code class="language-plaintext highlighter-rouge">splitting</code> option.</p>
<pre class="source">
<span class="keyword1 command">lemma</span>
<span class="keyword2 keyword">fixes</span> <span class="free">x</span><span class="main">::</span><span class="quoted">real</span>
<span class="keyword2 keyword">assumes</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">∈</span></span> <span class="main">{</span></span><span class="numeral">0.1</span> <span class="main">..</span> <span class="main">1</span><span class="main">}</span><span>"</span>
<span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="free">x</span><span class="main">)</span> <span class="main">≥</span> <span class="main">-</span><span class="numeral">0.368</span><span>"</span>
<span class="keyword1 command">using</span> assms <span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">approximation</span> 17 <span class="quasi_keyword">splitting</span><span class="main main">:</span> <span class="quoted free">x</span><span class="main main">=</span>13<span class="main">)</span>
</pre>
<p>This particular example is problematic because $x\ln x$ is undefined when $x=0$.
The given interval avoids zero, since otherwise <code class="language-plaintext highlighter-rouge">approximation</code> would simply fail.
The example above takes a couple of seconds even though the bound is not that close to the exact answer, which is $-1/e \approx -0.36787944$.</p>
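<p>For reference, the exact answer comes from elementary calculus (a remark of mine, not part of the original post): $\frac{d}{dx}(x\ln x) = \ln x + 1$, which vanishes at $x = e^{-1}$, where $x\ln x = e^{-1}\ln(e^{-1}) = -1/e \approx -0.3679$. That is why the bound $-0.368$ above is already nearly sharp.</p>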
<p>If we try to get even a little bit closer, the <code class="language-plaintext highlighter-rouge">approximation</code> tactic takes much more than a couple of seconds.</p>
<pre class="source">
<span class="keyword1 command">lemma</span>
<span class="keyword2 keyword">fixes</span> <span class="free">x</span><span class="main">::</span><span class="quoted">real</span>
<span class="keyword2 keyword">assumes</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">∈</span></span> <span class="main">{</span></span><span class="numeral">0.1</span> <span class="main">..</span> <span class="main">1</span><span class="main">}</span><span>"</span>
<span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">x</span> <span class="main">*</span></span> ln</span><span class="main">(</span><span class="free">x</span><span class="main">)</span> <span class="main">≥</span> <span class="main">-</span><span class="numeral">0.3679</span><span>"</span>
<span class="keyword1 command">using</span> assms
<span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">approximation</span> 18 <span class="quasi_keyword">splitting</span><span class="main main">:</span> <span class="quoted free">x</span><span class="main main">=</span>16<span class="main">)</span>
</pre>
<p>I hope to treat this example precisely in a future post.
However, sometimes this sort of approximation is exactly what we need.</p>
<h3 id="a-final-remark-inclusion-chains-of-different-kinds-of-numbers">A final remark: inclusion chains of different kinds of numbers?</h3>
<p>Some proof assistants support the common convention that the different kinds of numbers form an inclusion chain:
the natural numbers are literally a subset of the reals,
as opposed to merely being embeddable into them.
To realise this aim requires ugly technicalities,
as well as making an arbitrary choice of the largest numeric type, to sit at the top of the inclusion chain.
It could be the real numbers, the complex numbers or even something above that level, such as the quaternions.
But we can always embed one kind of number into something more complicated.</p>
<p>In Isabelle/HOL, the numeric types are distinct, with embedding functions
such as <code class="language-plaintext highlighter-rouge">of_nat</code>, which embeds the natural numbers into any type that defines 0, 1 and +, and the analogous <code class="language-plaintext highlighter-rouge">of_int</code>.
To prevent formulas from being cluttered with occurrences of such functions, Isabelle by default inserts them where necessary to make an expression type-correct.
It does this its own way, and sometimes you may prefer to take control by inserting them yourself, exactly where you want them.
Remember also that numerals such as 1729 can take on any numeric type.</p>
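<p>A minimal example of such a coercion (my own): the function <code class="language-plaintext highlighter-rouge">real</code>, the instance of <code class="language-plaintext highlighter-rouge">of_nat</code> for the real numbers, distributes over addition, and the simplifier knows it.</p>
<pre class="source">
lemma "real (n + m) = real n + real m"
  by simp
</pre>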
<p>A file containing the examples above is available to <a href="/Isabelle-Examples/Numeric.thy">download</a>.</p>
Thu, 25 Jul 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/07/25/Numeric_types.html
https://lawrencecpaulson.github.io//2024/07/25/Numeric_types.html

Two small examples by Fields medallists

<p>A couple of weeks ago, Tim Gowers posted on Twitter an unusual characterisation of bijective functions: that they preserve set complements.
Alex Kontorovich re-tweeted that post accompanied by a Lean proof detailing Gowers’ argument.
I took a look, and lo and behold! Isabelle can prove it with a single sledgehammer call.
(That one line proof isn’t necessarily the best proof, however.
Remember, we want proofs that are easy to read and maintain.)
And Terence Tao published a small example on Mastodon; let’s look at that one too.</p>
<h3 id="gowers-original-tweet">Gowers’ original tweet</h3>
<p>Here is the original tweet, a thread in classical Twitter style:</p>
<blockquote>
<p>I’ve just noticed that a function f:X->Y is a bijection if and only if it preserves complements, that is, if and only if f(X\A)=Y\f(A) for every subset A of X. Is this a standard fact that has somehow passed me by for four decades? Simple proof in rest of (short) thread. 1/3</p>
</blockquote>
<blockquote>
<p>If f is a bijection and B=X\A, then f preserves unions and intersections and f(X)=Y, so f(A) and f(B) are disjoint and have union equal to Y. Conversely, if f preserves complements, then setting A = emptyset, we see that f(X)=Y, so f is a surjection. 2/3</p>
</blockquote>
<blockquote>
<p>And for every x we also have that f(X\{x})=Y\{f(x)}. Therefore, if x and y are distinct, then so are f(x) and f(y). So f is an injection as well. 3/3</p>
</blockquote>
<p>In standard mathematical notation, the claim is that if a function $f:X\to Y$ is given,
then $f$ is a bijection from $X$ to $Y$ if and only if it preserves complements, i.e.
if $f[X\setminus A] = Y \setminus f[A]$ for all $A\subseteq X$.
Incidentally, there are various ways of writing the image of a set under a function;
here I use square brackets, while Lean and Isabelle provide their own image operators.</p>
<h3 id="the-lean-formalisation">The Lean formalisation</h3>
<p>Kontorovich posted his version as an image:</p>
<p><img src="/images/Gowers-example.jpeg" alt="Formalisation of the bijection proof in Lean by Alex Kontorovich" /></p>
<p>Note that he has written out the argument in detail,
with plenty of comments to explain what is going on.</p>
<h3 id="investigating-the-problem-in-isabelle">Investigating the problem in Isabelle</h3>
<p>This problem looked intriguing, so I tried it with Isabelle.
The brute force way to tackle such a proof is</p>
<ol>
<li>try the <code class="language-plaintext highlighter-rouge">auto</code> proof method to solve or at least break up the problem</li>
<li>invoke <a href="https://isabelle.in.tum.de/dist/doc/sledgehammer.pdf">sledgehammer</a>
on the subgoals that are produced.</li>
</ol>
<p>The proof you get this way is likely to be horrible.
However, once you have your first proof, it’s easy to get a nicer proof.
(And you should take the trouble.)
For the current problem, if you type <code class="language-plaintext highlighter-rouge">auto</code>, you get four ugly subgoals, each of which sledgehammer proves automatically.
I don’t want to show this, but you can try it for yourself.
The Isabelle theory file is <a href="/Isabelle-Examples/Gowers_Bijection.thy">here</a>.</p>
<p>What I actually tried, first, was to split the logical equivalence into its two directions.
I was pleased to see that sledgehammer could prove both.
Then I thought, let’s see if it can prove the whole claim at once, and indeed it could!</p>
<h3 id="the-isabelle-proofs">The Isabelle proofs</h3>
<p>To begin, a technicality about notation.
In Isabelle, <em>set difference</em> is written with a minus sign, $A-B$,
because the standard backslash character is reserved for other purposes.
The usual set difference symbol can be selected from the Symbols palette
or typed as <code class="language-plaintext highlighter-rouge">\setminus</code> (autocomplete will help).
So let’s begin by setting that up, allowing us to use the conventional symbol.
It will be accepted for input and used to display results.</p>
<pre class="source">
<span class="keyword1 command">abbreviation</span> <span class="entity">set_difference</span> <span class="main">::</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">[</span><span class="tfree">'a</span> set</span><span class="main">,</span><span class="tfree">'a</span> set</span><span class="main">]</span> <span class="main">⇒</span> <span class="tfree">'a</span> set<span>"</span> <span class="main">(</span><span class="keyword2 keyword">infixl</span> <span class="quoted"><span>"</span><span class="keyword1">∖</span><span>"</span></span> 65<span class="main">)</span><span>
</span><span class="keyword2 keyword">where</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free bound entity">A</span> <span class="main free">∖</span> <span class="free bound entity">B</span> <span class="main">≡</span> <span class="free bound entity">A</span><span class="main">-</span></span><span class="free bound entity">B</span><span>"</span></span>
</pre>
<p>The following is the nicest of the one-shot proofs found by sledgehammer.
This problem turned out to be relatively easy; three of the constituent provers
solved it.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>bij_betw</span> <span class="free">f</span> <span class="free">X</span> <span class="free">Y</span> <span class="main">⟷</span></span> <span class="main">(</span><span class="main">∀</span><span class="bound">A</span><span class="main">.</span> <span class="bound">A</span><span class="main">⊆</span><span class="free">X</span> <span class="main">⟶</span> <span class="free">f</span> <span class="main">`</span> <span class="main">(</span><span class="free">X</span><span class="main">∖</span><span class="bound">A</span><span class="main">)</span> <span class="main">=</span> <span class="free">Y</span> <span class="main">∖</span> <span class="free">f</span><span class="main">`</span><span class="bound">A</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">metis</span> Diff_empty Diff_eq_empty_iff Diff_subset bij_betw_def image_is_empty<span>
</span>inj_on_image_set_diff subset_antisym subset_image_inj<span class="main">)</span>
</pre>
<p>I don’t actually recommend that you allow proofs of this sort
to accumulate in your development.
It leaves us completely in the dark as to why the claim holds.
Moreover, if you want your development to be maintainable,
it needs to be resilient in the presence of change.
I’m always having to make corrections and adjustments (because I’m always making mistakes),
and while rerunning the proofs can be an anxious moment, usually they all work fine.
At worst, they can be fixed by another sledgehammer call.
Opaque proofs like the one above will be hard to fix when they break.</p>
<p>The simplest way to get a clearer proof for this particular problem
is by separately treating the left-to-right and right-to-left directions.
This is also an opportunity to see the <code class="language-plaintext highlighter-rouge">is</code> mechanism for matching a pattern to a formula.
An arbitrary pattern is permitted, and here we set up <code class="language-plaintext highlighter-rouge">?L</code> and <code class="language-plaintext highlighter-rouge">?R</code>
to denote the left and right hand sides.</p>
<pre class="source">
<span class="keyword1 command">lemma</span> <span class="quoted"><span class="quoted"><span>"</span>bij_betw</span> <span class="free">f</span> <span class="free">X</span> <span class="free">Y</span> <span class="main">⟷</span></span> <span class="main">(</span><span class="main">∀</span><span class="bound">A</span><span class="main">.</span> <span class="bound">A</span><span class="main">⊆</span><span class="free">X</span> <span class="main">⟶</span> <span class="free">f</span> <span class="main">`</span> <span class="main">(</span><span class="free">X</span><span class="main">∖</span><span class="bound">A</span><span class="main">)</span> <span class="main">=</span> <span class="free">Y</span> <span class="main">∖</span> <span class="free">f</span><span class="main">`</span><span class="bound">A</span><span class="main">)</span><span>"</span> <span class="main">(</span><span class="keyword2 keyword">is</span> <span class="quoted"><span class="quoted"><span>"</span><span class="var">?L</span><span class="main">=</span></span><span class="var">?R</span><span>"</span></span><span class="main">)</span><span>
</span><span class="keyword1 command">proof</span><span>
</span><span class="keyword3 command">show</span> <span class="quoted quoted"><span>"</span><span class="var">?L</span> <span class="main">⟹</span> <span class="var">?R</span><span>"</span></span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">metis</span> Diff_subset bij_betw_def inj_on_image_set_diff<span class="main">)</span><span>
</span><span class="keyword3 command">assume</span> <span class="var quoted var">?R</span><span>
</span><span class="keyword1 command">then</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span>inj_on</span> <span class="free">f</span> <span class="free">X</span><span>"</span></span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">f</span> <span class="main">`</span></span> <span class="free">X</span> <span class="main">=</span></span> <span class="free">Y</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">auto</span> <span class="quasi_keyword">simp</span><span class="main main">:</span> inj_on_def<span class="main">)</span><span>
</span><span class="keyword1 command">then</span> <span class="keyword3 command">show</span> <span class="var quoted var">?L</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> bij_betw_def<span class="main">)</span><span>
</span><span class="keyword1 command">qed</span>
</pre>
<p>This proof is much clearer. The left-to-right direction requires only three previous facts.
The right-to-left direction is practically automatic.
You might argue that even here, the actual reasoning is still opaque.
However, this proof tells us that the right-to-left direction
is essentially a calculation from the definitions,
while the opposite direction is the consequence of three facts (rather than eight, as before).
This sort of proof will be much easier to maintain.</p>
<p>A further Isabelle bonus: note that both the Lean proof and Gowers’ informal argument
begin by assuming $f:X\to Y$.
The Isabelle version states unconditionally that
$f$ is a bijection from $X$ to $Y$ if and only if it preserves complements.
The implicit typing of $f$ ensures only that it is a function:
over arbitrary types that we don’t even mention.</p>
<h3 id="taos-example">Tao’s example</h3>
<p>Unfortunately, I wasn’t able to locate Tao’s original post.
But he stated a nice little problem and gave a formalisation using Lean, and again I couldn’t help trying it out in Isabelle. I liked my proof more.</p>
<p>We are given a decreasing real-valued sequence $\{a_k\}$
and a family of non-negative reals $\{D_k\}$
such that $a_k\le D_k - D_{k+1}$ for all $k$.
The task is to prove $a_k \le \frac{D_0}{k+1}$.</p>
<pre class="source">
<span class="keyword1 command">lemma</span><span>
</span><span class="keyword2 keyword">fixes</span> <span class="free">a</span> <span class="main">::</span> <span class="quoted"><span class="quoted"><span>"</span>nat</span> <span class="main">⇒</span> real</span><span>"</span><span>
</span><span class="keyword2 keyword">assumes</span> a<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span>decseq</span> <span class="free">a</span><span>"</span></span> <span class="keyword2 keyword">and</span> D<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">⋀</span><span class="bound">k</span><span class="main">.</span> <span class="free">D</span> <span class="bound">k</span> <span class="main">≥</span></span> <span class="main">0</span></span><span>"</span> <span class="keyword2 keyword">and</span> aD<span class="main">:</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">⋀</span><span class="bound">k</span><span class="main">.</span> <span class="free">a</span> <span class="bound">k</span> <span class="main">≤</span></span> <span class="free">D</span> <span class="bound">k</span> <span class="main">-</span></span> <span class="free">D</span><span class="main">(</span>Suc <span class="bound">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword2 keyword">shows</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">a</span> <span class="free">k</span> <span class="main">≤</span></span> <span class="free">D</span> <span class="main">0</span></span> <span class="main">/</span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">proof</span> <span class="operator">-</span><span>
</span><span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="free">a</span> <span class="free">k</span> <span class="main">=</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">i</span><span class="main">≤</span><span class="free">k</span><span class="main">.</span> <span class="free">a</span> <span class="free">k</span><span class="main">)</span> <span class="main">/</span></span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="operator">simp</span><span>
</span><span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">i</span><span class="main">≤</span><span class="free">k</span><span class="main">.</span> <span class="free">a</span> <span class="bound">i</span><span class="main">)</span> <span class="main">/</span></span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">using</span> a sum_mono<span class="main">[</span><span class="operator">of</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">{..</span></span><span class="free">k</span><span class="main">}</span></span><span>"</span> <span class="quoted quoted"><span>"</span><span class="main">λ</span><span class="bound">i</span><span class="main">.</span> <span class="free">a</span> <span class="free">k</span><span>"</span></span> <span class="quoted free">a</span><span class="main">]</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> monotone_def <span class="dynamic dynamic">divide_simps</span> mult.commute<span class="main">)</span><span>
</span><span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="main">(</span><span class="main">∑</span><span class="bound">i</span><span class="main">≤</span><span class="free">k</span><span class="main">.</span> <span class="free">D</span> <span class="bound">i</span> <span class="main">-</span></span> <span class="free">D</span><span class="main">(</span>Suc <span class="bound">i</span><span class="main">)</span><span class="main">)</span> <span class="main">/</span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> aD divide_right_mono sum_mono<span class="main">)</span><span>
</span><span class="keyword1 command">also</span> <span class="keyword1 command">have</span> <span class="quoted"><span class="quoted"><span>"</span><span class="main">…</span> <span class="main">≤</span></span> <span class="free">D</span> <span class="main">0</span></span> <span class="main">/</span> <span class="main">(</span>Suc <span class="free">k</span><span class="main">)</span><span>"</span><span>
</span><span class="keyword1 command">by</span> <span class="main">(</span><span class="operator">simp</span> <span class="quasi_keyword">add</span><span class="main main">:</span> sum_telescope D divide_right_mono<span class="main">)</span><span>
</span><span class="keyword1 command">finally</span> <span class="keyword3 command">show</span> <span class="var quoted var">?thesis</span> <span class="keyword1 command">.</span><span>
</span><span class="keyword1 command">qed</span>
</pre>
<p>Isabelle’s calculational style is perfect for this sort of inequality chain.</p>
<h3 id="final-remarks">Final remarks</h3>
<p><strong>Always</strong> break up your problem
into its constituents – probably by calling <code class="language-plaintext highlighter-rouge">auto</code> – before calling sledgehammer.
The effort needed to prove all the separate parts
is generally much less than that needed to prove the whole in one go.
Besides which, part of your problem may simply be too difficult for sledgehammer.
Better to isolate that part to work on later, while disposing of the easier bits.</p>
<p>The Isabelle theory file is available to <a href="/Isabelle-Examples/Gowers_Bijection.thy">download</a>.</p>
Wed, 28 Feb 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/02/28/Gowers_bijection_example.html
https://lawrencecpaulson.github.io//2024/02/28/Gowers_bijection_example.html

Contradictions and the Principle of Explosion

<p>That logic should be <a href="https://plato.stanford.edu/entries/contradiction/#">free from contradiction</a> is probably its most fundamental principle,
dating back to Aristotle.
As described <a href="/2024/01/31/Russells_Paradox.html">last time</a>,
the emergence of a contradiction in set theory – in the form of Russell’s paradox – was catastrophic. Few question the claim that no statement can be both true and false
at the same time.
But the law of contradiction is widely associated with something else,
the <a href="https://plato.stanford.edu/entries/logic-paraconsistent/#BrieHistExContQuod"><em>principle of explosion</em></a>:
<em>ex contradictione quodlibet</em>, a contradiction implies everything.
This principle has been disputed. One can formulate predicate logic without it:
<em>minimal logic</em>.
And allegedly a student challenged Bertrand Russell
by saying “suppose 1=0; prove that you are the Pope”.
Russell is said to have replied that if 1=0 then 2=1 and therefore
the 2-element set consisting of himself and the Pope actually contains only one element.
It’s an amusing tale, but is the argument rigorous?</p>
<h3 id="origins">Origins</h3>
<p>A 12th century Parisian logician named
<a href="https://en.wikipedia.org/wiki/William_of_Soissons">William of Soissons</a>
is said to have been the first to derive the principle of explosion.
There is a simple logical proof of an arbitrary conclusion $Q$
from the two assumptions $P$ and $\neg P$.
For if we know $P$ then surely $P\lor Q$ follows by the meaning of logical OR.
So either $P$ or $Q$ holds, but the former is impossible by $\neg P$.
Hence, we have derived $Q$.</p>
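<p>For readers who like to see such arguments machine-checked, here is a sketch in Lean (my rendering, not part of the original text). Tellingly, discharging the impossible branch already appeals to <code class="language-plaintext highlighter-rouge">absurd</code>, which is explosion in miniature:</p>

```lean
-- From P and ¬P, derive an arbitrary Q via disjunctive syllogism.
example (P Q : Prop) (hp : P) (hnp : ¬ P) : Q :=
  have hpq : P ∨ Q := Or.inl hp     -- P ∨ Q follows from P
  hpq.elim                          -- either P or Q holds …
    (fun hp' => absurd hp' hnp)     -- … the former is impossible by ¬P
    (fun hq => hq)                  -- … hence Q
```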
<p>Unfortunately, this argument cannot be carried out in a typical natural deduction calculus.
The proof turns out to rely on the principle of explosion itself,
which is built into most formalisms: the reasoning would be circular.
I think the informal version of the proof is pretty convincing,
but we can look for other evidence.
(And yes, <strong>evidence</strong> is what we should be looking for when trying to justify a principle
too fundamental to be proved.)
In many specific contexts, a contradictory fact leads to an explosion by calculation.</p>
<h3 id="the-explosion-in-arithmetic">The explosion in arithmetic</h3>
<p>As we saw in the argument attributed to Russell, 1=0 in an arithmetic setting
allows other identities to be derived by adding or multiplying the two sides by a constant.
It’s trivial to obtain $m=n$ for all pairs of numbers.
Conversely, the assumption $m=n$ can be transformed by subtraction and division into 1=0.
On the other hand, it is possible to postulate something like 5=0
if the other axioms are weak enough; then you have simply supplied the axioms for
a version of modular arithmetic.</p>
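<p>To spell out the first of these calculations (a routine expansion, not in the original): if $1=0$, then for any $m$ and $n$ we have
$m = m\times 1 = m\times 0 = 0 = n\times 0 = n\times 1 = n$,
so all numbers are identified. And running it backwards for given $m$ and $n$: from $m=n$, subtract $n$ from both sides and divide by $m-n$ to recover $1=0$.</p>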
<h3 id="the-explosion-in-the-λ-calculus">The explosion in the λ-calculus</h3>
<p>The λ-calculus is an extremely simple formalism in which a great many
computational notions can be encoded.
Familiar data types such as the Booleans, natural numbers, integers, lists and trees
can be represented, as well as algorithms operating on them.
We can even have infinite lists and trees operated on by “lazy” algorithms.
The standard representations of true and false are
$\lambda x y.x$ and $\lambda x y.y$, respectively.
So what happens if we are given that true equals false? Then
$M = (\lambda x y.x)MN = (\lambda x y.y)MN = N$. Therefore we can show $M=N$
for any two given λ-terms, $M$ and $N$.
The same sort of thing happens given 1=0 and the standard representation of natural numbers,
though the details are complicated.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
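<p>The Boolean calculation is easy to replay in any language with first-class functions. Here is a small Python sketch of the Church booleans (an illustration of the encoding, not part of the original development):</p>

```python
# Church booleans: "true" selects its first argument, "false" its second.
TRUE = lambda x: lambda y: x    # λx y. x
FALSE = lambda x: lambda y: y   # λx y. y

# Applying a boolean b to M and N computes b M N by two beta-reductions.
M, N = "M", "N"
assert TRUE(M)(N) == M      # (λx y. x) M N = M
assert FALSE(M)(N) == N     # (λx y. y) M N = N

# If TRUE and FALSE were the same function, then
# M = TRUE(M)(N) = FALSE(M)(N) = N for ANY M and N:
# any two values would be identified.
```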
<h3 id="the-explosion-in-axiomatic-set-theory">The explosion in axiomatic set theory</h3>
<p>Here things get a little more technical. And with all due respect to Bertrand Russell,
he is not a set, and neither is the Pope.
In set theory, 0 is the empty set and 1 is $\{0\}$, which implies $0\in 1$.
So if 1=0, we have big problems: $0\in 1$ is both true and false
(because nothing can belong to the empty set).
And so, for any given set $A$, the set $\{x\in A\mid 0\in 1\}$ equals $A$
if we take $0\in 1$ to be true, but otherwise the resulting set is empty.
It follows that $A$ equals the empty set for all $A$, so all sets are equal.</p>
<h3 id="deriving-the-explosion-in-natural-deduction-logic">Deriving the explosion in natural deduction logic</h3>
<p>The rule of disjunction elimination in natural deduction allows us to derive
an arbitrary conclusion $R$ from the following three premises:</p>
<ul>
<li>$P\lor Q$</li>
<li>a proof of $R$ that may assume $P$</li>
<li>a proof of $R$ that may assume $Q$</li>
</ul>
<p>The idea behind this rule is that one of $P$ or $Q$ must be true, and therefore,
$R$ is derivable using the corresponding premise.
The rule incorporates the key idea of natural deduction,
namely permission to make specified assumptions locally
that are <em>discharged</em> (“paid off”, so to speak) further on.</p>
<p>This rule can obviously be generalised to an $n$-ary disjunction. We may derive $R$
from the following $n+1$ premises:</p>
<ul>
<li>$P_1\lor \cdots \lor P_n$</li>
<li>a proof of $R$ that may assume $P_i$, for $i=1$, …, $n$</li>
</ul>
<p>Obviously, if $n=2$, we get the same rule as before.
If $n=1$, it degenerates to a tautology.
And what happens if $n=0$?
Then the rule says that $R$ follows from the empty disjunction alone.
The empty disjunction is falsity.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
If our calculus can derive falsity from $P$ and $\neg P$,
then it has the principle of explosion built in.</p>
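<p>Both ends of this spectrum can be seen in Lean (a sketch in my notation): the binary rule is <code class="language-plaintext highlighter-rouge">Or.elim</code>, while the nullary case, elimination of the empty disjunction <code class="language-plaintext highlighter-rouge">False</code>, is precisely the principle of explosion:</p>

```lean
-- Binary case: R follows from P ∨ Q plus a proof of R under each disjunct.
example (P Q R : Prop) (h : P ∨ Q) (hp : P → R) (hq : Q → R) : R :=
  h.elim hp hq

-- Nullary case: R follows from the empty disjunction (falsity) alone.
example (R : Prop) (h : False) : R :=
  h.elim
```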
<h3 id="final-remarks">Final remarks</h3>
<p>As promised, in specific formal systems, the principle of explosion arises all by itself.
It doesn’t have to be assumed. Taking it as a general logical principle
is then simply a form of abstraction.
But it also arises naturally in logical formalisms from the basic principles of natural deduction.</p>
<p><a href="https://plato.stanford.edu/entries/logic-paraconsistent/">Paraconsistent logics</a>
are formal systems in which the impact of a contradiction is contained.
I can’t comment on the value of such work to philosophy,
but they have also been studied in the context of artificial intelligence.
There, the point is that it’s easy for the facts in an inexact real-world situation
to be inconsistent, and you don’t want everything to collapse.
I would argue however that you should never be using formal logic
to reason directly about real-world situations.
And indeed, the symbolic/logical tendency that was so prominent in early AI work
has pretty much vanished in favour of essentially statistical techniques
based on neural networks.
There, the problem doesn’t arise because nothing is being proved.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>$M = 0(\textbf{K}N)M = 1(\textbf{K}N)M = \textbf{K}NM = N$ <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>I hope you can see this: $P_1\lor \cdots \lor P_n$ is true precisely if some $P_i$ is true, $i=1$, …, $n$. If $n=0$ then it must always be false. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 14 Feb 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/02/14/Contradiction.html
https://lawrencecpaulson.github.io//2024/02/14/Contradiction.htmlRussell's Paradox: myth and fact<p>The story of Russell’s paradox is well known.
<a href="https://plato.stanford.edu/entries/frege/">Gottlob Frege</a> had written a treatise
on the foundations of arithmetic, presenting a formal logic and a
development of elementary mathematics from first principles.
Bertrand Russell, upon reading it, wrote to Frege mostly to praise the work, but also to ask a critical question. Frege replied to express his devastation at seeing his life’s work ruined.
Some later commentators went further, saying that Russell’s paradox refuted the entire <a href="https://plato.stanford.edu/entries/logicism/#">logicist approach</a>
to the foundations of mathematics (the idea that mathematics can be reduced to logic).
Much is wrong with this story.
The impact of Russell’s paradox is less than many believe, and greater.</p>
<h3 id="what-is-russells-paradox">What is Russell’s paradox?</h3>
<p>The paradox can be expressed quite simply in English.
Let $R$ denote the set of all sets that are not members of themselves.
Then $R$ is a member of itself if and only if it is not a member of itself.
(In symbols, define $R$ as $\{x \mid x\not\in x\}$; then $R\in R$ iff $R\not\in R$.)
Both possibilities lead to a contradiction.</p>
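<p>The contradiction is so robust that it survives abstracting the membership relation away entirely: for any binary relation $r$, no object can be related to exactly the non-self-related objects. A Lean sketch of this relational form (my formulation):</p>

```lean
-- Russell's paradox, abstractly: there is no x such that,
-- for every y, r y x holds iff r y y fails.
example {α : Type} (r : α → α → Prop) :
    ¬ ∃ x, ∀ y, (r y x ↔ ¬ r y y) :=
  fun ⟨x, hx⟩ =>
    have h : r x x ↔ ¬ r x x := hx x
    have hn : ¬ r x x := fun hr => h.mp hr hr
    hn (h.mpr hn)
```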
<p>Faced with such a situation, we need to locate the problem: to scrutinise every assumption.
At the centre is clearly the notion of a <em>set</em>.
It is an abstraction of various collective nouns in common use,
such as nation, clan, family. Each of these is a unit composed of smaller units,
and we even have a hierarchy: a nation can be a collection of clans,
each of which is a collection of families.
For further examples we have herds, and armies (with their hierarchy of divisions, regiments, etc).
None of these collections can be members of themselves.
But if we accept the <em>universal set</em> $V$, to which everything belongs,
then surely $V\in V$: it belongs to itself.
So maybe the universal set is the root of the problem.
However, this insight does not show us the way out.
Two very different solutions emerged:</p>
<ul>
<li>
<p><em>Axiomatic set theory</em>. Unrestricted set comprehension is replaced by the <em>separation axiom</em>.
You cannot just form the set $\{x\mid\phi(x)\}$ from an arbitrary property $\phi(x)$;
you can only form <strong>subsets</strong> of some existing set $A$ as $\{x\in A\mid\phi(x)\}$.
Axioms are provided allowing the construction of sets according to certain specific principles.
Now $R$ cannot be constructed, and there is no universal set.
For technical purposes, a further axiom is usually assumed: to forbid
nonterminating membership chains,
such as $x\in y\in x$ and $\cdots \in x_3\in x_2\in x_1$.
Thus, no set can be a member of itself.
This route is due to <a href="https://plato.stanford.edu/entries/zermelo-set-theory/">Zermelo</a>
and Fraenkel.</p>
</li>
<li>
<p><em>Type theory</em>.
A type hierarchy is introduced to classify all values, and $x\in y$
can only be written if the type of $y$ is higher than that of $x$.
It is thus forbidden to write $x\in x$.
With types there is no universal set either, but there are universal sets for each type.
This route is due to Whitehead and Russell, who further complicated their type theory
to enforce the “<a href="https://plato.stanford.edu/entries/russell-paradox/#ERP">vicious circle principle</a>”,
which they saw as the root of all paradoxes.
Their <a href="https://plato.stanford.edu/entries/type-theory/#RamiHierImprPrin">ramified type theory</a>
turned out to be unworkable.
Simplified by <a href="https://plato.stanford.edu/entries/ramsey/">Frank Ramsey</a>
and formalised by Alonzo Church,
it became <em>higher-order logic</em> as used today.</p>
</li>
</ul>
<p>Modern constructive type theories, such as Martin-Löf’s,
amalgamate ideas from both approaches,
providing a richer language of types and giving them a prominent role.</p>
<p>Of other approaches, one must mention
Quine’s <a href="https://plato.stanford.edu/entries/quine-nf/">New Foundations</a>.
He aimed to have a universal set containing itself as an element.
In order to prevent Russell’s paradox, he introduced the notion of a <em>stratified</em> formula,
a kind of local type checking that prohibited $\{x \mid x\not\in x\}$.
The problem was, nobody was sure for decades whether NF was consistent or not,
making it a rather scurvy candidate for the foundations of mathematics.</p>
<h3 id="what-was-its-impact">What was its impact?</h3>
<p>Russell’s paradox comes from Cantor’s theorem, which states that
there is no injection from the powerset ${\cal P}(A)$ of a given $A$ into $A$ itself.
(There is no way to assign each element of ${\cal P}(A)$
a unique element of $A$.) But if $V$ is the universal set,
then ${\cal P}(V)$ is actually a subset of $V$, contradiction.
Or, to quote Gödel:</p>
<blockquote>
<p>By analyzing the paradoxes to which Cantor’s set theory had led, he freed them from all mathematical technicalities, thus bringing to light the amazing fact that our logical intuitions (i.e., intuitions concerning such notions as: truth, concept, being, class, etc.) are self-contradictory.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>This was huge. Mathematicians and philosophers had taken for granted that a <em>concept</em>
(i.e., property)
and the corresponding <em>class</em> (i.e., set) were more or less interchangeable.
The concept of red could be identified with the class of all red things;
the concept of an even number could be identified with the class of even numbers.
The universal class could be defined as the class of all $x$ satisfying $x=x$.
In fact, we can still do this. But people took for granted that these classes
were entities in themselves, and in particular, could belong to other classes.
That is what had to be sacrificed.</p>
<p>As late as the early 20th century, when Whitehead and Russell were writing
<a href="https://plato.stanford.edu/entries/principia-mathematica/">Principia Mathematica</a>,
the words <em>class</em> and <em>set</em> were synonymous.
Today, especially in the context of set theory,
a <em>class</em> is the collection of everything satisfying a specific property,
but only sets actually <strong>exist</strong>.
A reference to some proper class – say the universal class, or the class of ordinals –
is invariably described using the phrase “only a <em>façon de parler</em>”:
a manner of speaking, nothing more.</p>
<p>George Boolos <a href="https://www.jstor.org/stable/4545060">has pointed out</a>
that Russell’s paradox did not impact
Frege’s work in any significant way. Frege had indeed assumed
unrestricted set comprehension, the fatal principle that leads to Russell’s paradox.
But he used it only once, to derive a much weaker consequence
that he could have taken as an axiom instead. The paradox
did not damage Frege’s work, which survives today as the predicate calculus.
However, it laid waste to his intellectual worldview.</p>
<h3 id="wider-ramifications">Wider ramifications</h3>
<p>Russell’s paradox was about as welcome as a bomb at a wedding.
Decades later, the dust still had not settled.
Russell collected a list of other paradoxes, the most serious being
Burali-Forti’s: the set $\Omega$ of ordinal numbers is itself an ordinal number,
and therefore $\Omega\in \Omega$, which implies $\Omega<\Omega$.</p>
<p>The first part of the 20th century saw the publication of Zermelo’s
axioms for set theory, in which he introduced his separation axiom, and much more controversially, his <em>axiom of choice</em>.
In that febrile time, many had no appetite for further risk-taking
in the form of this radical new axiom.
Whitehead and Russell formalised a significant chunk of mathematics using their type theory.
Hilbert announced his programme for proving the consistency of mathematics,
but the incompleteness and undecidability results of the 1930s put an end to such ideas.
By the 1960s, we had learned that fundamental questions – such as the status of the axiom of choice and the <a href="https://plato.stanford.edu/entries/continuum-hypothesis/">continuum hypothesis</a> – could not be settled
using the axioms of Zermelo-Fraenkel set theory.</p>
<p>Russell’s paradox also made its appearance in Alonzo Church’s $\lambda$-calculus.
Church originally conceived his system as a new approach to logic,
in which sets were encoded by their characteristic functions: $MN$
meant that $N$ was an element of $M$,
while $\lambda x. M$ denoted unrestricted set comprehension over the predicate $M$.
Church devised techniques for encoding Boolean values and operations
within the $\lambda$-calculus.
However, Haskell Curry noticed that the Russell set $R$
could be expressed as $\lambda x. \neg (x x)$.
He thereby obtained a contradiction, $RR = \neg(RR)$.
Generalising from negation to an arbitrary function symbol,
Curry obtained his famous $Y$-combinator.
This gave Church’s $\lambda$-calculus tremendous expressivity,
but rendered his logic inconsistent.</p>
<h3 id="however">However</h3>
<p>Ludwig Wittgenstein wasn’t much bothered by contradictions. He wrote, with his usual lucidity,</p>
<blockquote>
<p>If a contradiction were now actually found in arithmetic that would only prove that an arithmetic with such a contradiction in it could render very good service; and it will be better for us to modify our concept of the certainty required, than to say that it would really not yet have been a proper arithmetic.</p>
</blockquote>
<p>This means apparently that what you don’t know can’t hurt you.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Kurt Gödel, <a href="https://doi.org/10.1017/CBO9781139171519.024">Russell’s mathematical logic</a>. <em>In</em>: P Benacerraf, H Putnam (eds), <em>Philosophy of Mathematics: Selected Readings</em> (CUP, 1984), 447–469. I have already posted this quotation twice before (on <a href="/2023/04/12/Wittgenstein.html">Wittgenstein</a> and then on the <a href="/2023/11/01/Foundations.html">foundations of mathematics</a>). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 31 Jan 2024 00:00:00 +0000
https://lawrencecpaulson.github.io//2024/01/31/Russells_Paradox.html
https://lawrencecpaulson.github.io//2024/01/31/Russells_Paradox.htmlCoinductive puzzle, by Jasmin Blanchette and Dmitriy Traytel<p>Coinduction has a reputation for being esoteric, but there are some situations where it is close to
indispensable. One such scenario arose during an Isabelle refinement proof for a verified automatic
theorem prover for first-order logic, <a href="https://doi.org/10.1145/3293880.3294100">a proof that also involved Anders Schlichtkrull</a>. In that
prover, execution traces can be finite or infinite, reflecting the undecidability of first-order
logic.
The refinement proof involves a simulation argument between two layers: an abstract specification
and a concrete theorem prover, both given as transition systems (i.e., binary relations over
states). A single “big” step of the concrete prover represents an entire iteration of the prover’s
main loop and may therefore correspond to multiple “small” steps of the abstract prover.</p>
<p>The simulation proof requires relating the concrete layer with the abstract layer. The concrete
“big-step” sequence is of the form $St_0 \leadsto^+ St_1 \leadsto^+ St_2 \leadsto^+ \cdots$, where the
$St_i$’s are states and $\leadsto^+$ is the transitive closure of the abstract transition system.
However, to complete the refinement, we must obtain a “small-step” sequence $St_0 \leadsto \cdots
\leadsto St_1 \leadsto \cdots \leadsto St_2 \leadsto \cdots$.</p>
<p>If the big-step sequence is finite, the existence of the small-step sequence can be proved using
induction. But in our semidecidable scenario, sequences may be infinite. One way to cope with this
is to use coinductive methods. This blog entry presents a solution to this coinductive puzzle.</p>
<h3 id="preliminaries">Preliminaries</h3>
<p>To represent possibly infinite sequences of states, we use the coinductive datatype of lazy lists:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>codatatype 'a llist = LNil | LCons 'a "'a llist"
</code></pre></div></div>
<p>Intuitively, lazy lists are like ordinary finite lists, except that they also allow infinite values
such as <code class="language-plaintext highlighter-rouge">LCons 0 (LCons 1 (LCons 2 ...))</code>. However, the reasoning principles for
coinductive types and predicates are rather different from their inductive counterparts, as we will
see in a moment.</p>
<p>Let us review some useful vocabulary for lazy lists. First, the selectors <code class="language-plaintext highlighter-rouge">lhd : 'a llist -> 'a</code> and
<code class="language-plaintext highlighter-rouge">ltl : 'a llist -> 'a llist</code> return the head and the tail, respectively, of an <code class="language-plaintext highlighter-rouge">LCons</code> value. For an
<code class="language-plaintext highlighter-rouge">LNil</code> value, <code class="language-plaintext highlighter-rouge">lhd</code> returns a <a href="https://lawrencecpaulson.github.io/2021/12/01/Undefined.html">fixed arbitrary value</a> and <code class="language-plaintext highlighter-rouge">ltl</code> returns <code class="language-plaintext highlighter-rouge">LNil</code>. Then
the function <code class="language-plaintext highlighter-rouge">llast : 'a llist -> 'a</code> returns the last value of a finite lazy list. If there is no
such value, because the lazy list is either empty or infinite, <code class="language-plaintext highlighter-rouge">llast</code> returns a fixed arbitrary
value. Next, the function <code class="language-plaintext highlighter-rouge">prepend</code> concatenates a finite list and a lazy list:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>fun prepend :: "'a list -> 'a llist -> 'a llist" where
"prepend [] ys = ys"
| "prepend (x # xs) ys = LCons x (prepend xs ys)"
</code></pre></div></div>
<p>In the simulation proof, we do not work with arbitrary lazy lists but with nonempty lazy lists
whose consecutive elements are related by the small-step or big-step transition relation. To
capture this restriction, we use a coinductive predicate that characterizes such chains:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>coinductive chain :: "('a ⇒ 'a ⇒ bool) ⇒ 'a llist ⇒ bool" for R :: "'a ⇒ 'a ⇒ bool" where
"chain R (LCons x LNil)"
| "chain R xs ⟹ R x (lhd xs) ⟹ chain R (LCons x xs)"
</code></pre></div></div>
<p>The predicate has two introduction rules, one for singleton chains and one for longer chains. Had
we worked with finite lists instead of lazy lists, we would have written the same definition
replacing the <code class="language-plaintext highlighter-rouge">coinductive</code> keyword with <code class="language-plaintext highlighter-rouge">inductive</code>. The magic of coinduction allows us to apply
the second introduction rule infinitely often. This is necessary when proving that an infinite lazy
list forms a chain.</p>
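<p>For intuition, here is the finite-list analogue of the predicate, written executably in Python (our illustration only; names and encoding are not from the Isabelle development):</p>

```python
def is_chain(R, xs):
    """Finite analogue of the coinductive chain predicate: xs is a
    nonempty list whose consecutive elements are related by R."""
    return len(xs) >= 1 and all(R(x, y) for x, y in zip(xs, xs[1:]))

step = lambda a, b: b == a + 1      # a sample transition relation
assert is_chain(step, [3])          # singleton chains always qualify
assert is_chain(step, [1, 2, 3])
assert not is_chain(step, [1, 3])   # 1 and 3 are not related by step
assert not is_chain(step, [])       # chains are nonempty
```

The coinductive version additionally admits infinite chains, which no terminating check like this one can decide; that is exactly what coinduction buys us.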
<p>The big-step sequence should be a subsequence of the small-step sequence. We formalize coinductive
subsequences via the predicate <code class="language-plaintext highlighter-rouge">emb</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>coinductive emb :: "'a llist ⇒ 'a llist ⇒ bool" where
"lfinite xs ⟹ emb LNil xs"
| "emb xs ys ⟹ emb (LCons x xs) (prepend zs (LCons x ys))"
</code></pre></div></div>
<p>Our definition ensures that finite lazy lists cannot be embedded in infinite lazy lists. In our
application, this matters because we want to ensure that only finite small-step sequences can
simulate finite big-step sequences.</p>
<p>In Isabelle, a coinductive predicate <code class="language-plaintext highlighter-rouge">P</code> is accompanied by corresponding coinduction principles that
allow us to prove positive statements of the form <code class="language-plaintext highlighter-rouge">P ...</code>. For <code class="language-plaintext highlighter-rouge">chain</code> and <code class="language-plaintext highlighter-rouge">emb</code> we obtain the
following principles:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X xs ⟹
(⋀xs'. X xs' ⟹
(∃x. xs' = LCons x LNil) ∨
(∃xs x. xs' = LCons x xs ∧ (X xs ∨ chain R xs) ∧ R x (lhd xs))) ⟹
chain R xs
X xs ys ⟹
(⋀xs' ys'.
X xs' ys' ⟹
(∃ys. xs' = LNil ∧ ys' = ys ∧ lfinite ys) ∨
(∃xs ys x zs. xs' = LCons x xs ∧ ys' = prepend zs (LCons x ys) ∧ (X xs ys ∨ emb xs ys))) ⟹
emb xs ys
</code></pre></div></div>
<p>These principles embody the fact that <code class="language-plaintext highlighter-rouge">chain</code> and <code class="language-plaintext highlighter-rouge">emb</code> are the greatest (“most true”) predicates
stable under the application of their respective introduction rules. For example for <code class="language-plaintext highlighter-rouge">emb</code>, given a
binary relation <code class="language-plaintext highlighter-rouge">X</code> stable under <code class="language-plaintext highlighter-rouge">emb</code>’s introduction rules, any arguments satisfying <code class="language-plaintext highlighter-rouge">X</code> also
satisfy <code class="language-plaintext highlighter-rouge">emb</code>. Stability under introduction rules means that for any arguments <code class="language-plaintext highlighter-rouge">xs'</code> and <code class="language-plaintext highlighter-rouge">ys'</code>
satisfying <code class="language-plaintext highlighter-rouge">X</code> that correspond to the arguments of <code class="language-plaintext highlighter-rouge">emb</code> in either one of <code class="language-plaintext highlighter-rouge">emb</code>’s two introduction
rules, the arguments of the self-calls also satisfy <code class="language-plaintext highlighter-rouge">X</code>.</p>
<h3 id="the-main-theorem">The main theorem</h3>
<p>We are now ready to state our desired theorem:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lemma "chain R⇧+⇧+ xs ⟹ ∃ys. chain R ys ∧ emb xs ys ∧ lhd ys = lhd xs ∧ llast ys = llast xs"
</code></pre></div></div>
<p>In words, given a big-step sequence <code class="language-plaintext highlighter-rouge">xs</code> whose consecutive elements are related by the transitive
closure <code class="language-plaintext highlighter-rouge">R⇧+⇧+</code> of a relation <code class="language-plaintext highlighter-rouge">R</code>, there exists a small-step sequence <code class="language-plaintext highlighter-rouge">ys</code> whose consecutive
elements are related by <code class="language-plaintext highlighter-rouge">R</code>. The small-step sequence must embed, using <code class="language-plaintext highlighter-rouge">emb</code>, the big-step
sequence. In addition, the sequences’ first and last elements must coincide. If both sequences are
infinite, their last elements are equal by definition to the same fixed arbitrary value, as
explained above.</p>
<h3 id="the-proof">The proof</h3>
<p>To prove the theorem, we instantiate the existential quantifier with a witness, which we define
corecursively:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>corec wit :: "('a ⇒ 'a ⇒ bool) ⇒ 'a llist ⇒ 'a llist" where
"wit R xs = (case xs of LCons x (LCons y xs) ⇒
LCons x (prepend (pick R x y) (wit R (LCons y xs))) | _ ⇒ xs)"
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">wit</code> function fills the gaps between consecutive values of the big-step sequence with
arbitrarily chosen intermediate values that form finite chains. We use Hilbert’s choice operator to
construct these chains:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>definition pick :: "('a ⇒ 'a ⇒ bool) ⇒ 'a ⇒ 'a ⇒ 'a list" where
"pick R x y = (SOME xs. chain R (llist_of (x # xs @ [y])))"
</code></pre></div></div>
<p>Here, <code class="language-plaintext highlighter-rouge">llist_of</code> converts finite lists to lazy lists, which allows us to reuse the <code class="language-plaintext highlighter-rouge">chain</code>
predicate. The <code class="language-plaintext highlighter-rouge">pick</code> function is characterized by the following property:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>lemma "R⇧+⇧+ x y ⟹ chain R (llist_of (x # pick R x y @ [y]))"
</code></pre></div></div>
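<p>It may help to see what <code class="language-plaintext highlighter-rouge">wit</code> computes on a lazy stream. Here is a Python generator sketch (our illustration; <code class="language-plaintext highlighter-rouge">gaps</code> plays the role of <code class="language-plaintext highlighter-rouge">pick R</code>, returning a finite list of intermediate states strictly between two big-step states):</p>

```python
def wit(gaps, xs):
    """Lazily fill the gaps of a (possibly infinite) big-step sequence xs,
    inserting gaps(x, y) between each consecutive pair x, y."""
    it = iter(xs)
    try:
        x = next(it)
    except StopIteration:
        return                      # empty sequence: nothing to emit
    for y in it:
        yield x
        yield from gaps(x, y)       # finite chain strictly between x and y
        x = y
    yield x                         # last element of a finite sequence

# Big steps 0, 3, 5 under the successor relation:
assert list(wit(lambda x, y: range(x + 1, y), [0, 3, 5])) == [0, 1, 2, 3, 4, 5]
```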
<p>Going back to <code class="language-plaintext highlighter-rouge">wit</code>’s definition, we may wonder why Isabelle accepts it in the first place. The
definition is not obviously productive, a requirement that would ensure its totality. Productive
definitions generate at least one constructor after calling themselves. In our case, <code class="language-plaintext highlighter-rouge">LCons</code> is
that constructor, but <code class="language-plaintext highlighter-rouge">prepend</code> stands in the way and could potentially destroy constructors
produced by the self-call to <code class="language-plaintext highlighter-rouge">wit</code>. However, we know that <code class="language-plaintext highlighter-rouge">prepend</code> is friendly enough to only add
constructors and not to remove them. In Isabelle, we can register it as a <a href="https://doi.org/10.1007/978-3-662-54434-1_5">“friend”</a>, which
convinces our favorite proof assistant to accept the above definition.</p>
<p>It remains to prove the four conjuncts of our main theorem, taking <code class="language-plaintext highlighter-rouge">ys</code> to be <code class="language-plaintext highlighter-rouge">wit R xs</code>.</p>
<p>First, we prove <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ lhd (wit R xs) = lhd xs</code> by simple rewriting.</p>
<p>Second, we attempt to show <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ emb xs (wit R xs)</code> using <code class="language-plaintext highlighter-rouge">emb</code>’s coinduction principle.
To this end, Isabelle’s coinduction proof method instantiates <code class="language-plaintext highlighter-rouge">X</code> with the canonical relation
<code class="language-plaintext highlighter-rouge">λxs ys. ys = wit R xs ∧ chain R⇧+⇧+ xs</code>. After some simplification, we arrive at a goal requiring us to prove
<code class="language-plaintext highlighter-rouge">(∃zs. LCons x (prepend (pick R x y) (wit R (LCons y xs))) = prepend zs (LCons x (wit R (LCons y xs))))</code>
whose two sides have the <code class="language-plaintext highlighter-rouge">prepend</code> in different positions (on one side before the <code class="language-plaintext highlighter-rouge">x</code>, on the other side after). We would like to insert a second <code class="language-plaintext highlighter-rouge">prepend zs'</code> (where <code class="language-plaintext highlighter-rouge">zs'</code> would be existentially quantified) on the right-hand side, so that we can instantiate <code class="language-plaintext highlighter-rouge">zs</code> with the empty list and <code class="language-plaintext highlighter-rouge">zs'</code> with <code class="language-plaintext highlighter-rouge">pick R x y</code>, making both sides equal.
We can achieve this by modifying <code class="language-plaintext highlighter-rouge">X</code> to be <code class="language-plaintext highlighter-rouge">λxs ys. ∃zs'. ys = prepend zs' (wit R xs) ∧ chain R⇧+⇧+ xs</code>.
A more principled alternative is to manually derive the following generalized coinduction principle, which inserts <code class="language-plaintext highlighter-rouge">prepend zs'</code> at the right place:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X xs ys ⟹
(⋀xs' ys'.
X xs' ys' ⟹
(∃ys. xs' = LNil ∧ ys' = ys ∧ lfinite ys) ∨
(∃xs ys x zs zs'. xs' = LCons x xs ∧ ys' = prepend zs (LCons x (prepend zs' ys)) ∧ (X xs ys ∨ emb xs ys))) ⟹
emb xs ys
</code></pre></div></div>
<p>This approach is an instance of a general technique called <a href="https://doi.org/10.1017/CBO9780511792588.007">coinduction up to</a>.</p>
<p>Third, we need to prove <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ llast (wit R xs) = llast xs</code>. Since we now know that
<code class="language-plaintext highlighter-rouge">emb xs (wit R xs)</code> holds, by definition of <code class="language-plaintext highlighter-rouge">emb</code> only two cases are possible. Either both
<code class="language-plaintext highlighter-rouge">wit R xs</code> and <code class="language-plaintext highlighter-rouge">xs</code> are finite lazy lists, in which case the property follows by induction, or both
are infinite, in which case their last elements are equal to the notorious fixed arbitrary value.</p>
<p>Fourth, when attempting to prove <code class="language-plaintext highlighter-rouge">chain R⇧+⇧+ xs ⟹ chain R (wit R xs)</code>, we run into a similar issue
as in the proof of the second conjunct. The resolution is also similar. We manually derive a
coinduction-up-to principle for <code class="language-plaintext highlighter-rouge">chain</code> with respect to <code class="language-plaintext highlighter-rouge">prepend</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>X xs ⟹
(⋀xs'. X xs' ⟹
(∃x. xs' = LCons x LNil) ∨
(∃xs x zs'. xs' = LCons x (prepend zs' xs) ∧ (X xs ∨ chain R xs) ∧ chain R (llist_of (y # zs' @ [lhd xs])))) ⟹
chain R xs
</code></pre></div></div>
<p>This principle additionally involves a generalization of the side condition <code class="language-plaintext highlighter-rouge">R y (lhd xs)</code> to
<code class="language-plaintext highlighter-rouge">chain R (llist_of (y # zs' @ [lhd xs]))</code> to incorporate <code class="language-plaintext highlighter-rouge">zs'</code>.</p>
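<p>To convey the idea behind <code class="language-plaintext highlighter-rouge">wit</code> in a more familiar setting, here is a finite-list analogue in Python (my own sketch: the names <code class="language-plaintext highlighter-rouge">pick</code>, <code class="language-plaintext highlighter-rouge">wit</code> and <code class="language-plaintext highlighter-rouge">is_chain</code> are illustrative, not the Isabelle definitions, and there is no corecursion here). Given a chain under the transitive closure of <code class="language-plaintext highlighter-rouge">R</code>, we interpolate an <code class="language-plaintext highlighter-rouge">R</code>-path between each pair of adjacent elements, obtaining a chain under <code class="language-plaintext highlighter-rouge">R</code> with the same endpoints:</p>

```python
from collections import deque

# Finite-list analogue of the coinductive construction: refine a chain
# under the transitive closure of R into a chain under R itself,
# keeping the first and last elements fixed.

def pick(R, x, y):
    """Breadth-first search for an R-path from x to y; returns the
    intermediate elements (possibly none)."""
    queue = deque([[x]])
    while queue:
        path = queue.popleft()
        for a, b in R:
            if a == path[-1]:
                if b == y:
                    return path[1:]      # drop x; the caller appends y
                if b not in path:        # avoid cycles
                    queue.append(path + [b])
    raise ValueError("no R-path from x to y")

def wit(R, xs):
    """Interpolate every transitive-closure step of the chain xs into R-steps."""
    out = [xs[0]]
    for x, y in zip(xs, xs[1:]):
        out += pick(R, x, y) + [y]
    return out

def is_chain(R, xs):
    """Check that adjacent elements of xs are related by R."""
    return all((x, y) in R for x, y in zip(xs, xs[1:]))
```

<p>For instance, with R = {(1,2), (2,3), (3,4)}, the list [1, 3, 4] is a chain under the transitive closure, and interpolation yields the R-chain [1, 2, 3, 4] with the same endpoints. The coinductive version must do the same thing productively on possibly infinite lazy lists, which is exactly where the “friends” machinery and coinduction up to come in.</p>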
<h3 id="conclusion">Conclusion</h3>
<p>We successfully solved the coinductive puzzle that arose during our verification of an automatic
theorem prover. At its core, the puzzle has little to do with theorem proving; instead, it is about
refinement of possibly nonterminating transition systems. Our proof can be found in the <a href="https://devel.isa-afp.org/sessions/ordered_resolution_prover/#Lazy_List_Chain.html#Lazy_List_Chain.chain_tranclp_imp_exists_chain|fact">AFP</a>. On the plus side, Isabelle conveniently
allowed us to define all the functions and predicates we needed to carry out the proof, including
functions whose productivity relied on “friends”. On the minus side, the proof of an easy-looking
theorem required some ingenuity. In particular, we found ourselves deriving coinduction-up-to
principles for coinductive predicates manually to use them for definitions involving “friends”. An
avenue for future work would be to derive such principles automatically.</p>
Wed, 08 Nov 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/11/08/CoinductivePuzzle.html
What do we mean by "the foundations of mathematics"?<p>The phrase “foundations of mathematics” is bandied about frequently these days,
but it’s clear that there is widespread confusion about what it means.
Some say that a proof assistant must be based on a foundation of mathematics,
and therefore that the foundations of mathematics refers to some sort of formal system.
And yet, while set theory is frequently regarded as <em>the</em> foundation of mathematics,
none of the mainstream proof assistants are based on set theory.
These days we see everything from category theory to homotopy type theory
described as a possible foundation of mathematics.
There is a lot of wrongness here.</p>
<h3 id="what-we-mean-by-the-foundations-of-mathematics">What do we mean by the foundations of mathematics?</h3>
<p>N. G. de Bruijn made the remarkable claim</p>
<blockquote>
<p>We do not possess a workable definition of the word “mathematics”. (AUT001, p. 4)</p>
</blockquote>
<p>He seemed to be referring primarily to the difficulty of defining
<em>mathematical reasoning</em>, but the dictionary definition – “the abstract science of number, quantity, and space” – does not begin to scratch the surface
of the topics studied by mathematicians, such as groups, abstract topologies,
graphs or even finite sets. If we can’t define mathematics, neither can we define
the notion of mathematical foundations.</p>
<p>One solution to this difficulty is to say, “I can’t define it but I know what it is
when I see it”. This has famously been applied to pornography, and even there it does not
settle the question in the case of something like
Titian’s <a href="https://en.wikipedia.org/wiki/Venus_of_Urbino">Venus of Urbino</a>.
Mathematical reasoning can be wrong or doubtful while still being great mathematics;
Newton and Euler used infinitesimals and other methods generally rejected today.
Crank attempts to square the circle or prove the Riemann hypothesis
often look like mathematics while saying nothing.</p>
<p>The foundations of mathematics is concerned with questions of the form
“does this even make sense?” It seems to be triggered by periodic crises:</p>
<ul>
<li>the existence of irrational numbers</li>
<li>Berkeley’s <a href="https://plato.stanford.edu/entries/continuity/">criticism of infinitesimals</a></li>
<li>the infinite</li>
<li>the discovery of non-Euclidean geometries</li>
<li>Russell’s paradox (1901) and many others</li>
</ul>
<p>The story of Pythagoras trying to suppress the shocking
discovery of irrational numbers such as $\sqrt2$,
the ratio of the diagonal of a square to its side, is probably mythical.
But it seems that <a href="https://plato.stanford.edu/entries/dedekind-foundations/">they noticed</a>:</p>
<blockquote>
<p>The Greeks’ response to this startling discovery culminated in Eudoxos’ theory of ratios and proportionality, presented in Chapter V of Euclid’s Elements.</p>
</blockquote>
<p>The nature of the real numbers was still not clear in the 19th century.
Richard Dedekind devoted himself to this problem,
inventing the famous <a href="https://en.wikipedia.org/wiki/Dedekind_cut">Dedekind cuts</a>:
downwards-closed sets of rational numbers. Cantor independently chose to define
real numbers as equivalence classes of Cauchy sequences.
The point is not that a real number <em>is</em> either of those things, but simply that
we can present specific constructions exhibiting the behaviour expected of the real numbers.</p>
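<p>As a concrete illustration (mine, not Dedekind’s), the cut corresponding to $\sqrt2$ can be coded as a membership test on exact rationals. The helper <code class="language-plaintext highlighter-rouge">better</code> is a classical trick showing that the cut has no greatest element:</p>

```python
from fractions import Fraction

def in_sqrt2_cut(r: Fraction) -> bool:
    """Membership in the Dedekind cut for sqrt(2): the downwards-closed
    set of rationals r with r < 0 or r*r < 2."""
    return r < 0 or r * r < 2

def better(r: Fraction) -> Fraction:
    """Given a non-negative member r of the cut (so r*r < 2), return a
    strictly larger member: x = (2r+2)/(r+2) satisfies r < x and x*x < 2."""
    return (2 * r + 2) / (r + 2)
```

<p>A short calculation confirms the claim behind <code class="language-plaintext highlighter-rouge">better</code>: $x - r = (2 - r^2)/(r+2) > 0$ and $x^2 - 2 = 2(r^2-2)/(r+2)^2 < 0$ whenever $r^2 < 2$. So the cut is a set of rationals that behaves exactly as $\sqrt2$ should, without presupposing that $\sqrt2$ exists.</p>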
<p>Cantor’s work on set theory is well known. Dedekind also made major contributions,
notably in his <a href="https://plato.stanford.edu/entries/dedekind-foundations/"><em>Was sind und was sollen die Zahlen?</em></a>.
Their aim was finally to pin down the precise meaning of concepts such as function,
relation and class, and how to make sense of infinite collections and infinite constructions.</p>
<p>Berkeley’s attack on infinitesimals resulted in a concerted effort to banish them
in favour of $\epsilon$-$\delta$ arguments (hated by many), which remind me of
challenge-response protocols in computer science. As I’ve <a href="/2022/08/10/Nonstandard_Analysis.html">noted previously</a> on this blog,
today – thanks to set theory – we have the theoretical tools to place infinitesimals
on a completely rigorous basis.</p>
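<p>The challenge-response flavour can be made literal. For the claim $\lim_{x\to 3} x^2 = 9$, the challenger supplies $\epsilon$ and the prover must answer with a workable $\delta$. A toy Python sketch (my own illustration, with hypothetical names):</p>

```python
import random

def delta_for(eps: float) -> float:
    """Respond to the challenge eps for the claim lim_{x -> 3} x^2 = 9.
    If |x - 3| < 1 then |x + 3| < 7, so |x^2 - 9| < 7*|x - 3|;
    hence delta = min(1, eps/7) suffices."""
    return min(1.0, eps / 7)

def spot_check(eps: float, trials: int = 1000) -> bool:
    """Sample points within delta of 3 and confirm x^2 stays within eps of 9."""
    d = delta_for(eps)
    return all(abs(x * x - 9) < eps
               for x in (3 + random.uniform(-d, d) for _ in range(trials)))
```

<p>The protocol analogy is exact: the prover wins if it has a winning strategy, a function from challenges $\epsilon$ to responses $\delta$, which is just what the nested quantifiers $\forall\epsilon\,\exists\delta$ express.</p>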
<h3 id="the-paradoxes-and-the-solutions">The paradoxes and the solutions</h3>
<p>The <a href="https://plato.stanford.edu/entries/settheory-early/#CritPeri">paradoxes of set theory</a>,
discovered around the turn of the 20th century, aroused huge disquiet. Although I have
posted this exact quote <a href="/2023/04/12/Wittgenstein.html">previously</a>, there is no better description than Gödel’s:</p>
<blockquote>
<p>By analyzing the paradoxes to which Cantor’s set theory had led, he freed them from all mathematical technicalities, thus bringing to light the amazing fact that our logical intuitions (i.e., intuitions concerning such notions as: truth, concept, being, class, etc.) are self-contradictory.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>
</blockquote>
<p>Russell’s paradox was seen as a potentially fatal blow to much of the 19th century
foundational work, including that of Frege, Dedekind and Cantor.
Russell (and Whitehead) decided to continue in the spirit of Frege’s <em>Logicist</em>
programme of reducing mathematics to logic. But does this make sense? I can imagine
De Bruijn saying</p>
<blockquote>
<p>We do not possess a workable definition of the word “logic”.</p>
</blockquote>
<p>The system they created in the multivolume
<a href="https://plato.stanford.edu/entries/principia-mathematica/"><em>Principia Mathematica</em></a>
was needlessly complicated and in some respects curiously imprecise.
But it led to today’s first-order logic and especially higher-order logic.
Russell and Whitehead formalised some chunks of mathematics in great detail
and with immense tedium, but they could not have predicted how powerful
their system would turn out to be.</p>
<p>Many philosophers had contemplated the essence of mathematics in prior centuries,
but the crisis gave the issues urgency. Roughly speaking, there are three main schools of thought:</p>
<ul>
<li>The <em>Platonist</em> or <em>realist</em> viewpoint: ideal mathematical objects, such as the complex plane, exist objectively and independently of us, though we may deduce their properties. Gödel held this view.</li>
<li>The <em>formalist</em> viewpoint: mathematics is concerned with symbols. For Hilbert,
I think that <a href="https://plato.stanford.edu/entries/hilbert-program/">his programme</a>
was a technical approach to abolish the paradoxes rather than
an expression of his true beliefs. How can one person adhere to
a <a href="https://plato.stanford.edu/entries/hilbert-program/#2">finitary point of view</a>
and simultaneously describe Cantor’s world of transfinite ordinals and cardinals
as a paradise? But it seems that others, such as Curry, regarded mathematics
as nothing but a symbolic game.</li>
<li>The <a href="https://plato.stanford.edu/entries/intuitionism/"><em>intuitionists</em></a> held that mathematical objects were nothing but creations of the human mind.
This gave them a radical attitude to proof and the wholesale rejection of many techniques and concepts regarded by others as indispensable.
Their rejection of the reality of mathematical objects and their stance against
symbolic formulas (other than as a means of communicating ideas)
set them firmly against the other schools.</li>
</ul>
<p>It seems clear from the reactions of Frege, Russell, Hilbert, Brouwer and many others
that the paradoxes constituted an emergency. Russell’s “vicious circle principle”
and his solution, namely ramified type theory, Brouwer’s intuitionism and Hilbert’s formalism
– these were the equivalent of burning all your clothes and furniture upon the discovery of bedbugs.
That the solution could lie in something as simple as
<a href="https://plato.stanford.edu/entries/zermelo-set-theory/">Zermelo’s separation axiom</a>
and the conception of the <a href="/papers/Boolos-iterative-concept-of-set.pdf">cumulative hierarchy of sets</a><sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>
was seemingly not anticipated.
It was a miracle.</p>
<h3 id="modern-foundations-of-mathematics">Modern foundations of mathematics</h3>
<p>Today one commonly sees all kinds of things described as “foundations of mathematics”,
especially category theory and type theory. Foundational work has definitely been done
within the framework of category theory, but that is not the same thing as saying that
category theory itself is foundational. The objects in category theory are equipped with
structure and the morphisms between objects are structure preserving, just as we have homomorphisms between groups and continuous maps between topological spaces.
By contrast, classical sets have no notion of structure beyond the membership relation,
which we might regard as bare metal.
Since a large part of mathematics is concerned with structure,
category theory is a natural fit.
That does not mean, however, that it addresses foundational issues.
It tends rather to introduce new ones, especially because of its unfortunate and
needless habit of assuming the existence of proper classes everywhere.
Far from replacing set theory, it relies on it.</p>
<p>As to whether type theory is foundational, we need to ask which type theory you are talking about:</p>
<ul>
<li>Principia Mathematica: of course, that was its precise purpose. Gödel’s essay, <a href="/papers/Russells-mathematical-logic.pdf">Russell’s mathematical logic</a>,
is an indispensable source on this and related topics.</li>
<li>Church’s simple type theory: the granddaughter of PM, it is equally expressive and a lot simpler.</li>
<li>Automath: absolutely not. De Bruijn consistently referred to it as “a <em>language</em> for mathematics”. He moreover said it was “like a big restaurant that serves all sorts of food: vegetarian, kosher, or anything else the customer wants”.<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> Automath was, by design, neutral to foundational choices. (Isabelle/Pure is in the same spirit.)</li>
<li>Martin-Löf type theory: he himself said it was intended as a vehicle for formalising Bishop-style analysis, clearly a foundational claim, but one that rejects the vast majority of modern mathematics.</li>
<li>Calculus of inductive constructions (Coq, Lean): the original paper (describing a weaker system) begins “The calculus of constructions is a higher-order formalism for constructive proofs in natural deduction style,” and the paper makes no foundational claims.
Coquand’s <a href="https://www.cse.chalmers.se/~coquand/v1.pdf">retrospective paper</a> makes no such claims either.
Since it turns out to be significantly stronger than ZF set theory, one could even say it makes foundational assumptions.</li>
</ul>
<p>The world has moved on. People no longer worry about the issues that were
critical in the 19th century: the role of the real numbers, the role of infinity,
the status of infinitesimals, the very consistency of mathematics.
And the reason is simple: because Herculean work in the 19th and 20th centuries
largely banished those issues from our minds.</p>
<p>This achievement doesn’t seem to be much appreciated today.
Instead of “each real number can be understood as a set of rational numbers, and more generally, the most sophisticated mathematical constructions can be reduced
to a handful of simple principles”
people say “we are asked to believe that everything is a set”
and even “set theory is just another formal system”.</p>
<p>I give up.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>Kurt Gödel, <a href="https://doi.org/10.1017/CBO9781139171519.024">Russell’s mathematical logic</a>. <em>In</em>: P Benacerraf, H Putnam (eds), <em>Philosophy of Mathematics: Selected Readings</em> (CUP, 1984), 447–469 <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>George Boolos, <a href="https://doi.org/10.1017/CBO9781139171519.026">The iterative concept of set</a>. <em>Ibid</em>, 486–502 <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>N.G. de Bruijn. A Survey of the Project Automath. <em>In</em>: R. P. Nederpelt, J. H. Geuvers, & R. C. Vrijer, de (Eds.), <em>Selected Papers on Automath</em> (North-Holland, 1994), 144 <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Wed, 01 Nov 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/11/01/Foundations.html
The concept of proof within the context of machine mathematics<p>This post is prompted by a preprint, <a href="https://doi.org/10.48550/arXiv.2309.11457">Automated Mathematics and the Reconfiguration of Proof and Labor</a>,
recently uploaded by <a href="https://ochigame.org">Rodrigo Ochigame</a>.
It begins by contrasting two opposing ideals of proof – what any computer scientist
would call <em>top-down</em> versus <em>bottom-up</em> – and then asks how they might have to be modified
in a possible future in which mathematics is automated.
To my way of thinking, his outlook is too negative.</p>
<h3 id="the-ideals-of-proof">The ideals of proof</h3>
<p>The two ideals, which Ochigame attributes to <a href="https://www.pet.cam.ac.uk/news/professor-ian-macdougall-hacking-1936-2023">Ian Hacking</a>,
are as follows:</p>
<ul>
<li><em>Cartesian ideal of proof</em>: “after some reflection and study, one totally understands the proof, and can get it in one’s mind ‘all at once’”</li>
<li><em>Leibnizian ideal of proof</em>: “every step is meticulously laid out, and can be
checked, line by line, in a mechanical way”</li>
</ul>
<p>I feel divided, because I seldom feel capable of understanding a proof all at once,
and yet, having instead checked a lengthy proof line by line and reached QED,
I feel no more enlightened than before. Perhaps many people feel this way,
and look for some compromise where they have a good idea about the mathematical tools
that were deployed in the proof, and just to be careful, meticulously verify
certain tricky or suspect calculations.</p>
<p>Ochigame himself explores a number of variations of these ideals to take into account
modern-day complications such as phenomenally long, complex or specialised proofs.
He then outlines the history of the mechanisation of mathematical proof,
beginning with <a href="/2021/11/03/AUTOMATH.html">AUTOMATH</a>
and Mizar, and concluding with today’s systems, such as Lean and Isabelle.
Regarding these as <em>proof checkers</em> (where we are “verifying existing results”),
he then briefly outlines the history of automated theorem proving,
beginning with the work of Newell and Simon and mentioning <a href="https://lawrencecpaulson.github.io/tag/Hao_Wang">Hao Wang</a>.
And now I feel obliged to mention again that while Newell and Simon got all the glory
as AI pioneers, Wang’s system was on another planet when it came to capability.
That’s because Wang actually understood logic.
The AI world has often been driven by motives quite different from
the actual competence of a particular AI system (see also <a href="https://en.wikipedia.org/wiki/SHRDLU">SHRDLU</a>: the importance of having a demo).</p>
<h3 id="the-role-of-computer-encoded-proofs">The role of computer-encoded proofs</h3>
<p>Since most theorem provers work by reducing every claim
to a string of low-level inferences in some built-in calculus,
and since they don’t understand anything, we expect them to be firmly on the Leibnizian side.
Ochigame proposes the following</p>
<ul>
<li><em>Practical standard of computer-encoded proofs</em>: every step can be checked by a computer program and derived from the axiomatic foundations of the program; and after some study, one understands or trusts the encoding of the proven statement.</li>
</ul>
<p>This formulation is natural enough, but I can imagine that mathematicians would be
dissatisfied: it gives them no way to survey the proof themselves.
They are forced to trust the computer program, its axiomatic foundations
and even the underlying hardware, and realistically, they are going to have
to trust the encoding of the proven statement as well.</p>
<p>Isabelle has supported
legibility since Makarius Wenzel introduced
his <a href="https://rdcu.be/dngL4">Isar structured language</a> in 1999.
Through this blog I have published <a href="https://lawrencecpaulson.github.io/tag/examples">numerous examples</a>
to demonstrate how much legibility you can obtain if you try.
Too often, people don’t try. Incidentally, there is nothing about Isar that is inherently
specific to Isabelle/HOL: it works for all of Isabelle’s incarnations,
and I believe it could be adopted by Lean or Coq without modifying the underlying formalism.
The chief difficulty is that a more sophisticated user interface would be required;
an Isar proof is not simply a series of tactic invocations.</p>
<p>My ALEXANDRIA colleagues and I have formalised an enormous amount
of advanced mathematics, but we were never satisfied with formalisation alone;
we wanted our proofs to be legible. A mathematician still has to learn
the Isabelle notation, but then should be able to read the proof
without the aid of a computer. With existing automation, the computer
seldom sees further than a mathematician, rather the opposite:
we have to spell out many things
that a mathematician would find obvious.
At the moment, the chief exceptions are lengthy calculations and occasionally, large case analyses. If the time ever came that automation could find truly deep proofs,
we would have to insist that it delivered intelligible justifications.</p>
<h3 id="the-future-of-formalised-mathematics">The future of formalised mathematics</h3>
<p>Ochigame presents a bleak future in which formalisation becomes obligatory
for mathematicians, with formalisers distinct from the mathematicians themselves
and forming an underclass. The military origins of formal verification
are also mentioned, in a vaguely ominous way.</p>
<p>I see the future differently. As proof assistants become more useful,
and as more mathematicians become aware of them, their use will grow organically.
Journals may eventually start to request formalisations of some material,
but it’s likely that there will always be mathematics not easily formalisable
in any existing system.</p>
<h3 id="and-another-thing-why-is-it-always-about-proofs">And another thing: why is it always about proofs?</h3>
<p>Mathematics is too often presented as a discipline in which axioms
are laid down and theorems proved from them. Sometimes, axioms are even conflated
with beliefs, but I’m not going there today. Instead I would like to remark
(as I have <a href="/2023/04/12/Wittgenstein.html">done before</a>)
that the genius in mathematics typically lies in the definitions, not in the proofs.
For example, <a href="https://en.wikipedia.org/wiki/Szemerédi_regularity_lemma">Szemerédi’s regularity lemma</a>
is a straightforward proof — some calculations and an induction —
relying on an extraordinary string of definitions.
Why should we care about edge density? How did he come up with ε-regular pairs of sets,
ε-regular partitions, the energy of a partition?
How did he come up with the theorem statement?
His genius was grasping the importance of these concepts.</p>
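<p>To give a flavour of those definitions (a sketch of the standard textbook notions, with names of my choosing), the edge density of a pair of vertex sets and the brute-force ε-regularity test can be written in a few lines of Python:</p>

```python
from itertools import chain, combinations

def density(E, X, Y):
    """Edge density d(X, Y) = e(X, Y) / (|X| * |Y|), where e(X, Y) counts
    the edges of E joining X to Y."""
    e = sum(1 for x in X for y in Y if (x, y) in E or (y, x) in E)
    return e / (len(X) * len(Y))

def is_eps_regular(E, X, Y, eps):
    """(X, Y) is eps-regular if every pair of subsets A of X and B of Y with
    |A| >= eps|X| and |B| >= eps|Y| satisfies |d(A, B) - d(X, Y)| <= eps.
    Brute force over all subsets: only sensible for tiny examples."""
    def subsets(S):
        S = list(S)
        return chain.from_iterable(combinations(S, k) for k in range(1, len(S) + 1))
    d = density(E, X, Y)
    return all(abs(density(E, A, B) - d) <= eps
               for A in subsets(X) if len(A) >= eps * len(X)
               for B in subsets(Y) if len(B) >= eps * len(Y))
```

<p>An ε-regular pair thus looks “random”: its density is roughly uniform across all large subsets. The regularity lemma says every large graph can be partitioned so that almost all pairs of parts are ε-regular, and it is that definition, not the proof, that took genius.</p>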
<p>The central importance of definitions like these gives something of a pass
to those proof assistants (most of them) that don’t support legible proofs:
if the definitions are there, you must be on the right track.</p>
<h3 id="postscript">Postscript</h3>
<p>I have a distant memory of NG de Bruijn (visiting Caltech in 1977) describing the “mathematics assembly-line”. He wrote down the word Genius, then an arrow pointing to “first-rate mathematician”, then I believe a further arrow pointing to “student”, a further arrow pointing to “journal” and there he drew a little tombstone. To my mind this conjures up the genius who has the ideas and the junior colleagues who fill in the details to make the work publishable.
(And yet, he himself seems to have published almost exclusively as <a href="https://www.semanticscholar.org/author/de-Ng-Dick-Bruijn/66031417/">sole author</a>.)
Conceivably, formalisation will begin to play some role on the journey from the genius to the grave.</p>
Wed, 04 Oct 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/10/04/Ochigame.html
The End (?) of the ALEXANDRIA project<p>Today marks the final day of the <a href="https://www.cl.cam.ac.uk/~lp15/Grants/Alexandria/">ALEXANDRIA</a> project.
I outlined a brief history of the project
<a href="/2023/04/27/ALEXANDRIA_outcomes.html">not long ago</a>.
It is nevertheless right to take a moment to thank
the European Research Council
<a href="https://cordis.europa.eu/project/id/742178">for funding it</a>
and to state, yet again, what the outcomes were.
Six years on, what have we learned?</p>
<h3 id="how-it-started">How it started</h3>
<p>A milestone marking the start of the project was the <a href="https://www.newton.ac.uk/event/bpr/"><em>Big Proof</em> programme</a>,
organised by the Newton Institute in Cambridge. Its theme mentioned two then recent
and widely admired achievements:</p>
<blockquote>
<p>Interactive proof assistants have been used to check complicated mathematical proofs such as those for the Kepler’s conjecture and the Feit-Thompson odd order theorem.</p>
</blockquote>
<p>It then refers to</p>
<blockquote>
<p>the challenges of bringing proof technology into mainstream mathematical practice</p>
</blockquote>
<p>and it lists specifically</p>
<ol>
<li>Novel pragmatic foundations for representing mathematical knowledge and vernacular inspired by set theory, category theory, and type theory.</li>
<li>Large-scale formal mathematical libraries that capture background knowledge spanning a range of domains</li>
</ol>
<p>A proposal for a programme
devoted entirely to <a href="https://homotopytypetheory.org">homotopy type theory</a> (HoTT)
had been rejected, but people from that community were invited to <em>Big Proof</em>.
Dependent type theory, whether HoTT or the already established type theory of Coq,
was widely assumed to be the future of the formalisation of mathematics.
I felt very lucky to get funding for a project involving simple type theory
and <a href="https://isabelle.in.tum.de">Isabelle/HOL</a>.</p>
<p>During the programme, prior formalisation efforts were criticised as lacking sophistication.
As Kevin Buzzard pointedly noted,
researchers had formalised long proofs about simple objects, but no one had formalised
<em>even the definitions</em> of more complicated objects used every day,
such as Grothendieck schemes.
Much existing work formalised 19th-century mathematics.</p>
<p>These complaints would have to be tackled.</p>
<h3 id="how-it-went">How it went</h3>
<p>I chronicled the project in my <a href="/2023/04/27/ALEXANDRIA_outcomes.html">previous post</a>.
Briefly: we formalised heaps of mathematics.
We also did groundbreaking work on applications of information retrieval and machine learning
to formalisation.
A longer and more formal account can be found <a href="https://arxiv.org/abs/2305.14407">on arXiv</a>.</p>
<h3 id="how-it-ended-formalisation-of-mathematics">How it ended (formalisation of mathematics)</h3>
<p>The sheer amount of new formalised material is impressive (and the quality is also high):</p>
<ul>
<li>formalisations of advanced mathematics, including the first ever on topics such as additive combinatorics, combinatorial block designs and ordinal partition theory</li>
<li>showing that dependent types aren’t necessary to have sophisticated objects like Grothendieck schemes or ω-categories</li>
<li>tens of thousands of lines of more basic but necessary library material, e.g. on metric and topological spaces (imported from HOL Light)</li>
<li>we formalised advanced work from some of the leading mathematicians of the age: Erdős, Gowers, Roth, Szemerédi</li>
</ul>
<p>We developed some highly fruitful techniques:</p>
<ul>
<li><a href="https://rdcu.be/dkoEr">locales</a> work exceptionally well for <a href="https://www.tandfonline.com/doi/full/10.1080/10586458.2022.2062073">structuring complicated hierarchies of definitions</a></li>
<li>“dependent” constructions can typically be formalised as families of (typed) sets</li>
</ul>
<p>We arrived at some surprising conclusions:</p>
<ul>
<li>Formalising even advanced mathematics is largely a matter of perseverance.</li>
<li>Combining material from different branches of mathematics, say probability theory and graph theory or complex analysis and set theory, works fine.</li>
<li>Dependent types aren’t necessary and probably aren’t even advantageous. We aren’t the ones fighting our formalism.</li>
</ul>
<p>To be fair, <a href="https://xenaproject.wordpress.com/2020/12/05/liquid-tensor-experiment/">astonishing progress</a> has also been made by the <a href="https://leanprover.github.io">Lean</a> community.
They have been extremely active over the same period
and formalised <a href="https://leanprover-community.github.io">mountains of material</a>.</p>
<p><strong>We can safely conclude that proof assistants already offer value to mathematicians.</strong>
Although full formalisation is still not really affordable,
neither is it necessary.
You can forego proving the results that you feel confident about,
focusing your formalisation efforts on the problematical parts.</p>
<h3 id="how-it-ended-ai-techniques">How it ended (AI techniques)</h3>
<p>The proposal included a lot of speculative ideas about search
and auto completion, in particular by somehow mining
the existing libraries for “proof idioms”.
Writing the proposal in 2016, I had no idea how such things could be done.
I was lucky to attract people who were prepared to apply their specialised knowledge.
That’s how we got</p>
<ul>
<li>the <a href="https://behemoth.cl.cam.ac.uk/search/">SErAPIS search engine</a>, a one-of-a-kind tool to search the libraries even on the basis of abstract mathematical concepts</li>
<li>a tremendous amount of infrastructure to analyse the Isabelle libraries and extract information</li>
<li>a string of advanced papers on proof synthesis, auto-formalisation, an Isabelle parallel corpus and more</li>
</ul>
<p>These projects are still at the research stage, but show great promise!</p>
<h3 id="spreading-the-word">Spreading the word</h3>
<p>For more detail and links relating to everything described above,
you can visit the <a href="https://www.cl.cam.ac.uk/~lp15/Grants/Alexandria/">ALEXANDRIA</a> webpage
or read the <a href="https://arxiv.org/abs/2305.14407">project summary</a>.</p>
<p>The team has worked hard to share the knowledge we discovered. We have written</p>
<ul>
<li>13 journal articles, including half (3 out of 6) of a special issue of <em>Experimental Mathematics</em></li>
<li>15 articles in conference proceedings</li>
<li>2 refereed chapters in a <a href="https://link.springer.com/book/10.1007/978-3-030-15655-8"><em>Synthese Library</em> volume</a></li>
<li>33 formal proof developments accepted to Isabelle’s <a href="https://www.isa-afp.org"><em>Archive of Formal Proofs</em></a></li>
</ul>
<p>More are forthcoming.
In addition, we’ve worked on formalisation projects with about two dozen interns and students,
many of whom have gone on to do PhD research. We’ve given dozens of talks at a variety of venues. We are open to collaboration to take our work forward.</p>
Thu, 31 Aug 2023 00:00:00 +0000
https://lawrencecpaulson.github.io//2023/08/31/ALEXANDRIA_finished.html