Probabilistic reasoning and formal proof
[Paul Erdős
examples
Isar
]
Many are the pitfalls awaiting anybody trying to formalise a proof, but the worst are appeals to intuition. These are typically a sign that the author can’t be bothered to outline a calculation or argument. Perhaps the claim is obvious (to them, not to you). Probabilistic claims, say about drawing coloured balls from a sack, may look particularly dubious. But as Paul Erdős has shown, such arguments can yield short, intuitive proofs that are absolutely rigorous. To formalise them simply requires a bit of measure theory boilerplate.
A simple example
Let’s consider an example from the website Cut the Knot, created by Alexander Bogomolny. He credits the example to a 1963 paper by Erdős (I could not work out which one; your hint would be welcome). It goes as follows:
Let $A_k$, for $k = 1, \ldots, m$, be a family of $n$element subsets of a set $X$. If $m < 2^{n1}$, then there exists a bichromatic colouring^{1} of $X$ such that no $A_k$ is monochromatic.
And here’s the proof, as presented by Bogomolny:
Let $\cal F$ be a collection of $n$sets (sets with exactly $n$ elements), and assume that $\vert\cal F\vert = m < 2^{n1}$. Colour $X$ randomly with two colours, all colourings being equally likely. For $A\in \cal F$ let $E_A$ be the event that $A$ is monochromatic. Since there are two such colourings and $A = n$, probability $P(E_A)$ of the event $E_A$ is given by $P(E_A) = 2\times 2^{n} = 2^{1n}$.
Since the events $E_A$ are not necessarily disjoint, $P(\bigcup_{A\in\cal F} E_A) \le \sum_{A\in\cal F} P(E_A) = m\times2^{1n} < 1$.
So the probability that at least one $A\in \cal F$ is monochromatic is less than 1. Thus there must be a bichromatic colouring of $X$ with no monochromatic $A\in\cal F$. QED.
This example is clearly a simplified version of Erdős’s celebrated proof of a lower bound for Ramsey numbers, which is often claimed to be the first application of the probabilistic method. Note that the existence claim is nonconstructive: we have shown that the probability of a certain outcome is less than one. So the opposite outcome has nonzero probability and therefore forms a nonempty set.
Formalising the probability space
The theorem statement assumes the family $\cal F$ of $n$sets of the finite set $X$. The family has cardinality $\vert \cal F \vert = m<2^{n1}$. Necessary is the constraint $0<n\le\vert X\vert$, omitted from the problem statement. As for the conclusion, the required 2colouring is expressed as a function from $X$ to the set $\{0,1\}$. The extensional function space →⇩_{E} is required: by constraining the functions outside their domain ($X$) to some arbitrary fixed value, this operator accurately represents the set $X\to\{0,1\}$. It’s vital because we are actually counting these functions and their values outside $X$ are irrelevant.
theorem Erdos_1963: assumes X: "𝓕 ⊆ nsets X n" "finite X" and "card 𝓕 = m" and m: "m < 2^(n1)" and n: "0 < n" "n ≤ card X" obtains f::"'a⇒nat" where "f ∈ X →⇩_{E} {..<2}" "⋀F c. ⟦F ∈ 𝓕; c<2⟧ ⟹ ¬ f ` F ⊆ {c}" proof 
Now we have to set up the probabilities.
Our “colours” are actually 0 and 1, constrained to type nat
.
The sample space $\Omega$ is the set of all 2colourings of $X$.
Then the probability space $M$ is the corresponding measure space
where all colourings have the same probability.
A nonuniform probability distribution would be a little more work,
e.g. we’d have to show that the probabilities summed to 1.
have "finite 𝓕" using X finite_imp_finite_nsets finite_subset by blast let ?two = "{..<2::nat}" define Ω where "Ω ≡ X →⇩_{E} ?two" define M where "M ≡ uniform_count_measure Ω"
Next comes some boilerplate relating $\Omega$ and $M$,
allowing the interpretation of the prob_space
locale.
Isabelle/HOL’s tools of probability reasoning are now at our disposal.
have space_eq: "space M = Ω" by (simp add: M_def space_uniform_count_measure) have sets_eq: "sets M = Pow Ω" by (simp add: M_def sets_uniform_count_measure) have cardΩ: "card Ω = 2 ^ card X" using ‹finite X› by (simp add: Ω_def card_funcsetE) have Ω: "finite Ω" "Ω ≠ {}" using cardΩ less_irrefl by fastforce+ interpret P: prob_space M unfolding M_def by (intro prob_space_uniform_count_measure Ω)
The idea of a colouring being monochromatic on a set is easily expressed in terms of set image. For any given colour $c$ and set $F$, the set of monochromatic maps is an event of the probability space.
define mchrome where "mchrome ≡ λc F. {f ∈ Ω. f ` F ⊆ {c}}" have mchrome: "mchrome c F ∈ P.events" "mchrome c F ⊆ Ω" for F c by (auto simp: sets_eq mchrome_def Ω_def)
The probability that a map is monochrome on some $F\in\cal F$
Given $F\in\cal F$ and any fixed colour $c$, the number of maps monochrome on $\cal F$ (coloured $c$) is $2^{\vert X\vertn}$. That’s because each element of $X$ not in $F$ could be given either colour. The proof defines a bijection between colourings mapping the whole of $F$ to $c$ and those that don’t colour $F$ at all. This sort of calculation can get quite a bit more complicated when the probability distribution is nonuniform.
have card_mchrome: "card (mchrome c F) = 2 ^ (card X  n)" if "F ∈ 𝓕" "c<2" for F c proof  have F: "finite F" "card F = n" "F ⊆ X" using assms that by (auto simp: nsets_def) with F ‹finite X› have "card (XF →⇩_{E} ?two) = 2 ^ (card X  n)" by (simp add: card_funcsetE card_Diff_subset) moreover have "bij_betw (λf. restrict f (XF)) (mchrome c F) (XF →⇩_{E} ?two)" proof (intro bij_betwI) show "(λg x. if x∈F then c else g x) ∈ (XF →⇩_{E} ?two) → mchrome c F" using that ‹F ⊆ X› by (auto simp: mchrome_def Ω_def) qed (fastforce simp: mchrome_def Ω_def)+ ultimately show ?thesis by (metis bij_betw_same_card) qed
The probability calculation is simply $2^{\vert X\vertn} / 2^{\vert X\vert} = 1 / 2^n$.
have prob_mchrome: "P.prob (mchrome c F) = 1 / 2^n" if "F ∈ ℱ" "c<2" for F c proof  have emeasure_eq: "emeasure M U = (if U⊆Ω then ennreal(card U / card Ω) else 0)" for U by (simp add: M_def emeasure_uniform_count_measure_if ‹finite Ω›) have "emeasure M (mchrome c F) = ennreal (2 ^ (card X  n) / card Ω)" using that mchrome by (simp add: emeasure_eq card_mchrome) also have "… = ennreal (1 / 2^n)" by (simp add: n power_diff cardΩ) finally show ?thesis by (simp add: P.emeasure_eq_measure) qed
Finishing up the argument
The rest of the proof should be straightforward, but needs to be annoyingly detailed in Isabelle. We begin by showing that the union is a subset of $\Omega$, and therefore an event.
have "(⋃F∈𝓕. ⋃c<2. mchrome c F) ⊆ Ω" by (auto simp: mchrome_def Ω_def)
To show that the union is actually a strict subset involves formalising the proof that $P(\bigcup_{A\in\cal F} E_A) < 1$.
moreover have "(⋃F∈𝓕. ⋃c<2. mchrome c F) ≠ Ω" proof  have "P.prob (⋃F∈𝓕. ⋃c<2. mchrome c F) ≤ (∑F∈𝓕. P.prob (⋃c<2. mchrome c F))" by (intro measure_UNION_le) (auto simp: countable_Un_Int mchrome ‹finite 𝓕›) also have "… ≤ (∑F∈𝓕. ∑c<2. P.prob (mchrome c F))" by (intro sum_mono measure_UNION_le) (auto simp: mchrome) also have "… = m * 2 * (1 / 2^n)" by (simp add: prob_mchrome ‹card 𝓕 = m›) also have "… < 1" proof  have "real (m * 2) < 2 ^ n" using mult_strict_right_mono [OF m, of 2] ‹n>0› by (metis of_nat_less_numeral_power_cancel_iff pos2 power_minus_mult) then show ?thesis by (simp add: divide_simps) qed finally have "P.prob (⋃F∈𝓕. ⋃c<2. mchrome c F) < 1" . then show ?thesis using P.prob_space space_eq by force qed
The conclusion of the theorem is now immediate. Recall that moreover accumulates prior lemmas, which ultimately makes available to the next proof.
ultimately obtain f where f:"f ∈ Ω  (⋃F∈𝓕. ⋃c<2. mchrome c F)" by blast with that show ?thesis by (fastforce simp: mchrome_def Ω_def) qed
Postscript
The probabilistic method is simply a more intuitive way of presenting a proof by counting. The original example of such a proof is claimed to be Erdős’s “Some remarks on the theory of graphs” (1947). This paper indeed presents a proof of a lower bound for Ramsey numbers, but it makes no reference to probability and instead enumerates the total number of graphs satisfying certain properties. Presumably he published a probabilistic version of that proof at a later date.
A recent paper by Chelsea Edmonds describes the formalisation of probabilistic proofs in considerably more detail.
The examples for this post are online here.

A bichromatic colouring of $X$ is a map taking each element of $X$ to either red or blue, and $Y\subseteq X$ is monochromatic if all its elements have the same colour. ↩