Probabilistic reasoning and formal proof

21 Aug 2024

[ Paul Erdős  example  Isar  ]

Many are the pitfalls awaiting anybody trying to formalise a proof, but the worst are appeals to intuition. These are typically a sign that the author can’t be bothered to outline a calculation or argument. Perhaps the claim is obvious (to them, not to you). Probabilistic claims, say about drawing coloured balls from a sack, may look particularly dubious. But as Paul Erdős has shown, such arguments can yield short, intuitive proofs that are absolutely rigorous. To formalise them simply requires a bit of measure theory boilerplate.

A simple example

Let’s consider an example from the website Cut the Knot, created by Alexander Bogomolny. He credits the example to a 1963 paper by Erdős (I could not work out which one; your hint would be welcome). It goes as follows:

Let $A_k$, for $k = 1, \ldots, m$, be a family of $n$-element subsets of a set $X$. If $m < 2^{n-1}$, then there exists a bichromatic colouring1 of $X$ such that no $A_k$ is monochromatic.

And here’s the proof, as presented by Bogomolny:

Let $\cal F$ be a collection of $n$-sets (sets with exactly $n$ elements), and assume that $\vert\cal F\vert = m < 2^{n-1}$. Colour $X$ randomly with two colours, all colourings being equally likely. For $A\in \cal F$ let $E_A$ be the event that $A$ is monochromatic. Since there are two such colourings and $|A| = n$, probability $P(E_A)$ of the event $E_A$ is given by $P(E_A) = 2\times 2^{-n} = 2^{1-n}$.

Since the events $E_A$ are not necessarily disjoint, $P(\bigcup_{A\in\cal F} E_A) \le \sum_{A\in\cal F} P(E_A) = m\times2^{1-n} < 1$.

So the probability that at least one $A\in \cal F$ is monochromatic is less than 1. Thus there must be a bichromatic colouring of $X$ with no monochromatic $A\in\cal F$. QED.

This example is clearly a simplified version of Erdős’s celebrated proof of a lower bound for Ramsey numbers, which is often claimed to be the first application of the probabilistic method. Note that the existence claim is nonconstructive: we have shown that the probability of a certain outcome is less than one. So the opposite outcome has nonzero probability and therefore forms a non-empty set.

Formalising the probability space

The theorem statement assumes the family $\cal F$ of $n$-sets of the finite set $X$. The family has cardinality $\vert \cal F \vert = m<2^{n-1}$. Necessary is the constraint $0<n\le\vert X\vert$, omitted from the problem statement. As for the conclusion, the required 2-colouring is expressed as a function from $X$ to the set $\{0,1\}$. The extensional function space E is required: by constraining the functions outside their domain ($X$) to some arbitrary fixed value, this operator accurately represents the set $X\to\{0,1\}$. It’s vital because we are actually counting these functions and their values outside $X$ are irrelevant.

theorem Erdos_1963:
  assumes X: "𝓕  nsets X n" "finite X"
    and "card 𝓕 = m" and m: "m < 2^(n-1)" and n: "0 < n" "n  card X"
  obtains f::"'anat" where "f  X E {..<2}" "F c. F  𝓕; c<2  ¬ f ` F  {c}"
proof -

Now we have to set up the probabilities. Our “colours” are actually 0 and 1, constrained to type nat. The sample space $\Omega$ is the set of all 2-colourings of $X$. Then the probability space $M$ is the corresponding measure space where all colourings have the same probability. A non-uniform probability distribution would be a little more work, e.g. we’d have to show that the probabilities summed to 1.

  have "finite 𝓕"
    using X finite_imp_finite_nsets finite_subset by blast
  let ?two = "{..<2::nat}"
  define Ω where "Ω  X E ?two"
  define M where "M  uniform_count_measure Ω"

Next comes some boilerplate relating $\Omega$ and $M$, allowing the interpretation of the prob_space locale. Isabelle/HOL’s tools of probability reasoning are now at our disposal.

  have space_eq: "space M = Ω"
    by (simp add: M_def space_uniform_count_measure)
  have sets_eq: "sets M = Pow Ω"
    by (simp add: M_def sets_uniform_count_measure)
  have cardΩ: "card Ω = 2 ^ card X"
    using finite X by (simp add: Ω_def card_funcsetE)
  have Ω: "finite Ω" "Ω  {}"
    using cardΩ less_irrefl by fastforce+
  interpret P: prob_space M
    unfolding M_def by (intro prob_space_uniform_count_measure Ω)

The idea of a colouring being monochromatic on a set is easily expressed in terms of set image. For any given colour $c$ and set $F$, the set of monochromatic maps is an event of the probability space.

  define mchrome where "mchrome  λc F. {f  Ω. f ` F  {c}}"
  have mchrome: "mchrome c F  P.events" "mchrome c F  Ω" for F c
    by (auto simp: sets_eq mchrome_def Ω_def)

The probability that a map is monochrome on some $F\in\cal F$

Given $F\in\cal F$ and any fixed colour $c$, the number of maps monochrome on $\cal F$ (coloured $c$) is $2^{\vert X\vert-n}$. That’s because each element of $X$ not in $F$ could be given either colour. The proof defines a bijection between colourings mapping the whole of $F$ to $c$ and those that don’t colour $F$ at all. This sort of calculation can get quite a bit more complicated when the probability distribution is nonuniform.

  have card_mchrome: "card (mchrome c F) = 2 ^ (card X - n)" if "F  𝓕" "c<2" for F c
  proof -
    have F: "finite F" "card F = n" "F  X"
      using assms that by (auto simp: nsets_def)
    with F finite X have "card (X-F E ?two) = 2 ^ (card X - n)"
      by (simp add: card_funcsetE card_Diff_subset)
    moreover
    have "bij_betw (λf. restrict f (X-F)) (mchrome c F) (X-F E ?two)"
    proof (intro bij_betwI)
      show "(λg x. if xF then c else g x)  (X-F E ?two)  mchrome c F"
        using that F  X by (auto simp: mchrome_def Ω_def)
    qed (fastforce simp: mchrome_def Ω_def)+
    ultimately show ?thesis
      by (metis bij_betw_same_card)
  qed

The probability calculation is simply $2^{\vert X\vert-n} / 2^{\vert X\vert} = 1 / 2^n$.

  have prob_mchrome: "P.prob (mchrome c F) = 1 / 2^n"  
    if "F  " "c<2" for F c
  proof -
    have emeasure_eq: "emeasure M U = (if UΩ then ennreal(card U / card Ω) else 0)" for U
      by (simp add: M_def emeasure_uniform_count_measure_if finite Ω)
    have "emeasure M (mchrome c F) = ennreal (2 ^ (card X - n) / card Ω)"
      using that mchrome by (simp add: emeasure_eq card_mchrome)
    also have " = ennreal (1 / 2^n)"
      by (simp add: n power_diff cardΩ)
    finally show ?thesis
      by (simp add: P.emeasure_eq_measure)
  qed

Finishing up the argument

The rest of the proof should be straightforward, but needs to be annoyingly detailed in Isabelle. We begin by showing that the union is a subset of $\Omega$, and therefore an event.

  have "(F𝓕. c<2. mchrome c F)  Ω"
    by (auto simp: mchrome_def Ω_def)

To show that the union is actually a strict subset involves formalising the proof that $P(\bigcup_{A\in\cal F} E_A) < 1$.

  moreover have "(F𝓕. c<2. mchrome c F)  Ω"
  proof -
    have "P.prob (F𝓕. c<2. mchrome c F)  (F𝓕. P.prob (c<2. mchrome c F))"
      by (intro measure_UNION_le) (auto simp: countable_Un_Int mchrome finite 𝓕)
    also have "  (F𝓕. c<2. P.prob (mchrome c F))"
      by (intro sum_mono measure_UNION_le) (auto simp: mchrome)
    also have " = m * 2 * (1 / 2^n)"
      by (simp add: prob_mchrome card 𝓕 = m)
    also have " < 1"
    proof -
      have "real (m * 2) < 2 ^ n"
        using mult_strict_right_mono [OF m, of 2] n>0
        by (metis of_nat_less_numeral_power_cancel_iff pos2 power_minus_mult) 
      then show ?thesis
        by (simp add: divide_simps)
    qed
    finally have "P.prob (F𝓕. c<2. mchrome c F) < 1" .
    then show ?thesis
      using P.prob_space space_eq by force
  qed

The conclusion of the theorem is now immediate. Recall that moreover accumulates prior lemmas, which ultimately makes available to the next proof.

  ultimately obtain f where f:"f  Ω - (F𝓕. c<2. mchrome c F)"
    by blast
  with that show ?thesis
    by (fastforce simp: mchrome_def Ω_def)
qed

Postscript

The probabilistic method is simply a more intuitive way of presenting a proof by counting. The original example of such a proof is claimed to be Erdős’s “Some remarks on the theory of graphs” (1947). This paper indeed presents a proof of a lower bound for Ramsey numbers, but it makes no reference to probability and instead enumerates the total number of graphs satisfying certain properties. Presumably he published a probabilistic version of that proof at a later date.

A recent paper by Chelsea Edmonds describes the formalisation of probabilistic proofs in considerably more detail.

The examples for this post are online here.

  1. A bichromatic colouring of $X$ is a map taking each element of $X$ to either red or blue, and $Y\subseteq X$ is monochromatic if all its elements have the same colour.