Everything you know is wrong

20 Sep 2025

[ general  memories  higher-order logic  ]

Among the few advantages of attaining the dizzy age of 70 is the ability to look back on half a century. Things were great back in 1975. The Cold War was over, thanks to the détente established by Nixon and Brezhnev. Most of the world’s population lived under Marxism, and that was not going to change. The Social Democrats ruled West Germany, and in the UK, even Rolls-Royce was a state-run enterprise. The world of fashion discovered pastels, bell bottoms, hot pants and platform shoes: the very zenith of human culture, styles that would live forever. Great progress had been made in Artificial Intelligence, thanks to a relentless focus on symbol processing as opposed to discredited, useless neural networks. Many thought that automatic theorem provers could lead the way to what we now call AGI. Also, watches did not have operating systems. People knew that clouds were real and that vaccines worked. Well, a lot can happen in 50 years.

Artificial Intelligence

AI is an extreme example of changing research trends. Already in 1961, a computer program had beaten MIT students at calculus problems. By 1970, Terry Winograd’s SHRDLU demonstrated something remarkably like sentience in its use of English. The sort of wild predictions made today by AI boosters – that machines would put everybody out of work and maybe kill us all – were already being made back then. Consider the movie 2001: A Space Odyssey (released in 1968), featuring the smooth-talking but lethal HAL 9000. Director Stanley Kubrick had chosen MIT’s Marvin Minsky, possibly the world’s leading AI researcher, to be his scientific advisor. Minsky’s prediction turned out to be way off.

All AI work in that era was based on manipulating symbols. The book Perceptrons, by Minsky and Seymour Papert, had shown that a perceptron could not even learn such a simple function as exclusive-or: neural networks, everyone concluded, were worthless. The remaining debate was whether intelligence was procedural (embodied in code) or declarative (encoded in a symbolic formalism). SHRDLU was an example of procedural knowledge, a great mass of LISP code that no one knew what to do with. Declarative knowledge might be expressed in first-order logic or some other approach to knowledge representation, processed by an automatic theorem prover or other engine.

Expert systems – Intelligent Knowledge Based Systems – were attracting widespread attention by 1980. These contained declarative knowledge in the form of production rules obtained by interrogating human experts in a specific domain. For example, MYCIN could prescribe antibiotics given a description of the patient’s symptoms. It could even “explain” its decisions based on the production rules involved in the response to a query.
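
To give the flavour, here is a toy rule in the spirit of MYCIN’s most-quoted example, written as a Prolog clause (my own illustration; the identifiers are invented, and real MYCIN had its own rule language and attached certainty factors to its conclusions):

% IF the organism is gram-negative AND rod-shaped AND anaerobic
% THEN conclude that its identity is Bacteroides.
identity(bacteroides) :-
    finding(gram_negative),
    finding(rod_shaped),
    finding(anaerobic).

% Findings elicited by questioning the user about the patient.
finding(gram_negative).
finding(rod_shaped).
finding(anaerobic).

% ?- identity(Organism).
% Organism = bacteroides; the chain of rules fired is the "explanation".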

By the late 1980s, disenchantment with expert systems and other symbolic approaches led to an “AI Winter” that did not really end until neural networks were revived.

DPLL versus resolution theorem proving

Interest in automatic theorem proving dates back at least to 1956: Logic Theorist could prove some simple statements from Whitehead and Russell’s Principia Mathematica. Philosopher Hao Wang and others were excited by the possibility of computers automating the process of mathematical proof. One outcome of such work was the DPLL procedure for propositional logic: the first SAT solver. It was quickly superseded by the resolution procedure, which worked in the richer formalism of first-order logic. By the mid-1970s, people were even getting disenchanted with resolution, but DPLL still seemed as dead as a doornail. Attempts to revive DPLL during the 1990s provoked laughter in some, but nobody is laughing now. Resolution has also grown in importance, even if nobody thinks it can do robot planning anymore. Prolog, a programming language originating in a specialised form of resolution, attracted enormous interest in the 1980s but has since fallen by the wayside.

Today, DPLL is the basis of SMT solvers, which underpin powerful tools for verifying software. Both SMT solvers and resolution provers are used every time an Isabelle user invokes Sledgehammer.

Specialised hardware for declarative languages

The VLSI revolution that started in the 1970s coincided with the emergence of declarative languages such as Prolog and ML. People said that such languages were inefficient only because they were competing with von Neumann languages running on von Neumann hardware. But now anyone could put a computer on a chip, and a new class of computers designed for declarative languages would surely bridge the performance gap.

Some new hardware designs did emerge, such as the SKIM machine and the Transputer. The former had support for S, K, I combinator reduction right in its microcode to execute lazy functional languages fast. The latter was designed for parallel execution, to be wired together in massive grids for specialised computing tasks.
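
To give a flavour of what combinator reduction means, here is a toy reducer written in Prolog (purely my own illustration, nothing to do with SKIM’s actual microcode), with applications represented by the term ap(F, X):

% The rewrite rules for the I, K and S combinators.
step(ap(i, X), X).
step(ap(ap(k, X), _), X).
step(ap(ap(ap(s, X), Y), Z), ap(ap(X, Z), ap(Y, Z))).
% Otherwise, reduce inside the function or the argument position.
step(ap(F, X), ap(G, X)) :- step(F, G).
step(ap(F, X), ap(F, Y)) :- step(X, Y).

% Keep rewriting until no rule applies.
reduce(T, U) :- step(T, T1), !, reduce(T1, U).
reduce(T, T).

% ?- reduce(ap(ap(ap(s, k), k), a), R).
% R = a, since S K K behaves as the identity combinator.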

The big beast of this era was Japan’s Fifth Generation Computer Systems project. It was an attempt to leapfrog western computer technology through new approaches centred around Prolog, including special hardware. Similar programmes were launched in other countries, notably the UK’s Alvey Programme. The first ESPRIT programme was Europe’s response.

This burst of research activity was undoubtedly beneficial to computer science. Declarative languages such as Haskell emerged, which over time would exert a profound influence on programming language design. However, knowledge-based systems reached their dead end all the same, and all attempts to create novel architectures were remorselessly crushed by the Intel steamroller. An off-the-shelf x86 processor could beat your special chip for a fraction of the cost of developing the latter.

Program verification versus correct-program synthesis

Program verification is the process of proving correctness properties of code that has already been written, identifying defects in the process. For much of my career the very idea has been heavily criticised, mostly as being infeasible[1] but also as being back-to-front. Begin with the problem specification, people said, and work backwards to code that is correct by design.

One of the foremost proponents of that approach was Edsger W. Dijkstra, a dominant figure right up until his death in 2002. Already in his 1972 essay “Notes on Structured Programming” he set himself the task of printing a table of 1000 prime numbers, spending 11 pages refining the task “print first thousand prime numbers” to pseudocode in a language resembling Algol W. In this example he did not write his specifications in logic, but in later years his developments became increasingly formal. One of his bugbears was what he called a rabbit: an idea simply “pulled out of a hat” rather than determined by a calculation. I find it odd to be against having ideas.

A formal and rather elegant “deductive approach to program synthesis” was set forth by Zohar Manna and Richard Waldinger. They used a tableau-style formalism intermingling logical formulas with code fragments, together with a set of inference rules governing the translation from logic to executable code. Even more elegant and formal were the constructive type theories outlined by Per Martin-Löf. By formulating logical propositions as types, such theories guaranteed that the proof of a given statement yielded a proof object: executable code witnessing the claim. The approach required constructive logic, but the loss of classical reasoning was a small price to pay for bug-free code.
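
The idea is easy to illustrate in a modern descendant of those theories. Here is a tiny Lean 4 sketch (my own example, not Martin-Löf’s notation): read constructively, the claim that every natural number has a strictly larger one is a type, and its proof is a program computing the witness.

-- The "theorem" is a dependent type; its proof object is executable code.
def larger : (n : Nat) → { m : Nat // n < m } :=
  fun n => ⟨n + 1, Nat.lt_succ_self n⟩

-- Running the proof object extracts the witness: here, 42.
#eval (larger 41).val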

This dream could not work, for a number of reasons.

The first reason: to synthesise a sorting function, we have to prove that every sequence of integers can be permuted into non-decreasing order. There is a simple proof by induction: swap the smallest element of the sequence with the first element; the remaining sequence is shorter by one, so by the induction hypothesis it can be permuted into order, and the conclusion follows. The corresponding algorithm is selection sort, which is no good: it takes quadratic time. To write a proof that corresponds to quicksort, you have to know quicksort already. Even in Dijkstra’s 1972 essay, he begins by remarking that the most efficient way to generate prime numbers is the Sieve of Eratosthenes and concludes by noting that he has derived exactly that.
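
To make that concrete, here is my own sketch, in the Prolog of the specification shown below, of the algorithm that this inductive proof yields (the predicate names are mine):

% Remove a minimal element Min from a non-empty list, leaving Rest.
select_min([X], X, []).
select_min([X | Xs], Min, Rest) :-
    select_min(Xs, M, Rest0),
    (   X =< M
    ->  Min = X, Rest = [M | Rest0]
    ;   Min = M, Rest = [X | Rest0]
    ).

% The proof's step of bringing the smallest element to the front, applied repeatedly:
% selection sort, quadratic because every step scans the whole remaining sequence.
selection_sort([], []).
selection_sort(Xs, [Min | Sorted]) :-
    select_min(Xs, Min, Rest),
    selection_sort(Rest, Sorted).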

Related to correct-program synthesis was the field of program transformations, explored by Richard Bird among others. The original specification would be executable (typically written in a functional language) but inefficient. Your task would be to transform this initial program into usable code. Lots of fascinating research went into program transformations, but this never looked like an approach that could scale to thousands of lines. If we consider sorting again, an executable specification is easily expressed in Prolog:

sort(X, Y) :- permutation(X,Y), ordered(Y).

This code enumerates every permutation Y of the input X and terminates if Y is ordered, otherwise backtracking to get another permutation. (It would be harder to code in a functional language.) This “specification” is incomplete because it assumes that such a permutation exists, which actually must be proved.
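
For anyone who wants to run it, here is a minimal completion (my own sketch, assuming SWI-Prolog, whose library(lists) supplies permutation/2; ordered/1 is defined here, and the predicate is renamed to avoid the built-in sort/2):

:- use_module(library(lists)).    % provides permutation/2 in SWI-Prolog

ordered([]).
ordered([_]).
ordered([X, Y | Rest]) :- X =< Y, ordered([Y | Rest]).

slow_sort(X, Y) :- permutation(X, Y), ordered(Y).

% ?- slow_sort([3,1,2], Y).
% Y = [1, 2, 3], found by blind generate-and-test in factorial time.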

I am one of the many people who were attracted to type theory by the dream of deductive program synthesis in the early 1980s. Every so often, some researcher would attempt to execute the proof object they had derived from some interesting theorem. In almost every case, their extracted code turned out to be all but unusable. When I checked Wikipedia today, their entry on Program Synthesis mentioned the work of Manna and Waldinger but did not contain a single mention of the words “type” or “constructive”. Dang.

Postscript

If there is a lesson here, it is that almost nothing is settled for certain. What now is dead may be restored to life, what now seems invincible may be laid low, a decade or two from now.

I wish I could say that I had taken my own advice, ignored the herd and struck out boldly on my own. In fact, I pursued topics that other people found important. I even found myself bounced into joining a Prolog-oriented grant proposal for Alvey that made absolutely no sense. I devoted years to writing derivations in Martin-Löf type theory and even coding an implementation before falling out with the community’s cultish practices. The implementation became Isabelle.

A more impressive role model for a young researcher today would be Mike Gordon. He decided that he wanted to verify hardware, something no one else was interested in. After years of persistent research, he fixed on higher-order logic as the best formalism. He was not tempted by the still-evolving dependent type theories, nor concerned when I mentioned that higher-order logic was disdained by logicians for its messy metatheory. Already by 1986 he had achieved his initial milestone of verifying a simple but realistic computer, after which there was no stopping him.

  1. My dictionary actually illustrates the definition of infeasible with the sentence “proof that a program works is infeasible unless it is very short”.