A Casual Intro to Formalization

The early days of the theory of computation were characterized by researchers trying to figure out the limits of what was doable mathematically and logically. A mathematician, to quote Paul Erdős, was a “machine for turning coffee into theorems.”
The mathematician – or any human researcher – was abstracted away as a machine performing computations. Since those early days, however, computers have mostly been used to perform calculations and run programs. Today, even most mathematicians use computers mainly for mundane tasks like checking email and ordering food.
There have, however, been researchers who have built some tools for checking and even helping in the discovery of proofs. These tools include:
- Coq
- Isabelle/HOL
- Lean
I haven’t found working with any of them to be easy, but I also haven’t worked with any of them that much. On top of that, it’s been a couple of years since I tried learning Lean.
When I was trying to learn Lean, I mostly used the book Mathematics in Lean. On the whole, it’s excellent. You’ll definitely also want to take advantage of the Lean community on Zulip, which you can reach from the Lean project page. The great thing about the Zulip chat is that it supports writing in LaTeX!
I’m going to cover an example from Topology. It’s an incredibly easy result, but it still stumped me for a while! But, first, I’ve got to introduce a little math. It’s good, though, because it’s a mathematical topic that I wanted to get to anyway.
Definition: Filter
Let \( S \) be a non-empty set. A filter \( F \) on \( S \) is a subset of \(P(S)\), the power set of \(S\), such that
- \(\emptyset \notin F \)
- \( u \in F \wedge u \subseteq v \implies v \in F \)
- \( u, v \in F \implies u \cap v \in F \)
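To make the definition concrete, here’s a standard example (my own illustration, not taken from the text): the principal filter at a point \( x \in S \),

```latex
F_x = \{\, u \subseteq S \; : \; x \in u \,\}
```

It’s easy to check the three conditions: \( \emptyset \notin F_x \), since \( x \notin \emptyset \); if \( x \in u \) and \( u \subseteq v \), then \( x \in v \); and if \( x \in u \) and \( x \in v \), then \( x \in u \cap v \).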
I’m going to consider the above definition of filter to be the working definition for this site – it’s in line with the one the set theorist Thomas Jech gives in his book Set Theory. However, we’ll see that this is not how Mathlib handles things in Lean, so when we use that, we’ll have to make some adjustments.
Instead, we’ll see that Mathlib drops the first requirement. It then treats filters as elements of a lattice, where the filters are ordered by reverse set inclusion. The filter with the most sets – the entire power set \( P(S) \) – is thus the bottom element of the lattice, “\( \bot \)”.
This makes some things easier and other things harder. Apparently, for those who actually developed Mathlib, the tradeoff made a lot of sense on the whole.
One of the downsides you run into with this setup, though, is that you might have to prove that you’re not working with the bottom filter, because that is no longer ruled out by definition. One way to do that is to prove that the filter doesn’t contain the empty set, \( \emptyset \).
Back when I was working through Mathematics in Lean, I couldn’t find the following result anywhere, so I proved it myself. I could have, and probably should have, asked the Zulip community for help, but since the result was certainly true, I decided to just try to prove it myself. The proof I came up with is not what the Zulip community’s proofs usually look like. Their proofs are slick – almost always shorter than what I come up with – and they use functions in brilliant ways. That shows there is a way of thinking about Lean 4 and Mathlib that’s not how most mathematicians intuitively think. But it also shows that it’s a skill that can be learned.
Below is what I did. Afterward, I’ll explain what I think is going on.
An example lemma in Lean 4 using Mathlib
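Pieced together from the walkthrough that follows, the listing looked roughly like this. The overall shape matches the discussion below, but the exact tactics inside the sub-goals are a best guess and may differ from the original (the line numbers cited in the discussion refer to the original listing):

```lean
import Mathlib
import MIL.Common

open Set Filter Topology

lemma bot_iff_contains_empty {X : Type*} (F : Filter X) :
    F = ⊥ ↔ ∅ ∈ F := by
  constructor
  case mp =>
    intro h
    rw [h]      -- goal becomes ⊢ ∅ ∈ ⊥
    tauto
  case mpr =>
    intro h                   -- h : ∅ ∈ F
    rw [filter_eq_iff]        -- goal becomes ⊢ F.sets = ⊥.sets
    rw [Filter.bot_sets_eq]   -- goal becomes ⊢ F.sets = univ
    -- Compare both sides with the set of all sets containing ∅.
    have h₀ : (univ : Set (Set X)) = {s : Set X | ∅ ⊆ s} := by
      ext x
      simp
    have h₁ : F.sets ⊆ {s : Set X | ∅ ⊆ s} := by
      intro s _
      exact empty_subset s
    have h₂ : {s : Set X | ∅ ⊆ s} ⊆ F.sets := by
      intro x h'
      apply F.sets_of_superset h  -- upward closure; leaves ⊢ ∅ ⊆ x
      tauto
    rw [h₀]
    exact Subset.antisymm h₁ h₂
```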
Explanation
Everything before `lemma` is there to enable what follows. The `import` lines import the modules `Mathlib` and `MIL.Common`, while the `open` line lets us refer to the names `Set`, `Filter`, and `Topology` directly, without having to qualify them with the module name. I got that explanation from Google’s Gemini, which is a very useful resource when getting familiar with this material; of course, you can also make use of other available LLMs, like OpenAI’s ChatGPT.
Lines 6 and 7:
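Reconstructed from the discussion, those two lines read roughly as follows (whether `X` was an implicit or explicit argument is a guess):

```lean
lemma bot_iff_contains_empty {X : Type*} (F : Filter X) :
    F = ⊥ ↔ ∅ ∈ F := by
```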
mean something like: `bot_iff_contains_empty` is a lemma that has to be applied to two objects. The first can be any type, but the second has to be of type `Filter X`, which means that it has to be a filter on whatever type `X` is. Moreover, the lemma claims that “a filter, F, is the bottom filter if and only if it contains the empty set.” Finally, the `:= by` introduces the proof of the lemma.
Then, in lines 8–12, we have:
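Reconstructed from the discussion, roughly:

```lean
  constructor
  case mp =>
    intro h
    rw [h]
    tauto
```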
The `constructor` line invokes a tactic that applies the appropriate constructor to the goal; for an “if and only if” goal, it breaks the proof down into the two directions, the “if” and the “only if” cases.
How do you know to do something like that? That, and much more, is covered earlier in Mathematics in Lean.
When learning this, you really have to work with an editor like Visual Studio Code or Emacs. I haven’t worked with Emacs in many years, and I never tried to use it for learning Lean. VS Code is a great environment for it, though, so I recommend using it.
If you’re using VS Code, and you have it set up properly for Lean 4, you should be able to view the Lean InfoView panel while working with your Lean 4 code. If you click on the `constructor` line in your code, the InfoView will show something like this:
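Reconstructed from the description, the goal state looked something like:

```
case mp
X : Type*
F : Filter X
⊢ F = ⊥ → ∅ ∈ F

case mpr
X : Type*
F : Filter X
⊢ ∅ ∈ F → F = ⊥
```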
This shows us that the `constructor` tactic automatically broke the if-and-only-if proof down into two cases: `case mp`, for “modus ponens” (the forward direction), and `case mpr`, for “modus ponens reverse” (the backward direction).
The next line, `case mp =>`, selects the case that the `constructor` tactic automatically labeled `mp`. Then the `intro h` line introduces the hypothesis that `F = ⊥` and gives it the label `h`, so we can refer to the hypothesis by name.
Lean 4 has other proof tactics besides `constructor`. The line `rw [h]` uses the powerful `rw` tactic, which rewrites the goal by substituting wherever there’s an applicable equality. Then the next line, `tauto`, closes the goal as a tautology.
It doesn’t appear that you can get rid of either of those lines. Under the hood, the `rw [h]` tactic causes Lean to replace the left side of the equality in `h : F = ⊥` with the right side in the goal. After this, the goal is `⊢ ∅ ∈ ⊥`, which (if you’re careful) you can see in the InfoView while hovering over line 11 above. Then the `tauto` tactic on the next line (12) allows Lean to automatically conclude that `∅ ∈ ⊥`, which must follow from some definition. This establishes `case mp`.
At this point, you might be thinking that if `case mp` followed from a simple rewrite and a tautology, the other direction, `case mpr`, would behave similarly. It might be possible, but I couldn’t figure it out.
For `case mpr`, you still use `intro h`, so that you can access the hypothesis `∅ ∈ F`. When you do that, and you then try to apply `rw [h]` like you did above, you get:
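an error along these lines (approximate wording of Lean’s standard `rw` failure):

```
tactic 'rewrite' failed, equality or iff proof expected
  ∅ ∈ F
```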
So, the InfoView is telling us that `rw` expects either an if-and-only-if proof or an equality, and `h : ∅ ∈ F` is neither. Fortunately, Mathlib has a lot of definitions and lemmas we can draw upon. Unfortunately, I don’t remember how I found `filter_eq_iff`. It might have been introduced earlier in the tutorial, but you can also search for results like that. If I recall correctly, I never got great at searching, and I would sometimes end up guessing names until something worked.
When you apply `rw [filter_eq_iff]`, you get
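something like the following goal state (reconstructed):

```
case mpr
X : Type*
F : Filter X
h : ∅ ∈ F
⊢ F.sets = ⊥.sets
```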
in the Lean InfoView. At this point, the goal has been rewritten, so that now we have to show that the sets of `F` are equal to the sets of `⊥`, the bottom filter. Then, the next line applies the result `Filter.bot_sets_eq` to rewrite the bottom filter’s sets, which lets Lean 4 use the fact that `⊥.sets = univ`, the universal set. Now when we check the InfoView, we see
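a goal state along these lines (reconstructed):

```
case mpr
X : Type*
F : Filter X
h : ∅ ∈ F
⊢ F.sets = univ
```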
So, the rest of the proof involves showing that `F.sets`, the collection of sets of `F`, is equal to the universal set.
I’m going to mention again that this is probably not how the Mathlib experts on Zulip would handle this, but it is probably representative of what you can expect to run into when you’re new and trying to prove things in a fairly naive way. When you do it that way, you might end up with something like this.
I showed that both `univ` and `F.sets` are equal to `{s : Set X | ∅ ⊆ s}`, which means “the set of all sets that contain the empty set as a subset.” Notice that you can create sub-goals! I used three of them:
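Reconstructed, the three sub-goals looked something like this (the tactic choices inside each are a guess):

```lean
    have h₀ : (univ : Set (Set X)) = {s : Set X | ∅ ⊆ s} := by
      ext x
      simp
    have h₁ : F.sets ⊆ {s : Set X | ∅ ⊆ s} := by
      intro s _
      exact empty_subset s
    have h₂ : {s : Set X | ∅ ⊆ s} ⊆ F.sets := by
      intro x h'
      apply F.sets_of_superset h
      tauto
```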
Proving those three sub-goals involved using two more tactics that I haven’t mentioned yet:
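Those were, presumably, `ext` and `apply`:

```lean
ext x                        -- turn a set equality into a membership iff
apply F.sets_of_superset h   -- use the filter's upward closure
```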
`ext x` is a tactic that allows us to reason about sets by reasoning about their elements. So, at line 18 above, we have
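the goal state, roughly:

```
x : Set X
⊢ x ∈ univ ↔ x ∈ {s | ∅ ⊆ s}
```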
in the Lean InfoView panel. Now we have to show that `univ` and `{s | ∅ ⊆ s}` have the same sets as elements. You can then see that `h₀` is easily proved by just invoking a few tactics; Lean 4 handles the rest.
Now, to see what’s happening in line 28, you have to check what the InfoView is saying at line 27, `intro h'`:
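which presumably showed something like:

```
h : ∅ ∈ F
x : Set X
h' : ∅ ⊆ x
⊢ x ∈ F.sets
```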
This is telling us, among other things, that `x` is a set and that it has the empty set as a subset. It’s also telling us, with the `⊢ x ∈ F.sets` line, that we need to show that `x` is a set of `F`.
`F.sets_of_superset` says what part 2 of the definition we gave for filters said: “filters are closed upward.” So, after using the `apply` tactic with it, by line 29 we only have to show `⊢ ∅ ⊆ x`. In the next line, we prove that result by applying the `tauto` tactic, because it’s a tautology in Lean 4 + Mathlib.
Putting it Together
You might notice that this doesn’t treat everything as a set, so it’s not how set theorists approach the foundations of math. It might also not seem natural to take a detour through the “universe of all sets” just to prove that something is the degenerate bottom filter. However, it works. You’ll also probably notice that it can be pretty cumbersome to prove all of the necessary steps. You can’t just handwave, but that’s exactly where the system’s real strength comes from.
Why We Must Formalize Math
Mathematicians are typically quite good at math, right? But, when you stop to think about it, aren’t programmers supposed to be good at programming, too? Have you ever noticed how difficult it is to write perfect code without compiling (if necessary) and testing? This is exactly the setup for a grueling programming interview, right? It’s done that way precisely because it’s hard to get everything right.
Now, realize that this is what mathematicians have pretty much always been doing. They work through proofs and try really hard to spot the flaws. However, they are not perfect. It can get really tough to spot the flaws in complicated proofs! Sometimes a mistake goes unnoticed for years, and a result that was thought to be a theorem either has to wait to be proven correctly later or ends up getting disproven.
Formalization lets us know that a computer has mechanically checked the proposed proof and found no flaws. That adds an extremely strong additional level of trust; it’s even more trust than a programmer would have upon compiling a program and finding that it passed every unit test.
Formalization will help with AI
Formalization will become essential as we continue to work increasingly with AI. AI has shown an impressive ability to provide plausible-sounding but incorrect answers to some problems. The flaws in the reasoning can be extremely difficult to spot. Formalizing will become essential to preserving the integrity of mathematics as it increases in complexity and abstraction.
Formalization will also offer AI researchers the only credible way to ensure AI safety in reasoning. Having AI explain itself, whenever possible, to a formalization system will allow it to learn math and logic on its own, and also allow it to explore concepts with an exactness that isn’t possible using methods like RLHF (reinforcement learning from human feedback). Very quickly, the human feedback will become unreliable in many situations.
Conclusion
Formalization, as shown by that simple filter lemma, can be a meticulous, even agonizing process. It demands patience and forces us to speak the language of logic with absolute precision. But this very difficulty is its power. By building these frameworks, we are not just correcting human error; we are building the only robust infrastructure capable of sustaining mathematical integrity in a world increasingly powered by opaque AI systems. The effort to turn coffee into theorems may now evolve into the effort to turn rigorous formal proofs into near absolute certainty, ensuring that the foundations of knowledge remain secure, no matter who (or what) is doing the reasoning.