A Casual Intro to Formalization

The early days of the theory of computation were characterized by researchers trying to figure out the limits of what was doable mathematically and logically. A mathematician, to quote Paul Erdős, was a “machine for turning coffee into theorems.”
The mathematician – or any human researcher – was abstracted away as a machine performing computations. Since those early days, however, computers have mostly been used to perform calculations and run programs. Today, even most mathematicians use computers mainly for mundane tasks like checking email and ordering food.
There have, however, been researchers who have built some tools for checking and even helping in the discovery of proofs. These tools include:
- Coq
- Isabelle/HOL
- Lean
I haven’t found working with any of them to be easy, but I also haven’t worked with any of them that much. On top of that, it’s been a couple of years since I tried learning Lean.
When I was trying to learn Lean, I mostly used the book Mathematics in Lean. On the whole, it’s excellent. You’ll definitely also want to take advantage of the Lean community on Zulip, which you can reach from the Lean project page. The great thing about the Zulip chat is that it supports writing in LaTeX!
I’m going to cover an example from Topology. It’s an incredibly easy result, but it still stumped me for a while! But, first, I’ve got to introduce a little math. It’s good, though, because it’s a mathematical topic that I wanted to get to anyway.
Definition: Filter
Let \( S \) be a non-empty set. A filter \( F \) on \( S \) is a subset of \(P(S)\), the power set of \(S\), such that
- \(\emptyset \notin F \)
- \( u \in F \wedge u \subseteq v \implies v \in F \)
- \( u, v \in F \implies u \cap v \in F \)
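To make the definition concrete, here’s a standard example (my own illustration, not taken from the text): the principal filter at a point \( x \in S \),

```latex
F_x = \{\, u \subseteq S \; : \; x \in u \,\}
```

It’s easy to check the three conditions: \( \emptyset \notin F_x \), since \( x \notin \emptyset \); if \( x \in u \) and \( u \subseteq v \), then \( x \in v \); and if \( x \in u \) and \( x \in v \), then \( x \in u \cap v \).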
I’m going to consider the above definition of filter to be the working definition for this site – it’s in line with the one the set theorist Thomas Jech gives in his book Set Theory. However, we’ll see that this is not how Mathlib handles things in Lean, so when we use that, we’ll have to make some adjustments.
Instead, we’ll see that Mathlib drops the first requirement. It then treats filters as elements of a lattice, where the filters are ordered by reverse set inclusion. The filter with the most sets – the entire power set \( P(S) \) – is thus the bottom element of the lattice, “\( \bot \)”.
This makes some things easier and other things harder. Apparently, for those who actually developed Mathlib, the tradeoff made a lot of sense on the whole.
One of the downsides you run into with this setup, though, is that you might have to prove that you’re not working with the bottom filter, because that is no longer ruled out by definition. One way to do that is to prove that the filter doesn’t contain the empty set, \( \emptyset \).
Back when I was working through Mathematics in Lean, I couldn’t find the following result anywhere, so I proved it myself. I could have, and probably should have, asked the Zulip community for help, but since the result was certainly true, I decided to just try to prove it myself. The proof I came up with is not what the Zulip community’s proofs usually look like. Their proofs are slick – almost always shorter than what I come up with – and they use functions in brilliant ways. That shows there is a way of thinking about Lean 4 and Mathlib that’s not how most mathematicians intuitively think. But it also shows that it’s a skill that can be learned.
Below is what I did. Afterward, I’ll explain what I think is going on.
An example lemma in Lean 4 using Mathlib
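Pieced together from the walkthrough that follows, the listing looked roughly like this. The overall shape matches the discussion below, but the exact tactics inside the sub-goals are a best guess and may differ from the original (the line numbers cited in the discussion refer to the original listing):

```lean
import Mathlib
import MIL.Common

open Set Filter Topology

lemma bot_iff_contains_empty {X : Type*} (F : Filter X) :
    F = ⊥ ↔ ∅ ∈ F := by
  constructor
  case mp =>
    intro h
    rw [h]      -- goal becomes ⊢ ∅ ∈ ⊥
    tauto
  case mpr =>
    intro h                   -- h : ∅ ∈ F
    rw [filter_eq_iff]        -- goal becomes ⊢ F.sets = ⊥.sets
    rw [Filter.bot_sets_eq]   -- goal becomes ⊢ F.sets = univ
    -- Compare both sides with the set of all sets containing ∅.
    have h₀ : (univ : Set (Set X)) = {s : Set X | ∅ ⊆ s} := by
      ext x
      simp
    have h₁ : F.sets ⊆ {s : Set X | ∅ ⊆ s} := by
      intro s _
      exact empty_subset s
    have h₂ : {s : Set X | ∅ ⊆ s} ⊆ F.sets := by
      intro x h'
      apply F.sets_of_superset h  -- upward closure; leaves ⊢ ∅ ⊆ x
      tauto
    rw [h₀]
    exact Subset.antisymm h₁ h₂
```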
Explanation
Everything before `lemma` is there to enable what follows. The `import` lines import the modules `Mathlib` and `MIL.Common`, while the `open` line lets us refer to the names `Set`, `Filter`, and `Topology` directly, without having to qualify them with the module name. I got that explanation from Google’s Gemini, which is a very useful resource when getting familiar with this material; of course, you can also make use of other available LLMs, like OpenAI’s ChatGPT.
Lines 6 and 7:
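Reconstructed from the discussion, those two lines read roughly as follows (whether `X` was an implicit or explicit argument is a guess):

```lean
lemma bot_iff_contains_empty {X : Type*} (F : Filter X) :
    F = ⊥ ↔ ∅ ∈ F := by
```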
mean something like: `bot_iff_contains_empty` is a lemma that has to be applied to two objects. The first can be any type, but the second has to be of type `Filter X`, which means that it has to be a filter on whatever type `X` is. Moreover, the lemma claims that “a filter, F, is the bottom filter if and only if it contains the empty set.” Finally, the `:= by` introduces the proof of the lemma.
Then, in lines 8–12, we have:
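Reconstructed from the discussion, roughly:

```lean
  constructor
  case mp =>
    intro h
    rw [h]
    tauto
```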
The `constructor` line invokes a tactic that applies the appropriate constructor to the goal; for an “if and only if” goal, it breaks the proof down into the two directions, the “if” and the “only if” cases.
How do you know to do something like that? That, and much more, is covered earlier in Mathematics in Lean.
When learning this, you really have to work with an editor like Visual Studio Code or Emacs. I haven’t worked with Emacs in many years, and I never tried to use it for learning Lean. VS Code is a great environment for it, though, so I recommend using it.
If you’re using VS Code, and you have it set up properly for Lean 4, you should be able to view the Lean InfoView panel while working with your Lean 4 code. If you click on the `constructor` line in your code, the InfoView will show something like this:
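Reconstructed from the description, the goal state looked something like:

```
case mp
X : Type*
F : Filter X
⊢ F = ⊥ → ∅ ∈ F

case mpr
X : Type*
F : Filter X
⊢ ∅ ∈ F → F = ⊥
```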
This shows us that the `constructor` tactic automatically broke the if-and-only-if proof down into two cases: `case mp`, for “modus ponens” (the forward direction), and `case mpr`, for “modus ponens reverse” (the backward direction).
The next line, `case mp =>`, selects the case that the `constructor` tactic automatically labeled `mp`. Then the `intro h` line introduces the hypothesis that `F = ⊥` and gives it the label `h`, so we can refer to the hypothesis by name.
Lean 4 has other proof tactics besides `constructor`. The line `rw [h]` uses the powerful `rw` tactic, which rewrites the goal by substituting wherever there’s an applicable equality. Then the next line, `tauto`, closes the goal as a tautology.
It doesn’t appear that you can get rid of either of those lines. Under the hood, the `rw [h]` tactic causes Lean to replace the left side of the equality in `h : F = ⊥` with the right side in the goal. After this, the goal is `⊢ ∅ ∈ ⊥`, which (if you’re careful) you can see in the InfoView while hovering over line 11 above. Then the `tauto` tactic on the next line (12) allows Lean to automatically conclude that `∅ ∈ ⊥`, which must follow from some definition. This establishes `case mp`.
At this point, you might be thinking that if `case mp` followed from a simple rewrite and a tautology, the other direction, `case mpr`, would behave similarly. It might be possible, but I couldn’t figure it out.
For `case mpr`, you still use `intro h`, so that you can access the hypothesis `∅ ∈ F`. When you do that, and you then try to apply `rw [h]` like you did above, you get:
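an error along these lines (approximate wording of Lean’s standard `rw` failure):

```
tactic 'rewrite' failed, equality or iff proof expected
  ∅ ∈ F
```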
So, the InfoView is telling us that `rw` expects either an if-and-only-if proof or an equality, and `h : ∅ ∈ F` is neither. Fortunately, Mathlib has a lot of definitions and lemmas we can draw upon. Unfortunately, I don’t remember how I found `filter_eq_iff`. It might have been introduced earlier in the tutorial, but you can also search for results like that. If I recall correctly, I never got great at searching, and I would sometimes end up guessing names until something worked.
When you apply `rw [filter_eq_iff]`, you get
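something like the following goal state (reconstructed):

```
case mpr
X : Type*
F : Filter X
h : ∅ ∈ F
⊢ F.sets = ⊥.sets
```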
in the Lean InfoView. At this point, the goal has been rewritten, so that now we have to show that the sets of `F` are equal to the sets of `⊥`, the bottom filter. Then, the next line applies the result `Filter.bot_sets_eq` to rewrite the bottom filter’s sets, which lets Lean 4 use the fact that `⊥.sets = univ`, the universal set. Now when we check the InfoView, we see
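a goal state along these lines (reconstructed):

```
case mpr
X : Type*
F : Filter X
h : ∅ ∈ F
⊢ F.sets = univ
```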
So, the rest of the proof involves showing that `F.sets`, the collection of sets of `F`, is equal to the universal set.
I’m going to mention again that this is probably not how the Mathlib experts on Zulip would handle this, but it is probably representative of what you can expect to run into when you’re new and trying to prove things in a fairly naive way. When you do it that way, you might end up with something like this.
I showed that both `univ` and `F.sets` are equal to `{s : Set X | ∅ ⊆ s}`, which means “the set of all sets that contain the empty set as a subset.” Notice that you can create sub-goals! I used three of them:
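Reconstructed, the three sub-goals looked something like this (the tactic choices inside each are a guess):

```lean
    have h₀ : (univ : Set (Set X)) = {s : Set X | ∅ ⊆ s} := by
      ext x
      simp
    have h₁ : F.sets ⊆ {s : Set X | ∅ ⊆ s} := by
      intro s _
      exact empty_subset s
    have h₂ : {s : Set X | ∅ ⊆ s} ⊆ F.sets := by
      intro x h'
      apply F.sets_of_superset h
      tauto
```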
Proving those three sub-goals involved using two more tactics that I haven’t mentioned yet:
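Those were, presumably, `ext` and `apply`:

```lean
ext x                        -- turn a set equality into a membership iff
apply F.sets_of_superset h   -- use the filter's upward closure
```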
`ext x` is a tactic that allows us to reason about sets by reasoning about their elements. So, at line 18 above, we have
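the goal state, roughly:

```
x : Set X
⊢ x ∈ univ ↔ x ∈ {s | ∅ ⊆ s}
```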
in the Lean InfoView panel. Now we have to show that `univ` and `{s | ∅ ⊆ s}` have the same sets as elements. You can then see that `h₀` is easily proved by just invoking a few tactics; Lean 4 handles the rest.
Now, to see what’s happening in line 28, you have to check what the InfoView is saying at line 27, `intro h'`:
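which presumably showed something like:

```
h : ∅ ∈ F
x : Set X
h' : ∅ ⊆ x
⊢ x ∈ F.sets
```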
This is telling us, among other things, that `x` is a set and that it has the empty set as a subset. It’s also telling us, with the `⊢ x ∈ F.sets` line, that we need to show that `x` is a set of `F`.
`F.sets_of_superset` says what part 2 of the definition we gave for filters said: “filters are closed upward.” So, after using the `apply` tactic with it, by line 29 we only have to show `⊢ ∅ ⊆ x`. In the next line, we prove that result by applying the `tauto` tactic, because it’s a tautology in Lean 4 + Mathlib.
Putting it Together
You might notice that this doesn’t treat everything as a set, so it’s not how set theorists approach the foundations of math. It might also not seem natural to take a detour through the “universe of all sets” just to prove that something is the degenerate bottom filter. However, it works. You’ll also probably notice that it can be pretty cumbersome to prove all of the necessary steps. You can’t just handwave, but that’s exactly where the system’s real strength comes from.
Why We Must Formalize Math
Mathematicians are typically quite good at math, right? But, when you stop to think about it, aren’t programmers supposed to be good at programming, too? Have you ever noticed how difficult it is to write perfect code without compiling (if necessary) and testing? This is exactly the setup for a grueling programming interview, right? It’s done that way precisely because it’s hard to get everything right.
Now, realize that this is what mathematicians have pretty much always been doing. They work through proofs and try really hard to spot the flaws. However, they are not perfect. It can get really tough to spot the flaws in complicated proofs! Sometimes a mistake goes unnoticed for years, and a result that was thought to be a theorem either has to wait to be proven correctly later or ends up getting disproven.
Formalization lets us know that a computer has mechanically checked the proposed proof and found no flaws. That adds an extremely strong additional level of trust; it’s even more trust than a programmer would have upon compiling a program and finding that it passed every unit test.
Formalization will help with AI
Formalization will become essential as we continue to work increasingly with AI. AI has shown an impressive ability to provide plausible-sounding but incorrect answers to some problems. The flaws in the reasoning can be extremely difficult to spot. Formalizing will become essential to preserving the integrity of mathematics as it increases in complexity and abstraction.
Formalization will also offer AI researchers the only credible way to ensure AI safety in reasoning. Having AI explain itself, whenever possible, to a formalization system will allow it to learn math and logic on its own, and also allow it to explore concepts with an exactness that isn’t possible using methods like RLHF (reinforcement learning from human feedback). Very quickly, the human feedback will become unreliable in many situations.
Conclusion
Formalization, as shown by that simple filter lemma, can be a meticulous, even agonizing process. It demands patience and forces us to speak the language of logic with absolute precision. But this very difficulty is its power. By building these frameworks, we are not just correcting human error; we are building the only robust infrastructure capable of sustaining mathematical integrity in a world increasingly powered by opaque AI systems. The effort to turn coffee into theorems may now evolve into the effort to turn rigorous formal proofs into near absolute certainty, ensuring that the foundations of knowledge remain secure, no matter who (or what) is doing the reasoning.