How Google DeepMind's AlphaGeometry Reached Math Olympiad Level Reasoning By Combining Creative LLMs With Deductive Symbolic Engines
How open-sourced Neurosymbolic AI (Neural Networks + Symbolic Systems) is revolutionizing automated theorem proving
DeepMind's AlphaGeometry [8] represents a significant leap forward in automated geometric reasoning, tackling olympiad-level geometry problems with performance rivaling human experts.
DeepMind has a track record of solving hard problems with the goal of matching and surpassing human performance.
AlphaGo and AlphaZero [10, 11]: Developing AI systems that mastered complex games like Go and surpassed human-level performance.
AlphaFold [9]: Creating an AI system that can accurately predict protein structures, which has significant implications for biological research and drug discovery.
In a previous post, we covered what drove the success behind AlphaFold.
Their neuro-symbolic system solved 25 out of 30 problems from a benchmark set of International Mathematical Olympiad (IMO) geometry questions, surpassing the previous state-of-the-art method's 10 problems and approaching the average IMO gold medalist's 25.9 problems.
AlphaGeometry does use a language model, but in close coordination with a symbolic engine that keeps the LLM grounded. On top of that, it achieves this feat without relying on human demonstrations.
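At a high level, the two components alternate: the symbolic engine deduces everything it can from the current premises, and when it gets stuck, the language model proposes an auxiliary construction to unblock it. Here is a minimal sketch of that loop; the function and method names are purely illustrative, not the actual AlphaGeometry API.

```python
# Minimal sketch of the neuro-symbolic loop (illustrative names only,
# not the actual AlphaGeometry codebase).

def solve(problem, goal, language_model, symbolic_engine, max_rounds=10):
    """Alternate between symbolic deduction and LM-proposed constructions."""
    premises = list(problem.premises)
    for _ in range(max_rounds):
        # 1. The symbolic engine exhaustively deduces everything it can.
        deduced = symbolic_engine.deductive_closure(premises)
        if goal in deduced:
            return symbolic_engine.extract_proof(goal)  # proved, done
        # 2. Otherwise, ask the language model for one auxiliary
        #    construction (e.g. "let M be the midpoint of AB") and
        #    add it to the premises before trying again.
        construction = language_model.propose_construction(premises, goal)
        if construction is None:
            break  # the LM has no further ideas
        premises.append(construction)
    return None  # no proof found within the budget
```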
Would a stand-alone LLM be able to do the same?
Not quite.
“When producing full natural-language proofs on IMO-AG-30, however, GPT-4 has a success rate of 0%, often making syntactic and semantic errors throughout its outputs, showing little understanding of geometry knowledge and of the problem statements itself.”
GPT-4 [2] fails with a success rate of 0%. On its own, a stand-alone LLM [1, 2, 3, 4, 5, 6] is nowhere near as good. This makes a strong case for agentic workflows and tool calling, and suggests that stand-alone LLMs might just be good chat agents unless they are given additional context (such as in Retrieval-Augmented Generation, which we saw in a previous article) and tooling.
What is more, they have made the code repository available on GitHub (see References section at the bottom for the link) [7]!
Today, we take a deep dive into what makes AlphaGeometry work:
What is the problem at hand?
An Overview of AlphaGeometry
Detailed Breakdown of Essential Building Blocks
Design Choices
Key Insights and Takeaways
How good is AlphaGeometry’s Performance? Quantitative + Qualitative
Would you like to read other related pieces?
Here you can read more about the Transformers series and LLMs series.
1. Scoping the Problem
Geometric theorem proving at the olympiad level poses unique difficulties. Traditional symbolic methods with hand-crafted heuristics have struggled to address these challenges effectively.
Wait, but what are even symbolic systems?
“Systems that are built with symbols, like natural language, programming languages, and formal logic; …” - Stanford’s Symbolic Systems [12]
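To make that concrete, here is a toy example of symbolic deduction: a single hand-written rule (transitivity of parallel lines) applied by forward chaining until no new facts appear. AlphaGeometry’s actual deduction engine is far richer, so treat this purely as an illustration of the idea.

```python
# Tiny illustration of symbolic deduction by forward chaining.
# Facts are tuples; the single rule encodes transitivity of parallel lines.
# This is a toy example, not AlphaGeometry's actual deduction engine.

facts = {
    ("parallel", "AB", "CD"),
    ("parallel", "CD", "EF"),
}

def apply_parallel_transitivity(known):
    """If l1 || l2 and l2 || l3, deduce l1 || l3."""
    new = set()
    parallels = [f for f in known if f[0] == "parallel"]
    for (_, a, b) in parallels:
        for (_, c, d) in parallels:
            if b == c and a != d:
                new.add(("parallel", a, d))
    return new - known

# Run to fixpoint: the deductive closure under this one rule.
while True:
    derived = apply_parallel_transitivity(facts)
    if not derived:
        break
    facts |= derived

print(("parallel", "AB", "EF") in facts)  # True: deduced symbolically
```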
Complexity of geometry & scarcity of training data
Geometric theorem proving at the Olympiad level has long been a formidable challenge in artificial intelligence. The complexity lies not just in the intricate nature of geometry itself, but also in the scarcity of training data.
Challenges in translating human proofs to machine-verifiable format
Unlike other mathematical domains, geometry faces unique translation challenges when converting human proofs into machine-verifiable format. This data bottleneck has caused geometry to lag behind in recent AI advancements that rely heavily on human demonstrations.
Infinite branching factor in geometric proofs
Traditional approaches to geometry theorem proving have primarily relied on symbolic methods and human-designed, hard-coded search heuristics. However, these methods often struggle with the infinite branching factor inherent in geometric proofs, particularly when it comes to auxiliary constructions - a crucial aspect of many complex geometric arguments.
An auxiliary construction refers to an additional line, segment, circle, or other geometric element that is introduced into a figure to help solve a problem, prove a theorem, or simplify a geometric proof. These constructions are not originally part of the given problem but are added to make the relationships between the elements clearer.
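To see why constructions make the search space explode, count how many candidate auxiliary elements exist even under a handful of simple templates, and note that every new point enlarges the pool for the next step. The templates below are assumptions chosen for illustration, not AlphaGeometry’s actual construction vocabulary.

```python
# Rough, illustrative count of candidate auxiliary constructions, assuming
# only three simple templates; the real space is far larger (and effectively
# infinite, since every new point can seed further constructions).
from itertools import combinations

points = ["A", "B", "C", "D"]  # points already present in the figure

candidates = []
for p, q in combinations(points, 2):
    candidates.append(f"midpoint of {p}{q}")           # one per pair
    candidates.append(f"circle with diameter {p}{q}")  # one per pair
for p, q, r in combinations(points, 3):
    candidates.append(f"circumcircle of {p}{q}{r}")    # one per triple

print(len(candidates))  # 16 candidates from just 4 points and 3 templates
```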
The key challenge, therefore, was to develop a system that could not only prove geometric theorems at a high level but do so without relying on human-provided examples or curated problem statements. Such a system would need to generate its own training data, learn to make auxiliary constructions, and combine neural and symbolic approaches effectively.
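We break these building blocks down in the sections below, but the core idea behind the self-generated training data described in the paper [8] can be sketched roughly as follows; the function and attribute names here are hypothetical placeholders, not the released code.

```python
# Heavily simplified sketch of synthetic training-data generation, in the
# spirit of the AlphaGeometry paper: sample random premises, run the
# symbolic engine to exhaustion, then trace back minimal proofs.
# All names below are placeholders for illustration only.

def generate_training_examples(sampler, symbolic_engine):
    # 1. Sample a random set of geometric premises (a random figure).
    premises = sampler.random_premises()

    # 2. Compute everything the symbolic engine can deduce from them.
    deduced_facts = symbolic_engine.deductive_closure(premises)

    examples = []
    for conclusion in deduced_facts:
        # 3. Trace back the minimal premises and steps needed for this fact.
        minimal_premises, proof = symbolic_engine.traceback(conclusion)

        # 4. Premises that the proof needs but the conclusion's own statement
        #    does not mention act as auxiliary constructions: exactly what
        #    the language model is later trained to propose.
        auxiliary = [p for p in minimal_premises
                     if p not in conclusion.statement_premises]

        examples.append({
            "problem": minimal_premises,
            "conclusion": conclusion,
            "auxiliary_constructions": auxiliary,
            "proof": proof,
        })
    return examples
```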