How Google DeepMind's AlphaGeometry Reached Math Olympiad Level Reasoning By Combining Creative LLMs With Deductive Symbolic Engines
How open-sourced Neurosymbolic AI (Neural Networks + Symbolic Systems) is revolutionizing automated theorem proving
DeepMind's AlphaGeometry [8] represents a significant leap forward in automated geometric reasoning, tackling Olympiad-level geometry problems with performance rivaling human experts.
DeepMind has a track record of solving hard problems, with the goal of matching and then surpassing human performance.
AlphaGo and AlphaZero [10, 11]: AI systems that mastered complex games like Go, surpassing human-level performance.
AlphaFold [9]: Creating an AI system that can accurately predict protein structures, which has significant implications for biological research and drug discovery.
In a previous post, we covered what drove AlphaFold's success.
Their neuro-symbolic system, AlphaGeometry, solved 25 out of 30 problems from a benchmark set of International Mathematical Olympiad (IMO) geometry questions, surpassing the previous state-of-the-art method's 10 problems and approaching the average IMO gold medalist's 25.9 problems.
AlphaGeometry does use a language model, but in close coordination with a symbolic system that keeps the LLM grounded. On top of this, it achieves the feat without relying on human demonstrations.
Would a stand-alone LLM be able to do the same?
Not quite.
“When producing full natural-language proofs on IMO-AG-30, however, GPT-4 has a success rate of 0%, often making syntactic and semantic errors throughout its outputs, showing little understanding of geometry knowledge and of the problem statements itself.”
GPT-4 [2] fails with a success rate of 0%. On its own, a stand-alone LLM [1, 2, 3, 4, 5, 6] is nowhere near as good. This makes a strong case for agentic workflows and tool calling, and it suggests that stand-alone LLMs might just be good chat agents unless given additional context (such as in Retrieval Augmented Generation, which we saw in a previous article) and tooling.
What is more, they have made the code repository available on GitHub (see the References section at the bottom for the link) [7]!
Today, we take a deep dive into what makes AlphaGeometry work:
What is the problem at hand?
An Overview of AlphaGeometry
Detailed Breakdown of Essential Building Blocks
Design Choices
Key Insights and Takeaways
How good is AlphaGeometry’s Performance? Quantitative + Qualitative
Would you like to read other related pieces?
Here you can read more about the Transformers series and LLMs series.
1. Scoping the Problem
Geometric theorem proving at the Olympiad level poses unique difficulties. Traditional symbolic methods with hand-crafted heuristics have struggled to address these challenges effectively.
Wait, but what are even symbolic systems?
“Systems that are built with symbols, like natural language, programming languages, and formal logic; …” - Stanford’s Symbolic Systems [12]
Complexity of geometry & scarcity of training data
Proving theorems at this level has long been a formidable challenge for artificial intelligence. The complexity lies not just in the intricate nature of geometry itself, but also in the scarcity of training data.
Challenges in translating human proofs to machine-verifiable format
Unlike other mathematical domains, geometry faces unique translation challenges when converting human proofs into machine-verifiable format. This data bottleneck has caused geometry to lag behind in recent AI advancements that rely heavily on human demonstrations.
Infinite branching factor in geometric proofs
Traditional approaches to geometry theorem proving have primarily relied on symbolic methods and human-designed, hard-coded search heuristics. However, these methods often struggle with the infinite branching factor inherent in geometric proofs, particularly when it comes to auxiliary constructions - a crucial aspect of many complex geometric arguments.
An auxiliary construction refers to an additional line, segment, circle, or other geometric element that is introduced into a figure to help solve a problem, prove a theorem, or simplify a geometric proof. These constructions are not originally part of the given problem but are added to make the relationships between the elements clearer.
The key challenge, therefore, was to develop a system that could not only prove geometric theorems at a high level but do so without relying on human-provided examples or curated problem statements. This system would need to generate its own training data, learn to make auxiliary constructions, and combine neural and symbolic approaches effectively.
2. The Solution
Enter AlphaGeometry [8], a neuro-symbolic system developed by DeepMind that has achieved remarkable success in geometric theorem proving. AlphaGeometry sidesteps the need for human demonstrations by synthesizing millions of theorems and proofs across different levels of complexity.
At its core, AlphaGeometry consists of two main components:
A neural language model: Trained from scratch on large-scale synthetic data.
A symbolic deduction engine: Performs logical reasoning and algebraic computations.
This combination allows AlphaGeometry to learn the art of theorem proving from scratch and to generate human-readable proofs, without any human demonstrations. The result? A system that can solve 25 out of 30 Olympiad-level geometry problems, outperforming previous methods and approaching the performance of International Mathematical Olympiad (IMO) gold medalists.
3. Building Blocks of the Solution
The key insight of AlphaGeometry lies in its approach to learning the creative aspects of theorem proving, particularly auxiliary constructions. In geometry proofs, these constructions often represent the "Aha!" moment that unlocks the solution.
Traditional systems struggled with this because it's an open-ended problem - there are infinitely many possible constructions one could make. AlphaGeometry tackles this by generating a vast number of random scenarios, solving them with brute-force symbolic methods, and then learning patterns from these solutions.
Think of it like a chess engine that has played millions of games against itself. It doesn't need to be taught specific openings or tactics; it learns them organically from the patterns in its self-play data. Similarly, AlphaGeometry learns to recognize situations where certain types of constructions are likely to be useful, not from human-designed rules, but from patterns in its synthetic data.
AlphaGeometry's architecture consists of several key components, sketched as code stubs after this list:
Random Premise Sampler: Generates diverse sets of geometric premises as starting points for theorem generation.
Symbolic Deduction Engine (DD+AR): Combines a deductive database (DD) with algebraic reasoning (AR) to efficiently derive logical conclusions from given premises.
Traceback Algorithm: Identifies the minimal set of premises and deductions necessary for each conclusion, which is crucial for extracting meaningful proofs.
Language Model: A transformer-based model trained on the synthetic data to guide the proof search process.
Proof Search Algorithm: Combines the language model's suggestions with the symbolic engine's deductions in an iterative process to find proofs.
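To make the dataflow concrete, here is a schematic sketch of those five components as Python stubs. All names and signatures are our own invention for illustration; the actual interfaces in the open-source repository [7] look different.

```python
# Schematic stubs for the five components above. Names and signatures are
# illustrative assumptions, not the actual interfaces of the repo [7].

Premises = list[str]                 # e.g. ["AB = AC", "M midpoint BC"]
Fact = str                           # a derived statement, e.g. "AM perp BC"
ProofStep = tuple[list[Fact], Fact]  # (facts used, fact derived)

def sample_premises() -> Premises: ...                      # 1. premise sampler
def deduce_closure(p: Premises) -> set[Fact]: ...           # 2. DD+AR engine
def traceback(goal: Fact, p: Premises) -> list[ProofStep]: ...  # 3. minimal proof
def lm_suggest(p: Premises, goal: Fact) -> list[str]: ...   # 4. language model
def proof_search(p: Premises, goal: Fact) -> list[ProofStep] | None: ...  # 5. search
```

The next sections flesh out each of these pieces in turn.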
4. What Makes AlphaGeometry Work?
Synthetic Data Generation
AlphaGeometry's approach to synthetic data generation is a key innovation. The system generates 100 million unique theorem-proof examples, including 9 million with auxiliary constructions. This process involves:
Sampling random theorem premises (nearly 1 billion initially).
Using a symbolic deduction engine to generate derivations.
Applying a traceback algorithm to extract minimal proofs and premises.
Identifying auxiliary constructions using the concept of dependency difference.
This synthetic data allows AlphaGeometry to learn complex patterns and strategies without relying on human-curated problems or demonstrations.
The process starts with a random premise sampler that constructs geometric scenarios using a set of predefined actions (e.g., creating points, lines, circles). These premises are then fed into the symbolic deduction engine (DD+AR), which applies geometric rules and algebraic reasoning to derive all possible conclusions.
The traceback algorithm then works backward from each conclusion to identify the minimal set of premises and deductions required. This process generates millions of theorem-proof pairs, each consisting of a set of premises, a conclusion, and a step-by-step proof.
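To make the mechanism concrete, here is a toy Python sketch of the deduce-then-traceback loop. The string "facts" and the two hard-coded rules are stand-ins for DD+AR's real geometric rule set; only the mechanism (forward-chain to a fixpoint while recording parents, then walk the parent pointers back) is meant to be faithful.

```python
# Toy version of the synthetic-data loop: deduce everything, then trace back.
# The rules and string facts are illustrative stand-ins for DD+AR.

# Each rule maps a set of required facts to one derived fact.
RULES = {
    frozenset({"AB = AC"}): "angle ABC = angle ACB",   # isosceles base angles
    frozenset({"angle ABC = angle ACB", "M midpoint BC"}): "AM perp BC",
}

def deduce_closure(premises):
    """Forward-chain rules to a fixpoint, recording each fact's parents."""
    parents = {p: None for p in premises}   # fact -> facts it was derived from
    changed = True
    while changed:
        changed = False
        for required, derived in RULES.items():
            if required.issubset(parents) and derived not in parents:
                parents[derived] = required
                changed = True
    return parents

def traceback(fact, parents):
    """Walk parent pointers back to the minimal premises and proof steps."""
    if parents[fact] is None:               # a given premise, nothing to prove
        return {fact}, []
    premises, steps = set(), []
    for parent in parents[fact]:
        sub_premises, sub_steps = traceback(parent, parents)
        premises |= sub_premises
        steps += sub_steps
    return premises, steps + [(sorted(parents[fact]), fact)]

parents = deduce_closure({"AB = AC", "M midpoint BC", "D on AB"})
minimal, proof = traceback("AM perp BC", parents)
print(minimal)  # {'AB = AC', 'M midpoint BC'}: the irrelevant "D on AB" is dropped
for used, derived in proof:
    print(used, "=>", derived)
```

In the real system, the deductive closure of one scenario contains a huge number of facts, and each of them can serve as the conclusion of its own synthetic theorem, which is how one random scenario yields many theorem-proof pairs.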
Language Model Training
A transformer-based language model is trained on these synthetic theorem-proof pairs. The model learns to predict the next step in a proof given the premises and previous steps. Crucially, it's trained to focus on generating auxiliary constructions - the creative leaps often needed in complex proofs.
You can read our dedicated series to learn more about Transformers and LLMs.
The transformer-based language model is trained on serialized versions of the synthetic data, effectively learning to generate proofs conditioned on theorem premises and conclusions. The model architecture includes:
12 layers
Embedding dimension of 1024
8 attention heads (see our previous post on what attention is)
An inter-attention dense layer of dimension 4,096 with ReLU activation
151 million parameters (excluding embedding layers)
The training process involves pretraining on all 100 million synthetic proofs, followed by fine-tuning on the subset of proofs requiring auxiliary constructions (9 million proofs).
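As a rough illustration of what a training example might look like, the sketch below serializes one theorem-proof pair into a single sequence and collects the reported hyperparameters in one place. The delimiters and field names are our assumptions; AlphaGeometry's actual tokenization differs.

```python
# Hedged sketch: turning a synthetic theorem-proof pair into one training
# sequence. The "?" and "{proof}" delimiters are invented for illustration.

MODEL_CONFIG = dict(   # the reported hyperparameters, gathered in one place
    num_layers=12, embed_dim=1024, num_heads=8, ffn_dim=4096,  # ReLU FFN
)   # ~151M parameters, excluding embeddings

def serialize(premises, conclusion, proof_steps):
    context = "; ".join(premises) + " ? " + conclusion  # what the model is given
    target = " ; ".join(proof_steps)                    # what it learns to emit
    return context + " {proof} " + target

print(serialize(
    ["AB = AC", "M midpoint BC"],
    "AM perp BC",
    ["angle ABC = angle ACB [isosceles]", "AM perp BC [congruent triangles]"],
))
```

Fine-tuning on the auxiliary-construction subset concentrates the model's capacity on exactly the creative step where symbolic engines struggle.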
Proof Search
AlphaGeometry's proof search is an iterative process alternating between the language model and the symbolic engine:
The language model suggests potential next steps, focusing on auxiliary constructions.
The symbolic engine (DD+AR) applies these suggestions and derives all possible conclusions.
This process continues until either the desired conclusion is reached or a maximum number of iterations is hit. A beam search of size 512 explores multiple promising paths simultaneously; a minimal sketch of this loop follows.
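In the sketch below, the language model and symbolic engine are passed in as plain callables (`lm_propose`, `dd_ar_closure`, `extract_proof`) because their real implementations are the components described earlier; these names and signatures are assumptions, not the repository's actual API.

```python
# Minimal sketch of AlphaGeometry's search loop. The three callables stand
# in for the language model, the DD+AR engine, and the traceback step.

import heapq

def proof_search(premises, goal, lm_propose, dd_ar_closure, extract_proof,
                 beam_size=512, max_depth=16):
    beam = [(0.0, list(premises))]                 # (cumulative log-prob, state)
    for _ in range(max_depth):
        candidates = []
        for score, state in beam:
            closure = dd_ar_closure(state)         # engine deduces all it can
            if goal in closure:
                return extract_proof(goal, state)  # traceback: minimal proof
            for construction, logp in lm_propose(state, goal):
                candidates.append((score + logp, state + [construction]))
        # keep only the `beam_size` highest-scoring states for the next round
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
        if not beam:
            break
    return None                                    # no proof within the budget
```

The defaults mirror the numbers reported in the paper: a beam of 512 and a maximum search depth of 16.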
5. Design Choices
Several crucial design choices contribute to AlphaGeometry's success. Each of these choices addresses a specific challenge in automated theorem proving, contributing to a system that can tackle Olympiad-level problems.
Synthetic Data Generation
The system generates a diverse set of theorems and proofs, covering a wide range of complexity levels. This approach addresses the data scarcity issue and allows the model to learn patterns that might be rare or overlooked in human-designed problems.
By generating theorems and proofs from scratch, AlphaGeometry avoids biases present in human-curated datasets and explores a much broader space of geometric reasoning.
Neuro-Symbolic Synergy
The combination of a neural language model for creative suggestions and a symbolic engine for rigorous deductions balances exploratory power with logical soundness:
Language Model: Provides flexibility and learns complex patterns
Symbolic Engine: Ensures logical consistency and efficient deduction
Agentic Iterative Proof Search
The back-and-forth between the language model's suggestions and the symbolic engine's verifications allows for a flexible, adaptive proof search. This interaction is a form of agentic workflow that iterates until a satisfactory solution is found.
Dependency Difference
This concept enables the system to identify the "gap" between given premises and the conclusion, facilitating the learning of necessary auxiliary constructions.
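A toy rendering of the idea: take the premises the traceback says the proof actually depends on, and keep those that mention an object absent from the conclusion. Whatever survives had to be constructed, and becomes the auxiliary construction the model is trained to output. The helper names and string encoding below are illustrative assumptions.

```python
# Toy "dependency difference": premises the proof depends on whose objects
# never appear in the conclusion must be auxiliary constructions.

def dependency_difference(proof_premises, conclusion_objects, objects_of):
    """Premises that mention at least one object outside the conclusion."""
    return {p for p in proof_premises
            if not objects_of[p] <= conclusion_objects}

# Conclusion "angle ABC = angle ACB" mentions only A, B, C, but the sampled
# proof went through the midpoint M (congruent triangles ABM and ACM):
aux = dependency_difference(
    proof_premises={"AB = AC", "M midpoint BC"},
    conclusion_objects={"A", "B", "C"},
    objects_of={"AB = AC": {"A", "B", "C"}, "M midpoint BC": {"M", "B", "C"}},
)
print(aux)  # {'M midpoint BC'}: "construct M, the midpoint of BC" is auxiliary
```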
Beam Search for Proof Exploration
The beam search algorithm allows AlphaGeometry to explore multiple promising paths simultaneously, addressing the combinatorial explosion of possible proof steps in complex problems.
Improved Interpretability
The traceback algorithm ensures that only the essential steps are included in proofs, improving both efficiency and interpretability.
6. How good is AlphaGeometry?
Quantitative
AlphaGeometry's performance is impressive. On the IMO-AG-30 benchmark (30 Olympiad-level geometry problems), it significantly outperforms existing methods:
AlphaGeometry: 25/30 problems solved.
Previous state-of-the-art (Wu's method): 10/30 problems solved.
Strongest baseline (DD + AR + human-designed heuristics): 18/30 problems solved.
GPT-4: 0/30 problems solved.
Notably, AlphaGeometry solved both geometry problems set in each of the 2000 and 2015 IMOs, a threshold widely considered difficult for the average human contestant.
Qualitative
Beyond raw performance, AlphaGeometry demonstrates several important capabilities:
Produces human-readable proofs.
Solves all geometry problems in IMO 2000 and 2015 under human expert evaluation.
Discovers a generalized version of a translated IMO theorem from 2004.
These results suggest that AlphaGeometry is not just memorizing solutions but has developed a genuine understanding of geometric reasoning principles.
7. Key Insights From Ablation Experiments
Data efficiency: Using only 20% of the training data, AlphaGeometry still solves 21 problems, outperforming all baselines.
Search efficiency: With a beam size of 8 (64x reduction), AlphaGeometry solves 21 problems.
Depth efficiency: Reducing search depth from 16 to 2 still results in 21 solved problems.
These results demonstrate the robustness of AlphaGeometry's approach and the effectiveness of its synthetic data.
8. Comparison with Human Proofs
While AlphaGeometry's performance is impressive, analysis of its proofs reveals some limitations:
Verbosity: AlphaGeometry proofs often contain many low-level steps, making them less readable than concise human proofs.
Limited high-level reasoning: The system lacks access to advanced theorems and concepts that human experts use to simplify proofs.
Coordinate system limitations: AlphaGeometry cannot leverage clever coordinate choices that often simplify human solutions.
These observations suggest potential areas for improvement, such as incorporating higher-level geometric concepts and theorems into the synthetic data generation and symbolic engine.
9. What’s Next? + Outro
While AlphaGeometry represents a significant advance, it's not without limitations:
The current implementation is limited to Euclidean plane geometry, excluding topics like geometric inequalities or combinatorial geometry.
The proofs generated, while correct, can sometimes be more verbose and less elegant than human-created proofs.
There are still 5 problems in the IMO-AG-30 benchmark that AlphaGeometry cannot solve.
AlphaGeometry represents a significant advancement in automated geometric reasoning, approaching human expert-level performance on Olympiad-level problems. Its success in learning complex geometric strategies from purely synthetic data opens new avenues for AI research in mathematics and beyond.
By combining the pattern recognition capabilities of neural networks with the logical rigor of symbolic systems, AlphaGeometry demonstrates a promising approach to developing AI systems capable of sophisticated mathematical reasoning. As research in this area progresses, we may see AI systems that not only solve existing problems but also contribute to the discovery of new mathematical knowledge.
The open-sourcing [7] of AlphaGeometry's code and model by DeepMind invites the broader research community to build upon this work, potentially leading to further breakthroughs in AI reasoning across various domains of mathematics and science.
See you in the next edition of the Code Compass.
Read more on the Transformers series, LLMs series, or the ML behind Netflix’s content creation.
References
[1] Attention Is All You Need: https://arxiv.org/abs/1706.03762
[2] GPT-4 Technical Report: https://arxiv.org/abs/2303.08774
[3] Gemini: A Family of Highly Capable Multimodal Models: https://arxiv.org/abs/2312.11805
[4] Gemini 1.5: https://arxiv.org/abs/2403.05530
[5] Claude 3: https://www.anthropic.com/news/claude-3-family
[6] LLaMA: https://arxiv.org/abs/2302.13971
[7] AlphaGeometry open-source repo: https://github.com/google-deepmind/alphageometry
[8] AlphaGeometry: Solving Olympiad geometry without human demonstrations: https://www.nature.com/articles/s41586-023-06747-5
[9] AlphaFold: https://www.nature.com/articles/s41586-021-03819-2
[10] AlphaGo: https://deepmind.google/technologies/alphago/
[11] AlphaZero: https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go/
[12] Stanford Symbolic Systems: https://symsys.stanford.edu/about/whats-name