Ethereum Verification Benchmark

Deductive reasoning.

We evaluate frontier models on fresh problems derived from production Ethereum contracts. Each task pairs a Verity contract model with a fixed formal specification. A model succeeds only if it produces a proof accepted by Lean.

Leaderboard

#ModelPass rate
1OpenAIgpt-5.5
48.9%
2Z.aiglm-5.2
39.3%
3Anthropicopus-4.8
33.3%
4xAIgrok-build-0.1
30.4%
5MiniMaxminimax-m3
28.1%
6Kimikimi-k2.7
23.0%
7DeepSeekv4-flash
22.2%
8DeepSeekv4-pro*
22.2%
9xAIgrok-4.3
18.5%
10Xiaomimimo-v2.5
9.6%

Pass rate is the share of proofs that compiled against the fixed specification. Total is the number of tasks, out of 135, for which the model returned a parseable proof. An asterisk* marks models evaluated on only part of the 135 tasks.

Scaling Law

We re-ran MiniMax M3 on 50 tasks solved by the frontier model with steadily larger effort budgets. Its solve rate climbs from 12% to 96%, approaching a near-perfect rate as effort grows.

010203040507.9M15.9M23.8M31.7M39.7Msolved taskscumulative total tokens
Estimation: S(E) = 1 / (1 + (4.6M / E)1.19)

About the benchmark

The benchmark was developed with support from the Ethereum Foundation as part of our work to make formal verification of smart contracts more practical. Instead of asking a model to explain whether a contract is safe, Verity asks it to produce a mathematical proof that a specific formal property follows from a specific contract model and specification.

It contains 135 proof tasks derived from 11 Ethereum contracts and built with input from protocol teams. Each task includes a reference proof and a frozen grading setup. The model may attempt the proof, but it may not change the theorem statement or specification.

A proof is considered correct only if it is accepted by the Lean 4 kernel, with no additional assumptions.

Methodology & datasetHow cases are built and gradedView source on GitHubTasks, specs & grading harness

Every task, specification, and grading harness is open source. Results are fully reproducible.

In collaboration with

Ethereum Foundation grantee
SafeLidoMorphoIPORTerm FinanceGnosis GuildRootstock1deltaUsualPikuPolaris FinanceLagoonForgeYieldsBalancerAgglayerAlchemixReserveCorkTermMaxWildcatMidasZamaSonicNexus Mutual