The Riemann Hypothesis is one of the most famous unsolved problems in mathematics: a statement about the location of the non-trivial zeros of the Riemann zeta function that has resisted every attack for 167 years. The Clay Mathematics Institute offers $1,000,000 for its resolution. Hundreds of world-class mathematicians have contributed partial results (bounds, special cases, conditional theorems), but none has closed the gap.
What the Profiled autonomous discovery system produced after seven evolutionary runs and 280+ hypotheses is not a proof. It is something more modest and, we argue, more honest: strong computational evidence that a specific, novel research direction (synthesising arithmetic sites, motivic cohomology, tropical geometry, GUE statistics, and Vinogradov-Korobov bounds) scores 97.2% on an eleven-test verification battery. That score deserves both careful examination and appropriate humility about what it does and does not mean.
"STRONG COMPUTATIONAL EVIDENCE β not a proof. These words are load-bearing. They are not modesty for its own sake. They are a precise description of what the system actually produced."
The Evolutionary Run Progression
The system uses a seeded evolutionary algorithm: each run starts from the best hypothesis of the previous run, applies mutation and crossover operators, and evaluates fitness against the test battery. The population grows when the system detects that the search is productive (Runs 5–7 use a population of 8 rather than 5). Here is the complete seven-run history:
| Run | Seed Source | Generations | Population | Best Score | Key Breakthrough | Time |
|---|---|---|---|---|---|---|
| 1 | Random init | 20 | 5 | 69.3% | First coherent arithmetic-site hypothesis | ~300s |
| 2 | Run 1 best | 20 | 5 | 77.2% | Added Selberg trace formula | ~400s |
| 3 | Run 2 best | 20 | 5 | 82.9% | Introduced subconvexity bounds | ~450s |
| 4 | Run 3 best | 20 | 5 | 87.6% | GUE + tropical geometry synthesis | ~500s |
| 5 | Run 4 best | 20 | 8 | 96.6% | Full spectral decomposition + Hecke eigenvalues | ~800s |
| 6 | Run 5 best | 20 | 8 | 96.6% | Plateau: mutation exhausted | ~600s |
| 7 | Run 5 best (reseeded) | 20 | 8 | 97.2% | Refined Vinogradov-Korobov extension | ~450s |
Total: 280+ hypotheses evaluated, approximately 3,500 seconds of total evolution time. The jump from Run 4 to Run 5 (87.6% → 96.6%) is the most significant single-run breakthrough: the synthesis of spectral decomposition with Hecke eigenvalues appears to be the load-bearing structural insight. The plateau in Run 6 is informative, because it shows the mutation operators were exhausted on the Run 5 seed. Reseeding with a fresh random component in Run 7 provided just enough diversity to refine the Vinogradov-Korobov extension and reach 97.2%.
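The run table's arithmetic can be checked directly. The small sketch below uses only the scores and times copied from the table. It confirms that the per-run times sum to about 3,500 seconds and that the Run 4 → Run 5 step is the largest gain over the best earlier score.

```python
# Best score (%) and approximate time (s) per run, copied from the run table
runs = {
    1: (69.3, 300), 2: (77.2, 400), 3: (82.9, 450), 4: (87.6, 500),
    5: (96.6, 800), 6: (96.6, 600), 7: (97.2, 450),
}

total_seconds = sum(seconds for _, seconds in runs.values())

# Gain of each run over the best score reached by any earlier run
gains = {}
best_so_far = runs[1][0]
for run_id in range(2, 8):
    score = runs[run_id][0]
    gains[run_id] = round(score - best_so_far, 1)
    best_so_far = max(best_so_far, score)

largest_jump = max(gains, key=gains.get)
print(total_seconds, gains, largest_jump)
```

Run 6 shows a gain of zero, which is the plateau described above.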
The 11-Test Battery: Complete Results
The evaluation battery has two distinct classes of tests. Tests 1–5 are numerical/statistical: they run actual computations against known mathematical facts about the Riemann zeta function. Tests 6–11 are depth/quality tests: they analyse the hypothesis text for mathematical specificity, novelty, logical coherence, and falsifiability. This distinction is critical, and we return to it below.
| # | Test | Score | Time | Type |
|---|---|---|---|---|
| 1 | Multi-Height Zero Verification | 100.0% | 3.7s | numerical |
| 2 | GUE Pair Correlation Statistics | 90.1% | 174.1s | statistical |
| 3 | Prime Counting at Scale (10⁶) | 98.7% | 0.1s | numerical |
| 4 | Mertens Function Bound | 100.0% | 0.4s | numerical |
| 5 | Robin's Inequality Verification | 80.0% | 1.4s | numerical |
| 6 | Technique Specificity | 100.0% | n/a | depth |
| 7 | Quantitative Bounds | 100.0% | n/a | specificity |
| 8 | Approach Novelty | 100.0% | n/a | innovation |
| 9 | Proof Strategy Coherence | 100.0% | n/a | logical |
| 10 | Mathematical Object Specificity | 100.0% | n/a | precision |
| 11 | Falsifiability & Testability | 100.0% | n/a | rigor |
| | Overall | 97.2% | n/a | |
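The architecture section below describes the composite as an equal-weight average. Assuming that weighting, the overall score is a plain mean of the eleven rows. This sketch recomputes it, and separately computes the means of the numerical Tests 1–5 and the text-based Tests 6–11.

```python
# Per-test scores (%) from the battery table
numerical = [100.0, 90.1, 98.7, 100.0, 80.0]   # Tests 1-5
text_quality = [100.0] * 6                     # Tests 6-11


def mean(values):
    return sum(values) / len(values)


overall = mean(numerical + text_quality)        # all 11 tests weighted equally
print(f"overall:    {overall:.2f}%")
print(f"tests 1-5:  {mean(numerical):.2f}%")
print(f"tests 6-11: {mean(text_quality):.2f}%")
```

With equal weights, this reproduces the reported 97.2% (97.16 before rounding). Tests 1–5 alone average about 93.8%, while the six text-analysis tests contribute a flat 100%.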
Test-by-Test: What the Numbers Actually Mean
Test 1: Multi-Height Zero Verification (100.0%, 3.7s)
This test numerically checks that zeros of the Riemann zeta function lie on the critical line Re(s) = 0.5. Five indices were sampled: n = 1, 100, 1,000, 10,000, and 50,000. All five returned Re(s) = 0.5 to within 10⁻¹⁰. One caveat applies: mpmath's zetazero is documented as computing zeros on the critical line and returns them in the form 1/2 + it, so the real part is 0.5 by construction. The deviation check therefore confirms that the pipeline is working, not that any zero was independently located on the line. This is not a new mathematical discovery, since Platt and Trudgian (2021) have already verified the first 10¹³ zeros. We reproduce five as a sanity check, so a perfect score is expected.
Platt and Trudgian's 2021 computation verified that the first 10¹³ non-trivial zeros of the Riemann zeta function all lie on the critical line. Our five-point sample is purely a system health check, not an independent mathematical contribution.
```python
import mpmath

mpmath.mp.dps = 30  # 30 significant digits of working precision

# Zero indices sampled at increasing heights
indices = [1, 100, 1000, 10000, 50000]
results = []
for n in indices:
    zero = mpmath.zetazero(n)          # nth non-trivial zero, returned as 1/2 + it
    deviation = abs(zero.real - 0.5)   # distance from the critical line
    results.append({
        'index': n,
        'height': float(zero.imag),
        'deviation': float(deviation),
        'on_critical_line': deviation < 1e-10,
    })

# Results (all on critical line):
# n=1:     height=14.134725,    deviation=0.0, on_critical_line=True
# n=100:   height=236.524230,   deviation=0.0, on_critical_line=True
# n=1000:  height=1419.422481,  deviation=0.0, on_critical_line=True
# n=10000: height=9877.782654,  deviation=0.0, on_critical_line=True
# n=50000: height=40433.687385, deviation=0.0, on_critical_line=True
```
Test 2: GUE Pair Correlation Statistics (90.1%, 174.1s)
This is the most computationally expensive test (174 seconds) and one of three tests that scored below 100%. It examines whether the spacing between consecutive zeros follows the Gaussian Unitary Ensemble (GUE) statistics of random matrix theory. Montgomery conjectured this connection in 1973 and Odlyzko confirmed it numerically in 1987, and it is one of the deepest structural patterns known about the Riemann zeros.
The system analysed 500 consecutive zeros starting at index 1,000:
- Normalised mean spacing: 1.0011 (expected 1.0), a deviation of 0.11%, within the acceptable range
- Variance: 0.1516, well below the test's reference value of ~0.283. For comparison, the Wigner-surmise approximation to GUE nearest-neighbour spacings gives a variance of 3π/8 − 1 ≈ 0.178, so the reference value itself may need checking
- Level repulsion: 8.62% (GUE predicts ~11%), below prediction but in the right regime
The 90.1% score reflects partial agreement with GUE statistics. The variance gap (0.1516 vs 0.283) is the primary source of the 9.9% shortfall. This could indicate sampling effects (only 500 zeros), the specific index range chosen (zeros 1,000 to 1,499 lie at low height and may not be in the asymptotic regime), a mis-specified reference value, or genuine deviation. The score is strong but not perfect, which is what honest scientific testing should produce.
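As a reference point for the variance figure, consider the standard Wigner surmise for GUE (Dyson index 2) nearest-neighbour spacings. It is p(s) = (32/π²) s² exp(−4s²/π), normalised to mean spacing 1. The sketch below checks its normalisation, mean and variance numerically with composite Simpson's rule, using the standard library only. It is a sanity check on the reference distribution, not a reconstruction of how Test 2 computes its score.

```python
import math


def wigner_gue(s: float) -> float:
    """Wigner surmise for GUE nearest-neighbour spacings (Dyson index 2), mean spacing 1."""
    return (32 / math.pi**2) * s**2 * math.exp(-4 * s**2 / math.pi)


def moment(k: int, upper: float = 6.0, steps: int = 60_000) -> float:
    """k-th moment of the surmise on [0, upper] by composite Simpson's rule (steps must be even)."""
    def integrand(s: float) -> float:
        return wigner_gue(s) * s**k

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


norm = moment(0)
mean_spacing = moment(1)
variance = moment(2) - mean_spacing**2
print(f"norm={norm:.6f} mean={mean_spacing:.6f} variance={variance:.6f}")
print(f"closed form 3*pi/8 - 1 = {3 * math.pi / 8 - 1:.6f}")
```

The density's tail beyond s = 6 is of order exp(−46), so truncating the integral there is harmless.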
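Test 3: Prime Counting at Scale (98.7%, 0.1s)
The Riemann Hypothesis implies a tight bound on the error in the prime counting function π(x): it is equivalent to |π(x) − li(x)| = O(√x log x). For x = 10⁶, the test reports an observed error of 182.3. The RH-scale bound at this point is √x · ln x = 1,000 × ln(10⁶) ≈ 13,815.5.
The error ratio is therefore 182.3 / 13,815.5 ≈ 1.3% of this bound. In other words, the actual error is about 98.7% smaller than what RH permits at this scale, which is strong consistency. The 98.7% score appears to be simply 1 minus this ratio.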
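The text does not say which main term Test 3 subtracts from π(x). A common choice is the logarithmic integral li(x). Schoenfeld (1976) showed that, assuming RH, |π(x) − li(x)| < √x ln x / (8π) for x ≥ 2657.

The stdlib-only sketch below counts primes with a sieve of Eratosthenes and evaluates li(x) from its standard series. At x = 10⁶ it gives an error of roughly 130 against li(x), so the 182.3 figure above presumably comes from a different main term.

```python
import math

EULER_GAMMA = 0.5772156649015329


def prime_count(limit: int) -> int:
    """pi(limit): number of primes <= limit, via a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"                      # 0 and 1 are not prime
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Cross off p*p, p*p + p, ... in one slice assignment
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return sum(sieve)


def li(x: float, terms: int = 200) -> float:
    """Logarithmic integral: gamma + ln ln x + sum_{k>=1} (ln x)^k / (k * k!)."""
    log_x = math.log(x)
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= log_x / k                         # term is now (ln x)^k / k!
        total += term / k
    return EULER_GAMMA + math.log(log_x) + total


x = 10**6
pi_x = prime_count(x)
error = abs(pi_x - li(x))
rh_scale = math.sqrt(x) * math.log(x)             # sqrt(x) ln x, with no constant
schoenfeld = rh_scale / (8 * math.pi)             # explicit bound under RH, valid for x >= 2657
print(pi_x, round(li(x), 2), round(error, 2), round(rh_scale, 1), round(schoenfeld, 1))
```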
Test 4: Mertens Function Bound (100.0%, 0.4s)
The Mertens function is M(x) = Σ_{n≤x} μ(n), where μ is the Möbius function. RH is equivalent to the claim that M(x) = O(x^{1/2+ε}) for every ε > 0. At x = 10⁶, M(10⁶) = 212 and √(10⁶) = 1,000, so |M(x)|/√x = 0.212. The Mertens function is well within the growth rate RH predicts. The perfect score is what known computational results would lead us to expect.
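Here is a minimal sketch of the computation behind Test 4. It assumes a straightforward Möbius sieve, since the test's own code is not shown. The sketch builds μ(n) for every n ≤ 10⁶, sums the values to get M(10⁶), and compares |M(x)| with √x.

```python
import math


def mertens(limit: int) -> int:
    """M(limit) = sum of mu(n) for 1 <= n <= limit, using a Mobius sieve."""
    mu = [1] * (limit + 1)
    marked = bytearray(limit + 1)                 # 1 once n is a multiple of a processed prime
    for p in range(2, limit + 1):
        if marked[p]:
            continue
        # p is prime: every multiple of p gains one more prime factor, flipping the sign
        for m in range(p, limit + 1, p):
            marked[m] = 1
            mu[m] = -mu[m]
        # multiples of p^2 are not squarefree, so mu is 0 there
        for m in range(p * p, limit + 1, p * p):
            mu[m] = 0
    return sum(mu[1:])


x = 10**6
m_x = mertens(x)
print(m_x, abs(m_x) / math.sqrt(x))
```

A pure-Python sieve like this takes a few seconds at 10⁶. That is fine for a check, but slower than the 0.4s the test reports, so the production test is presumably implemented differently.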
Test 5: Robin's Inequality Verification (80.0%, 1.4s)
Robin (1984) proved that RH is equivalent to the inequality σ(n)/n < e^γ · ln(ln(n)) holding for all n > 5040. Here σ is the sum-of-divisors function and γ is the Euler-Mascheroni constant. The worst ratio σ(n)/(e^γ · n · ln(ln(n))) found in the tested range was 0.9858, at n = 10,080. That is within the Robin bound, but close enough to the boundary that the test scored 80.0% rather than 100%. This is the numerically weakest test result, and it correctly identifies n = 10,080 as the highly composite number in the tested range closest to the Robin boundary.
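A similar sketch works for the Robin check, assuming a simple divisor-sum sieve (not necessarily the test's implementation). It computes σ(n) for every n up to a limit and reports the n > 5040 with the largest ratio σ(n)/(e^γ n ln ln n). By Robin's criterion, RH holds if and only if this ratio stays below 1 for all n > 5040. A pure-Python sieve is slow at 10⁶, so the example uses a smaller limit that already contains n = 10,080.

```python
import math

EULER_GAMMA = 0.5772156649015329


def sigma_table(limit: int) -> list:
    """sigma[n] = sum of the divisors of n, for 0 <= n <= limit (divisor sieve)."""
    sigma = [0] * (limit + 1)
    for d in range(1, limit + 1):
        for m in range(d, limit + 1, d):
            sigma[m] += d
    return sigma


def worst_robin_ratio(limit: int):
    """Return (n, ratio) maximising sigma(n) / (e^gamma * n * ln ln n) over 5040 < n <= limit."""
    sigma = sigma_table(limit)
    e_gamma = math.exp(EULER_GAMMA)
    best_n, best_ratio = None, 0.0
    for n in range(5041, limit + 1):
        ratio = sigma[n] / (e_gamma * n * math.log(math.log(n)))
        if ratio > best_ratio:
            best_n, best_ratio = n, ratio
    return best_n, best_ratio


n, ratio = worst_robin_ratio(20_000)
print(n, round(ratio, 4))
```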
Tests 6–11: Depth and Quality Analysis
These six tests scored 100.0% across the board. They use NLP-based analysis to evaluate the hypothesis text on five dimensions: mathematical specificity, novelty of technique combinations, logical coherence of the proof strategy, precision of mathematical object definitions, and the presence of falsifiable, testable predictions.
Tests 6–11 analyse hypothesis TEXT, not mathematical truth. A hypothesis can score 100% on these tests while being fundamentally incorrect, as long as it sounds like excellent mathematics (specific objects, quantitative bounds, coherent strategy, falsifiable claims). These tests are fitness gradients for evolution, calibrated to push hypotheses toward genuine mathematical substance. They are not truth verification; only an expert analytic number theorist reviewing the mathematical content can provide that.
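To make this caveat concrete, here is a deliberately naive, hypothetical text scorer. It is not the system's Tests 6–11, which the architecture section describes as LLM-based, and its vocabulary list is invented for illustration. Because it rewards vocabulary alone, reversing the word order of a hypothesis, which destroys the argument, leaves its score unchanged.

```python
import re

# Hypothetical vocabulary, for illustration only; the real tests are not shown in the text
VOCABULARY = {
    "arithmetic", "motivic", "tropical", "spectral", "subconvexity",
    "selberg", "gue", "hecke", "vinogradov-korobov", "falsifiable",
}


def keyword_score(text: str) -> float:
    """Fraction of the vocabulary that appears anywhere in the text."""
    words = set(re.findall(r"[a-z][a-z-]*", text.lower()))
    return len(words & VOCABULARY) / len(VOCABULARY)


hypothesis = ("A spectral decomposition on the arithmetic site yields a subconvexity "
              "bound via Selberg and GUE statistics, extending Vinogradov-Korobov.")
scrambled = " ".join(reversed(hypothesis.split()))   # same words, no argument left

print(keyword_score(hypothesis), keyword_score(scrambled))
```

An LLM-based judge is far less crude than this, but the same question applies in weaker form: how much of the score tracks surface features rather than mathematical content?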
The Winning Hypothesis (Verbatim)
The hypothesis that achieved 97.2% is reproduced in full below, exactly as generated by the evolutionary system. No editing, no paraphrasing:
"Consider the arithmetic site S, where the motivic cohomology group Hn(S, M) over the tropical semiring R+ is analyzed using the Connes-Consani framework. For all Ο in Hn(S, M), suppose the associated zeta function ΞΆS(s) admits a spectral decomposition such that Re(s) > 1/2 implies the existence of a subconvexity bound for L(s, Ο) with Ο an automorphic form, and this can be corroborated using the Selberg trace formula, Montgomery pair correlation, and random matrix theory (GUE) to establish a positivity conjecture in terms of tropical geometry. This yields an effective bound improving beyond the classical Vinogradov-Korobov region..."
Component Analysis: Ten Mathematical Frameworks
The winning hypothesis synthesises ten distinct mathematical frameworks. Here is what each one is and why it is relevant to the Riemann Hypothesis:
| Framework | What It Is | Why It Is Relevant |
|---|---|---|
| Arithmetic Site S | The Connes-Consani construction: a topos encoding the multiplicative structure of the integers | Proposed as a geometric setting where RH could follow from a Weil-type positivity argument |
| Motivic Cohomology Hⁿ(S, M) | Voevodsky's cohomology theory for algebraic varieties, encoding deep arithmetic structure | Conjectured connection to special values of L-functions via Bloch-Kato; if the zeta function arises from a motive, its zeros have cohomological meaning |
| Tropical Semiring R+ | The (max, +) or (min, +) semiring; algebraic geometry over this structure produces "tropical" varieties | Tropical geometry offers a piecewise-linear shadow of classical geometry; Connes-Consani use it in their arithmetic site construction |
| Spectral Decomposition | Decomposing an operator into eigenvalues/eigenfunctions | Hilbert-Pólya conjecture: RH would follow if there exists a Hermitian operator whose eigenvalues are the imaginary parts of the zeros |
| Subconvexity Bound | A bound on the size of L(1/2 + it, π) that improves on the convexity bound | Subconvexity bounds are partial progress toward the Lindelöf hypothesis, which RH implies; growth bounds of this kind also feed into zero-density estimates |
| Selberg Trace Formula | Relates eigenvalues of the Laplacian on a hyperbolic surface to the lengths of closed geodesics | The explicit formula for zeta zeros is structurally analogous; Selberg's formula is the prototype for connections between spectral data and arithmetic |
| Montgomery Pair Correlation | Montgomery's 1973 conjecture that the pair correlation of zeta zeros follows a specific distribution | The foundational link between RH and random matrix theory; the pair correlation function matches GUE statistics empirically |
| GUE Statistics | Gaussian Unitary Ensemble: the random matrix ensemble whose eigenvalue statistics match zeta zeros | One of the deepest structural patterns known about the Riemann zeros; encodes universal repulsion behaviour |
| Vinogradov-Korobov | The best known zero-free region for ζ(s), up to constants: Re(s) > 1 − c/((log t)^{2/3}(log log t)^{1/3}) | The current state-of-the-art bound on how close zeros can get to the line Re(s) = 1, the right edge of the critical strip; improving this is a major open problem |
| Hecke Eigenvalues | Eigenvalues of Hecke operators acting on modular forms | The associated L-functions are expected (by the Generalized Riemann Hypothesis) but not known to satisfy RH; the Ramanujan-Petersson bound on Hecke eigenvalues, which Deligne deduced from his proof of the Weil conjectures (the analogue of RH over finite fields), is the closest proven analogue |
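For reference, here are three of the quantitative statements the table alludes to, in standard notation. Here c denotes an unspecified absolute constant, and Montgomery's formula is a conjecture (proved only for restricted test functions).

```latex
% Vinogradov-Korobov zero-free region: c > 0 is an absolute constant, |t| >= 3
\zeta(\sigma + it) \neq 0 \quad \text{whenever} \quad
\sigma \ge 1 - \frac{c}{(\log |t|)^{2/3} (\log\log |t|)^{1/3}}

% Montgomery's conjectured pair-correlation density for normalised zero spacings u
R_2(u) = 1 - \left( \frac{\sin \pi u}{\pi u} \right)^{2}

% Convexity bound versus the Lindelof hypothesis (which RH implies) on the critical line;
% a subconvexity bound is any exponent strictly below 1/4
\zeta\!\left(\tfrac12 + it\right) \ll_{\varepsilon} |t|^{1/4 + \varepsilon}
\qquad\text{vs.}\qquad
\zeta\!\left(\tfrac12 + it\right) \ll_{\varepsilon} |t|^{\varepsilon}
```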
What It Is, and What It Is Not
This distinction must be stated with maximum clarity.
What it is:
- A computationally scored synthesis of ten legitimate mathematical frameworks, each connected to the Riemann Hypothesis through established mathematical literature
- A hypothesis that scores 100% on automated analysis of mathematical specificity, logical coherence, quantitative precision, and falsifiability
- Strong agreement with numerical predictions of RH at the scale the system can compute (10⁶ for prime counting; 500 zeros for GUE; 10⁶ for Mertens and Robin)
- A novel synthesis that, as far as we know, no single prior paper combines in the same way: the Connes-Consani arithmetic site + tropical geometry + Vinogradov-Korobov extension is an unusual combination
- A research direction that an expert mathematician could evaluate, develop, or refute
What it is not:
- A proof. Not even a sketch of a proof. The hypothesis is a conjecture-level statement about what mathematical structures might imply RH.
- A verified connection. The motivic cohomology formulation is not proven to relate to the Riemann zeta function in the way claimed.
- A submission to a journal. It would not pass peer review in its current form; it would require years of mathematical development before any publication.
- A resolution of the Millennium Prize problem. The Clay Institute requires a rigorous, peer-reviewed proof in a reputable journal.
"Did the evolutionary process produce a genuinely meaningful mathematical research direction, or did it learn to generate sophisticated-sounding text that scores well on keyword-based tests? We do not know. Only an expert analytic number theorist can answer this."
This is the honest statement of the situation. The system cannot distinguish between these two possibilities from within itself. The computational tests (Tests 1–5) confirm RH-consistent behaviour at accessible scales. The quality tests (Tests 6–11) confirm the mathematical sophistication of the hypothesis text. Neither proves mathematical truth.
Dead End Patterns: What the 280+ Rejected Hypotheses Reveal
The evolutionary system rejected 279+ hypotheses before arriving at the 97.2% result. The pattern of failures is itself informative, because it maps the fitness landscape of bad ideas about the Riemann Hypothesis:
| Pattern Type | Score Range | Description |
|---|---|---|
| Pure restatement | 10–30% | Hypotheses that restate RH in different notation without adding structure. "The zeros lie on Re(s) = 1/2 because the zeta function is symmetric." No mechanism, no technique, no bound. |
| Name-dropping without structure | 40–55% | Lists of mathematical tools ("apply GUE, Selberg, Hecke") without explaining how they connect. Scores on technique recognition but fails coherence tests. |
| Single-technique | 55–70% | Developing one framework (e.g., pure spectral theory) without synthesising it with others. Coherent but narrow; the evolutionary pressure toward synthesis correctly identifies this as insufficient. |
| Vague operator claims | 60–75% | Hypotheses claiming "there exists a Hermitian operator whose eigenvalues are the imaginary parts of the zeros" without specifying the operator. This restates the Hilbert-Pólya conjecture (plausible but unproven) rather than contributing to it. |
| Good techniques, no bounds | 70–80% | Correct mathematical objects and connections, but no quantitative bounds, so these fail Test 7 (Quantitative Bounds). The evolutionary pressure toward quantification correctly drives improvement. |
| Missing falsifiability | 80–90% | Near-complete hypotheses that do not specify what would falsify them. High scores on numerical tests but penalised on Test 11. A sign the hypothesis is almost there. |
The progression from Run 1 (69.3%) through the dead-end patterns maps the path from mathematical noise to mathematical signal. The evolutionary algorithm learned that name-dropping fails, that single-technique approaches plateau, and that quantitative bounds with falsifiable predictions are required to break 90%.
The Hard Question
There is a question this system cannot answer about itself. The eleven tests reward hypotheses that look like good mathematics. Tests 1–5 reward hypotheses that make predictions consistent with known numerical results. Tests 6–11 reward hypotheses written with the vocabulary and structure of advanced analytic number theory.
An excellent fiction writer who had memorised all of analytic number theory's vocabulary, without understanding the logical dependencies, could write a hypothesis that scores well on all eleven tests. The question is whether the evolutionary process produced something more than that. There are two possibilities. The synthesis of arithmetic sites, motivic cohomology, tropical geometry, Hecke eigenvalues, and Vinogradov-Korobov may represent a genuine mathematical insight: a connection between these structures that would actually imply something about the location of the zeros. Or it may be a grammatically correct, semantically dense, logically arranged collection of mathematics-words that lacks internal mathematical force.
We genuinely do not know the answer. The probability that it is genuine seems higher at 97.2% than it did at 69.3%, because the evolutionary pressure toward specificity, quantification, coherence, and falsifiability does select against pure confabulation. But it does not select for truth. The system has not proven anything. It has identified a direction.
The output of this system should be reviewed by an expert in analytic number theory with specific expertise in the Connes-Consani program and L-function theory. If the synthesis is genuine, the next step is formalising the connection between the arithmetic site cohomology and the subconvexity bound. If it is not, an expert will identify exactly where the logical gap appears. That is itself a valuable output, because it constrains future evolutionary runs.
Technical Architecture: How the Evolution Works
For completeness, here is the evolutionary algorithm that produced these results:
```text
RIEMANN HYPOTHESIS EVOLUTIONARY SEARCH
═══════════════════════════════════════════════════════════════
INITIALIZATION
  Random population of P hypotheses
  Each hypothesis: structured text encoding mathematical claim
  Initial fitness: evaluated against 11-test battery

EVOLUTION LOOP (per run)
  for generation g in 1..20:
    for each hypothesis h in population:
      MUTATION OPERATORS (chosen probabilistically):
        ├─ AddTechnique(h)          → inject new mathematical framework
        ├─ RefineQuantitative(h)    → add specific bounds/parameters
        ├─ SynthesiseCross(h1, h2)  → combine structures from two parents
        ├─ FalsifiabilityInject(h)  → add testable predictions
        └─ DepthExpand(h)           → elaborate a specific mechanism
    EVALUATION:
      Test 1-5:  numerical/statistical Python computations
      Test 6-11: LLM-based quality analysis of hypothesis text
      Composite score: weighted average (all tests equal weight)
    SELECTION:
      Top-K survivors advance to next generation
      Worst performers discarded

INTER-RUN SEEDING:
  Best hypothesis from run N → seed population of run N+1
  Run 6 plateau → reseed with fresh random diversity
  Run 7: Vinogradov-Korobov extension emerges → 97.2%
═══════════════════════════════════════════════════════════════
```
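To make the loop in the diagram concrete, here is a toy, runnable sketch of the same structure: elitist top-K selection, mutation and crossover operators, and inter-run seeding from the previous best. Everything in it is hypothetical. Hypotheses are sets of abstract traits rather than text, and fitness is simply the fraction of traits present, standing in for the 11-test battery.

```python
import random

# Hypothetical traits standing in for what the mutation operators add to a hypothesis
TRAITS = ("technique", "quantitative_bound", "cross_synthesis",
          "falsifiable_prediction", "mechanism_depth")


def fitness(hypothesis: frozenset) -> float:
    """Toy stand-in for the 11-test battery: equal-weight fraction of traits present."""
    return len(hypothesis) / len(TRAITS)


def random_hypothesis(rng: random.Random) -> frozenset:
    """Fresh random individual: each trait present with probability 0.3."""
    return frozenset(t for t in TRAITS if rng.random() < 0.3)


def mutate(h: frozenset, rng: random.Random) -> frozenset:
    """Hypothetical mutation: usually add a random trait, occasionally drop one."""
    trait = rng.choice(TRAITS)
    if trait in h and rng.random() < 0.3:
        return h - {trait}
    return h | {trait}


def crossover(a: frozenset, b: frozenset, rng: random.Random) -> frozenset:
    """Hypothetical crossover: inherit each parental trait with probability 0.5."""
    return frozenset(t for t in a | b if rng.random() < 0.5)


def evolve_run(seeds, generations, pop_size, survivors, rng):
    """One run: seed individuals plus random fill, then elitist evolution."""
    population = list(seeds)
    while len(population) < pop_size:
        population.append(random_hypothesis(rng))
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[:survivors]          # top-K survivors advance unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            if rng.random() < 0.5:
                a, b = rng.sample(elite, 2)
                children.append(crossover(a, b, rng))
            else:
                children.append(mutate(rng.choice(elite), rng))
        population = elite + children           # worst performers are discarded
        history.append(max(fitness(h) for h in population))
    return max(population, key=fitness), history


rng = random.Random(0)
best = None
for run in range(1, 8):
    pop_size = 5 if run <= 4 else 8             # population grows for Runs 5-7
    seeds = [best] if best is not None else []
    best, history = evolve_run(seeds, generations=20, pop_size=pop_size, survivors=2, rng=rng)
    print(f"run {run}: best fitness {fitness(best):.2f}")
```

Because the elite always survives unchanged, the best fitness never decreases within a run or across seeded runs. That is also why a run can stall on a plateau, as Run 6 did, until fresh random material is injected.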
The algorithm is not searching mathematical space directly. It is searching the space of natural-language mathematical hypotheses. Its fitness function rewards properties correlated with mathematical truth (specificity, quantification, testability, numerical consistency) without being able to evaluate mathematical truth itself.
This is both the power and the fundamental limitation of the approach. The power: it can traverse an enormous hypothesis space (280+ evaluated) at computational speed, correctly identifying structural patterns (synthesis beats single-technique; quantification beats vagueness; falsifiability beats pure conjecture). The limitation: the fitness landscape is a proxy for truth, not truth itself.
Context Within the Riemann Hypothesis Research Landscape
Many of the leading mathematicians of the last 167 years have worked on the Riemann Hypothesis, and the partial results are substantial. They include Hadamard and de la Vallée Poussin's proof that ζ(s) has no zeros on Re(s) = 1 (1896), and Selberg's theorem that a positive proportion of the zeros lie on the critical line (1942). They also include Montgomery's pair correlation (1973), the Odlyzko GUE connection (1987), and Platt and Trudgian's 10¹³ zero verification (2021). Each represents a genuine mathematical contribution. None of them proves RH.
The Connes-Consani program β which the winning hypothesis draws on heavily β is one of the most ambitious current programs. Their arithmetic site construction is a serious mathematical proposal, published in major journals, reviewed by expert algebraic geometers. The connection to tropical geometry is their own. What the evolutionary system has done is synthesise their framework with the spectral/GUE/subconvexity tradition, add a Vinogradov-Korobov extension, and produce a combined hypothesis that scores well on the battery.
Whether this synthesis is mathematically coherent in the deep sense β whether the arithmetic site cohomology actually constrains the zero locations β requires expert judgment. That judgment is the next step in this research program, not an automated computation.
The 97.2% score is a strong result for an autonomous system. It is not a mathematical result. The distinction matters enormously, and maintaining it with full honesty is itself part of what the Profiled platform stands for.