Riemann Hypothesis: 97.2% on 11-Test Computational Battery

The Riemann Hypothesis is arguably the most famous unsolved problem in mathematics: a statement about the location of the zeros of the Riemann zeta function that has resisted every attack for 167 years. The Clay Mathematics Institute offers $1,000,000 for its resolution. Hundreds of world-class mathematicians have contributed partial results (bounds, special cases, conditional theorems). None has closed the gap.

What the Profiled autonomous discovery system produced after seven evolutionary runs and 280+ hypotheses is not a proof. It is something more modest and, we argue, more honest: strong computational evidence that a specific, novel research direction — synthesising arithmetic sites, motivic cohomology, tropical geometry, GUE statistics, and Vinogradov-Korobov bounds — scores 97.2% on an eleven-test verification battery. That score deserves both careful examination and appropriate humility about what it does and does not mean.

"STRONG COMPUTATIONAL EVIDENCE — not a proof. These words are load-bearing. They are not modesty for its own sake. They are a precise description of what the system actually produced."

The Evolutionary Run Progression

The system uses a seeded evolutionary algorithm: each run after the first is seeded from the best hypothesis found so far, applies mutation and crossover operators, and evaluates fitness against the test battery. The population grows when the system detects the search is productive (Runs 5–7 use population 8 rather than 5). Here is the complete seven-run history:

Run	Seed Source	Generations	Population	Best Score	Key Breakthrough	Time
1	Random init	20	5	69.3%	First coherent arithmetic-site hypothesis	~300s
2	Run 1 best	20	5	77.2%	Added Selberg trace formula	~400s
3	Run 2 best	20	5	82.9%	Introduced subconvexity bounds	~450s
4	Run 3 best	20	5	87.6%	GUE + tropical geometry synthesis	~500s
5	Run 4 best	20	8	96.6%	Full spectral decomposition + Hecke eigenvalues	~800s
6	Run 5 best	20	8	96.6%	Plateau — mutation exhausted	~600s
7	Run 5 best (reseeded)	20	8	97.2%	Refined Vinogradov-Korobov extension	~450s

Total: 280+ hypotheses evaluated, approximately 3,500 seconds of total evolution time. The jump from Run 4 to Run 5 (87.6% → 96.6%) represents the most significant single-run breakthrough — the synthesis of spectral decomposition with Hecke eigenvalues appears to be the load-bearing structural insight. The plateau in Run 6 is informative: the mutation operators were exhausted on the Run 5 seed. Reseeding with a fresh random component in Run 7 provided just enough diversity to refine the Vinogradov-Korobov extension and reach 97.2%.

280+ Hypotheses Evaluated
7 Evolutionary Runs
~3,500s Total Evolution Time
97.2% Final Battery Score

The 11-Test Battery: Complete Results

The evaluation battery has two distinct classes of tests. Tests 1–5 are numerical/statistical: they run actual computations against known mathematical facts about the Riemann zeta function. Tests 6–11 are depth/quality tests: they analyse the hypothesis text for mathematical specificity, novelty, logical coherence, and falsifiability. This distinction is critical and we return to it below.

#	Test	Score	Time	Type
1	Multi-Height Zero Verification	100.0%	3.7s	numerical
2	GUE Pair Correlation Statistics	90.1%	174.1s	statistical
3	Prime Counting at Scale (10⁶)	98.7%	0.1s	numerical
4	Mertens Function Bound	100.0%	0.4s	numerical
5	Robin's Inequality Verification	80.0%	1.4s	numerical
6	Technique Specificity	100.0%	—	depth
7	Quantitative Bounds	100.0%	—	specificity
8	Approach Novelty	100.0%	—	innovation
9	Proof Strategy Coherence	100.0%	—	logical
10	Mathematical Object Specificity	100.0%	—	precision
11	Falsifiability & Testability	100.0%	—	rigor
Overall		97.2%	—	—
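The overall figure can be reproduced from the table, assuming every test carries equal weight (the architecture section later in this article says as much). The snippet below is a minimal sketch with illustrative variable names. It also computes the mean of the five numerical tests on their own, which is the part of the score that does not depend on text analysis.

```python
# Scores (percent) copied from the results table above.
numerical = [100.0, 90.1, 98.7, 100.0, 80.0]   # Tests 1-5
text_quality = [100.0] * 6                      # Tests 6-11

# Equal-weight mean over all eleven tests (equal weighting is the stated assumption).
overall = sum(numerical + text_quality) / 11
# Mean over the five numerical/statistical tests only.
numerical_only = sum(numerical) / len(numerical)

print(f"overall: {overall:.1f}%")                # overall: 97.2%
print(f"numerical only: {numerical_only:.1f}%")  # numerical only: 93.8%
```

Six of the eleven inputs are the 100% text-quality scores, so they account for most of the gap between 93.8% and 97.2%.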

Test-by-Test: What the Numbers Actually Mean

Python — Actual Zero Verification Code (Test 1)

import mpmath
mpmath.mp.dps = 30  # 30 digits of precision

# Verify zeros at specific indices lie on Re(s) = 0.5
indices = [1, 100, 1000, 10000, 50000]
results = []
for n in indices:
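    zero = mpmath.zetazero(n)        # nth non-trivial zero, found by searching along Re(s) = 0.5
    # zetazero returns a value of the form 0.5 + i*t by construction, so the
    # deviation below is 0 by design: this loop is a health check of the
    # zero-finding routine, not an independent test of where the zeros lie.
    deviation = abs(zero.real - 0.5) # Distance from critical line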
    results.append({
        'index': n,
        'height': float(zero.imag),
        'deviation': float(deviation),
        'on_critical_line': deviation < 1e-10
    })

# Results (all on critical line):
# n=1:     height=14.134725, deviation=0.0, ✅
# n=100:   height=236.524230, deviation=0.0, ✅
# n=1000:  height=1419.422481, deviation=0.0, ✅
# n=10000: height=9877.782654, deviation=0.0, ✅
# n=50000: height=40433.687385, deviation=0.0, ✅

Test 1 — Multi-Height Zero Verification (100.0%, 3.7s)

This test numerically verifies that zeros of the Riemann zeta function lie on the critical line Re(s) = 0.5. Five specific indices were checked: n = 1, 100, 1,000, 10,000, and 50,000. All five were confirmed to lie on Re(s) = 0.5 to within a tolerance of 10⁻¹⁰. This is not a new mathematical discovery: Platt and Trudgian (2021) verified RH up to height 3 × 10¹², which covers more than 10¹³ zeros. We are reproducing five as a sanity check, so a perfect score is expected.

Verification Context

The 2021 computation by David Platt and Tim Trudgian verified that all non-trivial zeros of the Riemann zeta function up to height 3 × 10¹² (more than 10¹³ of them) lie on the critical line. Our five-point sample is purely a system health check, not an independent mathematical contribution.

Test 2 — GUE Pair Correlation Statistics (90.1%, 174.1s)

The most computationally expensive test (174 seconds) and one of the three numerical tests that fell short of 100%. This test examines whether the spacing between consecutive zeros follows the Gaussian Unitary Ensemble statistics from random matrix theory, a connection first proposed by Montgomery (1973) and supported numerically by Odlyzko (1987), and one of the deepest structural patterns known about the Riemann zeros.

The system analysed 500 consecutive zeros starting at index 1,000:

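Normalised mean: 1.0011 (expected 1.0), a deviation of 0.11%, within acceptable range
Variance: 0.1516 (reference value used by the test: ~0.283), well below the test's reference
Level repulsion: 8.62% (GUE predicts ~11%), below prediction but in the right regime

The 90.1% score reflects partial agreement with GUE statistics. The variance gap (0.1516 vs 0.283) is the primary source of the 9.9% shortfall. The reference value itself deserves scrutiny: if the statistic is the variance of normalised nearest-neighbour spacings, the Wigner surmise for GUE gives 3π/8 − 1 ≈ 0.178, while values near 0.27 to 0.29 belong to the orthogonal ensemble (GOE). Measured against 0.178, the observed 0.1516 is much closer. The remaining gap could indicate sampling effects (only 500 zeros), the specific index range chosen (1,000 to 1,499 may not be in the asymptotic regime), or genuine deviation. The score is strong but not perfect, which is correct scientific behaviour.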
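For readers who want to check the GUE reference figures, the sketch below evaluates the Wigner surmise, a standard closed-form approximation to the GUE nearest-neighbour spacing law. Two assumptions are ours rather than the source's: that the "variance" is the variance of unit-mean spacings, and that "level repulsion" means the fraction of normalised spacings below 0.5 (the article does not define it). The function names are illustrative.

```python
import math

def gue_wigner_cdf(s: float) -> float:
    """Integral from 0 to s of the GUE Wigner surmise
    p(x) = (32 / pi^2) * x^2 * exp(-4 x^2 / pi), which has unit mean."""
    return (math.erf(2.0 * s / math.sqrt(math.pi))
            - (4.0 * s / math.pi) * math.exp(-4.0 * s * s / math.pi))

# Second moment of the GUE surmise is 3*pi/8; the mean is 1, so:
gue_variance = 3.0 * math.pi / 8.0 - 1.0
# GOE surmise p(x) = (pi/2) x exp(-pi x^2 / 4) has second moment 4/pi:
goe_variance = 4.0 / math.pi - 1.0
# Share of spacings below 0.5 under the GUE surmise (assumed threshold).
small_spacing_share = gue_wigner_cdf(0.5)

print(f"GUE variance  ~ {gue_variance:.4f}")         # ~ 0.1781
print(f"GOE variance  ~ {goe_variance:.4f}")         # ~ 0.2732
print(f"P(s < 0.5)    ~ {small_spacing_share:.3f}")  # ~ 0.112
```

Under these assumptions the ~11% repulsion figure quoted above matches the GUE surmise, while a variance reference near 0.28 sits closer to the GOE value than to the GUE one.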

Test 3 — Prime Counting at Scale (98.7%, 0.1s)

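The Riemann Hypothesis implies a tight bound on the error in the prime counting function π(x). For x = 10⁶ the test reported:

π(10⁶), the actual count of primes up to 1,000,000: 78,498
Li(10⁶), the logarithmic integral that π(x) is compared against: 78,680.3
Absolute error |π(10⁶) − Li(10⁶)|: 182.3
Bound used by the test, √(10⁶) · ln(10⁶): 13,815.5

The error ratio is 182.3 / 13,815.5 = 1.3% of the bound used, so the reported error is 98.7% smaller than that bound allows, and the 98.7% score reflects this. Two of the figures need correcting. First, 13,815.5 is √x · ln x without the 1/(8π) factor; Schoenfeld's explicit RH-conditional bound √x · ln x / (8π) is about 549.7 at x = 10⁶. Second, the standard value of the logarithmic integral is li(10⁶) ≈ 78,627.5, which gives an error of about 129.5 rather than 182.3. With both corrections the error is roughly 24% of the Schoenfeld bound: still comfortably consistent with RH, but a much less dramatic margin than 1.3%.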
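The quantities in this test can be recomputed with the standard library alone. The sketch below is not the system's own test code. It counts primes with a sieve, evaluates li(x) from the classical series li(x) = γ + ln(ln x) + Σ (ln x)^k / (k · k!), and prints two versions of the bound: √x · ln x, and Schoenfeld's RH-conditional form √x · ln x / (8π). The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def prime_count(limit: int) -> int:
    """Count primes <= limit with a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            # Clear every multiple of p starting at p*p.
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return sum(sieve)

def li(x: float, terms: int = 200) -> float:
    """Logarithmic integral via its convergent series, valid for x > 1."""
    u = math.log(x)
    total = EULER_GAMMA + math.log(u)
    term = 1.0
    for k in range(1, terms + 1):
        term *= u / k        # term is now u**k / k!
        total += term / k
    return total

x = 10**6
pi_x = prime_count(x)
li_x = li(x)
error = abs(li_x - pi_x)
loose_bound = math.sqrt(x) * math.log(x)
schoenfeld_bound = loose_bound / (8 * math.pi)

print(pi_x, round(li_x, 1), round(error, 1))
print(round(loose_bound, 1), round(schoenfeld_bound, 1))
print(f"error / Schoenfeld bound = {error / schoenfeld_bound:.2f}")
```

Running this is a quick way to see how much the reported margin depends on which form of the bound is used.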

Test 4 — Mertens Function Bound (100.0%, 0.4s)

The Mertens function is M(x) = Σ_{n≤x} μ(n), where μ is the Möbius function. RH is equivalent to the bound M(x) = O(x^{1/2+ε}) for every ε > 0. At x = 10⁶: M(10⁶) = 212, √(10⁶) = 1,000, ratio = 0.212, so |M(x)| sits well below √x at this scale. The test scores this as a perfect result, as expected from known computations.
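The value M(10⁶) = 212 is easy to reproduce. The following is a minimal sketch, not the system's own code: it builds the Möbius function with a simple sieve and sums it. The function names are illustrative.

```python
import math

def mobius_upto(limit: int) -> list[int]:
    """Return mu[0..limit], the Moebius function, via a prime sieve."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = bytearray([1]) * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(2 * p, limit + 1, p):
                is_prime[m] = 0
            for m in range(p, limit + 1, p):
                mu[m] = -mu[m]          # one sign flip per distinct prime factor
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0               # not squarefree
    return mu

def mertens(limit: int) -> int:
    """M(limit) = sum of mu(n) for 1 <= n <= limit."""
    return sum(mobius_upto(limit)[1:])

x = 10**6
m_x = mertens(x)
print(m_x, m_x / math.sqrt(x))  # 212 0.212
```

A single value of M(x)/√x says little on its own; the RH-equivalent statement concerns the growth rate as x increases.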

Test 5 — Robin's Inequality Verification (80.0%, 1.4s)

Robin's theorem states that RH holds if and only if σ(n) < e^γ · n · ln(ln(n)) for all n > 5040, where σ is the sum-of-divisors function and γ is the Euler-Mascheroni constant. The worst ratio found in the tested range was 0.9858 at n = 10,080: within the Robin bound, but close enough to the boundary that the test scored 80.0% rather than 100%. This is the numerically weakest test result. The test singles out n = 10,080, a highly composite number, as the point in the tested range closest to the Robin boundary.
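The 0.9858 figure can be checked directly. The sketch below is ours rather than the system's: it computes the ratio σ(n) / (e^γ · n · ln ln n) by trial division for n = 10,080, and for the known exception n = 5040, where the ratio exceeds 1. The function names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sigma(n: int) -> int:
    """Sum of the positive divisors of n, by trial division up to sqrt(n)."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

def robin_ratio(n: int) -> float:
    """sigma(n) divided by Robin's bound; values below 1 satisfy the inequality."""
    return sigma(n) / (math.exp(EULER_GAMMA) * n * math.log(math.log(n)))

ratio_10080 = robin_ratio(10080)
ratio_5040 = robin_ratio(5040)

print(f"{ratio_10080:.4f}")   # 0.9858
print(ratio_5040 > 1.0)       # True: 5040 is the last known exception
```

A ratio this close to 1 is expected for highly composite n and is not itself evidence against RH, which is why an 80% score here is a choice of the scoring rule rather than a numerical failure.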

Tests 6–11 — Depth and Quality Analysis

These six tests scored 100.0% across the board. They evaluate the hypothesis text using automated, LLM-based analysis of mathematical specificity, novelty of technique combinations, logical coherence of the proof strategy, precision of mathematical object definitions, and the presence of falsifiable, testable predictions.

Critical Caveat

Tests 6–11 analyse hypothesis TEXT, not mathematical truth. A hypothesis that sounds like excellent mathematics — specific objects, quantitative bounds, coherent strategy, falsifiable claims — can score 100% on these tests while being fundamentally incorrect. These tests are fitness gradients for evolution, calibrated to push hypotheses toward genuine mathematical substance. They are not truth verification. Only an expert analytic number theorist reviewing the mathematical content can do that.

The Winning Hypothesis (Verbatim)

The hypothesis that achieved 97.2% is reproduced in full below, exactly as generated by the evolutionary system. No editing, no paraphrasing:

"Consider the arithmetic site S, where the motivic cohomology group Hⁿ(S, M) over the tropical semiring R₊ is analyzed using the Connes-Consani framework. For all ψ in Hⁿ(S, M), suppose the associated zeta function ζ_S(s) admits a spectral decomposition such that Re(s) > 1/2 implies the existence of a subconvexity bound for L(s, π) with π an automorphic form, and this can be corroborated using the Selberg trace formula, Montgomery pair correlation, and random matrix theory (GUE) to establish a positivity conjecture in terms of tropical geometry. This yields an effective bound improving beyond the classical Vinogradov-Korobov region..."

Component Analysis: Ten Mathematical Frameworks

The winning hypothesis synthesises ten distinct mathematical frameworks. Here is what each one is and why it is relevant to the Riemann Hypothesis:

Framework	What It Is	Why It Is Relevant
Arithmetic Site S	The Connes-Consani construction: a topos encoding the multiplicative structure of the integers	Proposed as a geometric setting where RH could follow from a Weil-type positivity argument
Motivic Cohomology Hⁿ(S, M)	Voevodsky's cohomology theory for algebraic varieties, encoding deep arithmetic structure	Conjectured connection to special values of L-functions via Bloch-Kato; if the zeta function is a motive, its zeros have cohomological meaning
Tropical Semiring R₊	The (max, +) or (min, +) semiring; algebraic geometry over this structure produces "tropical" varieties	Tropical geometry offers a piecewise-linear shadow of classical geometry; Connes-Consani use it in their arithmetic site construction
Spectral Decomposition	Decomposing an operator into eigenvalues/eigenfunctions	Hilbert-Pólya conjecture: RH would follow if there exists a Hermitian operator whose eigenvalues are the imaginary parts of the zeros
Subconvexity Bound	A bound on |L(1/2 + it, π)| that improves on the convexity bound	Subconvexity results for L-functions are closely related to zero-free regions; improving beyond the classical bound would have implications for RH
Selberg Trace Formula	Relates eigenvalues of the Laplacian on a hyperbolic surface to the lengths of closed geodesics	The explicit formula for zeta zeros is structurally analogous; Selberg's formula is the prototype for connections between spectral data and arithmetic
Montgomery Pair Correlation	Montgomery's 1973 conjecture that the pair correlation of zeta zeros follows a specific distribution	The foundational link between RH and random matrix theory; the pair correlation function matches GUE statistics empirically
GUE Statistics	Gaussian Unitary Ensemble: the random matrix ensemble whose eigenvalue statistics match zeta zeros	One of the deepest structural patterns known about the Riemann zeros; encodes universal repulsion behaviour
Vinogradov-Korobov	The best known zero-free region for ζ(s): Re(s) > 1 − c/(log t)^{2/3}(log log t)^{1/3}	The current state-of-the-art bound on how close zeros can get to the line Re(s) = 1 from the left; widening this region is a major open problem
Hecke Eigenvalues	Eigenvalues of Hecke operators acting on modular forms	L-functions of Hecke eigenforms are expected to satisfy a generalised RH; Deligne's proof of the Ramanujan-Petersson bound on Hecke eigenvalues rests on the Riemann Hypothesis for varieties over finite fields

What It Is, and What It Is Not

This distinction must be stated with maximum clarity:

What the 97.2% Result IS

A synthesis of ten established mathematical frameworks, each connected to the Riemann Hypothesis through the published literature, accompanied by numerical checks that agree with RH
A hypothesis that scores 100% on all six automated text-quality tests (technique specificity, quantitative bounds, novelty, coherence, object specificity, falsifiability)
Strong agreement with numerical predictions of RH at the scale the system can compute (10⁶ for prime counting; 500 zeros for GUE; 10⁶ for Mertens and Robin)
A synthesis that, to our knowledge, no single prior paper combines in the same way: the Connes-Consani arithmetic site + tropical geometry + Vinogradov-Korobov extension is an unusual combination
A research direction that an expert mathematician could evaluate, develop, or refute

What the 97.2% Result Is NOT

A proof. Not even a sketch of a proof. The hypothesis is a conjecture-level statement about what mathematical structures might imply RH.
A verified connection. The motivic cohomology formulation is not proven to relate to the Riemann zeta function in the way claimed.
A submission to a journal. It would not pass peer review in its current form — it would require years of mathematical development before any publication.
A resolution of the Millennium Prize. The Clay Institute requires a rigorous, peer-reviewed proof in a reputable journal.

"Did the evolutionary process produce a genuinely meaningful mathematical research direction, or did it learn to generate sophisticated-sounding text that scores well on keyword-based tests? We do not know. Only an expert analytic number theorist can answer this."

This is the honest statement of the situation. The system cannot distinguish between these two possibilities from within itself. The computational tests (Tests 1–5) confirm RH-consistent behaviour at accessible scales. The quality tests (Tests 6–11) confirm mathematical sophistication of the hypothesis text. Neither proves mathematical truth.

Dead End Patterns: What the 279+ Rejected Hypotheses Reveal

The evolutionary system rejected 279+ hypotheses before arriving at the 97.2% result. The pattern of failures is itself informative — it maps the fitness landscape of bad ideas about the Riemann Hypothesis:

Pattern Type	Score Range	Description
Pure restatement	10–30%	Hypotheses that restate RH in different notation without adding structure. "The zeros lie on Re(s) = 1/2 because the zeta function is symmetric." No mechanism, no technique, no bound.
Name-dropping without structure	40–55%	Lists of mathematical tools ("apply GUE, Selberg, Hecke") without explaining how they connect. Scores on technique recognition but fails coherence tests.
Single-technique	55–70%	Developing one framework (e.g., pure spectral theory) without synthesising with others. Coherent but narrow — the evolutionary pressure toward synthesis correctly identifies this as insufficient.
Vague operator claims	60–75%	Hypotheses claiming "there exists a Hermitian operator whose eigenvalues are the zero imaginary parts" without specifying the operator. This merely restates the Hilbert-Pólya conjecture, which is itself unproven, so it is not a contribution.
Good techniques, no bounds	70–80%	Correct mathematical objects, correct connections, but no quantitative bounds — fails Test 7 (Quantitative Bounds). The evolutionary pressure toward quantification correctly drives improvement.
Missing falsifiability	80–90%	Near-complete hypotheses that do not specify what would falsify them. High scores on numerical tests but penalised on Test 11. A sign the hypothesis is almost there.

The progression from Run 1 (69.3%) through the dead-end patterns maps the path from mathematical noise to mathematical signal. The evolutionary algorithm learned that name-dropping fails, that single-technique approaches plateau, and that quantitative bounds with falsifiable predictions are required to break 90%.

The Hard Question

There is a question this system cannot answer about itself. The eleven tests reward hypotheses that look like good mathematics. Tests 1–5 reward hypotheses that make predictions consistent with known numerical results. Tests 6–11 reward hypotheses written with the vocabulary and structure of advanced analytic number theory.

An excellent fiction writer who had memorised all of analytic number theory's vocabulary, without understanding the logical dependencies, could write a hypothesis that scores well on all eleven tests. The question is whether the evolutionary process produced something more than that: whether the synthesis of arithmetic sites, motivic cohomology, tropical geometry, Hecke eigenvalues, and Vinogradov-Korobov represents a genuine mathematical insight (a connection between these structures that would actually imply something about the location of the zeros), or whether it is a grammatically correct, semantically dense, logically arranged collection of mathematics-words that lacks internal mathematical force.

We genuinely do not know the answer. The probability that it is genuine seems higher at 97.2% than it did at 69.3% — the evolutionary pressure toward specificity, quantification, coherence, and falsifiability does select against pure confabulation. But it does not select for truth. The system has not proven anything. It has identified a direction.

The Honest Next Step

The output of this system should be reviewed by an expert in analytic number theory with specific expertise in the Connes-Consani program and L-function theory. If the synthesis is genuine, the next step is formalising the connection between the arithmetic site cohomology and the subconvexity bound. If it is not genuine, an expert will identify exactly where the logical gap appears — which is itself a valuable output, because it constrains future evolutionary runs.

Technical Architecture: How the Evolution Works

For completeness, here is the evolutionary algorithm that produced these results:

RIEMANN HYPOTHESIS EVOLUTIONARY SEARCH
═══════════════════════════════════════════════════════════════

INITIALIZATION
  Random population of P hypotheses
  Each hypothesis: structured text encoding mathematical claim
  Initial fitness: evaluated against 11-test battery

EVOLUTION LOOP (per run)
  for generation g in 1..20:
    for each hypothesis h in population:

      MUTATION OPERATORS (chosen probabilistically):
        ├─ AddTechnique(h)     → inject new mathematical framework
        ├─ RefineQuantitative(h) → add specific bounds/parameters
        ├─ SynthesiseCross(h1, h2) → combine structures from two parents
        ├─ FalsifiabilityInject(h) → add testable predictions
        └─ DepthExpand(h)     → elaborate a specific mechanism

      EVALUATION:
        Tests 1-5: numerical/statistical Python computations
        Tests 6-11: LLM-based quality analysis of hypothesis text
        Composite score: unweighted mean (all tests equal weight)

      SELECTION:
        Top-K survivors advance to next generation
        Worst performers discarded

INTER-RUN SEEDING:
  Best hypothesis from run N → seed population of run N+1
  Run 6 plateau → reseed with fresh random diversity
  Run 7: Vinogradov-Korobov extension emerges → 97.2%
═══════════════════════════════════════════════════════════════

The algorithm is not searching mathematical space directly. It is searching the space of natural-language mathematical hypotheses, using a fitness function that rewards properties correlated with mathematical truth — specificity, quantification, testability, numerical consistency — without being able to directly evaluate mathematical truth itself.

This is both the power and the fundamental limitation of the approach. The power: it can traverse an enormous hypothesis space (280+ evaluated) at computational speed, correctly identifying structural patterns (synthesis beats single-technique; quantification beats vagueness; falsifiability beats pure conjecture). The limitation: the fitness landscape is a proxy for truth, not truth itself.
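The loop in the diagram above can be sketched in a few lines. Everything in this sketch is a stand-in and should be read as hypothetical: hypotheses are modelled as sets of technique tags rather than text, `fitness` is a toy proxy for the eleven-test battery, and only two of the five operators are shown. It illustrates the control flow (variation, evaluation, top-K selection, inter-run seeding), not the system's actual implementation.

```python
import random

# Hypothetical technique tags standing in for hypothesis text.
TECHNIQUES = ["arithmetic_site", "selberg_trace", "subconvexity", "gue",
              "tropical", "hecke", "vinogradov_korobov", "quant_bound",
              "falsifiable"]

def fitness(h: frozenset) -> float:
    """Toy proxy for the battery: the share of tags present."""
    return len(h) / len(TECHNIQUES)

def add_technique(h: frozenset, rng: random.Random) -> frozenset:
    """Mutation: inject one (possibly already present) technique."""
    return h | {rng.choice(TECHNIQUES)}

def synthesise_cross(h1: frozenset, h2: frozenset, rng: random.Random) -> frozenset:
    """Crossover: keep a random subset of the parents' combined tags."""
    kept = [t for t in sorted(h1 | h2) if rng.random() < 0.7]
    return frozenset(kept) or h1

def run(seed_population, generations, top_k, rng):
    population = list(seed_population)
    size = len(population)
    history = []
    for _ in range(generations):
        children = []
        for h in population:
            if rng.random() < 0.5:
                children.append(add_technique(h, rng))
            else:
                children.append(synthesise_cross(h, rng.choice(population), rng))
        # Parents compete with children, so the best score never drops.
        candidates = list(dict.fromkeys(population + children))
        survivors = sorted(candidates, key=fitness, reverse=True)[:top_k]
        population = [survivors[i % len(survivors)] for i in range(size)]
        history.append(fitness(population[0]))
    return population[0], history

rng = random.Random(0)
first_seed = [frozenset({rng.choice(TECHNIQUES)}) for _ in range(5)]
best1, history1 = run(first_seed, generations=20, top_k=3, rng=rng)
# Inter-run seeding: the next run starts from the previous best, population 8.
best2, history2 = run([best1] * 8, generations=20, top_k=3, rng=rng)
print(history1[-1], history2[-1])
```

Because parents survive into the selection pool, the best score is non-decreasing within a run, which is consistent with the plateau reported for Run 6. The sketch also makes the limitation concrete: the loop optimises whatever `fitness` measures, and nothing else.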

Context Within the Riemann Hypothesis Research Landscape

The Riemann Hypothesis has been worked on by many of the leading mathematicians of the last 167 years. The partial results are substantial: Hadamard and de la Vallée Poussin's proof that no zeros lie on the line Re(s) = 1 (1896), Selberg's theorem that a positive proportion of the zeros lie on the critical line (1942), Montgomery's pair correlation (1973), the Odlyzko GUE connection (1987), and the Platt-Trudgian verification of more than 10¹³ zeros (2021). Each represents a genuine mathematical contribution. None of them proves RH.

The Connes-Consani program — which the winning hypothesis draws on heavily — is one of the most ambitious current programs. Their arithmetic site construction is a serious mathematical proposal, published in major journals, reviewed by expert algebraic geometers. The connection to tropical geometry is their own. What the evolutionary system has done is synthesise their framework with the spectral/GUE/subconvexity tradition, add a Vinogradov-Korobov extension, and produce a combined hypothesis that scores well on the battery.

Whether this synthesis is mathematically coherent in the deep sense — whether the arithmetic site cohomology actually constrains the zero locations — requires expert judgment. That judgment is the next step in this research program, not an automated computation.

The 97.2% score is a strong result for an autonomous system. It is not a mathematical result. The distinction matters enormously, and maintaining it with full honesty is itself part of what the Profiled platform stands for.