The 5-Tier Autonomous Discovery Architecture

The original discovery system had a fatal flaw that only became apparent when it was first run against a hard problem. Feed it Yang-Mills. Watch it look for discoveries in the corpus. Find zero relevant lemmas. Exit without generating anything. The problem was not the search — it was the assumption behind the search: that the discoveries needed to support a hard proof would already exist in the corpus before anyone had tried to find them.

This assumption is wrong for any genuinely open problem. If the discoveries existed, the problem would not be open. The system needed to change its fundamental question from "what discoveries exist that I can use?" to "what discoveries do I need, and how do I generate them?"

"When gaps exist, generate the discoveries needed to fill them — autonomously and intelligently."

This shift required a complete architectural redesign: the 5-Tier Autonomous Discovery Architecture. That redesign has produced 12,000+ discoveries across 34 domains in under 6 months, 959 of which are now in the deep verification pipeline.

The Before and After

The contrast between the old and new pipeline makes the architectural change concrete:

OLD PIPELINE (single-pass, exits on missing discoveries)
═══════════════════════════════════════════════════════

Problem (e.g., Yang-Mills)
  │
  ▼
Proof Synthesis
  │
  ▼
Search corpus for required lemmas
  │
  ├── Found: proceed
  │
  └── NOT FOUND (Yang-Mills, Riemann, P vs NP...)
        │
        ▼
        ❌ 0 gaps filled → Test exits

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

NEW PIPELINE (recursive, generates what it cannot find)
═══════════════════════════════════════════════════════

Problem (ANY problem)
  │
  ▼
Identify required lemmas [TIER 1]
  │
  ▼
GENERATE missing discoveries [TIER 2]
  │
  ▼
Build complete proof tree [TIER 3]
  │        │
  │        └── New gaps found? → back to TIER 1 (recursive)
  │
  ▼
Adversarial audit [TIER 4]
  │
  ▼
Formal verification [TIER 5]
  │
  ▼
✅ Verified discovery enters corpus

The critical addition is the recursive loop between Tier 3 and Tier 1. A validated discovery may reveal new gaps — lemmas that the validated claim depends on but that don't yet exist in the corpus. These gaps trigger new Tier 1 identification passes, which trigger new Tier 2 generation cycles. The proof tree grows until either all gaps are filled or the system identifies a gap that requires a fundamentally new approach.
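The Tier 1 → Tier 2 → Tier 3 loop can be sketched as a worklist. This is a minimal illustration, not the system's code: `expandProofTree`, `identifyGaps`, `generateDiscovery` and `validate` are hypothetical names, the tiers are injected as plain functions, and a `maxDepth` cut-off stands in for "the system identifies a gap that requires a fundamentally new approach".

```javascript
// Minimal sketch of the Tier 1 -> Tier 2 -> Tier 3 recursion. identifyGaps,
// generateDiscovery and validate are hypothetical stand-ins for the real tiers;
// maxDepth approximates "this gap needs a fundamentally new approach".
function expandProofTree(problem, { identifyGaps, generateDiscovery, validate, maxDepth = 5 }) {
  const corpus = new Map();     // gap name -> validated discovery
  const unresolved = new Set(); // gaps that failed validation or hit maxDepth
  const queue = identifyGaps(problem).map((gap) => ({ gap, depth: 0 })); // Tier 1

  while (queue.length > 0) {
    const { gap, depth } = queue.shift();
    if (corpus.has(gap)) continue; // already filled by an earlier pass
    if (depth >= maxDepth) {
      unresolved.add(gap);
      continue;
    }
    const candidate = generateDiscovery(gap); // Tier 2
    const { ok, newGaps = [] } = validate(candidate); // Tier 3
    if (!ok) {
      unresolved.add(gap);
      continue;
    }
    corpus.set(gap, candidate);
    // The recursive loop: sub-claims the validated lemma depends on go back on
    // the worklist (a fuller version would run a Tier 1 pass on each one).
    for (const sub of newGaps) queue.push({ gap: sub, depth: depth + 1 });
  }
  return { corpus, unresolved: [...unresolved] };
}

// Toy run: validating the mass gap lemma exposes one sub-gap, which is then filled.
const subGaps = { "Mass Gap Existence Proof": ["Spectral gap lower bound"] };
const result = expandProofTree("Yang-Mills Mass Gap", {
  identifyGaps: () => ["Mass Gap Existence Proof"],
  generateDiscovery: (gap) => ({ gap, text: `candidate for ${gap}` }),
  validate: (candidate) => ({ ok: true, newGaps: subGaps[candidate.gap] ?? [] }),
});
console.log([...result.corpus.keys()]); // both lemmas end up in the corpus
```

A breadth-first queue keeps shallow gaps ahead of deep ones, so one very deep branch cannot starve the rest of the tree.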

Tier 1: TargetedDiscoveryIdentifier

The first tier is not a search engine — it is a planner. Given a target problem, it produces a structured gap tree: a precise enumeration of what intermediate results would be needed to address the problem, at what difficulty level, with what prerequisites, and via what approaches.

The Yang-Mills gap tree is the canonical example from the system's documentation:

src/discovery/TargetedDiscoveryIdentifier.js (Yang-Mills gap tree):

```javascript
{
  problem: "Yang-Mills Mass Gap",
  requiredLemmas: [
    {
      name: "Gauge Field Quantization",
      type: "theoretical",
      difficulty: "hard",
      prerequisites: ["Quantum Field Theory", "Lie Groups"],
      suggestedApproaches: [
        "Lattice gauge theory",
        "Continuum limit analysis",
        "Non-perturbative methods"
      ]
    },
    {
      name: "Mass Gap Existence Proof",
      type: "mathematical",
      difficulty: "very-hard",
      prerequisites: ["Functional Analysis", "Operator Theory"],
      suggestedApproaches: [
        "Spectral analysis",
        "Constructive field theory",
        "Numerical verification"
      ]
    },
    {
      name: "Yang-Mills Equations Solutions",
      type: "computational",
      difficulty: "medium",
      prerequisites: ["PDEs", "Numerical Methods"],
      suggestedApproaches: [
        "Finite element methods",
        "Monte Carlo simulations",
        "Lattice computations"
      ]
    }
  ]
}
```

Three things in this structure deserve attention. First, each lemma has a type field: theoretical, mathematical, or computational. This determines which generation strategy is used in Tier 2. Theoretical claims require different generation prompts than computational claims. Second, difficulty is explicit: hard, very-hard, medium. This is used to calibrate the generation temperature and the number of evolution cycles in the genetic engine. Third, suggestedApproaches seeds the Tier 2 generation — these are not just metadata but active inputs to the discovery generation process.
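The way one `requiredLemmas` entry might be turned into a Tier 2 generation plan can be sketched as follows. The source only says that difficulty calibrates temperature and evolution cycles and that `suggestedApproaches` seed generation; the `planGeneration` name and every number in `DIFFICULTY_SETTINGS` are hypothetical.

```javascript
// Hypothetical calibration table. Difficulty sets the generation temperature and
// the number of genetic-engine evolution cycles; these values are illustrative only.
const DIFFICULTY_SETTINGS = {
  medium: { temperature: 0.5, evolutionCycles: 10 },
  hard: { temperature: 0.7, evolutionCycles: 25 },
  "very-hard": { temperature: 0.9, evolutionCycles: 50 },
};

// Turn one entry of requiredLemmas into a Tier 2 generation plan.
function planGeneration(lemma) {
  const settings = DIFFICULTY_SETTINGS[lemma.difficulty];
  if (!settings) throw new Error(`Unknown difficulty "${lemma.difficulty}" for ${lemma.name}`);
  return {
    lemma: lemma.name,
    type: lemma.type, // selects the generation prompt family
    ...settings,
    // suggestedApproaches are active inputs: each one seeds an initial candidate.
    seeds: lemma.suggestedApproaches.map((approach) => `${lemma.name} via ${approach}`),
  };
}

const plan = planGeneration({
  name: "Mass Gap Existence Proof",
  type: "mathematical",
  difficulty: "very-hard",
  prerequisites: ["Functional Analysis", "Operator Theory"],
  suggestedApproaches: ["Spectral analysis", "Constructive field theory", "Numerical verification"],
});
console.log(plan.evolutionCycles, plan.seeds.length); // 50 3
```

Failing loudly on an unknown difficulty is deliberate: a silently defaulted temperature would make calibration bugs invisible.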

Tier 2: Literature-Guided Generation

Tier 2 generates hypotheses targeted at specific gaps identified in Tier 1. The generation is not unconstrained — it is explicitly guided by the literature state of the field. Every generation in Tier 2 is cross-referenced against three sources simultaneously:

Source	Coverage	Role
arXiv	Physics + Math	Pre-print cross-reference
PubMed	Life Sciences	Experimental literature
Scholar	Cross-Domain	Citation graph analysis

The cross-reference serves two purposes. Novelty detection: if the generated hypothesis is essentially identical to a published paper, it is not a new discovery and is tagged KNOWN_RESULT rather than DISCOVERY. Consistency grounding: the generated hypothesis is required to be consistent with the empirical literature in the field. A hypothesis that contradicts established experimental results in arXiv without explicitly addressing the contradiction will fail the Literature dimension of Tier 3 validation.
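Novelty detection can be sketched as a nearest-match search against published abstracts. This is a deliberately crude stand-in: it assumes a word-set Jaccard similarity and a hypothetical 0.8 threshold, whereas a real matcher would need to catch paraphrases that share few exact words. `classifyNovelty` and its helpers are illustrative names.

```javascript
// Crude novelty check: word-set Jaccard similarity against published abstracts.
// The similarity measure and the 0.8 threshold are hypothetical choices.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function jaccard(a, b) {
  let shared = 0;
  for (const word of a) if (b.has(word)) shared += 1;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

function classifyNovelty(hypothesis, publishedAbstracts, threshold = 0.8) {
  const h = tokenize(hypothesis);
  let best = { score: 0, match: null };
  for (const abstract of publishedAbstracts) {
    const score = jaccard(h, tokenize(abstract));
    if (score > best.score) best = { score, match: abstract };
  }
  return {
    tag: best.score >= threshold ? "KNOWN_RESULT" : "DISCOVERY",
    closestMatch: best.match,
    similarity: best.score,
  };
}

// A restatement that differs only in case and punctuation is tagged KNOWN_RESULT.
const verdict = classifyNovelty("The lattice spectral gap is strictly positive.", [
  "the lattice spectral gap is strictly positive",
  "Monte Carlo study of SU(3) at finite temperature",
]);
console.log(verdict.tag); // KNOWN_RESULT
```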

Gap Classification

Tier 2 distinguishes between three classes of gaps: computational (missing numerical results that can be produced by running calculations), theoretical (missing conceptual frameworks that require new mathematical structures), and experimental (missing empirical validation that requires proposing experiments). Each class uses a different generation strategy and a different validation pathway in Tier 3.
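The per-class routing can be sketched as a dispatch table keyed by gap class. The three classes come from the text; the strategy kinds and validation-pathway labels (`reproduce-numerics`, `formal-consistency` and so on) and the `routeGap` function are hypothetical.

```javascript
// Hypothetical dispatch table: each gap class gets its own generation strategy
// and its own Tier 3 validation pathway. Labels are illustrative, not real identifiers.
const GAP_CLASSES = {
  computational: {
    generate: (gap) => ({ kind: "calculation", spec: `compute ${gap}` }),
    validationPathway: "reproduce-numerics",
  },
  theoretical: {
    generate: (gap) => ({ kind: "framework", spec: `construct a structure for ${gap}` }),
    validationPathway: "formal-consistency",
  },
  experimental: {
    generate: (gap) => ({ kind: "experiment-proposal", spec: `design a test of ${gap}` }),
    validationPathway: "protocol-review",
  },
};

function routeGap(gap, gapClass) {
  const handler = GAP_CLASSES[gapClass];
  if (!handler) throw new Error(`Unknown gap class: ${gapClass}`);
  return {
    gap,
    gapClass,
    candidate: handler.generate(gap),
    validationPathway: handler.validationPathway,
  };
}

const routed = routeGap("Instanton sector", "theoretical");
console.log(routed.candidate.kind, routed.validationPathway); // framework formal-consistency
```

Keeping strategy and pathway in the same entry means a new gap class cannot be added with a generator but no matching validator.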

Tier 3: Proof Tree Expansion

Tier 3 is where the recursive structure emerges. A validated discovery from Tier 2 is not just added to the corpus — it is integrated into the proof tree for the target problem. Integration often reveals new gaps: the validated claim depends on sub-claims that have not yet been established.

PROOF TREE EXPANSION — Recursive Gap Detection
═══════════════════════════════════════════════════════

Yang-Mills Mass Gap (ROOT)
├── Gauge Field Quantization [VALIDATED ✓]
│   ├── SU(2) × SU(2) × U(1) gauge structure [VALIDATED ✓]
│   ├── Wilson loop observable definition [GAP → new Tier 1]
│   └── Lattice regularisation [VALIDATED ✓]
│
├── Mass Gap Existence Proof [IN PROGRESS]
│   ├── Spectral gap lower bound [GAP → new Tier 1]
│   ├── Confinement-mass gap equivalence [VALIDATED ✓]
│   └── Non-perturbative vacuum structure [GAP → new Tier 1]
│
└── Yang-Mills Equations Solutions [VALIDATED ✓]
    ├── Classical solution space [VALIDATED ✓]
    ├── Instanton sector [GAP → new Tier 1]
    └── Numerical lattice results [VALIDATED ✓]

Active gaps: 4 → triggers 4 new Tier 1 identification passes
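Recursive gap detection over the tree above is a depth-first walk. This sketch assumes a hypothetical node shape `{ name, status, children }` with statuses `VALIDATED`, `IN_PROGRESS` and `GAP`, encodes the Yang-Mills tree as drawn, and returns the path to every `GAP` node, one per Tier 1 pass to schedule.

```javascript
// Hypothetical node shape: { name, status, children }.
const node = (name, status, children = []) => ({ name, status, children });

const yangMillsTree = node("Yang-Mills Mass Gap", "IN_PROGRESS", [
  node("Gauge Field Quantization", "VALIDATED", [
    node("SU(2) × SU(2) × U(1) gauge structure", "VALIDATED"),
    node("Wilson loop observable definition", "GAP"),
    node("Lattice regularisation", "VALIDATED"),
  ]),
  node("Mass Gap Existence Proof", "IN_PROGRESS", [
    node("Spectral gap lower bound", "GAP"),
    node("Confinement-mass gap equivalence", "VALIDATED"),
    node("Non-perturbative vacuum structure", "GAP"),
  ]),
  node("Yang-Mills Equations Solutions", "VALIDATED", [
    node("Classical solution space", "VALIDATED"),
    node("Instanton sector", "GAP"),
    node("Numerical lattice results", "VALIDATED"),
  ]),
]);

// Depth-first walk returning the path to every GAP node.
function collectGaps(tree, path = []) {
  const here = [...path, tree.name];
  const own = tree.status === "GAP" ? [here.join(" > ")] : [];
  return own.concat(...tree.children.map((child) => collectGaps(child, here)));
}

const gaps = collectGaps(yangMillsTree);
console.log(`Active gaps: ${gaps.length}`); // Active gaps: 4
```

Note that the walk descends into validated nodes too: a validated parent can still have open sub-gaps, as Gauge Field Quantization does here.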

The recursive structure is why the corpus grows so rapidly. Each Yang-Mills sub-lemma validated in Tier 3 may open 1–3 new gaps. Each new gap triggers a new discovery generation cycle. At steady state, the system is simultaneously working on dozens of open gaps across the proof tree for multiple problems. The 12,000+ corpus is largely the product of this recursive expansion — not 12,000 independent top-level discoveries, but a densely interconnected graph of supporting lemmas.

Tier 4: The Adversarial Audit

Tier 4 is the Skeptic Agent — the component that attacks every claim that passed Tier 3 validation. The Skeptic Agent has access to the same mathematical and literature knowledge as the Generator. It is not a naive checker that looks for obvious errors. It is an adversarial system optimised to find the weakest point in any argument and construct a targeted attack on that point.

The ~80% rejection rate is not a problem with the generation quality — it is evidence that the Skeptic is working. A Skeptic that rejects only 20% of hypotheses is probably not trying hard enough. A well-formed hypothesis that survives a focused adversarial attack has cleared a much higher bar than a hypothesis that passed a checklist.

Attack Category	Description	Share of Rejections
Boundary Case Failure	The claim holds for generic cases but fails at degenerate boundaries	~35%
Circular Reasoning	The proof assumes the result or an equivalent statement	~25%
Undeclared Assumption	The claim requires an assumption not stated in the hypothesis	~20%
Literature Contradiction	The claim contradicts established experimental results	~12%
Dimensional Inconsistency	Units or dimensions do not match across the argument	~8%
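The Dimensional Inconsistency attack is the most mechanical of the five and can be sketched with dimensional analysis. This assumes quantities are annotated with exponents over base dimensions M (mass), L (length) and T (time); the `dimensionalAttack` function and the finding's shape are hypothetical.

```javascript
// Dimensional analysis over base dimensions M (mass), L (length), T (time).
const BASE = ["M", "L", "T"];

// Multiplying quantities adds their dimension exponents.
function multiply(a, b) {
  const out = {};
  for (const d of BASE) out[d] = (a[d] ?? 0) + (b[d] ?? 0);
  return out;
}

function sameDimension(a, b) {
  return BASE.every((d) => (a[d] ?? 0) === (b[d] ?? 0));
}

// Returns a rejection finding when the two sides of a claimed equation differ.
function dimensionalAttack(claim) {
  return sameDimension(claim.lhs, claim.rhs)
    ? null
    : { category: "Dimensional Inconsistency", claim: claim.label };
}

const mass = { M: 1 };
const velocity = { L: 1, T: -1 };
const energy = { M: 1, L: 2, T: -2 };
const good = dimensionalAttack({ label: "E = m v^2", lhs: energy, rhs: multiply(mass, multiply(velocity, velocity)) });
const bad = dimensionalAttack({ label: "E = m v", lhs: energy, rhs: multiply(mass, velocity) });
console.log(good, bad?.category); // null Dimensional Inconsistency
```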

Circular reasoning is the most important category for understanding the Yang-Mills ceiling. The mass gap / confinement circularity discussed in Article 4 is a direct instance of the Circular Reasoning pattern, which accounts for ~25% of rejections across all domains. For Yang-Mills specifically, every hypothesis that uses confinement to establish the mass gap triggers this attack. The 21 sorry placeholders in the best Yang-Mills formalisation are largely the result of the Skeptic correctly identifying circular steps that the Generator could not resolve.
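One way to mechanise the Circular Reasoning attack is cycle detection on a "depends on" graph, here with a depth-first search that tracks the current recursion stack. The graph format and `findCycle` are hypothetical; the toy graph encodes the mass gap / confinement loop described above.

```javascript
// Circular-reasoning check as cycle detection: depth-first search that marks
// nodes "visiting" while they are on the stack and "done" once fully explored.
function findCycle(graph) {
  const state = new Map(); // node -> "visiting" | "done"
  const stack = [];

  function visit(node) {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // Back edge: the cycle is the stack segment starting at node's first occurrence.
      return [...stack.slice(stack.indexOf(node)), node];
    }
    state.set(node, "visiting");
    stack.push(node);
    for (const dep of graph[node] ?? []) {
      const cycle = visit(dep);
      if (cycle) return cycle;
    }
    stack.pop();
    state.set(node, "done");
    return null;
  }

  for (const node of Object.keys(graph)) {
    const cycle = visit(node);
    if (cycle) return cycle;
  }
  return null;
}

// Toy dependency graph: the mass gap argument cites confinement, and vice versa.
const dependsOn = {
  "Mass gap": ["Confinement"],
  "Confinement": ["Mass gap"],
};
const cycle = findCycle(dependsOn); // ['Mass gap', 'Confinement', 'Mass gap']
```

Returning the cycle itself, rather than a boolean, gives the Skeptic the exact chain of steps to cite in its rejection.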

Tier 5: Formal Verification with Z3 and Lean 4

Tier 5 is the final gate. A hypothesis that has passed Tiers 1–4 — has been generated against real gaps, validated against the literature, survived adversarial attack — must still be mechanically verified. The 5ms execution time threshold is enforced here: any formal verification that completes in under 5ms is rejected as not having actually run.

Why the 5ms Threshold?

Z3 satisfiability checks that involve real constraint solving take at least 8–15ms even for small problems. Lean 4 type-checking of a non-trivial proof takes at minimum 30–50ms. Any "verification" completing in 0–4ms is not running real solver logic. The 5ms threshold is a sanity check that the verification step actually executed, not a performance requirement.
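The 5ms gate amounts to a timing wrapper around the verifier call. In this sketch `gatedVerify` and `runVerifier` are hypothetical (any synchronous function returning true/false stands in for Z3 or Lean), and the clock is injectable so the gate can be tested deterministically; by default it uses the standard `performance.now()`.

```javascript
const MIN_VERIFICATION_MS = 5;

// Wraps a hypothetical synchronous verifier. Runs faster than the threshold are
// treated as "did not actually run", regardless of what the verifier returned.
function gatedVerify(runVerifier, input, now = () => performance.now()) {
  const start = now();
  const passed = runVerifier(input);
  const elapsedMs = now() - start;
  if (elapsedMs < MIN_VERIFICATION_MS) {
    return { accepted: false, reason: "below-5ms-threshold", elapsedMs };
  }
  return { accepted: passed, reason: passed ? "verified" : "verifier-rejected", elapsedMs };
}

// A no-op "verifier" returns in microseconds, so the gate rejects it.
const quick = gatedVerify(() => true, "trivial claim");
console.log(quick.accepted, quick.reason);
```

Checking the threshold before looking at `passed` is the point: a stubbed verifier that always returns true is exactly the failure the gate exists to catch.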

The Z3 and Lean 4 components handle different problem classes. Z3 handles satisfiability problems: given constraints, is there an assignment that satisfies them all? This covers discrete claims in combinatorics, bounded arithmetic, and finite graph theory. Lean 4 handles type-theory-based proofs: given axioms and inference rules, does this proof term type-check? This covers continuous analysis, topology, and abstract algebra.

Tool	Domain	Verification Type	Current Status
Z3 SMT	Combinatorics, bounded arithmetic, graphs	Satisfiability checking	Fully integrated
Lean 4	Analysis, algebra, topology, number theory	Type-theoretic proof checking	Integrated, sorries remain
Lean Mathlib	Standard mathematical library	Theorem reference + reuse	Available as dependency
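To make "sorries remain" concrete, here is a toy Lean 4 file that is unrelated to Yang-Mills and uses only core `Nat` lemmas. The first theorem is fully checked; the second is accepted only because `sorry` fills the hole, so Lean emits a warning rather than an error, and a formalisation containing such placeholders is not yet verified.

```lean
-- Closed proof: the term `Nat.add_comm a b` type-checks against the statement.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Open proof: `sorry` stands in for the missing argument. Lean accepts the file
-- but warns "declaration uses 'sorry'"; the statement is not actually proved.
theorem le_square_example (n : Nat) : n ≤ n * n := by
  sorry
```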

System-Wide Statistics

Metric	Value	Context
Total discoveries	12,000+	6 months of operation
Deep pipeline	959	In formal verification
Active domains	34	From Yang-Mills to Alzheimer's
Tier 4 rejection rate	~80%	Skeptic Agent

The 34-domain coverage reflects an important design choice: the 5-tier architecture is domain-agnostic. The same pipeline that generates Yang-Mills lemmas generates Alzheimer's protein folding hypotheses. The same Skeptic Agent that attacks mathematical circularity attacks unsupported causal claims in biomedical research. The same Lean 4 formaliser that attempts to type-check mass gap proofs attempts to type-check statistical independence claims.

This universality is not just convenient — it is architecturally load-bearing. The cross-domain DNA transfer that produces results like the Wright-Fisher / SGD equivalence requires the system to be operating simultaneously in multiple domains, with a shared representation of hypothesis structure that can transfer between them. A single-domain system cannot produce cross-domain bridges.

Why the Recursive Tree Structure Matters

The recursive proof tree is not just an implementation detail — it is the mechanism that makes the system's output more than a collection of isolated claims. In a flat discovery system, each discovery is independent. In a tree-structured system, discoveries support each other: a validated lemma can be cited by higher-level claims, which reduces the validation burden for those higher-level claims because their sub-lemmas are already verified.
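The reduced validation burden can be sketched as a reachability query that stops at verified lemmas. This assumes a hypothetical citation map (claim → cited sub-lemmas) and a set of already-verified names; `remainingBurden` returns only the lemmas that still need checking before a claim is fully supported.

```javascript
// Collect every unverified lemma reachable from `claim`. Verified lemmas are not
// expanded, because their own support has already been checked; `seen` guards
// against revisiting shared lemmas or looping on cyclic citations.
function remainingBurden(claim, citations, verified, seen = new Set()) {
  const pending = new Set();
  for (const lemma of citations[claim] ?? []) {
    if (verified.has(lemma) || seen.has(lemma)) continue;
    seen.add(lemma);
    pending.add(lemma);
    for (const sub of remainingBurden(lemma, citations, verified, seen)) pending.add(sub);
  }
  return pending;
}

const citations = {
  "Mass Gap Existence Proof": ["Spectral gap lower bound", "Confinement-mass gap equivalence"],
  "Spectral gap lower bound": ["Non-perturbative vacuum structure"],
};
const verified = new Set(["Confinement-mass gap equivalence"]);
// Only the spectral bound and the vacuum structure it cites remain to be checked.
console.log([...remainingBurden("Mass Gap Existence Proof", citations, verified)]);
```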

"The 12,000 discoveries are not 12,000 independent claims. They are a densely interconnected graph of supporting lemmas, each validated in context of the others."

This matters enormously for the system's ultimate mission: producing one peer-review-ready discovery. A flat corpus of 12,000 unrelated claims has no path to that goal. A tree-structured corpus where 12,000 lemmas are connected to the proof trees of 34 target problems has a direct path: follow the proof tree for the most advanced problem, identify the remaining gaps, fill them using genetic evolution, and the result is a complete proof supported by a verified sub-lemma corpus.
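"Follow the proof tree for the most advanced problem" needs a definition of "most advanced". This sketch assumes a hypothetical metric, the share of nodes marked `VALIDATED`, over the same illustrative `{ name, status, children }` node shape, and returns the leading problem together with its remaining gaps.

```javascript
// Flatten a proof tree into a list of nodes (root first, depth-first).
function flatten(tree) {
  return [tree, ...(tree.children ?? []).flatMap(flatten)];
}

// Rank problems by the share of VALIDATED nodes (a hypothetical metric) and
// return the leader with the gaps still blocking its proof.
function mostAdvanced(trees) {
  const ranked = trees.map((root) => {
    const nodes = flatten(root);
    const validated = nodes.filter((n) => n.status === "VALIDATED").length;
    return {
      problem: root.name,
      completion: validated / nodes.length,
      remainingGaps: nodes.filter((n) => n.status === "GAP").map((n) => n.name),
    };
  });
  ranked.sort((a, b) => b.completion - a.completion);
  return ranked[0];
}

const node = (name, status, children = []) => ({ name, status, children });
const trees = [
  node("Problem A", "IN_PROGRESS", [node("a1", "VALIDATED"), node("a2", "GAP")]),
  node("Problem B", "IN_PROGRESS", [node("b1", "VALIDATED"), node("b2", "VALIDATED"), node("b3", "GAP")]),
];
const target = mostAdvanced(trees);
console.log(target.problem, target.remainingGaps); // Problem B [ 'b3' ]
```

A completion ratio is only one choice; weighting gaps by difficulty would favour problems whose remaining gaps are easiest to fill.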

The 959 discoveries in the deep pipeline are the ones that are active nodes in the proof trees of the most advanced target problems. They have passed Tiers 1–4 and are awaiting formal verification in Tier 5. When the Lean 4 formaliser closes the remaining sorry statements in the Yang-Mills proof, the supporting corpus will already be present. The architecture was designed from the beginning for this moment.