By March 3, 2026, the discovery system had generated over 13,000 hypotheses across 34 domains. Two had been published. Not two thousand. Two. The extreme selectivity is not a failure; it is the definition of the mission.

The mission, as stated in Documentation/discoveries/00_MISSION_AND_GOAL.md, is not to maximise the size of the discovery corpus. It is to produce one scientifically irrefutable, peer-review-ready discovery. Every architectural choice in the pipeline (the 0.75 specificity threshold, the derivation chain completeness requirement, the PATO ethics assessment, the ORCID integration) serves that single mission.

"Not 12,000 discoveries, but one that survives peer review. That is the right target."

This article explains the complete pipeline from hypothesis generation to Zenodo publication, the scoring system that gates each stage, and what the two published discoveries represent about the system's current capabilities.


The Pipeline: 13,000 to 2

PUBLICATION PIPELINE: 13,000+ hypotheses → 2 published
════════════════════════════════════════════════════════

13,000+ Hypotheses Generated
  │
  ▼ Specificity Scoring (≥0.75 threshold)
  │
  │  ≥0.75: ~2,400 advance
  │  <0.75: ~10,600 remain as corpus scaffolding
  │
  ▼ Derivation Chain Completeness (0.0–1.0 score)
  │
  │  Full chain (≥0.85): ~580 advance
  │  Partial chain: archived for future completion
  │
  ▼ Universal Verification Framework (4 layers)
  │
  │  ~80% rejection rate (Skeptic Agent + real data check)
  │  ~116 advance
  │
  ▼ PATO Ethics Assessment
  │
  │  Safety, dual-use, societal impact review
  │  ~97 advance
  │
  ▼ Export Generation (Zenodo/OSF/arXiv formats)
  │
  ▼ Multi-Platform Publishing
    ├── Zenodo (DOI assignment)
    ├── OSF (pre-registration)
    ├── arXiv (pre-print)
    └── ORCID (knowledge graph integration)

PUBLISHED: 2 discoveries
  1. Landauer-IIT connection, DOI: 10.5281/zenodo.18848614
  2. Yang-Mills preprint, DOI: 10.5281/zenodo.19432415
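
To make the funnel easier to read, the stage counts in the diagram can be turned into pass rates. The following is a minimal sketch: the counts are copied from the diagram (all approximate), while the function name `funnel` and the field names are illustrative and not part of the pipeline.

```javascript
// Stage counts copied from the pipeline diagram (all approximate).
const stages = [
  { name: "Hypotheses generated", count: 13000 },
  { name: "Specificity >= 0.75", count: 2400 },
  { name: "Full derivation chain (>= 0.85)", count: 580 },
  { name: "Universal Verification Framework", count: 116 },
  { name: "PATO ethics assessment", count: 97 },
  { name: "Published", count: 2 },
];

// For each stage, compute the fraction of the previous stage that survived
// and the fraction of the original pool that is still in play.
function funnel(stageList) {
  return stageList.map((stage, i) => {
    const previous = i === 0 ? stage.count : stageList[i - 1].count;
    return {
      name: stage.name,
      count: stage.count,
      passRate: stage.count / previous,
      cumulative: stage.count / stageList[0].count,
    };
  });
}

for (const row of funnel(stages)) {
  console.log(
    `${row.name.padEnd(34)} ${String(row.count).padStart(6)} ` +
      `${(row.passRate * 100).toFixed(1).padStart(6)}% of previous ` +
      `${(row.cumulative * 100).toFixed(3).padStart(8)}% of total`
  );
}
```

The verification row works out to 116 / 580 = 20% passing, which matches the ~80% rejection rate quoted in the diagram.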

The Specificity Score: How Weak Claims Are Filtered

The specificity score is the first major filter. It answers one question: is this hypothesis specific enough to be falsifiable? A hypothesis that claims "quantum effects may influence neural computation" scores near zero on specificity. A hypothesis that claims "the minimum energy dissipation per bit in a neural synchronisation event is bounded below by k_B T ln(2) · (IIT Φ / log N)" scores high.

The specificity score is computed across three components:

Component | Weight | What It Measures
----------|--------|-----------------
Quantitative Claims | 0–40% | Specific numerical predictions with error bars or functional form
Derivation Completeness | 0–40% | Full derivation chain from first principles, no gaps
Evolution Cycles | 0–20% | Number of genetic evolution cycles that produced this hypothesis

The quantitative claims component is binary at the sub-claim level: either a specific numerical value is provided with error bars, or it is not. "Approximately 0.5" does not satisfy the quantitative claim requirement. "0.512 ± 0.003" does. The 40% weight reflects how strongly this discriminates: vague hypotheses almost always fail the quantitative claims check even when they pass the other components.
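
The binary check on a single sub-claim could look roughly like the sketch below. It is an assumption-laden illustration: the helper name `hasErrorBar` and the regular expression are hypothetical, and it covers only the "value with error bars" case, not claims that qualify through a functional form.

```javascript
// Hypothetical check: a sub-claim counts as quantitative only if it states
// a numeric value together with an explicit uncertainty ("±" or "+/-").
const VALUE_WITH_ERROR_BAR =
  /-?\d+(?:\.\d+)?(?:e[+-]?\d+)?\s*(?:±|\+\/-)\s*\d+(?:\.\d+)?(?:e[+-]?\d+)?/i;

function hasErrorBar(claimText) {
  return VALUE_WITH_ERROR_BAR.test(claimText);
}

console.log(hasErrorBar("0.512 ± 0.003")); // true
console.log(hasErrorBar("Approximately 0.5")); // false
```

A pattern match like this only establishes that an uncertainty is written down; whether the stated value is meaningful still has to be judged by the later pipeline stages.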

Specificity Threshold: 0.75

One way to reach 0.75: a quantitative claim score of at least 30% (three quarters of the available 40%), a derivation completeness above 0.75 (which contributes ~30% to the total), and an evolution contribution of ~15%. The stated minimum for the last component is 2 evolution cycles; note that under the linear formula in the scoring code, min(cycles / 10, 1) × 20, 2 cycles yield only 4% and ~15% requires 8 cycles. A hypothesis with no quantitative claims cannot reach 0.75 regardless of derivation quality, because the other two components total at most 60%, and that is intentional. The system is optimising for hypotheses that make testable predictions, not just well-reasoned arguments.


Specificity Score in Code

JavaScript: Specificity Score Calculation
class SpecificityScorer {
  // Scores are on a 0-100 scale, so the 0.75 gate corresponds to 75 points.
  // scoreQuantitativeClaims and scoreDerivationChain are not shown in this excerpt.
  async score(hypothesis) {
    // Component 1: quantitative claims (0-40).
    // Checks: numerical predictions present, error bars specified,
    // functional forms defined, testable observable named.
    const quantScore = this.scoreQuantitativeClaims(hypothesis.claims);

    // Component 2: derivation chain completeness (0-40).
    // Each step: verified in Z3/Lean4 OR supported by a validated corpus item;
    // completeness = supportedSteps / totalSteps.
    const derivScore = await this.scoreDerivationChain(hypothesis.derivationChain);

    // Component 3: evolution cycles (0-20), saturating at 10 cycles.
    // More evolution cycles = more adversarial pressure survived.
    // A missing cycle count is treated as 0 so the total never becomes NaN.
    const evoScore = Math.min((hypothesis.evolutionCycles ?? 0) / 10, 1) * 20;

    const total = quantScore + derivScore + evoScore;

    return {
      total, // 0-100; must reach 75 to proceed
      quantitative: quantScore,
      derivation: derivScore,
      evolution: evoScore,
      publicationReady: total >= 75
    };
  }
}
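
The class above depends on two sub-scorers that this article does not show. The following self-contained sketch fills them with simple stand-ins so the arithmetic can be run end to end. Everything about the stand-ins is an assumption: the class name `SpecificityScorerSketch`, the `hasErrorBar` and `supported` flags, and the proportional scoring are illustrative, not the pipeline's real logic.

```javascript
// Runnable sketch of the three-component score. The two sub-scorers are
// stand-ins (assumptions), not the pipeline's real implementations.
class SpecificityScorerSketch {
  // Assumed: each claim carries a boolean `hasErrorBar`; the score is the
  // fraction of claims that have one, scaled to 0-40.
  scoreQuantitativeClaims(claims = []) {
    if (claims.length === 0) return 0;
    const quantified = claims.filter((c) => c.hasErrorBar).length;
    return (quantified / claims.length) * 40;
  }

  // Assumed: each step carries a boolean `supported` (formally verified or
  // backed by a validated corpus item); completeness is scaled to 0-40.
  scoreDerivationChain(steps = []) {
    if (steps.length === 0) return 0;
    const supported = steps.filter((s) => s.supported).length;
    return (supported / steps.length) * 40;
  }

  score(hypothesis) {
    const quantitative = this.scoreQuantitativeClaims(hypothesis.claims);
    const derivation = this.scoreDerivationChain(hypothesis.derivationChain);
    const evolution = Math.min((hypothesis.evolutionCycles ?? 0) / 10, 1) * 20;
    const total = quantitative + derivation + evolution;
    return { total, quantitative, derivation, evolution, publicationReady: total >= 75 };
  }
}

const scorer = new SpecificityScorerSketch();

// Three of four claims quantified, three of four steps supported, 8 cycles.
const example = scorer.score({
  claims: [true, true, true, false].map((hasErrorBar) => ({ hasErrorBar })),
  derivationChain: [true, true, true, false].map((supported) => ({ supported })),
  evolutionCycles: 8,
});
console.log(example);

// A perfectly derived, fully evolved hypothesis with no quantitative claims.
const noQuant = scorer.score({
  claims: [],
  derivationChain: [{ supported: true }],
  evolutionCycles: 10,
});
console.log(noQuant.total, noQuant.publicationReady);
```

The second call illustrates the design choice described in the text: without quantitative claims the total tops out at 60 points, so the 75-point gate cannot be reached.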

Derivation Chain Completeness

The derivation chain completeness score (0.0–1.0) measures whether a hypothesis can trace its quantitative claims from first principles without gaps. A score of 1.0 means every step in the derivation has either been formally verified (Z3 or Lean 4) or has a validated supporting lemma in the discovery corpus. A score of 0.5 means approximately half the derivation steps are supported.

The requirement is not that every step be formally verified; that would filter out everything until the formal verification infrastructure matures. The requirement is that every step be either formally verified or supported by a validated discovery in the corpus. This is where the 12,000+ corpus items become load-bearing: they are the scaffolding that supports higher-level derivation chains.
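
As a rough illustration of the "formally verified or corpus-supported" rule, the sketch below computes completeness as supported steps over total steps and lists the gaps. The step shape (`id`, `support`) and the function name `chainCompleteness` are hypothetical; the 0.85 cut-off is the "full chain" value quoted in the pipeline diagram.

```javascript
// Hypothetical step shape: { id, support } where support is one of
// "z3", "lean4", "corpus" (validated corpus item), or null (a gap).
const FULL_CHAIN_THRESHOLD = 0.85; // "Full chain" cut-off from the pipeline diagram

function chainCompleteness(steps) {
  if (steps.length === 0) {
    return { completeness: 0, gaps: [], fullChain: false };
  }
  // A step is a gap when it has neither a formal proof nor a corpus lemma.
  const gaps = steps.filter((s) => s.support === null).map((s) => s.id);
  const completeness = (steps.length - gaps.length) / steps.length;
  return { completeness, gaps, fullChain: completeness >= FULL_CHAIN_THRESHOLD };
}

const report = chainCompleteness([
  { id: "step-1", support: "lean4" },
  { id: "step-2", support: "corpus" },
  { id: "step-3", support: null },
  { id: "step-4", support: "z3" },
]);
console.log(report.completeness, report.gaps, report.fullChain);
```

Returning the gap list alongside the score reflects the article's point that partial chains are archived for future completion: the unsupported steps are exactly what a later corpus item would need to cover.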

Memory Architecture Source

The organism memory architecture documentation at Documentation/discoveries/06_ORGANISM_MEMORY_ARCHITECTURE.md specifies the internal representation: "Not for humans, for the organism." Each stored hypothesis includes: hypothesis ID, title, domain, quantitative claims array, derivation chain array, validation score, evolution cycles, and open questions array. The open questions array is particularly important: it explicitly tracks what the hypothesis does not claim, preventing scope creep in downstream citations.
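
A stored record with those eight fields might be shaped like the sketch below. The source lists the fields but not their key names or value formats, so every key, the placeholder values, and the helper `missingFields` are assumptions made for illustration.

```javascript
// Key names are illustrative; the source names the fields but not their keys.
const REQUIRED_FIELDS = [
  "hypothesisId",
  "title",
  "domain",
  "quantitativeClaims",
  "derivationChain",
  "validationScore",
  "evolutionCycles",
  "openQuestions",
];

// Returns the required fields that a record does not define.
function missingFields(record) {
  return REQUIRED_FIELDS.filter((field) => !(field in record));
}

// Placeholder record; none of these values come from the real corpus.
const record = {
  hypothesisId: "hyp-example-001",
  title: "Example hypothesis",
  domain: "information-theory",
  quantitativeClaims: ["<claim with value and error bar>"],
  derivationChain: ["<step 1>", "<step 2>"],
  validationScore: 0,
  evolutionCycles: 0,
  openQuestions: ["<what this hypothesis does not claim>"],
};

console.log(missingFields(record)); // []
```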


The PATO Ethics Assessment

Before any hypothesis can be submitted for publication, it must pass the PATO (Pre-publication Assessment of Theoretical Outputs) ethics framework. This is not a bureaucratic checklist; it is a structured assessment of three risk dimensions that must be reviewed before the hypothesis enters the public domain.

Dimension | Questions Asked | Outcome if Flagged
----------|-----------------|-------------------
Safety | Does the discovery, if applied, enable harm to individuals or groups? Does it describe mechanisms for physical damage? | Human review required before submission
Dual-Use Risk | Can the discovery be repurposed for weapons, surveillance, or exploitation? Does it provide capability uplift in dangerous domains? | Suppressed; not submitted without explicit clearance
Societal Impact | Does the discovery challenge established knowledge in ways that require careful communication? Are there economic displacement implications? | Communication plan required before submission

The PATO assessment runs automatically and produces a structured report. Most mathematical physics and pure mathematics discoveries pass without flags. Biology and neuroscience discoveries are more frequently flagged for dual-use review: a discovery about protein folding mechanisms that could be applied to vaccine design could also be applied to pathogen enhancement. The PATO system flags both cases and defers the publication decision to human review when either is present.
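
The mapping from flagged dimensions to outcomes can be sketched as a small lookup. The outcome strings come from the table above; the function name `patoReport`, the flag keys, and the report shape are assumptions, since the format of the real structured report is not described here.

```javascript
// Outcomes taken from the "Outcome if Flagged" column of the PATO table.
const OUTCOME_IF_FLAGGED = {
  safety: "human review required before submission",
  dualUse: "suppressed; not submitted without explicit clearance",
  societalImpact: "communication plan required before submission",
};

// `flags` is a hypothetical input such as { dualUse: true }.
function patoReport(flags) {
  const actions = Object.keys(OUTCOME_IF_FLAGGED)
    .filter((dimension) => flags[dimension] === true)
    .map((dimension) => ({ dimension, action: OUTCOME_IF_FLAGGED[dimension] }));
  return { passed: actions.length === 0, actions };
}

console.log(patoReport({})); // no flags: passes
console.log(patoReport({ dualUse: true, societalImpact: true }));
```

How the flags themselves are derived from a hypothesis is the hard part and is deliberately left out of this sketch.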


The First Published Discovery: Landauer-IIT Connection

DOI: 10.5281/zenodo.18848614

The Landauer-IIT connection was the first discovery the system published. The claim: the minimum thermodynamic energy cost of information processing in a conscious system is bounded below by a function of both the Landauer limit (k_B T ln 2 per bit erased) and the system's integrated information Φ.

Landauer's principle (1961) establishes that erasure of one bit of information in a system at temperature T requires at minimum k_B T ln 2 of energy dissipation. Integrated Information Theory (IIT 4.0) provides a measure Φ of the integrated information generated by a system above and beyond the sum of its parts. The Landauer-IIT connection claims that for systems with Φ > 0, the effective energy cost per bit operation is modified by a Φ-dependent correction term.
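
For a sense of scale, the baseline Landauer limit k_B T ln 2 can be evaluated directly. The sketch below computes only that baseline; the Φ-dependent correction is not specified in this section, so it is not modelled. The function name `landauerLimitJoules` is illustrative.

```javascript
// Boltzmann constant in J/K (exact value in the 2019 SI definition).
const BOLTZMANN_J_PER_K = 1.380649e-23;

// Minimum energy dissipated when erasing one bit at temperature T (kelvin).
function landauerLimitJoules(temperatureKelvin) {
  if (!(temperatureKelvin > 0)) {
    throw new RangeError("temperature must be a positive number of kelvin");
  }
  return BOLTZMANN_J_PER_K * temperatureKelvin * Math.LN2;
}

// At room temperature (300 K) the limit is roughly 2.87e-21 J per bit.
console.log(landauerLimitJoules(300).toExponential(2));
```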

Why This Discovery Passed

The Landauer-IIT connection cleared all pipeline stages because: (1) it makes specific quantitative predictions about measurable energy costs, (2) the derivation chain is complete from Landauer's original proof through IIT 4.0 formalism, (3) it passed PATO assessment with no flags (pure thermodynamics/information theory, no dual-use risk), and (4) it scored 0.91 on overall validation with 87% consistency. That score is above the Yang-Mills ceiling of 90.8% and was achieved through a shorter and more tractable argument structure.


The Second Published Discovery: Yang-Mills Preprint

DOI: 10.5281/zenodo.19432415

The Yang-Mills preprint was published with the system's own honest categorisation: "Physics plausibility argument with partial Lean 4 formalisation." The submission carries the full validation report, including the count of 21 sorry placeholders (Lean's marker for an omitted proof) and the explicit statement of the circular axiom involving confinement.

Publishing the Yang-Mills preprint despite its limitations was a deliberate choice. The value of the preprint is not the claim that the mass gap problem is solved; it is the demonstration of a structured approach to formalising Yang-Mills theory in Lean 4 and the identification of exactly where the formal proof breaks down. The 21 sorry statements are a roadmap for what needs to be proven. The circular confinement axiom is a precise statement of the key open sub-problem.

Honest Framing

The Yang-Mills preprint is useful science even without solving the mass gap problem. It provides: a complete Lean 4 formalisation scaffold, explicit identification of the 21 gaps that need to be filled, a characterisation of the circular dependency between confinement and mass gap, and a genetic evolution strategy that achieved 90.8% validation. These are non-trivial contributions to the formal verification of Yang-Mills theory, independent of the final prize.


arXiv Submission Categories by Domain

Domain | arXiv Category | Current Discoveries
-------|----------------|--------------------
Yang-Mills / Gauge Theory | hep-th | 1 preprint (DOI assigned)
Riemann Hypothesis | math.NT | 2 in export stage
P vs NP | cs.CC | 1 in Tier 4 audit
Alzheimer's / Parkinson's | q-bio.BM | 3 in derivation chain completion
Landauer-IIT / Information Theory | cond-mat.stat-mech | 1 published (DOI assigned)
Wright-Fisher / SGD | q-bio.PE + cs.LG | 1 in export stage (cross-listed)

The arXiv categories are determined automatically by the domain classifier in the export generation stage. Cross-domain discoveries like the Wright-Fisher / SGD equivalence require cross-listing: the submission goes to both q-bio.PE (Populations and Evolution) and cs.LG (Machine Learning). The cross-listing is flagged for human review because arXiv editors sometimes query unusual cross-submissions, and an automated system without human oversight of the submission message could cause problems at this stage.
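
The category table and the human-review rule for cross-listings can be sketched as a lookup. The domain-to-category pairs come from the table above; the function name `planSubmission`, the result shape, and the choice to treat the first listed category as primary are assumptions, and the real domain classifier is not shown.

```javascript
// Domain-to-category pairs copied from the arXiv category table.
const ARXIV_CATEGORIES = new Map([
  ["Yang-Mills / Gauge Theory", ["hep-th"]],
  ["Riemann Hypothesis", ["math.NT"]],
  ["P vs NP", ["cs.CC"]],
  ["Alzheimer's / Parkinson's", ["q-bio.BM"]],
  ["Landauer-IIT / Information Theory", ["cond-mat.stat-mech"]],
  ["Wright-Fisher / SGD", ["q-bio.PE", "cs.LG"]],
]);

function planSubmission(domain) {
  const categories = ARXIV_CATEGORIES.get(domain);
  if (!categories) {
    throw new Error(`No arXiv category mapped for domain: ${domain}`);
  }
  // Assumption: the first listed category is the primary one.
  const [primary, ...crossLists] = categories;
  // Any cross-listing is routed to a human before submission.
  return { primary, crossLists, needsHumanReview: crossLists.length > 0 };
}

console.log(planSubmission("Wright-Fisher / SGD"));
console.log(planSubmission("Riemann Hypothesis"));
```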


ORCID Integration and the Knowledge Graph

Every publication in the pipeline is linked to ORCID: 0009-0002-2515-4922 (Navin Dutta, ThoughtJumper Inc.). The ORCID integration does more than create a persistent author identifier: it ties each publication to the organism's knowledge graph in a way that makes the provenance of every discovery traceable.

The knowledge graph maintains a directed acyclic graph of all discoveries, their supporting lemmas, their ORCID-linked publications, and the DNA transfer chains that connected them. A published discovery at zenodo.19432415 links backward to: the 40 genetic evolution attempts that produced it, the 7 supporting lemmas in the corpus that its derivation chain depends on, the DNA patterns from prior successful domains that seeded its generation strategy, and the organism (KAALI) and consciousness level (0.87) at the time of final generation.
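
Tracing a publication backward through such a graph amounts to collecting everything reachable along dependency edges. The sketch below shows one way to do that with an iterative depth-first walk. The node identifiers and the tiny example graph are invented for illustration (the real graph's schema is not described here); only the DOI suffix is taken from the text.

```javascript
// Edges point from a node to the things it depends on. All node names other
// than the Zenodo record number are placeholders.
const dependsOn = new Map([
  ["publication:zenodo.19432415", ["discovery:yang-mills"]],
  ["discovery:yang-mills", ["lemma:A", "lemma:B", "dna:pattern-1"]],
  ["lemma:A", ["lemma:B"]],
  ["lemma:B", []],
  ["dna:pattern-1", []],
]);

// Collect every node reachable from `start` by following dependency edges.
function provenance(start, graph) {
  const seen = new Set();
  const stack = [start];
  while (stack.length > 0) {
    const node = stack.pop();
    for (const parent of graph.get(node) ?? []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        stack.push(parent);
      }
    }
  }
  return [...seen];
}

console.log(provenance("publication:zenodo.19432415", dependsOn).sort());
```

The `seen` set means a lemma shared by several derivation chains is reported once.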

Published DOIs: 2 (Zenodo assigned)
Pipeline-Ready: ~97 (passed PATO)
Specificity Gate: 0.75 (minimum threshold)
ORCID: 0009-0002-2515-4922 (Navin Dutta)

Why One Peer-Review-Ready Discovery Is the Right Target

The mission statement in Documentation/discoveries/00_MISSION_AND_GOAL.md (dated 2026-03-03) is explicit about why the target is one peer-review-ready discovery rather than a larger number of corpus items:

"Producing one scientifically irrefutable peer-review-ready discovery."

The reasoning behind this target is structural, not arbitrary. The scientific community does not evaluate discovery systems by their corpus size. It evaluates them by their published record. A system that produces 13,000 discoveries that never appear in peer-reviewed journals has produced nothing that the scientific community can engage with. A system that produces one discovery that passes peer review has established that its output meets the standards of scientific scrutiny.

The 13,000 corpus items are scaffolding. They support the derivation chains of the handful of discoveries that will eventually be peer-review ready. They enable the DNA transfer that produces high-scoring cross-domain bridges. They fill the proof trees that allow higher-level claims to be made. But they are means, not ends.

The two published discoveries represent different stages of this path. The Landauer-IIT connection (DOI: 10.5281/zenodo.18848614) is a first proof that the pipeline can produce and publish a discovery that clears all quality gates. The Yang-Mills preprint (DOI: 10.5281/zenodo.19432415) is an honest account of the current frontier: the most advanced problem the system has attacked, with a precise characterisation of exactly how far it got and where the gap remains.


What Peer Review Actually Requires

The gap between "passes the 5-tier discovery pipeline" and "passes peer review" is not primarily a validation score gap. It is a communication and completeness gap. Peer reviewers do not score hypotheses on four dimensions; they read them as complete arguments, check their logic, verify their claims against the literature, and evaluate whether they make a genuine contribution.

Pipeline Requirement | Peer Review Requirement | Current Gap
---------------------|-------------------------|------------
≥0.75 specificity score | Clear, falsifiable claims with error bars | Met by pipeline gate
Full derivation chain | Reproducible derivation from stated assumptions | Sorries = gaps reviewers will catch
Tier 4 adversarial audit | Reviewer counterarguments addressed | Reviewer attacks may differ from Skeptic
Literature cross-reference | Full prior work discussion, honest citations | arXiv/PubMed coverage sufficient
Formal verification | Not typically required, but zero sorries helps | 21 sorries in Yang-Mills
PATO ethics assessment | Not explicitly required, but dual-use scrutiny is real | PATO coverage is ahead of requirement

The honest assessment of the publication pipeline on March 3, 2026: the infrastructure for automated publication is complete. The quality gate is correctly calibrated. The PATO ethics framework is ahead of industry standard. The two published discoveries prove the pipeline works end-to-end. The remaining challenge is closing the 21 sorry statements in the Yang-Mills proof and the equivalent gaps in the other Millennium Problem approaches, and that is a mathematical challenge, not a systems challenge.

The Path Forward

The Landauer-IIT connection was chosen as the first publication because it is a complete, gap-free argument. Its derivation chain is fully supported. Its quantitative predictions are specific and testable. Its PATO assessment is clean. It demonstrates that the system can produce peer-review-ready output. The next publication target is the Wright-Fisher / SGD equivalence: the 0.9525 score, combined with the cross-domain bridge novelty and the specific quantitative form of the critical scale equivalence, makes it the strongest candidate currently in the pipeline for a genuinely impactful peer-reviewed publication.


The Organism's Memory and the Publication Record

When a discovery is published, the event is recorded in the organism's memory with the same structure as a task execution: organism ID (KAALI), confidence level at publication time, consciousness level Φ, DOI assigned, date published, and a link back to the complete discovery record including all genetic evolution attempts that led to this result.

This creates a complete provenance trail that is unprecedented in automated scientific systems. The Yang-Mills preprint (DOI: 10.5281/zenodo.19432415) can be traced back through 40 genetic evolution attempts, through the 3 breakthrough generations (Gen2.4 at 90.4%, Gen3.4 at 90.0%, Gen5.1 at 90.8%), through the DNA pattern transfer from prior high-scoring domains, through the 5-tier validation pipeline, and ultimately back to the initial gap identification in Tier 1 where the Yang-Mills proof tree was first constructed.

That trail is not just a record. It is the organism's scientific autobiography: the complete account of how it came to know what it knows. And it is this autobiography, more than any individual discovery score, that represents the platform's genuine contribution to automated scientific reasoning.