Scientific discovery without IP protection is an act of public service: valuable, but not a sustainable business. The Profiled discovery engine has identified 47+ potentially patentable discoveries across mathematics, biology, and computational science. Getting from "potentially patentable discovery" to "filed patent application" requires a process that is expensive to execute manually and tedious at the volume the discovery engine produces.
The PATO (Patent Autonomous Transformation Orchestrator) framework automates this translation: from structured discovery data in MongoDB to complete, USPTO-ready patent documents. This article examines the framework's architecture, the specific technical challenges it solved, the ethical questions it raises about AI-generated inventions, and why the Wright-Fisher ↔ SGD equivalence is its prime candidate for first filing.
The 9-Section Patent Document Structure
A USPTO patent application has a specific required structure. PATO generates all nine sections autonomously from the discovery record:
| # | Section | Requirements | PATO Generation |
|---|---|---|---|
| 1 | Cover Page | Title, inventors, application number, dates | Template fill from discovery metadata |
| 2 | Table of Contents | Section references with page numbers | Auto-generated from section structure |
| 3 | Abstract | 150-250 words, USPTO format | LLM synthesis from discovery summary |
| 4 | Background | Prior art context, problem statement | Literature search + gap analysis |
| 5 | Summary | Brief description of invention | LLM synthesis from discovery claims |
| 6 | Detailed Description | Full technical disclosure | Expanded from discovery technical content |
| 7 | Drawings | 6 required figures, figure references | Mermaid → PNG via mermaid-cli |
| 8 | Claims | Hierarchical indentation, numbered | Structured claims with proper formatting |
| 9 | Abstract (USPTO) | Separate USPTO-format abstract | Reformatted from Section 3 |
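
The table can also be read as data. The sketch below is one possible encoding, not PATO's actual schema: the `SectionSpec` type and the `depends_on` field are assumptions added for illustration, reflecting that the Table of Contents needs every other section to exist first and that Section 9 is reformatted from Section 3. The generation text is copied from the table.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class SectionSpec:
    number: int
    name: str
    generation: str          # copied from the "PATO Generation" column
    depends_on: tuple = ()   # assumption: sections that must be generated first


SECTIONS = [
    SectionSpec(1, "Cover Page", "Template fill from discovery metadata"),
    # Assumed: the table of contents is built once all other sections exist.
    SectionSpec(2, "Table of Contents", "Auto-generated from section structure",
                depends_on=(1, 3, 4, 5, 6, 7, 8, 9)),
    SectionSpec(3, "Abstract", "LLM synthesis from discovery summary"),
    SectionSpec(4, "Background", "Literature search + gap analysis"),
    SectionSpec(5, "Summary", "LLM synthesis from discovery claims"),
    SectionSpec(6, "Detailed Description", "Expanded from discovery technical content"),
    SectionSpec(7, "Drawings", "Mermaid to PNG via mermaid-cli"),
    SectionSpec(8, "Claims", "Structured claims with proper formatting"),
    SectionSpec(9, "Abstract (USPTO)", "Reformatted from Section 3", depends_on=(3,)),
]


def generation_order(sections):
    """Return section numbers in an order that respects depends_on."""
    graph = {s.number: set(s.depends_on) for s in sections}
    return list(TopologicalSorter(graph).static_order())


order = generation_order(SECTIONS)
```

Sections without dependencies can be generated in parallel; only Sections 2 and 9 have to wait in this encoding.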
The Three Critical Technical Challenges
Generating a patent document is not simply a matter of feeding discovery data to a language model and asking for "a patent." Three specific technical challenges required engineering solutions before PATO could produce USPTO-acceptable output.
Challenge 1: Mermaid Diagrams → PNG Figures
Patent applications require figures in a specific format: rasterized images (PNG or TIFF) with specific DPI and dimension requirements. The Profiled platform generates architectural diagrams as Mermaid flowchart syntax, which renders well in web browsers and markdown renderers but cannot be submitted as a patent figure.
The solution: mermaid-cli, the command-line rendering tool for Mermaid diagrams, integrated into the PATO pipeline. Each Mermaid diagram in the discovery record is passed to mermaid-cli with specific output parameters (resolution, background color, dimensions) to produce patent-quality PNG files. These PNG files are then embedded in the patent document with proper figure labels and referenced in the Detailed Description section ("See Figure 1: Architecture Overview").
Six figures are required for a typical Profiled patent application: system architecture overview, data flow diagram, method flowchart, entity relationship diagram (for data structure patents), mathematical relationship visualization, and experimental results chart. PATO maintains a figure template system that maps discovery types to figure requirements; mathematical proofs require different default figure sets than computational method patents.
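
As a rough illustration of the rendering step, the sketch below builds and runs an `mmdc` command (the executable that mermaid-cli installs). The flags used (`-i`, `-o`, `-w`, `-H`, `-b`, `-s`) are real mermaid-cli options, but the default width, height, scale, and background values are assumptions, since the article does not state PATO's actual output parameters. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_mmdc_command(source: Path, output: Path, width: int = 2400,
                       height: int = 1800, background: str = "white",
                       scale: int = 2) -> list:
    """Build the mermaid-cli invocation for one figure.

    Default dimensions are illustrative assumptions, not PATO's settings.
    """
    return [
        "mmdc",
        "-i", str(source),       # Mermaid source file
        "-o", str(output),       # output image; format follows the extension
        "-w", str(width),        # page width in pixels
        "-H", str(height),       # page height in pixels
        "-b", background,        # background color
        "-s", str(scale),        # scale factor for higher resolution
    ]


def render_figure(source: Path, output: Path, **options) -> Path:
    """Render one Mermaid diagram to PNG; requires mmdc on PATH."""
    if shutil.which("mmdc") is None:
        raise RuntimeError("mmdc (mermaid-cli) was not found on PATH")
    subprocess.run(build_mmdc_command(source, output, **options), check=True)
    return output


command = build_mmdc_command(Path("fig1.mmd"), Path("fig1.png"))
```

Keeping command construction separate from execution makes the parameters testable without having mermaid-cli installed.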
Challenge 2: Claims Formatting (Hierarchical Indentation)
Patent claims have a specific formatting requirement: each claim is numbered at the left margin, the elements or steps within a claim are indented, and sub-elements are indented further. A claim like "a method comprising: a) item i) subitem" requires proper hierarchical visual structure. Naive text generation produces flat claims that are structurally invalid for USPTO submission.
Before PATO (flat, invalid format):
// BEFORE: flat, invalid
"1. A method for computing... comprising: a) analyzing patterns b) generating output
i) first output type ii) second output type"
After PATO (properly indented, USPTO-valid):
// AFTER: properly indented
1. A method for computing... comprising:
    a) analyzing patterns; and
    b) generating output comprising:
        i) a first output type; and
        ii) a second output type.
2. The method of claim 1, wherein the analyzing step further comprises...
    a) a first sub-step; and
    b) a second sub-step.
The claims formatter in PATO parses the structured claim data from the discovery record and applies the indentation rules: claim preambles, whether independent or dependent (numbered, flush left), claim elements (lettered, one indent), sub-elements (roman numeral, two indents), further sub-elements (three indents). The formatter also applies USPTO claim language conventions: "comprising" vs. "consisting of," "wherein" clauses for limitations, proper antecedent basis for claim dependencies.
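
The indentation rules lend themselves to a small recursive formatter. The following is a minimal sketch, not PATO's implementation: the `Claim` and `Element` types are hypothetical, the four-space indent is an assumption, and the numeric `(1)` label for third-level elements is invented because the article does not say how that level is labeled. It handles only layout and punctuation, not claim language conventions.

```python
from dataclasses import dataclass, field

INDENT = "    "  # assumed indent width


@dataclass
class Element:
    text: str
    children: list = field(default_factory=list)


@dataclass
class Claim:
    number: int
    preamble: str
    elements: list = field(default_factory=list)


def to_roman(n: int) -> str:
    """Lowercase roman numerals; sufficient for short element lists."""
    out = ""
    for value, symbol in [(10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]:
        while n >= value:
            out += symbol
            n -= value
    return out


def label(depth: int, index: int) -> str:
    if depth == 1:
        return chr(ord("a") + index) + ")"   # a), b), ... (up to 26 elements)
    if depth == 2:
        return to_roman(index + 1) + ")"     # i), ii), ...
    return f"({index + 1})"                  # assumed label for deeper levels


def format_elements(elements, depth, terminator):
    """Indent each element; the last sibling inherits the parent's punctuation."""
    lines = []
    last = len(elements) - 1
    for i, element in enumerate(elements):
        if i == last:
            suffix = terminator
        elif i == last - 1:
            suffix = "; and"
        else:
            suffix = ";"
        prefix = INDENT * depth + label(depth, i) + " "
        if element.children:
            lines.append(prefix + element.text)
            lines.extend(format_elements(element.children, depth + 1, suffix))
        else:
            lines.append(prefix + element.text + suffix)
    return lines


def format_claim(claim: Claim) -> str:
    head = f"{claim.number}. {claim.preamble}"
    return "\n".join([head] + format_elements(claim.elements, 1, "."))


claim_1 = Claim(1, "A method for computing... comprising:", [
    Element("analyzing patterns"),
    Element("generating output comprising:", [
        Element("a first output type"),
        Element("a second output type"),
    ]),
])
formatted = format_claim(claim_1)
```

Passing the parent's punctuation down to the last child is what places "; and" and the final period correctly when an element with sub-elements is not the last element in its list.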
Challenge 3: Figure References
Every figure referenced in the patent text must use the exact figure label ("See Figure 1," "as shown in FIG. 3B"). Mismatched figure references are a common rejection reason. PATO maintains a figure registry for each document: as figures are generated, they are registered with sequential labels. The Detailed Description generator queries the figure registry and inserts references in the correct format wherever a figure would support the technical explanation.
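
A figure registry of this kind can be sketched in a few lines. The `FigureRegistry` class, its method names, and the `{{fig:key}}` placeholder syntax are all assumptions made for illustration; the article does not describe how PATO marks insertion points in the generated text.

```python
import re


class FigureRegistry:
    """Assigns sequential labels to figures and resolves references to them."""

    def __init__(self):
        self._labels = {}  # figure key -> "FIG. n", in registration order

    def register(self, key: str) -> str:
        """Register a figure once and return its label."""
        if key not in self._labels:
            self._labels[key] = f"FIG. {len(self._labels) + 1}"
        return self._labels[key]

    def inject(self, text: str) -> str:
        """Replace {{fig:key}} placeholders (hypothetical syntax) with labels."""
        def substitute(match):
            key = match.group(1)
            if key not in self._labels:
                raise KeyError(f"unregistered figure: {key}")
            return self._labels[key]
        return re.sub(r"\{\{fig:([\w-]+)\}\}", substitute, text)

    def unreferenced(self, text: str) -> list:
        """Labels that never appear in the text (FIG. 1 must not match FIG. 10)."""
        return [label for label in self._labels.values()
                if not re.search(re.escape(label) + r"(?!\d)", text)]


registry = FigureRegistry()
registry.register("architecture")
registry.register("data-flow")
description = registry.inject("The system, as shown in {{fig:architecture}}, ingests records.")
missing = registry.unreferenced(description)
```

Raising on an unregistered key and reporting unreferenced labels covers both directions of the mismatch: text that cites a figure that does not exist, and figures that the text never cites.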
The 47+ Patentable Discoveries: Prime Candidates
The discovery engine has identified 47+ potentially patentable discoveries. Not all are equally strong patent candidates: novelty, non-obviousness, and enablement vary significantly. The current classification by strength:
Prime Candidate: Wright-Fisher ↔ SGD Equivalence
The strongest patent candidate from the current discovery pool is the Wright-Fisher ↔ SGD equivalence: the mathematical equivalence between the Wright-Fisher model of genetic drift in population genetics and stochastic gradient descent in machine learning. Both describe the evolution of a probability distribution over discrete states under random perturbation with selection pressure.
Why this is the prime patent candidate: (1) It is genuinely novel: the connection exists in the literature as a heuristic observation but has not been formally stated as an equivalence theorem with explicit parameter mappings. (2) It is enabled: the PATO system has generated computational code that demonstrates the equivalence empirically and a mathematical derivation that establishes it formally. (3) It has clear applications: the equivalence enables transferring convergence proofs from population genetics to ML optimization, and vice versa, with specific utility for analyzing SGD behavior in finite-population training regimes.
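
For orientation, the standard textbook forms of the two processes can be written side by side. This is only the heuristic resemblance mentioned above, not the equivalence theorem or its parameter mapping, which the article does not reproduce. The symbols are assumptions chosen for this sketch: $p_t$ is an allele frequency, $N$ a haploid population size, $s$ a selection coefficient, $\theta_t$ the model parameters, $\eta$ the learning rate, $B$ the minibatch size, $\hat g_B$ the minibatch gradient, and $\Sigma$ the per-example gradient covariance.

```latex
% Wright-Fisher with selection (haploid, population size N)
\begin{align*}
N\,p_{t+1} &\sim \mathrm{Binomial}\!\left(N,\ \frac{(1+s)\,p_t}{1+s\,p_t}\right), \\
\mathbb{E}[\Delta p \mid p_t] &= \frac{s\,p_t(1-p_t)}{1+s\,p_t} \approx s\,p_t(1-p_t)
  \quad \text{for small } s, \\
\mathrm{Var}[\Delta p \mid p_t] &\approx \frac{p_t(1-p_t)}{N}.
\end{align*}

% Minibatch SGD (batch size B, examples sampled independently)
\begin{align*}
\theta_{t+1} &= \theta_t - \eta\,\hat g_B(\theta_t), \\
\mathbb{E}[\Delta\theta \mid \theta_t] &= -\eta\,\nabla L(\theta_t), \\
\mathrm{Cov}[\Delta\theta \mid \theta_t] &= \frac{\eta^2}{B}\,\Sigma(\theta_t).
\end{align*}
```

In both cases the update is a deterministic drift term (selection, or the gradient) plus zero-mean sampling noise whose variance shrinks with a size parameter ($N$ or $B$), which is the structural similarity a formal equivalence would have to make precise.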
The Inventor Paradox
Who is the inventor of a discovery generated autonomously by an AI system? This is not a rhetorical question: USPTO patent applications require a named inventor (a natural person), and the listed inventor must be the one who "conceived" the invention. An AI cannot be a legal inventor under current law.
PATO's resolution: the human researcher (Navin Dutta, ORCID: 0009-0002-2515-4922) is named as the inventor. The AI is acknowledged as a tool. This is the same relationship as a chemist named as inventor of a compound discovered using mass spectrometry: the instrument is a tool, not an inventor. The human's contribution was: defining the discovery domain, designing the discovery engine architecture, interpreting the AI's outputs, validating the discoveries, and making the inventive judgment about which discoveries are worth pursuing.
The PATO Ethical Assessment Framework
Every discovery entering the patent pipeline passes through a PATO ethical assessment before document generation begins. The assessment covers four dimensions:
Dual-Use Risk: Could the patented technology be used for harmful purposes? Mathematical and computational discoveries carry low dual-use risk. Biological and chemical discoveries require more careful assessment. A discovery about TREM2 microglial reprogramming (a potential Alzheimer's treatment mechanism) has clear beneficial application but requires assessment of whether the same mechanism could be exploited in other contexts.
Societal Impact: What is the expected societal impact of protecting this discovery with a patent? Patents restrict access for 20 years in exchange for public disclosure. For medical discoveries, this calculus is complex: patent protection funds the development investment but may restrict access in lower-income jurisdictions. PATO flags medical and public health discoveries for a heightened impact assessment.
Prior Art Honesty: Has the background section been honest about prior art? LLMs have a tendency to understate prior art to strengthen the novelty argument. PATO runs a secondary prior art search after document generation and flags any claims where the background may be understating existing work.
Enablement Validation: Is the disclosure sufficient to enable a person of ordinary skill in the art to practice the invention? PATO checks that computational code, mathematical derivations, and experimental data referenced in the claims are included in the disclosure. A patent that claims a result without enabling the practitioner to reproduce it is not valid.
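
The four dimensions can be captured in a single record that decides whether a discovery goes to human review. The sketch below is hypothetical: the `EthicalAssessment` type, its field names, and the rule that anything short of full enablement triggers review are assumptions. Only the four dimensions and the heightened treatment of medical and public health discoveries come from the article.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# From the article: medical and public health discoveries get heightened review.
HEIGHTENED_IMPACT = {"medical", "public health"}


@dataclass
class EthicalAssessment:
    dual_use: Risk
    impact_category: str                                  # e.g. "computational"
    prior_art_flags: list = field(default_factory=list)   # claims that may understate prior art
    enablement_pct: float = 100.0                         # share of referenced artifacts included

    def reasons_for_review(self, min_enablement: float = 100.0) -> list:
        """Return why a human should look at this discovery (empty list: none)."""
        reasons = []
        if self.dual_use is not Risk.LOW:
            reasons.append(f"dual-use risk is {self.dual_use.value}")
        if self.impact_category in HEIGHTENED_IMPACT:
            reasons.append(f"heightened impact category: {self.impact_category}")
        if self.prior_art_flags:
            reasons.append(f"{len(self.prior_art_flags)} prior art flag(s)")
        if self.enablement_pct < min_enablement:  # assumed threshold
            reasons.append(f"enablement only {self.enablement_pct:.0f}% complete")
        return reasons


clean = EthicalAssessment(Risk.LOW, "computational")
flagged = EthicalAssessment(Risk.MEDIUM, "medical", ["claim 3"], 80.0)
```

Returning the reasons rather than a single boolean gives the reviewer the context for each flag.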
The Discovery-to-Patent Pipeline
Discovery → Patent Pipeline (PATO)
────────────────────────────────────────────────────────────
Discovery Engine
│ generates → structured discovery record (MongoDB)
│             { title, abstract, claims[], code, math, figures }
▼
PATO Intake
│ validates → required fields present, confidence score
│ checks → patentability criteria (novelty, non-obviousness, utility)
▼
Ethical Assessment
│ dual-use → low/medium/high risk
│ societal → impact category
│ prior art → honesty check
│ enablement → completeness %
│ flag? → human review queue
▼
Document Generation (parallel)
├── Sections 1-6: LLM synthesis (Claude, premium)
├── Section 7: mermaid-cli → 6 PNG figures
├── Section 8: Claims formatter → hierarchical indentation
└── Section 9: Abstract reformatter (USPTO format)
▼
Figure Registry + Reference Injection
│ register figures → FIG.1, FIG.2, ... FIG.6
│ inject references → "as shown in FIG. 3..."
▼
Document Assembly + Validation
│ word counts → Abstract: 150-250 words ✓
│ claims structure → independent + dependent valid ✓
│ figure references → all figures referenced ✓
▼
Output: USPTO-Ready Patent Application
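
The intake and word-count checks at either end of the pipeline are simple to sketch. In the example below a plain dict stands in for the MongoDB discovery record. The required field names come from the record shape in the diagram, while the `confidence` field name, the 0.8 threshold, and the function names are assumptions. The 150-250 word range is the one the pipeline diagram states.

```python
REQUIRED_FIELDS = ("title", "abstract", "claims", "code", "math", "figures")


def intake_errors(record: dict, min_confidence: float = 0.8) -> list:
    """List the problems that would stop a record at PATO Intake.

    The "confidence" key and the default threshold are illustrative assumptions.
    """
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS
              if not record.get(name)]
    confidence = record.get("confidence")
    if confidence is None:
        errors.append("missing confidence score")
    elif confidence < min_confidence:
        errors.append(f"confidence {confidence} below {min_confidence}")
    return errors


def abstract_word_count_ok(abstract: str, low: int = 150, high: int = 250) -> bool:
    """Check the abstract length against the range used in the diagram."""
    return low <= len(abstract.split()) <= high


record = {
    "title": "Example discovery",
    "abstract": "Short summary.",
    "claims": ["A method for..."],
    "code": "print('demo')",
    "math": "x = y",
    "figures": ["flowchart TD; A-->B"],
    "confidence": 0.9,
}
problems = intake_errors(record)
```

Treating empty values as missing (via `record.get(name)`) means a record with an empty `claims` list is rejected at intake rather than producing a patent document with no claims.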
What "Autonomous" Actually Means in This Context
PATO is autonomous in the sense that it can generate a complete, structurally valid patent document from a discovery record without human intervention at any step of the generation process. It is not autonomous in the sense that it can file patents, make legal representations, or make final inventiveness determinations without human review.
The practical workflow: PATO generates the complete document. A patent attorney or registered agent reviews the generated document (which takes 30-60 minutes rather than the 40-60 hours required to draft from scratch). The attorney makes substantive legal judgments about claim scope, prosecution strategy, and filing timing. PATO provides the raw material; the attorney provides the legal judgment.
This human-in-the-loop structure is not a limitation of PATO; it is a feature of the legal system. Patent prosecution requires legal representation in most jurisdictions. PATO does not attempt to replace the attorney's legal judgment; it replaces the attorney's administrative work, reducing the cost of patent prosecution by an estimated 80-90% for well-characterized technical inventions.
The Long-Term IP Strategy
The patent pipeline is one component of a broader IP strategy. Mathematical methods, as such, are not patentable under USPTO doctrine (the abstract-idea exception to 35 U.S.C. § 101, applied under the Alice/Mayo framework). The patentable claims for mathematical discoveries must be framed as methods for computing, systems for processing, or applications to specific technical problems, not as abstract mathematical relationships.
The Wright-Fisher ↔ SGD equivalence, for example, is patentable not as "the equivalence between Wright-Fisher and SGD" (abstract math) but as "a method for analyzing convergence properties of stochastic gradient descent by applying Wright-Fisher model analysis techniques to the parameter distribution of a neural network" (a specific technical method with concrete applications). PATO's claims formatter is trained on this distinction and generates application-specific claims framing rather than abstract mathematical claims.
The 20-year protection window for any filed patents coincides with the expected development arc of the Profiled platform. IP protection in the behavioral intelligence domain (the 300-dimension DNA system, the milestone architecture, the semantic cache architecture) and in the discovery engine domain (the evolutionary research orchestrator, the multi-engine synthesis framework, the cross-domain hypothesis generation) provides competitive moats that are complementary to, and more durable than, first-mover advantages from market timing alone.