The Universal AI Router — Profiled Technical Blog

The Universal AI Router is the infrastructure layer that makes the platform's unit economics possible. It sits between every application feature and the AI models those features depend on, making real-time decisions about which model to use for each request based on task type, revenue generated, margin targets, model availability, and quality requirements. It is not a simple if-else routing table — it is an economic intelligence layer that optimizes the quality-cost tradeoff continuously across tens of thousands of requests.

The router has processed 12,000+ discovery queries plus the full volume of behavioral intelligence queries since deployment. It runs autonomously, logging margin health for every request, detecting margin degradation in real time, and providing routing recommendations without human intervention. The $0.02 average query cost and 70-90% gross margins documented in Article 25 are produced by this system operating continuously.

Models: Claude / Gemini / GPT-4o / GPT-4o-mini
12k+ Discovery Queries: processed autonomously
3-tier Fallback Chain: zero user interruption
Real-time Margin Logging: every request

Basic Usage: Task-Based Routing

The router's primary interface is task-based: the caller specifies what kind of task they are performing, and the router selects the appropriate model automatically. This hides all economic and quality decisions from the application code, which only needs to know the task type and the content to process.

services/unifiedAI.js — basic task-based routing JavaScript

const { generateText } = require('./services/unifiedAI');

// await is only valid inside an async function in a CommonJS module
async function main() {
  // Automatically uses task profile (life-composition → Claude)
  const result = await generateText(
    'life-composition',
    'Analyze this user profile...',
    { userId: '123' }
  );

  console.log(result.text);    // AI response
  console.log(result.model);   // 'claude'
  console.log(result.cost);    // { totalCost: 0.024 }
}

main().catch(console.error);

The task profile life-composition instructs the router to use Claude regardless of cost tier, because this task type has a quality requirement that cheaper models cannot reliably meet. The router returns not just the generated text but also the model used and the actual cost — providing full transparency into every routing decision for downstream logging and analysis.

The cost object returned is not approximate. It is computed from the actual token count of the request and response, multiplied by the exact per-token pricing of the model used, plus any overhead (embedding lookups, cache checks). This enables the margin health calculation to be exact rather than estimated.
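The cost calculation described above can be sketched in a few lines. This is a minimal illustration, not the router's actual code: the `PRICING` table, its per-token prices, the `computeCost` name, and the shape of the `usage` object are all assumptions made for the example.

```javascript
// Hypothetical per-token prices in USD. Real provider pricing differs and changes over time.
const PRICING = {
  claude: { inputPerToken: 3e-6, outputPerToken: 15e-6 },
  gemini: { inputPerToken: 1e-7, outputPerToken: 4e-7 },
};

// Cost = input tokens x input price + output tokens x output price + overhead.
// `overheadCost` stands in for extras such as embedding lookups or cache checks.
function computeCost(model, usage, overheadCost = 0) {
  const price = PRICING[model];
  if (!price) throw new Error(`No pricing configured for model: ${model}`);
  const inputCost = usage.inputTokens * price.inputPerToken;
  const outputCost = usage.outputTokens * price.outputPerToken;
  return {
    inputCost,
    outputCost,
    overheadCost,
    totalCost: inputCost + outputCost + overheadCost,
  };
}

const cost = computeCost('claude', { inputTokens: 2000, outputTokens: 1000 }, 0.001);
console.log(cost.totalCost.toFixed(4));
```

Keeping the input, output, and overhead components separate in the returned object makes it easier to see later which component moved when a margin alert fires.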

Revenue-Based Routing

The more sophisticated routing mode incorporates real-time revenue data. When the caller specifies a featureName that maps to a gem cost, the router applies the economic routing table from Article 25 to select the model that maximizes margin at that revenue level.

services/unifiedAI.js — revenue-based routing with margin health JavaScript

const result = await generateText(
  'quest-generation',
  'Generate a quest...',
  {
    userId: '123',
    featureName: 'quest-analysis',  // 25 gems = efficient tier
    metadata: { questType: 'story' }
  }
);

console.log(result.marginHealth);
// {
//   healthy: true,
//   margin: 87.2,
//   targetMargin: 85,
//   profit: 2.18,
//   revenue: 2.50,
//   cost: 0.32,
//   modelUsed: 'gemini',
//   recommendation: 'Margin healthy - maintain current model selection'
// }

The featureName 'quest-analysis' maps to 25 gems in the gem-to-revenue configuration table. 25 gems corresponds to $2.50 of revenue, which falls in the efficient tier: Gemini as the primary model with an 85% target margin. The router selects Gemini, generates the response, computes the actual cost ($0.32 in this example), calculates the achieved margin ((2.50 - 0.32) / 2.50 = 87.2%), compares it against the target (85%), and returns the full margin health object alongside the generated content.

The recommendation field is actionable: "Margin healthy - maintain current model selection" means no routing adjustment is needed. If the recommendation were "Margin below target - switch to GPT-4o-mini for this task type", the alert system would log this for the operations team and potentially trigger an automatic routing configuration update if the degradation persisted across multiple requests.
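As a rough sketch of how a margin health object like this could be derived, assuming margins are expressed in percent and rounded to one decimal place: the `computeMarginHealth` function and the wording of the below-target recommendation are hypothetical, while the field names follow the example output above.

```javascript
// Sketch only: derives a margin health record from revenue, cost, and a target margin.
function computeMarginHealth({ revenue, cost, targetMargin, modelUsed }) {
  if (revenue <= 0) throw new Error('revenue must be positive');
  const profit = revenue - cost;
  // margin = (revenue - cost) / revenue, as a percentage with one decimal
  const margin = Math.round((profit / revenue) * 1000) / 10;
  const healthy = margin >= targetMargin;
  return {
    healthy,
    margin,
    targetMargin,
    profit: Math.round(profit * 100) / 100, // rounded to cents
    revenue,
    cost,
    modelUsed,
    // The below-target wording here is an assumption for illustration.
    recommendation: healthy
      ? 'Margin healthy - maintain current model selection'
      : 'Margin below target - review model selection for this task type',
  };
}

const health = computeMarginHealth({
  revenue: 2.5,
  cost: 0.32,
  targetMargin: 85,
  modelUsed: 'gemini',
});
console.log(health);
```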

The Four Models and Their Positioning

The router maintains a model registry with quality and cost profiles for each available model. Understanding the positioning of each model explains the routing decisions:

Model	Positioning	Best For	Target Margin
Claude Sonnet 4	Complex reasoning, premium quality	Life composition, discovery analysis, high-sensitivity	70%
Gemini 2.0 Flash	Cost-effective, high-volume	Quest generation, story content, behavioral analysis	80-90%
GPT-4o	Balanced performance, reliable fallback	Mid-complexity tasks, reliable fallback when Gemini fails	75%
GPT-4o-mini	Budget tasks, maximum margin	High-volume simple generation, budget tier queries	90%

The 70% target margin for Claude is not an indictment of Claude's pricing — it reflects the revenue-quality tradeoff. Claude is used for the highest-revenue queries (100+ gems, $10+ revenue) where 70% margin on $10 = $7 profit per query. GPT-4o-mini achieves 90% margin but on lower-revenue queries: 90% margin on $2.50 = $2.25 profit per query. Claude generates higher absolute profit per query; GPT-4o-mini generates higher percentage margin.

Task Profiles: Quality-Required Overrides

Some task types always use specific models regardless of the revenue tier that would be implied by their gem cost. These task profile overrides exist because quality degradation on certain tasks is unacceptable regardless of economic optimization:

Task Profile Overrides

life-composition → Claude always. High sensitivity, trust-critical. Quality difference between Claude and Gemini Flash is perceptible and affects user trust.

quest-generation → Economic routing. High volume, quality is acceptable from Gemini at this task type.

discovery-analysis → Claude always. High mathematical complexity. Cheaper models produce structurally different (and lower quality) mathematical reasoning.

story-generation → Gemini primary. High volume, cache-friendly, quality acceptable.

interview-assessment → Claude always. High sensitivity, assessment validity at risk with cheaper models.

The task profile override system addresses a known failure mode of purely revenue-based routing: some tasks have quality thresholds below which the output is harmful to the user relationship regardless of the margin it achieves. A poorly performed life composition analysis that makes the user feel misunderstood generates a churn risk that costs far more than the margin saved by routing it to GPT-4o-mini. The task profile system encodes these quality floors as hard constraints that the economic optimizer cannot override.
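One way to picture the override list is as a lookup table consulted before any economic decision. The sketch below is illustrative: the `TASK_PROFILES` structure, the `resolveModel` helper, and the model identifier strings are assumptions, and treating "Gemini primary" for story-generation as a fixed assignment is a simplification.

```javascript
// Task profiles from the list above. A non-null `model` acts as a hard constraint.
const TASK_PROFILES = {
  'life-composition':     { model: 'claude', reason: 'high sensitivity, trust-critical' },
  'discovery-analysis':   { model: 'claude', reason: 'high mathematical complexity' },
  'interview-assessment': { model: 'claude', reason: 'assessment validity' },
  'story-generation':     { model: 'gemini', reason: 'high volume, cache-friendly' },
  'quest-generation':     { model: null,     reason: 'economic routing applies' },
};

// Returns the override model when one exists; otherwise defers to the
// model proposed by the economic optimizer.
function resolveModel(taskType, economicChoice) {
  const profile = TASK_PROFILES[taskType];
  return profile && profile.model ? profile.model : economicChoice;
}

console.log(resolveModel('life-composition', 'gpt-4o-mini')); // 'claude'
console.log(resolveModel('quest-generation', 'gemini'));      // 'gemini'
```

Because the override is resolved first, the economic optimizer's proposal is ignored for constrained task types rather than merely weighted against quality.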

The Routing Decision Tree

The router executes a specific decision sequence for every incoming request. Understanding the decision tree reveals the architecture's priorities:

Incoming Request
       ↓
1. CHECK TASK PROFILE OVERRIDE
   → Does this task type always use a specific model?
   → If yes: assign model, skip to step 4
   ↓
2. CHECK GEM COST → REVENUE TIER
   → featureName → gem cost lookup → revenue tier
   → revenue tier → economic routing table → candidate model
   ↓
3. APPLY ECONOMIC ROUTING
   → 100+ gems → Claude (70% target margin)
   → 50-99 gems → Gemini (80% target margin)
   → 25-49 gems → Gemini → GPT-4o-mini (85% target)
   → <25 gems   → Gemini → GPT-4o-mini (90% target)
   ↓
4. ATTEMPT PRIMARY MODEL
   → Success: generate response
   → Failure (timeout/rate limit/error): try fallback
   ↓
5. ATTEMPT FALLBACK MODEL
   → Success: generate response (log fallback event)
   → Failure: try emergency model
   ↓
6. ATTEMPT EMERGENCY MODEL
   → Success: generate response (log double-fallback event)
   → Failure: return error (should never reach this)
   ↓
7. COMPUTE MARGIN HEALTH
   → actual cost from token counts × model pricing
   → margin = (revenue - cost) / revenue
   → compare to target margin for tier
   → generate recommendation
   ↓
8. RETURN RESULT + MARGIN HEALTH

The decision tree's ordering is important: task profile overrides take precedence over economic routing because quality floors are harder constraints than margin targets. Economic routing applies when no override exists. Fallback chains activate when primary selection fails. Margin health is always computed and always logged, regardless of which model was ultimately used.
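Steps 1 through 3 of the tree can be sketched as a single selection function. This is a hedged illustration rather than the router's implementation: the table names, the `selectRoute` and `tierForGems` helpers, and the model identifier strings are assumptions. The tier thresholds and target margins come from step 3, and the only gem cost included is the one the article states.

```javascript
// Tiers from step 3, ordered from highest gem cost down.
const ROUTING_TIERS = [
  { minGems: 100, primary: 'claude', targetMargin: 70 },
  { minGems: 50,  primary: 'gemini', targetMargin: 80 },
  { minGems: 25,  primary: 'gemini', targetMargin: 85 },
  { minGems: 0,   primary: 'gemini', targetMargin: 90 },
];

// Illustrative subsets of the real configuration tables.
const TASK_OVERRIDES = { 'life-composition': 'claude' };
const FEATURE_GEM_COSTS = { 'quest-analysis': 25 };

// First tier whose minimum the gem cost meets (tiers are sorted descending).
function tierForGems(gems) {
  return ROUTING_TIERS.find((tier) => gems >= tier.minGems);
}

function selectRoute(taskType, featureName) {
  // Step 1: a task profile override wins outright.
  const override = TASK_OVERRIDES[taskType];
  if (override) return { model: override, source: 'task-profile' };

  // Step 2: feature name -> gem cost.
  const gems = FEATURE_GEM_COSTS[featureName];
  if (gems === undefined) throw new Error(`No gem cost for feature: ${featureName}`);

  // Step 3: gem cost -> tier -> candidate model and target margin.
  const tier = tierForGems(gems);
  return { model: tier.primary, source: 'economic', gems, targetMargin: tier.targetMargin };
}

console.log(selectRoute('quest-generation', 'quest-analysis'));
console.log(selectRoute('life-composition', 'quest-analysis'));
```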

Fallback Mechanics: Preserving Quality Under Failure

The fallback chain is designed to maintain quality as much as possible when the primary model fails. This means fallback selection is not simply "try the next cheapest model" — it is "try the model that best preserves quality given that the primary has failed."

For life-composition (Claude primary): the fallback is Gemini rather than GPT-4o because, on analytical reasoning tasks, Gemini is closer in quality to Claude than GPT-4o is. The user experience degrades less when Claude fails and Gemini serves the request than when Claude fails and GPT-4o serves it.

For quest-generation (Gemini primary): the fallback is GPT-4o-mini because this task type's quality difference between Gemini and GPT-4o-mini is small — both models produce acceptable quest narrative content. The economic optimization appropriately drives the fallback selection toward the cheaper option when quality difference is minimal.

Double-Fallback Events

When both primary and fallback fail and the emergency model serves the request, this is logged as a double-fallback event and treated as a high-priority incident. Double-fallback events are rare in practice (the probability of two independent model failures in the same request window is low) but their occurrence indicates systemic infrastructure issues that require immediate investigation — either a widespread API outage affecting multiple providers or a misconfiguration in the fallback chain itself.
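The primary, fallback, and emergency sequence can be sketched as a loop over an ordered chain. Everything named here is hypothetical: `generateWithFallback`, the `callModel(model, prompt)` provider function (assumed to throw on timeout, rate limit, or error), the log messages, and the choice of emergency model in the usage example.

```javascript
// Tries each model in the chain in order and reports how deep the fallback went.
async function generateWithFallback(chain, prompt, callModel, log = console.warn) {
  const attempts = [];
  for (let depth = 0; depth < chain.length; depth++) {
    const model = chain[depth];
    try {
      const text = await callModel(model, prompt);
      if (depth === 1) log(`fallback event: ${chain[0]} -> ${model}`);
      if (depth >= 2) log(`double-fallback event: served by ${model}`);
      return { text, model, fallbackDepth: depth };
    } catch (err) {
      // Record the failure and move on to the next model in the chain.
      attempts.push({ model, message: err.message });
    }
  }
  const failure = new Error('All models in the fallback chain failed');
  failure.attempts = attempts;
  throw failure;
}

// Stub provider for illustration: 'claude' always times out here.
const stubCall = async (model) => {
  if (model === 'claude') throw new Error('timeout');
  return `[${model}] response`;
};

generateWithFallback(['claude', 'gemini', 'gpt-4o'], 'Analyze...', stubCall)
  .then((result) => console.log(result.model, result.fallbackDepth));
```

Returning `fallbackDepth` alongside the response lets the caller log fallback and double-fallback events without inspecting provider errors directly.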

Real-Time Margin Monitoring

The margin health system provides continuous visibility into the routing system's economic performance. Every request generates a margin health record. The aggregated stream of these records allows the operations system to detect three types of problematic patterns:

Margin degradation: When actual margins drop below target margins for 3+ consecutive requests on a given task type, an alert is generated. Common causes: model pricing changes (API costs increased), token inflation (prompts have grown longer), or task complexity increase (user queries for this task type have become more sophisticated, requiring more tokens to handle). The alert initiates an investigation into which cause is responsible.

Margin improvement: When actual margins significantly exceed target margins for a task type, this is a signal that the routing configuration may be too conservative — using a more expensive model than necessary for the achieved quality. The recommendation engine flags this for potential routing adjustment toward cheaper models that may achieve equivalent quality.

Fallback frequency: When fallback activation rate for a specific primary model exceeds a threshold, this is a reliability signal. Frequent fallbacks indicate that the primary model is experiencing elevated error rates — potentially a service quality issue that merits reporting to the model provider or adjustment of the primary selection.

services/unifiedAI.js — margin alert detection JavaScript

// Auto-alerts when margin degrades
if (consecutiveBelowTarget >= 3) {
  alertSystem.trigger({
    type: 'MARGIN_DEGRADATION',
    taskType: currentTask,
    targetMargin: routingConfig[currentTask].targetMargin,
    actualMargin: averageRecentMargin,
    consecutiveCount: consecutiveBelowTarget,
    action: 'INVESTIGATE_COST_SOURCES'
  });
}

// Auto-recommendation when margin over-achieves
// marginSurplus: how far the recent average margin sits above target, in percentage points
const marginSurplus = averageRecentMargin - routingConfig[currentTask].targetMargin;
if (marginSurplus > OVERACHIEVE_THRESHOLD) {
  recommendationEngine.suggest({
    type: 'ROUTING_OPTIMIZATION',
    suggestion: `Consider downtiering ${currentTask} to ${cheaperModel}`,
    projectedSavings: estimateAnnualSavings(currentTask, cheaperModel)
  });
}
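The snippet above relies on a per-task count of consecutive below-target requests. One plausible way to maintain that counter is sketched below. The `MarginStreakTracker` class is hypothetical; it assumes a streak resets as soon as a single request meets its target.

```javascript
// Tracks consecutive below-target requests per task type.
class MarginStreakTracker {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.streaks = new Map(); // taskType -> consecutive below-target count
  }

  // Records one margin result. Returns true while the streak is at or above
  // the threshold, which mirrors the `>= 3` check in the alert code.
  record(taskType, margin, targetMargin) {
    const previous = this.streaks.get(taskType) || 0;
    const next = margin < targetMargin ? previous + 1 : 0;
    this.streaks.set(taskType, next);
    return next >= this.threshold;
  }
}

const tracker = new MarginStreakTracker(3);
console.log(tracker.record('quest-generation', 84.0, 85)); // false
console.log(tracker.record('quest-generation', 83.5, 85)); // false
console.log(tracker.record('quest-generation', 82.9, 85)); // true
console.log(tracker.record('quest-generation', 86.1, 85)); // false (streak reset)
```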

Why Not Always Use the Cheapest Model

The routing system's sophistication is not in making routing cheap — it is in knowing when cheap routing is acceptable and when it is not. The naive approach (always use GPT-4o-mini, achieve 90% margins on everything) fails for specific, measurable reasons.

Quality degradation on high-sensitivity tasks is the primary failure mode. A life composition analysis performed by GPT-4o-mini produces a response that is noticeably less nuanced, less emotionally attuned, and less precisely calibrated to the behavioral profile context than a Claude response. Users can detect this difference. In a platform whose core value proposition is "the system knows you better than anyone," a perceptibly degraded life composition analysis is not just a quality issue — it is a trust issue that damages the 100-interaction relationship arc.

"The economic routing targets the optimal quality-cost tradeoff per task type, not the minimum cost. Some quality is worth paying for. The routing system knows which quality and when."

Mathematical reasoning quality also degrades significantly on cheaper models. Discovery analysis queries — asking a model to reason about whether a proposed mathematical connection holds, what its implications are, and how it relates to known results — produce qualitatively different outputs from Claude vs. GPT-4o-mini. The difference is not just in surface fluency; it is in logical structure, appropriate hedging about uncertainty, and the identification of potential counterexamples. For the discovery engine's analytical work, this quality difference is not recoverable through prompt engineering or post-processing.

Performance at Scale: 12,000+ Discovery Queries

The 12,000+ discovery queries figure provides a baseline for understanding the router's production performance. Discovery queries are among the most expensive query types in the system: they involve complex reasoning chains, long context windows (literature retrieved from arXiv and PubMed), and Claude as the primary model. Yet the average cost across the full mixed query volume (discovery + behavioral) remains $0.02 because the behavioral query volume is much larger than the discovery query volume, and the behavioral queries have much higher cache hit rates and lower model costs.
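The volume-weighting argument can be made concrete with a small helper. The volumes and per-query costs in the usage example are hypothetical placeholders chosen only to show the effect. They are not the platform's measured figures, and `blendedAverageCost` is an illustrative name.

```javascript
// Volume-weighted average cost across query segments.
function blendedAverageCost(segments) {
  const totalQueries = segments.reduce((sum, s) => sum + s.count, 0);
  if (totalQueries === 0) return 0;
  const totalCost = segments.reduce((sum, s) => sum + s.count * s.avgCost, 0);
  return totalCost / totalQueries;
}

// Hypothetical inputs: a small expensive segment and a large cheap one.
const blended = blendedAverageCost([
  { name: 'discovery',  count: 1000,  avgCost: 0.40 },
  { name: 'behavioral', count: 99000, avgCost: 0.01 },
]);
console.log(blended.toFixed(4));
```

With these placeholder inputs the blended figure sits far closer to the cheap segment's cost than to the expensive one, which is the mechanism the paragraph above describes.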

The router's architecture ensures that the expensive discovery queries do not contaminate the economics of the cheaper behavioral queries. Each query type's margin health is tracked independently. The discovery engine's economics are evaluated on their own terms (justified by the long-term discovery value they produce), separate from the behavioral intelligence economics (evaluated on immediate revenue and margin metrics).

Claude for Discovery Queries: always, quality critical
Gemini for Behavioral Queries: primary, volume optimized
$0.02 Blended Average: across both query types

Evolution of the Routing Configuration

The routing configuration is not static. Model pricing changes. New models become available. Task complexity distributions shift as the user base grows. The router is designed to accommodate these changes through a configuration layer that can be updated without code changes — routing rules, margin targets, task profile overrides, and fallback chains are all configurable.

The margin health system's continuous monitoring provides the data needed to inform configuration updates. When a new model becomes available, the router can be configured to A/B test it against the current primary for specific task types, comparing margin health and user engagement metrics before promoting it to primary status. This experimental routing capability means model updates do not require engineering deploys — they require configuration changes and a monitoring period.
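A configuration-driven experiment of this kind could look roughly like the sketch below. The config shape, the `experiment` field, the `candidate-model` placeholder, and the hashing scheme are all assumptions for illustration. The point is only that the traffic split lives in data rather than in code.

```javascript
// Hypothetical routing config. In practice this would be loaded from a config store.
const routingConfig = {
  'story-generation': {
    primary: 'gemini',
    experiment: { candidate: 'candidate-model', trafficPercent: 10 },
  },
};

// Deterministic bucket in [0, 100) derived from a string, so the same user
// always lands in the same experiment arm for a given task type.
function bucketFor(key) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash % 100;
}

function pickModel(taskType, userId, config = routingConfig) {
  const route = config[taskType];
  if (!route) throw new Error(`No route configured for task: ${taskType}`);
  const exp = route.experiment;
  if (exp && bucketFor(`${taskType}:${userId}`) < exp.trafficPercent) {
    return { model: exp.candidate, arm: 'candidate' };
  }
  return { model: route.primary, arm: 'control' };
}

console.log(pickModel('story-generation', '123'));
```

Promoting the candidate then amounts to editing `primary` and removing the `experiment` entry, and the `arm` label gives margin health records something to be grouped by during the monitoring period.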

The long-term trajectory for the routing system mirrors the broader platform economics: as semantic cache density increases and behavioral profile depth grows, the fraction of queries that require model inference decreases. The router's role becomes progressively less about selecting between models and more about confirming that cached responses remain valid — a dramatically cheaper operation. The routing system is built to adapt to this evolution without architectural changes.

The Router's Core Promise

Every request gets the best model that the economics justify at that revenue level, with automatic fallback if that model fails, with full margin transparency for every decision, running autonomously without human intervention. This is not a feature — it is the economic infrastructure that makes the platform's unit economics defensible at scale.

The Universal AI Router processes tens of thousands of requests per day without a single manual routing decision. It has maintained the $0.02 average cost through six months of production use across two product lines with different economic profiles, different quality requirements, and different query volume distributions. The router is not the most visible part of the Profiled platform — but it is the part that makes everything else financially viable.