Dynamic Profiling Runtime: Automatic Instrumentation at 1% Overhead

There are two kinds of performance profilers. The first kind requires you to wrap the code sections you want to measure — you identify the function, add instrumentation before and after, and collect data only from the places you thought to look. The second kind instruments everything automatically in production, requires no code changes, and costs less than 1% overhead. Component 7 of the RSI Safety System is the second kind, and the distinction matters more than it first appears.

When an AI system modifies its own code — which is what RSI (Recursive Self-Improvement) does — it cannot know in advance which functions will be affected by a modification. A change to a data transformation utility might slow down every endpoint that touches user profiles. Static profiling cannot catch this because no developer thought to instrument those endpoints before the modification. Dynamic profiling catches it because it instruments everything, continuously, in production.

"KEY INNOVATION: Unlike Performance Profiling (which profiles code sections manually), Dynamic Profiling automatically instruments production traffic in real-time with minimal overhead."

What Component 7 Is

Dynamic Profiling is Component 7 of the RSI Safety System, implemented in 668 lines and passing all 12 of its tests (100% pass rate). It is part of the broader RSI pre-flight checklist that runs before any autonomous code modification is permitted to proceed. The component runs continuously in production, maintaining a live performance baseline across all endpoints and functions. When RSI proposes a modification, the profiler's current state becomes the "before" snapshot. After modification, the profiler's updated state becomes the "after" snapshot. A regression of 20% or more is reported and holds up further modifications until it is resolved or, depending on its severity, explicitly acknowledged.

Lines of code: 668 (Component 7)
Tests passing: 12/12 (100% pass rate)
Sampling rate: 1% (<1% overhead)
Regression threshold: 20% (blocks RSI)

The 8 API Endpoints

The dynamic profiler exposes a complete management API under the /api/code-intelligence/dynamic-profiling/ namespace. All eight endpoints are available:

POST   /api/code-intelligence/dynamic-profiling/start
POST   /api/code-intelligence/dynamic-profiling/stop
GET    /api/code-intelligence/dynamic-profiling/stats
GET    /api/code-intelligence/dynamic-profiling/anomalies
GET    /api/code-intelligence/dynamic-profiling/functions
GET    /api/code-intelligence/dynamic-profiling/endpoints
GET    /api/code-intelligence/dynamic-profiling/regressions
DELETE /api/code-intelligence/dynamic-profiling/data

The /start and /stop endpoints control the async hook instrumentation, though in practice the profiler runs continuously. The /stats endpoint returns the current session summary. The /anomalies endpoint returns functions with executions beyond their 3-sigma threshold. The /regressions endpoint returns functions where recent performance has degraded more than 20% from the established baseline. The /endpoints endpoint returns per-endpoint metrics broken down by request dimensions. The DELETE /data endpoint clears all collected data, which is useful at the start of a clean profiling session.
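A minimal client sketch for the eight endpoints follows. Only the HTTP methods and paths come from the list above; the `createProfilerClient` helper, the base URL, and the assumption that every endpoint answers with a JSON body are illustrative. The `fetchImpl` parameter is injected so the sketch can be exercised against a stub.

```javascript
// Hypothetical client for the dynamic-profiling API.
// Methods and paths are taken from the endpoint list; everything else is assumed.
const NAMESPACE = '/api/code-intelligence/dynamic-profiling';

const ROUTES = {
  start: ['POST', '/start'],
  stop: ['POST', '/stop'],
  stats: ['GET', '/stats'],
  anomalies: ['GET', '/anomalies'],
  functions: ['GET', '/functions'],
  endpoints: ['GET', '/endpoints'],
  regressions: ['GET', '/regressions'],
  clearData: ['DELETE', '/data'],
};

function createProfilerClient(baseUrl, fetchImpl = globalThis.fetch) {
  const client = {};
  for (const [name, [method, path]] of Object.entries(ROUTES)) {
    client[name] = async () => {
      const response = await fetchImpl(`${baseUrl}${NAMESPACE}${path}`, { method });
      if (!response.ok) {
        throw new Error(`${method} ${path} failed with status ${response.status}`);
      }
      return response.json(); // assumes the endpoint returns JSON
    };
  }
  return client;
}
```

A caller would then write something like `const regressions = await createProfilerClient('http://localhost:3000').regressions();`, where the host and port are placeholders.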

Test Evidence: What the Profiler Measured

The test suite for Component 7 is not a set of mock assertions — it runs the profiler against actual function executions and verifies the numerical outputs. The results, preserved exactly:

✅ Dynamic profiler initialized with async hooks
✅ Started profiling with 1% sampling rate
✅ Tracked 2 requests from 100 (sampling working)
✅ Tracked 3 functions: regressionTest (106ms), slowFunction (100ms), fastFunction (5ms)
✅ Detected 1 anomaly (slow function >3σ from mean)
✅ Detected 1 regression (45.6% slower: 108ms → 157ms)
✅ Profiling session: 5.3s duration
✅ Data cleared successfully

The sampling test is particularly informative: 100 requests were generated, and 2 were tracked. At a 1% sampling rate the expected count is 1, so tracking 2 of 100 is within the normal random variation of a 1% sampler (under a simple binomial model it happens roughly 18% of the time). The result shows that the sampler profiles only a small fraction of requests. It does not by itself measure the overhead added to non-sampled requests; that argument is made separately in the overhead discussion.
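The variation in the sampled count can be checked with a short calculation. The sketch below assumes each request is sampled independently with probability equal to the sampling rate; the source does not say how the profiler actually selects requests, and both function names are hypothetical.

```javascript
// Decide whether to profile a request. `random` is injectable for testing.
function shouldSample(samplingRate, random = Math.random) {
  return random() < samplingRate;
}

// Binomial probability of exactly k sampled requests out of n,
// assuming independent sampling decisions.
function probabilityOfKSampled(n, k, samplingRate) {
  let combinations = 1;
  for (let i = 1; i <= k; i++) {
    combinations = (combinations * (n - k + i)) / i;
  }
  return combinations * samplingRate ** k * (1 - samplingRate) ** (n - k);
}

const expectedSampled = 100 * 0.01;                 // 1 request on average
const pTwo = probabilityOfKSampled(100, 2, 0.01);   // about 0.185
const pZero = probabilityOfKSampled(100, 0, 0.01);  // about 0.366
```

Under this model, a run of 100 requests most often yields zero or one sampled request, and two is an unremarkable outcome.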

Profiling Output: What the Numbers Mean

The profiling summary output for the test run, preserved exactly:

dynamic-profiling API — /stats response Text

Profiling Summary:
  Anomalies: 1
  Functions profiled: 3
  Regressions detected: 1

Top Slow Functions:
  1. regressionTest: 106.74ms avg
  2. slowFunction: 100.00ms avg

Detected Regressions:
  1. [MEDIUM] regressionTest
     Change: 45.6% (108ms → 157ms)

The regression is classified as MEDIUM severity. The classification scale is CRITICAL (>50% regression), HIGH (35-50%), MEDIUM (20-35%), LOW (10-20%), NONE (<10%). A MEDIUM regression does not immediately block RSI — it triggers a warning and requires explicit acknowledgment before the next modification cycle. CRITICAL and HIGH regressions block RSI automatically.

The 45.6% regression in regressionTest, from 108ms to 157ms, is labeled MEDIUM in the test output, although 45.6% falls inside the 35-50% band listed above for HIGH; the label is reproduced here as recorded. In production, a regression like this would appear in the /regressions endpoint, and RSI's pre-modification checklist would flag it as a pending performance issue that must be addressed before further code changes are permitted in that service boundary.
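The severity bands can be expressed as a small classifier. This is a sketch rather than the profiler's actual code: the function names are hypothetical, and treating each band's lower bound as inclusive is an assumption, since the text does not say how boundary values such as exactly 20% or 35% are handled.

```javascript
// Percent slowdown of currentMs relative to baselineMs.
function percentChange(baselineMs, currentMs) {
  return ((currentMs - baselineMs) / baselineMs) * 100;
}

// Bands as listed in the text: CRITICAL (>50), HIGH (35-50),
// MEDIUM (20-35), LOW (10-20), NONE (<10).
// Inclusive lower bounds are an assumption.
function classifySeverity(regressionPercent) {
  if (regressionPercent > 50) return 'CRITICAL';
  if (regressionPercent >= 35) return 'HIGH';
  if (regressionPercent >= 20) return 'MEDIUM';
  if (regressionPercent >= 10) return 'LOW';
  return 'NONE';
}

// Per the text, only CRITICAL and HIGH block RSI automatically.
function blocksRsiAutomatically(severity) {
  return severity === 'CRITICAL' || severity === 'HIGH';
}
```

With these bands, `classifySeverity(25)` returns `'MEDIUM'` and `classifySeverity(45.6)` returns `'HIGH'`; only the latter makes `blocksRsiAutomatically` return true.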

How Async Hooks Work

The profiler's core mechanism is Node.js's built-in async_hooks module. Understanding how async hooks work explains why the profiler's measurements reflect wall-clock latency, which is the quantity its P95/P99 estimates are built on.

In Node.js, every asynchronous resource (a Promise, a setTimeout, a setImmediate, a network socket, a file handle) is assigned a unique ID when it is created. The async_hooks module provides lifecycle callbacks for these resources: init (resource created), before (about to execute callback), after (callback executed), and destroy (resource destroyed), plus promiseResolve, which fires when a Promise is resolved.
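The lifecycle callbacks can be observed directly with `async_hooks.createHook`. The sketch below is not the profiler's code; `tracePromiseLifecycle` is a hypothetical helper that records which callbacks fire while a single Promise is created and resolved. Besides the four callbacks named above, Node's API also accepts a `promiseResolve` callback, which is included here. Events are pushed to an array rather than logged, because `console.log` is itself asynchronous and would re-enter the hooks.

```javascript
const { createHook } = require('node:async_hooks');

// Record the lifecycle events emitted while one Promise is created and resolved.
function tracePromiseLifecycle() {
  const events = [];
  const hook = createHook({
    init(asyncId, type) { events.push(`init:${type}`); },
    before() { events.push('before'); },
    after() { events.push('after'); },
    destroy() { events.push('destroy'); },
    promiseResolve() { events.push('promiseResolve'); },
  });

  hook.enable();
  let resolvePromise;
  new Promise((resolve) => { resolvePromise = resolve; }); // fires init
  resolvePromise('done');                                  // fires promiseResolve
  hook.disable();

  return events;
}

const lifecycleEvents = tracePromiseLifecycle();
```

In this synchronous window the trace contains the Promise's `init` and `promiseResolve` events. No `destroy` event appears, since for Promises that callback runs only after garbage collection.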

Why Async Hooks Capture Total Time

Traditional profilers measure CPU time: how long the CPU is actually executing code. For async functions, CPU time is a small fraction of wall-clock time. A function that awaits a database query for 80ms uses almost no CPU during that wait, so the query delay is invisible to CPU profiling. Async hooks can measure wall-clock time including all async waits because the start timestamp is taken when the async resource is created and the end timestamp is taken when it completes, capturing the full elapsed duration including all I/O waits.

The profiler hooks into these IDs to measure total elapsed time including all awaited operations. When a function creates a Promise, the profiler records its init timestamp. When the Promise completes, the profiler computes the total elapsed time. For Promises, Node.js signals resolution through the promiseResolve callback; destroy fires only after the Promise has been garbage collected, so it marks cleanup rather than completion. The measurement captures the full wall-clock duration of the function: not just CPU time, but all I/O waits, database query times, network round-trips, and cache misses.
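The timestamp bookkeeping described here can be sketched as a map from async resource ID to start time. `ElapsedTimeTracker` is a hypothetical name, and which hook callback should call `onComplete` is left open as an assumption. The clock is injectable so the arithmetic can be checked without real waits.

```javascript
// Tracks wall-clock duration per async resource ID.
// Intended to be driven from async hook callbacks (an assumption of this sketch).
class ElapsedTimeTracker {
  constructor(nowNs = () => process.hrtime.bigint()) {
    this.nowNs = nowNs;            // monotonic clock in nanoseconds (BigInt)
    this.startTimes = new Map();   // asyncId -> start timestamp
  }

  // Call when the resource is created.
  onInit(asyncId) {
    this.startTimes.set(asyncId, this.nowNs());
  }

  // Call when the resource completes. Returns elapsed milliseconds,
  // or undefined if the resource was never seen.
  onComplete(asyncId) {
    const startedAt = this.startTimes.get(asyncId);
    if (startedAt === undefined) return undefined;
    this.startTimes.delete(asyncId); // avoid unbounded growth
    return Number(this.nowNs() - startedAt) / 1e6;
  }
}
```

Because the clock is read at creation and at completion, the 80ms spent awaiting a database query is included in the result even though almost no CPU time was used.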

Why 1% Sampling Gives Accurate P95/P99

A natural concern with 1% sampling is statistical accuracy. If only 1 in 100 requests is profiled, can the P95 and P99 latency estimates be trusted? The answer requires understanding how percentile estimation works at scale.

At 100 requests per second, a modest production load, the profiler is sampling 1 request per second. Over one minute, that is 60 sampled requests. Over one hour, 3,600 sampled requests. P99 estimation requires approximately 200 samples to achieve ±20% accuracy at the 95% confidence level. At 1% sampling, we reach that sample size in about 3.3 minutes (200 seconds) of operation at 100 requests/second.
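The collection time follows from dividing the target sample count by the sampled request rate. The helper below is a hypothetical illustration of that arithmetic, using the 200-sample target stated above.

```javascript
// Seconds needed to collect `targetSamples` at a given traffic level.
function secondsToCollect(targetSamples, requestsPerSecond, samplingRate) {
  return targetSamples / (requestsPerSecond * samplingRate);
}

const at100 = secondsToCollect(200, 100, 0.01);   // about 200 s (3.3 minutes)
const at1000 = secondsToCollect(200, 1000, 0.01); // about 20 s
const at10 = secondsToCollect(200, 10, 0.01);     // about 2,000 s (33 minutes)
```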

Statistical Accuracy of 1% Sampling

The accuracy target requires ~200 samples. At 100 req/s with 1% sampling, we collect 200 samples in about 3.3 minutes. At 1,000 req/s, we collect 200 samples in about 20 seconds. At 10 req/s, it takes about 33 minutes. The longest collection times occur on low-traffic endpoints. For endpoints with <1 request/minute, the profiler uses the full sample rather than subsampling. The 1% rate applies per endpoint, so low-traffic endpoints automatically receive higher effective sampling.

The overhead argument is complementary. At 100% sampling, every request incurs the profiler's instrumentation overhead — async hook callbacks, timestamp collection, data structure updates. At 1%, only 1 in 100 requests incurs this overhead. For a profiler that adds 0.5ms of overhead per sampled request, 1% sampling adds an average of 0.005ms per request — effectively zero. This is what "less than 1% overhead" means in practice: the overhead per request is so small that it is unmeasurable in standard latency distributions.
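The amortization can be written out explicitly. The sketch below uses the 0.5ms per-sample figure from the text and assumes, as the text does, that non-sampled requests pay no instrumentation cost. The function names and the 100ms baseline latency are illustrative.

```javascript
// Average overhead added per request when only a fraction is sampled.
function amortizedOverheadMs(perSampleOverheadMs, samplingRate) {
  return perSampleOverheadMs * samplingRate;
}

// The same overhead expressed as a percentage of a request's normal latency.
function overheadPercent(perSampleOverheadMs, samplingRate, baselineLatencyMs) {
  return (amortizedOverheadMs(perSampleOverheadMs, samplingRate) / baselineLatencyMs) * 100;
}

const averageOverheadMs = amortizedOverheadMs(0.5, 0.01);  // about 0.005 ms
const relativeOverhead = overheadPercent(0.5, 0.01, 100);  // about 0.005 % of a 100 ms request
```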

The 3-Sigma Anomaly Threshold

The anomaly detection threshold — functions executing more than 3 standard deviations above their mean — is a self-calibrating statistical control. Each function has its own mean and standard deviation computed from its sample of observed executions. The threshold adapts to the function's normal behavior rather than applying a universal cutoff.

Consider two functions: fastFunction with mean 5ms and standard deviation 2ms, and slowFunction with mean 100ms and standard deviation 8ms. The 3-sigma threshold for fastFunction is 11ms. The 3-sigma threshold for slowFunction is 124ms. A 15ms execution of fastFunction is anomalous (above 3σ). A 115ms execution of slowFunction is not anomalous (below 3σ). Applying a universal threshold of "flag anything over 100ms" would miss the first anomaly and incorrectly flag normal behavior in the second.

Dynamic Profiler — anomaly detection logic JavaScript

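// Self-calibrating 3-sigma threshold per function.
// functionStats maps a function name to running totals:
//   { count, totalMs, sumSquaredDeviations }
const functionStats = new Map();
const MIN_SAMPLES_FOR_CALIBRATION = 30; // illustrative value; the minimum is configurable

function isAnomalous(functionName, observedMs) {
  const stats = functionStats.get(functionName);
  if (!stats || stats.count < MIN_SAMPLES_FOR_CALIBRATION) {
    return false; // Not enough data to establish a baseline
  }

  const mean = stats.totalMs / stats.count;
  const variance = stats.sumSquaredDeviations / stats.count; // population variance
  const stdDev = Math.sqrt(variance);
  const threshold = mean + 3 * stdDev;

  // Example: mean = 50ms, stdDev = 10ms → threshold = 80ms.
  // A 201ms execution of a 200ms-mean function is NOT anomalous unless stdDev is below ~0.33ms.
  // A 35ms execution of a 5ms-mean function is ANOMALOUS whenever stdDev is below 10ms.
  return observedMs > threshold;
}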

The minimum samples requirement before calibration is critical. A function observed only once has no meaningful baseline. The profiler requires a configurable minimum number of samples before anomaly detection activates for that function. Until that threshold is reached, all executions of that function are recorded but not flagged.
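The running statistics behind such a threshold can be maintained without storing every sample. The sketch below uses Welford's online algorithm, which is a standard technique and not necessarily what the profiler does; `createStats`, `recordSample`, and `standardDeviation` are hypothetical names.

```javascript
// Running statistics for one function, updated one sample at a time.
function createStats() {
  return { count: 0, mean: 0, sumSquaredDeviations: 0 };
}

// Welford's update: numerically stable running mean and sum of squared deviations.
function recordSample(stats, observedMs) {
  stats.count += 1;
  const delta = observedMs - stats.mean;
  stats.mean += delta / stats.count;
  stats.sumSquaredDeviations += delta * (observedMs - stats.mean);
  return stats;
}

// Population standard deviation of the samples seen so far.
function standardDeviation(stats) {
  return stats.count > 0 ? Math.sqrt(stats.sumSquaredDeviations / stats.count) : 0;
}

const demoStats = createStats();
[3, 5, 7].forEach((ms) => recordSample(demoStats, ms));
// demoStats.mean is 5; standardDeviation(demoStats) is about 1.633
```

Each update costs constant time and memory, which matters for a component that runs continuously on production traffic.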

Request Correlation: Linking Performance to Context

One of the most powerful capabilities of the dynamic profiler is request correlation — the ability to link performance measurements to request characteristics. Rather than reporting "endpoint X has P95 of 450ms," the profiler can report "endpoint X has P95 of 450ms for authenticated users and P95 of 180ms for unauthenticated users" or "POST /api/gems/spend-gems is 3x slower for userId 12345 than for the general population."

This correlation is achieved by attaching request metadata to every profiled sample. When the profiler selects a request for sampling, it extracts the HTTP method, path, userId (if authenticated), and response status code, and attaches these dimensions to all function execution records collected during that request's lifecycle. The /endpoints API returns metrics broken down by these dimensions.

Request Correlation Enables

RSI can ask: "Is this endpoint slower for userId X?" or "Is this path slower with method POST vs GET?" These questions are answerable without any additional instrumentation because the profiler captures request context automatically. Cross-referencing performance with request metadata reveals patterns invisible to aggregate metrics — a specific user with unusual data volume, a specific path with unoptimized query patterns, a specific method with missing index coverage.
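Breaking a latency percentile down by a request dimension amounts to grouping samples before computing the percentile. The sketch below assumes each sample is an object carrying a `durationMs` field plus the request metadata; that shape, the function names, and the nearest-rank percentile method are all assumptions.

```javascript
// Nearest-rank percentile over an array of numbers.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Group samples by one metadata dimension and report P95 per group.
function p95ByDimension(samples, dimension) {
  const groups = new Map();
  for (const sample of samples) {
    const key = String(sample[dimension]);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(sample.durationMs);
  }
  const result = {};
  for (const [key, durations] of groups) {
    result[key] = percentile(durations, 95);
  }
  return result;
}

const demoSamples = [
  { method: 'POST', path: '/api/gems/spend-gems', durationMs: 100 },
  { method: 'POST', path: '/api/gems/spend-gems', durationMs: 300 },
  { method: 'POST', path: '/api/gems/spend-gems', durationMs: 200 },
  { method: 'GET', path: '/api/gems/spend-gems', durationMs: 10 },
  { method: 'GET', path: '/api/gems/spend-gems', durationMs: 20 },
];
const p95ByMethod = p95ByDimension(demoSamples, 'method'); // { POST: 300, GET: 20 }
```

The same function answers the per-user question by passing `'userId'` as the dimension, provided the samples carry that field.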

How Dynamic Profiling Feeds RSI

The integration between the dynamic profiler and RSI's modification pipeline is the reason Component 7 exists in the safety system rather than as a standalone monitoring tool. Before RSI is permitted to modify code, it queries the profiler's current state as a baseline. After the modification is applied, RSI waits for a configurable stabilization period (typically 30 seconds to 2 minutes depending on traffic volume) and queries the profiler again. If any function in the modified code's call graph shows a new HIGH or CRITICAL regression, RSI reverts the modification automatically. A new regression in the MEDIUM band (20% or more) is flagged and must be acknowledged before the next modification cycle.

RSI modification pipeline — profiler integration JavaScript

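// Assumptions: PROFILER_BASE_URL, the JSON response shape (an array of
// { functionName, severity }), and waitForStabilization are illustrative.
const PROFILER_BASE_URL = process.env.PROFILER_BASE_URL ?? 'http://localhost:3000';
const PROFILING_API = `${PROFILER_BASE_URL}/api/code-intelligence/dynamic-profiling`;

class RSIBlockedError extends Error {
  constructor(message, regressions) {
    super(message);
    this.name = 'RSIBlockedError';
    this.regressions = regressions;
  }
}

// Fetch a profiler endpoint and parse its JSON body
async function getJson(path) {
  const response = await fetch(`${PROFILING_API}${path}`);
  if (!response.ok) throw new Error(`Profiler request ${path} failed: ${response.status}`);
  return response.json();
}

async function executeModificationWithProfilingGuard(modification) {
  // 1. Capture baseline: names of functions that were already regressed
  const before = await getJson('/regressions');
  const baselineNames = new Set(before.map(r => r.functionName));

  // 2. Apply the modification
  await modification.apply();

  // 3. Wait for stabilization period (helper assumed to be defined elsewhere)
  await waitForStabilization(modification.estimatedTrafficTime);

  // 4. Check for regressions that were not present in the baseline
  const after = await getJson('/regressions');
  const newRegressions = after.filter(r => !baselineNames.has(r.functionName));

  if (newRegressions.some(r => r.severity === 'CRITICAL' || r.severity === 'HIGH')) {
    // Revert, then block further modifications and alert
    await modification.revert();
    throw new RSIBlockedError('Performance regression detected', newRegressions);
  }

  // 5. Commit if clean
  await modification.commit();
}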

Component 6 vs Component 7: Static vs Dynamic

The RSI Safety System includes two performance measurement components. Component 6 is Performance Profiling (779 lines) — the static variant. Component 7 is Dynamic Profiling (668 lines) — the production variant. They are not interchangeable; they solve different problems.

Dimension	Component 6: Performance Profiling	Component 7: Dynamic Profiling
Usage context	Development time	Production
Instrumentation	Manual: developer wraps code sections	Automatic: async hooks instrument everything
Code changes required	Yes	No
Overhead	High during profiling sessions	<1% continuously
Primary use	Diagnose known slow functions	Detect unknown regressions
RSI integration	Pre-modification analysis	Post-modification guard

Static profiling is the diagnostic tool — when you know something is slow and want to understand why, Component 6 provides precise measurement of the code path you are investigating. Dynamic profiling is the continuous guard — it watches everything in production and catches the regressions you did not know to look for. Both are necessary. Static profiling without dynamic profiling means you only diagnose problems you already know about. Dynamic profiling without static profiling means you can detect regressions but cannot efficiently diagnose their root cause.

The Minimum Sample Requirement and Cold-Start Behavior

A new deployment of the profiler has no baseline data. Every function begins with zero samples, and the anomaly detection logic explicitly requires a minimum sample count before flagging anomalies. This cold-start period — where the profiler is collecting data but not yet making judgments — is intentional.

False positives during the cold-start period would be catastrophic for RSI. If the profiler flagged functions as anomalous before it had established their baseline performance, RSI would block modifications based on nonexistent regressions. The minimum sample requirement prevents this: the profiler does not report anomalies or regressions for any function until it has enough data to establish a meaningful baseline. In production, this typically requires 15-30 minutes of normal traffic, after which the profiler transitions from data collection mode to active monitoring mode.

Cold-Start Period

After a new deployment, the profiler enters a calibration window during which anomaly detection is suppressed. Duration depends on traffic volume — at 100 req/s, calibration completes within ~30 minutes. RSI modifications during the calibration window are permitted but the profiler's regression check only evaluates functions that have reached their minimum sample count. Functions without sufficient samples are exempt from the regression check until calibrated.
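The exemption rule can be sketched as a partition of functions by sample count. The threshold value, the `partitionByCalibration` name, and the shape of the stats map are assumptions; the source says only that the minimum is configurable.

```javascript
const MIN_SAMPLES = 30; // hypothetical default; the real minimum is configurable

// Split functions into those eligible for the regression check and those
// still in their calibration window.
function partitionByCalibration(functionStats, minSamples = MIN_SAMPLES) {
  const calibrated = [];
  const exempt = [];
  for (const [name, stats] of functionStats) {
    (stats.count >= minSamples ? calibrated : exempt).push(name);
  }
  return { calibrated, exempt };
}

const demoPartition = partitionByCalibration(new Map([
  ['slowFunction', { count: 50 }],
  ['fastFunction', { count: 3 }],
]));
// demoPartition.calibrated is ['slowFunction']; demoPartition.exempt is ['fastFunction']
```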

What 45.6% Regression Means for RSI Decision-Making

The test result — regressionTest: 45.6% slower (108ms → 157ms) — represents exactly the kind of signal that the profiler is designed to surface. A 45.6% regression in any function that participates in a user-facing request path is a significant performance degradation. At 108ms baseline, users were experiencing approximately 100-120ms response contribution from that function. At 157ms, that contribution is 140-170ms — a difference that crosses perceptible latency thresholds for interactive features.

For RSI, this regression signal means: the modification that caused this degradation must be either reverted or addressed before the next modification cycle. RSI cannot compound modifications on a degraded code path because the compounding effect on latency can produce catastrophic user-facing performance within a small number of modification cycles. The profiler's 20% threshold exists precisely to catch regressions before they compound.
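The compounding effect is easy to quantify. The sketch below is a hypothetical illustration: it assumes each modification cycle slows the same function by a fixed percentage, and it uses 19% per cycle as an example of a slowdown that stays just under the 20% threshold.

```javascript
// Latency after `cycles` successive regressions of the same relative size.
function compoundedLatencyMs(baselineMs, regressionPercentPerCycle, cycles) {
  return baselineMs * (1 + regressionPercentPerCycle / 100) ** cycles;
}

const afterFourCycles = compoundedLatencyMs(108, 19, 4); // about 216.6 ms
```

Four such cycles roughly double the 108ms baseline, which is why a regression has to be resolved before the next modification builds on it.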

Component 7 in the Larger RSI Safety Picture

Dynamic Profiling is one of at least eight components in the RSI Safety System, each addressing a different class of risk that autonomous code modification introduces. Performance regression is Component 7's domain. Security vulnerabilities are Component 8's domain. Dependency integrity is validated earlier in the pipeline. The composition of these components is what makes RSI safe to run continuously in production — no single guard is sufficient, but together they cover the failure modes that matter for a system that modifies itself.

The 668-line implementation, passing all 12 tests, is a production-ready guard that is cheap to operate, adds less than 1% overhead to the system it protects, and provides RSI with the information it needs to make confident modification decisions. The 1% sampling rate, the 3-sigma anomaly detection, and the 20% regression threshold are not arbitrary constants. They represent an engineering judgment about where the tradeoffs between sensitivity, false-positive rate, and overhead should land for a continuously operating autonomous system.