CS329A research Procedural Drift Project
Deliberately open

Project hypothesis

The hypothesis stays open on purpose. It is framed to invite negative results, tradeoffs, and mechanism-specific findings.

Hypothesis statement

In autonomous financial monitoring workflows, agent behavior can drift over time even when models, prompts, and tools remain fixed. I hypothesize that this drift is driven less by model error than by the absence of explicit, executable procedure, and that self-improvement mechanisms which rely solely on implicit reasoning or unbounded context will often buy visible accuracy gains at the cost of increased behavioral variance. Conversely, I expect that agents constrained by versioned procedures and bounded memory policies will exhibit lower outcome variance and more predictable failure modes, though potentially at the cost of reduced flexibility. This project will test when self-improvement mechanisms genuinely stabilize agent behavior, when they instead introduce silent procedural drift, and under what conditions explicit structure improves or degrades performance.

Tensions to test

Accuracy vs variance

Do self-improvement mechanisms reduce error while increasing drift?
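
One way this tension could be made measurable is to compare the mean error and the run-to-run variance of two agent conditions over repeated runs of the same workload. The sketch below is illustrative only: the function names, the use of per-run error rates as the input, and the choice of population variance are all assumptions, not part of the project design.

```python
from statistics import mean, pvariance


def summarize_condition(run_errors):
    """Summarize one agent condition.

    run_errors: list of per-run error rates (floats in [0, 1]),
    one entry per repeated run of the same workload (hypothetical input format).
    """
    return {
        "mean_error": mean(run_errors),
        "variance": pvariance(run_errors),  # population variance across runs
    }


def trades_accuracy_for_variance(baseline_errors, treated_errors):
    """Return True when the treated condition has lower mean error
    but higher run-to-run variance than the baseline."""
    baseline = summarize_condition(baseline_errors)
    treated = summarize_condition(treated_errors)
    return (
        treated["mean_error"] < baseline["mean_error"]
        and treated["variance"] > baseline["variance"]
    )
```

A result of True would be one signal consistent with the hypothesis; a result of False would count as a negative finding for that pair of conditions.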

Flexibility vs governance

Do explicit procedures improve stability but reduce coverage?

Memory vs reproducibility

When does context accumulation become a procedural liability?

What will count as evidence

  • Behavior changes under identical replays
  • Escalation thresholds that drift without policy updates
  • Stability improvements under explicit procedure and bounded memory
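
The first two kinds of evidence above could be quantified roughly as follows. This is a minimal sketch under stated assumptions: the log format (a mapping from case ID to the decisions recorded on each replay), the function names, and the use of maximum absolute deviation as the drift measure are hypothetical choices made for illustration.

```python
def replay_disagreement_rate(replays):
    """Fraction of cases whose decision changed across identical replays.

    replays: dict mapping a case ID to the list of decisions the agent
    produced each time the same input was replayed (hypothetical format).
    """
    if not replays:
        return 0.0
    changed = sum(1 for decisions in replays.values() if len(set(decisions)) > 1)
    return changed / len(replays)


def threshold_drift(observed_thresholds, policy_threshold):
    """Largest absolute gap between the escalation thresholds the agent
    actually applied and the threshold stated in the unchanged policy."""
    return max(
        (abs(t - policy_threshold) for t in observed_thresholds),
        default=0.0,
    )
```

Under this sketch, a nonzero disagreement rate with fixed models, prompts, and tools would indicate behavioral drift, and a nonzero threshold drift without a policy update would indicate the silent procedural drift the hypothesis describes.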