Project hypothesis
The hypothesis is deliberately open-ended. It is framed to invite negative results, tradeoffs, and mechanism-specific findings.
Hypothesis statement
In autonomous financial monitoring workflows, agent behavior can drift over time even when models, prompts, and tools remain fixed. I hypothesize that this drift is driven less by model error than by the absence of explicit, executable procedure, and that self-improvement mechanisms that rely solely on implicit reasoning or unbounded context will often trade visible accuracy gains for increased behavioral variance. Conversely, I expect that agents constrained by versioned procedures and bounded memory policies will exhibit lower outcome variance and more predictable failure modes, though potentially at the cost of reduced flexibility. This project will test when self-improvement mechanisms genuinely stabilize agent behavior and when they instead introduce silent procedural drift, and under what conditions explicit structure improves or degrades performance.
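To make the constrained condition concrete, the following is a minimal Python sketch of what "versioned procedures and bounded memory policies" could look like. Everything in it is an illustrative assumption rather than the project's design: the `Procedure` and `BoundedMonitoringAgent` names, the risk-score input, the two escalation rules, and the fixed-size memory window are all hypothetical. The point is structural: each decision depends only on the input, a frozen and versioned rule, and a memory capped at `memory_size` alerts, so replaying the same stream through a fresh agent reproduces the same decisions, and any intended behavior change requires issuing a new version.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    """Hypothetical versioned escalation rule; changing behavior means issuing a new version."""

    version: str
    escalation_threshold: float  # escalate when the risk score is at or above this value
    repeat_limit: int  # escalate when the account already has at least this many remembered alerts

    def decide(self, risk_score: float, prior_alerts: int) -> str:
        if risk_score >= self.escalation_threshold or prior_alerts >= self.repeat_limit:
            return "escalate"
        return "monitor"


class BoundedMonitoringAgent:
    """Applies one fixed Procedure with a memory window capped at `memory_size` alerts."""

    def __init__(self, procedure: Procedure, memory_size: int) -> None:
        self.procedure = procedure
        # deque(maxlen=...) evicts the oldest alert once the window is full.
        self.memory: deque = deque(maxlen=memory_size)

    def handle(self, account: str, risk_score: float) -> dict:
        prior_alerts = sum(1 for remembered in self.memory if remembered == account)
        decision = self.procedure.decide(risk_score, prior_alerts)
        self.memory.append(account)
        # Tag each decision with the procedure version so it is traceable to an explicit rule.
        return {"account": account, "decision": decision, "version": self.procedure.version}


def replay(procedure: Procedure, memory_size: int, stream) -> list:
    """Run a fresh agent over an alert stream and return its decisions in order."""
    agent = BoundedMonitoringAgent(procedure, memory_size)
    return [agent.handle(account, score)["decision"] for account, score in stream]


if __name__ == "__main__":
    v1 = Procedure(version="v1", escalation_threshold=0.8, repeat_limit=2)
    stream = [("acct-1", 0.3), ("acct-1", 0.4), ("acct-1", 0.5), ("acct-2", 0.9)]
    print(replay(v1, memory_size=3, stream=stream))
    # ['monitor', 'monitor', 'escalate', 'escalate']
```

Shrinking `memory_size` to 1 in this sketch turns the third decision into `"monitor"`, because the agent can no longer see the earlier repeat alerts. That is a small instance of the memory vs reproducibility tension: the memory bound is itself a behavior-relevant parameter and would need to be versioned alongside the rule.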
Tensions to test
- Accuracy vs variance: Do self-improvement mechanisms reduce error while increasing drift?
- Flexibility vs governance: Do explicit procedures improve stability at the cost of coverage?
- Memory vs reproducibility: When does context accumulation become a procedural liability?
What will count as evidence
- Behavior changes under identical replays
- Escalation thresholds that drift without policy updates
- Stability improvements (lower outcome variance, more predictable failure modes) under explicit procedures and bounded memory
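The first two evidence criteria above could be operationalized roughly as follows. This is a minimal sketch under assumptions the project does not state: each replay run is recorded as a mapping from a hypothetical input id to a decision string, escalations are logged as `"escalate"`, every run covers the same inputs, and an "effective threshold" is approximated as the midpoint between the highest non-escalated score and the lowest escalated score. The function names are illustrative only and do not refer to any existing tool.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# One replay run: input id -> decision the agent made (assumed format).
Run = Dict[str, str]
# One run's decision log: (risk score, decision) pairs (assumed format).
ScoreLog = List[Tuple[float, str]]


def replay_disagreement_rate(runs: List[Run]) -> float:
    """Fraction of inputs on which identical replays did not all reach the same decision."""
    outcomes = defaultdict(set)
    for run in runs:
        for input_id, decision in run.items():
            outcomes[input_id].add(decision)
    if not outcomes:
        return 0.0
    divergent = sum(1 for decisions in outcomes.values() if len(decisions) > 1)
    return divergent / len(outcomes)


def effective_threshold(log: ScoreLog) -> Optional[float]:
    """Midpoint between the highest non-escalated and lowest escalated score.

    Only meaningful if decisions are roughly separable by score; returns None
    when the log lacks either escalated or non-escalated examples.
    """
    escalated = [score for score, decision in log if decision == "escalate"]
    monitored = [score for score, decision in log if decision != "escalate"]
    if not escalated or not monitored:
        return None
    return (max(monitored) + min(escalated)) / 2


def threshold_drift(logs: List[ScoreLog]) -> Optional[float]:
    """Spread (max minus min) of effective thresholds across runs under one policy version."""
    thresholds = [t for t in (effective_threshold(log) for log in logs) if t is not None]
    if len(thresholds) < 2:
        return None
    return max(thresholds) - min(thresholds)


if __name__ == "__main__":
    runs = [{"a1": "escalate", "a2": "monitor"}, {"a1": "escalate", "a2": "escalate"}]
    print(replay_disagreement_rate(runs))  # 0.5
    logs = [[(0.25, "monitor"), (0.75, "escalate")], [(0.5, "monitor"), (0.75, "escalate")]]
    print(threshold_drift(logs))  # 0.125
```

A nonzero disagreement rate or threshold spread between runs that share a procedure version would be the kind of drift-without-policy-update signal listed above; what magnitude counts as meaningful is left to the study design.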