Evolving research scaffold
Measuring Procedural Drift in Autonomous Financial Monitoring Agents
This project studies how autonomous financial monitoring agents drift over repeated executions even when prompts, tools, and models stay fixed. The emphasis is on behavioral consistency, escalation reproducibility, and drift over time rather than one-off accuracy.
While the experimental domain is financial monitoring, the failure mode studied (procedural drift under repeated execution) applies broadly to agentic systems in production.
Project focus
Research question
When and why do autonomous financial agents exhibit procedural drift under repeated execution, and which constraints improve stability without destroying usefulness?
Core comparisons
- Prompt-only baseline vs skill-executing agents
- Bounded vs unbounded memory policies
- With vs without verifier feedback, where a verifier is useful
Primary metrics
- Decision Disagreement Rate (DDR) across replays
- Switch Rate (SR) between adjacent replays
- Escalation consistency and trace similarity
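The metrics above are not yet formally defined in this README, so the following is only a minimal sketch under stated assumptions: DDR is taken as the fraction of unordered replay pairs whose final decisions differ, SR as the fraction of adjacent replays where the decision changes, escalation consistency as the share of replays matching the majority escalation outcome, and trace similarity as a `difflib` sequence ratio over step names. All function names and the example labels are hypothetical and may not match the definitions the project eventually adopts.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def decision_disagreement_rate(decisions):
    """Assumed DDR: fraction of unordered replay pairs whose decisions differ."""
    pairs = list(combinations(decisions, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def switch_rate(decisions):
    """Assumed SR: fraction of adjacent replays (i, i+1) where the decision changes."""
    if len(decisions) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(decisions, decisions[1:]))
    return switches / (len(decisions) - 1)


def escalation_consistency(escalated):
    """Assumed definition: share of replays agreeing with the majority escalation outcome."""
    if not escalated:
        return 1.0
    majority_count = Counter(escalated).most_common(1)[0][1]
    return majority_count / len(escalated)


def trace_similarity(trace_a, trace_b):
    """Similarity in [0, 1] between two step sequences (difflib ratio, one possible choice)."""
    return SequenceMatcher(None, trace_a, trace_b).ratio()


# Hypothetical decisions from four replays of the same scenario.
decisions = ["escalate", "escalate", "ignore", "escalate"]
ddr = decision_disagreement_rate(decisions)
sr = switch_rate(decisions)
consistency = escalation_consistency([d == "escalate" for d in decisions])
similarity = trace_similarity(
    ["fetch", "score", "escalate"],
    ["fetch", "score", "ignore"],
)
```

Under these assumptions DDR is order-independent while SR depends on replay order, so the two can diverge: a single outlier replay in the middle of a run raises SR more than it raises DDR.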
Planned experiments
- Prompt-only vs skill-based agent variants
- Memory policies: none, rolling window, TTL-bounded
- Repeated replays per scenario to quantify drift
- Report DDR, SR, escalation consistency, and trace similarity
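The three memory policies listed above could share one small interface so that agent variants differ only in the policy they are given. The sketch below is an illustration, not the project's implementation: the class names, the `add`/`recall` methods, and the explicit `now` argument are all assumptions. Passing time in explicitly, rather than reading a wall clock, is one way to keep replays reproducible.

```python
from collections import deque


class NoMemory:
    """Policy 'none': nothing is retained between steps."""

    def add(self, item, now):
        pass

    def recall(self, now):
        return []


class RollingWindowMemory:
    """Policy 'rolling window': keep only the most recent max_items entries."""

    def __init__(self, max_items):
        self._items = deque(maxlen=max_items)

    def add(self, item, now):
        self._items.append(item)

    def recall(self, now):
        return list(self._items)


class TTLMemory:
    """Policy 'TTL-bounded': entries expire once they are ttl_seconds old."""

    def __init__(self, ttl_seconds):
        self._ttl = ttl_seconds
        self._items = []  # list of (timestamp, item) pairs

    def add(self, item, now):
        self._items.append((now, item))

    def recall(self, now):
        # Drop expired entries, then return the surviving items in insertion order.
        self._items = [(t, x) for t, x in self._items if now - t < self._ttl]
        return [x for _, x in self._items]


# Hypothetical usage with simulated timestamps (seconds).
window = RollingWindowMemory(max_items=2)
for t, alert in enumerate(["a", "b", "c"]):
    window.add(alert, now=t)

ttl = TTLMemory(ttl_seconds=10)
ttl.add("a", now=0)
ttl.add("b", now=8)
```

With these inputs, `window.recall(now=3)` keeps only the last two alerts, and `ttl.recall(now=12)` keeps only the entry added at time 8.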
Scope boundaries
- Not a trading system
- Not a production-ready agent framework
- Not a fine-tuning project