Motivation
Agentic systems in finance rarely fail by crashing. They fail by changing behavior quietly over time. That drift erodes trust even when individual outputs seem reasonable.
Why procedural drift is dangerous
Procedural drift is evaluated under fixed prompts, models, and sampling policies; variance is treated as an operational property of deployed systems rather than a nuisance to be eliminated.
Silent variance
The same task can yield different decisions across runs, especially as context accumulates. Small shifts become operational policy changes without approval.
Escalation risk
Escalation thresholds drift first. That means a quiet shift in what gets flagged, reviewed, or ignored.
Governance gaps
When procedures are implicit, there is no versioned behavior to audit or roll back.
Finance-native workflows
- Portfolio monitoring and escalation under shifting regimes
- Drawdown triage and risk narrative generation
- Anomaly investigation triggered by metric shifts
CS329A fit
The project is evaluation-first and focuses on test-time behavior. It isolates drift under fixed prompts and tools, then measures how procedures, memory policies, and verification change the stability of agent behavior.