Why this matters

Motivation

Agentic systems in finance rarely fail by crashing. They fail by changing behavior quietly over time. That drift erodes trust even when individual outputs seem reasonable.

Why procedural drift is dangerous

Procedural drift is evaluated under fixed prompts, models, and sampling policies, so any behavioral change that appears cannot be explained by a configuration change. Variance is treated as an operational property of deployed systems rather than a nuisance to be eliminated.

Silent variance

The same task can yield different decisions across runs, especially as context accumulates. Small shifts become operational policy changes without approval.
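
One way to make silent variance measurable is to rerun the same task several times under a fixed configuration and score how often the runs agree. The sketch below is a minimal illustration, not the project's actual metric. The function name `decision_agreement` and the decision labels are hypothetical, and it assumes each run reduces to a single categorical decision.

```python
from collections import Counter


def decision_agreement(decisions):
    """Return the fraction of runs that match the most common decision.

    1.0 means every run agreed; lower values mean more run-to-run variance.
    """
    if not decisions:
        raise ValueError("need at least one run")
    counts = Counter(decisions)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(decisions)


# Hypothetical decisions from five runs of the same task (illustrative data only).
runs = ["escalate", "escalate", "monitor", "escalate", "ignore"]
agreement = decision_agreement(runs)
```

Tracking this score per task over time is one possible way to notice when a previously stable decision starts to split across runs.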

Escalation risk

Escalation thresholds drift first. The result is a quiet shift in what gets flagged, reviewed, or ignored, with no explicit change to the stated policy.
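
A shift in what gets flagged can be surfaced by comparing the escalation rate in a baseline window against a recent window. This is a minimal sketch under stated assumptions: the names `escalation_rate` and `escalation_shift`, the `"escalate"` label, and the sample data are all hypothetical, and a real evaluation would also need to account for changes in the underlying inputs.

```python
def escalation_rate(decisions, flag_label="escalate"):
    """Return the share of decisions that carry the escalation label."""
    if not decisions:
        raise ValueError("need at least one decision")
    return sum(d == flag_label for d in decisions) / len(decisions)


def escalation_shift(baseline, recent, flag_label="escalate"):
    """Return recent rate minus baseline rate; negative means fewer escalations."""
    return escalation_rate(recent, flag_label) - escalation_rate(baseline, flag_label)


# Hypothetical decision logs on comparable inputs (illustrative data only).
baseline = ["escalate", "escalate", "monitor", "escalate"]
recent = ["monitor", "escalate", "monitor", "monitor"]
shift = escalation_shift(baseline, recent)
```

A nonzero shift on comparable inputs is the kind of signal that would otherwise go unnoticed, since each individual decision can still look reasonable.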

Governance gaps

When procedures are implicit, there is no versioned behavior to audit or roll back.
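
Making a procedure explicit can be as simple as storing it as data and deriving a version identifier from its content, so that logged decisions can reference the exact procedure that produced them. The sketch below assumes a hypothetical `Procedure` record with illustrative fields; it is not the project's schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Procedure:
    """Hypothetical explicit procedure; field names are illustrative."""
    name: str
    steps: tuple
    escalation_threshold: float

    def version_id(self):
        """Return a short content hash; any change to the fields changes the id."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Two procedures that differ only in threshold get different version ids.
v1 = Procedure("drawdown_triage", ("check_drawdown", "summarize_risk"), 0.05)
v2 = Procedure("drawdown_triage", ("check_drawdown", "summarize_risk"), 0.08)
changed = v1.version_id() != v2.version_id()
```

Recording the version id alongside each decision gives an auditor something concrete to compare, and rolling back means reinstating the earlier record.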

Finance-native workflows

Portfolio monitoring and escalation under shifting regimes
Drawdown triage and risk narrative generation
Anomaly investigation triggered by metric shifts

CS329A fit

The project is evaluation-first and focuses on test-time behavior. It isolates drift under fixed prompts and tools, then measures how procedures, memory policies, and verification change the stability of agent behavior.