CS329A research Procedural Drift Project
Evolving research scaffold

Measuring Procedural Drift in Autonomous Financial Monitoring Agents

This project studies how autonomous financial monitoring agents drift over repeated executions even when prompts, tools, and models stay fixed. The emphasis is on behavioral consistency, escalation reproducibility, and drift over time rather than one-off accuracy.

While the experimental domain is financial monitoring, the failure mode studied—procedural drift under repeated execution—applies broadly to agentic systems in production.

Project focus

Research question

When and why do autonomous financial agents exhibit procedural drift under repeated execution, and which constraints improve stability without destroying usefulness?

Core comparisons

  • Prompt-only baseline vs skill-executing agents
  • Bounded vs unbounded memory policies
  • With vs without verifier feedback, where it proves useful

Primary metrics

  • Decision Disagreement Rate (DDR) across replays
  • Switch rate (SR) between adjacent replays
  • Escalation consistency and trace similarity
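
The page names these metrics without defining them, so the following is only a minimal sketch under assumed definitions: DDR as the fraction of unordered replay pairs whose final decisions differ, SR as the fraction of adjacent replays whose decisions differ, escalation consistency as the share of replays that agree with the majority escalate/no-escalate outcome, and trace similarity as a sequence-matching ratio over tool-call names. The function names, the "escalate" label, and the example decisions are all hypothetical and are not taken from the project.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def decision_disagreement_rate(decisions):
    """Assumed DDR: fraction of unordered replay pairs with different decisions."""
    pairs = list(combinations(decisions, 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def switch_rate(decisions):
    """Assumed SR: fraction of adjacent replays (i, i+1) whose decisions differ."""
    if len(decisions) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(decisions, decisions[1:]))
    return switches / (len(decisions) - 1)


def escalation_consistency(decisions, escalate_label="escalate"):
    """Assumed definition: share of replays matching the majority escalation outcome."""
    if not decisions:
        return 1.0
    flags = [d == escalate_label for d in decisions]
    majority_count = Counter(flags).most_common(1)[0][1]
    return majority_count / len(flags)


def trace_similarity(trace_a, trace_b):
    """Assumed definition: difflib matching ratio over two tool-call sequences."""
    return SequenceMatcher(None, trace_a, trace_b).ratio()


# Hypothetical final decisions from five replays of one scenario.
replays = ["escalate", "escalate", "monitor", "escalate", "monitor"]

print(decision_disagreement_rate(replays))  # 6 of 10 pairs differ -> 0.6
print(switch_rate(replays))                 # 3 of 4 adjacent pairs differ -> 0.75
print(escalation_consistency(replays))      # 3 of 5 replays escalate -> 0.6
```

Under these assumed definitions DDR ignores replay order while SR depends on it, so a run that flips once and stays flipped scores a low SR but can still score a high DDR. Reporting both separates a one-time shift from continual oscillation.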

Planned experiments

  • Prompt-only vs skill-based agent variants
  • Memory policies: none, rolling window, TTL-bounded
  • Repeated replays per scenario to quantify drift
  • Report DDR, SR, escalation consistency, and trace similarity
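
The three memory policies listed above (none, rolling window, TTL-bounded) could be sketched roughly as follows. This is an illustrative assumption about their behavior, not the project's implementation: the class names, the `add`/`recall` interface, and the use of caller-supplied numeric timestamps are all hypothetical.

```python
from collections import deque


class NoMemory:
    """Policy 'none': nothing persists between steps."""

    def add(self, item, now):
        pass

    def recall(self, now):
        return []


class RollingWindowMemory:
    """Policy 'rolling window': keep only the most recent max_items entries."""

    def __init__(self, max_items):
        self._items = deque(maxlen=max_items)

    def add(self, item, now):
        self._items.append(item)  # deque drops the oldest entry when full

    def recall(self, now):
        return list(self._items)


class TTLMemory:
    """Policy 'TTL-bounded': drop entries older than ttl_seconds.

    Assumes timestamps passed to add() are non-decreasing.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._items = deque()  # (timestamp, item) pairs, oldest first

    def add(self, item, now):
        self._items.append((now, item))

    def recall(self, now):
        # Evict expired entries from the front before returning the rest.
        while self._items and now - self._items[0][0] > self.ttl:
            self._items.popleft()
        return [item for _, item in self._items]


# Hypothetical observations arriving at t = 0, 30, and 90 seconds.
window = RollingWindowMemory(max_items=2)
ttl = TTLMemory(ttl_seconds=60)
for t, obs in [(0, "a"), (30, "b"), (90, "c")]:
    window.add(obs, now=t)
    ttl.add(obs, now=t)

print(window.recall(now=100))  # ['b', 'c']
print(ttl.recall(now=100))     # ['c']
```

Passing `now` explicitly rather than reading the system clock keeps replays deterministic with respect to memory contents, which matters when the quantity being measured is drift across otherwise identical runs.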

Scope boundaries

  • Not a trading system
  • Not a production-ready agent framework
  • Not a fine-tuning project

Navigate the project

Motivation

Why procedural drift matters in finance and in agent governance.

Hypothesis

The deliberately open hypothesis and tensions to test.

Experimental design

Agent variants, task regimes, and replay structure.

Evaluation

Behavioral metrics, drift indicators, and repeatability checks.
