Evolving research scaffold

Measuring Procedural Drift in Autonomous Financial Monitoring Agents

This project studies how autonomous financial monitoring agents drift over repeated executions even when prompts, tools, and models stay fixed. The emphasis is on behavioral consistency, escalation reproducibility, and drift over time rather than one-off accuracy.

While the experimental domain is financial monitoring, the failure mode studied—procedural drift under repeated execution—applies broadly to agentic systems in production.

Project focus

Research question

When and why do autonomous financial monitoring agents exhibit procedural drift under repeated execution, and which constraints improve stability without destroying usefulness?

Core comparisons

Prompt-only baseline vs skill-executing agents
Bounded vs unbounded memory policies
With vs without verifier feedback, where it is useful

Primary metrics

Decision Disagreement Rate (DDR) across replays
Switch Rate (SR) between adjacent replays
Escalation consistency and trace similarity
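
The metrics above are named here but not formally defined, so the following is only a minimal sketch under assumed definitions: DDR as the share of unordered replay pairs whose final decisions differ, SR as the share of adjacent replays whose decisions differ, escalation consistency as the share of replays that agree with the majority escalate/no-escalate outcome, and trace similarity as a sequence-matching ratio over tool-call names. All function names, the "escalate" label, and the example data are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from typing import Hashable, Sequence


def decision_disagreement_rate(decisions: Sequence[Hashable]) -> float:
    """Assumed DDR: fraction of unordered replay pairs with different decisions."""
    pairs = list(combinations(decisions, 2))
    if not pairs:
        return 0.0  # fewer than two replays: nothing to disagree with
    return sum(a != b for a, b in pairs) / len(pairs)


def switch_rate(decisions: Sequence[Hashable]) -> float:
    """Assumed SR: fraction of adjacent replays (i, i+1) with different decisions."""
    if len(decisions) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(decisions, decisions[1:]))
    return switches / (len(decisions) - 1)


def escalation_consistency(decisions: Sequence[Hashable],
                           escalate_label: Hashable = "escalate") -> float:
    """Assumed: fraction of replays matching the majority escalate/no-escalate outcome."""
    if not decisions:
        return 1.0
    flags = [d == escalate_label for d in decisions]
    majority_count = Counter(flags).most_common(1)[0][1]
    return majority_count / len(flags)


def trace_similarity(trace_a: Sequence[str], trace_b: Sequence[str]) -> float:
    """Assumed: difflib matching ratio between two sequences of tool-call names."""
    return SequenceMatcher(None, list(trace_a), list(trace_b)).ratio()


# Hypothetical final decisions from five replays of one scenario.
replays = ["escalate", "escalate", "monitor", "escalate", "monitor"]
ddr = decision_disagreement_rate(replays)   # 6 of 10 pairs differ
sr = switch_rate(replays)                   # 3 of 4 adjacent pairs differ
consistency = escalation_consistency(replays)
```

Under these assumptions DDR is insensitive to replay order while SR is not, so the two can diverge: a run that flips once and stays flipped has a low SR but can still have a high DDR.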

Planned experiments

Prompt-only vs skill-based agent variants
Memory policies: none, rolling window, TTL-bounded
Repeated replays per scenario to quantify drift
Report DDR, SR, escalation consistency, and trace similarity
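
To make the three memory policies in the list above concrete, here is a rough sketch of how they could differ behind a shared interface. The class names, the `add`/`recall` methods, and the explicit `now` argument are assumptions made for illustration, not the project's actual design. Time is passed in explicitly rather than read from the system clock so that a replay can be repeated with identical memory contents.

```python
from collections import deque
from typing import Deque, List, Tuple


class NoMemory:
    """Policy 'none': nothing is retained between steps."""

    def add(self, item: str, now: float) -> None:
        pass  # intentionally discards every observation

    def recall(self, now: float) -> List[str]:
        return []


class RollingWindowMemory:
    """Policy 'rolling window': keep only the most recent max_items observations."""

    def __init__(self, max_items: int) -> None:
        self._items: Deque[str] = deque(maxlen=max_items)

    def add(self, item: str, now: float) -> None:
        self._items.append(item)  # deque drops the oldest item when full

    def recall(self, now: float) -> List[str]:
        return list(self._items)


class TTLMemory:
    """Policy 'TTL-bounded': keep observations no older than ttl_seconds."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._items: Deque[Tuple[float, str]] = deque()  # (timestamp, item)

    def add(self, item: str, now: float) -> None:
        self._items.append((now, item))

    def recall(self, now: float) -> List[str]:
        # Evict expired entries from the old end before returning the rest.
        while self._items and now - self._items[0][0] > self._ttl:
            self._items.popleft()
        return [item for _, item in self._items]


# Hypothetical observations with explicit timestamps in seconds.
none_mem = NoMemory()
window_mem = RollingWindowMemory(max_items=2)
ttl_mem = TTLMemory(ttl_seconds=40.0)
for t, obs in [(0.0, "alert-a"), (50.0, "alert-b"), (100.0, "alert-c")]:
    for mem in (none_mem, window_mem, ttl_mem):
        mem.add(obs, now=t)

window_view = window_mem.recall(now=100.0)  # last two observations
ttl_view = ttl_mem.recall(now=100.0)        # only observations within 40 s
```

In this sketch both bounded policies cap what the agent can see, but by different criteria (count vs age), which is one reason they might produce different drift profiles under replay.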

Scope boundaries

Not a trading system
Not a production-ready agent framework
Not a fine-tuning project

Navigate the project

Motivation

Why procedural drift matters in finance and in agent governance.

Hypothesis

The deliberately open hypothesis and tensions to test.

Experimental design

Agent variants, task regimes, and replay structure.

Evaluation

Behavioral metrics, drift indicators, and repeatability checks.
