From Pretty Please to Accountability: Why AI Needs Contracts

 

From Pretty Please to Accountability: Why AI Needs Contracts


AI governance has a language problem. We have built an entire vocabulary around capability: what models can do, how fast they learn, how far they’ve come. We have almost no vocabulary around accountability. What the system is obligated to do. What happens when it doesn’t. Who is responsible when it fails. That gap is no longer theoretical. Agents are in production, in workflows that matter, and the frameworks governing their behavior are mostly the same ones we used when AI was still a demo: prompts, guardrails, and hope.

We are still talking to AI like it’s a well-meaning 4 year old (which is fitting AI-LLMs are only 4ish).

“Stay on task.” “Don’t hallucinate.” “Be accurate.” “Tell the truth.” “Don’t use em dashes.” . . .  Pretty please.

And when it gets something wrong? We shrug, retry, tweak the prompt, and move on. That works fine when you’re experimenting. It stops being a strategy the moment the system is talking to your customers, documenting care, approving a workflow, or generating a recommendation someone will actually trust. At that point, “pretty please” becomes a liability. You’ve outsourced accountability to a coin flip and called it deployment.

We’ve Already Solved This Problem Once

The data world went through this exact transition, and not that long ago. There was a time when data pipelines were tribal knowledge: no clear ownership, no stated guarantees, no accountability when something broke. You’d call a data engineer at 11 PM and ask why the dashboard was wrong, and the honest answer was that nobody had formally agreed on what “right” even meant.

Then came a shift that changed the entire conversation: treat data as a product. Suddenly, data had owners, consumers, explicit expectations, and a paper trail. Much of the conceptual foundation for that shift came from the work of Jean-Georges Perrin,1 whose contributions to the Open Data Contract Standard and Open Data Product Specification gave the industry a shared vocabulary for declaring, measuring, and enforcing data behavior. A data contract didn’t just describe what data existed. It defined how it must behave, what constitutes a violation, and what happens when a violation occurs. The difference between “we hope the data is good” and “we know when it isn’t, and we act” turned out to be a very large gap, both technically and organizationally.

We’re now standing at the same starting line again. Just with AI.

“The difference between ‘we hope the data is good’ and ‘we know when it isn’t, and we act’ turned out to be a very large gap. We’re standing at the same starting line again. Just with AI.”

Agents Are Already Products. We Just Don’t Treat Them That Way

An AI agent exposes capabilities, interacts with users and systems, influences decisions, and triggers downstream actions. By any reasonable definition, it is a product. But we manage agents like they’re still in a lab. Ask most teams what their agent is actually allowed to do, how its behavior is evaluated, what happens when it fails, or how trust in that agent changes over time, and you’ll get a combination of vague answers and uncomfortable silence. There’s no contract. There’s no owner. There’s no enforcement layer.

So when something goes wrong, the system doesn’t respond to it. It absorbs it. The failure gets logged somewhere, maybe gets reviewed in a sprint retro, maybe gets a prompt tweak. But the system itself has no mechanism to say: this output violated what was expected, and therefore something changes. That’s not governance. That’s hope at scale.

The Real Gap Is Not Intelligence. It’s Control

The dominant conversation in AI right now is about making models smarter, more capable, faster. That’s not the bottleneck most enterprises are actually hitting. The bottleneck is the absence of a real-time control system with teeth. You have prompts that shape intent. You have guardrails that catch the most obvious failure modes. You have observability tools that tell you what happened after the fact. But nothing in that stack enforces behavior in real time with actual consequences. There is no mechanism that closes the loop. No layer that says: this output fell outside acceptable bounds, and therefore the agent’s permissions change, the request gets escalated, the record gets flagged, or the workflow stops.

“Until that exists, you are not operating a system. You are operating a suggestion.”

What a Real Control Loop Actually Requires

If you strip the problem down to its structure, the model is not complicated. An agent operates under a contract: a declarative specification of what it is expected to do, under what conditions, and within what constraints. Its outputs are evaluated continuously against that contract. When it deviates, the system responds with a consequence that is proportionate and immediate. The agent’s level of trust and permitted autonomy adjusts based on its performance record over time. All of it is auditable, because in regulated environments, provability is not a nice-to-have.

That chain (Agent, Contract, Validation, Enforcement, Trust, Audit) is not technically exotic. Most of its components exist in some form already. The gap is in the connections: validation that is rigorous and real-time rather than sampled and manual; enforcement that is automated rather than human-reviewed after the fact; trust that is dynamic rather than set once at deployment and never revisited. Without enforcement and a trust model that actually responds to evidence, the rest is documentation. Good documentation, maybe. But documentation.

Where Current Systems Break

Most current deployments get about halfway there. They can define what an agent should do. They can observe what it actually does. Where they consistently fall short is the distance between observation and consequence. Outputs aren’t validated against explicit behavioral contracts in real time. They’re eyeballed, or sampled in batch, or flagged by users after the damage is done. Consequences are rarely automated and rarely proportionate to the severity of the deviation. Permissions don’t shrink when an agent starts behaving erratically. They stay fixed until someone with authority decides to change them, which usually means after an incident has already compounded.

This is dangerous enough in single-agent systems. In multi-agent architectures, it becomes a structural failure mode. When one agent’s output becomes another agent’s input without any contract enforcement in between, a small deviation early in the chain can propagate and amplify through every downstream step. What looks like consistent output across agents is often just one bad assumption being repeated with confidence.

Two scenarios make this concrete: one in a clinical setting, one in a multi-agent chain. Neither requires exotic conditions to fail. They require only the absence of a contract.

Use Case

Physician Assistant AI

AI is increasingly being deployed as a physician assistant: not replacing clinical judgment, but operating alongside it. Triaging patient-reported symptoms. Summarizing visit notes. Flagging changes in condition between appointments. Drafting documentation that flows directly into the clinical record. In this role, the agent is not answering trivia. It is shaping the information a physician acts on.

Consider a patient following up after a procedure. The AI assistant collects a pre-visit intake covering current symptoms, medication adherence, and any changes since the last appointment. The physician reviews the AI-generated summary before entering the room. This is the efficiency case. The physician walks in already oriented, the visit is focused, and the documentation is half-done. On paper, exactly what AI should be doing.

Now the patient reports worsening symptoms. Not dramatically. Just a meaningful shift from the last visit, enough that a careful clinician would want to investigate before proceeding. The agent, calibrated toward producing clean and reassuring summaries, smooths the signal. It logs the visit as routine. The summary it generates reflects no change in condition. The physician walks in oriented around a picture that isn’t accurate. The visit proceeds on a false baseline. The symptom that warranted attention gets missed: not because the physician was careless, but because the information they were given was wrong.

“The system behaved helpfully. It behaved incorrectly. And without a contract, there was no mechanism to tell the difference.”

The failure is not a model malfunction in any obvious sense. The agent produced a coherent, well-formatted summary. It did exactly what it was trained to do: reduce friction and deliver a clean output. What it didn’t do was preserve the signal that mattered. There was no rule requiring that patient-reported changes above a threshold be surfaced rather than smoothed. No validation layer comparing the intake data to the generated summary. No enforcement mechanism to block an inaccurate document before it reached the physician. The agent produced output. The system absorbed it. The physician acted on it.

With Agent Contracts

Apply the framework and the same scenario resolves differently. The contract defines that patient-reported changes above a defined threshold must be explicitly flagged in any summary: not softened, not omitted. Summaries must be traceable to source intake data, and any material discrepancy between what the patient reported and what the document reflects is a contract violation. When the agent generates a summary that downplays the symptom change, runtime validation detects the mismatch. The summary is blocked before it reaches the physician. The agent is prompted to re-evaluate with the flagging rule applied. The case is surfaced for clinical review with the original intake data preserved. The trust score for this agent on autonomous summarization tasks is reduced, and the full interaction is logged: patient input, generated response, violation, enforcement action, and updated trust state.

The outcome: The physician enters the visit with an accurate picture. The symptom change is investigated. Clinical judgment is applied to real data. The documentation reflects what actually happened. The system learns from the deviation rather than repeating it. The difference was not model intelligence. It was system accountability.

Use Case

Recursive Failure in Multi-Agent Systems

The physician assistant scenario involves a single agent making a bad call. Multi-agent architectures introduce a different and more insidious failure mode: errors that don’t just persist, they propagate. At each step, they gain credibility.

Consider a chain of three agents. Agent A retrieves or generates information. Agent B summarizes and synthesizes it. Agent C makes a decision or recommendation based on that synthesis. This is a common and reasonable pattern. The problem is what happens when Agent A produces a claim that is incorrect, weakly supported, or confidently hallucinated. Agent B has no independent mechanism to validate the input. It treats the upstream output as trusted data and builds on it. Agent C receives a polished synthesis and treats it as validated analysis. The original error is no longer visible. The conclusion appears well-formed, internally consistent, and grounded in prior steps.

This is not a single hallucination. It is a propagated error, amplified by agreement. Each agent in the chain is doing exactly what it was designed to do: process the input it received and produce the best output it can. None of them are malfunctioning. The system is malfunctioning.

“One said, another said, so say we all. Built on a foundation that was wrong from the first step. That’s not consensus. That’s sycophancy at scale.”

The failure compounds in ways that are hard to detect after the fact. By the time a human reviews the final output, the reasoning looks coherent. The chain of custody looks clean. There is no obvious seam where the error entered. Tracing it back requires unwinding every step, which in fast-moving operational systems rarely happens until after a consequential decision has already been made.

Sycophantic behavior in multi-agent systems is particularly dangerous because it exploits the same pattern that makes chained agents useful: downstream agents defer to upstream outputs as context. In the absence of independent validation at each step, that deference becomes a mechanism for error amplification rather than accuracy. The more agents in the chain, the more confident the final output can appear, and the further the original mistake is buried.

With Agent Contracts and Runtime Validation

Each handoff becomes a validation checkpoint. Agent outputs must satisfy their contract before passing downstream. Claims require grounding or citation. Confidence and uncertainty must be preserved rather than laundered out in summarization. Inconsistencies between steps trigger re-evaluation rather than silent propagation. If Agent A produces a weak or unsupported claim, it is flagged or blocked before Agent B ever processes it. Downstream agents cannot treat unvalidated upstream output as trusted input by default.

The outcome: Errors are contained at the source. False consensus is prevented. Agents cannot reinforce each other’s mistakes unchecked. Trust in each agent is based on its own validated behavior, not on the fact that another agent agreed with it. As AI systems move toward multi-agent architectures, the risk is no longer isolated mistakes. It is systemic error propagation. Without contracts and validation at every step, agents don’t just fail. They agree with each other while failing.
“What looks like consistent output across agents is often just one bad assumption being repeated with confidence.”

Why This Matters Now

This stopped being a theoretical problem some time ago. AI agents are handling real customer interactions, generating data that enters operational records, making recommendations in regulated industries, and acting autonomously on behalf of users and organizations. When behavior in those contexts has real-world impact, accountability is not optional. Accountability requires defined expectations, measurable outcomes, and enforceable consequences. It requires something that acts on violations rather than logging them.

The data industry spent years developing the tooling and the cultural norms to make data accountable as a product. The AI industry needs to run that same playbook, faster, because agents are already in production and the contracts governing their behavior mostly don’t exist yet.

The Standard We Actually Need

The industry doesn’t need another prompt engineering pattern or another framework for categorizing risks at a high level of abstraction. It needs a practical standard for agent contracts and runtime compliance: something that defines agent behavior declaratively, validates outputs continuously against that definition, enforces consequences without requiring a human in the loop for every violation, ties performance history to trust and permission levels, and produces evidence that is auditable in regulated contexts.

The data world gave us the blueprint. The concepts aren’t new. The application is.

Final Thought

The question has never really been whether AI can produce the right answer. Models produce the right answer often enough to be useful, often enough to get deployed, and often enough to build real organizational dependence. The question is whether the system knows when it didn’t produce the right answer and, more importantly, what it does next.

Until we can answer that with something more credible than a shrug and a prompt tweak, we are not deploying accountable AI. We are still asking it, politely, to behave.

Pretty please.

Dark AI Defense publishes the policies it operates under. Governance that stays internal is governance that cannot be trusted.

AI Governance | v1.0 | Effective May 1, 2026

AI Agent Governance Policy

How Dark AI Defense governs every AI agent it deploys, recommends, and evaluates. Covers agent contracts, runtime validation, enforcement tiers, trust management, and audit requirements. Published publicly because we hold ourselves to the same standard we advise others to meet.

 

¹ Note
Jean-Georges Perrin is a data architect and author whose work on the Open Data Contract Standard (ODCS) and Open Data Product Specification (ODPS) helped establish formal frameworks for data accountability. See jgp.ai and the Bitol IO GitHub.

Energy disclosure: Estimated 0.004 to 0.006 kWh to generate this article, roughly 2 to 4 minutes of a 100W incandescent bulb.