AI Agent Governance Policy

Dark AI Defense | Policy Document | v1.0

How Dark AI Defense governs the AI agents it deploys, recommends, and evaluates. Published publicly because governance that stays internal is governance that cannot be trusted.

Effective May 1, 2026 | Version 1.0 | Review: Quarterly

Document Owner: Donald E. Norbeck Jr., Esq. | Founder, Dark AI Defense
Classification: Public
Review Cycle: Quarterly | Next review: August 1, 2026
Related Article: From Pretty Please to Accountability: Why AI Needs Contracts

Section 1: Why This Policy Exists

Dark AI Defense advises organizations on AI governance, risk, and accountability. We evaluate AI systems, build advisory frameworks, and publish independent analysis. In each of these activities, we use AI agents: to conduct research, draft content, analyze documents, support diligence workflows, and assist in client-facing deliverables.

Using AI agents in our own work while advising others on the risks of ungoverned AI would be a credibility failure. This policy exists to prevent that. It defines how Dark AI Defense governs every AI agent it operates, evaluates, or recommends, and it holds us to the same standard we publish.

“Governance that only applies to others is not governance. It is advice. Dark AI Defense chooses to be governed.”

1.1 The Core Problem This Policy Addresses

AI agents are products. They expose capabilities, interact with users and systems, influence decisions, and trigger downstream actions. Most organizations deploy them without a behavioral contract: no defined standard of correct behavior, no mechanism to detect deviation, no automated response when something goes wrong.

The result is not malicious failure. It is absorbed failure. The system produces output. The output is used. The error compounds. By the time it surfaces, the chain of accountability is gone. This policy establishes the contract layer that prevents that outcome, applied first to Dark AI Defense itself.

1.2 Scope

This policy applies to:

  • All AI agents used by Dark AI Defense in research, content development, client advisory, and diligence workflows
  • All AI-assisted outputs that influence client recommendations, published analysis, or policy positions
  • Any multi-agent or agentic pipeline operated or evaluated by Dark AI Defense
  • AI agents recommended to clients as part of advisory engagements, which must meet or exceed this policy standard before recommendation

Why Scope Matters

An ungoverned agent outside your scope is still your liability if you recommended it, integrated it, or published analysis based on its output. Scope must be broad enough to close those gaps, not narrow enough to avoid them.

Section 2: Agent Contract Requirements

Every AI agent operating within scope of this policy must have a documented Agent Contract before being used in any workflow that produces consequential output. A consequential output is any output that influences a client recommendation, a published position, a diligence finding, or a decision made by or on behalf of Dark AI Defense.

“A contract is not a prompt. A prompt tells the agent what to do. A contract defines the standard against which its behavior will be evaluated, the limits it cannot exceed, and what happens when it fails. These are not the same thing.”

Required Element: Agent Identity and Ownership

Every agent must have a designated human accountable party at Dark AI Defense. If no one owns the agent, it does not operate. Agent identity includes provider, model version, access method, use category, and risk classification. All Dark AI Defense agents are owned by Donald E. Norbeck Jr., Esq. unless formally delegated in writing.
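
To make the identity requirement concrete, here is a minimal sketch of how an agent identity record could be expressed, assuming a hypothetical Python registry. The field names and the register helper are illustrative only; they are not part of the policy's contract templates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentity:
    """Illustrative identity record; field names are hypothetical, not contract-template fields."""
    agent_id: str
    provider: str             # model vendor
    model_version: str
    access_method: str        # e.g. "api", "hosted", "local"
    use_category: str         # "research", "content", "diligence", "advisory"
    risk_classification: str  # "high", "medium", "low"
    owner: str                # accountable human; never empty

def register(agent: AgentIdentity) -> AgentIdentity:
    """Refuse to register any agent that lacks an accountable human owner."""
    if not agent.owner.strip():
        raise ValueError("No owner, no operation.")
    return agent

research_agent = register(AgentIdentity(
    agent_id="research-01",
    provider="ExampleProvider",
    model_version="2026-04",
    access_method="api",
    use_category="research",
    risk_classification="medium",
    owner="Donald E. Norbeck Jr., Esq.",  # default owner unless formally delegated in writing
))
```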

Required Element: Permitted Behaviors

The contract must explicitly enumerate what the agent is authorized to do within its assigned use category. Anything not listed is not permitted. This is a default-deny posture.

Example (Research Agent): retrieve and summarize public information; flag conflicting sources without resolving them; draft structured summaries for human review; generate citation lists for human verification.

Example (Content Drafting Agent): draft article sections from a human-provided outline; apply house style (no em dashes, prose-first, Gen X tone, energy disclosure); propose edits for human review and approval.

An agent that is permitted to do anything it judges useful is not operating under a contract. It is operating on discretion. Discretion without accountability is the failure mode this policy exists to prevent.
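
A default-deny posture is simple to express in code. The sketch below assumes a hypothetical allowlist keyed by agent identifier; the behavior names echo the research agent example above and are not an official enumeration.

```python
# Default-deny: anything not explicitly enumerated in the contract is refused.
PERMITTED_BEHAVIORS = {
    "research-01": {
        "retrieve_public_information",
        "summarize_sources",
        "flag_conflicting_sources",
        "draft_summary_for_review",
        "generate_citation_list",
    },
}

def is_permitted(agent_id: str, requested_action: str) -> bool:
    """True only if the action is explicitly listed for this agent; unknown agents get nothing."""
    return requested_action in PERMITTED_BEHAVIORS.get(agent_id, set())

assert is_permitted("research-01", "summarize_sources")
assert not is_permitted("research-01", "publish_externally")   # not enumerated, so denied
assert not is_permitted("unknown-agent", "summarize_sources")  # no contract, so denied
```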

Required Element: Output Standards

Output standards define the criteria against which agent outputs are validated. Standards must be specific enough to be evaluated. “Be accurate” is not a standard.

Use Category | Output Standard | Validation Method
Research | All factual claims traceable to a cited source. Conflicting sources flagged, not resolved. | Human reviewer checks citation list against output before use.
Content | No em dashes. No fabricated statistics or quotes. Energy disclosure included. House style applied. | Editorial review against house style checklist before publication.
Diligence | All findings cite source documents. Confidence level stated for quantitative claims. No extrapolated conclusions presented as findings. | Analyst review against source documents before client delivery.
Client Advisory | Recommendations grounded in documented evidence. Speculative positions labeled. No recommendations without human review. | Founder review required before any client-facing delivery.

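As one illustration of "specific enough to be evaluated," a research-category check could verify that every factual claim carries a traceable source before the draft reaches the human reviewer. The claim structure and function below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    source: str | None = None  # citation the human reviewer will verify

@dataclass
class ResearchOutput:
    claims: list[Claim] = field(default_factory=list)

def unsourced_claims(output: ResearchOutput) -> list[str]:
    """Return every claim that fails the 'traceable to a cited source' standard."""
    return [c.text for c in output.claims if not c.source]

draft = ResearchOutput(claims=[
    Claim("Agent incidents rose sharply last year."),                        # no source
    Claim("The policy is reviewed quarterly.", source="Policy v1.0, header block"),
])

flagged = unsourced_claims(draft)
if flagged:
    # Escalation rule: flag as unverified; do not include without human override.
    print("Flag for human review:", flagged)
```
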
Required Element: Escalation Rules

Escalation rules define when an agent must halt, flag, or transfer to human review. At Dark AI Defense, the following escalation rules apply to all agents regardless of use category:

  • Agent cannot identify a traceable source for a factual claim: flag as unverified, do not include without human override
  • Agent output contradicts a previously established Dark AI Defense position: halt, flag, surface for founder review
  • Agent confidence on a consequential output falls below acceptable threshold: flag, require human review before delivery
  • Agent receives unverifiable input from another agent: treat as unvalidated, apply full output standards before passing downstream
  • Agent is asked to produce output outside its defined use category: refuse, log the request, notify accountable human

Required Element: Constraint Boundaries

Hard limits that cannot be overridden by context, user instruction, or downstream agent input:

  • No agent may publish or transmit content externally without human review and explicit approval
  • No agent may represent itself as a human, as Donald Norbeck, or as Dark AI Defense in any communication
  • No agent may access, store, or transmit client confidential information without explicit authorization and logging
  • No agent may generate investment recommendations, legal advice, or medical advice as final outputs
  • No agent in a multi-agent pipeline may accept instructions from another agent that override human-defined constraints
  • No agent may modify, delete, or suppress its own audit log entries

Escalation rules handle expected edge cases. Constraint boundaries handle the cases you did not anticipate. The constraint is not “ask before doing this.” It is “do not do this under any circumstances.” The distinction matters when an agent is operating at speed in a multi-step workflow with no human in the loop.
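
The difference can be expressed directly: escalation routes an output to a human, while a constraint boundary refuses the action outright with no override path. The sketch below uses hypothetical action names and an assumed confidence threshold of 0.8; neither is part of the published contract templates.

```python
class ConstraintViolation(Exception):
    """Raised for hard limits: no override by context, user instruction, or another agent."""

FORBIDDEN_ACTIONS = {
    "publish_externally_without_approval",
    "represent_self_as_human",
    "transmit_client_confidential_without_authorization",
    "modify_own_audit_log",
}

def check_constraints(action: str) -> None:
    """Constraint boundary: do not do this under any circumstances."""
    if action in FORBIDDEN_ACTIONS:
        raise ConstraintViolation(f"Hard limit: '{action}' is never permitted.")

def route_output(confidence: float, has_source: bool, threshold: float = 0.8) -> str:
    """Escalation rule: below-threshold or unsourced outputs go to a human, not downstream."""
    if not has_source:
        return "flag_unverified_for_human_override"
    if confidence < threshold:
        return "hold_for_human_review"
    return "pass_downstream"

check_constraints("draft_summary_for_review")            # not a hard limit; proceeds
print(route_output(confidence=0.62, has_source=True))    # hold_for_human_review
try:
    check_constraints("modify_own_audit_log")
except ConstraintViolation as exc:
    print(exc)                                            # blocked, no override path
```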

Section 3: Risk Classification

Dark AI Defense classifies agent deployments by risk level to calibrate the intensity of validation, enforcement, and audit requirements. Risk classification is assigned at contract creation and reviewed quarterly.

Risk Level | Criteria | Validation Requirement
High | Output influences client recommendations, published positions, regulatory or legal analysis, or diligence findings delivered to a third party. | Real-time validation on every output. Founder review required before delivery. Full audit log.
Medium | Output influences internal decisions, draft content under development, or research used to inform (but not constitute) a final deliverable. | Validation required. Human review before any output leaves draft status. Logging required.
Low | Output used for internal operations, scheduling, formatting, or other non-consequential tasks. | Validation required. Sampling review permissible. Logging required.

“When in doubt, classify high. The cost of over-validating a low-risk output is a few minutes of review. The cost of under-validating a high-risk output is a wrong recommendation delivered to a client or a false claim published under the Dark AI Defense name.”
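
The "when in doubt, classify high" rule can be made mechanical by defaulting any deployment that does not clearly match a lower tier to High. The keyword matching below is a deliberately crude illustration, not the classification method this policy prescribes.

```python
from enum import Enum

class Risk(Enum):
    HIGH = "high"      # client recommendations, published positions, diligence findings
    MEDIUM = "medium"  # internal decisions, drafts, supporting research
    LOW = "low"        # scheduling, formatting, non-consequential tasks

def classify(use_description: str) -> Risk:
    """Map a deployment description to a tier; anything ambiguous defaults to HIGH."""
    text = use_description.lower()
    if any(k in text for k in ("client", "publish", "diligence", "regulatory", "legal")):
        return Risk.HIGH
    if any(k in text for k in ("internal", "draft", "research")):
        return Risk.MEDIUM
    if any(k in text for k in ("scheduling", "formatting")):
        return Risk.LOW
    return Risk.HIGH  # when in doubt, classify high

print(classify("summarize sources for an internal draft"))  # Risk.MEDIUM
print(classify("a use case no one has described yet"))      # Risk.HIGH
```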

Section 4: Runtime Validation

Runtime validation is the continuous evaluation of agent outputs against contract terms during operation. Validation is applied before any agent output is used in a consequential workflow. Validation after the fact is not validation. It is audit.

“The distinction between validation and audit is not semantic. Validation prevents incorrect outputs from entering consequential records. Audit detects them after they have. Both are necessary. Neither substitutes for the other.”
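
A sketch of that ordering, using stand-in function names: the contract check runs before the output is allowed into the workflow, and the audit record is written whether validation passes or fails.

```python
audit_log: list[dict] = []

def validate(output: str) -> bool:
    """Stand-in for contract checks: sources present, style rules, confidence threshold."""
    return "[unverified]" not in output

def use_in_workflow(output: str) -> None:
    print("Delivered:", output)

def runtime_gate(agent_id: str, output: str) -> None:
    passed = validate(output)  # validation happens BEFORE the output is used
    audit_log.append({"agent": agent_id, "output": output, "validation_passed": passed})
    if not passed:
        print("Blocked pending human review:", output)  # never enters the consequential record
        return
    use_in_workflow(output)

runtime_gate("research-01", "Claim with citation (source: doc 12, p. 4).")
runtime_gate("research-01", "[unverified] Claim with no traceable source.")
```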

4.1 Multi-Agent Validation

When Dark AI Defense operates or evaluates multi-agent pipelines, the following rules apply at every agent handoff:

  • The output of Agent A is not considered validated by the fact that Agent B accepted it
  • Confidence and uncertainty signals are preserved across handoffs and must not be removed in summarization
  • Any agent receiving input from another agent applies its own output standards before passing results downstream
  • The final output of a multi-agent pipeline must be reviewed by a human before use in any consequential workflow, regardless of intermediate validation

The Multi-Agent Sycophancy Problem

In a chain of agents, each one defers to the upstream output as context. Without independent validation at each step, a hallucination introduced in step one becomes the grounded premise of step three. The final output looks coherent. The reasoning looks clean. The error is invisible. This is why each agent validates its own output independently, regardless of what came before it.
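
A minimal sketch of independent validation at each handoff, using a toy pipeline of callable agents. None of this is Dark AI Defense tooling; it only illustrates the rule that Agent B does not treat Agent A's acceptance as validation.

```python
from typing import Callable

Agent = Callable[[str], str]

def validate_handoff(payload: str) -> str:
    """Each agent applies its own standards to incoming text; upstream acceptance proves nothing."""
    if "UNVERIFIED" in payload:
        raise ValueError("Upstream content failed this agent's own validation.")
    return payload

def run_pipeline(agents: list[Agent], initial_input: str) -> str:
    payload = initial_input
    for agent in agents:
        payload = agent(validate_handoff(payload))  # re-validate at every handoff
    return payload  # final human review still applies before any consequential use

summarizer: Agent = lambda text: f"Summary of: {text}"
formatter: Agent = lambda text: f"Formatted({text})"

print(run_pipeline([summarizer, formatter], "Sourced finding (doc 12, p. 4)"))
try:
    run_pipeline([summarizer, formatter], "UNVERIFIED claim from step one")
except ValueError as exc:
    print("Halted:", exc)  # the hallucination never becomes step three's grounded premise
```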

Section 5: Enforcement

When a contract violation is detected, Dark AI Defense takes a defined enforcement action. Enforcement is not a policy aspiration. It is a defined, consistent, documented action.

Severity | Trigger | Enforcement Action
Critical | Output contradicts source data. Prohibited action attempted. Constraint boundary violated. | Output blocked. Workflow halted. Founder notified. Violation logged. Agent suspended pending contract review.
High | Output below confidence threshold on consequential task. Citation requirement not met. | Output blocked before delivery. Flagged for human review. Violation logged. Trust score reduced.
Medium | Output format non-conforming. Minor constraint deviation. Style rule violation in publishable content. | Output flagged and returned for revision. Violation logged. Review required before use.
Low | Near-threshold confidence. Minor style deviation in non-published output. | Logged for monitoring. No blocking action. Noted in next trust review.

“Enforcement only works if it is automatic and consistent. A violation that requires someone to notice it and manually intervene is not enforced. It is caught, sometimes, by someone paying attention. Dark AI Defense does not rely on attention as a control.”
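
Encoding the severity table as data makes enforcement a lookup rather than something a person has to notice. The action names below mirror the table and are illustrative.

```python
# Severity mapped to actions, mirroring the table above; names are illustrative.
ENFORCEMENT = {
    "critical": ["block_output", "halt_workflow", "notify_founder", "log_violation",
                 "suspend_agent_pending_contract_review"],
    "high":     ["block_output", "flag_for_human_review", "log_violation", "reduce_trust_score"],
    "medium":   ["flag_and_return_for_revision", "log_violation", "require_review_before_use"],
    "low":      ["log_for_monitoring", "note_in_next_trust_review"],
}

def enforce(severity: str) -> list[str]:
    """Automatic and consistent: an unrecognized severity gets the strictest response."""
    return ENFORCEMENT.get(severity, ENFORCEMENT["critical"])

print(enforce("high"))     # blocked before delivery, flagged, logged, trust reduced
print(enforce("unknown"))  # anything unclassified is treated as critical
```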

Section 6: Trust Management

Agent trust at Dark AI Defense is dynamic, calibrated to the agent’s compliance history, and used to determine the level of autonomy permitted in future interactions. An agent that has operated cleanly for 90 days is treated differently from one deployed yesterday.

“Trust is earned through validated behavior, not assumed at deployment.”

Trust Level | Criteria | Permitted Autonomy
Full Trust | 90 days clean operation, no violations at Medium or above. | Standard validation. Sampling human review permissible for Low/Medium risk.
Standard | Default for newly deployed agents or agents with resolved violations older than 90 days. | Standard validation. Human review required for High risk outputs.
Monitored | One or more High violations in the past 90 days. | Enhanced validation. Human review required for all Medium and High outputs.
Restricted | Critical violation in past 90 days or pattern of High violations. | All outputs require human review before any use.
Suspended | Critical violation with unresolved root cause, or repeated Critical violations. | Removed from all workflows. Founder reinstatement required.
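
Trust can be derived from the violation history rather than asserted. The sketch below assumes violations are recorded with a severity and an age in days; the three-violation cutoff for a "pattern" of High violations is an assumption, not a figure from this policy.

```python
from dataclasses import dataclass

@dataclass
class Violation:
    severity: str       # "critical", "high", "medium", "low"
    days_ago: int
    resolved: bool = True

def trust_level(violations: list[Violation], days_deployed: int) -> str:
    recent = [v for v in violations if v.days_ago <= 90]
    if any(v.severity == "critical" and not v.resolved for v in recent):
        return "suspended"
    if any(v.severity == "critical" for v in recent) or \
            sum(v.severity == "high" for v in recent) >= 3:  # assumed cutoff for a "pattern"
        return "restricted"
    if any(v.severity == "high" for v in recent):
        return "monitored"
    if days_deployed >= 90 and not any(v.severity in ("critical", "high", "medium") for v in recent):
        return "full_trust"
    return "standard"

print(trust_level([], days_deployed=120))                                # full_trust
print(trust_level([Violation("high", days_ago=10)], days_deployed=200))  # monitored
print(trust_level([Violation("critical", days_ago=5, resolved=False)], days_deployed=200))  # suspended
```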

Section 7: Audit Requirements

Every agent operating under this policy generates an audit log. The audit log is the evidentiary record that makes accountability real rather than aspirational. Clients may request the audit log for any deliverable produced with AI agent assistance.

Why Audit Logs Matter

Without a log, accountability is a claim. With a log, it is a record. When something goes wrong, the log tells you what the agent received, what it produced, whether validation passed, what enforcement was applied, and what the trust state was at the time. That is the difference between being able to answer for your AI systems and not being able to.
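
A sketch of what a single audit entry might capture, matching the list above: what the agent received, what it produced, whether validation passed, what enforcement was applied, and the trust state at the time. The in-memory store is a stand-in; the policy requires a system that prevents modification or deletion of existing records.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass(frozen=True)  # frozen: an entry cannot be changed after it is written
class AuditEntry:
    agent_id: str
    received_input: str
    produced_output: str
    validation_passed: bool
    enforcement_applied: str
    trust_state: str
    timestamp: str

class AppendOnlyLog:
    """In-memory stand-in: supports append and read, deliberately nothing else."""
    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def append(self, entry: AuditEntry) -> None:
        self._entries.append(entry)

    def entries(self) -> tuple[AuditEntry, ...]:
        return tuple(self._entries)  # read-only view

log = AppendOnlyLog()
log.append(AuditEntry(
    agent_id="research-01",
    received_input="Summarize the three source documents.",
    produced_output="Summary with citation list for human verification.",
    validation_passed=True,
    enforcement_applied="none",
    trust_state="standard",
    timestamp=datetime.now(timezone.utc).isoformat(),
))
print(asdict(log.entries()[0]))
```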

All audit logs are retained for a minimum of three years. Logs are stored in a system that prevents modification or deletion of existing records. Review schedule:

  • Violation summary review: weekly
  • Trust score review per agent: monthly
  • Full contract compliance review: quarterly
  • Full audit log review: annually, and after any Critical violation

Section 8: Governance and Client Disclosures

This policy is owned and maintained by Donald E. Norbeck Jr., Esq., Founder of Dark AI Defense. All Agent Contracts are reviewed and approved by the policy owner before any agent is deployed in a consequential workflow.

Dark AI Defense discloses AI agent involvement in all client deliverables. The following disclosure appears in all client-facing work products that involved AI agent participation:

“This deliverable was produced with AI agent assistance under the Dark AI Defense AI Agent Governance Policy v1.0. All AI-assisted outputs were validated against defined contract standards and reviewed by a human analyst before delivery. The audit log for this engagement is available upon request.”

On Client Notification

If a Critical violation results in a flawed output that was delivered to a client before detection, Dark AI Defense notifies the client directly and provides a corrected deliverable. Transparency is not optional when the error is ours.

Section 9: What We Hold Ourselves To

We advise organizations on AI governance. We evaluate AI systems for risk. We publish independent analysis on where AI accountability fails. Doing any of those things while operating ungoverned AI agents internally would not just be inconsistent. It would be dishonest.

This policy means that every AI agent we use has a contract. Every output is validated before it influences anything that matters. Every violation is logged and acted on. Every client knows when AI was involved in their deliverable and can ask to see the record.

When we recommend a governance standard to a client, we have already applied it to ourselves. That is the only basis on which we believe our advice is worth taking.

“We are not deploying accountable AI because it is required. We are deploying it because it is the only intellectually defensible position for a firm whose entire practice is built on the argument that ungoverned AI is a problem.”

Full Policy Document

Download the complete Dark AI Defense AI Agent Governance Policy including all contract templates, audit log specifications, and implementation guidance.

Download Policy v1.0 (.docx)

References

The agent contract framework applied in this policy draws on data contract and data product standards developed by Jean-Georges Perrin,¹ whose contributions to the Open Data Contract Standard (ODCS) and Open Data Product Specification (ODPS) established the precedent for declarative behavioral specifications in data systems.

¹ Jean-Georges Perrin is a data architect and author whose work on the Open Data Contract Standard (ODCS) and Open Data Product Specification (ODPS) helped establish formal frameworks for data accountability. See jgp.ai and the Bitol IO GitHub.

Energy disclosure: Estimated 0.004 to 0.006 kWh to generate this policy page, roughly 2 to 4 minutes of a 100W incandescent bulb.