The Book · Chapter 10

From Prompt Craft to the Agent Contract

Enterprises now deploy systems that interpret instructions, retrieve context, select tools, and act on business processes within a partially open space. Governing such systems requires more than model selection and prompt craft. It requires a typed architectural specification that binds intent, capability, policy, decision, source model, control, and evaluation into a single governed unit.

This chapter calls that unit the agent contract. It extends the EA Codex specification family established in Chapter 8 into the artifact that closes the gap between architecture and autonomous execution.

1. When AI stops being a feature and becomes an architectural problem

AI does not belong only to the technology section of an enterprise story. It belongs wherever an enterprise must decide what can act, on what basis, within which limits, under whose accountability.

AI reopens the architecture question because an AI system executes within a partially open space. It interprets instructions, retrieves context, selects tools, produces explanations, and in some cases triggers side effects. The enterprise is therefore no longer governing only deterministic software behavior. It is governing bounded interpretation.

Chapter 4 established why AI and automation raise the cost of architectural ambiguity. This chapter takes the argument further. It does not merely argue that AI makes architecture more urgent. It shows what architecture must produce to make AI governable: the agent contract as the practical unit of AI architecture, and the semantic and decision infrastructure that makes agent contracts enforceable.

The central temptation is to believe that AI can compensate for architectural ambiguity. In practice it amplifies ambiguity. If the enterprise vocabulary is unstable, the system retrieves the wrong material. If the data landscape is weakly governed, the system mixes draft content with approved content. If authority is unclear, the agent acts beyond its mandate. If design decisions are trapped in meetings or slide decks, the runtime has nothing reliable to enforce.

Seen in that light, AI is a pressure test on the quality of architectural semantics.

It asks whether the enterprise can express scope, entitlement, provenance, policy, approval, and exception handling in machine-usable form. Where the answer is yes, AI becomes useful. Where the answer is no, AI remains impressive but ungovernable.

In a document-centered architecture practice, this burden is easy to underestimate. Teams may believe that an AI capability can be added to an existing application landscape by attaching a model endpoint, a vector store, and a prompt template. That may be enough for experimentation. It is not enough for enterprise operation. The moment the system is expected to answer from enterprise knowledge, act within business processes, respect regulatory boundaries, or justify its behavior, the enterprise discovers that the model itself is not the core design problem. The core design problem is the structure around the model.

This is also where the old opposition between “business architecture” and “technical architecture” becomes especially damaging.

AI needs both:

  • It needs business semantics to know what a concept means, where authority resides, and which capability it is serving.
  • It needs technical structure to know where context is retrieved, how tools are invoked, how policies are evaluated, and how traces are persisted.

If either side is weak, the whole arrangement becomes unreliable. This is why enterprise architecture matters more in an AI-rich environment, not less. The architect is no longer only describing systems and transitions. The architect is defining the semantic and control envelope in which AI can operate safely enough to become part of enterprise execution.

2. Why traditional enterprise architecture fails AI in practice

Traditional EA struggles with AI for structural reasons. Much of classical EA assumes that architecture influences delivery indirectly through principles, target states, standards, and review checkpoints. AI systems require architecture to participate at a finer level of operational detail.

A generic principle such as “customer data must be protected” is useful as direction, but it is not actionable by an agent runtime. The runtime needs to know many things:

  • Which data classes may be retrieved, whether draft documents are admissible, and how conflicting sources are prioritized.
  • Which actions require human approval, and which outputs must include citations.
  • What confidence threshold triggers escalation and how traces are stored for review.

That is architecture expressed as executable specification, not as guidance prose.

The weakness becomes especially visible in the treatment of design decisions. In AI systems, design decisions are the real operational hinge. The decisions that must survive beyond the meeting that produced them include:

  • which model class is allowed for which use case,
  • whether retrieval is mandatory and whether internet search is prohibited,
  • whether the agent may write into a system of record,
  • whether human approval is required before side effects,
  • whether outputs must be grounded only in approved corporate sources,
  • whether reasoning traces are stored,
  • and whether a low-confidence answer must refuse rather than improvise.

Without explicit decisions, AI systems are governed through tribal memory and local prompt engineering. That is not enterprise architecture; it is operational luck.
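
To make the point concrete, such a decision can live as a small, versioned object rather than a meeting note. The fragment below is only an illustrative sketch: the kind name, identifiers, and field layout are assumptions, not the specification family defined in Chapter 8.

apiVersion: ea.codex/v1
kind: DesignDecision
metadata:
  id: DEC-AI-014
  status: approved
  version: "1.0"
spec:
  appliesTo:
    capability: Customer Service Management
  decision: >
    Customer-facing agents must ground every answer in retrieved, approved
    sources; unguided generation and public web search are not permitted.
  rationale: >
    Ungrounded answers cannot be traced to an accountable source or defended
    when a customer dispute escalates.
  consequences:
    - retrievalRequired becomes a blocking release-time check
    - answers without cited source identifiers are rejected by output controls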

Another weakness appears in the gap between policy and implementation. Human teams can often bridge that gap informally: an experienced analyst knows which source is trusted, a compliance officer knows which step needs sign-off, and a solution architect knows which workflow is sensitive. AI systems do not share this tacit knowledge unless it is encoded somewhere they can consume or that the surrounding platform can enforce. A prompt alone does not solve this, because prompts are too fragile, too local, and too easy to override through context drift.

The failure mode is predictable. An organization launches a pilot copilot that works because a small team curates the documents, watches the responses, and manually corrects edge cases. Confidence rises, and the system is extended across functions, geographies, and data domains. At that point the hidden architecture problem emerges: terms differ between business units, document status is inconsistent, ownership of knowledge sources is unclear, tool permissions reflect historical accidents rather than business policy, and approval rules are implemented in human habit rather than in systems. The AI does not create this disorder; it exposes it.

Industry tooling already reflects this reality. Contemporary agent frameworks are organized around tools, guardrails, state, approvals, and evaluation rather than around model calls alone. OpenAI’s current agent documentation defines agents in terms of tools, guardrails, MCP servers, handoffs, structured outputs, resumable state, human review, and trace-based evaluation.

The market has also learned that AI usefulness comes from context architecture, not model selection.

  • Microsoft’s architecture for Microsoft 365 Copilot explains that the system improves prompts through grounding, drawing in text from files or other discovered content, bounded by existing permissions so users see only content they are allowed to access.
  • Amazon Bedrock exposes the same truth: its knowledge base capability connects to unstructured and structured sources and can be included directly in agent workflows, while guardrails are an explicit part of agent creation. In both cases, the useful system is an architectural assembly of governed sources, retrieval logic, policy enforcement, and action boundaries.

The distinction between a model and an agent becomes important here:

  • A model is a reasoning and generation capability.
  • An agent is an operational construct. It has a role, a scope, a set of tools, a context envelope, a policy boundary, and a trace.

The proper EA question is not “Where do we use LLMs?” The proper question is “Which bounded agent roles should exist in which capabilities, under which decisions, with which specifications and controls?”

Not every AI use requires full agent architecture. Classification, extraction, summarization, and translation tasks may need only lightweight controls: source validation, output schema, and quality monitoring. As operational consequence increases (writing to systems of record, acting on business processes, advising on regulated matters), architecture must become more explicit. That is not bureaucracy. It is risk shaping. The architecture function should calibrate governance weight to operational consequence rather than applying uniform formality across every AI use case.

The Enterprise Architecture Codex becomes especially valuable here because AI systems require a reliable substrate of meaning. They need more than documents. They need definitions, approved source classes, decision records, policy mappings, process context, relationship models, validation rules, and reusable patterns.

The Codex is therefore not only a support for architects and developers; it is also a support for agents. It provides the semantic envelope in which retrieval, reasoning, action selection, and control can occur with discipline. Once that is understood, the role of enterprise architecture changes. EA is no longer cataloguing AI initiatives. It is defining the semantic and decision architecture that makes those initiatives trustworthy enough to scale.

3. From architecture repository to agent contract

To make the shift operational, the enterprise needs an architectural construct that sits between abstract guidance and runtime behavior. This chapter calls that construct an agent contract.

An agent contract is a compact specification that binds business purpose, enterprise semantics, permitted context, tool boundaries, policy requirements, decision inheritance, and control expectations into one governed unit.

It is the missing middle between architecture theory and agent runtime:

  • The contract begins with intent: every enterprise agent should be traceable to a business purpose and an expected outcome.
  • It binds that intent to business scope through capabilities that provide the stable organizational frame.
  • Policies enter as explicit constraints.
  • Design decisions give the contract its operational form: whether retrieval is mandatory, whether draft content is excluded, which source classes outrank others, whether the agent may call transaction systems, what human approvals are required, and what evidence must be captured.
  • The specification layer translates decisions into runtime-consumable artifacts.
  • Controls enforce boundaries.
  • Feedback measures misfires, overrides, refusals, source quality, and business value.

What matters is the separation of concerns. Intent is not a prompt, policy is not a decision, and a decision is not yet a control. When these are collapsed, AI systems become hard to govern because nobody can tell whether a failure reflects poor business scope, weak policy, a missing design choice, bad tooling, or runtime drift.

3.1. Agent Contract

Figure 1 shows an agent contract for a customer dispute resolution assistant. It gives enterprise architecture a unit of control that is neither too abstract nor too platform-specific.

  • The capability anchor keeps the agent tied to a stable business domain.
  • The delegation level expresses the contract’s authority profile in the same L1 to L4 vocabulary the EA Council uses for every other delegated work product, so a customer dispute assistant at L2 is governed by the same rules of accountability that apply to any other approval-bounded agent.
  • The semantic context prevents the runtime from operating against raw text alone.
  • The source model distinguishes authority from convenience.
  • The three-tier classification (authoritative, conditional, prohibited) is not a nicety. It determines the trustworthiness of every answer the agent produces. In a customer dispute context, an answer grounded in the approved service policy library carries different weight than one drawn from a knowledge base FAQ. The architecture makes that distinction explicit and enforceable rather than leaving it to the model’s implicit weighting of retrieved passages.
  • The design decisions express deliberate trade-offs.
  • The controls show what the enterprise is willing to automate and where it insists on intervention.
  • The feedback section prevents the arrangement from becoming static.

apiVersion: ea.codex/v1
kind: AgentContract
metadata:
  id: AGENT-CDRA-001
  name: customer-dispute-resolution-assistant
  status: approved
  version: "1.0.0"
  domain: customer-service
spec:
  intent:
    capability: Customer Service Management
    objective: Reduce time to assemble policy-grounded case summaries
    delegationLevel: L2
    serviceBoundaries:
      allowed:
        - summarize_case_history
        - propose_resolution_options
        - cite_applicable_policy
      forbidden:
        - approve_financial_compensation
        - alter_customer_master_data
        - send_binding_external_commitments
  semanticModel:
    entities:
      - Customer
      - Complaint
      - PolicyClause
      - Product
      - CompensationDecision
    glossaryPack: codex://service/customer-disputes/glossary/v3
    decisionPack: codex://service/customer-disputes/decisions/v5
  sourcePolicy:
    authoritative:
      - crm_case_records
      - approved_service_policy_library
      - product_terms_repository
    conditional:
      - knowledge_base_faq
    prohibited:
      - personal_notes
      - draft_policy_documents
      - public_web_search
  designDecisions:
    retrievalRequired: true
    answerMustInclude:
      - cited_source_ids
    sourcePrecedence:
      - approved_service_policy_library
      - crm_case_records
      - product_terms_repository
      - knowledge_base_faq
    actionProfile: readOnly
    mandatoryEscalationConditions:
      - compensation_above_threshold
      - policy_conflict_detected
      - low_source_confidence
  controls:
    inputGuardrails:
      - redact_payment_card_data
      - reject_requests_for_policy_override
    outputRules:
      - no_legal_commitment_language
      - cite_only_retrieved_sources
      - mark_uncertainty_explicitly
    humanReview:
      requiredBefore:
        - compensation_exception
        - goodwill_offer
    trace:
      storePromptContextHash: true
      storeRetrievedSourceIds: true
      storeDecisionPath: true
  feedback:
    evaluateOn:
      - citation_accuracy
      - policy_alignment
      - escalation_precision
      - average_handling_time_reduction
    reviewCadence: weekly

Figure 1: Agent contract for a customer dispute resolution assistant

An architecture repository that cannot produce something like this is not yet ready to govern enterprise AI. It may still be useful for analysis, but not yet useful for bounded agency.

In operational terms, the contract is the seed that the BMAD-style attractor loop introduced in Chapter 7 validates at release time and continues to test against runtime evidence. The agent is the realization; the contract is what that realization is checked against.

The EA Codex becomes operational here. The contract refers to glossary packs, decision packs, policy packs, source classes, and control logic that can be managed as reusable enterprise assets. A pack is a different kind of object from the architecture package introduced in Chapter 7: the architecture package is the per-initiative lifecycle container that carries a work unit through BMAD, and it may itself reference several packs composed together with the contract and its evidence.

Once those assets exist, AI design stops being handcrafted per use case. It becomes an industrialized architectural practice. The same glossary pack that governs the customer-dispute agent can be used by a customer-analytics agent, a customer-communications agent, or any other bounded role within the same capability domain. The same decision pack can propagate source priority rules across multiple agents. That is where the product-line thinking from Chapter 9 becomes directly relevant to AI agent architecture.

Agent contracts should also be designed to evolve with AI capability maturity. Today’s contracts may center on retrieval-augmented generation patterns. Tomorrow’s may need to accommodate multi-step reasoning, tool orchestration, or cross-agent delegation. The contract structure (intent, semantic model, sources, decisions, controls, feedback) is stable across those capability shifts because it describes what the enterprise governs, not how the model works. When new AI capabilities emerge, the decisions and controls change; the contract structure persists.

3.2. When agents must talk to other agents

The agent contract introduced in the previous section governs an agent in isolation: its intent, semantic model, source policy, design decisions, controls, and feedback. That structure is sufficient when the agent operates as a single principal interacting with humans and tools. It is not sufficient when two agents must exchange messages, hand off work, or coordinate a multi-step decision.

This case is no longer rare.

  • Google Cloud Next ’26 placed agent-to-agent communication at the center of the Gemini Enterprise Agent Platform announcement.
  • Microsoft’s Copilot ecosystem and Anthropic’s MCP arc are converging on similar primitives. A2A and MCP are crossing into mainstream enterprise adoption faster than the architectural artifacts that should govern them.

The pharmacovigilance scenario illustrates the gap. The AI Triage Service from earlier chapters performs an extraction step. A separate Case Quality Review agent validates the completeness of the extraction before the pharmacovigilance platform creates a regulated case record. These two agents have distinct contracts, and each is well governed in isolation. But the relationship between them sits in no contract: the message types they exchange, the data they may share, the redaction rules that apply, the autonomy each retains during the exchange, and the conditions under which the exchange escalates to a human.

The Codex addresses this gap with a kind named AgentInteractionContract. It governs the interaction between two agents as a first-class architectural object, distinct from the contracts of the participating agents.

The resulting contract is shown in Figure 2 below. The reader should notice five blocks:

  • the participants with their identities,
  • the protocol used,
  • the purpose anchored in a capability,
  • the exchange rules that constrain message types and data,
  • the autonomy boundary with escalation and termination conditions.

apiVersion: ea.codex/v1
kind: AgentInteractionContract
metadata:
  id: AIC-PV-001
  name: triage-to-case-quality-handoff
  status: approved-with-controls
  version: "1.0"
  domain: pharmacovigilance
  owner: pharmacovigilance-ai-governance
spec:
  sourceAgent:
    agentContractRef: AGENT-PV-TRIAGE-001
    role: emitter
    identity:
      verificationMethod: signed-agent-card
      trustAnchor: acme-internal-root-ca
  targetAgent:
    agentContractRef: AGENT-PV-CASE-QUALITY-001
    role: receiver
    identity:
      verificationMethod: signed-agent-card
      trustAnchor: acme-internal-root-ca
  protocol:
    name: A2A
    version: "1.0"
    transport: https
  purpose: >
    Hand off triaged adverse-event candidates to the case quality review
    agent for completeness validation before pharmacovigilance case creation.
  purposeRef: INTENT-PV-001
  capabilityRef: CAP-PV-001
  allowedMessageTypes:
    - TriageOutput
    - QualityAssessment
    - MissingEvidenceRequest
  messageControls:
    - identity-verification
    - data-minimization
    - patient-identifier-redaction
    - purpose-bound-message
  exchange:
    messageDirections:
      TriageOutput: source-to-target
      QualityAssessment: target-to-source
      MissingEvidenceRequest: target-to-source
    dataExchange:
      allowedDataProducts: [DP-PV-TRIAGE-OUTPUT-V1]
      prohibitedFields: [patientFullName, patientNationalId]
      redactionRules:
        - { field: patientReportedNarrative, rule: pii-redaction }
  autonomyBoundary:
    sourceAutonomyLevel: human-supervised
    targetAutonomyLevel: human-supervised
    stopConditions:
      - { condition: maxRoundtrips > 3 }
      - { condition: humanOverride }
  controls:
    - { type: identity-verification, requiredEvidence: signed-agent-card-trace }
    - { type: pii-minimization, requiredEvidence: redaction-log }
  escalation:
    humanRole: pharmacovigilance-case-owner
    triggerConditions: [low-confidence, regulatoryConflict]
    slaMinutes: 30
  termination:
    timeoutSeconds: 1800
    exitConditions: [quality-assessment-completed, escalation-accepted-by-human]
  linkedPrinciples: [AI-005, AI-013, DATA-001, DATA-005]
  linkedDecisions: [DEC-PV-001]
  linkedEvaluations: [EVAL-PV-001]

Figure 2: ACME Pharma agent interaction contract

The kind is governed by an OPA policy package that enforces seven rules at PR time. The most important are AIC-001 (cross-trust-boundary interactions require identity verification), AIC-002 (prohibited fields require a PII minimization control), and AIC-003 (termination must define a structured exit condition, not just a timeout).
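
What one of these checks looks like in Rego can be sketched briefly. The fragment below is an illustrative sketch of AIC-002, assuming the field names from Figure 2; the package name and exact message text are assumptions, not the Codex's verbatim implementation.

package eacodex.agent_interaction

import rego.v1

# AIC-002 (sketch): prohibited fields require a PII minimization control.
deny contains msg if {
  count(input.spec.exchange.dataExchange.prohibitedFields) > 0
  not pii_minimization_declared
  msg := "prohibited fields are declared but no pii-minimization control is present"
}

pii_minimization_declared if {
  some control in input.spec.controls
  control.type == "pii-minimization"
}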

The relationship between AgentContract and AgentInteractionContract is bidirectional but asymmetric. Each AgentContract may declare an interactionContracts[] array listing the AICs in which the agent participates. The AIC remains the source of truth for the interaction itself. This asymmetry matters because an agent’s contract is owned by the agent’s domain team, while the interaction contract may be owned by a coordinating function (an AI governance team, a regulatory officer, a domain steward) responsible for the cross-domain handoff.
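
In contract form, that back-reference is a small addition to the agent's own specification. The fragment below is a sketch for the triage agent; everything except the interactionContracts array is elided, and the exact placement of the array within spec is an assumption.

apiVersion: ea.codex/v1
kind: AgentContract
metadata:
  id: AGENT-PV-TRIAGE-001
spec:
  # intent, semanticModel, sourcePolicy, designDecisions, controls as before
  interactionContracts:
    - AIC-PV-001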

The Codex treatment of multi-agent interaction is therefore not an extension of the agent contract. It is a separate governance object with its own owner, its own lifecycle, its own validation, and its own evidence requirements. The agent contract describes who the agent is. The interaction contract describes who the agent talks to and under what conditions. Both must exist for any A2A flow that touches a regulated capability.

3.3. When agents remember

The agent contract introduced earlier in this chapter governs what an agent reads at request time. It classifies sources as authoritative, conditional, or prohibited. It does not govern what the agent retains between invocations. That distinction was theoretical when agents were stateless. It is no longer theoretical.

Memory Bank in Vertex AI and Memory Profiles in Agent Builder, announced at Google Cloud Next ’26, ship managed agent memory as a first-class platform service. The same primitive exists in adjacent ecosystems. Memory is not a research concept anymore; it is an operational service that an enterprise AI agent will consume by default unless governance prevents it.

This creates an asymmetry the existing agent contract does not handle. The source policy answers the question “What can the agent read this turn?” A separate question now arises: “What can the agent retain across turns, across sessions, across users, across workflows?” The answers differ.

Consider the AI Triage Service from earlier in this book. The source policy permits the service to read inbound adverse-event reports, the pharmacovigilance ontology, and approved case reference data. The agent legitimately needs these at every invocation. But what does it retain afterward?

A few legitimate categories: the active triage workflow step, so an interrupted review can resume; reviewer-set preferences, such as a default region filter; the reviewer’s organizational role for the duration of a session.

A larger set of categories that must not be retained: patient names, medical record numbers, national identifiers, contact details, dates of birth; investigator private notes flagged confidential at the source; AI-generated regulatory opinions not validated by a qualified human; unredacted patient narrative text before PII processing.

A traditional source policy cannot make these distinctions clearly because it was written for the read path, not the retention path. The Codex therefore introduces a separate kind, AgentMemoryPolicy, dedicated to memory governance.

The structure of an AgentMemoryPolicy carries five blocks. The first declares which agents and capabilities the policy applies to. The second declares retention modes per category, with TTLs and owner controls. The third declares the classification: explicit allowed categories with sensitivity levels and review frequencies, and explicit forbidden categories with regulatory anchors. The fourth declares the controls, naming the operational mechanisms (memory inspection, deletion-on-request, redaction-on-classification, export, anomaly detection) and assigning each to an owner role with evidence requirements. The fifth declares the policy review cadence itself.

Figure 3 below shows a policy fragment for the AI Triage Service.

apiVersion: ea.codex/v1
kind: AgentMemoryPolicy
metadata:
  id: AMP-PV-001
  name: ai-triage-service-memory
  status: approved
  domain: pharmacovigilance
  owner: pharmacovigilance-data-protection-officer
spec:
  agentContractRef: AGENT-PV-TRIAGE-001
  retentionRules:
    - category: session-context
      retention: "1h"
      mode: session
      description: Default in-session ephemeral context
    - category: workflow-state
      retention: "24h"
      mode: workflow-bound
      sensitivityLevel: internal
      reviewFrequency: quarterly
      description: Active triage workflow step and transient context
    - category: user-preference
      retention: "365d"
      mode: persistent
      ownerControl: user
  prohibitedRetention:
    - patient-identifiable-data
  prohibitedCategories:
    - id: patient-identifiable-data
      description: Names, MRN, national IDs, contact details, dates of birth.
      regulatoryAnchor:
        - { framework: HIPAA, obligation: minimum necessary standard }
        - { framework: GDPR, obligation: data minimization (Art 5(1)(c)) }
        - { framework: PMDA, obligation: protection of personal information }
      enforcement: blocking
  auditRequired: true
  controls:
    - { type: memory-deletion-on-request, ownerRole: data-protection-officer,
        slaHours: 72, regulatoryAnchor: GDPR-Art-17 }
    - { type: memory-inspection, ownerRole: data-protection-officer,
        frequency: quarterly, evidenceRequired: inspection-report }
  review:
    ownerRole: data-protection-officer
    frequencyDays: 90

Figure 3: ACME Pharma memory policy fragment for the AI Triage Service

The policy is bound to the agent through an optional field on AgentContract: memoryPolicyRef: AMP-PV-001. The OPA policy package eacodex.agent_memory enforces six rules. Three are blocking: regulated agents must declare a prohibited category with a regulatoryAnchor; persistent retention requires owner control; references to GDPR or PMDA require a memory-deletion-on-request control. Three are warnings: a review cadence above 90 days for regulated agents, a weak inspection frequency, and high-sensitivity allowed categories without a review frequency.
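
As an illustration, the third blocking rule might be expressed along the following lines. The input paths mirror Figure 3; the rule body is a sketch of the check described above, not the package's verbatim implementation.

package eacodex.agent_memory

import rego.v1

# Blocking rule (sketch): a prohibited category anchored in GDPR or PMDA
# obligations requires a memory-deletion-on-request control on the policy.
deny contains msg if {
  some category in input.spec.prohibitedCategories
  some anchor in category.regulatoryAnchor
  anchor.framework in {"GDPR", "PMDA"}
  not deletion_control_declared
  msg := sprintf("category %s cites %s but declares no memory-deletion-on-request control", [category.id, anchor.framework])
}

deletion_control_declared if {
  some control in input.spec.controls
  control.type == "memory-deletion-on-request"
}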

The deeper architectural point is that memory governance is not data governance. Data governance handles enterprise data assets that have stable definitions, owners, and lifecycles. Memory is contextual, dynamic, and derived from interactions. Governing memory through data primitives misses the point. The AgentMemoryPolicy is the form the Codex gives to that distinction, and it is what makes agent memory operationally auditable rather than rhetorically committed.

4. ACME Pharma: a governed clinical protocol assistant

Assume ACME Pharma wants an assistant to help study teams answer protocol questions during site activation. The intent is to reduce delay in interpreting approved study documentation, country addenda, and standard operating procedures, while preserving regulatory discipline and escalation to qualified humans when interpretation becomes consequential. Figure 4 below shows an example agent contract for this protocol assistant.

apiVersion: ea.codex/v1
kind: AgentContract
metadata:
  id: AGENT-PROTO-001
  name: protocol-assistant
  status: approved
  version: "1.4.0"
  domain: clinical-study-startup
spec:
  intent:
    capability: Clinical Study Startup Support
    objective: Shorten turnaround for protocol and site-pack question handling
    delegationLevel: L1
    serviceBoundaries:
      allowed:
        - explain approved protocol text
        - compare protocol with country addendum
        - locate relevant SOP references
        - draft non-binding response for study team review
      forbidden:
        - interpret patient-level eligibility decisions
        - provide medical advice
        - modify protocol content
        - submit regulatory responses
  semanticModel:
    entities:
      - Study
      - Protocol
      - ProtocolVersion
      - CountryAddendum
      - SOP
      - InclusionCriterion
      - ExclusionCriterion
      - InvestigatorQuestion
      - EscalationCase
    authoritativeRelations:
      - ProtocolVersion supersedes ProtocolVersion
      - CountryAddendum constrains ProtocolVersion
      - SOP governs Activity
      - InvestigatorQuestion may_trigger EscalationCase
  sourcePolicy:
    authoritative:
      - validated_protocol_repository
      - approved_country_addenda
      - controlled_sop_library
    conditional:
      - lessons_learned_library
    prohibited:
      - email_attachments
      - personal_share_drives
      - draft_medical_commentary
  designDecisions:
    retrievalRequired: true
    retrievalMustBeScopedBy:
      - study_id
      - country_code
      - document_effective_date
    sourcePrecedence:
      - approved_country_addenda
      - validated_protocol_repository
      - controlled_sop_library
      - lessons_learned_library
    answerMustInclude:
      - cited_source_ids
      - document_versions
      - uncertainty_statement_when_applicable
    actionProfile: readOnly
    mandatoryEscalationConditions:
      - conflict_between_sources
      - ambiguity_in_eligibility_language
      - safety_or_adverse_event_reference
      - missing_authoritative_source
  controls:
    outputRules:
      - do_not_state_medical_judgment
      - do_not_invent_missing_document_content
      - distinguish_quote_from_inference
    humanReview:
      requiredBefore:
        - response_sent_to_investigator
        - protocol_deviation_guidance
    trace:
      storeRetrievedDocuments: true
      storeVersionLineage: true
      storeEscalationTrigger: true
  feedback:
    evaluateOn:
      - citation_correctness
      - protocol_alignment
      - escalation_recall
      - response_cycle_time
      - human_override_rate

Figure 4: ACME Pharma protocol-assistant agent contract

  • The service boundary is narrow by design. A protocol assistant that tries to answer anything about a study will quickly cross into judgment, regulation, or medical responsibility. The allowed and forbidden actions define a bounded role.
  • The semantic model is the basis for retrieval, resolution of conflicts, and interpretation of supersession. In pharmaceuticals, version lineage is a condition of trustworthy response. A system that cites the wrong protocol version is not merely inaccurate; it is operationally dangerous.
  • The source policy separates authoritative content from merely helpful content. This is essential in regulated environments where an apparently useful answer can still be unacceptable if it depends on an uncontrolled source. By making source classes explicit, ACME Pharma avoids a common anti-pattern in which retrieval quality is measured only by relevance while provenance is treated as secondary. An answer that is semantically relevant but drawn from a personal share drive or a draft medical commentary is not acceptable in a clinical context, no matter how well the model summarizes it.
  • The design decisions are the operational heart. Retrieval is mandatory because unguided generation is not acceptable. Retrieval must be scoped by study, country, and effective date because semantic relevance alone is insufficient. Source precedence expresses a legal and procedural ordering. Mandatory escalation conditions convert architectural judgment into runtime behavior. A system that detects conflicting guidance between a country addendum and the base protocol does not attempt to resolve the conflict autonomously. It escalates to a qualified human, because conflicting regulatory guidance in a clinical trial is exactly the kind of situation where AI should surface the problem rather than hide it behind a plausible-sounding answer.
  • The controls turn that architecture into enforceable discipline. The assistant is not merely told to be careful. It is constrained from expressing medical judgment, prohibited from inventing missing content, and required to distinguish quotation from inference. Human review is tied to specific downstream uses (response sent to investigator, protocol deviation guidance) rather than applied as a vague blanket.

This is how AI becomes governable in an enterprise setting. The model may be powerful, but the value comes from the architecture around it.

Figure 5 below shows how architecture decisions become enforceable at release time. A policy checks conformance between the approved contract and the runtime configuration. It blocks prohibited sources, enforces retrieval when required, and prevents deployment when citation controls are missing. For ACME Pharma, this matters because AI governance cannot depend on manual diligence alone. Clinical operations run under schedule pressure. A release control that refuses unsafe configurations is a necessary part of the operating model.

package acme.agentrelease

import rego.v1

default allow := false

# Source classes admissible under the approved contract (see the sourcePolicy
# block in Figure 4).
approved_sources := {
  "validated_protocol_repository",
  "approved_country_addenda",
  "controlled_sop_library",
  "lessons_learned_library"
}

# Block any connector that the contract explicitly prohibits.
deny contains msg if {
  some src in input.agent.source_policy.prohibited
  input.runtime.retrieval_connectors[_] == src
  msg := sprintf("deployment connects to prohibited source: %s", [src])
}

# Retrieval is mandatory for this agent, so at least one connector must exist.
deny contains msg if {
  input.agent.design_decisions.retrieval_required
  count(input.runtime.retrieval_connectors) == 0
  msg := "retrieval is mandatory but no retrieval connector is configured"
}

# Citation controls cannot be switched off for this agent.
deny contains msg if {
  not input.runtime.output_controls.citations_required
  msg := "citations are required for this agent"
}

allow if {
  count(deny) == 0
  input.runtime.mode == "read_only"
  input.runtime.output_controls.citations_required
}

Figure 5: Release-time Rego policy enforcing agent-contract conformance (ACP-ARP-012)

A further step in a mature implementation would have several characteristics:

  • The pipeline would ingest authoritative metadata from the Codex rather than relying on manually copied values.
  • The model card status would come from a governed registry.
  • Source policy would be resolved dynamically from the agent contract stored in the Codex.
  • Runtime monitors would detect drift between approved configuration and actual retrieval behavior.
  • Exception approval would create a time-bound override object with compensating controls and escalation deadlines, as sketched below.
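
The last item can be made concrete. A time-bound override might be captured as a Codex object along the following lines; the kind name and every field shown are hypothetical, intended only to illustrate the shape of a governed exception rather than an established Codex schema.

apiVersion: ea.codex/v1
kind: ExceptionOverride
metadata:
  id: OVR-PROTO-003
  status: approved
  domain: clinical-study-startup
spec:
  overridesPolicy: ACP-ARP-012
  agentContractRef: AGENT-PROTO-001
  relaxedCheck: citations_required
  justification: >
    Citation rendering is temporarily unavailable in the pilot interface;
    responses remain grounded, traced, and limited to pilot sites.
  compensatingControls:
    - daily_manual_citation_audit
    - rollout_restricted_to_pilot_sites
  expiresOn: "2026-03-31"
  escalation:
    ownerRole: clinical-ai-governance-lead
    reviewBefore: "2026-03-15"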

The principle remains unchanged: architectural decisions are authored once at the contract level and projected into the enforcement mechanisms where conformance must be checked.

5. What this changes for enterprise architects

Once AI is treated as bounded agency rather than as an application feature, the architect’s role changes materially.

To see what that looks like concretely, consider a situation at ACME Pharma. The clinical operations team wants to extend the protocol assistant to answer questions about investigational product storage conditions, arguing that the same infrastructure and retrieval pipeline can be reused:

  • Under the old model, the platform team might simply add the storage-condition documents to the retrieval corpus.
  • Under the agent contract model, the architect checks whether the extension stays within the contract’s scope.

The current contract covers Clinical Study Startup Support with a service boundary that includes explaining approved protocol text and locating SOP references. Investigational product storage conditions fall under a different capability (Investigational Product Management) with different governing policies (GMP compliance, cold-chain validation) and different accountability (pharmacy and supply chain rather than clinical operations). The architect identifies that this is not a scope extension. It is a new agent role that requires its own contract with its own source policy, design decisions, and escalation conditions. Reusing the same retrieval infrastructure is a valid platform decision. Reusing the same agent contract is an architectural mistake because it would blur the accountability boundary between clinical operations and investigational product management.

The pattern is broader than this single example. The architect must shape agent roles, source classes, semantic boundaries, decision inheritance, approval thresholds, and control points. This is closer to operating model design than to traditional repository curation.

Architecture work becomes much more decision-centric. AI programs generate many debates that look technical but are actually architectural: whether an agent can write to a system of record, whether public internet sources are ever admissible, whether one enterprise glossary is sufficient across regions, whether confidence scoring can justify auto-approval, whether an answer without provenance may be shown to a user, and whether an assistant should refuse broadly or escalate narrowly. These are design decisions with business consequences. They must be recorded, versioned, related to capabilities and policies, and translated into specifications. Without that discipline, each AI team resolves the same questions independently, producing inconsistent governance across the estate.

Skills shift accordingly. 

  • Semantic modeling becomes more important because AI systems are sensitive to ambiguity in enterprise terms.
  • Information architecture becomes more important because provenance and document status directly affect answer quality.
  • Platform understanding becomes more important because controls are enforced through retrieval pipelines, agent runtimes, and policy engines.
  • Evaluation literacy becomes more important because AI quality includes citation behavior, escalation behavior, refusal behavior, and domain alignment, not just accuracy.

The architect who cannot reason about source classes, confidence thresholds, escalation logic, and trace structures will be confined to commentary while AI governance happens elsewhere.

The architect also acquires a mediation role:

  • AI teams optimize for utility.
  • Risk teams optimize for restriction.
  • Business teams optimize for local outcomes.

The architect converts these tensions into explicit structures rather than endless argument. Which capabilities deserve specialized agents, which ones need only retrieval assistance, which actions remain human, which sources count as authoritative, which failure modes are tolerable, and which controls must be uniform across the estate: these are architectural synthesis problems.

There is also a change in process. Reviews cannot remain isolated stage gates performed after technical design is fixed. Architecture review must happen where agent contracts, tool permissions, source bindings, and policy mappings are defined. It must also continue into runtime through traces, evaluations, and incident analysis. An AI architecture that is never inspected after deployment is not governed. It is merely launched.

That does not mean every enterprise needs a vast AI architecture program. Many organizations should begin with a small number of high-value, tightly bounded agent roles. Even in a small portfolio, the same principles apply. The architect’s obligation is to make the boundary explicit and reusable so that growth does not multiply inconsistency.

The architect also becomes responsible for ensuring that the feedback loop closes. The agent generates traces, evaluations, and incident reports. Those signals must inform the next revision of the agent contract, not disappear into a monitoring tool nobody reads. The architecture function must operate the loop, not only design it. The metrics that matter for an AI agent contract (refusal rates, escalation triggers, retrieval success, override frequency, evidence completeness) are operational measures of architectural fitness. Architecture that does not look at them is flying blind, and an agent fleet without that feedback discipline will drift away from its contract over time without anyone noticing until something fails publicly.

6. Risks, limits, and trade-offs

Several limits and trade-offs deserve naming for any enterprise serious about agent governance.

Modeling cost is the obvious starting point. Building agent contracts, source taxonomies, decision packs, and control mappings takes effort. Low-risk assistance can tolerate lighter contracts and simpler controls. High-impact or regulated use cases cannot. The answer is calibrated structure, not uniform weight, with the depth of the contract matching the operational consequence of the agent.

The maintenance burden compounds over time. Enterprise context changes. Policies evolve. Document libraries are reclassified. If agent contracts are not maintained, the architecture becomes stale and the controls become either ineffective or obstructive. AI makes stewardship more continuous, not less, because the systems being governed change faster than the documents that govern them.

Organizational friction is real and predictable. Explicit decisions expose disagreements that informal practice previously concealed. A sales team may want wide retrieval. Legal may insist on a narrow admissible source set. Architecture makes these tensions visible because it has to translate them into enforceable choices. The friction is not a bug; it is the price of moving from advisory to executable governance.

The spectrum of AI uses also matters. As noted in section 2, not every AI application needs a full agent contract: simple classification or extraction tasks may need only source validation, an output schema, and quality monitoring, while uses with operational consequence (writing to systems, advising on regulated matters, acting within business processes) warrant the full discipline. A free-form drafting assistant for low-risk internal ideation can tolerate a broad scope and lightweight controls. A procurement negotiation assistant, a clinical trial assistant, or an HR policy agent cannot.

The limits of formalization are honest constraints. Not all enterprise knowledge reduces cleanly to schemas, policies, and decision rules. Some judgment remains contextual and contested, particularly in areas involving ethics, negotiation, creative strategy, or ambiguous regulation. In those domains, the honest architectural answer may be to keep AI in advisory mode and preserve human accountability at the point of commitment, rather than to force a contract that papers over the contested terrain.

False confidence is perhaps the most dangerous failure. A well-structured agent with strong controls can still fail because the authoritative source is wrong, because retrieval quality is degraded, because the underlying model generalizes badly in a rare edge case, or because users overtrust polished language. Architecture does not eliminate uncertainty. It makes uncertainty more visible and more manageable. Enterprises that forget this will treat control as certainty and will be disappointed.

Political resistance is the last and least technical concern. Executable AI architecture affects permissions, deployment gates, workflow approvals, and operating boundaries. It changes who gets to decide. Teams accustomed to local autonomy may resist. The move from documentation to semantic control is a governance transition, not only a technical one. Traditional architecture could survive while being partially ignored because it often lived in advisory space. Executable AI architecture cannot be ignored because it shapes what is deployable, and that visibility is precisely what makes the transition contested.

7. Conclusion

AI does not make enterprise architecture obsolete. It makes weak enterprise architecture intolerable.

The enterprise cannot govern AI systems by relying on model selection, prompt craft, or retrospective review alone. Useful and governable AI requires structure around the model: enterprise semantics, bounded context, admissible sources, explicit decisions, executable specifications, control points, and feedback loops. That is architecture in the fullest sense.

This chapter has argued that the practical unit of this work is the agent contract: a governed specification that ties purpose, capability, policy, decision, source model, control logic, and evaluation into a form that both people and systems can use. Design decisions become visible as the primary hinge between business intent and runtime behavior. The Codex matters because agents need semantic context. Decisions matter because agents need execution choices that survive beyond conversation. Controls matter because bounded agency without verification is only optimism. Feedback matters because AI systems change conditions of operation continuously.

Thus, AI becomes valuable not when the enterprise adds a model to the landscape, but when the enterprise makes its own structure legible enough for bounded machine participation.

The more ambitious the enterprise becomes with automation, the less it can afford to leave architecture implicit. AI pulls enterprise architecture toward its future form: continuous, semantic, decision-aware, specification-driven, and executable.

The practical consequence for an architecture function planning its AI work is that the agent contract is the primary unit of investment. Building reusable source taxonomies, decision packs, and control mappings is not overhead before the real work begins. It is the real work. The agent contract takes its place alongside principles, standards, reference architectures, and blueprints as a typed Codex object: linked to the principles it inherits, the standards it conforms to, the reference architecture it instantiates, and the roadmap that sequences its rollout.

It is not a new asset category competing with the TOGAF building blocks; it is the specialized form those primitives take when the subject is an autonomous system with operational consequence. Enterprises that build this discipline early will carry it into every subsequent AI initiative, while those that defer it will find each new initiative relitigating governance questions that should already have been settled. The organizations that treat agent contracts as first-class architectural artifacts will produce governable AI; those that treat them as paperwork will produce AI that looks impressive in demonstrations and fails unpredictably in production. The choice is made long before any agent is deployed, and it is made by how seriously the architecture function commits to the discipline of the contract.

8. Sources