Architecting Determinism: Taming AI Agents with Skills, Context, and Binary Assertions

If you are a principal engineer building strict, multi-tenant architectures, skepticism toward AI coding agents is not just healthy—it is a survival requirement.

Large Language Models are fundamentally probabilistic guessing machines. Ask one to write a Go microservice using Domain-Driven Design, and there is a high probability it will leak a database/sql import directly into your pure domain layer. Ask it to scaffold a React component, and it will happily inline raw CSS into a codebase governed by a design system. The outputs look correct. They compile. And they silently violate every architectural boundary you spent months enforcing.

Historically, engineers tried to fix this by stuffing a massive system prompt full of rules: "Never import infrastructure packages into the domain layer. Always use the design system tokens. Follow this 50-page DDD reference." This approach fails catastrophically. It blows up the context window, degrades the model's reasoning capacity, and drives up API costs with every single tool-call iteration.

But an emerging standard for agent capabilities, combined with a recent paradigm-shifting discovery from Vercel Labs, finally gives us the blueprint for taming non-determinism. Here is how modern agent architectures actually work, why the way we used to manage context was fundamentally wrong, and how to apply this to real Go and React projects.


The Token Trap

AI coding agents operate on a continuous tool-calling loop: Observe → Think → Act. The agent reads your codebase, reasons about the task, generates code, runs a linter, reads the errors, and iterates. Every cycle, you pay for the entire accumulated context window again.

This is the "Token Trap." As conversations grow, token costs compound. Because each iteration re-sends everything that came before it, cumulative input tokens grow roughly quadratically with the number of turns, not linearly. Every file the agent reads, every error message it processes, and every previous response it must recall inflates the context window. Stuffing hundreds of lines of architectural rules into that window from the start is fiscal and cognitive suicide.

To escape this trap, the industry coalesced around a standard that treats agent capabilities as portable, on-demand folders—commonly called Skills. The idea was elegant: hide your massive framework documentation, DDD standards, and style guides inside discrete Skill packages. When the agent needs to know how to structure a Go handler, it invokes the relevant skill, reads the docs, and writes the code.

The theory was clean. The results were not.


The Plot Twist: Vercel's Discovery

When Vercel Labs evaluated how to teach coding agents the Next.js App Router API, they ran controlled experiments comparing different documentation strategies. The results exposed a fatal flaw in the on-demand Skill model.

Skills (documentation loaded only when the agent decides to look it up) maxed out at a 79% pass rate, and only when explicit instructions told the agent to use them. Without those instructions, the pass rate dropped to 53%: exactly the same as having no documentation at all.

The reason is non-determinism. By placing documentation inside a skill that must be invoked, you force the LLM to make a probabilistic decision: "Should I look this up, or do I already know enough to just write the code?" Too often, the model chooses confidence over caution and simply guesses. In Vercel's evals, that default behavior failed 47% of the time.

The breakthrough came when Vercel tried a radically different approach. They took the documentation, compressed it into an ultra-dense 8KB index, and injected it directly into a persistent AGENTS.md file—a file the agent reads automatically at the start of every session.

The pass rate jumped to 100%.

The takeaway was profound: the decision of whether to consult the rules must be removed from the LLM entirely. The rules must simply be there, always, passively loaded into the context window before the agent writes a single line of code.


The Hybrid Model: Passive Context vs. Action Skills

To achieve what I call "bare-minimum determinism," we must split the agent's brain into two strict layers. One layer handles knowledge. The other handles verification.

| Layer | Implementation | Purpose | The Skeptic's Advantage |
|---|---|---|---|
| Passive Context | AGENTS.md (System Prompt) | Broad, horizontal knowledge that is always present. | Removes the LLM's "decision to look up." Architectural rules are loaded before reasoning begins. |
| Action Skills | .agents/skills/ | Vertical, deterministic verification workflows. | Offloads heavy lifting to bash and Go scripts. The agent orchestrates; it does not guess. |

Layer 1: The Passive Context

The AGENTS.md file is your compressed architectural constitution. You do not dump 50 pages of DDD philosophy into it. You write an ultra-dense index—pointers, constraints, and file-tree references that the model can use to navigate your codebase with precision.

The critical insight is this: if an LLM can see that a file exists in the file tree, it is far more likely to read that file than to hallucinate its contents. Your job is to make the architecture's shape visible at a glance.

```markdown
# AGENTS.md — Architectural Index

## Go Backend (`/internal`)
- Domain layer (`/internal/domain/`) is PURE. Zero external imports.
  Allowed: stdlib allowlist only (errors, fmt, time, context, strings, strconv).
- Application layer (`/internal/app/`) orchestrates domain logic.
  Depends on: domain interfaces. Never concrete infra.
- Infrastructure (`/internal/infra/`) implements domain interfaces.
  Database, HTTP clients, third-party SDKs live here exclusively.

## React Frontend (`/web`)
- All UI components MUST use design system tokens from `@/theme`.
- Raw CSS, inline styles, and arbitrary Tailwind values are forbidden.
- Component API contracts live in `/web/src/types/components.ts`.

## Reference Files
- DDD Glossary: `/docs/ddd-glossary.md`
- API Contracts: `/docs/api-contracts.yaml`
- Error Taxonomy: `/docs/error-codes.md`
```

This costs roughly 300 tokens. It is always present. The agent never has to decide whether to look it up.


Layer 2: The Action Skills (Deterministic Assertions)

Here is where the architecture becomes truly powerful. Instead of using Skills for documentation lookups, which Vercel's evals showed to be unreliable, we use them for binary verification.

You cannot prompt away non-determinism. But you can sandbox it. The flow becomes:

  1. The LLM generates code (non-deterministic).
  2. The agent triggers a deterministic Skill.
  3. A bundled script (Go AST parser, ESLint rule, shell command) outputs a binary Pass or Fail.
  4. On failure, the LLM reads the structured error output and automatically fixes the code.
  5. The cycle repeats until the assertion passes.
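The five steps above reduce to a small control loop. Here is a Go sketch in which Generator stands in for the LLM and Assertion for a skill script; both are hypothetical types for illustration. It adds one thing the list leaves implicit: a hard cap on attempts, so a model that never satisfies the assertion cannot loop, and bill, indefinitely.

```go
package main

import (
	"fmt"
	"strings"
)

// Generator stands in for the LLM (hypothetical): given the last failure
// message ("" on the first call), it returns a new candidate.
type Generator func(feedback string) string

// Assertion stands in for a skill script (hypothetical): "" means pass,
// anything else is the structured failure fed back to the generator.
type Assertion func(code string) string

// converge runs generate -> assert -> feed back until the assertion passes or
// maxAttempts is exhausted.
func converge(gen Generator, check Assertion, maxAttempts int) (string, error) {
	feedback := ""
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		code := gen(feedback)
		feedback = check(code)
		if feedback == "" {
			return code, nil
		}
	}
	return "", fmt.Errorf("no passing output after %d attempts; last failure: %s", maxAttempts, feedback)
}

func main() {
	// A fake generator that leaks an infra import until it sees a failure.
	gen := func(feedback string) string {
		if feedback == "" {
			return `import "database/sql"`
		}
		return `import "errors"`
	}
	// A fake assertion that reports a file:line failure message.
	check := func(code string) string {
		if strings.Contains(code, "database/sql") {
			return `internal/domain/order.go:3: illegal import "database/sql"`
		}
		return ""
	}
	code, err := converge(gen, check, 3)
	if err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("PASS:", code)
}
```

The loop never inspects the code itself; it only relays the assertion's verdict. That is the division of labor the list describes: the model proposes, the script decides.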

We bound the non-deterministic output with deterministic rules across three layers:

Structural Assertions — A react-design-audit skill that bundles an ESLint script. It parses the generated component and fails if it finds raw CSS values, inline styles, or imports outside the design system. The LLM does not get to decide if it followed the style guide; the script tells it.

Architectural Assertions — A go-boundary-assert skill that uses Go's go/parser and go/ast packages to walk the import declarations of every file in /internal/domain/. If it finds a single import path that is not on an explicit allowlist of standard-library packages, the skill exits with a non-zero code and prints the offending file and line number. The LLM cannot argue with an AST.

Execution Assertions — A go-tenant-test skill that runs a focused unit test. It injects a JWT with Tenant ID A, calls the service method, and asserts that the query is scoped exclusively to Tenant A. If the generated code accidentally queries across tenant boundaries, the test fails, and the agent sees the assertion diff.

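```bash
#!/usr/bin/env bash
# .agents/skills/go-boundary-assert/skill.sh
# Run from the repository root: --target is resolved against the working directory.
set -euo pipefail

# The checker ships inside this skill (see tools/ in the skill folder), so
# locate it relative to this script rather than the caller's directory.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

echo "🔍 Scanning domain layer for illegal imports..."

# Under `set -e`, a failing command substitution would abort the script before
# the FAIL message prints, so capture the checker's exit status explicitly.
STATUS=0
VIOLATIONS=$(go run "$SCRIPT_DIR/tools/ast-boundary-checker.go" \
  --target ./internal/domain/ \
  --allowed "errors,fmt,time,context,strings,strconv") || STATUS=$?

if [ "$STATUS" -ne 0 ] || [ -n "$VIOLATIONS" ]; then
  echo "❌ FAIL: Domain boundary violated"
  echo "$VIOLATIONS"
  exit 1
fi

echo "✅ PASS: Domain layer is clean"
exit 0
```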

The beauty of this model is that it converts a probabilistic problem into an engineering problem. The LLM is free to be creative in its implementation. But the assertions are immovable walls. The code either passes or it does not. There is no negotiation.


Putting It Together

To apply this hybrid architecture to a real Go and React project, your repository adopts this shape:

```text
my-saas-project/
├── AGENTS.md                         # Ultra-compressed architectural index
├── .agents/
│   └── skills/
│       ├── go-ddd-scaffold/          # Bash scripts to safely mkdir domain layers
│       │   └── skill.sh
│       ├── go-boundary-assert/       # AST checker enforcing clean architecture
│       │   ├── skill.sh
│       │   └── tools/
│       │       └── ast-boundary-checker.go
│       ├── react-design-audit/       # ESLint/PostCSS validator for design tokens
│       │   └── skill.sh
│       └── go-tenant-test/           # Multi-tenant isolation assertions
│           └── skill.sh
├── docs/
│   ├── ddd-glossary.md               # Referenced by AGENTS.md, read on demand
│   ├── api-contracts.yaml
│   └── error-codes.md
├── internal/                         # Go DDD backend
│   ├── domain/                       # Pure. Zero external imports.
│   ├── app/                          # Application services
│   └── infra/                        # Infrastructure implementations
└── web/                              # React frontend
    └── src/
        ├── theme/                    # Design system tokens
        └── components/               # UI components
```

The AGENTS.md gives the agent spatial awareness of this tree before it writes a single line. The Skills enforce the rules after it writes. Between these two layers, non-determinism is bounded—not eliminated, but caged within walls that the LLM cannot break through.


The Takeaway

AI agents are not going away. But treating them as autonomous black boxes that will magically follow your architecture is engineering negligence.

The solution is not to fight the non-determinism. It is to architect around it. Compress your standards into a passive, always-present context file. Then build deterministic assertion scripts that verify every output the agent produces. The LLM generates; the assertions validate; the cycle converges on correctness.

You do not need to trust the model. You need to trust your tests.