7. Deterministic guarantees: hooks, specialized agents, and external workflows

Some steps must happen — formatting, linting, type checks, secret scanning, tests — and "I asked the agent to remember" is not a strategy. This chapter is about the mechanisms that make those steps reliable: hooks when the harness offers them, specialized agents when it doesn't (or when judgment is needed), and external workflows (CI, GitHub Actions, git hooks) as the safety net. In practice you layer all three.

The same mechanisms also feed failures back to whoever is iterating: the agent itself in the inner loop (so it self-corrects without waiting for a human), and the human or upstream agent in the outer loop (so collaboration stays grounded in real signal, not in trust).

What you'll learn

  • The difference between "asking the agent to do X" and "making something else enforce X".
  • Three mechanisms for deterministic guarantees, and when each one fits.
  • How those mechanisms feed feedback into the inner loop (agent self-correction) and the outer loop (human/agent review).
  • Which tools support which mechanism today.
  • Common hooks worth setting up on day one, and how to debug them.
  • Why hooks act as sensors, not just automators.

Instructions are hopes, guarantees are configuration

Every team that adopts an AI coding agent goes through the same cycle. Someone writes in AGENTS.md or CLAUDE.md: "Always run pnpm lint after editing TypeScript." For a while it works. Then the model changes, the context gets long, the task gets urgent, and the instruction quietly stops being followed. The agent ships code that fails CI.

The lesson is simple: a system prompt is a suggestion the model is free to ignore under pressure. If a step must always happen, it cannot live in prose — it has to live in configuration that something other than the model controls.

Three ways to guarantee a step happens

There isn't one mechanism — there are three, with different trade-offs:

| Option | Determinism | Feedback latency | Where it fits |
|---|---|---|---|
| Harness hooks | High — the harness runs them, not the model | Immediate (same turn) | When your tool supports them (Claude Code, Codex CLI, partial in Aider) |
| Specialized agent | Medium-high — a unit task the model almost always executes correctly | Same turn or next turn | When hooks aren't available, or when validation needs judgment, not just a command |
| External workflows (CI, Actions, pre-commit) | Highest — lives outside the agent entirely | Late (PR / push time) | As a safety net, or when the agent isn't in the loop (e.g. the Copilot coding agent) |

Each one closes a different loop:

  • Hooks close the inner loop: failure becomes context the agent reads on its next step, mid-turn, with no human involvement.
  • Specialized agents also close the inner loop, but a turn later — the orchestrator delegates "run the validator" to a focused subagent and reads its output.
  • External workflows close the outer loop: the human (or an upstream orchestrator) sees the failure during review, and the next iteration starts from there.

You don't pick one. You layer them: hooks for what the harness supports, specialized agents for what hooks can't express, CI as the final net.

Tool support today

| Tool | Event hooks | Specialized sub-agents | External workflow integration |
|---|---|---|---|
| Claude Code | Yes (PreToolUse, PostToolUse, Stop, SessionStart...) | Yes | Yes (any CI) |
| Codex CLI | Yes (equivalent events) | Yes | Yes |
| Cursor | Partial (commands, no general event hooks) | Limited | Yes |
| Aider | Limited (--lint-cmd, --test-cmd at end-of-turn) | No | Yes |
| GitHub Copilot CLI | Yes (PreToolUse, PostToolUse, configured in .github/hooks/) — GA Feb 2026 | Yes — custom agents + sub-agents, plus /fleet for parallel subagents | Yes |
| GitHub Copilot (VS Code) | Agent hooks in preview | Custom agents | Yes |

This space is moving fast — GitHub Copilot in particular went from "no hooks, no subagents" to "both, including parallel fleets" inside a few months. Re-check your tool's docs every few releases. If your tool still doesn't expose hooks, the determinism has to live somewhere — push it into specialized agents and external workflows.

The principle is the same across tools

The harness enforces, the model requests. Whether "the harness" is a hook, a delegated subagent, or a CI job is an implementation detail.

Why hooks (when you have them) are the strongest option

Hook events you'll actually use

Most harnesses expose a similar event surface. The useful ones are:

  • PreToolUse — fires before a tool call. Can block it. Use for policy ("don't let the agent edit /etc") and redaction.
  • PostToolUse — fires after a tool call succeeds. Use for formatting, linting, and running fast tests on changed files.
  • PreCommit / PrePush — classic git hooks, still valuable. Secret scanning lives here.
  • SessionStart / SessionEnd — good place to log, snapshot state, or print the current branch and dirty files so the agent starts grounded.
  • UserPromptSubmit — fires when the user sends a message, useful for injecting context or blocking obviously dangerous asks.
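The SessionStart idea above can be sketched in a few lines of shell. This is a hypothetical grounding script; it assumes your harness surfaces a SessionStart hook's stdout as context for the agent (Claude Code does, but check your tool's docs):

```shell
# Hypothetical SessionStart hook: print repo state so the agent starts
# grounded. Assumes the harness injects this hook's stdout into context.
print_session_context() {
  # Fall back gracefully when run outside a git repository.
  branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "(not a git repo)")
  echo "branch: $branch"
  echo "dirty files:"
  git status --porcelain 2>/dev/null || true
}

print_session_context
```

Cheap, read-only, and impossible to "fail": the kind of hook you can enable on day one without risk.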

Day-one hooks

Before you tune prompts, set up these four. They pay for themselves within a day:

  1. Formatter on edit. Prettier, Black, gofmt, rustfmt. Removes a whole class of pointless diffs.
  2. Linter / typechecker at end-of-turn. ESLint, Ruff, golangci-lint, tsc, mypy. Catches the agent's favorite mistakes (unused imports, shadowed variables, broken types) — but run them once when the turn closes, not on every edit (see "Per-edit vs end-of-turn" below).
  3. Secret scanner pre-commit. gitleaks or trufflehog. The agent will eventually paste an API key somewhere.
  4. Affected tests. Run the tests that touch changed files. Fast feedback, not full CI.
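The "affected tests" idea can be sketched as a small selector function. The sibling-file naming convention (foo.ts next to foo.test.ts) and the runner in the usage comment are assumptions to adapt to your layout:

```shell
# Hypothetical affected-tests selector: list the test files that belong
# to source files changed since HEAD. The foo.ts -> foo.test.ts naming
# convention is an assumption; adapt it to your project.
affected_tests() {
  git diff --name-only HEAD -- '*.ts' | while read -r f; do
    case "$f" in
      *.test.ts) echo "$f" ;;                       # a test file itself changed
      *) t="${f%.ts}.test.ts"
         [ -f "$t" ] && echo "$t" ;;                # sibling test, if it exists
    esac
  done | sort -u
}

# Usage (runner is a placeholder):
#   affected_tests | xargs -r npx vitest run
```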

A sample settings.json snippet

Here's a Claude Code-style hooks configuration. Cursor and Codex use different schemas but the shape is similar.

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "npx prettier --write \"$CLAUDE_FILE_PATHS\""
          }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npx eslint . && npx tsc --noEmit"
          }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "./scripts/deny-dangerous-commands.sh"
          }
        ]
      }
    ]
  }
}

The PostToolUse hook runs Prettier on any file the agent just wrote — cheap, idempotent, no failures. The Stop hook runs ESLint and tsc once when the agent finishes its turn, producing a single coherent verification pass instead of noise after every edit. The PreToolUse hook inspects bash commands and exits non-zero to block things like rm -rf / or git push --force to main.
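The deny script referenced in that config could look something like the sketch below. It assumes the proposed command arrives as a plain string; real harnesses typically hand the hook a JSON payload on stdin, so you'd extract the command field first (the entrypoint comment shows the idea):

```shell
# Hypothetical PreToolUse guard. check_command returns 0 to allow the
# tool call and 2 to block it; the patterns are coarse substring matches
# and deliberately err on the side of blocking.
check_command() {
  case "$1" in
    *"rm -rf /"*)         echo "Blocked: recursive delete from root" >&2; return 2 ;;
    *"git push --force"*) echo "Blocked: force push" >&2; return 2 ;;
    *"--no-verify"*)      echo "Blocked: hook bypass" >&2; return 2 ;;
  esac
  return 0
}

# As a hook entrypoint, extract the command from the harness's payload
# first, e.g. (field name depends on your harness):
#   check_command "$(jq -r '.tool_input.command')"
```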

Per-edit vs end-of-turn: when to run what

Not every check belongs on every edit. The right rule of thumb:

  • Per-edit (PostToolUse) is for formatters: Prettier, Black, gofmt, rustfmt, ruff format. They're cheap, idempotent, can't really "fail", and they keep the agent from ever seeing badly-formatted code. Running them on every write is the correct default.
  • End-of-turn (Stop / SubagentStop) is for linters, typecheckers, and tests: ESLint, mypy, tsc, pytest. They're expensive, noisy, and frequently fail mid-refactor — a symbol may be temporarily missing, an import temporarily broken. Running them once at the end of the turn over everything that changed is faster and produces a single, coherent feedback signal.

The useful rule: format per-edit, validate at end-of-turn.

Heavy validation per-edit hurts more than it helps

Running tsc after every Edit wastes tokens, slows the loop, and floods the agent with transient errors from work-in-progress code. The agent then "fixes" things that weren't broken — they were just half-done. Wait until the turn closes.

Hooks as sensors: how the agent learns about failures

A hook that runs is only half the story. The other half is: does the agent find out it ran, and what it found?

The answer lives in the hook's exit code:

  • Exit 0 — silent. The agent continues as if nothing happened.
  • Non-zero exit (or stderr) — the harness typically injects the hook's stderr back into the agent's context as a system message. The agent sees it as if it were tool feedback, opens the file, reads the error, fixes it, and retries. The verification loop closes without human intervention.
  • PreToolUse hooks can be blocking: a non-zero exit aborts the action before it happens. This is how guardrails like "don't rm -rf outside the workspace" actually work.

This is the bridge to chapter 1's "verification before handing back". A well-designed hook isn't just an automator — it's a sensor that turns side-effects into feedback the agent can act on. The end-of-turn tsc failure becomes the next turn's first task. The agent self-corrects.
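That contract can be made concrete with a tiny reporting helper. Exit-code semantics vary by harness (Claude Code, for instance, treats exit code 2 as blocking feedback), so treat the specific code here as an assumption:

```shell
# Sensor sketch: collect check output into a file, then report it.
# An empty file means silent success (exit 0); otherwise the errors go
# to stderr, where the harness can inject them into the agent's context.
report() {
  errfile=$1
  if [ -s "$errfile" ]; then
    cat "$errfile" >&2   # this is what the agent will read next
    return 2             # non-zero: surface as feedback (code varies by harness)
  fi
  return 0
}
```

A Stop hook would run its linters and typecheckers, append their failures to the error file, and call `report` at the end.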

The anti-pattern: hooks that only log

A hook that pipes output to /tmp/agent-hooks.log and exits 0 is decoration. Nothing flows back into the agent's context, so failures are invisible until a human reads the log. If you want the agent to react, the failure must be visible to it.
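The fix is not to stop logging; it's to log and still propagate. A wrapper along these lines keeps the audit trail without hiding failures (the AGENT_HOOK_LOG variable is made up for this sketch; the default path is the one used in this chapter):

```shell
# Log-and-propagate wrapper: capture a command's output for the logfile,
# but re-emit it on stderr and preserve the exit code on failure, so the
# harness still sees the hook fail.
run_logged() {
  logfile=${AGENT_HOOK_LOG:-/tmp/agent-hooks.log}   # AGENT_HOOK_LOG: hypothetical override
  out=$("$@" 2>&1)
  status=$?
  printf '%s\n' "$out" >> "$logfile"
  if [ "$status" -ne 0 ]; then
    printf '%s\n' "$out" >&2   # failure stays visible to the agent
  fi
  return "$status"
}
```

Usage inside a hook: `run_logged npx tsc --noEmit`. Success is quiet and logged; failure is logged and loud.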

A hook isn't valuable because it ran

It's valuable because the agent noticed it ran and acted on the result. Wire the exit code with intention.

Blocking vs warning

Not every hook should fail the tool call. Roughly:

  • Block on policy violations, secrets, dangerous commands, broken syntax.
  • Warn (log but don't fail) on style nits, slow tests, or advisory lint rules.

A hook that blocks too aggressively trains the agent — and the human — to look for bypasses. A hook that only warns gets ignored. Pick the right severity per rule.

--no-verify is a smell

If you find yourself or the agent reaching for git commit --no-verify, the correct response is almost never "bypass the hook." It's either "fix the underlying problem" or "the hook is wrong, fix the hook." Bypassing is how guardrails rot.

Debugging hook failures

When a hook fails, resist the urge to skip it. Instead:

  1. Run the hook command manually in your shell with the same inputs.
  2. Check that environment variables the harness injects (like $CLAUDE_FILE_PATHS) are what you expect.
  3. Make the command idempotent — hooks that fail on re-run are painful.
  4. Keep hooks fast. A 30-second hook on every edit will destroy the flow.

Log, don't guess

Pipe hook output to a logfile (>> /tmp/agent-hooks.log 2>&1) during setup. You'll debug in minutes instead of hours.

Layering the three options in practice

A realistic setup for a TypeScript service might look like:

  • Hooks — Prettier on every edit; ESLint + tsc on Stop; gitleaks on pre-commit. These cover the things that should never be in doubt and can be expressed as a single command.
  • Specialized agent — a migration-validator subagent the orchestrator calls before any database change. It runs the migration in a throwaway DB, checks for destructive operations, and returns a structured report. A hook can't express that judgment; a focused subagent almost always can.
  • External workflows — GitHub Actions on PR: full test suite, Trivy security scan, performance smoke tests. Catches what slipped through, and is what the human reviews against in the outer loop.
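The external-workflow layer for that TypeScript service could be as small as the GitHub Actions sketch below; job names, action versions, and commands are placeholders, and the Trivy and smoke-test steps are omitted for brevity:

```yaml
# .github/workflows/pr-checks.yml — a sketch; adapt versions and commands.
name: pr-checks
on: pull_request
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm ci
      - run: npx eslint . && npx tsc --noEmit
      - run: npm test
```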

Each layer feeds the next loop. Hooks self-correct the agent in seconds. Specialized agents catch what hooks can't, in the same session. CI catches what the agent missed and surfaces it to the human, who either fixes it or asks the agent to.

The bridge to harness engineering

This is where "using an agent" becomes "engineering a harness." Once you have hooks, specialized agents, and CI wired together — and they all feed signal back into the loop they belong to — you stop worrying about whether the agent remembered the rules. The rules aren't the agent's job anymore. That shift of responsibility, from prompt to plumbing, is the single biggest maturity jump a team makes.

Key takeaway

Determinism doesn't come from one mechanism. Hooks cover what the harness can express. Specialized agents cover what hooks can't. External workflows are the net underneath both. Each one feeds either the agent's inner loop or the human's outer loop — and a step that doesn't feed some loop isn't a guarantee, it's decoration.