The Maturity Model
The model separates three things people usually conflate: the Spine (what mode you work in), the Axes (how deep each capability runs), and the Scope (who or what is being assessed).
THE SPINE × THE AXES @ THE SCOPE

| The Spine (what mode you work in) | The Axes (how deep each capability runs) | The Scope (who or what is assessed) |
|---|---|---|
| L1 Prompting | Verification | Individual (workstation) |
| L2 Prompt Engineering | Context hygiene | Codebase (repo) |
| L3 Context Engineering | Autonomy / leash | Team (org) |
| L4 Agentic Engineering | Learning / compounding | |
| L5 Agentic Orchestration | Cost & governance | |

The Spine — five modes of working with AI
Each level is defined by the one new skill that distinguishes it from the level below. Most people are “spiky”: high on a familiar repo, low on an unfamiliar one. Your level is your reliable default under pressure, not your best day.
L1 — Prompting · “Ask and accept”
You type a request and take what comes back. In coding this is “vibe coding”: you accept diffs unread and iterate by pasting errors back in. This has real value, because it raises the floor, but it offers no repeatability, no review, and no trust. It is fine for throwaways, not for production.
L2 — Prompt Engineering · “Reusable, structured asks”
You write prompts deliberately: roles, few-shot examples, output-format constraints. You save and reuse prompts, and you read the diff before accepting. The work is still stateless: each prompt is an island with no memory, tools, or system around it.
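As a minimal sketch of what a saved, reusable prompt might look like, the snippet below assembles a role, few-shot examples, and an output-format constraint into one string. The `PromptTemplate` class and its `render` method are hypothetical names for illustration, not part of any library.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PromptTemplate:
    """Hypothetical container for a reusable, structured prompt."""
    role: str
    output_format: str
    examples: list[tuple[str, str]] = field(default_factory=list)

    def render(self, task: str) -> str:
        # Assemble the fixed parts first so every call has the same structure.
        parts = [f"Role: {self.role}"]
        for given, expected in self.examples:
            parts.append(f"Example input: {given}\nExample output: {expected}")
        parts.append(f"Output format: {self.output_format}")
        parts.append(f"Task: {task}")
        return "\n\n".join(parts)


review_prompt = PromptTemplate(
    role="You are a careful code reviewer.",
    output_format="A bulleted list of issues, most severe first.",
    examples=[("def add(a, b): return a - b", "- Bug: subtracts instead of adding")],
)
prompt_text = review_prompt.render("Review the attached diff.")
```

The template is reusable, but note that nothing here carries state between calls, which is exactly the limit of this level.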
L3 — Context Engineering · “Build the system that feeds the model”
You stop optimizing the string and start engineering the system around the model, curating the optimal set of tokens at each step. The guiding rule is to supply the smallest set of high-signal tokens that maximizes the odds of the desired outcome, because context is finite even with huge windows. Typical signals are a maintained CLAUDE.md, a curated tool/MCP set, and deliberate context management. This is the hinge: prompt engineering is fine for demos; context engineering is what gets deployed.
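One way to picture the "smallest set of high-signal tokens" rule is as a selection problem under a token budget. The sketch below uses a simple greedy heuristic, which is not guaranteed to be optimal. The `Snippet` type, the `signal` scores, and the token counts are all assumptions made for illustration; in practice they would come from your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snippet:
    """Hypothetical candidate piece of context."""
    name: str
    tokens: int    # assumed token count (> 0), measured elsewhere
    signal: float  # assumed relevance score in [0, 1], assigned elsewhere


def select_context(candidates: list[Snippet], budget: int,
                   min_signal: float = 0.5) -> list[Snippet]:
    """Greedily keep high-signal snippets that fit within the token budget."""
    # Drop low-signal material first: leaving it out is the point of curation.
    useful = [s for s in candidates if s.signal >= min_signal]
    # Prefer snippets that carry the most signal per token.
    useful.sort(key=lambda s: s.signal / s.tokens, reverse=True)
    chosen: list[Snippet] = []
    used = 0
    for snippet in useful:
        if used + snippet.tokens <= budget:
            chosen.append(snippet)
            used += snippet.tokens
    return chosen


candidates = [
    Snippet("CLAUDE.md", tokens=400, signal=0.9),
    Snippet("failing_test_output", tokens=300, signal=0.95),
    Snippet("entire_changelog", tokens=5000, signal=0.2),
    Snippet("module_under_edit", tokens=1200, signal=0.8),
]
selected = select_context(candidates, budget=1000)
```

With these made-up numbers the changelog is filtered out as noise and the large module does not fit, so only the failing test output and CLAUDE.md are kept.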
L4 — Agentic Engineering · “Delegate, verify, stay in the loop”
The model now acts in a loop (it uses tools and takes multi-step actions toward a goal) and you supervise. The job shifts from author to compute allocator and reviewer. The distinguishing skill is running agents against specs and verifiable success criteria, reviewing every diff, and never letting quality become optional. The milestone is inverting from mostly hand-writing code to mostly delegating and verifying it, without dropping quality.
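"Verifiable success criteria" can be read as commands whose exit status decides whether an agent's change is accepted. The sketch below is one assumed shape for that gate. The `criteria_pass` helper is hypothetical, and the commands are stand-ins so the example runs anywhere; a real project would use its own test, lint, or build commands.

```python
import subprocess
import sys


def criteria_pass(commands: list[list[str]]) -> bool:
    """Return True only if every verification command exits with status 0."""
    for command in commands:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            return False
    return True


# Stand-in commands (assumptions): each just runs a tiny Python snippet.
passing = [[sys.executable, "-c", "assert 1 + 1 == 2"]]
failing = [[sys.executable, "-c", "raise SystemExit(1)"]]

accept_first = criteria_pass(passing)             # every check passes
accept_second = criteria_pass(passing + failing)  # one check fails
```

A gate like this complements diff review rather than replacing it: it only proves the criteria you wrote down.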
L5 — Agentic Orchestration · “Coordinate fleets, close the loops”
You run multiple agents (orchestrator-worker, parallel subagents with isolated context), wire feedback loops (evals, LLM/agent-as-judge), and build learning loops (memory write-back, compounding instructions). The skill is decomposition, managing context boundaries across agents, and building systems that improve without you re-engineering them each cycle. Multi-agent setups burn far more tokens, so maturity includes knowing when not to use them.
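The orchestrator-worker shape with isolated context can be sketched with the standard library alone. In the example below, `run_worker` is a placeholder that stands in for a model call, and the subtask names and context keys are invented; the point is only that each worker receives its own copy of its own context and nothing else.

```python
from concurrent.futures import ThreadPoolExecutor


def run_worker(subtask: str, context: dict[str, str]) -> str:
    """Placeholder worker; a real one would call a model with only `context`."""
    return f"{subtask}: saw {sorted(context)}"


def orchestrate(subtasks: dict[str, dict[str, str]]) -> dict[str, str]:
    """Fan subtasks out in parallel, giving each worker only its own context."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            # dict(context) copies the mapping so workers share no state.
            name: pool.submit(run_worker, name, dict(context))
            for name, context in subtasks.items()
        }
        return {name: future.result() for name, future in futures.items()}


results = orchestrate({
    "search_docs": {"docs_index": "..."},
    "inspect_tests": {"test_log": "..."},
})
```

Every extra worker is another full model call, which is where the additional token cost comes from.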
The Axes — five capabilities that deepen at every level
The Spine tells you which mode you work in. The Axes tell you how well.
| Axis | What it measures | Immature → Mature |
|---|---|---|
| Verification (the differentiator) | Can the work check itself? | Eyeball it → tests/build/screenshot the agent runs itself → eval suites & LLM/agent-as-judge → end-state evals on held-out sets |
| Context hygiene | Signal-to-noise of what the model sees | Kitchen-sink session → clear between tasks → curated CLAUDE.md + just-in-time retrieval → subagents isolate exploration → compaction & structured notes |
| Autonomy / leash | Division of responsibility, set by risk | Human-in-the-loop → on-the-loop (monitor + intervene) → off-the-loop (autonomous, monitored). A longer leash is earned through proven reliability, set per-decision by risk — never global. |
| Learning / compounding | Does the system get better on its own? | Same mistakes repeat → corrections added to CLAUDE.md once → shared, git-tracked rules updated weekly → memory/eval results feed back automatically |
| Cost & governance | Is it efficient, safe, and in-bounds? | No idea of cost → cost-per-task awareness → token-efficient tool design → tier/permission discipline, secret scanning, sandboxed autonomy |
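The Autonomy / leash row says the leash is set per decision by risk and earned through proven reliability. A minimal sketch of that rule follows. The three risk labels, the boolean `proven_reliable` flag, and the exact thresholds are assumptions made for illustration, not part of the model.

```python
from enum import Enum


class Oversight(Enum):
    IN_THE_LOOP = "human approves each action"
    ON_THE_LOOP = "human monitors and can intervene"
    OFF_THE_LOOP = "autonomous, monitored"


def leash_for(risk: str, proven_reliable: bool) -> Oversight:
    """Pick an oversight mode for one decision; thresholds are illustrative."""
    if risk not in {"low", "medium", "high"}:
        raise ValueError(f"unknown risk level: {risk!r}")
    # High risk or an unproven agent always keeps a human in the loop.
    if risk == "high" or not proven_reliable:
        return Oversight.IN_THE_LOOP
    if risk == "medium":
        return Oversight.ON_THE_LOOP
    return Oversight.OFF_THE_LOOP
```

Because the function takes one decision at a time, there is no global setting to flip: a reliable agent still gets a short leash on a high-risk action.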
The Scopes — three things you can assess
The same Spine × Axes apply to three subjects. The Self-Assessment page has the full rubrics.
- Individual (workstation) — Where is this person on their craft journey? Signals are about habits and setup: do they keep a CLAUDE.md, review diffs, run agents against specs and verify, and what leash have they safely earned?
- Codebase (repo) — Is this repo ready for agents to work in it effectively? The most measurable scope — mostly binary, file-existence checks (CLAUDE.md, tests + CI, linter/formatter/types, docs, one-command setup, secret scanning, an eval harness). A repo’s score is the % of criteria passed.
- Team (org) — Does the organization amplify or waste its AI capability? A documented AI stance, healthy/AI-accessible internal data, a quality internal platform of shared harnesses, and whether evals/learnings feed back into shared assets.
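Because the Codebase scope is mostly file-existence checks scored as the percentage of criteria passed, it is easy to sketch. The criteria and paths below are an illustrative subset chosen for this example, not the Self-Assessment rubric itself, and `readiness` is a hypothetical helper.

```python
from pathlib import Path
import tempfile

# Hypothetical criteria: each label maps to paths, any one of which satisfies it.
CRITERIA: dict[str, list[str]] = {
    "agent instructions": ["CLAUDE.md"],
    "tests": ["tests", "test"],
    "CI": [".github/workflows", ".gitlab-ci.yml"],
    "docs": ["README.md", "docs"],
}


def readiness(repo: Path) -> tuple[float, dict[str, bool]]:
    """Return the percentage of criteria passed plus per-criterion results."""
    results = {
        label: any((repo / candidate).exists() for candidate in candidates)
        for label, candidates in CRITERIA.items()
    }
    score = 100 * sum(results.values()) / len(results)
    return score, results


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "CLAUDE.md").write_text("# Project notes\n")
    (demo / "tests").mkdir()
    demo_score, demo_results = readiness(demo)
```

The demo directory passes two of the four criteria, so `demo_score` is 50.0. Existence checks say nothing about quality, which is why this scope is described as "mostly" binary.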
The model on one card
┌─────────────────────────────────────────────────────────────────────────────┐
│                        AGENTIC ENGINEERING MATURITY                         │
│                                                                             │
│ SPINE   L1 Prompting → L2 Prompt Eng → L3 Context Eng →                     │
│         L4 Agentic Eng → L5 Agentic Orchestration                           │
│         (maturity = choosing the right level for the task)                  │
│                                                                             │
│ AXES    Verification* · Context hygiene · Autonomy/Leash ·                  │
│         Learning/Compounding · Cost & Governance                            │
│         (*can't exceed L3 on the Spine with bottom-tier Verification)       │
│                                                                             │
│ SCOPE   Individual (habits) · Codebase (file signals) · Team (DORA + trust) │
│                                                                             │
│ RULE    You don't climb to autonomy. You earn it through verification.      │
└─────────────────────────────────────────────────────────────────────────────┘
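The starred note on the card (bottom-tier Verification caps you at L3 on the Spine) can be written as a one-line rule. The sketch below assumes a numeric encoding that the model does not define: Spine levels 1 to 5 and Verification tiers starting at 1, where tier 1 means "eyeball it". The function name is hypothetical.

```python
def effective_spine_level(claimed_level: int, verification_tier: int) -> int:
    """Cap the Spine level at 3 when Verification sits at its lowest tier."""
    if not 1 <= claimed_level <= 5:
        raise ValueError("Spine levels run from 1 to 5")
    if verification_tier < 1:
        raise ValueError("verification tiers start at 1")
    if verification_tier == 1:  # assumed encoding: 1 = "eyeball it"
        return min(claimed_level, 3)
    return claimed_level
```

For example, someone running agent fleets (L5) while only eyeballing the output would be assessed at L3 under this reading, while the same person with a higher Verification tier keeps L5.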