Production-grade AI agents don't run on a single system prompt. They run on layered architectures of specialized instructions, each solving a distinct problem, composed at runtime based on context.

I extracted every prompt file from the Claude Code source: 28 files containing thousands of lines of instructions. The result is a detailed look at how Anthropic structures the instructions that govern an autonomous coding agent.

The patterns here apply well beyond coding assistants. Anyone building agentic systems will recognize the problems these prompts solve.


Runtime composition, not a static string

The main system prompt is assembled from more than 15 sections, each produced by a dedicated function. Which sections appear depends on user configuration, enabled features, and session type. A boundary marker (__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__) splits the prompt into a static prefix (cacheable across users) and a dynamic suffix (personalized per session).

This architecture means the prompt is a program, not a document. It has conditionals, feature gates, and user-specific branches.
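In code, the pattern looks something like this. A minimal sketch: only the boundary marker comes from the source; the section contents and type names are illustrative.

```typescript
// A minimal sketch of runtime prompt composition. Only the boundary marker
// is real; section contents and type names here are illustrative.
interface SessionContext {
  outputStyle?: string;
  memoryEnabled: boolean;
}

type SectionBuilder = (ctx: SessionContext) => string | null;

// Static sections render identically for every user, so the prefix is cacheable.
const staticSections: SectionBuilder[] = [
  () => "You are an interactive agent that helps users with software engineering tasks.",
  () => "Carefully consider the reversibility and blast radius of actions.",
];

// Dynamic sections are the feature gates and user-specific branches.
const dynamicSections: SectionBuilder[] = [
  (ctx) => (ctx.outputStyle ? `Follow the configured output style: ${ctx.outputStyle}` : null),
  (ctx) => (ctx.memoryEnabled ? "Relevant memories from past sessions follow." : null),
];

function buildSystemPrompt(ctx: SessionContext): string {
  const render = (sections: SectionBuilder[]) =>
    sections
      .map((build) => build(ctx))
      .filter((s): s is string => s !== null)
      .join("\n\n");

  return [
    render(staticSections),
    "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__",
    render(dynamicSections),
  ].join("\n\n");
}
```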


Identity framing: less is more

The opening line is spare:

"You are an interactive agent that helps users with software engineering tasks."

No personality traits. No backstory. No "you are a helpful, harmless, and honest assistant." Just a role and a domain. When an Output Style is configured, even the domain reference gets swapped for a pointer to the style definition.

The restraint is intentional. The less the prompt says about who the agent "is," the more flexibly it adapts to different users and contexts.


Security as a separate concern

Immediately after identity comes a paragraph owned by a different team entirely. The Safeguards team maintains cyberRiskInstruction.ts, which gets injected into every session regardless of mode:

"Assist with authorized security testing, defensive security, CTF challenges, and educational contexts. Refuse requests for destructive techniques, DoS attacks, mass targeting, supply chain compromise, or detection evasion for malicious purposes."

This isolation matters. Security instructions are auditable, independently maintained, and impossible to accidentally delete during a feature change. The team responsible for safety owns its own prompt section.
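The mechanism is easy to picture: the section lives in its own module, and the composer injects it unconditionally. A sketch, assuming the file simply exports the quoted paragraph as a string:

```typescript
// cyberRiskInstruction.ts (sketch). The Safeguards team owns this file;
// the prompt composer imports and injects it in every session, so no
// feature change elsewhere can silently drop it.
export const cyberRiskInstruction = [
  "Assist with authorized security testing, defensive security, CTF",
  "challenges, and educational contexts. Refuse requests for destructive",
  "techniques, DoS attacks, mass targeting, supply chain compromise, or",
  "detection evasion for malicious purposes.",
].join(" ");
```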


Engineering opinions as constraints

The "Doing tasks" section is the longest and most opinionated. It reads less like AI instructions and more like a senior engineer's code review feedback:

"Don't add features, refactor code, or make 'improvements' beyond what was asked."

"Don't add error handling, fallbacks, or validation for scenarios that can't happen."

"Don't create helpers, utilities, or abstractions for one-time operations."

"Three similar lines of code is better than a premature abstraction."

These encode a specific engineering philosophy: minimalism, trust in existing abstractions, resistance to speculative complexity. The prompt teaches the model an aesthetic, not just a set of tasks.
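To make that last rule concrete, here is the trade-off it points at, in an example of my own rather than anything from the prompt:

```typescript
// An illustration of the rule, not code from Claude Code.
interface Config { host: string; port: string; logLevel: string }
const config: Config = { host: "localhost", port: "8080", logLevel: "info" };

// Premature abstraction: a helper invented for a one-time operation.
function applyEnvOverride(cfg: Config, key: keyof Config, envVar: string): void {
  cfg[key] = process.env[envVar] ?? cfg[key];
}
applyEnvOverride(config, "host", "APP_HOST");
applyEnvOverride(config, "port", "APP_PORT");
applyEnvOverride(config, "logLevel", "APP_LOG_LEVEL");

// What the prompt prefers: three similar lines, each obvious in isolation.
config.host = process.env.APP_HOST ?? config.host;
config.port = process.env.APP_PORT ?? config.port;
config.logLevel = process.env.APP_LOG_LEVEL ?? config.logLevel;
```

The helper saves no lines and hides the one thing a reviewer needs to see: which environment variable maps to which field.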

Anthropic's internal variant (visible only to employees) pushes even harder:

"Default to writing no comments. Only add one when the WHY is non-obvious."

"Don't explain WHAT the code does, since well-named identifiers already do that."

The divergence between internal and external builds reveals that Anthropic's engineers prefer an even more opinionated agent than what ships to the public.


Safety through enumeration

A dedicated section called "Executing Actions With Care" doesn't rely on abstract principles. It names specific dangerous operations:

"Carefully consider the reversibility and blast radius of actions."

"A user approving an action once does NOT mean that they approve it in all contexts."

Deleting branches, force-pushing, dropping database tables, posting to Slack: each gets called out individually. The instruction "measure twice, cut once" appears literally, turning a woodworking maxim into agent policy.

The specificity matters. "Be careful" is vague enough to ignore. "Don't delete branches without asking" is concrete enough to follow.
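The same enumeration style translates directly into code for anyone enforcing it outside the prompt. A hypothetical sketch, with entries echoing the prompt's examples:

```typescript
// Sketch: dangerous operations enumerated as data. The entries echo the
// prompt's examples; the enforcement function is hypothetical.
const DANGEROUS_OPERATIONS: { pattern: RegExp; reason: string }[] = [
  { pattern: /git branch -D|git push .*--delete/, reason: "deletes a branch" },
  { pattern: /git push .*(--force|-f)\b/, reason: "force-push rewrites history" },
  { pattern: /\bDROP TABLE\b/i, reason: "drops a database table" },
];

// A user approving an action once does not mean approval in all contexts,
// so every match asks again.
function requiresFreshApproval(command: string): string | null {
  const hit = DANGEROUS_OPERATIONS.find(({ pattern }) => pattern.test(command));
  return hit ? hit.reason : null;
}
```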


Tool hierarchy enforced by prompt

The prompt doesn't let the model choose freely between tools that accomplish similar goals. It establishes a strict preference order:

"Do NOT use Bash to run commands when a relevant dedicated tool is provided."

"To read files use Read instead of cat, head, tail, or sed."

"To edit files use Edit instead of sed or awk."

The reasoning is about observability and safety. A Read call creates a structured, transparent record. A cat inside Bash is opaque to the permission system. When multiple tools can do the same job, the prompt routes toward the one that gives the user more visibility.
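The substitutions the prompt names could just as well drive a lint on proposed commands. A hypothetical sketch, using only the mappings quoted above:

```typescript
// Sketch: steer opaque Bash invocations toward observable dedicated tools.
// The mapping comes from the prompt's own examples; this enforcement
// function is hypothetical.
const BASH_TO_TOOL: Record<string, string> = {
  cat: "Read",
  head: "Read",
  tail: "Read",
  sed: "Edit", // flagged for both reading and editing; either way a dedicated tool exists
  awk: "Edit",
};

function preferredTool(bashCommand: string): string | null {
  const binary = bashCommand.trim().split(/\s+/)[0];
  return BASH_TO_TOOL[binary] ?? null; // null: no dedicated equivalent, Bash is fine
}
```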


Two tiers of output instructions

The external build tells the model:

"Go straight to the point. Try the simplest approach first. Be extra concise."

The internal build adds sophistication:

"When sending user-facing text, you're writing for a person, not logging to a console."

"Assume the person has stepped away and lost the thread."

"Write user-facing text in flowing prose while eschewing fragments, excessive em dashes, symbols and notation."

"Use inverted pyramid when appropriate (leading with the action)."

The internal variant references journalism techniques, bans specific punctuation patterns, and includes hard numeric limits like "keep text between tool calls to <=25 words." There's a @[MODEL LAUNCH] comment noting this section needs updating "when we launch numbat," revealing the next model's codename.


A four-type memory taxonomy

Claude Code's memory extraction runs as a background subagent after meaningful sessions. Its prompt defines four categories:

User memories capture role, goals, and expertise ("frame frontend explanations in terms of backend analogues").

Feedback memories record both corrections and confirmations, with the explicit instruction: "Corrections are easy to notice; confirmations are quieter. Watch for them."

Project memories track ongoing work context ("merge freeze begins 2026-03-05").

Reference memories point to external systems ("pipeline bugs tracked in Linear project INGEST").

Each memory carries a "Why" line explaining its origin and a "How to apply" line for future context. When the model recalls a memory later, it can judge whether the reasoning still holds.

The taxonomy transforms "remember important things" into a classification problem with clear triggers and storage formats.
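As a storage schema, the taxonomy might look like this; the field names are my guess at the format the prompt describes:

```typescript
// Sketch of the four-type taxonomy as a schema (field names assumed).
type MemoryType = "user" | "feedback" | "project" | "reference";

interface Memory {
  type: MemoryType;
  content: string;    // the memory itself
  why: string;        // where it came from, so future recall can judge it
  howToApply: string; // how to use it in future sessions
}

const example: Memory = {
  type: "project",
  content: "merge freeze begins 2026-03-05",
  why: "User mentioned the freeze while planning this week's PRs",
  howToApply: "Warn before suggesting merges scheduled after this date",
};
```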


The dreaming prompt

Background memory consolidation runs through a prompt that might be the most conceptually elegant in the codebase:

"You are performing a dream, a reflective pass over your memory files. Synthesize what you've learned recently into durable, well-organized memories so that future sessions can orient quickly."

It prescribes four phases: Orient (read existing memories and understand the current state), Gather (search logs and transcripts for new information), Consolidate (merge, update, and convert relative dates to absolute), and Prune (keep the index under 200 lines).

The instruction "Don't exhaustively read transcripts. Look only for things you already suspect matter" is a masterclass in efficient retrieval. The agent uses what it already knows to guide what it looks for, rather than processing everything.
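The phase structure generalizes to any background task: name the phases, give each a goal, and give the last one a hard completion criterion. A sketch, with the phase names and goals from the dream prompt and the template shape mine:

```typescript
// Sketch: a phase-based background-task prompt. Phase names and goals come
// from the dream prompt; the template rendering is illustrative.
const DREAM_PHASES = [
  { name: "Orient", goal: "Read existing memories and understand the current state." },
  { name: "Gather", goal: "Search logs and transcripts for new information." },
  { name: "Consolidate", goal: "Merge, update, and convert relative dates to absolute." },
  { name: "Prune", goal: "Keep the index under 200 lines." },
] as const;

const dreamPrompt = DREAM_PHASES
  .map((phase, i) => `Phase ${i + 1} (${phase.name}): ${phase.goal}`)
  .join("\n");
```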


Compaction under pressure

When context runs out, the compaction prompt takes over with tight guardrails:

"CRITICAL: Respond with TEXT ONLY. Do NOT call any tools."

"Tool calls will be REJECTED and will waste your only turn."

The summary must cover nine sections, including a verbatim listing of every user message and a record of errors and fixes. The "Optional Next Step" section prevents a common failure: after summarization, the model resuming an old completed task instead of continuing the current one.

"Ensure that this step is DIRECTLY in line with the user's most recent explicit requests. Do not start on tangential requests or really old requests that were already completed."


Tool prompts as standalone instruction sets

Every tool carries its own prompt file. The Bash tool includes complete git workflow instructions, specifying commit message format and PR creation syntax. The Agent tool's description changes based on whether fork mode is active (fire-and-forget background workers versus synchronous delegates). The WebSearch tool injects the current date to prevent searches based on stale training data.

Even the Sleep tool has a prompt: "Call Sleep to wait a specified number of seconds before your next turn." It only exists in KAIROS autonomous mode.
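The pattern is straightforward to replicate: treat each tool's description as a function of session context rather than a constant. A sketch with an illustrative interface, not Claude Code's actual one:

```typescript
// Sketch: each tool carries its own prompt, rendered with session context.
interface ToolDefinition {
  name: string;
  description: (ctx: { forkMode: boolean; today: string }) => string;
}

const webSearchTool: ToolDefinition = {
  name: "WebSearch",
  // Injecting today's date keeps the model from searching with stale
  // assumptions from its training data.
  description: ({ today }) =>
    `Search the web for current information. Today's date is ${today}.`,
};

const agentTool: ToolDefinition = {
  name: "Agent",
  description: ({ forkMode }) =>
    forkMode
      ? "Spawn fire-and-forget background workers."
      : "Delegate a task to a synchronous subagent and wait for its result.",
};
```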


Adversarial verification for internal builds

Internal Anthropic builds include a mandatory review step:

"When non-trivial implementation happens on your turn, independent adversarial verification must happen before you report completion."

"Your own checks, caveats, and a fork's self-checks do NOT substitute. Only the verifier assigns a verdict."

"On FAIL: fix, resume the verifier with its findings plus your fix, repeat until PASS."

"On PASS: spot-check it. Re-run 2-3 commands from its report."

The implementing agent cannot self-certify. A separate verification agent must independently confirm the work. Even then, the main agent spot-checks the verifier. Trust, but verify, then verify the verification.
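As control flow, the loop the prompt describes looks roughly like this; the function names and types are mine:

```typescript
// Control-flow sketch of the verification loop; names and types are mine.
type Verdict = { status: "PASS" | "FAIL"; findings: string; commands: string[] };

// Stand-ins for the real agent plumbing.
declare function implement(task: string): Promise<string>;
declare function fix(result: string, findings: string): Promise<string>;
declare function runVerifier(task: string, result: string, priorFindings?: string): Promise<Verdict>;
declare function rerun(commands: string[]): Promise<void>;

async function implementWithVerification(task: string): Promise<string> {
  let result = await implement(task);
  // The implementer cannot self-certify; only the verifier assigns a verdict.
  let verdict = await runVerifier(task, result);
  while (verdict.status === "FAIL") {
    result = await fix(result, verdict.findings);
    verdict = await runVerifier(task, result, verdict.findings);
  }
  // Even on PASS, spot-check: re-run a few commands from the verifier's report.
  await rerun(verdict.commands.slice(0, 3));
  return result;
}
```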


Patterns worth stealing

These 28 prompt files encode years of iteration. A few patterns stand out for anyone building agentic systems:

Compose prompts like software. Runtime conditionals, feature gates, caching boundaries, and team-owned sections beat a single monolithic string.

Encode expertise, not instructions. "Three similar lines of code is better than a premature abstraction" teaches an aesthetic. "Write clean code" teaches nothing.

Enumerate failure modes. Every dangerous action in the prompt corresponds to a real incident. Abstract safety principles don't prevent concrete mistakes.

Structure background tasks. Phase-based prompts with explicit completion criteria prevent open-ended spiraling. The dream prompt's four phases keep consolidation focused and bounded.

Design for information loss. Context compaction is lossy by nature. A nine-section summary format ensures the most critical information survives compression.

The recurring pattern across all 28 files: every instruction reads like a post-mortem. Something failed, someone understood why, and they wrote a prompt to prevent it from recurring. The best prompt architectures aren't designed from theory. They're accumulated from experience.

What this means for you

If you’re building (or evaluating) an agent, think in modular prompt sections that can be composed at runtime rather than cramming everything into one prompt. Separate safety and security instructions into their own independently owned, always-on layer, and keep identity framing minimal so the agent can adapt to different workflows. Finally, encode your engineering “taste” and operational risk checks explicitly—spell out what not to do and enumerate high-blast-radius actions—so the agent behaves consistently under pressure.