Key Takeaways
  1. The real shift with coding agents is moving from manually prompting to designing automated loops that prompt, evaluate results, and repeat.
  2. In last year’s agent workflows, humans acted as the scheduler/while-condition; the goal now is to stop being “the thing in the loop” and instead write the loop.
  3. Prompt engineering is not dead; it has been compiled into reusable prompt files and programs that replay them until a condition is met.
  4. A minimal loop can be as simple as a while-true bash script that reruns an agent against a fixed prompt/spec, but that alone is a demo, not a reliable system.
  5. A trustworthy loop must have an external, objective check (tests, builds, health checks) to decide “done,” not the model’s self-assessment, and it must have stopping/spending controls to avoid runaway cost.

A bash one-liner taped to the edge of a monitor above an empty desk chair, the terminal still running

Boris Cherny built Claude Code. He deleted his IDE in November. In a recent month, 100 percent of his contributions to Claude Code itself, 259 pull requests, were written by Claude Code. His description of how he works now is the line I keep coming back to: "I don't prompt Claude anymore. I have loops that are running. They're the ones prompting Claude and figuring out what to do. My job is to write loops."

Peter Steinberger compressed the same idea into a post that cleared two million views: you shouldn't be prompting coding agents anymore, you should be designing loops that prompt your agents. The replies turned into an argument about what that even means, and the most-quoted answer was "nobody knows but him and boris."

Plenty of people know. I have been running these for client work for months, and the concept fits in one sentence. A loop is a small program that prompts the agent, reads what came back, decides whether the work is done, and if not, prompts again. The interesting parts are what that does to your job and the two properties that separate a loop you can trust from a loop that quietly burns money.

The job moved up one level

Think about what you were actually doing in an agent session last year. The agent wrote the code, but you were still the scheduler. You read the output, judged whether it was good enough, and typed the next instruction. You were the component that decided "keep going" or "done." A human, sitting in a chair, performing the function of a while-condition.

That is the thing in the loop, and that is the thing to stop being.

The shift Cherny and Steinberger describe is an altitude change, the same one we have made before. We stopped writing assembly and wrote compilers. We stopped racking servers and wrote Terraform. Now we stop writing the code and write the thing that writes the code. The keystrokes leave, the judgment stays. Someone still decides what to build, what done means, and what the loop is allowed to spend getting there. That someone is you, just no longer in real time, one prompt at a time.

This is why I find the "prompt engineering is dead" framing useless. Prompting did not die, it got compiled. You write the prompt once, into a file, and a program replays it until a condition holds.

The whole idea is one line of bash

If the concept still feels abstract, run it once. This is the "ralph loop," popularized by Geoffrey Huntley:

while :; do cat PROMPT.md | claude -p --dangerously-skip-permissions; done

That is everything. Pipe the same prompt file into Claude Code in headless mode, let it work until it exits, start again. Each pass is a fresh session with no memory of the last one, so the agent reorients by reading the spec, the checklist, and the commits the previous pass left behind. No framework, no state machine. A while-true wrapped around a coding agent.

Obvious caution: that flag does exactly what it says, so this belongs in a throwaway worktree or a container, never your main checkout. And yes, the trivial thing works. A team at a Y Combinator hackathon shipped six repositories overnight on a ralph loop for about 297 dollars in API costs. Huntley built a small programming language with one.

But look at what the one-liner is missing. It never checks whether the work is done, and it never stops. Those two absences are not details. They are the entire engineering problem: everything that separates a demo from something you can leave running while you sleep.

A loop is only as good as its check

A loop that writes code and never verifies it is a machine for producing confident mistakes at increasing speed. So the first property of a trustworthy loop: an external check decides when the work is done. Never the model's opinion of its own work.

The distinction is sharp in practice. "The agent says the refactor looks solid" is not a check. "npm test exits 0" is. "docker build succeeds, the container starts, and the health check returns 200" is. The loop continues on objective failure and halts on objective success, and the model's self-assessment never enters the decision.
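A minimal sketch of that decision rule as a wrapper, under stated assumptions: AGENT_CMD, CHECK_CMD, and run_until_green are names made up for this example, the default agent command reuses the flags from the ralph one-liner, and npm test stands in for whatever check your project has. The only thing that ends the loop successfully is the check's exit code.

```bash
# Hypothetical names for this sketch: AGENT_CMD, CHECK_CMD, run_until_green.
# Override both commands for your project; the defaults are placeholders.
AGENT_CMD=${AGENT_CMD:-'claude -p --dangerously-skip-permissions < PROMPT.md'}
CHECK_CMD=${CHECK_CMD:-'npm test'}

# Usage: run_until_green [max_passes]
# Returns 0 once CHECK_CMD exits 0, 1 if the pass cap is hit first.
run_until_green() {
  local max_passes="${1:-20}"
  local pass=0
  # Check before the first pass: if the work is already done, spend nothing.
  until sh -c "$CHECK_CMD"; do
    if [ "$pass" -ge "$max_passes" ]; then
      echo "check still failing after $pass passes, giving up" >&2
      return 1
    fi
    pass=$((pass + 1))
    echo "pass $pass: check failed, prompting the agent" >&2
    # The agent's own exit status and output are deliberately ignored here.
    sh -c "$AGENT_CMD" || true
  done
  echo "check passed after $pass agent pass(es)" >&2
}
```

Nothing the agent prints can end the run. Only the check command's exit status is consulted, which is the point.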

Claude Code's /goal command is the productized version of this, and it is the front door I point people to now. You state a condition, the agent works in turns, and after each turn a separate small model, Haiku by default, reads the transcript and returns a plain yes or no. The model doing the work is not the model deciding it is done, which is the right instinct.

But there is a catch that decides whether your /goal actually works, and it took me one wasted afternoon to internalize. The judge does not run your tests or read your repo. It only reads the transcript. So your condition must be something the agent's own output can prove. "All tests in test/auth pass" works, because the agent runs the tests and the result lands in the conversation as a receipt for the judge to read. "The auth is solid" never resolves, because nothing in the transcript settles it. The real verification is still a real command exiting non-zero on failure. The judge just reads the receipt.

Before you run any loop, run the check yourself once and break something on purpose to confirm it fails. If the check cannot tell pass from fail, neither can the loop, and you have built an expensive random walk.
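That habit can be scripted. The sketch below is one way to do it, not a standard tool: prove_check is a hypothetical helper, and it assumes you start from a state where the check passes, such as a known-good commit on a throwaway branch. It runs the check, applies a deliberate breakage, confirms the check now fails, and always restores the tree.

```bash
# Hypothetical helper for this sketch: prove_check CHECK BREAK RESTORE
# Assumes you start from a state where CHECK passes (a known-good commit).
prove_check() {
  local check="$1" break_cmd="$2" restore_cmd="$3"
  local verdict=0
  if ! sh -c "$check" >/dev/null 2>&1; then
    echo "check fails before anything was broken; start from a green state" >&2
    return 1
  fi
  sh -c "$break_cmd"
  if sh -c "$check" >/dev/null 2>&1; then
    echo "check still passes on broken code: it cannot see failure" >&2
    verdict=1
  fi
  # Always undo the deliberate breakage, whatever the verdict.
  sh -c "$restore_cmd"
  return "$verdict"
}
```

A call might look like prove_check 'npm test' 'git apply break.patch' 'git checkout -- .', where break.patch is a hypothetical patch that introduces a known bug. A non-zero return means the check is not fit to referee a loop.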

A loop is only as safe as its stops

The second property is less glamorous and more important: hard stops. Three of them, and I add all three before anything runs unattended.

A turn cap, because runaways happen. --max-turns 20 on the command, plus "or stop after 20 turns" stated in the condition itself.

A no-progress cap, because stuck happens more often than runaway. The simplest version is a wrapper or Stop hook that compares git diff --stat between passes and halts when nothing has changed for three rounds. An agent that is looping without progress will happily rephrase the same failed attempt forever, and every rephrase costs money.
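A rough sketch of the wrapper version, with assumptions flagged: AGENT_CMD, STALL_LIMIT, fingerprint, and run_until_stalled are names invented here, and the fingerprint adds the current commit hash to git diff --stat so that an agent which commits its work between passes still registers as progress. It implements only the no-progress stop; the done check would sit alongside it.

```bash
# Hypothetical names for this sketch: AGENT_CMD, STALL_LIMIT, fingerprint,
# run_until_stalled. Run it inside the git work tree the agent edits.
AGENT_CMD=${AGENT_CMD:-'claude -p --dangerously-skip-permissions < PROMPT.md'}
STALL_LIMIT=${STALL_LIMIT:-3}

# Snapshot of the tree: current commit plus a summary of uncommitted edits.
# HEAD is included because an agent that commits leaves `git diff` empty.
fingerprint() {
  git rev-parse HEAD 2>/dev/null
  git diff --stat
}

# Reruns the agent until STALL_LIMIT consecutive passes change nothing.
run_until_stalled() {
  local stalled=0 before after
  while [ "$stalled" -lt "$STALL_LIMIT" ]; do
    before=$(fingerprint)
    sh -c "$AGENT_CMD" || true
    after=$(fingerprint)
    if [ "$before" = "$after" ]; then
      stalled=$((stalled + 1))
    else
      stalled=0   # any change resets the counter
    fi
  done
  echo "no change for $STALL_LIMIT passes in a row, halting" >&2
  return 1
}
```

The comparison is deliberately coarse. git diff --stat ignores untracked files and only summarizes line counts, so a stricter fingerprint may be worth the extra cost if your agent mostly creates new files.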

A budget ceiling, because the other two can both fail. This one lives outside the loop entirely, a hard dollar limit on the workspace in your provider's console, set once. Uber capped engineers at 1,500 dollars per person per tool per month after burning through its annual AI budget in four months. The unbounded loop is a concrete hazard, a forgotten terminal turning into a four-figure invoice overnight.

The romantic pitch for loops is a thousand agents building your company while you sleep. The production reality is that most of the engineering effort goes into making sure the thing halts. I have come to read that as a feature. A loop with a real check and real stops is an engine. A loop without them is a billing event.

Where I would start

Not with orchestration. The frontier crowd is running loops that supervise other loops, twenty or thirty agents under a coordinator, and it makes for great screenshots. You do not need it, and starting there teaches you nothing about the two properties that matter.

Start with one bounded task where done is objective: a flaky test, a lint cleanup, a small migration. Prove the check by hand on a throwaway branch. Then one command: claude --max-turns 15 "/goal all tests in test/auth pass, npm test exits 0, or stop after 15 turns". Watch the first run end to end. Watch the judge say "not done," and watch what the agent does with the failure. Set the spend cap before the first unattended run, not after the first surprise.

Then notice what changed. You did not type the fix. You specified the finish line, built the referee, and bounded the cost. That is the whole job now: deciding what done means, and making it checkable by something that is not the model. The loop is plumbing. The check and the stops are the work.

Cherny is right that this makes good engineers matter more, not less. The agent absorbed the keystrokes. What is left is exactly the part that was always hard: knowing what to build and how you would know it is built. The loop just forces you to write that down precisely enough that a program can act on it, which, thirty years in, is the most honest definition of engineering I have.

What This Means For You

If you’re using coding agents, start treating your workflow like automation: write a prompt/spec once, wrap the agent in a loop, and make the loop responsible for deciding whether to continue. Anchor “done” to external signals like test suites, successful builds, or deployed health checks rather than the agent’s confidence. Add guardrails (timeouts, budgets, and safe sandboxes) so you can let the loop run unattended without silently burning money or breaking your main checkout.