Just another unique way to voice out.: The Coming Loop: Harness vs. Judgment in Agentic Coding

Armin Ronacher (Flask, Jinja, Rye: the kind of resume that gets taken seriously when he opens his mouth on agentic engineering) published a short essay this morning called The Coming Loop. It is, in order, a description, a confession, and a warning. The description is what an agentic coding harness is. The confession is that he doesn't yet trust himself to work that way. The warning is that the working method he doesn't trust is going to win anyway.

The post will be widely misread as either Luddite ("loops are bad") or capitulationist ("loops are inevitable, so we should just do them"). Neither reading is right. Ronacher has the diagnosis exactly right and the prescription exactly wrong, and the gap between the two is the actual story.

The two-loop frame

Ronacher splits the agent loop into two. The inner loop is what every coding agent already does: the model calls a tool, reads the result, calls another, eventually emits a final answer. The outer loop is the harness: code that watches the inner loop, decides whether its "I'm done" is actually done, and if it isn't, injects a new message, opens a fresh session, hands the task to a different machine, or keeps the same session alive. The outer loop is what Boris Cherny is talking about when he says — quoted at the top of Ronacher's essay — "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." The inner loop is the part where the model gets to be a model; the outer loop is the part where someone decides what the model is for. Ronacher's claim is that the outer loop is becoming the dominant abstraction, and that this is a serious change.
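
The two loops can be sketched in a few lines. This is a minimal illustration, not Ronacher's or any vendor's implementation: `outer_loop`, `RunResult`, and the two toy stand-ins (`fake_inner`, `fake_verify`) are hypothetical names, and the inner loop is reduced to a plain function so the sketch runs without a model.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical types for illustration; no real agent SDK is implied.
InnerLoop = Callable[[str], str]            # prompt in, the model's "final answer" out
Verifier = Callable[[str], Optional[str]]   # None means done, otherwise a complaint


@dataclass
class RunResult:
    answer: str
    attempts: int
    accepted: bool


def outer_loop(task: str, inner: InnerLoop, verify: Verifier,
               max_attempts: int = 3) -> RunResult:
    """Run the inner loop, check its "I'm done", and re-prompt if it is not done."""
    prompt = task
    answer = ""
    for attempt in range(1, max_attempts + 1):
        answer = inner(prompt)       # the model works until it claims to be done
        complaint = verify(answer)   # the harness decides whether the claim holds
        if complaint is None:
            return RunResult(answer, attempt, accepted=True)
        # Inject a new message describing what is still wrong.
        prompt = f"{task}\n\nPrevious attempt was rejected: {complaint}"
    return RunResult(answer, max_attempts, accepted=False)


# Toy stand-ins so the sketch runs without a model.
def fake_inner(prompt: str) -> str:
    return "def add(a, b): return a + b" if "rejected" in prompt else "TODO"


def fake_verify(answer: str) -> Optional[str]:
    return None if "def add" in answer else "no add() function found"


result = outer_loop("Write add(a, b).", fake_inner, fake_verify)
print(result.attempts, result.accepted)
```

The only judgment in this sketch lives in `verify`. Everything else is plumbing, which is roughly the point of the two-loop frame.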

Five angles worth your attention

1. The ultracode problem is a structural symptom

Ronacher's specific complaint is that Claude Code with Fable, Anthropic's longer-horizon run mode, produces code he doesn't like. The reason isn't aesthetic. It's structural: the model is asked to work uninterrupted for thirty minutes, and in that window it accumulates a stack of small local defenses. A model that sees a malformed input once adds a try/except. When it sees one a second time, it adds a try/except with a fallback. The third time, it adds a type-checker plugin. The line that Ronacher borrows from Karpathy, that models are "mortally terrified of exceptions," is the same complaint in a different register.

The structural problem is that each iteration of the loop only sees the last failure. The harness sees the whole run. The fix the model picks is shaped by what the model is, not what the system needs. Put that behind a loop and you get a system that is locally defensive and globally fragile. The code "works" in the sense that no individual request returns an error, but the invariants have rotted. If you have shipped anything substantial with a coding agent in the last six months, you have almost certainly seen this.
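
One rough way to make "a stack of small local defenses" visible is to count defensive constructs in a file and watch the number over time. The sketch below assumes Python source and a hand-picked set of constructs; the function name and the categories are illustrative choices, not something taken from Ronacher's essay, and a high count is a prompt to look closer rather than proof of anything.

```python
import ast
from collections import Counter


def count_local_defenses(source: str) -> Counter:
    """Count a few defensive constructs in Python source (a rough heuristic)."""
    counts: Counter = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Try):
            counts["try"] += 1
        elif isinstance(node, ast.ExceptHandler):
            # A bare `except:` or `except Exception:` swallows nearly everything.
            if node.type is None or (
                isinstance(node.type, ast.Name) and node.type.id == "Exception"
            ):
                counts["broad_except"] += 1
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id == "isinstance":
                counts["isinstance"] += 1
        elif isinstance(node, ast.Compare):
            # Matches `x is None` and `x is not None` guards.
            has_is = any(isinstance(op, (ast.Is, ast.IsNot)) for op in node.ops)
            has_none = any(
                isinstance(c, ast.Constant) and c.value is None
                for c in node.comparators
            )
            if has_is and has_none:
                counts["none_guard"] += 1
    return counts


# A made-up function of the kind described above.
SAMPLE = '''
def parse(x):
    try:
        if x is None:
            return 0
        if not isinstance(x, str):
            x = str(x)
        return int(x)
    except Exception:
        try:
            return int(float(x))
        except:
            return 0
'''

defenses = count_local_defenses(SAMPLE)
print(dict(defenses))
```

Run against successive commits of the same file, a count that only ever rises is the pattern worth investigating.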

2. The harness is the new compiler

The standard answer to "the code is bad" is "the human reviews it." In a harness-operated loop, the human is not reviewing every line — the human is reviewing the loop. The harness is what decides when work is done, when a session is dead, when to escalate. The human has become a meta-reviewer: someone who reads the spec for the loop, not the output of the loop.

The historical analogy is the compiler. The 1970 assembly programmer reviewed every instruction. The 1990 C programmer reviewed the source. The 2010 Python programmer reviewed the function. Each jump was a step away from the artifact, on the input side. The harness is a similar jump on the output side. The programmer is no longer reviewing the thing the machine produced; the programmer is reviewing the thing the machine will produce when it runs on this input. The artifact is one indirection further away.

That is the change, and it is permanent. Most of the "we need to keep humans in the loop" rhetoric of the last six months is framed as "human reviews machine output." That is not the loop we are entering. The human is reviewing machine output produced by another machine that reviews machine output. The intermediate "human review" step is being consumed by an automation layer — a point this blog's own post on the bigger-models-hallucinate trilemma makes from a different angle.

3. The Pi pattern is more general than it looks

Ronacher is generous about Pi, the assistant and harness that people are building on top of, but the pattern is the same everywhere. A queue of tasks, a machine that picks one up, a machine that judges whether the work is done, a machine that decides what to do next if it isn't. The pattern is not specific to coding: it is the same in research agents (run a subagent, judge the output, run another), in data engineering (run a pipeline, check the schema, re-run with a fix), in security (run a scan, triage findings, re-run with a different rule set), in software testing (run a fuzz, write a regression, re-run).

The harness abstraction is not an AI abstraction. It is a work management abstraction that has been waiting for a substrate cheap enough to instantiate it. The substrate just got cheap. Anything that was always done by a human supervisor — judging, retrying, escalating — is now a candidate for being done by code.

4. The "done" signal is the part that breaks

In the inner loop, the model says "I'm done" and a human reviews. In the outer loop, the model says "I'm done" and another machine reviews. Ronacher's worry is exactly right: when both sides of the conversation are machines, "done" stops meaning "the human is satisfied" and starts meaning "the verifier's predicate returned true." That is a strictly smaller — and strictly more reproducible — definition of done.

The harness future is not going to be a future of fewer definitions of done; it is going to be a future of more, each of them narrower. A "tests pass" verifier. A "type-check" verifier. A "lint clean" verifier. A "no secrets in diff" verifier. The harness is going to be the place where all of these predicates live. The thing that will be lost is the unifying definition of done — the one in a human head that answers "is this the right thing to ship?" That definition doesn't get automated. It gets omitted by the architecture.
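
A harness that holds several narrow predicates might look like the sketch below. It is an assumption-laden toy: the `Change` record, the predicate names, and the deliberately crude secret pattern are all hypothetical, and a real setup would call out to a test runner, a linter, and a proper secret scanner.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


# A change under review, reduced to the fields these toy predicates need.
@dataclass
class Change:
    diff: str
    tests_passed: bool
    lint_errors: int


Predicate = Callable[[Change], bool]

# Hypothetical and deliberately crude; real secret scanners are far more thorough.
SECRET_RE = re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----")

PREDICATES: Dict[str, Predicate] = {
    "tests_pass": lambda c: c.tests_passed,
    "lint_clean": lambda c: c.lint_errors == 0,
    "no_secrets_in_diff": lambda c: SECRET_RE.search(c.diff) is None,
}


def failed_predicates(change: Change) -> List[str]:
    """Names of the narrow checks this change does not satisfy."""
    return [name for name, pred in PREDICATES.items() if not pred(change)]


def machine_done(change: Change) -> bool:
    """Done in the harness sense: every narrow predicate returned true."""
    return not failed_predicates(change)


clean = Change(diff="+ x = 1", tests_passed=True, lint_errors=0)
leaky = Change(diff="+ -----BEGIN RSA PRIVATE KEY-----", tests_passed=True, lint_errors=2)
print(machine_done(clean), failed_predicates(leaky))
```

Note what the dictionary does not contain: there is no entry for "is this the right thing to ship?", and nothing in the structure forces anyone to add one.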

5. The career question changes shape

This is where the post misses. Ronacher frames his unease as a personal matter of taste and comprehension; he is being too modest. The thing he is really describing is a change in who reviews engineers. In a harness world, code is reviewed first by a verifier (cheap, fast, narrow) and only then by a human (expensive, slow, broad). The verifier is not optional — the harness needs it to function. The engineer is now optimizing for two reviewers. The verifier's predicates become a language you have to learn.

That is a real career change. The next generation of senior engineers will be people good at writing code that satisfies narrow verifiers while still being right in the broader sense. The current generation is good at satisfying broad human review. The skills don't transfer cleanly, and the corollary is uncomfortable: the people who are good at this now may not be the people who were good at engineering — which is, for what it's worth, the same inversion the local-models-vs-frontier economics story describes from the cost side.

The original take

Ronacher's diagnosis is correct. The harness is becoming the dominant abstraction, the verifier is replacing the reviewer, and the inner loop's "done" is being consumed by a machine that doesn't share your definition of done. He is right that present-day hands-off harnesses produce worse code than what we shipped last autumn, and that the failure mode is amplification of local fixes — a structural property of the loop, not of any individual model.

Where I disagree is the prescription. He frames the response as a personal matter of whether to adopt the loop and a collective matter of how to retain judgment while we do. Both framings are wrong because the loop is not something you can opt out of. The loop is not a tool; it is a layer of the stack. You are already inside it, at a different level. If you don't write the harness for your own work, the harness will be written by someone else — the platform, the IDE, the framework, the team you inherit.

The career move the harness future actually rewards is to learn to be the person who writes the verifiers. The people in demand in 2027 are the ones who can write a verifier that says "this code is the right shape, not just the right type." The people in trouble are the ones who insist that the only legitimate review is human review, because the harness is not going to wait for them.

Ronacher is right that the future is uncomfortable. He is wrong that the discomfort is the story. The story is the redistribution of who gets to define done. The human reviewer is not being replaced by the harness; the human reviewer is being demoted to one verifier among many, and the harness is the new reviewer of record. That is a worse outcome for the people who were good at being the reviewer, a better outcome for the people who were good at writing the verifier, and a neutral outcome for the code itself, which has never cared who reviewed it.

What this means for you

If you ship code for a living: the verifier is coming for your review process. Start writing the predicates that will judge your code. If you don't, the IDE will.
If you are a senior engineer in 2026: the skill about to be worth the most is the ability to specify, in a form a machine can evaluate, what right looks like.
If you are a tech lead: the question for your team is who on it writes the verifier. If the answer is "nobody," a vendor will.
If you write essays like Ronacher's: the right follow-up is "here is the verifier I would write for the kind of code I want to ship."

What to do this week

# 1. Read the essay (it's short, ~5 minutes)
curl -sL --compressed "https://lucumr.pocoo.org/2026/6/23/the-coming-loop/" | lynx -stdin -dump

# 2. Pick one piece of code you shipped in the last month that
#    a coding agent touched. Count the local defenses.
#    (try/except, isinstance checks, redundant null guards, etc.)
#    If the count is high, the agent's inner loop wrote it.
#    If the count is rising over time, a harness is writing it.

# 3. Write down, in English, what "done" means for that piece
#    of code. Then write down, in code or pseudo-code, the
#    predicate a verifier would check. If the two don't match,
#    the verifier will over-trust or under-trust your work.

# 4. If you maintain a project: add ONE verifier predicate to
#    your CI that is not "tests pass" or "linter clean." It
#    should encode something a human reviewer would notice.
#    Examples:
#    - "no public function returns Optional without a docstring"
#    - "no dependency added with fewer than 1k stars"
#    - "no commit > N lines without a justification in the body"

# 5. Re-read Ronacher's essay. Notice that the question he ends
#    on is "how do we not abdicate judgment?" The honest answer
#    is: by writing the verifiers. Judgment that lives only in
#    a human head is judgment that will be omitted by the next
#    architecture.
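
As one concrete version of step 4, here is a minimal sketch of the first example predicate, "no public function returns Optional without a docstring," written in Python with the standard `ast` module. The function names are hypothetical, "public" is assumed to mean "no leading underscore," and string (forward-reference) annotations are not inspected.

```python
import ast
from typing import List, Optional


def _returns_optional(annotation: Optional[ast.expr]) -> bool:
    """True if a return annotation uses Optional[...] or the `X | None` form."""
    if annotation is None:
        return False
    for node in ast.walk(annotation):
        # Optional[...] or typing.Optional[...]
        if isinstance(node, ast.Name) and node.id == "Optional":
            return True
        if isinstance(node, ast.Attribute) and node.attr == "Optional":
            return True
        # X | None (PEP 604 syntax)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
            for side in (node.left, node.right):
                if isinstance(side, ast.Constant) and side.value is None:
                    return True
    return False


def undocumented_optional_returns(source: str) -> List[str]:
    """Names of public functions that return Optional but have no docstring."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name.startswith("_"):
            continue  # assumption: a leading underscore means "not public"
        if _returns_optional(node.returns) and ast.get_docstring(node) is None:
            offenders.append(node.name)
    return sorted(offenders)


# Made-up module text to exercise the predicate.
SAMPLE = '''
from typing import Optional

def find_user(uid: int) -> Optional[str]:
    return None

def find_email(uid: int) -> str | None:
    """Return the email, or None if the user has none on file."""
    return None

def _helper() -> Optional[int]:
    return None

def reset() -> None:
    pass
'''

offenders = undocumented_optional_returns(SAMPLE)
print(offenders)
```

In CI, a wrapper would run this over the changed files and exit non-zero when the list is non-empty. The check is narrow on purpose: it encodes one thing a human reviewer would notice, and nothing more.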

Disclosure

Drafted with AI assistance from MiniMax-M3 under editorial direction. Primary source: Armin Ronacher's The Coming Loop essay, fetched 2026-06-23 via curl -sL --compressed returning 26,058 bytes of HTML. The essay was read in full and the post paraphrases its claims rather than quoting at length. Two short quotes are reproduced verbatim: the Boris Cherny epigraph ("I don't prompt Claude anymore...") and the "mortally terrified of exceptions" line that Ronacher attributes to Karpathy. The Karpathy line's original source is not given in Ronacher's essay and I have not independently verified it; reproduced as Ronacher uses it. The "ultracode," "Fable," "Pi," and "Claude Code" references are direct from the essay. The disagreement with Ronacher's prescription is my own editorial position, not a paraphrase of his. The two internal links point to prior posts; both URLs returned HTTP 200 when re-fetched 2026-06-23.

Sources

Armin Ronacher, The Coming Loop (primary): https://lucumr.pocoo.org/2026/6/23/the-coming-loop/ — verified live on 2026-06-23 via curl -sL --compressed returning 26,058 bytes of HTML. Published 23 June 2026. License: Creative Commons Attribution-NonCommercial 4.0 International.
Related tutorialoflife.blogspot.com post on the trilemma that explains why bigger models can regress on narrow tasks: Bigger Models Hallucinate More. The Trilemma Explains. — verified live, returned 200.
Related tutorialoflife.blogspot.com post on the local-models-vs-frontier cost story: Your Local Model Is Faster Than Google and Cheaper Than OpenAI — verified live, returned 200.


Tuesday, June 23, 2026
