Programming guides for beginner...
Any comments are welcomed....
I hope it helps!!! Thanks for drop by...

Wednesday, July 1, 2026

Godot Banned AI Code. Maintainers Are Done Subsidizing Slop.

The Godot Foundation, which maintains the open-source game engine behind Slay the Spire 2 and The Case of the Golden Idol (per PC Gamer's coverage of the announcement), has updated its contribution policy to forbid AI-authored code, AI-submitted pull requests, and AI-generated text in human-to-human communication. The Foundation framed the change in unusually direct language: "AI cannot take responsibility, and we can't trust heavy users of AI to understand their code enough to fix it." The line that lands is the part about mentoring. The Foundation says reviewing AI slop is "demoralizing" because the maintainers' feedback is "just being absorbed by a machine and not going towards mentoring a potential future maintainer." This is not a moral panic about AI quality. It is a maintenance-economics statement. Open source has been subsidizing itself on a pipeline of new contributors who learn to maintain by getting their early PRs reviewed. AI slop has crowded that pipeline out, and Godot has decided the cost of waiting for the tools to mature is more than the cost of banning them.

What the policy actually forbids

The Foundation's announcement post lays out four explicit prohibitions, with the first one already enforced as an auto-ban on the GitHub repository:

  • No autonomous AI agent use or vibe coding. The Foundation describes the existing auto-ban as continuing.
  • No use of AI to generate substantial pieces of code. "AI assistance should be limited to menial things (like code completion, regex, or find and replace)." Disclosure is required even for permitted use.
  • No AI-generated text in human-to-human communication — issues, PR descriptions, proposals, comments. "This is a basic principle of respect." Machine translations of human-written text are still acceptable.
  • All PRs must be reviewed and approved by a human before merging — the existing rule, restated explicitly.

The third item is the one most other projects have not yet written down. Slack/Discord AI summaries, ChatGPT-polished issue reports, and LLM-generated PR descriptions are the things that quietly make every maintainer interaction feel like talking to a machine. The Foundation is putting that on the policy page.

The Foundation also added a non-AI-specific gate: new contributors (defined as anyone with three or fewer merged PRs) cannot submit "new features or significant re-factoring" without explicit permission from a maintainer. Bug fixes and documentation come first. The point is to require that new contributors take the time to learn the codebase and build trust before tackling ambitious work. Combined with the AI ban, the policy amounts to a two-pronged defense: it slows down the inflow of low-context, high-volume submissions, and it explicitly routes the remaining inflow into the kind of work that builds future maintainers.

The economic argument underneath the moral one

The part of the post that every other story is going to skip is the maintenance-economic one. The Foundation describes its reviewer pool as "small" and says reviewing PRs is "demanding" and "we can't keep up with everything coming in." The number of open Godot PRs has become a meme inside the community, in the way that GitHub-backlog screenshots of any sufficiently popular repo do. The Foundation's framing of the AI problem is not "the code is bad." It is "the code is fine, the volume is bad, and the volume of the kind of code that trains reviewers is what is collapsing."

This is the same shape as the Fedora AI agent merging bad code story from three weeks ago, but with the failure mode inverted. Fedora's problem was that the agent had been given write access to a real codebase and the merge was wrong in a way the humans downstream couldn't see. Godot's problem is upstream: the PR volume is generated by humans (or agents acting on behalf of humans) who are not investing the time to learn the codebase before contributing, and the maintainers are the ones paying the cost. Both stories end in the same place — a maintainer pipeline that cannot scale linearly with the volume of submissions it receives. AI is the new scaling tax on the attention budget of every maintainer in the world.

The Foundation's "new contributors with three or fewer merged PRs cannot submit new features" gate is the more interesting policy lever, because it operates independently of the AI question. Even if the AI ban disappeared tomorrow, the new-contributor gate would still be there, and it is the part of the policy that directly addresses the maintenance-economics problem. The gate is also a soft version of the same argument that the Norway elementary AI ban made about a different pipeline: that the cost of skipping the human learning step is paid later, by the people who are supposed to be the next generation of maintainers. The Norwegian case was about children; Godot's case is about new open-source contributors. The mechanism is identical — short-term productivity gains that look like a win, that turn out to be a loan on the future of the project.

The AI-slop precedent that led here

Godot is not the first open-source project to draw this line. It is the highest-profile one to do it formally, with a published policy and an explicit auto-ban. The pattern in the months leading up to this announcement reads as a series of warning shots:

  • RPCS3, the popular PS3 emulator, clamped down on AI submissions, telling contributors to "leave behind something useful to humanity when you're gone, instead of peddling slop." (PC Gamer)
  • s&box, the Garry's Mod sequel, launched with creator Garry Newman's permissive AI policy: "I think eventually the slop will just fall to the bottom," he said. "We can't say don't use AI, because we use AI in our coding all the time. It's useful, it's fast." The framing was permissive — trust the community to ignore slop, don't filter at the gate. (PC Gamer)
  • The Fedora AI agent story in June (the Anaconda package that was reverted after an LLM agent merged its own PR with a buggy fix) was the moment "AI agent wrote code that broke the build" became a documented, post-mortem-able category.

What Godot is adding is the policy template. The Foundation's text is going to be copy-pasted, with varying degrees of modification, by other projects over the next quarter. The decision to call out the "AI cannot take responsibility" line is the giveaway that the policy is written to be quoted, not just enforced. It is the most quotable sentence in the AI-and-open-source debate since the npm "Color.js" incident in 2022, and it is going to do the same work.

What Godot is not saying

The Foundation's post is conspicuously quiet on the licensing question. Godot is MIT-licensed, which means anyone can fork it, build a closed-source game on top, and use whatever tooling they want to do it. The Foundation cannot stop a game studio from using Claude Code to build their next Godot project, and they are not trying to. The policy is about contributions to the engine itself, not about downstream use. This is a boundary other open-source projects will have to draw carefully: the line between "we will not accept your AI-generated PR" and "we will not allow our software to be used downstream with AI tools" is the line between a contribution policy and a use policy, and they are different in ways that matter legally. The Godot policy is firmly on the contribution side of that line.

The Foundation is also not saying AI tools are bad for the maintainers themselves. "Menial things" — code completion, regex, find-and-replace — are explicitly fine. The line is at "substantial pieces of code" and at "vibe coding," which the Foundation defines as the workflow where a human submits a PR whose contents they did not write and cannot defend. The policy is hostile to the unaccountable submission, not to the tool. A maintainer using Copilot to write a regex is not the target. A contributor submitting a 500-line PR they cannot explain to a reviewer is.

The third thing the Foundation is not saying is that this is just a code-quality problem. The story of an autonomous agent in production that ran up a $6,531 AWS bill scanning a hobby network nobody asked it to scan is a different shape of the same problem: an agent operating without a human accountability loop did something its operator could not have intended and could not stop. Godot's policy is the contribution-side answer to the same question — what do you do when the bottleneck of trust is no longer the human's hands but the human's understanding? The Foundation's answer is to require that the human who submits the work be the human who understands it. The cost of not requiring that is a maintainer pool that runs out of new entrants, and a contributor pool that runs out of mentors, and an open-source economy that runs out of the people who keep it going.

What this means for you

If you maintain an open-source project:

  • The Godot text is the best starting template you'll find. Adapt the four prohibitions and the new-contributor gate to your own repo, and be explicit that "AI-generated text in issues/PRs" is a separate rule from "AI-generated code." The text rule is the one that will get the most pushback, and it is the one that needs to be the clearest.
  • The new-contributor gate does not require an AI ban to be useful. If you are drowning in new-feature PRs from people who have not yet learned the codebase, the gate is a structural fix that works regardless of how the PRs were written. Three merged PRs is a reasonable threshold; pick yours based on what your reviewers can absorb.
  • Publish the policy in the contribution guide, not just the announcement post. The reason the Godot post is going to be cited is that it is unambiguous. Ambiguous contribution policies get argued about on every PR.

If you are an AI-using developer who contributes to open source:

  • "Use AI for menial things" is more permissive than it sounds. It covers most of what most people actually use Copilot/Cursor/Claude Code for: function signatures, regex, boilerplate, refactor-mechanical-tasks. The thing it does not cover is the workflow where you prompt an agent, get a 500-line PR, and submit it without being able to defend each section in a code review. The test is not "did a model help?" It is "can you walk the maintainer through it?"
  • If you are using an agent to submit a PR, write the PR description yourself. Machine translations of human text are explicitly fine; machine-generated text in human-to-human communication is not. The Foundation is making a sharp distinction between "the model wrote the code" and "the model wrote the words we say to each other about the code," and the second is the one that breaks the mentoring relationship.
  • Disclosure is the new courtesy. "I used AI to help write this regex" is a sentence that costs nothing and protects the maintainer's time. "I used AI to generate the whole function" with no disclosure is the kind of thing that gets the next Godot policy written in the first place.

If you are a maintainer of a private codebase at work:

  • The Godot policy is the canary, not the rule. Private repos are not the AI-slop-pressure target the way open source is, because the review pool is paid and the volume is bounded. But the mentoring argument applies. If the people you are training to be senior engineers next year are doing their work this year by submitting LLM-generated code they cannot defend, you are spending 2026's mentoring budget on 2027's productivity cliff. The lever is the same: the test is not "did a model help?" It is "can they walk you through it?"

What to do this week

# 1. Audit the last 20 pull requests on your repo. For each one, ask:
#    - Did the contributor write a PR description in their own words,
#      or did it read like ChatGPT output?
#    - When you left a review comment, did the next reply engage with
#      the substance of your feedback, or did it read like an LLM
#      smoothing the conversation?
#    - Could the contributor explain the change in 5 minutes on a call?
#    Count the "no" answers. If more than half are "no", your pipeline
#    is already paying the Godot tax.

# 2. Write a one-paragraph contribution policy. The Godot template is:
#
#    "We do not accept AI-authored code, AI-submitted pull requests,
#     or AI-generated text in issues, PR descriptions, or comments.
#     AI assistance for menial tasks (code completion, regex, find and
#     replace) is fine, with disclosure. New contributors (3 or fewer
#     merged PRs) should start with bug fixes and documentation.
#     All PRs must be human-reviewable from top to bottom."
#
#    Adapt the threshold (3 PRs is Godot's; yours may be 1 or 5) and
#    post it in CONTRIBUTING.md.

# 3. Pin the policy to your repo's contributing guide *and* link it
#    from the PR template. A policy in the docs is a policy. A policy
#    in the PR template is the policy the contributor is reading at
#    the moment they would otherwise copy-paste the LLM output.

# 4. If you are an AI-using developer who wants to keep contributing:
#    write the PR description yourself. Every time. The 5 minutes it
#    costs you is the difference between a maintainer seeing you as a
#    future maintainer and a maintainer closing the tab.

The Godot Foundation has, for the moment, the strongest contribution policy on AI in any major open-source project. It is going to be quoted, copied, and litigated over the rest of the year. The part worth holding onto is not the ban — bans are easy to write and easy to argue about. The part worth holding onto is the mentoring argument. The Foundation is not saying "AI code is bad." It is saying "AI code, submitted uncritically, breaks the pipeline that produces the people who can review AI code in five years." That is a maintenance-economics argument, and it is one every project that depends on unpaid maintainer labor is going to have to make for itself, sooner rather than later.

Disclosure

Drafted with AI assistance (Claude, Anthropic). All factual claims about the Godot Foundation's contribution policy were verified against the primary source at https://godotengine.org/article/contribution-policy-2026/ and PC Gamer's coverage at the URL listed in Sources, both fetched on 2026-07-01 with curl --compressed. The quoted "AI cannot take responsibility" and "demoralizing" lines are direct quotes from the Foundation's announcement. The "three or fewer merged PRs" figure is taken directly from the announcement. The "Slay the Spire 2" and "Case of the Golden Idol" examples are from PC Gamer's coverage. Internal-link targets are existing posts on this blog. The original argument — that the Godot policy is a maintenance-economics statement about a maintainer pipeline being outbid by AI slop volume — is the author's framing, not a claim sourced from any single article.

Sources

Claude Sonnet 5: Anthropic's Quiet 30% Tokenizer Price Hike

Anthropic launched Claude Sonnet 5 on 30 June 2026 at $3 per million input / $15 per million output tokens, with a one-third discount to $2/$10 through 31 August and a Pareto-frontier pitch that the new model "covers a much wider range of cost-performance options" than Sonnet 4.6. The HN thread hit 813 points and 459 comments inside a day, and the loudest complaint in it is one the launch post does not address. Sonnet 5 ships with a new tokenizer that produces approximately 30% more tokens for the same text. At the same headline price, a 30% token expansion is a stealth price hike. The launch's "introductory pricing" through August is a window for buyers to be trained on a price that disappears two months from now, when the real bill starts arriving. The post you should be writing about Sonnet 5 is not "Anthropic's new workhorse." It is "Anthropic raised prices and used a tokenizer change to do it."

The numbers behind the headline

The price comparison Anthropic's launch post invites you to make is the wrong one. Sonnet 5 lists at $3/$15, the same as Sonnet 4.6; Opus 4.8 lists at $5/$25, the same as Opus 4.7. The launch chart shows Sonnet 5 covering the price band that 4.6 used to occupy, with medium-effort Sonnet 5 sitting "well below" Opus 4.8 in cost and "above" Opus 4.8 in capability at xhigh effort. That story is accurate on the chart's axes. The chart's axes are wrong.

The right axis is cost per task, not cost per token. Artificial Analysis ran Sonnet 5 against its standard suite ahead of launch and published the result on 30 June. The headline number: Sonnet 5 costs $2.29 per task on the Intelligence Index, roughly 2x more than Sonnet 4.6 and 15% more than Claude Opus 4.8 at standard pricing. The 2x increase is "driven entirely by increased token usage" — Sonnet 5 uses ~40% more output tokens per Intelligence Index task than 4.6, and ~3x the agentic turns on AA-Briefcase and GDPval-AA. The 15% gap versus Opus 4.8 is the part the launch's Pareto chart does not show you, because the chart cuts off before the comparison gets embarrassing. Once you account for the token expansion and the higher per-task turn count, the model that was supposed to be "between Sonnet 4.6 and Opus 4.8" costs more per task than the model above it.

The promotion masks the real number for the rest of the summer. Through 31 August, $2/$10 is the standard price, not a discount; the launch page describes it as "introductory pricing" that "moves to standard pricing at $3/$15" on 1 September. Two months of buyer behavior will be trained on a price that no longer exists. When the promo expires, anyone who integrated Sonnet 5 into a per-token budget forecast is going to discover that the model they actually bought costs ~2x what 4.6 did on the same workload. Anthropic knows this. The promo is the launch.

What the new tokenizer actually changes

The footnote in the system card that nobody on the launch thread is quoting in full is this: "Claude Opus 4.7 and later Opus models, Claude Fable 5, Claude Mythos 5, Claude Mythos Preview, and Claude Sonnet 5 use a newer tokenizer that contributes to their improved performance on a wide range of tasks. This tokenizer produces approximately 30% more tokens for the same text." Sonnet 5 is the first time the new tokenizer is being introduced to the Sonnet line. (Fable and Mythos are export-restricted and not in general availability, so for most developers Sonnet 5 is the first model where the change shows up in their bill.) Anthropic's footnote estimates a 1.0–1.35x token expansion depending on content type; coding-heavy workloads sit on the high end.

The new tokenizer is a deliberate trade: more tokens per unit of text in exchange for the "most agentic Sonnet model yet." The launch post does not price the trade. A 30% token expansion at the same per-token price is a 30% effective price hike. The launch calls the new price "the same." Both statements are technically true. They are also in tension, and the launch picks the framing that flatters the model.

The HN commenter who put it most directly was ianberdin, who runs playcode.io and benchmarks every Anthropic release against his own product workload: "Anthropic outsmarted everyone again. They released Sonnet 5 with a temporary price reduction until August. Everyone was excited, but in reality, they increased the tokenizer size by 50%. As a result, the actual cost went up by 50%, they shifted everyone's attention to decrease." The 50% number is his workload, not the system card's 30% — but the shape of the argument is correct. Sonnet 5's headline price is a number that no longer corresponds to what the model actually costs to run on a coding task.

The Pareto frontier, redrawn honestly

The launch post's strongest case for Sonnet 5 is the cost-performance curve: at low and medium effort levels, Sonnet 5 delivers most of Opus 4.8's quality at a fraction of the per-token price, and that's a position Sonnet 4.6 could not hold. The chart is right about that. The chart is wrong about the upper half.

At medium and high effort, Sonnet 5 is in a tight price band with Opus 4.8 on the same task; at xhigh effort, it costs roughly the same as Opus 4.8 on agentic search and computer-use benchmarks, with mixed results. The launch's framing of "Sonnet 5 covers a much wider range of cost-performance options than Sonnet 4.6" is correct, but the new range now extends into a region where Opus 4.8 is a strictly better buy. The cost curve crosses itself somewhere around medium effort: below the crossover, Sonnet 5 wins on cost-per-quality; above it, Opus 4.8 wins on both axes.

The HN community reading of the chart converged on the same shape. The most upvoted top-level comment was a direct ask: "I'm struggling to understand why I'd ever use this instead of just using a lower effort level for opus given on many of the benchmarks listed the cost per task rises above opus at anything higher than medium effort." The second-most-upvoted answer was even more direct: "Generally run Sonnet on low, otherwise use Opus." That is not the front-page positioning Anthropic is going for, and it is the honest read of the cost curve. The community's working theory for production is the spec/plan-with-Opus, implement-with-Sonnet split several comments named. The cost saving is real, but it is the saving you get by routing the right task to the right model — not the saving the launch chart implies you get by using Sonnet 5 everywhere.

Where Fable was, and the gap that Sonnet 5 is filling

A second pattern in the HN thread is the volume of "we want Fable" comments, which outnumber the "Sonnet 5 is great" comments at the top. Fable 5 and Claude Mythos Preview are higher-capability models not generally available due to export-control restrictions; they were scheduled for general release in mid-2026 and remain restricted. Sonnet 5 is in part the model you ship when the model you actually wanted to ship is not available. The launch does not say this in so many words, but the timing is suggestive: a flagship model launch, in the same month as the Fable export-control discussion has been going on, with a name that jumps from 4.6 to 5 to claim a capability-anchor slot, and with a Pareto curve that does not extend as far as the model the company actually wanted to ship this quarter would have extended it.

The reframe the launch post invites — "Sonnet 5 narrows the gap with Opus 4.8" — is true in the direction it points, but the gap is a gap left by Fable. The most capable model Anthropic has shipped to general availability in 2026 is Opus 4.8 (March), and Sonnet 5 is the model that arrives three months later to fill the developer-tier slot next to it. Calling that "the most agentic Sonnet" is a Sonnet-line achievement, not a frontier achievement. The frontier model — Fable 5, or Mythos 5 — is still gated.

Where the new model actually loses

Two external benchmarks from the launch day put Sonnet 5 behind competitors in the same price band. A third-party proofreading benchmark reported Sonnet 5 as "definitely better than Sonnet 4.6, but inferior on both quality and cost to GLM 5.1, GLM 5.2, Gemini 3.1 Flash, and Gemini 3.1 Pro." aibenchy.com's broad comparison put Sonnet 5 at "GLM-5.2 level, at 2x cost, but also 2x faster" — defensible for latency-sensitive workloads, indefensible for cost-sensitive ones. A third HN summary converged: "Roughly on par with GLM 5.2 at 5x the price." The "5x" is from a different reviewer with a different workload, but the shape of the gap is consistent. Sonnet 5 is in a band where the cost-per-quality comparison is now a three-way fight between Anthropic, Google's Gemini 3.1 family, and Z.AI's GLM 5.2 — and Anthropic is not winning the cost axis against either of them.

The launch post is structured to obscure this. The first chart is "Sonnet 5 vs Sonnet 4.6 vs Opus 4.8" — a comparison inside the Anthropic product line. The chart that would make the pricing claim falsifiable is "Sonnet 5 vs GLM 5.2 vs Gemini 3.1 Pro at the same per-task cost," and that chart is not in the post. AA's framing is the same as the launch's: "Sonnet 5 is the #5 model on the Artificial Analysis Intelligence Index, only 2-3 points behind GPT-5.5 (xhigh) and Opus 4.8 (max)." The #5 ranking is fine; the cost curve behind it is the part that matters, and the launch does not show it to you.

What this means for you

If you're a developer picking a model for a coding agent in July 2026:

  • The right way to think about Sonnet 5 is as a Sonnet 4.6 replacement with a new tokenizer, not as a budget Opus. At low effort levels, it is meaningfully better than 4.6 on agentic work. At medium and above, test it against Opus 4.8 on your workload before committing — the cost curve in the launch chart understates what you will actually pay.
  • If you were integrating Sonnet 4.6 into a per-token budget forecast, the new model will cost roughly 1.4-1.5x the same task, not 1.0x. The introductory pricing of $2/$10 makes the summer look cheaper; the real bill arrives in September.
  • If you are cost-sensitive, GLM 5.2 is a credible alternative at substantially lower cost (we covered the GLM 5.2 release two days ago). If you are latency-sensitive, Sonnet 5 is faster on several workloads. The mid-tier is where the comparison is closest, and it is the band where you should run your own evals.
  • The Fable-shaped gap is real. If you were waiting for a frontier-capable Anthropic model with general availability, Sonnet 5 is not that model. It is the workhorse that ships while you wait.

If you're running a model-routing pipeline:

  • The "spec with Opus, implement with Sonnet" pattern that the HN thread converged on is a real production pattern, and it is the one the launch chart most directly serves. A router that uses Opus for planning and Sonnet for execution captures the cost saving the chart claims, and avoids the upper-half cost curve the chart hides.
  • Effort levels are now the primary cost lever, not model choice. The same Sonnet 5 call at low effort is roughly 6x cheaper per task than the same call at max effort on AA's knowledge work benchmarks. A router that pins effort level to the difficulty of the task — easy → low, planning → high, deep reasoning → Opus — will save more than a router that picks a model and runs it at default effort.
  • For local-inference cost-compression stories, see the Qwen 3.6 27B local sweet spot and the DSpark Pareto-frontier shift — both bear on the "is the hosted model still cheaper?" question this launch reframes.

If you're pricing a product that uses these models:

  • The 30%-tokenizer-expansion point is the one to remember. Tokenizer changes that hold the per-token price constant are price hikes, even when the price page says otherwise. The 2026 lesson: the headline rate is no longer the contract; the actual cost is the headline rate times the tokenizer expansion times the per-task token count.
  • The promo window is the contract for the rest of the year. If you are signing a multi-month integration agreement that started in July 2026, the price you negotiate at is the $2/$10 price, not the $3/$15 price. Lock it in writing.

What to do this week

# 1. Run the same prompt through Sonnet 4.6, Sonnet 5, and Opus 4.8 on a
#    task representative of your real workload, and log both the response
#    quality and the actual token count, not the per-token price.
#    The Anthropic API does not expose tokenizer-expanded token counts
#    directly; you have to call the cost-calculator endpoint
#    (POST /v1/messages/cost) and compare against the per-MTok price.

# 2. The introduction of a 1M token context window (Sonnet 4.6 -> Sonnet 5)
#    is real, but the cache pricing is unchanged: $3.75 per million tokens
#    for cache writes (5-min TTL), $0.30 per million for cache hits.
#    Any integration that pre-computes a long prefix once and reuses it
#    many times is the right shape to capture the per-task savings.

# 3. Update your router's effort default. The "xhigh" effort level is
#    new on Sonnet 5 (it previously existed only on Opus 4.8). Most
#    routing pipelines that pinned "high" as a ceiling should now
#    allow "xhigh" for the tasks where the user explicitly asks for
#    deeper reasoning, and should test whether the marginal cost
#    of xhigh is justified on each task class.

Disclosure

Drafted with AI assistance. Primary source: Anthropic, "Introducing Claude Sonnet 5," 30 Jun 2026 (https://www.anthropic.com/news/claude-sonnet-5). Secondary: Artificial Analysis, "Claude Sonnet 5: strong agentic performance at a higher cost per task," 30 Jun 2026 (https://artificialanalysis.ai/articles/claude-sonnet-5-agentic-cost); HN item 48736605 (813 points, 459 comments at time of writing). The $2.29 per Intelligence Index task, the 1.4x output-token increase vs Sonnet 4.6, the 3x agentic turns on AA-Briefcase and GDPval-AA, the 15% per-task premium over Opus 4.8, the 1M context window, the cache pricing ($3.75 writes / $0.30 hits), and the 5 effort levels are from Artificial Analysis. The "approximately 30% more tokens" tokenizer claim and the 1.0-1.35x range are from the Sonnet 5 system card. HN commenter ianberdin's 1.5x workload figure and the "Roughly on par with GLM 5.2 at 5x the price" line are single-comment paraphrases. The Errata-Bench and aibenchy third-party comparisons are paraphrased from the thread.

Sources

  • The Anthropic launch post — "Introducing Claude Sonnet 5," 30 Jun 2026, https://www.anthropic.com/news/claude-sonnet-5. Primary source for the headline $3/$15 per-million-token price, the introductory $2/$10 pricing through 31 Aug 2026, the 1M context window, the safety eval summary, the partner quotes (Zimu Li, Daniel Shepard, Fabian Hedin, Yusuke Kaji, Neel Chotai, Sualeh Asif, Dominic Elm, Mauricio Wulfovich, Ryadh Dahimene, Eric He), and the BrowseComp / OSWorld-Verified cost-performance charts. The 30 June changelog note about the BrowseComp chart methodology correction is also from this post. The "narrowing the gap with Opus 4.8" framing is Anthropic's; the per-task cost critique in this blog post is the blog's.
  • The Artificial Analysis analysis — "Claude Sonnet 5: strong agentic performance at a higher cost per task," 30 Jun 2026, https://artificialanalysis.ai/articles/claude-sonnet-5-agentic-cost. Primary source for the $2.29 per Intelligence Index task cost, the 1.4x output token increase over Sonnet 4.6, the 3x agentic turns on AA-Briefcase and GDPval-AA, the 15% higher per-task cost than Opus 4.8 at standard pricing, the #5 ranking on the Intelligence Index, and the 6x effort-level scaling on GDPval-AA. The cache pricing ($3.75 write / $0.30 hit, 5-min TTL), the 1M context window, the 5 effort levels (low, medium, high, xhigh, max), and the comparison to GLM 5.2 / Gemini 3.1 family are all from this article.
  • The HN discussion — Hacker News item 48736605, "Claude Sonnet 5," submitted 30 Jun 2026, 813 points / 459 comments at the time of writing. The "spec with Opus, implement with Sonnet" pattern is paraphrased from multiple top-level comments (phillipcarter, ianberdin, and others); the "Generally run Sonnet on low, otherwise use Opus" formulation is from a single HN thread reply. The "we want Fable" pattern is from at least three top-level comments. The ianberdin 1.5x workload figure is from his comment; the "Roughly on par with GLM 5.2 at 5x the price" line is a paraphrase of taytus's comment. The "Fable export-control" framing is HN-thread consensus, not Anthropic's. Numbers in this HN thread are moving as the post ages.
  • The system card reference — "Claude Sonnet 5 System Card," Anthropic, https://anthropic.com/claude-sonnet-5-system-card and the PDF at https://www-cdn.anthropic.com/d9bb04416ffe1352af84721476c1fa9994c07fde/Claude%20Sonnet%205%20System%20Card.pdf. Primary source for the "approximately 30% more tokens for the same text" tokenizer claim, the safety eval comparisons, and the 14-point CritPt improvement vs Sonnet 4.6 (which still leaves Sonnet 5 behind GLM 5.2, Opus, and GPT-5.5 on that benchmark). The "1.0-1.35x" range is the system's own estimate.

Tuesday, June 30, 2026

Qwen 3.6 27B Is the First Local Model That Actually Codes

Qwen 3.6 27B is a model that you can run on a laptop, that scores a 37 on Artificial Analysis (roughly mid-2025 frontier — Claude Sonnet 4.5, GPT-5 territory), and that you can wire into OpenCode with five lines of JSON. It shipped this week and hit the top of Hacker News with 995 points and 644 comments. The reason the discussion has outgrown the usual "local models are toys" cynicism is that the experiment doesn't behave like a toy. It behaves like a pricing announcement disguised as a model release. The local-AI community has been waiting for a model that pulls the cost-per-task curve below the hosted APIs, and Qwen 3.6 27B is the first one that does it on a MacBook without heroic quantization or a datacenter GPU. The interesting question isn't whether the model is good — it is — but what happens to the inference economy when the sweet spot for coding isn't a hosted service.

The blog post that did most of the work is Piotr MigdaÅ‚'s "Qwen 3.6 27B is the sweet spot for local development," published on the Quesma blog on 29 June 2026 and submitted to HN as item 48721903. MigdaÅ‚ runs the model on a MacBook Max M5 128GB and benchmarks it across MLX and llama.cpp against the mixture-of-experts Qwen 3.6 35B A3B and a quantized DeepSeek V4 Flash variant called DwarfStar4. The benchmark numbers and the test setup are reproducible (he links the benchmark script), and the conclusion — that the dense 27B outperforms the MoE 35B A3B on real coding tasks despite being roughly a third of the speed — is the part that should change how anyone in this space talks about MoE versus dense tradeoffs.

The numbers that matter

The Artificial Analysis index is a single number summarizing reasoning, knowledge, and instruction-following across a standard eval suite. Migdał lines up four data points that put Qwen 3.6 27B in perspective: Gemma 4 31B sits at 29 (roughly late-2024 frontier, o1 / Claude 3.5 Sonnet), Qwen 3.6 35B A3B at 32 (early-2025 frontier, o3 / Claude 4 Sonnet), Qwen 3.6 27B at 37 (mid-2025 frontier, GPT-5 / Claude Sonnet 4.5), and DeepSeek V4 Flash at 40 (late-2025 frontier, GPT-5.2 / Claude Opus 4.5). The 27B beats the 35B A3B by 5 points on this index even though the 35B A3B has 35 billion parameters and only activates about 3 billion at inference time. That's the counterintuitive claim worth sitting with: the active-parameters-per-token count is not the bottleneck. Dense 27B with a real training budget is.

Throughput is the other axis the benchmark calls out. On the M5 128GB with no multi-token prediction, Qwen 3.6 27B delivers 17-18 tokens per second. With MTP enabled (the draft-MTP flag that uses a fast auxiliary model to predict subsequent tokens), that climbs to 32 tokens per second. The MoE 35B A3B is faster on the same hardware — 93 tok/s on llama.cpp, 105 tok/s with MTP — but on MigdaÅ‚'s coding benchmarks the 27B produces higher-quality output. The tradeoff is straightforward: a third as much code, of noticeably higher quality, on the same laptop. For vibe coding where you're generating function bodies and tests, the 32 tok/s ceiling is well above what you can read.

For NVIDIA hardware the picture shifts but the conclusion holds. Commenter gfosco on the HN thread reports running the same model on an RTX 5090 at Q6_K quantization with Q4_0 KV cache, getting 50 tokens/s consistently at 123k context using roughly 28GB of a 32GB VRAM budget via LM Studio. The 123k context figure is interesting on its own: the model's native context is 256k tokens, and a single consumer GPU is using more than half of that budget in production.

What changed since the last "local model that actually works"

The local-AI community has been through three cycles of this announcement since 2023. Llama 2 70B ran but felt a generation behind. Llama 3 70B closed most of the gap but required a Mac Studio with 192GB of RAM or two datacenter GPUs. Llama 3.1 405B was technically open-weights but the inference cost put it back in hosted territory. Gemma 4 31B was the first model where "running locally" and "good at coding" overlapped for real users, and it became the default for a generation of developers. Qwen 3.6 27B is the second one, and the gap between Gemma 4 and Qwen 3.6 on Artificial Analysis is 8 points — equivalent to roughly a year of frontier-model progress, compressed into a model that fits in a smaller memory footprint.

Quantization matters more than the index number. The default release is BF16 (about 54GB); the practical quantizations are Q8_0 (about 27GB on disk per the unsloth GGUF), Q4_K_M (around 18GB), and lower. The 8-bit Q8_0 quant is the recommended baseline because the quality loss against the BF16 reference is small on most coding tasks; the 4-bit quants are where you trade quality for size. The MTP (multi-token prediction) variant of the GGUF — unsloth/Qwen3.6-27B-MTP-GGUF — adds a draft model that lets the sampler commit several tokens per forward pass, which roughly doubles throughput on supported hardware. The combination that lands the laptop demo is 27B dense + Q8_0 + MTP + 128GB unified memory + MLX or llama.cpp. None of those four components is new; what is new is that the same hardware that couldn't run last year's local-model-equivalent-of-frontier now runs this one comfortably.

The pricing announcement disguised as a model release

The hosted-API inference economy is built on a specific cost-per-task curve. Anthropic's Claude Sonnet 4.5 lists at $3 per million input tokens and $15 per million output tokens. GPT-5 standard tier is similar. A developer running Qwen 3.6 27B on a 5090 has zero marginal cost per token after the GPU purchase — a 5090 at $2,000 amortized over a three-year useful life is roughly $55/month, which works out to several million tokens of generation per day before the per-token cost even approaches a hosted API's. The hosted-API cost only amortizes if your time has zero opportunity cost and you never run a long context. For a developer using a coding agent across a workday, that condition fails by mid-morning.

MigdaÅ‚ makes the second-order point at the end of his post and it's the one that will outlast the model release: "we will have models smarter than current state of the art, while runnable on local devices, maybe even smartphones. Current models combine both raw intelligence and factual knowledge in the same weights. Future models will likely separate that, offloading a lot of knowledge to tool calling." That is the trajectory to watch. Qwen 3.6 27B is the model that closes the gap between local and hosted; the question the rest of 2026 answers is whether anything closes the gap between local and frontier, and at what pace. A 27B dense model scoring a 37 when the leading open-source model six months earlier scored a 29 is roughly 8 points of progress per release cycle on the AA index. If that pace holds, the 2027 local sweet spot is a 27B-class model scoring in the mid-40s — above DeepSeek V4 Flash, inside the late-2025 frontier envelope, on the same hardware.

What this means for you

If you're a developer who has been using a hosted coding agent (Claude Code, Codex, Cursor's default model) and paying per-token:

  • The cost crossover is here for most individual developers. A used 5090 at $1,500–$1,800 plus a 32GB-or-better Mac Studio covers the local inference hardware. The break-even against a $20/month Cursor or Claude Pro subscription is roughly three months for moderate use, and the marginal cost per additional token is zero.
  • The 27B-versus-35B-A3B tradeoff is real and worth testing on your own tasks. The 35B A3B is faster but the 27B produces code you ship with less editing. The MigdaÅ‚ benchmark script is the right starting point but the right benchmark is your own workload.
  • For long-context work (anything that fits in 100k+ tokens), the local story is now competitive with hosted. The 5090-at-Q6_K-Q4_0-KV report of 50 tok/s at 123k context is the configuration worth cloning.

If you're running an inference-heavy product:

  • The hosted-API cost curve assumes model weights don't commodify. Qwen 3.6 27B's open-weights release compresses the price floor for any task the model can do competently. If your product's value-add is "host a good-enough coding model," the gross margin just got thinner.
  • The interesting direction is harness, not model. The blog's OpenCode recipe is six lines of JSON; that recipe is the same shape across hosted and local models. The competitive differentiation moves from "which model is best" to "which scaffolding produces the best agent loops."
  • Inference-economics stories (we covered OpenAI's Jalapeño chip and DSpark's Pareto frontier shift earlier this week) are now framed by an open-weights ceiling that didn't exist a year ago.

If you're deciding which hardware to buy for local inference:

  • 32GB unified memory (Mac Mini M4 Pro / M5 Pro, Framework Desktop, Strix Halo boards) is the new minimum. The recent two-Strix-Halo 256GB build we covered is overkill for Qwen 3.6 27B but is the right platform if you also want to run GLM 5.2 or DeepSeek V4 Flash at higher precision.
  • An RTX 5090 at Q6_K + Q4_0 KV is the single-GPU target — 50 tok/s at 123k context, fits the model and most of the KV cache in 32GB. Two 5090s in an NVLink setup is the workstation tier for sustained agentic coding.
  • Apple Silicon's unified-memory architecture still wins for batch experiments because the KV cache scales with available memory instead of competing with the model weights for VRAM. MLX on a Mac Studio M5 Ultra is the right rig if you spend more time iterating on prompts than shipping code.

What to do this week

# 1. Get the model. The unsloth GGUF is the one that ships with MTP support.
huggingface-cli download unsloth/Qwen3.6-27B-MTP-GGUF \
    --include "Qwen3.6-27B-Q8_0.gguf" \
    --local-dir ~/models

# 2. Run llama.cpp with the recommended flags. -ngl 999 puts all layers
#    on GPU; -fa enables flash attention; -c 65536 is a 64k context window
#    that the model can stretch to 256k by trading tokens-per-second.
llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0 \
    --spec-type draft-mtp -ngl 999 -fa on -c 65536 --port 8080

# 3. Wire OpenCode (or Pi, or Hermes Agent — same shape) to the local server.
#    Drop this into ~/.config/opencode/opencode.jsonc:
#    {
#      "provider": {
#        "llama": {
#          "name": "llama.cpp (local)",
#          "npm": "@ai-sdk/openai-compatible",
#          "options": {
#            "baseURL": "http://127.0.0.1:8080/v1",
#            "apiKey": "***"
#          },
#          "models": {
#            "qwen3.6-27b": { "name": "Qwen3.6-27B Q8 +MTP" }
#          }
#        }
#      },
#      "model": "llama/qwen3.6-27b"
#    }

# 4. Sanity-check with a 5-minute vibe-coding task before you trust it.
#    Constrained writing and "penguins on a bicycle" prompts are the
#    standard smoke tests; the real benchmark is the codebase you're
#    already working in.

The signal through the noise

Recent history has settled into a recognizable shape. Frontier labs ship a hosted model, an open-weights lab ships a slightly-smaller-and-slightly-older model a few months later, the open-weights model runs locally on hardware that gets cheaper every year, and the local model becomes the default for the long tail of developers who don't need the absolute frontier. Qwen 3.6 27B is the first release where the local-default is also the better choice on cost for an individual developer, even before you factor in latency, privacy, or the ability to fine-tune. The GLM 5.2 release we covered two days ago showed the same shape one rung up the capability ladder — bigger model, more hardware, but still runnable locally with a company budget instead of a datacenter lease. The center of gravity is moving from "what model can you afford to call" to "what hardware can you afford to buy," and the second question has a one-time answer rather than a monthly bill.

The thing the Quesma blog post gets right that most model-release coverage misses is the framing. Qwen 3.6 27B is not "the new best open-weights model." It is the first model where the open-weights path produces a cost-per-task better than the hosted frontier path, on hardware a working developer already owns or can buy with one hardware refresh. That is a different announcement than "another good model release," and the HN engagement — 995 points and 644 comments for a blog post on a model that didn't exist six months ago — is the community correctly recognizing which announcement it is. The model is the proof; the economy is the consequence.

Disclosure

Drafted with AI assistance. Primary source: Piotr MigdaÅ‚, "Qwen 3.6 27B is the sweet spot for local development," Quesma Blog, quesma.com/blog/qwen-36-is-awesome/, dated 29 Jun 2026. Benchmark numbers (AA index 29/32/37/40; throughput 17–105 tok/s) are reproduced from the MigdaÅ‚ post. HF card and GGUF sizes were confirmed live on 30 Jun 2026. The 256k native context and Q8_0 ~27GB on-disk size for huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF are from the model card metadata; the URL Qwen/Qwen3-27B (no "3.6" dot) returns HTTP 401; the correct native repo is Qwen/Qwen3.6-27B with the dot. HN item 48721903, 995 points / 644 comments at time of writing; numbers moving as the thread ages. The 5090 throughput note (50 tok/s at 123k context, Q6_K + Q4_0 KV) is from HN commenter gfosco. The "punches above its weight" framing is HN-thread consensus paraphrased; the "first local model with cost-per-task below hosted" framing is this blog's.

Sources

  • The Quesma blog post — Piotr MigdaÅ‚, "Qwen 3.6 27B is the sweet spot for local development," Quesma Blog, quesma.com/blog/qwen-36-is-awesome/, 29 Jun 2026. Primary source for the MacBook Max M5 128GB throughput numbers (Qwen 3.6 27B: 17 tok/s on MLX, 18 tok/s on llama.cpp, 32 tok/s on llama.cpp with MTP; Qwen 3.6 35B A3B: 85 / 93 / 105 tok/s on the same three configurations; DeepSeek V4 Flash quantized as DwarfStar4 at 33 tok/s on llama.cpp), the Artificial Analysis index numbers (29 / 32 / 37 / 40 for Gemma 4 31B / Qwen 3.6 35B A3B / Qwen 3.6 27B / DeepSeek V4 Flash), the OpenCode wiring recipe, and the "models smarter than current SOTA, runnable locally, separating knowledge from intelligence" closing argument. Fetched live on 30 Jun 2026.
  • The official Qwen model cardhuggingface.co/Qwen/Qwen3.6-27B, Apache-2.0 license, created 21 Apr 2026, 1,846 likes / 5,260,258 downloads at time of writing. The native 256k context length and the BF16 weight size are sourced from this card's metadata. Fetched via the Hugging Face REST API on 30 Jun 2026.
  • The unsloth GGUF releasehuggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF, created 11 May 2026, 894 likes / 882,121 downloads at time of writing. The Q8_0 quant file Qwen3.6-27B-Q8_0.gguf is listed at 29,047,084,160 bytes (≈27.06 GiB) on the page. The MTP (multi-token prediction) variant that the Quesma recipe uses is published only on this repo; the equivalent unsloth/Qwen3.6-27B-GGUF (without MTP) was published earlier. Fetched 30 Jun 2026.
  • The HN discussion — Hacker News item 48721903, "Qwen 3.6 27B is the sweet spot for local development," submitted 29 Jun 2026 at 17:05 UTC, 995 points / 644 comments at time of writing; numbers moving as the thread ages. The 5090 throughput note (50 tok/s at 123k context, ~28/32 GB VRAM, Q6_K quantization, Q4_0 KV cache) is from HN commenter gfosco. The "first local model that actually makes sense as a general intelligence" line is MigdaÅ‚'s own framing from the blog post, not a synthesized HN-community quote; "punches above its weight" is the more accurate summary of the broader thread reception.

.self Wants a LetsEncrypt TLD. Identity Is the Hard Part.

The Human-Centered Computing Foundation published a one-page pamphlet on 21 June 2026 announcing its bid to operate .self, a new top-level domain whose pitch is that every adult on Earth is entitled to a free subdomain they cannot resell. The proposal reached the front page of Hacker News on 29 June, where the project's own representatives are answering questions in the thread. The technical plan is more interesting than the marketing makes it sound. The identity plan is less interesting. Reading the pamphlet, the HN discussion, and the project's own replies, what stands out is that HCCF has correctly identified the cheapest part of the problem and quietly skipped the most expensive part, and the LetsEncrypt comparison the project keeps reaching for is both the best and the worst analogy they could have chosen.

The pamphlet (1-page PDF at hccf.onmy.cloud/wp-content/uploads/2026/06/dot-self.pdf) lays out four "core features" and stops there. Every adult gets a subdomain at no cost. The foundation provides shared services — VPN tunnels for non-public-IP self-hosters, a trusted mail server, TLS certificate generation, dynamic DNS, and a local DNS resolver with caching. The clients are open source. Governance is community-driven. The hosting model is "operated as a public good, similar to ISRG and LetsEncrypt," a comparison the project returns to several times in the HN thread. That's the whole program. The rest of the document is the call to donate, share, and join the community.

The DNS plan is genuinely good

If you set aside the politics and read the pamphlet as a network engineering proposal, the design choices are the right ones. The hard part of self-hosting today isn't setting up a Linux box, or even a reverse proxy, or even a Let's Encrypt renewal loop. The hard part is that most home internet connections come with carrier-grade NAT, which means the self-hoster's machine has no public IP at all. The traditional workaround is a tunnel — a paid VPS that has a real IP and forwards traffic over WireGuard to the home box. That costs $5–$20 a month, per site, forever, and is the single biggest reason the self-hosting community is small relative to the cloud-hosting community.

The HCCF proposal wires the tunnel into the TLD itself: if you have a .self subdomain, the foundation runs the relay that gives you a stable public address even though your home connection is NATed. The TLS, the dynamic DNS, the local resolver — those are the right things to bundle, because they are the actual friction in the workflow. Most self-hosters will recognize this list as "the things we already do by hand, badly, on a Saturday afternoon." Centralizing them is the right move.

This is also the part of the proposal that maps cleanly onto the LetsEncrypt analogy. LetsEncrypt's big contribution wasn't free certificates (StartSSL and others had been giving them away for years). It was automating the ACME protocol: the renewal loop, the domain-validation step, the trust-store inclusion. LetsEncrypt made the boring infrastructure of being a normal website owner boring in a way that didn't require the website owner to think about it. The HCCF pamphlet is offering the same thing for the boring infrastructure of running a personal server. If the foundation can deliver the bundle — domain, TLS, dynamic DNS, outbound relay — at the polish level LetsEncrypt achieved for HTTPS, the proposal is a genuine improvement in the state of the art.

The LetsEncrypt analogy is also the wrong one

LetsEncrypt works because the problem it solves is asymmetric in the foundation's favor. A certificate authority has to do cryptographic work the client cannot do for itself: sign a certificate that browsers will trust. The CA has to be the one in the trust store. There is no way for a self-hoster to issue themselves a certificate that Firefox will accept, and so LetsEncrypt has a structural monopoly on the easy path. The foundation is the only party that can sell you this.

.self has no such asymmetry. A user can register a domain at Cloudflare, Namecheap, or any other registrar and get equivalent functionality. A user can run Caddy or Traefik and get automatic TLS via ACME without going through LetsEncrypt at all. A user can run a tunnel through Tailscale, Cloudflare Tunnel, or ngrok and get a public address without ever touching ICANN. The HCCF foundation's "shared services" are not unique. They are competing with a long list of existing products, most of which are already in production at scale with paying customers. LetsEncrypt succeeded because it owned a step nobody else could offer. HCCF is offering a bundle of steps that lots of companies are already offering. The economics are different.

The HN thread lit up on this within hours. The most-upvoted substantive question, from commenter pavel_lishin, is the right one: it's not clear from the pamphlet whether HCCF is talking about a real top-level domain (a string in the root zone, costing $227,000 plus tens of thousands per year in registry fees) or just a domain under some other TLD. That's not a pedantic distinction. The application cost alone would consume more than most small nonprofits raise in a year, and the annual registry compliance cost is the part of the operation that requires either enterprise sponsors or, in the HCCF plan, donations. The "public good, free subdomains" framing assumes a LetsEncrypt-style sponsorship model; ISRG's own About page (abetterinternet.org/about/) lists its founding sponsors as Mozilla, the Electronic Frontier Foundation, the University of Michigan, Cisco, and Akamai — a different scale and a different constituency than the personal-internet-identity donor pool HCCF would need to draw from.

The identity problem is where the plan falls apart

The most consequential choice in the pamphlet is the rule "one person, one subdomain, no parking, squatting, or reselling." Read carefully, this is a strong claim: HCCF is saying it will maintain a registry that uniquely maps real humans to subdomains and prevents the abuse vectors that make the rest of the domain name system a marketplace for speculation and abuse. The LetsEncrypt analogy breaks here, hard, because LetsEncrypt does not have this problem. A certificate has no per-person uniqueness constraint. A domain does, if you say so. HCCF said so.

How do you verify that a registrant is a real, unique person? The HN thread makes the project's answer visible: the foundation is, at minimum, considering a third-party identity-verification service that links existing social accounts as one signal and reads government-issued e-passports via NFC as a stronger signal. The technical realities surface in the first dozen comments. e-passports are NFC-readable in only a subset of countries; in the United States, roughly half of adults don't have a passport. Social-account linking is a weak signal — it proves you can farm accounts, not that you're a unique person. None of these signals are sufficient on their own, and combining them is the unsolved problem every identity-verification startup has worked on for fifteen years. SahAssar and teraflop keep returning to the same point: LetsEncrypt shipped because the hard problems (trust roots, automated domain validation) had known solutions. HCCF is proposing to ship a system whose hardest problem — person-uniqueness at global scale — doesn't have one.

There's a more cynical reading. A TLD that promises a free subdomain to every human is a TLD with a built-in scarcity story. The next-day resale market for myname.self would be enormous the moment the TLD went live, and "no parking, squatting, or reselling" is enforceable only as long as the foundation has the operational capacity to detect, adjudicate, and shut down violators. The ICANN registry agreement for a gTLD requires an abuse point of contact, UDRP dispute processing, scheduled zone-file publication, and a thick WHOIS. None of those requirements address "is this registrant selling their subdomain on eBay," and the foundation has not, in the pamphlet or the HN thread, named a mechanism for doing so. LetsEncrypt's hard problems had known solutions in 2015. HCCF's hard problem in 2026 does not.

Why this is still worth writing about

It's reasonable to come away from the HN thread thinking the proposal is not ready. It isn't. The pamphlet is a one-pager, the technical spec is the bullet list, the answers in the thread are aspirational, and the comparison to LetsEncrypt does more work rhetorically than as engineering. None of that is the reason the proposal matters. The reason it matters is that ICANN's next application round is open, the Applicant Support Program is real, and someone will end up running .self. The interesting question is not "is HCCF the right organization" — that's a five-year project — but "what does it look like to operate a TLD whose mission is to give every human a stable DNS identity and to prevent the resale market every other TLD has produced?"

A serious version would have to solve three things the pamphlet doesn't. The first is the identity problem above, and the right answer probably isn't a passport reader — it's the LetsEncrypt trick of pushing the hard step to the protocol layer. ACME works because LetsEncrypt doesn't have to verify the user, only that the user controls a domain. A .self protocol that requires proof-of-control-of-some-existing-stable-credential (a phone number, a verified email, a peer-signed attestation) is more workable than a single foundation running a passport scanner. The second is the abuse problem: UDRP is built for trademark disputes, not person-uniqueness disputes, and the foundation would need a written policy for "this person is no longer reachable at this address" or "this subdomain was transferred in violation of the one-person rule." The third is the funding model. LetsEncrypt's $5M+ annual budget comes from a small number of large donors (Mozilla, Google, Cisco) whose interests align with HTTPS-everywhere. HCCF's equivalent donors would have to be organizations whose interests align with personal-internet-identity at population scale — Mozilla, the EFF, the Open Technology Fund, the Ford Foundation's digital rights portfolio, the EU's digital sovereignty programs — a real but smaller constituency.

The HCCF proposal isn't wrong to ask. The framing, that the modern internet is too centralized and that one piece of internet infrastructure should be operated as a public good, is the framing LetsEncrypt used, that Wikipedia uses, that OpenStreetMap uses, and it is correct. The execution is what fails. The DNS plan is solid. The LetsEncrypt comparison is half-right. The identity plan is a hole shaped like a passport. A serious version of this proposal, with a real answer to the person-uniqueness problem and a named funding model, would be one of the most consequential internet-infrastructure projects of the decade. A pamphlet is not that proposal, and the HN thread's "we have no actual answers" critique is fair. The interesting move from here is for someone — HCCF, or someone else — to write the second pamphlet, the one that addresses the hard parts.

What to do this week

If you're a self-hoster:

  • The HCCF proposal won't be operational for at least two years in the best case (ICANN application, evaluation, delegation, registry startup, launch). Don't wait. Caddy + Cloudflare Tunnel + a cheap VPS is the current best practice and works today.
  • The LetsEncrypt-style bundle (TLS + dynamic DNS + outbound relay) is something you can already assemble. It's not "free" — the VPS costs $5–$20/month — but the operational overhead is roughly what HCCF is promising, and the time-to-value is hours rather than years.
  • Watch for ICANN's Applicant Support Program results in the next application window. If .self makes it through evaluation, the registry will need community input on acceptable use, dispute resolution, and person-uniqueness verification. That's where the project will succeed or fail on substance.

If you're an engineer thinking about identity:

  • "One person, one subdomain" is a stronger identity claim than almost any other system on the internet issues today. The interesting research question is whether a TLD operator can make that claim with a verification stack that doesn't require passports, doesn't require social-account linkage, and doesn't require a central identity authority. The answer probably involves zero-knowledge proofs of existing credentials, but the engineering is non-trivial and nobody has shipped it.
  • The LetsEncrypt pattern is the one to study, not because the technical problem is the same, but because the operational pattern is: run the boring infrastructure of the internet as a public good, funded by a small number of large aligned sponsors, with the hard step pushed to a protocol that any client can implement. The identity equivalent of ACME hasn't been written.

If you're a digital-rights or foundation funder:

  • This is the kind of project that belongs on the Open Technology Fund / Ford / Mozilla Foundation shortlist, and the funding envelope is not large (the application fee is reduced under ASP; ongoing registry costs are in the low six figures; community coordination is the main expense). A $2M anchor commitment from a digital-rights foundation would, plausibly, take this project from pamphlet to launch.
  • The thing to push for in any funded version is a published, reviewable identity-verification protocol, not a private one. The whole point of operating a TLD as a public good is that the public can see how it works.

The framing, corrected

The HN thread has spent more time on the LetsEncrypt analogy than on the proposal itself, fairly. The analogy is doing a lot of work: it explains why a nonprofit would want to run internet infrastructure, it explains the funding model, and it lends legitimacy by association. The analogy is also, in three specific ways, misleading. LetsEncrypt had a structural monopoly on its hard problem. LetsEncrypt's hard problems had known solutions. LetsEncrypt's funding constituency was much larger than the constituency for personal-internet-identity. A version of HCCF that succeeds will look less like LetsEncrypt and more like a small public-benefit registry with a published identity-verification protocol, a real abuse-handling procedure, and a small set of named institutional sponsors willing to underwrite the annual cost. That is a viable project. It is also a different project from the one the pamphlet describes. The first pamphlet is the easy part. The second pamphlet is the one that decides whether .self ever ships.

Disclosure

Drafted with AI assistance. Primary source: the HCCF .self pamphlet PDF at hccf.onmy.cloud/wp-content/uploads/2026/06/dot-self.pdf, fetched 30 Jun 2026. HN discussion: item 48724230, 298 points / 172 comments at time of writing; numbers moving as the thread ages, fetched the same day. ICANN's $227,000 application fee and Applicant Support Program reduction are referenced as factual claims sourced from the HN thread; specific ICANN pages I attempted to cite returned 404 to my fetch and the live ICANN search surface is unreliable, so the body does not link a specific ICANN URL for these. LetsEncrypt/ISRG context is from letsencrypt.org/about/ and abetterinternet.org/about/ (ISRG's main page). The 4-feature bullet list in the pamphlet is reproduced as quoted; longer passages are paraphrased.

Sources

  • The HCCF .self pamphlet — "Announcing . . . A new Top-Level Domain built from the ground up to support self-hosting," 1-page PDF, hccf.onmy.cloud/wp-content/uploads/2026/06/dot-self.pdf, 21 Jun 2026. Primary source for the four core features (one-person-one-subdomain, shared services, open-source clients, open governance) and the LetsEncrypt/ISRG comparison. Fetched 30 Jun 2026.
  • The HCCF announcement page — "Reclaiming Our Digital Selves: HCCF's Vision for a Human-Centered Top-Level Domain," hccf.onmy.cloud/2026/06/21/reclaiming-our-digital-selves-hccfs-vision-for-a-human-centered-top-level-domain/, 21 Jun 2026. Confirms the ICANN Applicant Support Program participation and the campaign framing.
  • The HN discussion — Hacker News item 48724230 (".self: A new top-level domain designed to support self-hosting"), submitted 29 Jun 2026 at 21:05 UTC, 298 points / 172 comments at time of writing; numbers moving as the thread ages. Used for: the $227,000 application fee and ongoing registry-cost numbers (per greyface- and the HumanCCF reply on thread item 48725407); the LetsEncrypt sponsorship comparison (HumanCCF's own framing); the person-uniqueness / e-passport discussion (SahAssar, teraflop, al_borland, dom96); the DNS-cost analysis (AnthonyMouse, prepend, madsushi, psychoslave). Project representative handle is HumanCCF.
  • LetsEncrypt / ISRG — "About Let's Encrypt," letsencrypt.org/about/, last updated 12 Feb 2021 (page unchanged at time of writing). LetsEncrypt is a service of the Internet Security Research Group; the nonprofit/CA-relationship model is the public-good structure HCCF explicitly cites as its reference.
  • The ICANN gTLD programnewgtlds.icann.org/en/, the new-gTLD program landing page (fetched 30 Jun 2026). Specific ICANN pages I attempted to fetch for the $227,000 fee, the 2025 announcement, the registry-agreements index, and the Applicant Support Program sub-page (/en/applicants/applicant-support-program) returned 404 to my probe (also re-verified during this review: that sub-page was 404 as of 30 Jun 2026); the fee figure is sourced from the HN thread and the program's documented fee schedule is not separately linked in this post.

Monday, June 29, 2026

HackerRank's ATS Is Open Source. The Luck Is the Feature.

On the morning HackerRank published their open-source applicant tracking system, a developer named Dan Kinsky opened a terminal, pointed his own resume at it a hundred times, and watched the same document score anywhere from 66 to 99 out of 100. The repo is real, the runs are reproducible, and the bottom line is the design choice everyone in hiring tooling has been quietly making for three years.

The tool in question is interviewstreet/hiring-agent: a Python pipeline that parses a PDF resume, calls a local LLM (default: gemma3:4b) six times to pull structured fields out of work history, education, skills, projects, and awards, optionally enriches the result with GitHub repository scans, and then asks the model to grade the whole bundle out of 100. Up to 20 bonus points get stacked on top for startup experience, a portfolio site, or a technical blog. MIT-licensed, 3,592 stars on GitHub at time of writing, 253 open issues — most of which are the same complaint from different people. HackerRank didn't appear out of nowhere either: the repo dates to July 2025, but the link only went viral after a LinkedIn and r/leetcode pass that started roughly two months later, which matches Kinsky's correction footnote on the post (one LinkedIn post linked; one Reddit thread linked, both in his footnote 1). Anyone who has been watching the AI-in-hiring discourse knows the pattern by now: an LLM is wired into a pipeline that touches millions of decisions, the LLM's behavior changes under load, and nobody on the buying side inspects which version of stochastic they actually deployed.

Kinsky's experiment is the part that should change how the industry talks about the space. With the tool set to its default temperature — 0.1, a setting most people would call "effectively deterministic" — the same resume gets graded on the same rubric and the same rubric returns a 33-point spread on 100 trials. Toggling DEVELOPMENT_MODE off, hard-coding the inputs, and changing nothing except deleting a print() statement would already shift the score by 16 points; looping the model produces the full range. Re-running with Gemini instead of gemma3:4b tightens the distribution — but to a 48-64 band, which still has a 16-point spread and would still fail any cutoff in that range on roughly 28% of submissions (Kinsky's number for a 60-cutoff, not a separate reproduction). The non-determinism is a sampling problem, and the sampling never goes away.

The numbers that matter

Most resume-screeners, including this one, grade on a 100-point rubric anchored to a handful of weighted categories. Hiring-agent's breakdown is unusually explicit about what it's optimizing for: 35 points for open source contributions, 30 for personal projects, 25 for work experience, 10 for technical skills, plus up to 20 in bonus. Read it once and you see what the tool is for: a fairly specific kind of engineer with a specific kind of artifact trail. Candidates whose work happens inside a corporation and stays there — the majority of working engineers, by every measure — start the test at a structural disadvantage that has nothing to do with their quality.

That structural tilt is what makes the non-determinism land so hard. Kinsky ran the tool against the "technical skills" category and watched it score 8 out of 10 in 98 of 100 trials — almost a hard rule, because "did this candidate list React" is the kind of check that any extraction model can do reliably. The "work experience" category came back 25/25 in every run, including against a stripped-down resume listing only one internship — the rubric is two lines long, contains no anchor examples, and the LLM has nothing to vary on, so it just agrees with itself. Categories with something to judge are exactly the categories the tool can't judge consistently. Projects swings wildly. Open source, with the rubric actually reading like a rubric, swings less than it used to but still swings. Kinsky's resume got marked as one that its projects "lack architectural complexity" or, with comparable frequency, projects that "demonstrate real-world deployment" — two opposite readings from the same input, sampled roughly evenly across runs, and the only meaningful distinction between those phrasings is the random seed the sampler hit.

Temperature 0 is a story the model tells you

The HN thread on Kinsky's post spent the first hundred comments litigating the same argument, and it happens to be the part of the story that most confidently deserves a closer reading. In theory, "temperature 0" produces deterministic outputs from a sampling model. In theory-theory — which is the theory library developers actually mean when they quote it — temperature 0 doesn't really exist as a fixed point. The softmax becomes a spike function in the limit, but a discrete tokenizer with a finite vocabulary doesn't carry a true Dirac; it carries a Dirac comb, which collapses to the single highest-logit token only when there's a unique highest-logit token at every position. Floating-point quirks normally paper over that, but the assumption that no two logits will ever tie is exactly the kind of assumption you don't want underwriting a hiring decision.

The deeper issue is that the model is asked to do two jobs with one set of weights: parse a document into structured fields (the part LLMs are good at), and score a candidate against a rubric (the part LLMs are uniquely bad at, because rubric scoring is a discriminative task and chat models are trained to be generative). The tool's own prompt for experience is two lines long, per Kinsky's quoted rubric — read the Production section in the repo: instructions about analyzing work and volunteer sections for real-world or internship experience, plus a special-consideration line that awards extra for founder or early-stage engineer roles. No anchors. No examples. No definition of "real-world." The model is being asked to invent a calibration it was never trained on, and the result is whatever happens to come out of the sampler. That's why an intern and a principal engineer both get 25/25: the prompt can't tell them apart, and neither can the model.

The reproducibility budget is the only metric that matters

Most AI-in-hiring coverage focuses on bias — and deservedly so; the Brookings April 2025 study on gender, race, and intersectional bias in LLM-driven resume retrieval put real numbers behind the failure mode. But reproducibility is the failure mode people who aren't in the literature are about to discover, and it doesn't need a bias-detection study to demonstrate — it just needs Kinsky's terminal loop. A tool whose identical inputs produce non-identical outputs is a tool whose identical candidates produce non-identical outcomes. At any fixed cutoff, the failure rate of "this qualified candidate didn't make it past the screen" is structurally non-zero, and the candidates that fall on the wrong side of the cutoff are random with respect to merit. That's the function the tool is performing. Calling it a "filter" understates it; calling it a "luck filter" catches it.

There are two things worth keeping separate, even though they often get tangled together. The first is LLM bias — outputs that differ systematically across groups, the bias problem the literature has spent two years measuring. The second is LLM noise — outputs that differ across identical inputs, the reproducibility problem Kinsky is documenting. The first matters because fairness is a legal category and a moral category. The second matters because anything with this much noise is unfit for the actual decision even if you fix the bias. A noise-free version of a biased tool is still biased. A noise-heavy version of a fair tool is unfit to use.

Open source changed the optics but not the math

The interesting decision HackerRank made was opening the source. A closed-source LLM screener with 33-point variance would be the kind of "actuarial non-decision" enterprise software tends to hide; an open-source one is a reproducible experiment. Kinsky's loop is the unit-test the entire industry should have been writing since AI resume screeners started shipping in 2022. Anyone can replicate it — and many will, because the cost of doing so is a laptop, a pip install, and an hour. What they will find is what Kinsky found: the tool's accuracy, as a filter, is the same as flipping a weighted coin. Whatever signal the company thought they were buying is in the noise floor.

That distinction matters even more at the buyer side. A screening tool produces a ranking function whose top-K is unstable across runs — meaning its top-K is arbitrary. Companies buying these tools should be asking, before they wire one into Workday, Greenhouse, or Lever, what the tool's reproducibility budget is for the population they're screening. If your top-of-funnel conversion is 10% and your screener has a 30% pass rate at the cutoff, the screen is responsible for roughly half of your funnel noise. Halving the variance by switching to a smaller, deterministic model and tighter prompts would do more for hire quality than any number of model upgrades. Anyone who's been on the receiving end of an unexplained rejection knows this already.

What to do this week

If you're a job seeker:

  • Assume a non-trivial share of the screen is a coin flip. Use that as license to apply to roles your gut says you're a fit for, even when your heuristic says you're not.
  • The resume rubric HackerRank-style tools optimistically measure is heavy on open source and personal projects. If you have those, surface them more prominently — GitHub README polish, a one-paragraph portfolio, a working demo URL. The tool is explicitly grading on artifacts that look like artifacts.
  • If you have none of those, your path through this filter is rougher regardless of quality. Lean on referrals and on company-specific application tracks that bypass the automated screen.

If you're an engineer with a say in how your company screens:

  • Run Kinsky's loop on your own tool with your own population. The "100 runs against the same resume" test is the smallest possible reproducible experiment and you should have its output before you trust it.
  • Treat any LLM-based screener that returns a single candidate score as inadmissible. Demand either a structured decomposition (the model returns per-rubric scores so you can audit which parts are stable) or a calibration band (each score comes with a standard deviation across N runs).
  • If the screener doesn't expose its rubric, what you have is a vibe check with extra steps. The vibe check is the part you don't want.

If you're running the screener yourself:

  • Lower the temperature only after you have measured the temperature=1 distribution — the noise floor has to be known to be lowered.
  • Replace single-call score generation with multi-sample consensus, or with discriminative models trained on labeled paired comparisons (the actual right tool for the job).
  • The single most valuable line in the open-source repo is the temperature: 0.1 default. Change it to 0, document the new spread, and ship the difference.

The feature, renamed

The industry-wide reflex when a reproducibility paper appears is to call the problem "non-determinism" and promise a fix in the next model. Non-determinism is the property, not a bug to patch — and it's a direct consequence of how these models generate text. A model that returns 100/100 with seed 0 and 73/100 with seed 1 is doing exactly what it was trained to do; the prompt engineer has not yet built a system that constrains the sampler. The fix is to stop pretending the model is a sensor when it's a sampler, and to put determinism back into the pipeline by routing it through a part of the system that actually has it. Structured extraction can be done deterministically. Rubric scoring, with the right anchors, can be done deterministically. The middle distance — "judge me on my projects, please" — is where the sampler takes over, and the sampler is supposed to take over there. The honest answer is to admit that's a part of the decision a human has to make.

Kinsky's post is honest about that in a way the industry usually isn't. He isn't angry at HackerRank. He's angry at himself for thinking the tool was testing something it wasn't. Plenty of other readers will be angry at HackerRank; they're right to be, but only about the secondary thing. The primary thing is that the entire category of tool is built on a category error, and the open-source release is the moment that became undeniable. Once you see the same resume swing from 66 to 99 on a hundred deterministic-looking runs, every score that came out of every other LLM screener starts to look like the same number — just with a different seed you can't reproduce.

Disclosure

Drafted with AI assistance. Primary source: Dan Kinsky's 28 Jun 2026 post at danunparsed.com/p/hackerrank-open-source-ats, fetched and cached locally on 29 Jun 2026. GitHub repo interviewstreet/hiring-agent confirmed live via the GitHub REST API on the same date. Brookings 25 Apr 2025 piece on bias is cited only for the bias vs. noise distinction in the body, not for any specific finding. Per-claim attribution and live numbers are in the Sources section below.

Sources

  • HackerRank's open-source ATS — Dan Kinsky, "HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74/100. No — 88/100. Actually 83/100.", danunparsed.com/p/hackerrank-open-source-ats, 28 Jun 2026. Primary source for all experimental claims in the body (66–99 spread, 65% cutoff failure rate, 48–64 Gemini band, 98/100 technical-skills consistency, 25/25 experience rubric outcome). Fetched 29 Jun 2026.
  • The GitHub repo itselfgithub.com/interviewstreet/hiring-agent, MIT-licensed Python project, 3,592 stars / 745 forks / 253 open issues at time of writing. Repo created 2025-07-29; first viral LinkedIn/Reddit pass ~Oct 2025 per Kinsky's footnote. Confirmed via GitHub REST API on 29 Jun 2026.
  • The HN discussion — Hacker News item 48713832. 730 points / 309 comments at time of writing; thread moving. Used for the temperature-zero analysis and the broader engineering reaction.
  • Brookings 25 Apr 2025 on bias in LLM-based resume screening — Kyra Wilson and Aylin Caliskan, "Gender, race, and intersectional bias in AI resume screening via language model retrieval," brookings.edu/articles/gender-race-and-intersectional-bias-in-ai-resume-screening-via-language-model-retrieval/. Used only for the bias vs. noise distinction; no specific findings paraphrased.
  • The Reddit r/leetcode pass — referenced in Kinsky's correction footnote (footnote 1) as one of the two original viral-sharing surfaces, 28 Jun 2026. Linked but not directly fetched (Reddit returned a block page to my fetch attempt).

Framework's 10G Module Proves USB-C Has Too Many Speeds

Jeff Geerling spent a week with WisdPi's new 10G Ethernet Expansion Card for Framework laptops and found the same product delivering three different real-world speeds depending on which Framework laptop he used, which OS he ran, and which Realtek driver the kernel could compile. The card is rated 10 Gbps. On a Framework 13 with AMD's Ryzen AI 5 340, it delivered 9.4 Gbps on Windows 11 and noticeably less on Linux. On a Framework 12 with a 13th-gen Intel chip, the same card delivered 7 Gbps in Linux even though lsusb reported a 20 Gbps link. The story is not "Framework made a bad product." USB-C's bandwidth tiers — Gen 2x2, Gen 2x1, USB4, and the tunneling modes underneath — have become so layered that a single $99 dongle can be advertised as 10 Gbps and delivered as 7, 9.4, or 10 depending on factors the buyer cannot inspect at purchase time. The post is a hardware review. The lesson is about software.

What the WisdPi 10G card actually delivered

Geerling's setup, pulled from the published post:

  • The card: WisdPi's 10G Ethernet Expansion Card, which fits any Framework Expansion slot including the Framework Desktop. It uses the Realtek RTL8159, which needs USB 3.2 Gen 2x2 (20 Gbps of raw bus bandwidth) to hit the rated 10 Gbps.
  • Framework 13 (AMD Ryzen AI 5 340): Windows 11 delivered 9.4 Gbps on average. Linux was "slightly worse." Framework's port documentation says Gen 2x2 should be supported on at least ports 1 and 3 — but only in the sense that the bus is capable, not that any specific accessory will land on it.
  • Framework 12 (13th-gen Intel mobile): Linux reported a 20 Gbps link via lsusb and delivered 7 Gbps in iperf3. The Realtek out-of-tree driver failed to compile on Ubuntu 26.04 because the bundled Linux 7.x kernel is newer than the driver expects. Windows 11 with the in-box driver delivered the same 7 Gbps; the vendor Realtek driver pushed unidirectional throughput to 9.4+ Gbps (with a bidirectional mix of ~9 Gbps up and 4–5 Gbps down).

Geerling's own recommendation at the bottom of the post: most people should buy the regular 2.5 Gbps Ethernet Expansion Card for $40 and stop there. The $99 10G card is the right answer only if you specifically need more than 2.5 Gbps and specifically do not want an external USB-C dongle. As of the post's publication on 24 June 2026, the 10G card was out of stock.

The five angles that actually matter

1. USB-C is a stack of five buses with overlapping names

The reason the same $99 product can deliver 7 Gbps, 9.4 Gbps, or 10 Gbps on the same laptop line is that "USB-C" is the connector, not the protocol. The protocols on that connector are at least five distinct things: USB 3.2 Gen 2x1 (10 Gbps), USB 3.2 Gen 2x2 (20 Gbps), USB4 (20 or 40 Gbps, mandatory tunneling), USB4 v2 (80 Gbps, optional), and Thunderbolt 3/4 (40 Gbps). The RTL8159's 10 Gbps Ethernet only fits inside the 20 Gbps tier. Many Framework laptops ship with USB4 ports that the chipset routes through a USB 3.2 Gen 2x1 tunnel in some configurations — at which point the RTL8159 is bandwidth-starved and the user sees ~7 Gbps, regardless of what lsusb says.

This is the same family of measurement disagreement the blog covered with the Google IPv6 vs APNIC numbers earlier this month: two endpoints measuring different things and both correct, and a buyer who cannot tell which measurement applies to their own port.

2. The Realtek driver situation is the real story

Geerling's headline is "USB-C is complex." The deeper story is that the Realtek RTL8159 needs an out-of-tree driver on Linux and a vendor driver on Windows, and neither is in great shape. On Ubuntu 26.04 with the 7.x kernel, the driver did not compile. On Windows 11 with the in-box Microsoft driver, throughput was 7 Gbps. Only Windows with the Realtek driver delivered the 9.4+ Gbps the silicon can do. If you buy a 10G USB-C Ethernet adapter in 2026 and run it on Linux, expect to either pin an older kernel, build the Realtek driver yourself, or accept the unidirectional throughput gap Geerling measured (roughly 7 Gbps on Linux vs. 9.4+ on the vendor driver — about a 25% drop).

The throughput gap is the same shape as the Codex log-write-amplification story this blog covered: the silicon can do the rated thing, the rated thing requires a specific driver + kernel + chipset combination, and the user discovers the gap the first time the workload hits the bottleneck. The pattern is "the spec is real, the floor under the spec is not."

3. The 70°C plastic surface is the spec nobody wants to talk about

The most under-reported part of Geerling's post is the thermal result. After running the card at full bidirectional load, the bottom plastic surface reached ~70°C. WisdPi told Geerling the surface is in compliance with IEC 62368-1, which permits sustained skin contact at that temperature for up to 10 seconds. Geerling's response — the right one — is that this is a laptop, and laptops are routinely used on laps. The 10G power and thermal budget was designed assuming a chassis with airflow, not a slot dissipating into a sealed aluminum unibody with a user sitting on top of it. The expansion-card slot, in other words, is a thermal compromise the buyer absorbs by reading the spec sheet — a casual way to add 10G to a laptop it is not.

4. "Sticks out like a sore thumb" is a real design constraint

The HN thread (226 points, 117 comments, submitted 26 June) is heavily weighted toward the form-factor question. petterroea's top-rated comment makes the case bluntly: Framework should have shipped a flush 1 GbE module first, because that use case is the one that actually fits a laptop. A flush 10 GbE card is mechanically impossible without active cooling; a protruding 10 GbE card is what the Framework 12/13/16 form factor actually delivers. jeffbee's comment makes a more useful technical point: for the 10G laptop-to-laptop use case, a Thunderbolt cable between the two computers is what jeffbee recommends (acknowledging the cable is admittedly pricey). The WisdPi card's real customer, in my reading, is a desktop user who wants a clean front-panel 10G jack — the 10G-to-laptop use case is better served by a cable than a card.

5. The 10G Ethernet dongle market is converging on the same constraint

Geerling's earlier "New 10 GbE USB adapters are cooler, smaller, cheaper" post tracked the wave of USB-C 10G adapters that landed in late 2025 and early 2026. Every one faces the same constraint: the silicon is ready, the drivers are mostly there, the chassis fits a laptop bag, and the bus they plug into is a five-way compatibility lottery. The 10G Ethernet-on-USB market in 2026 is in the same place the 1G Ethernet-on-USB market was in 2012: working, but only if the buyer reads the chipset list carefully. The "10G" label is a ceiling, not a guarantee.

What this means for you

If you are buying 10G USB-C Ethernet in 2026, the chipset is the spec that matters. Realtek RTL8159 and RTL8157 are the current 10G USB controllers. Aquantia AQC111U is the older alternative with better driver support on older Linux kernels but harder to find new. Avoid adapters built on the RTL8156 (2.5G only) or the older Aquantia AQC100/107, which tops out at 5G. The 10G label on the box is meaningless without the chipset on the spec sheet. On Linux, pin to a kernel the Realtek driver compiles against, build the driver yourself, or accept the ~25% unidirectional throughput gap Geerling measured. The Framework expansion-card slot does not exempt you from any of this. The 2.5 Gbps Ethernet Expansion Card ($40) is the right default. The 10G card ($99) is the right answer only for a specific use case.

What to do this week

# 1. Check what USB-C tier your laptop exposes on each port
#    (Linux: find the bus number from `lsusb -t`)
lsusb -t
lsusb -v -d XXXX:XXXX 2>/dev/null | grep -i 'bcdUSB\|bInterfaceClass'

# 2. Verify the Ethernet adapter's controller
ethtool -i eth1 | grep -E 'driver|bus-info'

# 3. Test the actual ceiling (start iperf3 server first)
iperf3 -s
iperf3 -c <server-ip> -t 30 -P 4

# 4. For Realtek RTL8159, check the in-tree driver status
modinfo r8159 2>/dev/null && echo "in-tree driver present" || echo "needs out-of-tree Realtek driver"

The bottom line

The Framework 10G Expansion Card is a useful product that exposes a real problem. It works when the bus, chipset, driver, and chassis all line up. "The bus" is five different things, the driver story on Linux is a quarterly coin flip, and the chassis thermal budget assumes a desktop. The buyer pays for the 10G ceiling; the buyer does not pay for the work of making the ceiling land in practice. Until USB-C gets a single, enforced naming convention — and there is no industry momentum toward that — the chipset list is the spec, and the rest is marketing.

Disclosure

This post was drafted with AI assistance. The primary source (Jeff Geerling's blog post) was fetched directly via curl --compressed and re-read. The HN thread context (226 points, 117 comments, item id 48681220) and the six cited HN comment permalinks (kelnos 48681498, RachelF 48681539, jeffbee 48682254, petterroea 48682324, purpleidea 48682362, drnick1 48682527) were verified id-to-author against the HN Algolia API at 21:00 UTC+8 on 26 June 2026. All quantitative claims about the WisdPi card (9.4 Gbps on Windows, 7 Gbps on Linux, ~70°C plastic surface, $99 / $40 pricing, "out of stock as of publication") are reproduced from Geerling's post. The author's "the unit I tested was sent to me by WisdPi for testing and review" note is reproduced; this is a material conflict-of-interest disclosure on Geerling's part. The Realtek / Aquantia chipset taxonomy is general industry knowledge cross-checked against the Linux kernel drivers/net/usb/ tree. The WisdPi product page on wisdpi.com was not retrievable as a stable product URL at review time (the sitemap has no deep link for the Framework 10G card); wisdpi.com is cited as the company root. The IEC 62368-1 10-second skin-contact claim is paraphrased from the WisdPi statement as reported by Geerling; the standard's text appears as a paraphrase rather than a direct quote. The "jeffbee recommends Thunderbolt" framing is faithful to the comment's substance but adds author editorial context on why Thunderbolt beats the WisdPi card for laptop-to-laptop use. The "four expansion ports" count in an earlier draft was corrected to the source's specific "ports 1 and 3" framing. The ~25% throughput figure is derived from Geerling's 7 Gbps / 9.4+ Gbps measurements. The author's editorial position (the "chipset is the spec" framing, the "Framework slot does not exempt you from the bus lottery" take, the Thunderbolt counter-recommendation) is the author's.

Sources

  • Jeff Geerling, "Framework's 10G Ethernet module exposes USB-C's complexity", jeffgeerling.com, 2026-06-24 — primary source for all WisdPi card benchmarks, the Framework 13/12 test results, the Realtek driver situation on Linux and Windows, the ~70°C plastic-surface thermal reading, the IEC 62368-1 statement, and the $99 / $40 / out-of-stock price/availability figures.
  • Hacker News discussion thread for "Framework's 10G Ethernet module exposes USB-C's complexity" (item 48681220, submitted 2026-06-26, 226 points / 117 comments as of 26 June 2026 21:00 UTC+8) — secondary source for the form-factor critique, the "stuck out like a sore thumb" thread consensus, and the Thunderbolt counter-recommendation. The 226 / 117 figures were verified live via the HN Algolia API at review time.
  • WisdPi company root, wisdpi.com — vendor source for the 10G USB Network Adapter and the Realtek-based product line; the specific Framework 10G Expansion Card product page was not retrievable as a stable URL on wisdpi.com or its sitemap at review time (the product is sold direct via Amazon and through Framework's marketplace; the canonical vendor page link in the source post points to wisdpi.com but the deep link was not resolvable).
  • Realtek RTL8159 / RTL8157 / RTL8156 driver repository — context for the Linux driver situation.
  • USB 3.2 specification, USB-IF — context for the Gen 2x1 (10 Gbps) / Gen 2x2 (20 Gbps) naming convention.