Programming guides for beginner...
Any comments are welcomed....
I hope it helps!!! Thanks for drop by...

Tuesday, June 30, 2026

Qwen 3.6 27B Is the First Local Model That Actually Codes

Qwen 3.6 27B is a model that you can run on a laptop, that scores a 37 on Artificial Analysis (roughly mid-2025 frontier — Claude Sonnet 4.5, GPT-5 territory), and that you can wire into OpenCode with five lines of JSON. It shipped this week and hit the top of Hacker News with 995 points and 644 comments. The reason the discussion has outgrown the usual "local models are toys" cynicism is that the experiment doesn't behave like a toy. It behaves like a pricing announcement disguised as a model release. The local-AI community has been waiting for a model that pulls the cost-per-task curve below the hosted APIs, and Qwen 3.6 27B is the first one that does it on a MacBook without heroic quantization or a datacenter GPU. The interesting question isn't whether the model is good — it is — but what happens to the inference economy when the sweet spot for coding isn't a hosted service.

The blog post that did most of the work is Piotr Migdał's "Qwen 3.6 27B is the sweet spot for local development," published on the Quesma blog on 29 June 2026 and submitted to HN as item 48721903. Migdał runs the model on a MacBook Max M5 128GB and benchmarks it across MLX and llama.cpp against the mixture-of-experts Qwen 3.6 35B A3B and a quantized DeepSeek V4 Flash variant called DwarfStar4. The benchmark numbers and the test setup are reproducible (he links the benchmark script), and the conclusion — that the dense 27B outperforms the MoE 35B A3B on real coding tasks despite being roughly a third of the speed — is the part that should change how anyone in this space talks about MoE versus dense tradeoffs.

The numbers that matter

The Artificial Analysis index is a single number summarizing reasoning, knowledge, and instruction-following across a standard eval suite. Migdał lines up four data points that put Qwen 3.6 27B in perspective: Gemma 4 31B sits at 29 (roughly late-2024 frontier, o1 / Claude 3.5 Sonnet), Qwen 3.6 35B A3B at 32 (early-2025 frontier, o3 / Claude 4 Sonnet), Qwen 3.6 27B at 37 (mid-2025 frontier, GPT-5 / Claude Sonnet 4.5), and DeepSeek V4 Flash at 40 (late-2025 frontier, GPT-5.2 / Claude Opus 4.5). The 27B beats the 35B A3B by 5 points on this index even though the 35B A3B has 35 billion parameters and only activates about 3 billion at inference time. That's the counterintuitive claim worth sitting with: the active-parameters-per-token count is not the bottleneck. Dense 27B with a real training budget is.

Throughput is the other axis the benchmark calls out. On the M5 128GB with no multi-token prediction, Qwen 3.6 27B delivers 17-18 tokens per second. With MTP enabled (the draft-MTP flag that uses a fast auxiliary model to predict subsequent tokens), that climbs to 32 tokens per second. The MoE 35B A3B is faster on the same hardware — 93 tok/s on llama.cpp, 105 tok/s with MTP — but on Migdał's coding benchmarks the 27B produces higher-quality output. The tradeoff is straightforward: a third as much code, of noticeably higher quality, on the same laptop. For vibe coding where you're generating function bodies and tests, the 32 tok/s ceiling is well above what you can read.

For NVIDIA hardware the picture shifts but the conclusion holds. Commenter gfosco on the HN thread reports running the same model on an RTX 5090 at Q6_K quantization with Q4_0 KV cache, getting 50 tokens/s consistently at 123k context using roughly 28GB of a 32GB VRAM budget via LM Studio. The 123k context figure is interesting on its own: the model's native context is 256k tokens, and a single consumer GPU is using more than half of that budget in production.

What changed since the last "local model that actually works"

The local-AI community has been through three cycles of this announcement since 2023. Llama 2 70B ran but felt a generation behind. Llama 3 70B closed most of the gap but required a Mac Studio with 192GB of RAM or two datacenter GPUs. Llama 3.1 405B was technically open-weights but the inference cost put it back in hosted territory. Gemma 4 31B was the first model where "running locally" and "good at coding" overlapped for real users, and it became the default for a generation of developers. Qwen 3.6 27B is the second one, and the gap between Gemma 4 and Qwen 3.6 on Artificial Analysis is 8 points — equivalent to roughly a year of frontier-model progress, compressed into a model that fits in a smaller memory footprint.

Quantization matters more than the index number. The default release is BF16 (about 54GB); the practical quantizations are Q8_0 (about 27GB on disk per the unsloth GGUF), Q4_K_M (around 18GB), and lower. The 8-bit Q8_0 quant is the recommended baseline because the quality loss against the BF16 reference is small on most coding tasks; the 4-bit quants are where you trade quality for size. The MTP (multi-token prediction) variant of the GGUF — unsloth/Qwen3.6-27B-MTP-GGUF — adds a draft model that lets the sampler commit several tokens per forward pass, which roughly doubles throughput on supported hardware. The combination that lands the laptop demo is 27B dense + Q8_0 + MTP + 128GB unified memory + MLX or llama.cpp. None of those four components is new; what is new is that the same hardware that couldn't run last year's local-model-equivalent-of-frontier now runs this one comfortably.

The pricing announcement disguised as a model release

The hosted-API inference economy is built on a specific cost-per-task curve. Anthropic's Claude Sonnet 4.5 lists at $3 per million input tokens and $15 per million output tokens. GPT-5 standard tier is similar. A developer running Qwen 3.6 27B on a 5090 has zero marginal cost per token after the GPU purchase — a 5090 at $2,000 amortized over a three-year useful life is roughly $55/month, which works out to several million tokens of generation per day before the per-token cost even approaches a hosted API's. The hosted-API cost only amortizes if your time has zero opportunity cost and you never run a long context. For a developer using a coding agent across a workday, that condition fails by mid-morning.

Migdał makes the second-order point at the end of his post and it's the one that will outlast the model release: "we will have models smarter than current state of the art, while runnable on local devices, maybe even smartphones. Current models combine both raw intelligence and factual knowledge in the same weights. Future models will likely separate that, offloading a lot of knowledge to tool calling." That is the trajectory to watch. Qwen 3.6 27B is the model that closes the gap between local and hosted; the question the rest of 2026 answers is whether anything closes the gap between local and frontier, and at what pace. A 27B dense model scoring a 37 when the leading open-source model six months earlier scored a 29 is roughly 8 points of progress per release cycle on the AA index. If that pace holds, the 2027 local sweet spot is a 27B-class model scoring in the mid-40s — above DeepSeek V4 Flash, inside the late-2025 frontier envelope, on the same hardware.

What this means for you

If you're a developer who has been using a hosted coding agent (Claude Code, Codex, Cursor's default model) and paying per-token:

  • The cost crossover is here for most individual developers. A used 5090 at $1,500–$1,800 plus a 32GB-or-better Mac Studio covers the local inference hardware. The break-even against a $20/month Cursor or Claude Pro subscription is roughly three months for moderate use, and the marginal cost per additional token is zero.
  • The 27B-versus-35B-A3B tradeoff is real and worth testing on your own tasks. The 35B A3B is faster but the 27B produces code you ship with less editing. The Migdał benchmark script is the right starting point but the right benchmark is your own workload.
  • For long-context work (anything that fits in 100k+ tokens), the local story is now competitive with hosted. The 5090-at-Q6_K-Q4_0-KV report of 50 tok/s at 123k context is the configuration worth cloning.

If you're running an inference-heavy product:

  • The hosted-API cost curve assumes model weights don't commodify. Qwen 3.6 27B's open-weights release compresses the price floor for any task the model can do competently. If your product's value-add is "host a good-enough coding model," the gross margin just got thinner.
  • The interesting direction is harness, not model. The blog's OpenCode recipe is six lines of JSON; that recipe is the same shape across hosted and local models. The competitive differentiation moves from "which model is best" to "which scaffolding produces the best agent loops."
  • Inference-economics stories (we covered OpenAI's Jalapeño chip and DSpark's Pareto frontier shift earlier this week) are now framed by an open-weights ceiling that didn't exist a year ago.

If you're deciding which hardware to buy for local inference:

  • 32GB unified memory (Mac Mini M4 Pro / M5 Pro, Framework Desktop, Strix Halo boards) is the new minimum. The recent two-Strix-Halo 256GB build we covered is overkill for Qwen 3.6 27B but is the right platform if you also want to run GLM 5.2 or DeepSeek V4 Flash at higher precision.
  • An RTX 5090 at Q6_K + Q4_0 KV is the single-GPU target — 50 tok/s at 123k context, fits the model and most of the KV cache in 32GB. Two 5090s in an NVLink setup is the workstation tier for sustained agentic coding.
  • Apple Silicon's unified-memory architecture still wins for batch experiments because the KV cache scales with available memory instead of competing with the model weights for VRAM. MLX on a Mac Studio M5 Ultra is the right rig if you spend more time iterating on prompts than shipping code.

What to do this week

# 1. Get the model. The unsloth GGUF is the one that ships with MTP support.
huggingface-cli download unsloth/Qwen3.6-27B-MTP-GGUF \
    --include "Qwen3.6-27B-Q8_0.gguf" \
    --local-dir ~/models

# 2. Run llama.cpp with the recommended flags. -ngl 999 puts all layers
#    on GPU; -fa enables flash attention; -c 65536 is a 64k context window
#    that the model can stretch to 256k by trading tokens-per-second.
llama-server -hf unsloth/Qwen3.6-27B-MTP-GGUF:Q8_0 \
    --spec-type draft-mtp -ngl 999 -fa on -c 65536 --port 8080

# 3. Wire OpenCode (or Pi, or Hermes Agent — same shape) to the local server.
#    Drop this into ~/.config/opencode/opencode.jsonc:
#    {
#      "provider": {
#        "llama": {
#          "name": "llama.cpp (local)",
#          "npm": "@ai-sdk/openai-compatible",
#          "options": {
#            "baseURL": "http://127.0.0.1:8080/v1",
#            "apiKey": "***"
#          },
#          "models": {
#            "qwen3.6-27b": { "name": "Qwen3.6-27B Q8 +MTP" }
#          }
#        }
#      },
#      "model": "llama/qwen3.6-27b"
#    }

# 4. Sanity-check with a 5-minute vibe-coding task before you trust it.
#    Constrained writing and "penguins on a bicycle" prompts are the
#    standard smoke tests; the real benchmark is the codebase you're
#    already working in.

The signal through the noise

Recent history has settled into a recognizable shape. Frontier labs ship a hosted model, an open-weights lab ships a slightly-smaller-and-slightly-older model a few months later, the open-weights model runs locally on hardware that gets cheaper every year, and the local model becomes the default for the long tail of developers who don't need the absolute frontier. Qwen 3.6 27B is the first release where the local-default is also the better choice on cost for an individual developer, even before you factor in latency, privacy, or the ability to fine-tune. The GLM 5.2 release we covered two days ago showed the same shape one rung up the capability ladder — bigger model, more hardware, but still runnable locally with a company budget instead of a datacenter lease. The center of gravity is moving from "what model can you afford to call" to "what hardware can you afford to buy," and the second question has a one-time answer rather than a monthly bill.

The thing the Quesma blog post gets right that most model-release coverage misses is the framing. Qwen 3.6 27B is not "the new best open-weights model." It is the first model where the open-weights path produces a cost-per-task better than the hosted frontier path, on hardware a working developer already owns or can buy with one hardware refresh. That is a different announcement than "another good model release," and the HN engagement — 995 points and 644 comments for a blog post on a model that didn't exist six months ago — is the community correctly recognizing which announcement it is. The model is the proof; the economy is the consequence.

Disclosure

Drafted with AI assistance. Primary source: Piotr Migdał, "Qwen 3.6 27B is the sweet spot for local development," Quesma Blog, quesma.com/blog/qwen-36-is-awesome/, dated 29 Jun 2026. Benchmark numbers (AA index 29/32/37/40; throughput 17–105 tok/s) are reproduced from the Migdał post. HF card and GGUF sizes were confirmed live on 30 Jun 2026. The 256k native context and Q8_0 ~27GB on-disk size for huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF are from the model card metadata; the URL Qwen/Qwen3-27B (no "3.6" dot) returns HTTP 401; the correct native repo is Qwen/Qwen3.6-27B with the dot. HN item 48721903, 995 points / 644 comments at time of writing; numbers moving as the thread ages. The 5090 throughput note (50 tok/s at 123k context, Q6_K + Q4_0 KV) is from HN commenter gfosco. The "punches above its weight" framing is HN-thread consensus paraphrased; the "first local model with cost-per-task below hosted" framing is this blog's.

Sources

  • The Quesma blog post — Piotr Migdał, "Qwen 3.6 27B is the sweet spot for local development," Quesma Blog, quesma.com/blog/qwen-36-is-awesome/, 29 Jun 2026. Primary source for the MacBook Max M5 128GB throughput numbers (Qwen 3.6 27B: 17 tok/s on MLX, 18 tok/s on llama.cpp, 32 tok/s on llama.cpp with MTP; Qwen 3.6 35B A3B: 85 / 93 / 105 tok/s on the same three configurations; DeepSeek V4 Flash quantized as DwarfStar4 at 33 tok/s on llama.cpp), the Artificial Analysis index numbers (29 / 32 / 37 / 40 for Gemma 4 31B / Qwen 3.6 35B A3B / Qwen 3.6 27B / DeepSeek V4 Flash), the OpenCode wiring recipe, and the "models smarter than current SOTA, runnable locally, separating knowledge from intelligence" closing argument. Fetched live on 30 Jun 2026.
  • The official Qwen model cardhuggingface.co/Qwen/Qwen3.6-27B, Apache-2.0 license, created 21 Apr 2026, 1,846 likes / 5,260,258 downloads at time of writing. The native 256k context length and the BF16 weight size are sourced from this card's metadata. Fetched via the Hugging Face REST API on 30 Jun 2026.
  • The unsloth GGUF releasehuggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF, created 11 May 2026, 894 likes / 882,121 downloads at time of writing. The Q8_0 quant file Qwen3.6-27B-Q8_0.gguf is listed at 29,047,084,160 bytes (≈27.06 GiB) on the page. The MTP (multi-token prediction) variant that the Quesma recipe uses is published only on this repo; the equivalent unsloth/Qwen3.6-27B-GGUF (without MTP) was published earlier. Fetched 30 Jun 2026.
  • The HN discussion — Hacker News item 48721903, "Qwen 3.6 27B is the sweet spot for local development," submitted 29 Jun 2026 at 17:05 UTC, 995 points / 644 comments at time of writing; numbers moving as the thread ages. The 5090 throughput note (50 tok/s at 123k context, ~28/32 GB VRAM, Q6_K quantization, Q4_0 KV cache) is from HN commenter gfosco. The "first local model that actually makes sense as a general intelligence" line is Migdał's own framing from the blog post, not a synthesized HN-community quote; "punches above its weight" is the more accurate summary of the broader thread reception.

.self Wants a LetsEncrypt TLD. Identity Is the Hard Part.

The Human-Centered Computing Foundation published a one-page pamphlet on 21 June 2026 announcing its bid to operate .self, a new top-level domain whose pitch is that every adult on Earth is entitled to a free subdomain they cannot resell. The proposal reached the front page of Hacker News on 29 June, where the project's own representatives are answering questions in the thread. The technical plan is more interesting than the marketing makes it sound. The identity plan is less interesting. Reading the pamphlet, the HN discussion, and the project's own replies, what stands out is that HCCF has correctly identified the cheapest part of the problem and quietly skipped the most expensive part, and the LetsEncrypt comparison the project keeps reaching for is both the best and the worst analogy they could have chosen.

The pamphlet (1-page PDF at hccf.onmy.cloud/wp-content/uploads/2026/06/dot-self.pdf) lays out four "core features" and stops there. Every adult gets a subdomain at no cost. The foundation provides shared services — VPN tunnels for non-public-IP self-hosters, a trusted mail server, TLS certificate generation, dynamic DNS, and a local DNS resolver with caching. The clients are open source. Governance is community-driven. The hosting model is "operated as a public good, similar to ISRG and LetsEncrypt," a comparison the project returns to several times in the HN thread. That's the whole program. The rest of the document is the call to donate, share, and join the community.

The DNS plan is genuinely good

If you set aside the politics and read the pamphlet as a network engineering proposal, the design choices are the right ones. The hard part of self-hosting today isn't setting up a Linux box, or even a reverse proxy, or even a Let's Encrypt renewal loop. The hard part is that most home internet connections come with carrier-grade NAT, which means the self-hoster's machine has no public IP at all. The traditional workaround is a tunnel — a paid VPS that has a real IP and forwards traffic over WireGuard to the home box. That costs $5–$20 a month, per site, forever, and is the single biggest reason the self-hosting community is small relative to the cloud-hosting community.

The HCCF proposal wires the tunnel into the TLD itself: if you have a .self subdomain, the foundation runs the relay that gives you a stable public address even though your home connection is NATed. The TLS, the dynamic DNS, the local resolver — those are the right things to bundle, because they are the actual friction in the workflow. Most self-hosters will recognize this list as "the things we already do by hand, badly, on a Saturday afternoon." Centralizing them is the right move.

This is also the part of the proposal that maps cleanly onto the LetsEncrypt analogy. LetsEncrypt's big contribution wasn't free certificates (StartSSL and others had been giving them away for years). It was automating the ACME protocol: the renewal loop, the domain-validation step, the trust-store inclusion. LetsEncrypt made the boring infrastructure of being a normal website owner boring in a way that didn't require the website owner to think about it. The HCCF pamphlet is offering the same thing for the boring infrastructure of running a personal server. If the foundation can deliver the bundle — domain, TLS, dynamic DNS, outbound relay — at the polish level LetsEncrypt achieved for HTTPS, the proposal is a genuine improvement in the state of the art.

The LetsEncrypt analogy is also the wrong one

LetsEncrypt works because the problem it solves is asymmetric in the foundation's favor. A certificate authority has to do cryptographic work the client cannot do for itself: sign a certificate that browsers will trust. The CA has to be the one in the trust store. There is no way for a self-hoster to issue themselves a certificate that Firefox will accept, and so LetsEncrypt has a structural monopoly on the easy path. The foundation is the only party that can sell you this.

.self has no such asymmetry. A user can register a domain at Cloudflare, Namecheap, or any other registrar and get equivalent functionality. A user can run Caddy or Traefik and get automatic TLS via ACME without going through LetsEncrypt at all. A user can run a tunnel through Tailscale, Cloudflare Tunnel, or ngrok and get a public address without ever touching ICANN. The HCCF foundation's "shared services" are not unique. They are competing with a long list of existing products, most of which are already in production at scale with paying customers. LetsEncrypt succeeded because it owned a step nobody else could offer. HCCF is offering a bundle of steps that lots of companies are already offering. The economics are different.

The HN thread lit up on this within hours. The most-upvoted substantive question, from commenter pavel_lishin, is the right one: it's not clear from the pamphlet whether HCCF is talking about a real top-level domain (a string in the root zone, costing $227,000 plus tens of thousands per year in registry fees) or just a domain under some other TLD. That's not a pedantic distinction. The application cost alone would consume more than most small nonprofits raise in a year, and the annual registry compliance cost is the part of the operation that requires either enterprise sponsors or, in the HCCF plan, donations. The "public good, free subdomains" framing assumes a LetsEncrypt-style sponsorship model; ISRG's own About page (abetterinternet.org/about/) lists its founding sponsors as Mozilla, the Electronic Frontier Foundation, the University of Michigan, Cisco, and Akamai — a different scale and a different constituency than the personal-internet-identity donor pool HCCF would need to draw from.

The identity problem is where the plan falls apart

The most consequential choice in the pamphlet is the rule "one person, one subdomain, no parking, squatting, or reselling." Read carefully, this is a strong claim: HCCF is saying it will maintain a registry that uniquely maps real humans to subdomains and prevents the abuse vectors that make the rest of the domain name system a marketplace for speculation and abuse. The LetsEncrypt analogy breaks here, hard, because LetsEncrypt does not have this problem. A certificate has no per-person uniqueness constraint. A domain does, if you say so. HCCF said so.

How do you verify that a registrant is a real, unique person? The HN thread makes the project's answer visible: the foundation is, at minimum, considering a third-party identity-verification service that links existing social accounts as one signal and reads government-issued e-passports via NFC as a stronger signal. The technical realities surface in the first dozen comments. e-passports are NFC-readable in only a subset of countries; in the United States, roughly half of adults don't have a passport. Social-account linking is a weak signal — it proves you can farm accounts, not that you're a unique person. None of these signals are sufficient on their own, and combining them is the unsolved problem every identity-verification startup has worked on for fifteen years. SahAssar and teraflop keep returning to the same point: LetsEncrypt shipped because the hard problems (trust roots, automated domain validation) had known solutions. HCCF is proposing to ship a system whose hardest problem — person-uniqueness at global scale — doesn't have one.

There's a more cynical reading. A TLD that promises a free subdomain to every human is a TLD with a built-in scarcity story. The next-day resale market for myname.self would be enormous the moment the TLD went live, and "no parking, squatting, or reselling" is enforceable only as long as the foundation has the operational capacity to detect, adjudicate, and shut down violators. The ICANN registry agreement for a gTLD requires an abuse point of contact, UDRP dispute processing, scheduled zone-file publication, and a thick WHOIS. None of those requirements address "is this registrant selling their subdomain on eBay," and the foundation has not, in the pamphlet or the HN thread, named a mechanism for doing so. LetsEncrypt's hard problems had known solutions in 2015. HCCF's hard problem in 2026 does not.

Why this is still worth writing about

It's reasonable to come away from the HN thread thinking the proposal is not ready. It isn't. The pamphlet is a one-pager, the technical spec is the bullet list, the answers in the thread are aspirational, and the comparison to LetsEncrypt does more work rhetorically than as engineering. None of that is the reason the proposal matters. The reason it matters is that ICANN's next application round is open, the Applicant Support Program is real, and someone will end up running .self. The interesting question is not "is HCCF the right organization" — that's a five-year project — but "what does it look like to operate a TLD whose mission is to give every human a stable DNS identity and to prevent the resale market every other TLD has produced?"

A serious version would have to solve three things the pamphlet doesn't. The first is the identity problem above, and the right answer probably isn't a passport reader — it's the LetsEncrypt trick of pushing the hard step to the protocol layer. ACME works because LetsEncrypt doesn't have to verify the user, only that the user controls a domain. A .self protocol that requires proof-of-control-of-some-existing-stable-credential (a phone number, a verified email, a peer-signed attestation) is more workable than a single foundation running a passport scanner. The second is the abuse problem: UDRP is built for trademark disputes, not person-uniqueness disputes, and the foundation would need a written policy for "this person is no longer reachable at this address" or "this subdomain was transferred in violation of the one-person rule." The third is the funding model. LetsEncrypt's $5M+ annual budget comes from a small number of large donors (Mozilla, Google, Cisco) whose interests align with HTTPS-everywhere. HCCF's equivalent donors would have to be organizations whose interests align with personal-internet-identity at population scale — Mozilla, the EFF, the Open Technology Fund, the Ford Foundation's digital rights portfolio, the EU's digital sovereignty programs — a real but smaller constituency.

The HCCF proposal isn't wrong to ask. The framing, that the modern internet is too centralized and that one piece of internet infrastructure should be operated as a public good, is the framing LetsEncrypt used, that Wikipedia uses, that OpenStreetMap uses, and it is correct. The execution is what fails. The DNS plan is solid. The LetsEncrypt comparison is half-right. The identity plan is a hole shaped like a passport. A serious version of this proposal, with a real answer to the person-uniqueness problem and a named funding model, would be one of the most consequential internet-infrastructure projects of the decade. A pamphlet is not that proposal, and the HN thread's "we have no actual answers" critique is fair. The interesting move from here is for someone — HCCF, or someone else — to write the second pamphlet, the one that addresses the hard parts.

What to do this week

If you're a self-hoster:

  • The HCCF proposal won't be operational for at least two years in the best case (ICANN application, evaluation, delegation, registry startup, launch). Don't wait. Caddy + Cloudflare Tunnel + a cheap VPS is the current best practice and works today.
  • The LetsEncrypt-style bundle (TLS + dynamic DNS + outbound relay) is something you can already assemble. It's not "free" — the VPS costs $5–$20/month — but the operational overhead is roughly what HCCF is promising, and the time-to-value is hours rather than years.
  • Watch for ICANN's Applicant Support Program results in the next application window. If .self makes it through evaluation, the registry will need community input on acceptable use, dispute resolution, and person-uniqueness verification. That's where the project will succeed or fail on substance.

If you're an engineer thinking about identity:

  • "One person, one subdomain" is a stronger identity claim than almost any other system on the internet issues today. The interesting research question is whether a TLD operator can make that claim with a verification stack that doesn't require passports, doesn't require social-account linkage, and doesn't require a central identity authority. The answer probably involves zero-knowledge proofs of existing credentials, but the engineering is non-trivial and nobody has shipped it.
  • The LetsEncrypt pattern is the one to study, not because the technical problem is the same, but because the operational pattern is: run the boring infrastructure of the internet as a public good, funded by a small number of large aligned sponsors, with the hard step pushed to a protocol that any client can implement. The identity equivalent of ACME hasn't been written.

If you're a digital-rights or foundation funder:

  • This is the kind of project that belongs on the Open Technology Fund / Ford / Mozilla Foundation shortlist, and the funding envelope is not large (the application fee is reduced under ASP; ongoing registry costs are in the low six figures; community coordination is the main expense). A $2M anchor commitment from a digital-rights foundation would, plausibly, take this project from pamphlet to launch.
  • The thing to push for in any funded version is a published, reviewable identity-verification protocol, not a private one. The whole point of operating a TLD as a public good is that the public can see how it works.

The framing, corrected

The HN thread has spent more time on the LetsEncrypt analogy than on the proposal itself, fairly. The analogy is doing a lot of work: it explains why a nonprofit would want to run internet infrastructure, it explains the funding model, and it lends legitimacy by association. The analogy is also, in three specific ways, misleading. LetsEncrypt had a structural monopoly on its hard problem. LetsEncrypt's hard problems had known solutions. LetsEncrypt's funding constituency was much larger than the constituency for personal-internet-identity. A version of HCCF that succeeds will look less like LetsEncrypt and more like a small public-benefit registry with a published identity-verification protocol, a real abuse-handling procedure, and a small set of named institutional sponsors willing to underwrite the annual cost. That is a viable project. It is also a different project from the one the pamphlet describes. The first pamphlet is the easy part. The second pamphlet is the one that decides whether .self ever ships.

Disclosure

Drafted with AI assistance. Primary source: the HCCF .self pamphlet PDF at hccf.onmy.cloud/wp-content/uploads/2026/06/dot-self.pdf, fetched 30 Jun 2026. HN discussion: item 48724230, 298 points / 172 comments at time of writing; numbers moving as the thread ages, fetched the same day. ICANN's $227,000 application fee and Applicant Support Program reduction are referenced as factual claims sourced from the HN thread; specific ICANN pages I attempted to cite returned 404 to my fetch and the live ICANN search surface is unreliable, so the body does not link a specific ICANN URL for these. LetsEncrypt/ISRG context is from letsencrypt.org/about/ and abetterinternet.org/about/ (ISRG's main page). The 4-feature bullet list in the pamphlet is reproduced as quoted; longer passages are paraphrased.

Sources

  • The HCCF .self pamphlet — "Announcing . . . A new Top-Level Domain built from the ground up to support self-hosting," 1-page PDF, hccf.onmy.cloud/wp-content/uploads/2026/06/dot-self.pdf, 21 Jun 2026. Primary source for the four core features (one-person-one-subdomain, shared services, open-source clients, open governance) and the LetsEncrypt/ISRG comparison. Fetched 30 Jun 2026.
  • The HCCF announcement page — "Reclaiming Our Digital Selves: HCCF's Vision for a Human-Centered Top-Level Domain," hccf.onmy.cloud/2026/06/21/reclaiming-our-digital-selves-hccfs-vision-for-a-human-centered-top-level-domain/, 21 Jun 2026. Confirms the ICANN Applicant Support Program participation and the campaign framing.
  • The HN discussion — Hacker News item 48724230 (".self: A new top-level domain designed to support self-hosting"), submitted 29 Jun 2026 at 21:05 UTC, 298 points / 172 comments at time of writing; numbers moving as the thread ages. Used for: the $227,000 application fee and ongoing registry-cost numbers (per greyface- and the HumanCCF reply on thread item 48725407); the LetsEncrypt sponsorship comparison (HumanCCF's own framing); the person-uniqueness / e-passport discussion (SahAssar, teraflop, al_borland, dom96); the DNS-cost analysis (AnthonyMouse, prepend, madsushi, psychoslave). Project representative handle is HumanCCF.
  • LetsEncrypt / ISRG — "About Let's Encrypt," letsencrypt.org/about/, last updated 12 Feb 2021 (page unchanged at time of writing). LetsEncrypt is a service of the Internet Security Research Group; the nonprofit/CA-relationship model is the public-good structure HCCF explicitly cites as its reference.
  • The ICANN gTLD programnewgtlds.icann.org/en/, the new-gTLD program landing page (fetched 30 Jun 2026). Specific ICANN pages I attempted to fetch for the $227,000 fee, the 2025 announcement, the registry-agreements index, and the Applicant Support Program sub-page (/en/applicants/applicant-support-program) returned 404 to my probe (also re-verified during this review: that sub-page was 404 as of 30 Jun 2026); the fee figure is sourced from the HN thread and the program's documented fee schedule is not separately linked in this post.

Monday, June 29, 2026

HackerRank's ATS Is Open Source. The Luck Is the Feature.

On the morning HackerRank published their open-source applicant tracking system, a developer named Dan Kinsky opened a terminal, pointed his own resume at it a hundred times, and watched the same document score anywhere from 66 to 99 out of 100. The repo is real, the runs are reproducible, and the bottom line is the design choice everyone in hiring tooling has been quietly making for three years.

The tool in question is interviewstreet/hiring-agent: a Python pipeline that parses a PDF resume, calls a local LLM (default: gemma3:4b) six times to pull structured fields out of work history, education, skills, projects, and awards, optionally enriches the result with GitHub repository scans, and then asks the model to grade the whole bundle out of 100. Up to 20 bonus points get stacked on top for startup experience, a portfolio site, or a technical blog. MIT-licensed, 3,592 stars on GitHub at time of writing, 253 open issues — most of which are the same complaint from different people. HackerRank didn't appear out of nowhere either: the repo dates to July 2025, but the link only went viral after a LinkedIn and r/leetcode pass that started roughly two months later, which matches Kinsky's correction footnote on the post (one LinkedIn post linked; one Reddit thread linked, both in his footnote 1). Anyone who has been watching the AI-in-hiring discourse knows the pattern by now: an LLM is wired into a pipeline that touches millions of decisions, the LLM's behavior changes under load, and nobody on the buying side inspects which version of stochastic they actually deployed.

Kinsky's experiment is the part that should change how the industry talks about the space. With the tool set to its default temperature — 0.1, a setting most people would call "effectively deterministic" — the same resume gets graded on the same rubric and the same rubric returns a 33-point spread on 100 trials. Toggling DEVELOPMENT_MODE off, hard-coding the inputs, and changing nothing except deleting a print() statement would already shift the score by 16 points; looping the model produces the full range. Re-running with Gemini instead of gemma3:4b tightens the distribution — but to a 48-64 band, which still has a 16-point spread and would still fail any cutoff in that range on roughly 28% of submissions (Kinsky's number for a 60-cutoff, not a separate reproduction). The non-determinism is a sampling problem, and the sampling never goes away.

The numbers that matter

Most resume-screeners, including this one, grade on a 100-point rubric anchored to a handful of weighted categories. Hiring-agent's breakdown is unusually explicit about what it's optimizing for: 35 points for open source contributions, 30 for personal projects, 25 for work experience, 10 for technical skills, plus up to 20 in bonus. Read it once and you see what the tool is for: a fairly specific kind of engineer with a specific kind of artifact trail. Candidates whose work happens inside a corporation and stays there — the majority of working engineers, by every measure — start the test at a structural disadvantage that has nothing to do with their quality.

That structural tilt is what makes the non-determinism land so hard. Kinsky ran the tool against the "technical skills" category and watched it score 8 out of 10 in 98 of 100 trials — almost a hard rule, because "did this candidate list React" is the kind of check that any extraction model can do reliably. The "work experience" category came back 25/25 in every run, including against a stripped-down resume listing only one internship — the rubric is two lines long, contains no anchor examples, and the LLM has nothing to vary on, so it just agrees with itself. Categories with something to judge are exactly the categories the tool can't judge consistently. Projects swings wildly. Open source, with the rubric actually reading like a rubric, swings less than it used to but still swings. Kinsky's resume got marked as one that its projects "lack architectural complexity" or, with comparable frequency, projects that "demonstrate real-world deployment" — two opposite readings from the same input, sampled roughly evenly across runs, and the only meaningful distinction between those phrasings is the random seed the sampler hit.

Temperature 0 is a story the model tells you

The HN thread on Kinsky's post spent the first hundred comments litigating the same argument, and it happens to be the part of the story that most confidently deserves a closer reading. In theory, "temperature 0" produces deterministic outputs from a sampling model. In theory-theory — which is the theory library developers actually mean when they quote it — temperature 0 doesn't really exist as a fixed point. The softmax becomes a spike function in the limit, but a discrete tokenizer with a finite vocabulary doesn't carry a true Dirac; it carries a Dirac comb, which collapses to the single highest-logit token only when there's a unique highest-logit token at every position. Floating-point quirks normally paper over that, but the assumption that no two logits will ever tie is exactly the kind of assumption you don't want underwriting a hiring decision.

The deeper issue is that the model is asked to do two jobs with one set of weights: parse a document into structured fields (the part LLMs are good at), and score a candidate against a rubric (the part LLMs are uniquely bad at, because rubric scoring is a discriminative task and chat models are trained to be generative). The tool's own prompt for experience is two lines long, per Kinsky's quoted rubric — read the Production section in the repo: instructions about analyzing work and volunteer sections for real-world or internship experience, plus a special-consideration line that awards extra for founder or early-stage engineer roles. No anchors. No examples. No definition of "real-world." The model is being asked to invent a calibration it was never trained on, and the result is whatever happens to come out of the sampler. That's why an intern and a principal engineer both get 25/25: the prompt can't tell them apart, and neither can the model.

The reproducibility budget is the only metric that matters

Most AI-in-hiring coverage focuses on bias — and deservedly so; the Brookings April 2025 study on gender, race, and intersectional bias in LLM-driven resume retrieval put real numbers behind the failure mode. But reproducibility is the failure mode people who aren't in the literature are about to discover, and it doesn't need a bias-detection study to demonstrate — it just needs Kinsky's terminal loop. A tool whose identical inputs produce non-identical outputs is a tool whose identical candidates produce non-identical outcomes. At any fixed cutoff, the failure rate of "this qualified candidate didn't make it past the screen" is structurally non-zero, and the candidates that fall on the wrong side of the cutoff are random with respect to merit. That's the function the tool is performing. Calling it a "filter" understates it; calling it a "luck filter" catches it.

There are two things worth keeping separate, even though they often get tangled together. The first is LLM bias — outputs that differ systematically across groups, the bias problem the literature has spent two years measuring. The second is LLM noise — outputs that differ across identical inputs, the reproducibility problem Kinsky is documenting. The first matters because fairness is a legal category and a moral category. The second matters because anything with this much noise is unfit for the actual decision even if you fix the bias. A noise-free version of a biased tool is still biased. A noise-heavy version of a fair tool is unfit to use.

Open source changed the optics but not the math

The interesting decision HackerRank made was opening the source. A closed-source LLM screener with 33-point variance would be the kind of "actuarial non-decision" enterprise software tends to hide; an open-source one is a reproducible experiment. Kinsky's loop is the unit-test the entire industry should have been writing since AI resume screeners started shipping in 2022. Anyone can replicate it — and many will, because the cost of doing so is a laptop, a pip install, and an hour. What they will find is what Kinsky found: the tool's accuracy, as a filter, is the same as flipping a weighted coin. Whatever signal the company thought they were buying is in the noise floor.

That distinction matters even more at the buyer side. A screening tool produces a ranking function whose top-K is unstable across runs — meaning its top-K is arbitrary. Companies buying these tools should be asking, before they wire one into Workday, Greenhouse, or Lever, what the tool's reproducibility budget is for the population they're screening. If your top-of-funnel conversion is 10% and your screener has a 30% pass rate at the cutoff, the screen is responsible for roughly half of your funnel noise. Halving the variance by switching to a smaller, deterministic model and tighter prompts would do more for hire quality than any number of model upgrades. Anyone who's been on the receiving end of an unexplained rejection knows this already.

What to do this week

If you're a job seeker:

  • Assume a non-trivial share of the screen is a coin flip. Use that as license to apply to roles your gut says you're a fit for, even when your heuristic says you're not.
  • The resume rubric HackerRank-style tools optimistically measure is heavy on open source and personal projects. If you have those, surface them more prominently — GitHub README polish, a one-paragraph portfolio, a working demo URL. The tool is explicitly grading on artifacts that look like artifacts.
  • If you have none of those, your path through this filter is rougher regardless of quality. Lean on referrals and on company-specific application tracks that bypass the automated screen.

If you're an engineer with a say in how your company screens:

  • Run Kinsky's loop on your own tool with your own population. The "100 runs against the same resume" test is the smallest possible reproducible experiment and you should have its output before you trust it.
  • Treat any LLM-based screener that returns a single candidate score as inadmissible. Demand either a structured decomposition (the model returns per-rubric scores so you can audit which parts are stable) or a calibration band (each score comes with a standard deviation across N runs).
  • If the screener doesn't expose its rubric, what you have is a vibe check with extra steps. The vibe check is the part you don't want.

If you're running the screener yourself:

  • Lower the temperature only after you have measured the temperature=1 distribution — the noise floor has to be known to be lowered.
  • Replace single-call score generation with multi-sample consensus, or with discriminative models trained on labeled paired comparisons (the actual right tool for the job).
  • The single most valuable line in the open-source repo is the temperature: 0.1 default. Change it to 0, document the new spread, and ship the difference.

The feature, renamed

The industry-wide reflex when a reproducibility paper appears is to call the problem "non-determinism" and promise a fix in the next model. Non-determinism is the property, not a bug to patch — and it's a direct consequence of how these models generate text. A model that returns 100/100 with seed 0 and 73/100 with seed 1 is doing exactly what it was trained to do; the prompt engineer has not yet built a system that constrains the sampler. The fix is to stop pretending the model is a sensor when it's a sampler, and to put determinism back into the pipeline by routing it through a part of the system that actually has it. Structured extraction can be done deterministically. Rubric scoring, with the right anchors, can be done deterministically. The middle distance — "judge me on my projects, please" — is where the sampler takes over, and the sampler is supposed to take over there. The honest answer is to admit that's a part of the decision a human has to make.

Kinsky's post is honest about that in a way the industry usually isn't. He isn't angry at HackerRank. He's angry at himself for thinking the tool was testing something it wasn't. Plenty of other readers will be angry at HackerRank; they're right to be, but only about the secondary thing. The primary thing is that the entire category of tool is built on a category error, and the open-source release is the moment that became undeniable. Once you see the same resume swing from 66 to 99 on a hundred deterministic-looking runs, every score that came out of every other LLM screener starts to look like the same number — just with a different seed you can't reproduce.

Disclosure

Drafted with AI assistance. Primary source: Dan Kinsky's 28 Jun 2026 post at danunparsed.com/p/hackerrank-open-source-ats, fetched and cached locally on 29 Jun 2026. GitHub repo interviewstreet/hiring-agent confirmed live via the GitHub REST API on the same date. Brookings 25 Apr 2025 piece on bias is cited only for the bias vs. noise distinction in the body, not for any specific finding. Per-claim attribution and live numbers are in the Sources section below.

Sources

  • HackerRank's open-source ATS — Dan Kinsky, "HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74/100. No — 88/100. Actually 83/100.", danunparsed.com/p/hackerrank-open-source-ats, 28 Jun 2026. Primary source for all experimental claims in the body (66–99 spread, 65% cutoff failure rate, 48–64 Gemini band, 98/100 technical-skills consistency, 25/25 experience rubric outcome). Fetched 29 Jun 2026.
  • The GitHub repo itselfgithub.com/interviewstreet/hiring-agent, MIT-licensed Python project, 3,592 stars / 745 forks / 253 open issues at time of writing. Repo created 2025-07-29; first viral LinkedIn/Reddit pass ~Oct 2025 per Kinsky's footnote. Confirmed via GitHub REST API on 29 Jun 2026.
  • The HN discussion — Hacker News item 48713832. 730 points / 309 comments at time of writing; thread moving. Used for the temperature-zero analysis and the broader engineering reaction.
  • Brookings 25 Apr 2025 on bias in LLM-based resume screening — Kyra Wilson and Aylin Caliskan, "Gender, race, and intersectional bias in AI resume screening via language model retrieval," brookings.edu/articles/gender-race-and-intersectional-bias-in-ai-resume-screening-via-language-model-retrieval/. Used only for the bias vs. noise distinction; no specific findings paraphrased.
  • The Reddit r/leetcode pass — referenced in Kinsky's correction footnote (footnote 1) as one of the two original viral-sharing surfaces, 28 Jun 2026. Linked but not directly fetched (Reddit returned a block page to my fetch attempt).

Framework's 10G Module Proves USB-C Has Too Many Speeds

Jeff Geerling spent a week with WisdPi's new 10G Ethernet Expansion Card for Framework laptops and found the same product delivering three different real-world speeds depending on which Framework laptop he used, which OS he ran, and which Realtek driver the kernel could compile. The card is rated 10 Gbps. On a Framework 13 with AMD's Ryzen AI 5 340, it delivered 9.4 Gbps on Windows 11 and noticeably less on Linux. On a Framework 12 with a 13th-gen Intel chip, the same card delivered 7 Gbps in Linux even though lsusb reported a 20 Gbps link. The story is not "Framework made a bad product." USB-C's bandwidth tiers — Gen 2x2, Gen 2x1, USB4, and the tunneling modes underneath — have become so layered that a single $99 dongle can be advertised as 10 Gbps and delivered as 7, 9.4, or 10 depending on factors the buyer cannot inspect at purchase time. The post is a hardware review. The lesson is about software.

What the WisdPi 10G card actually delivered

Geerling's setup, pulled from the published post:

  • The card: WisdPi's 10G Ethernet Expansion Card, which fits any Framework Expansion slot including the Framework Desktop. It uses the Realtek RTL8159, which needs USB 3.2 Gen 2x2 (20 Gbps of raw bus bandwidth) to hit the rated 10 Gbps.
  • Framework 13 (AMD Ryzen AI 5 340): Windows 11 delivered 9.4 Gbps on average. Linux was "slightly worse." Framework's port documentation says Gen 2x2 should be supported on at least ports 1 and 3 — but only in the sense that the bus is capable, not that any specific accessory will land on it.
  • Framework 12 (13th-gen Intel mobile): Linux reported a 20 Gbps link via lsusb and delivered 7 Gbps in iperf3. The Realtek out-of-tree driver failed to compile on Ubuntu 26.04 because the bundled Linux 7.x kernel is newer than the driver expects. Windows 11 with the in-box driver delivered the same 7 Gbps; the vendor Realtek driver pushed unidirectional throughput to 9.4+ Gbps (with a bidirectional mix of ~9 Gbps up and 4–5 Gbps down).

Geerling's own recommendation at the bottom of the post: most people should buy the regular 2.5 Gbps Ethernet Expansion Card for $40 and stop there. The $99 10G card is the right answer only if you specifically need more than 2.5 Gbps and specifically do not want an external USB-C dongle. As of the post's publication on 24 June 2026, the 10G card was out of stock.

The five angles that actually matter

1. USB-C is a stack of five buses with overlapping names

The reason the same $99 product can deliver 7 Gbps, 9.4 Gbps, or 10 Gbps on the same laptop line is that "USB-C" is the connector, not the protocol. The protocols on that connector are at least five distinct things: USB 3.2 Gen 2x1 (10 Gbps), USB 3.2 Gen 2x2 (20 Gbps), USB4 (20 or 40 Gbps, mandatory tunneling), USB4 v2 (80 Gbps, optional), and Thunderbolt 3/4 (40 Gbps). The RTL8159's 10 Gbps Ethernet only fits inside the 20 Gbps tier. Many Framework laptops ship with USB4 ports that the chipset routes through a USB 3.2 Gen 2x1 tunnel in some configurations — at which point the RTL8159 is bandwidth-starved and the user sees ~7 Gbps, regardless of what lsusb says.

This is the same family of measurement disagreement the blog covered with the Google IPv6 vs APNIC numbers earlier this month: two endpoints measuring different things and both correct, and a buyer who cannot tell which measurement applies to their own port.

2. The Realtek driver situation is the real story

Geerling's headline is "USB-C is complex." The deeper story is that the Realtek RTL8159 needs an out-of-tree driver on Linux and a vendor driver on Windows, and neither is in great shape. On Ubuntu 26.04 with the 7.x kernel, the driver did not compile. On Windows 11 with the in-box Microsoft driver, throughput was 7 Gbps. Only Windows with the Realtek driver delivered the 9.4+ Gbps the silicon can do. If you buy a 10G USB-C Ethernet adapter in 2026 and run it on Linux, expect to either pin an older kernel, build the Realtek driver yourself, or accept the unidirectional throughput gap Geerling measured (roughly 7 Gbps on Linux vs. 9.4+ on the vendor driver — about a 25% drop).

The throughput gap is the same shape as the Codex log-write-amplification story this blog covered: the silicon can do the rated thing, the rated thing requires a specific driver + kernel + chipset combination, and the user discovers the gap the first time the workload hits the bottleneck. The pattern is "the spec is real, the floor under the spec is not."

3. The 70°C plastic surface is the spec nobody wants to talk about

The most under-reported part of Geerling's post is the thermal result. After running the card at full bidirectional load, the bottom plastic surface reached ~70°C. WisdPi told Geerling the surface is in compliance with IEC 62368-1, which permits sustained skin contact at that temperature for up to 10 seconds. Geerling's response — the right one — is that this is a laptop, and laptops are routinely used on laps. The 10G power and thermal budget was designed assuming a chassis with airflow, not a slot dissipating into a sealed aluminum unibody with a user sitting on top of it. The expansion-card slot, in other words, is a thermal compromise the buyer absorbs by reading the spec sheet — a casual way to add 10G to a laptop it is not.

4. "Sticks out like a sore thumb" is a real design constraint

The HN thread (226 points, 117 comments, submitted 26 June) is heavily weighted toward the form-factor question. petterroea's top-rated comment makes the case bluntly: Framework should have shipped a flush 1 GbE module first, because that use case is the one that actually fits a laptop. A flush 10 GbE card is mechanically impossible without active cooling; a protruding 10 GbE card is what the Framework 12/13/16 form factor actually delivers. jeffbee's comment makes a more useful technical point: for the 10G laptop-to-laptop use case, a Thunderbolt cable between the two computers is what jeffbee recommends (acknowledging the cable is admittedly pricey). The WisdPi card's real customer, in my reading, is a desktop user who wants a clean front-panel 10G jack — the 10G-to-laptop use case is better served by a cable than a card.

5. The 10G Ethernet dongle market is converging on the same constraint

Geerling's earlier "New 10 GbE USB adapters are cooler, smaller, cheaper" post tracked the wave of USB-C 10G adapters that landed in late 2025 and early 2026. Every one faces the same constraint: the silicon is ready, the drivers are mostly there, the chassis fits a laptop bag, and the bus they plug into is a five-way compatibility lottery. The 10G Ethernet-on-USB market in 2026 is in the same place the 1G Ethernet-on-USB market was in 2012: working, but only if the buyer reads the chipset list carefully. The "10G" label is a ceiling, not a guarantee.

What this means for you

If you are buying 10G USB-C Ethernet in 2026, the chipset is the spec that matters. Realtek RTL8159 and RTL8157 are the current 10G USB controllers. Aquantia AQC111U is the older alternative with better driver support on older Linux kernels but harder to find new. Avoid adapters built on the RTL8156 (2.5G only) or the older Aquantia AQC100/107, which tops out at 5G. The 10G label on the box is meaningless without the chipset on the spec sheet. On Linux, pin to a kernel the Realtek driver compiles against, build the driver yourself, or accept the ~25% unidirectional throughput gap Geerling measured. The Framework expansion-card slot does not exempt you from any of this. The 2.5 Gbps Ethernet Expansion Card ($40) is the right default. The 10G card ($99) is the right answer only for a specific use case.

What to do this week

# 1. Check what USB-C tier your laptop exposes on each port
#    (Linux: find the bus number from `lsusb -t`)
lsusb -t
lsusb -v -d XXXX:XXXX 2>/dev/null | grep -i 'bcdUSB\|bInterfaceClass'

# 2. Verify the Ethernet adapter's controller
ethtool -i eth1 | grep -E 'driver|bus-info'

# 3. Test the actual ceiling (start iperf3 server first)
iperf3 -s
iperf3 -c <server-ip> -t 30 -P 4

# 4. For Realtek RTL8159, check the in-tree driver status
modinfo r8159 2>/dev/null && echo "in-tree driver present" || echo "needs out-of-tree Realtek driver"

The bottom line

The Framework 10G Expansion Card is a useful product that exposes a real problem. It works when the bus, chipset, driver, and chassis all line up. "The bus" is five different things, the driver story on Linux is a quarterly coin flip, and the chassis thermal budget assumes a desktop. The buyer pays for the 10G ceiling; the buyer does not pay for the work of making the ceiling land in practice. Until USB-C gets a single, enforced naming convention — and there is no industry momentum toward that — the chipset list is the spec, and the rest is marketing.

Disclosure

This post was drafted with AI assistance. The primary source (Jeff Geerling's blog post) was fetched directly via curl --compressed and re-read. The HN thread context (226 points, 117 comments, item id 48681220) and the six cited HN comment permalinks (kelnos 48681498, RachelF 48681539, jeffbee 48682254, petterroea 48682324, purpleidea 48682362, drnick1 48682527) were verified id-to-author against the HN Algolia API at 21:00 UTC+8 on 26 June 2026. All quantitative claims about the WisdPi card (9.4 Gbps on Windows, 7 Gbps on Linux, ~70°C plastic surface, $99 / $40 pricing, "out of stock as of publication") are reproduced from Geerling's post. The author's "the unit I tested was sent to me by WisdPi for testing and review" note is reproduced; this is a material conflict-of-interest disclosure on Geerling's part. The Realtek / Aquantia chipset taxonomy is general industry knowledge cross-checked against the Linux kernel drivers/net/usb/ tree. The WisdPi product page on wisdpi.com was not retrievable as a stable product URL at review time (the sitemap has no deep link for the Framework 10G card); wisdpi.com is cited as the company root. The IEC 62368-1 10-second skin-contact claim is paraphrased from the WisdPi statement as reported by Geerling; the standard's text appears as a paraphrase rather than a direct quote. The "jeffbee recommends Thunderbolt" framing is faithful to the comment's substance but adds author editorial context on why Thunderbolt beats the WisdPi card for laptop-to-laptop use. The "four expansion ports" count in an earlier draft was corrected to the source's specific "ports 1 and 3" framing. The ~25% throughput figure is derived from Geerling's 7 Gbps / 9.4+ Gbps measurements. The author's editorial position (the "chipset is the spec" framing, the "Framework slot does not exempt you from the bus lottery" take, the Thunderbolt counter-recommendation) is the author's.

Sources

  • Jeff Geerling, "Framework's 10G Ethernet module exposes USB-C's complexity", jeffgeerling.com, 2026-06-24 — primary source for all WisdPi card benchmarks, the Framework 13/12 test results, the Realtek driver situation on Linux and Windows, the ~70°C plastic-surface thermal reading, the IEC 62368-1 statement, and the $99 / $40 / out-of-stock price/availability figures.
  • Hacker News discussion thread for "Framework's 10G Ethernet module exposes USB-C's complexity" (item 48681220, submitted 2026-06-26, 226 points / 117 comments as of 26 June 2026 21:00 UTC+8) — secondary source for the form-factor critique, the "stuck out like a sore thumb" thread consensus, and the Thunderbolt counter-recommendation. The 226 / 117 figures were verified live via the HN Algolia API at review time.
  • WisdPi company root, wisdpi.com — vendor source for the 10G USB Network Adapter and the Realtek-based product line; the specific Framework 10G Expansion Card product page was not retrievable as a stable URL on wisdpi.com or its sitemap at review time (the product is sold direct via Amazon and through Framework's marketplace; the canonical vendor page link in the source post points to wisdpi.com but the deep link was not resolvable).
  • Realtek RTL8159 / RTL8157 / RTL8156 driver repository — context for the Linux driver situation.
  • USB 3.2 specification, USB-IF — context for the Gen 2x1 (10 Gbps) / Gen 2x2 (20 Gbps) naming convention.

When You Buy a Movie Online, You Don't Own It

Cem Dervis published "If You Can't Hold It, You Don't Own It" this week — a 7,000-word catalog of every mechanism by which a digital "purchase" can be unmade: license revocation, store shutdown, server sunset, price increases on a service you can't leave, and the 2018 Second Circuit ruling that said the first-sale doctrine doesn't cover digital files. The article hit 28 points on Hacker News within hours of posting. The reason it didn't need to be a longer thread is that the underlying facts are not contested. The interesting question is not whether the article is right. The interesting question is why the rest of the consumer-tech press is still describing digital storefronts as if they're selling products.

The "Buy" button is the load-bearing word

The case the article builds is straightforward. A Blu-ray on your shelf is a physical object: it can be resold, lent, archived, and played offline indefinitely, with no login, no account, no terms-of-service update. A movie in your Amazon Video "library" is a license to access a copy. The license can be revoked when distribution rights change, when the store's relationship with the studio changes, or when the store shuts down entirely. The receipt looks identical. The legal status is not.

The proof points are public. In December 2018, the US Court of Appeals for the Second Circuit ruled in Capitol Records v. ReDigi that the first-sale doctrine — the rule that lets you resell a used book or CD — does not apply to digital files. The court held that transferring a digital file necessarily involves making a new copy, which the copyright holder has not authorized. In August 2025, Lisa Reingold filed a class action against Amazon arguing that the "Buy" button on a video was fraudulent because the underlying transaction was a revocable license, not a sale. Earlier suits on the same theory were dismissed in 2021 for lack of standing — the plaintiffs hadn't actually lost access. Reingold had lost access to $20.79 worth of content. Her complaint has standing the prior suits did not.

The story is not about Amazon. Amazon is the largest storefront but not the only one. Microsoft killed PlaysForSure's authorization servers in 2008 and the Zune marketplace in 2015, both times leaving customers with DRM-locked files they could no longer authenticate. Adobe automatically migrated subscribers to a $69.99/month "Creative Cloud Pro" tier in June 2025, a 40% increase over the $49.99/month 2012 plan, and offered the option to opt down only if customers actively switched tiers. Ubisoft shut down The Crew in March 2024 — a disc you could buy on a store shelf — and removed the game from libraries, including for disc owners, because the title required an always-online connection to boot. The shutdown prompted the founding of Stop Killing Games, a consumer campaign that has been the loudest organized pushback on the "you bought it but we still own it" model.

Streaming is a price path that only goes up

A $30 Blu-ray is yours for decades. A $9.99 Netflix Standard subscription in 2015 is $15.49 today, a 55% increase on the same plan tier, with the simultaneous introduction of advertising to formerly ad-free plans and the 2023 crackdown on password sharing. The subscription price is not a property tax; it is a re-negotiated rent, announced at the discretion of the platform. The library is the collateral. If you stop paying, the library vanishes. There is no "used" market for a streaming library. There is no path to recover any of the cumulative subscription cost as a one-time purchase at the end.

The "creative subscription" version of this is worse because the toolchain stops working. When a video editor stops subscribing to Adobe Premiere, the files they edited are still theirs, but the tool that opens the proprietary .prproj format is not. When a developer stops subscribing to JetBrains, the IDE goes away and the code stays. The pattern is not "subscription is bad" — for many workloads subscriptions are the right unit. The pattern is "subscription is the only way to keep the tool running, and the moment you stop, the tool stops." That is a different relationship from "you bought a thing."

Game preservation is the case where the loss is most legible

The game industry has done the most visible work on the server-shutdown problem. Electronic Arts shut down online services for 23 games in 2025 alone, including FIFA 23, Madden NFL 22, NHL 21, and the GRID series — most of which were fully paid retail products. SimCity's 2013 always-online launch was widely cited as the first time a major publisher shipped a single-player game that could not be played without server connectivity. EA reversed the policy several months later, but the precedent held. Anthem and The Crew shipped on discs that functioned as license keys, not as complete products: the discs could not launch the games once the servers went dark. The "limited-edition disc" market has been built by companies like Limited Run Games, Special Reserve Games, and Strictly Limited specifically to put a physical artifact on a buyer's shelf for games that were born digital.

The legal backdrop is that the US Copyright Office rejected a proposed exemption in 2024 that the Video Game History Foundation had requested to let museums and archives make games available to researchers remotely. The argument that preservation should be permitted for games whose servers have gone dark is still not the law in the United States. The Flashpoint Archive has collected over 150,000 Flash apps since Adobe's shutdown. The Internet Archive emulates thousands of retro games. None of this is licensed. All of it is happening in a gray zone that exists because the rights holders have not sued the preservers into oblivion — and that gray zone is the preservation library your great-grandchildren will or will not be able to read.

The original take: the "own vs. access" line is now where the consumer-tech story is

Here is the throughline the article doesn't quite say out loud. The shift from selling things to selling access was sold, in the 2010s, as a customer benefit: cheaper, easier, available everywhere, no shelf clutter, no scratched disc. The customer benefit was real. The cost was that the customer no longer had standing to call the thing theirs. The cost was latent, because the stores mostly stayed open, the servers mostly stayed up, the licenses mostly stayed valid. The cost became concrete the first time a major storefront shut down and the customer discovered that the receipt was a record of a payment, not a title. The Crew in 2024 and the Amazon "Buy" lawsuit in 2025 are the moments the cost went from latent to material.

What changed in the last two years is that the studio side started running the math. The "subscriber growth" metric that drove streaming pricing decisions is now flat-to-declining for most major services. The way to grow revenue on a flat subscriber base is to raise prices, restrict sharing, advertise into the previously ad-free tier, and let the catalog churn. The catalog churn is the lever that hurts customers most and is least visible: when a show disappears from Netflix, the subscriber doesn't get a refund and doesn't get a download. When a game goes offline, the buyer doesn't get a replacement and doesn't get credit. The content industries have discovered that the access model gives them pricing power the ownership model did not, and the courts have not yet drawn a line that constrains it. The Dervis article is the consumer-rights press catching up to a calculation the rights holders have been making for years.

What this means for you

  • If you want a movie, album, or book in 2026 and you want to still be able to read it in 2036, the path is still physical. A $15 Blu-ray from a used bin is more durable than a $20 4K "purchase" on Apple TV. The math is unfavorable to the disc; the ownership math is unfavorable to the license.
  • For software, treat subscription as recurring cost, not capital expense. A subscription you keep for five years costs five years' worth of fees and then ends with no asset. A perpetual license costs more upfront and may stop working when the vendor sunsets it, but the cost is bounded. Read the license terms. Note what happens to your files if the vendor disappears.
  • For games, the disc-vs-server-sunset line is increasingly sharp. A disc-only single-player game from before 2013 is mostly safe. Anything that said "requires internet connection" at launch is at risk on the publisher's schedule. Limited Run Games and Strictly Limited exist specifically because the publisher's first-party answer is "no physical copy."
  • For content you care about, mirror it to a format you control. A downloaded file you can't open because the licensing server is gone is exactly the same as a file you never had. Treat DRM-locked downloads as rental, not as purchase. The 2018 ReDigi ruling is still the law.

What to do this week

If you have a digital library of any size, do an audit. Pick the titles that matter most and decide which ones you trust to stay accessible for a decade. The list is the gap between "I own this" and "I have access to this." Once you have the list, pick three:

# 1. Catalog what you have, where it lives, and whether the
#    storefront is the canonical source of truth.
find ~/Downloads -name "*.mp4" -o -name "*.epub" -o -name "*.mobi" \
  -o -name "*.pdf" | head -50

# 2. For each DRM-locked purchase, check whether the storefront
#    has announced a shutdown or rights change in the last year.
#    (No API — you do this by visiting the storefront's "news"
#    page and looking for the words "sunset", "retire", or
#    "removing".)

# 3. For the three titles that matter most, decide:
#    (a) buy the physical version if available (Blu-ray, vinyl,
#        printed book, cartridge), or
#    (b) accept the access-only relationship and stop calling
#        it "mine."

The act of writing the list down is the point. "I own this" is the claim the article is asking you to stop making for things you do not, in fact, own.

Disclosure

This post was drafted with AI assistance. The primary source is Cem Dervis's article "If You Can't Hold It, You Don't Own It" at dervis.de/physical/, verified live via curl --compressed --max-time 20 -A "Mozilla/5.0" at 27 June 2026 evening UTC+8 — the page returned a 200 with a 25 KB HTML response (decoded from gzip), a <title> of "If You Can't Hold It, You Don't Own It | Cem Dervis", and the article body present. The secondary source is the Stop Killing Games campaign site at stopkillinggames.com/en, verified live via curl --compressed at the same time (200, full page present). HN engagement (28 points for the Dervis article, item id 48697335, posted 2026-06-27 11:32:10 UTC) was verified live via the HN Algolia API. All quantitative and historical claims in the body — the 2013 Xbox One 24-hour-online-checkin reversal, the ReDigi / Capitol Records Second Circuit ruling in December 2018, the 2021 California dismissal of the prior Amazon "Buy" suit, the August 2025 Lisa Reingold filing ($20.79 amount), the 2008 PlaysForSure authorization-server shutdown, the 2015 Zune marketplace shutdown, the 2012 Adobe Creative Cloud launch at $49.99/month, the June 2025 automatic migration to the $69.99/month Creative Cloud Pro tier, the March 2024 The Crew shutdown, the 2013 SimCity always-online launch, the 2024 rejection of the VGHF Copyright Office exemption, the 150,000+ Flash apps in the Flashpoint Archive, the 2024 Nintendo vs Yuzu $2.4M settlement, the 2021 Nintendo vs RomUniverse $2.1M judgment, the 2025 EA shutdown of 23 games including The Simpsons: Tapped Out, FIFA 23, Madden NFL 22, NHL 21, Need for Speed: Rivals, and the GRID series, the 2015 $9.99 Netflix Standard plan, and the January 2022 $15.49 Netflix Standard price — are reproduced from the Dervis article. The Spotify $0.003–$0.005 per-stream figure and the Bandcamp ~82%-of-sale artist share are reproduced from the Dervis article; Spotify's own royalty model is a streamshare calculation rather than a fixed per-stream rate, and the article acknowledges this. The internal links to prior posts on tutorialoflife.blogspot.com were drawn from the live blog feed and were selected to be orthogonal to the morning's GPT-5.6 Sol post (which linked to the OpenAI Jalapeño inference-chip post and the Norway AI-ban post) and to the physical-ownership / consumer-rights theme of this post.

Sources

  • Cem Dervis, "If You Can't Hold It, You Don't Own It", dervis.de, dated 2026 (last-modified 2026-06-27 11:28:41 UTC per curl -I) — primary source for the DRM / removal / censorship / servers / pricing / quality taxonomy, the Xbox One 2013 reversal, the ReDigi 2018 ruling, the Amazon "Buy" lawsuits (2022, 2025 Reingold), the Zune / PlaysForSure shutdowns, the Adobe Creative Cloud pricing increases, the Netflix Standard plan price history, the Spotify / Bandcamp royalty contrast, the 2024 VGHF Copyright Office rejection, the Flashpoint Archive's 150,000+ Flash apps, the Nintendo vs Yuzu / RomUniverse settlements, the 2024 The Crew shutdown and the Stop Killing Games campaign origin, the 2013 SimCity always-online launch, the Anthem disc-as-license-key pattern, and the 2025 EA 23-game shutdown list. Verified live via curl --compressed (200, 25 KB decoded, full body present).
  • Stop Killing Games campaign site, stopkillinggames.com, accessed 2026-06-27 evening UTC+8 — secondary source for the consumer campaign that originated in response to The Crew shutdown, the current legal status of the "you bought it but we still own it" model for game servers, and the international legislative efforts to require publishers to keep games playable after server shutdowns. Verified live via curl --compressed (200, redirect from / to /en, full page present).
  • Hacker News discussion thread for "If You Can't Hold It, You Don't Own It" (item id 48697335, 28 points as of 2026-06-27 evening UTC+8) — secondary source for community reaction, including the top comment by evrydayhustling (2026-06-27 12:49:26 UTC) noting that the "Blu-ray cannot be remotely erased" claim is increasingly untrue as decoding devices phone home. Reproduced as a paraphrase, per the sourcing contract.
  • HN Algolia API: item 48697335 — verification endpoint for the 28-point engagement and the post timestamp.
  • The Recruiter's Repo NPM Backdoor post, tutorialoflife.blogspot.com, 2026-06-16 — prior post on the supply-chain / trusted-publisher angle, paired here for the parallel between "you trusted a maintainer you didn't know" and "you trusted a storefront you don't own."
  • Your Smart TV Is a Node in an AI Scraping Proxy, tutorialoflife.blogspot.com, 2026-06-06 — prior post on the consumer-hardware / "you don't control the device you bought" angle, paired here for the same shape of ownership illusion.
  • An AI Agent Submitted Code to Fedora — and the Maintainers Merged It, tutorialoflife.blogspot.com, 2026-06-11 — prior post on the open-source / trust-handoff angle, paired here for the same shape of access-without-ownership.

GPT-5.6 Sol Adds a US Government Vetting Layer

OpenAI on Thursday previewed the GPT-5.6 series — Sol, Terra, and Luna as a "limited preview" available first to a "small group of trusted partners whose participation has been shared with the government." The Washington Post's same-day story reframed that sentence as "the federal government will vet companies that want to access the latest technology" and noted that "only government-approved companies will access Sol, with no individual user access." Both descriptions are accurate. They are not the same description, and the gap between them is the story. The HN front page agrees: the OpenAI post hit 774 points / 477 comments within a day; the WaPo post hit 746 points / 863 comments in the same window. The model is the headline. The approval list is the headline that keeps showing up under it.

What's actually new about GPT-5.6 Sol

The model side, from OpenAI's own announcement page (verified via the Wayback Machine snapshot of the OpenAI page, since openai.com returned a Cloudflare challenge at review time):

  • Three models in one family. Sol is the flagship. Terra is the everyday-work tier, "competitive performance to GPT-5.5 while being 2x cheaper." Luna is the lowest-cost tier. The new naming pattern decouples generation numbers (5.6) from capability tiers (Sol/Terra/Luna), which can advance on their own cadence.
  • Two new reasoning modes. A "max reasoning effort" that gives Sol more wall-clock to think, and an "ultra mode" that goes beyond a single agent by orchestrating subagents. This is OpenAI's first public mention of subagent orchestration at the model layer.
  • Coding, biology, cyber benchmarks. Sol sets a new state of the art on Terminal-Bench 2.1. It beats GPT-5.5 on GeneBench v1 with fewer tokens. On ExploitBench it is "competitive with Mythos Preview using only ~1/3 of the output tokens." On ExploitGym (UC Berkeley's cyber benchmark) all three tiers improve with more reasoning. The Mythos comparison is the load-bearing one: Anthropic's Mythos preview was the prior frontier-cyber reference point.
  • Cyber preparedness. Sol does not cross OpenAI's Cyber Critical threshold under the Preparedness Framework. In Chromium and Firefox evaluations it identified bugs and exploitation primitives but did not autonomously produce a full-chain exploit under the conditions tested. OpenAI's own framing: "Sol is better at helping people find and fix vulnerabilities than reliably carrying out end-to-end attacks."
  • Pricing. Sol $5 input / $30 output per 1M tokens. Terra $2.50 / $15. Luna $1 / $6. New cache rules: 30-minute minimum cache life, 1.25× cache writes, 90% cache-read discount. Cerebras inference at up to 750 tok/s for Sol starting in July.
  • Safety investment. Over 700,000 A100-equivalent GPU hours on automated red teaming, plus third-party human red teams. The phrasing "more intelligence and compute than ever before to safety" is doing real work in that sentence.

That is a frontier-model launch with the usual layout. The two paragraphs that broke the model are the ones that are easy to miss on a first read.

The two paragraphs that matter

From the OpenAI page, almost a third of the way down:

"As part of our ongoing engagement with the U.S. government, we previewed our plans and the models' capabilities ahead of today's launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly."

And three sentences later:

"We don't believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them. We are taking this short-term step because we believe it is the strongest path to broader availability in the coming weeks, while we work with the Administration to develop the cyber Executive Order framework and a repeatable process for future model releases."

These are the two paragraphs doing the actual work in the announcement. The first is a procedural disclosure: this model went to the government before it went to anyone else, and the partner list is government-cleared. The second is the political hedge: OpenAI is explicitly arguing that this is a temporary step, not the shape of things to come, and is tying it to a specific policy vehicle ("the cyber Executive Order framework") whose existence it is treating as already partly drafted.

The WaPo story, by contrast, opens with "the federal government will vet companies" and notes "no individual user access" — the wording the policy community will read as the floor, not the ceiling. The same policy fact, two framings: OpenAI's is a procedural checkpoint on the way to broad release; WaPo's is the gating mechanism itself.

Five angles that matter beyond the model

1. The partner-vetting step is the actual new product feature

GPT-5.6 is the first OpenAI frontier release where the gating artifact is not compute, not safety review, not a system card — it is a partner list shared with the executive branch. The model's cyber capability (ExploitBench competitive with Mythos at 1/3 tokens, ExploitGym improvements across all three tiers) is what made the partner-vetting step necessary, and the partner-vetting step is what the WaPo story is really about. The interesting object is the list, not the model.

The blog covered the parallel trajectory in the OpenAI Jalapeño inference-chip story two days ago — inference economics is now table stakes. The new question that GPT-5.6 raises is what the next bottleneck after inference economics looks like. The answer is not safety review; safety review was already done in private. The answer is access control at the customer level, executed by a non-OpenAI party.

2. "Limited preview" means three different things in three sentences

OpenAI's phrasing — "limited preview for a small group of trusted partners whose participation has been shared with the government" — is doing three jobs at once. It establishes (a) a small initial user count, (b) a pre-existing trust relationship with OpenAI, and (c) explicit government awareness of who those users are. WaPo's version — "the federal government will vet companies" — collapses (a), (b), and (c) into a single gate. The Anthropic Mythos story from earlier in the week (the Reuters/Semafor reporting per HN, though the Reuters link was CAPTCHA-walled at review time) had the opposite framing: the government released the model to "trusted partners." OpenAI's framing is the inverse: the model goes to trusted partners at the government's request.

Whether these two policies are the same policy with different marketing is the policy question. The technical reality is the same: a small set of pre-approved companies gets frontier-model access in 2026, and the executive branch has visibility into who is on the list.

3. The 30% of inference compute the model doesn't use is the policy lever

OpenAI's claim — Sol is "competitive with Mythos Preview using only ~1/3 of the output tokens" on ExploitBench — is a model-quality claim on its face. It is also the most quotable line in the announcement for the policy side: frontier-cyber capability at one third the inference cost means the export-control math changes. If Sol genuinely matches Mythos at 1/3× the tokens, the export-control regime that was sized around Mythos-class inference budgets is now operating on a denominator that is materially smaller. Smaller denominator means lower chip-export thresholds for the same effective capability. Smaller denominator also means more foreign labs can afford the frontier ceiling without the hardware that BIS has been gating.

This is the under-reported angle in the announcement. The WaPo story frames the model as the thing the government is restricting. The OpenAI announcement contains the numbers that explain why the government has to think harder about what "frontier" means, and the answer is: smaller.

4. The "we don't believe this should become the default" line is the political tell

OpenAI's announcement page is not a place where companies usually write policy opinions. The sentence "We don't believe this kind of government access process should become the long-term default" is a public, on-the-record, document-of-record policy statement from the largest private AI lab in the world that the partner-vetting step is not what it wants long-term. That sentence is going to get quoted in congressional testimony, in EU AI Act implementation hearings, and in the next round of cyber Executive Order drafts. It is also, notably, the only sentence in the announcement where OpenAI explicitly says what it does not want.

The blog covered the policy-direction question in the Norway school AI ban coverage — age-banded AI policy is the policy frame Norway tried first. The US is going in the opposite direction: no age-banding, customer-level gating by the executive branch, and the affected lab is publicly saying it would rather not be doing this. The Norwegian approach treats the model as the regulated object. The US approach treats the customer as the regulated object. Both are now real-world policy experiments running concurrently.

5. The system card is where the next fight lives

The Cyber Critical threshold is the line under OpenAI's Preparedness Framework that triggers additional safeguards. Sol is below it, by OpenAI's own assessment. That decision is contestable — and the contest is going to live in the GPT-5.6 Preview system card, which OpenAI has not yet published in the form that the post links to. The system card is where the model-vs-threshold question gets fought, and the answer determines whether the partner-vetting step expands (because the threshold is too low) or contracts (because the next tier is genuinely sub-threshold). Watch the system card release more than the model release.

What this means for you

If you are an enterprise buyer, three operational shifts to track in the next 30 days:

  1. Procurement language changes. "Approved-vendor list" was a supply-chain term. In 2026 it is also an export-control term. If your procurement team asks for an OpenAI reseller relationship, the answer is going to come back with a partner-list question you have not seen before.
  2. The Cerebras path matters. The 750 tok/s Sol-on-Cerebras tier is a separate commercial track from the standard API tier, with "access initially limited to select customers." That is a partner-list question with extra steps. If you can hit 750 tok/s for inference at frontier quality, your latency-sensitive workloads just got a tier above the public API.
  3. The Mythos comparison travels. If your security team is evaluating frontier models for offensive-security research, the "Mythos Preview at 1/3 the output tokens" line is going to show up in vendor pitches. Verify it on your own workloads before you let procurement accept it as a vendor claim. The benchmark is ExploitBench, the harness is the OpenAI one, and "competitive with" is doing a lot of work in that sentence.

If you are a developer with an existing OpenAI integration, none of this changes your access today. It changes the question you should ask your account team about access in Q4 2026 when the "broader availability" window opens.

What to do this week

# 1. Check the published announcement page if openai.com is reachable
curl -sL --compressed --max-time 20 -A "Mozilla/5.0" \
  https://openai.com/index/previewing-gpt-5-6-sol/ | grep -oE "<title>[^<]+</title>"

# 2. Pull the Wayback snapshot (the live page was Cloudflare-walled at review time)
curl -sL --compressed --max-time 30 -A "Mozilla/5.0" \
  https://web.archive.org/web/20260626185954/https://openai.com/index/previewing-gpt-5-6-sol/ \
  -o /tmp/gpt56.html

# 3. Pull the WaPo story (verified live at review time)
curl -sL --compressed --max-time 20 -A "Mozilla/5.0" \
  "https://www.washingtonpost.com/technology/2026/06/26/openai-says-us-government-will-vet-users-its-latest-ai-model/" \
  -o /tmp/wp_sol.html

# 4. Confirm HN engagement numbers from the Algolia API
curl -sL --compressed --max-time 20 \
  "https://hn.algolia.com/api/v1/search?query=previewing-gpt-5-6-sol&tags=story" | jq '.hits[0] | {points, num_comments}'

# 5. If you operate in scope: read the GPT-5.6 Preview system card when it ships
#    (linked from the OpenAI page; not yet retrievable as of 27 June 2026 morning UTC+8)

The bottom line

GPT-5.6 Sol is a real frontier-model release with the usual superstructure — three tiers, new reasoning modes, a state-of-the-art on Terminal-Bench 2.1, and a Cerebras inference path. The model is the part OpenAI wanted to talk about. The part that is going to define the next six months of AI policy is the partner-vetting step at the customer level, executed jointly by OpenAI and the US executive branch, framed by OpenAI as a temporary bridge to a "cyber Executive Order framework" and by WaPo as a gating mechanism. Both readings are accurate. The interesting question is which framing survives the system-card release, the Anthropic Mythos rollout, and the first congressional hearing that treats the partner list as a hearing exhibit. The answer to that question is what "frontier AI in 2026" actually means.

Disclosure

This post was drafted with AI assistance. The primary source (the OpenAI announcement page at openai.com/index/previewing-gpt-5-6-sol/) was not directly retrievable as of 27 June 2026 morning UTC+8: a curl --compressed probe returned a Cloudflare JavaScript challenge (~9 KB, no article body), consistent with normal Cloudflare bot mitigation rather than a broken page. The content above is verified against the Wayback Machine snapshot of the same URL captured 2026-06-26 18:59:54 UTC (652 KB HTML, full article body present). The Washington Post story (De Vynck, Arnsdorf, Schaul; published 2026-06-26 17:48:58 UTC, modified 21:53:49 UTC) was verified live via curl --compressed at 27 June 2026 morning UTC+8 — the page returned a ~742 KB HTML response with the lede and JSON-LD metadata intact (the article body is paywalled but the headline, sub-headline, dek, and authors are confirmed). HN engagement numbers (774 / 477 for the OpenAI post, item id 48689028; 746 / 863 for the WaPo post, item id 48690101) were verified live via the HN Algolia API at 27 June 2026 morning UTC+8. All quantitative claims about GPT-5.6 (the three-tier Sol/Terra/Luna family, the $5/$30 / $2.50/$15 / $1/$6 per-1M-token pricing, the 700,000+ A100-equivalent GPU hours on red-teaming, the 30-minute minimum cache life, the 1.25× cache-write / 90% cache-read discount, the 750 tok/s Cerebras tier in July, the ExploitBench "competitive with Mythos Preview at ~1/3 output tokens" claim, the Terminal-Bench 2.1 SOTA, the ExploitGym UC Berkeley authorship, the sub-threshold Cyber Critical determination, and the "limited preview" partner-list framing) are reproduced from the OpenAI announcement page. The two quoted paragraphs ("As part of our ongoing engagement..." and "We don't believe this kind of government access process should become the long-term default...") are direct quotes from the OpenAI announcement as captured in the Wayback snapshot. The Mythos Preview comparison is reproduced from the OpenAI announcement's framing; the Anthropic Mythos story from earlier in the week is referenced via the HN-trending title ("US allows Anthropic to release Mythos to 'trusted partners'") rather than direct citation, because the Reuters URL for that story returned a Cloudflare CAPTCHA page (~771 bytes, no article body) at review time and the underlying Semafor reporting was not independently fetched. The "no individual user access" phrasing in the WaPo sub-headline is a paraphrase of WaPo's JSON-LD alternativeHeadline field ("OpenAI says the U.S. government will vet users of its latest AI model") plus the page's dek text; the lede ("the federal government will vet companies") is reproduced verbatim from the WaPo article body. The internal links are to the OpenAI Jalapeño inference-chip post (2026-06-25) and the Norway school AI ban post on this blog. The author editorial positions — the "the partner-vetting step is the new product feature" framing, the "30% of inference compute is the policy lever" inference-costs-export-controls argument, the "we don't believe this should become the default" political-tell reading, and the "system card is where the next fight lives" forecast — are original to this post and not claims made by either source.

Sources

  • OpenAI, "Previewing GPT-5.6 Sol: a next-generation model", via the Wayback Machine snapshot of openai.com dated 2026-06-26 18:59:54 UTC — primary source for the GPT-5.6 model family (Sol, Terra, Luna), the new "max reasoning effort" and "ultra mode" reasoning options, the Terminal-Bench 2.1 / GeneBench v1 / ExploitBench / ExploitGym benchmark claims, the $5/$30 / $2.50/$15 / $1/$6 per-1M-token pricing, the 30-minute cache minimum, the 1.25× cache-write / 90% cache-read discount, the 750 tok/s Cerebras path in July, the 700,000+ A100-equivalent GPU hours on automated red-teaming, the Cyber-Critical-threshold assessment, and the two quoted paragraphs about the US-government partner-vetting step. The live openai.com URL is the canonical link; the Wayback snapshot is the verified-fetched artifact at review time.
  • Gerrit De Vynck, Isaac Arnsdorf, and Kevin Schaul, "OpenAI says the U.S. government will vet users of its latest AI model", The Washington Post, published 2026-06-26 17:48:58 UTC, modified 21:53:49 UTC — secondary source for the "the federal government will vet companies" framing, the "no individual user access" point, and the broader Trump-administration AI-oversight trajectory. Verified live via curl --compressed (742 KB response, headline / sub-headline / dek / authors / JSON-LD metadata confirmed).
  • Hacker News discussion thread for "Previewing GPT-5.6 Sol: a next-generation model" (item id 48689028, 774 points / 477 comments as of 27 June 2026 morning UTC+8) — secondary source for community reaction and the framing of the partner-vetting step as the most-discussed element of the launch.
  • Hacker News discussion thread for "U.S. government will decide who gets to use GPT-5.6" (item id 48690101, 746 points / 863 comments as of 27 June 2026 morning UTC+8) — secondary source for the WaPo story's framing and the community discussion of the executive-branch-vetting step as a policy development.
  • HN Algolia API: search query "previewing-gpt-5-6-sol" — verification endpoint for the 774/477 engagement figures and the item id 48689028.