
Monday, June 29, 2026

HackerRank's ATS Is Open Source. The Luck Is the Feature.

On the morning HackerRank published their open-source applicant tracking system, a developer named Dan Kinsky opened a terminal, pointed his own resume at it a hundred times, and watched the same document score anywhere from 66 to 99 out of 100. The repo is real, the runs are reproducible, and the bottom line is the design choice everyone in hiring tooling has been quietly making for three years.

The tool in question is interviewstreet/hiring-agent: a Python pipeline that parses a PDF resume, calls a local LLM (default: gemma3:4b) six times to pull structured fields out of work history, education, skills, projects, and awards, optionally enriches the result with GitHub repository scans, and then asks the model to grade the whole bundle out of 100. Up to 20 bonus points get stacked on top for startup experience, a portfolio site, or a technical blog. It is MIT-licensed, with 3,592 stars on GitHub at time of writing and 253 open issues, most of which are the same complaint from different people. The repo didn't appear out of nowhere either: it dates to July 2025, but the link only went viral after a LinkedIn and r/leetcode pass that started roughly two months later, which matches the correction footnote on Kinsky's post (footnote 1 links one LinkedIn post and one Reddit thread). Anyone who has been watching the AI-in-hiring discourse knows the pattern by now: an LLM is wired into a pipeline that touches millions of decisions, the LLM's behavior changes under load, and nobody on the buying side inspects which version of stochastic they actually deployed.

Kinsky's experiment is the part that should change how the industry talks about the space. With the tool set to its default temperature of 0.1, a setting most people would call "effectively deterministic," the same resume graded against the same rubric returns a 33-point spread over 100 trials. Toggling DEVELOPMENT_MODE off, hard-coding the inputs, and changing nothing except deleting a print() statement already shifts the score by 16 points; looping the model produces the full range. Re-running with Gemini instead of gemma3:4b tightens the distribution, but only to a 48-64 band, which still has a 16-point spread and would still fail a cutoff of 60 on roughly 28% of submissions (Kinsky's number, not a separate reproduction). The non-determinism is a sampling problem, and the sampling never goes away.
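The loop itself is simple to reason about once the scores are in a list. Below is a minimal sketch, assuming a hypothetical score_resume() stand-in that draws from a seeded random range instead of calling the real tool; it is not the hiring-agent API, and the cutoff of 80 is arbitrary. The part worth keeping is the summary arithmetic: spread, standard deviation, and the share of runs that land below a cutoff.

```python
import random
import statistics

def score_resume(rng: random.Random) -> int:
    """Hypothetical stand-in for one run of an LLM screener.

    Not the hiring-agent API: it draws an integer in the 66-99 range
    reported in the post so the summary code has something to chew on.
    """
    return rng.randint(66, 99)

def summarize(scores: list[int], cutoff: int) -> dict:
    """Spread and failure rate for repeated runs on one identical resume."""
    below = sum(1 for s in scores if s < cutoff)
    return {
        "runs": len(scores),
        "min": min(scores),
        "max": max(scores),
        "spread": max(scores) - min(scores),
        "stdev": round(statistics.pstdev(scores), 2),
        "fail_rate": below / len(scores),
    }

rng = random.Random(0)  # fixed seed so the sketch itself is repeatable
scores = [score_resume(rng) for _ in range(100)]
report = summarize(scores, cutoff=80)  # cutoff is an arbitrary example value
print(report)
```

Swapping score_resume() for a real call to whatever screener you run is the only change needed; the summary stays the same.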

The numbers that matter

Most resume-screeners, including this one, grade on a 100-point rubric anchored to a handful of weighted categories. Hiring-agent's breakdown is unusually explicit about what it's optimizing for: 35 points for open source contributions, 30 for personal projects, 25 for work experience, 10 for technical skills, plus up to 20 in bonus. Read it once and you see what the tool is for: a fairly specific kind of engineer with a specific kind of artifact trail. Candidates whose work happens inside a corporation and stays there — the majority of working engineers, by every measure — start the test at a structural disadvantage that has nothing to do with their quality.

That structural tilt is what makes the non-determinism land so hard. Kinsky ran the tool against the "technical skills" category and watched it score 8 out of 10 in 98 of 100 trials. That is almost a hard rule, because "did this candidate list React" is the kind of check that any extraction model can do reliably. The "work experience" category came back 25/25 in every run, including against a stripped-down resume listing only one internship: the rubric is two lines long, contains no anchor examples, and gives the LLM nothing to vary on, so it just agrees with itself. The categories that require actual judgment are exactly the ones the tool can't judge consistently. Projects swings wildly. Open source, where the rubric actually reads like a rubric, swings less than it used to but still swings. Kinsky's projects were described as ones that "lack architectural complexity" or, with comparable frequency, as ones that "demonstrate real-world deployment": two opposite readings of the same input, sampled roughly evenly across runs, and the only thing separating them is the random draw the sampler happened to make.

Temperature 0 is a story the model tells you

The HN thread on Kinsky's post spent its first hundred comments litigating one argument, and it is the part of the story that most deserves a closer reading. In theory, "temperature 0" produces deterministic outputs from a sampling model. In practice, temperature 0 is not a fixed point you can rely on. As the temperature approaches zero the softmax sharpens toward a one-hot distribution, so sampling reduces to picking the highest-logit token, but that pick is only stable when every position has a unique highest logit and the logits themselves come out identical on every run. Exact ties are rare in floating point, but near-ties are not, and floating-point addition is not associative, so a different batch size or kernel schedule can flip which of two close logits wins. The assumption that this never happens is exactly the kind of assumption you don't want underwriting a hiring decision.
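A small numeric sketch makes the temperature argument concrete. It assumes nothing about any particular model: the three logits are made-up illustrative values, and the function is the textbook softmax with the logits divided by the temperature first.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature, with the usual max-shift for stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

close = [2.0, 1.9, 0.5]  # illustrative logits: a leader and a close runner-up
tied = [2.0, 2.0, 0.5]   # illustrative logits: an exact tie for first place

for t in (1.0, 0.1, 0.01):
    probs = softmax_with_temperature(close, t)
    print(f"T={t}: {[round(p, 4) for p in probs]}")

# With an exact tie, no temperature separates the top two tokens.
print("tie at T=0.01:", softmax_with_temperature(tied, 0.01))
```

With these example logits the runner-up still holds roughly 27% of the probability mass at a temperature of 0.1, which is why a low temperature is not the same thing as a deterministic one when two continuations are close.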

The deeper issue is that the model is asked to do two jobs with one set of weights: parse a document into structured fields (the part LLMs are good at), and score a candidate against a rubric (the part LLMs are uniquely bad at, because rubric scoring is a discriminative task and chat models are trained to be generative). The tool's own prompt for experience is two lines long, per Kinsky's quoted rubric — read the Production section in the repo: instructions about analyzing work and volunteer sections for real-world or internship experience, plus a special-consideration line that awards extra for founder or early-stage engineer roles. No anchors. No examples. No definition of "real-world." The model is being asked to invent a calibration it was never trained on, and the result is whatever happens to come out of the sampler. That's why an intern and a principal engineer both get 25/25: the prompt can't tell them apart, and neither can the model.

The reproducibility budget is the only metric that matters

Most AI-in-hiring coverage focuses on bias — and deservedly so; the Brookings April 2025 study on gender, race, and intersectional bias in LLM-driven resume retrieval put real numbers behind the failure mode. But reproducibility is the failure mode people who aren't in the literature are about to discover, and it doesn't need a bias-detection study to demonstrate — it just needs Kinsky's terminal loop. A tool whose identical inputs produce non-identical outputs is a tool whose identical candidates produce non-identical outcomes. At any fixed cutoff, the failure rate of "this qualified candidate didn't make it past the screen" is structurally non-zero, and the candidates that fall on the wrong side of the cutoff are random with respect to merit. That's the function the tool is performing. Calling it a "filter" understates it; calling it a "luck filter" catches it.

There are two things worth keeping separate, even though they often get tangled together. The first is LLM bias — outputs that differ systematically across groups, the bias problem the literature has spent two years measuring. The second is LLM noise — outputs that differ across identical inputs, the reproducibility problem Kinsky is documenting. The first matters because fairness is a legal category and a moral category. The second matters because anything with this much noise is unfit for the actual decision even if you fix the bias. A noise-free version of a biased tool is still biased. A noise-heavy version of a fair tool is unfit to use.

Open source changed the optics but not the math

The interesting decision HackerRank made was opening the source. A closed-source LLM screener with 33-point variance would be the kind of "actuarial non-decision" enterprise software tends to hide; an open-source one is a reproducible experiment. Kinsky's loop is the unit-test the entire industry should have been writing since AI resume screeners started shipping in 2022. Anyone can replicate it — and many will, because the cost of doing so is a laptop, a pip install, and an hour. What they will find is what Kinsky found: the tool's accuracy, as a filter, is the same as flipping a weighted coin. Whatever signal the company thought they were buying is in the noise floor.

That distinction matters even more at the buyer side. A screening tool produces a ranking function whose top-K is unstable across runs — meaning its top-K is arbitrary. Companies buying these tools should be asking, before they wire one into Workday, Greenhouse, or Lever, what the tool's reproducibility budget is for the population they're screening. If your top-of-funnel conversion is 10% and your screener has a 30% pass rate at the cutoff, the screen is responsible for roughly half of your funnel noise. Halving the variance by switching to a smaller, deterministic model and tighter prompts would do more for hire quality than any number of model upgrades. Anyone who's been on the receiving end of an unexplained rejection knows this already.
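One way to put a number on top-K instability is to rank the same pool twice and count how much of the shortlist survives. The sketch below is a simulation under stated assumptions, not a measurement of any real screener: the candidate pool, the "true" scores, and the Gaussian noise levels are all hypothetical.

```python
import random

def noisy_ranking(true_scores, noise_sd, rng):
    """Rank candidate ids by true score plus Gaussian noise (one screener 'run')."""
    observed = {cid: s + rng.gauss(0, noise_sd) for cid, s in true_scores.items()}
    return sorted(observed, key=observed.get, reverse=True)

def top_k_overlap(rank_a, rank_b, k):
    """Fraction of the top-k shared by two runs (1.0 means an identical shortlist)."""
    return len(set(rank_a[:k]) & set(rank_b[:k])) / k

rng = random.Random(42)
# Hypothetical pool: 200 candidates with 'true' scores spread over 50-90.
true_scores = {i: rng.uniform(50, 90) for i in range(200)}

for noise_sd in (0.0, 4.0, 8.0):  # assumed noise levels, in score points
    run_a = noisy_ranking(true_scores, noise_sd, rng)
    run_b = noisy_ranking(true_scores, noise_sd, rng)
    print(f"noise_sd={noise_sd}: top-20 overlap = {top_k_overlap(run_a, run_b, k=20)}")
```

Replacing the simulated runs with two real passes of a screener over the same applicant pool gives the overlap figure a buyer could reasonably ask a vendor for.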

What to do this week

If you're a job seeker:

  • Assume a non-trivial share of the screen is a coin flip. Use that as license to apply to roles your gut says you're a fit for, even when your heuristic says you're not.
  • The rubric that HackerRank-style tools score against is heavy on open source and personal projects. If you have those, surface them more prominently: GitHub README polish, a one-paragraph portfolio, a working demo URL. The tool is explicitly grading on artifacts that look like artifacts.
  • If you have none of those, your path through this filter is rougher regardless of quality. Lean on referrals and on company-specific application tracks that bypass the automated screen.

If you're an engineer with a say in how your company screens:

  • Run Kinsky's loop on your own tool with your own population. The "100 runs against the same resume" test is the smallest possible reproducible experiment, and you should have its output in hand before you trust the tool.
  • Treat any LLM-based screener that returns a single candidate score as inadmissible. Demand either a structured decomposition (the model returns per-rubric scores so you can audit which parts are stable) or a calibration band (each score comes with a standard deviation across N runs).
  • If the screener doesn't expose its rubric, what you have is a vibe check with extra steps. The vibe check is the part you don't want.
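The structured decomposition and the calibration band from the list above can be combined in a few lines. This is a sketch under assumptions: the per-category numbers are hypothetical and are not Kinsky's data, and the 1.0-point threshold for calling a category "stable" is an arbitrary choice.

```python
import statistics

# Hypothetical per-category scores from repeated runs on one resume.
# Illustrative numbers only, not taken from any real experiment.
runs = [
    {"open_source": 20, "projects": 18, "experience": 25, "skills": 8},
    {"open_source": 28, "projects": 27, "experience": 25, "skills": 8},
    {"open_source": 24, "projects": 12, "experience": 25, "skills": 8},
    {"open_source": 30, "projects": 22, "experience": 25, "skills": 7},
]

def calibration_band(runs):
    """Return {category: (mean, population stdev)} across repeated runs."""
    band = {}
    for cat in runs[0].keys():
        values = [r[cat] for r in runs]
        band[cat] = (statistics.mean(values), statistics.pstdev(values))
    return band

band = calibration_band(runs)
for cat, (mean, sd) in band.items():
    flag = "stable" if sd < 1.0 else "noisy"  # threshold is an assumption
    print(f"{cat:12s} mean={mean:5.2f} sd={sd:4.2f} {flag}")
```

A report like this shows at a glance which rubric categories behave like fixed rules and which ones are carrying the variance in the total.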

If you're running the screener yourself:

  • Lower the temperature only after you have measured the temperature=1 distribution: you have to know the noise floor before you can lower it.
  • Replace single-call score generation with multi-sample consensus, or with discriminative models trained on labeled paired comparisons (the actual right tool for the job).
  • The single most valuable line in the open-source repo is the temperature: 0.1 default. Change it to 0, document the new spread, and ship the difference.
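Multi-sample consensus, mentioned in the list above, can be sketched as a median over several independent calls. The sample_score() function here is a hypothetical stand-in that draws a random integer rather than calling a model, and the choice of nine samples is an assumption.

```python
import random
import statistics

def sample_score(rng):
    """Hypothetical single scoring call; returns a noisy integer score."""
    return rng.randint(66, 99)

def consensus_score(rng, n_samples=9):
    """Median of n_samples independent calls, plus the spread that was seen."""
    samples = [sample_score(rng) for _ in range(n_samples)]
    return statistics.median(samples), max(samples) - min(samples)

rng = random.Random(7)
single = [sample_score(rng) for _ in range(200)]
medians = [consensus_score(rng)[0] for _ in range(200)]
print("single-call stdev:", round(statistics.pstdev(single), 2))
print("median-of-9 stdev:", round(statistics.pstdev(medians), 2))
```

The median narrows the distribution but does not remove it, and it costs n_samples model calls per resume, so the reported spread is worth keeping alongside the score.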

The feature, renamed

The industry-wide reflex when a reproducibility paper appears is to call the problem "non-determinism" and promise a fix in the next model. Non-determinism is the property, not a bug to patch — and it's a direct consequence of how these models generate text. A model that returns 100/100 with seed 0 and 73/100 with seed 1 is doing exactly what it was trained to do; the prompt engineer has not yet built a system that constrains the sampler. The fix is to stop pretending the model is a sensor when it's a sampler, and to put determinism back into the pipeline by routing it through a part of the system that actually has it. Structured extraction can be done deterministically. Rubric scoring, with the right anchors, can be done deterministically. The middle distance — "judge me on my projects, please" — is where the sampler takes over, and the sampler is supposed to take over there. The honest answer is to admit that's a part of the decision a human has to make.
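For contrast, here is what a deterministic scoring step over already-extracted fields could look like. The category caps follow the 35/30/25/10 split described earlier; everything else, including the field names and the per-unit anchors, is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedResume:
    """Structured fields assumed to come out of the extraction step."""
    merged_oss_prs: int
    deployed_projects: int
    years_experience: float
    listed_skills: int

# Caps mirror the 35/30/25/10 rubric split; the anchors below are made up.
CAPS = {"open_source": 35, "projects": 30, "experience": 25, "skills": 10}

def score(resume: ExtractedResume) -> dict:
    """Pure function of the extracted fields: same input, same output."""
    parts = {
        "open_source": min(CAPS["open_source"], resume.merged_oss_prs * 5),
        "projects": min(CAPS["projects"], resume.deployed_projects * 10),
        "experience": min(CAPS["experience"], int(resume.years_experience * 5)),
        "skills": min(CAPS["skills"], resume.listed_skills),
    }
    parts["total"] = sum(parts.values())
    return parts

intern = ExtractedResume(0, 1, 0.5, 6)
principal = ExtractedResume(4, 2, 12.0, 9)
print(score(intern))
print(score(principal))
```

No sampler is involved, so repeated runs agree, and an explicit anchor lets an intern and a principal engineer land on different experience scores. Whether these particular anchors measure anything useful is a separate question, and it is the part that still needs a human.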

Kinsky's post is honest about that in a way the industry usually isn't. He isn't angry at HackerRank. He's angry at himself for thinking the tool was testing something it wasn't. Plenty of other readers will be angry at HackerRank; they're right to be, but only about the secondary thing. The primary thing is that the entire category of tool is built on a category error, and the open-source release is the moment that became undeniable. Once you see the same resume swing from 66 to 99 on a hundred deterministic-looking runs, every score that came out of every other LLM screener starts to look like the same number — just with a different seed you can't reproduce.

Disclosure

Drafted with AI assistance. Primary source: Dan Kinsky's 28 Jun 2026 post at danunparsed.com/p/hackerrank-open-source-ats, fetched and cached locally on 29 Jun 2026. GitHub repo interviewstreet/hiring-agent confirmed live via the GitHub REST API on the same date. Brookings 25 Apr 2025 piece on bias is cited only for the bias vs. noise distinction in the body, not for any specific finding. Per-claim attribution and live numbers are in the Sources section below.

Sources

  • HackerRank's open-source ATS — Dan Kinsky, "HackerRank open sourced its ATS. My resume scored 90/100. Oh wait 74/100. No — 88/100. Actually 83/100.", danunparsed.com/p/hackerrank-open-source-ats, 28 Jun 2026. Primary source for all experimental claims in the body (66–99 spread, 65% cutoff failure rate, 48–64 Gemini band, 98/100 technical-skills consistency, 25/25 experience rubric outcome). Fetched 29 Jun 2026.
  • The GitHub repo itself: github.com/interviewstreet/hiring-agent, MIT-licensed Python project, 3,592 stars / 745 forks / 253 open issues at time of writing. Repo created 2025-07-29; first viral LinkedIn/Reddit pass ~Oct 2025 per Kinsky's footnote. Confirmed via GitHub REST API on 29 Jun 2026.
  • The HN discussion — Hacker News item 48713832. 730 points / 309 comments at time of writing; thread moving. Used for the temperature-zero analysis and the broader engineering reaction.
  • Brookings 25 Apr 2025 on bias in LLM-based resume screening — Kyra Wilson and Aylin Caliskan, "Gender, race, and intersectional bias in AI resume screening via language model retrieval," brookings.edu/articles/gender-race-and-intersectional-bias-in-ai-resume-screening-via-language-model-retrieval/. Used only for the bias vs. noise distinction; no specific findings paraphrased.
  • The Reddit r/leetcode pass — referenced in Kinsky's correction footnote (footnote 1) as one of the two original viral-sharing surfaces, 28 Jun 2026. Linked but not directly fetched (Reddit returned a block page to my fetch attempt).

Framework's 10G Module Proves USB-C Has Too Many Speeds

Jeff Geerling spent a week with WisdPi's new 10G Ethernet Expansion Card for Framework laptops and found the same product delivering three different real-world speeds depending on which Framework laptop he used, which OS he ran, and which Realtek driver would build against the kernel. The card is rated 10 Gbps. On a Framework 13 with AMD's Ryzen AI 5 340, it delivered 9.4 Gbps on Windows 11 and slightly less on Linux. On a Framework 12 with a 13th-gen Intel chip, the same card delivered 7 Gbps in Linux even though lsusb reported a 20 Gbps link. The story is not "Framework made a bad product." USB-C's bandwidth tiers (Gen 2x2, Gen 2x1, USB4, and the tunneling modes underneath) have become so layered that a single $99 dongle can be advertised as 10 Gbps and deliver 7, 9.4, or 10 depending on factors the buyer cannot inspect at purchase time. The post is a hardware review. The lesson is about software.

What the WisdPi 10G card actually delivered

Geerling's setup, pulled from the published post:

  • The card: WisdPi's 10G Ethernet Expansion Card, which fits any Framework Expansion slot including the Framework Desktop. It uses the Realtek RTL8159, which needs USB 3.2 Gen 2x2 (20 Gbps of raw bus bandwidth) to hit the rated 10 Gbps.
  • Framework 13 (AMD Ryzen AI 5 340): Windows 11 delivered 9.4 Gbps on average. Linux was "slightly worse." Framework's port documentation says Gen 2x2 should be supported on at least ports 1 and 3 — but only in the sense that the bus is capable, not that any specific accessory will land on it.
  • Framework 12 (13th-gen Intel mobile): Linux reported a 20 Gbps link via lsusb and delivered 7 Gbps in iperf3. The Realtek out-of-tree driver failed to compile on Ubuntu 26.04 because the bundled Linux 7.x kernel is newer than the driver expects. Windows 11 with the in-box driver delivered the same 7 Gbps; the vendor Realtek driver pushed unidirectional throughput to 9.4+ Gbps (with a bidirectional mix of ~9 Gbps up and 4–5 Gbps down).

Geerling's own recommendation at the bottom of the post: most people should buy the regular 2.5 Gbps Ethernet Expansion Card for $40 and stop there. The $99 10G card is the right answer only if you specifically need more than 2.5 Gbps and specifically do not want an external USB-C dongle. As of the post's publication on 24 June 2026, the 10G card was out of stock.

The five angles that actually matter

1. USB-C is a stack of five buses with overlapping names

The reason the same $99 product can deliver 7 Gbps, 9.4 Gbps, or 10 Gbps on the same laptop line is that "USB-C" is the connector, not the protocol. The protocols on that connector are at least five distinct things: USB 3.2 Gen 2x1 (10 Gbps), USB 3.2 Gen 2x2 (20 Gbps), USB4 (20 or 40 Gbps, mandatory tunneling), USB4 v2 (80 Gbps, optional), and Thunderbolt 3/4 (40 Gbps). The RTL8159's 10 Gbps Ethernet only fits inside the 20 Gbps tier. Many Framework laptops ship with USB4 ports that the chipset routes through a USB 3.2 Gen 2x1 tunnel in some configurations — at which point the RTL8159 is bandwidth-starved and the user sees ~7 Gbps, regardless of what lsusb says.
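The tier argument reduces to a headroom check. The sketch below uses the nominal rates listed above and an assumed usable fraction of 0.9 to stand in for encoding and protocol overhead; that fraction is a placeholder, not a measured or specified figure.

```python
# Nominal link rates in Gbps, as listed in the paragraph above.
NOMINAL_GBPS = {
    "USB 3.2 Gen 2x1": 10,
    "USB 3.2 Gen 2x2": 20,
    "USB4 (20 Gbps mode)": 20,
    "USB4 (40 Gbps mode)": 40,
    "USB4 v2": 80,
    "Thunderbolt 3/4": 40,
}

def usable_gbps(nominal, usable_fraction=0.9):
    """Rough usable bandwidth; usable_fraction is an assumed placeholder."""
    return nominal * usable_fraction

def can_carry(nominal, payload_gbps=10.0, usable_fraction=0.9):
    """True if the link has room for the payload after assumed overhead."""
    return usable_gbps(nominal, usable_fraction) >= payload_gbps

for name, nominal in NOMINAL_GBPS.items():
    verdict = "fits" if can_carry(nominal) else "starved"
    print(f"{name:22s} {nominal:3d} Gbps -> {verdict}")
```

The nominal tier of the port does not settle the question on its own: as the paragraph notes, the link the adapter actually lands on can be lower than the port's headline number, so the value to feed in is the negotiated rate, not the one on the box.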

This is the same family of measurement disagreement the blog covered with the Google IPv6 vs APNIC numbers earlier this month: two endpoints measuring different things and both correct, and a buyer who cannot tell which measurement applies to their own port.

2. The Realtek driver situation is the real story

Geerling's headline is "USB-C is complex." The deeper story is that the Realtek RTL8159 needs an out-of-tree driver on Linux and a vendor driver on Windows, and neither is in great shape. On Ubuntu 26.04 with the 7.x kernel, the driver did not compile. On Windows 11 with the in-box Microsoft driver, throughput was 7 Gbps. Only Windows with the Realtek driver delivered the 9.4+ Gbps the silicon can do. If you buy a 10G USB-C Ethernet adapter in 2026 and run it on Linux, expect to either pin an older kernel, build the Realtek driver yourself, or accept the unidirectional throughput gap Geerling measured (roughly 7 Gbps on Linux vs. 9.4+ on the vendor driver — about a 25% drop).

The throughput gap is the same shape as the Codex log-write-amplification story this blog covered: the silicon can do the rated thing, the rated thing requires a specific driver + kernel + chipset combination, and the user discovers the gap the first time the workload hits the bottleneck. The pattern is "the spec is real, the floor under the spec is not."

3. The 70°C plastic surface is the spec nobody wants to talk about

The most under-reported part of Geerling's post is the thermal result. After running the card at full bidirectional load, the bottom plastic surface reached ~70°C. WisdPi told Geerling the surface complies with IEC 62368-1, which permits skin contact at that temperature for up to 10 seconds. Geerling's response, and the right one, is that this is a laptop, and laptops are routinely used on laps. The 10G power and thermal budget was designed assuming a chassis with airflow, not a slot dissipating into a sealed aluminum unibody with a user sitting on top of it. The expansion-card slot, in other words, is a thermal compromise the buyer absorbs, visible only on the spec sheet. It is not a casual way to add 10G to a laptop.

4. "Sticks out like a sore thumb" is a real design constraint

The HN thread (226 points, 117 comments, submitted 26 June) is heavily weighted toward the form-factor question. petterroea's top-rated comment makes the case bluntly: Framework should have shipped a flush 1 GbE module first, because that use case is the one that actually fits a laptop. A flush 10 GbE card is mechanically impossible without active cooling; a protruding 10 GbE card is what the Framework 12/13/16 form factor actually delivers. jeffbee's comment makes a more useful technical point: for the 10G laptop-to-laptop use case, jeffbee recommends a Thunderbolt cable between the two computers (while acknowledging the cable is pricey). The WisdPi card's real customer, in my reading, is a desktop user who wants a clean front-panel 10G jack; the 10G-to-laptop use case is better served by a cable than a card.

5. The 10G Ethernet dongle market is converging on the same constraint

Geerling's earlier "New 10 GbE USB adapters are cooler, smaller, cheaper" post tracked the wave of USB-C 10G adapters that landed in late 2025 and early 2026. Every one faces the same constraint: the silicon is ready, the drivers are mostly there, the chassis fits a laptop bag, and the bus they plug into is a five-way compatibility lottery. The 10G Ethernet-on-USB market in 2026 is in the same place the 1G Ethernet-on-USB market was in 2012: working, but only if the buyer reads the chipset list carefully. The "10G" label is a ceiling, not a guarantee.

What this means for you

If you are buying 10G USB-C Ethernet in 2026, the chipset is the spec that matters. The Realtek RTL8159 is the current 10G USB controller; its sibling, the RTL8157, tops out at 5G. The Aquantia AQC111U is an older USB alternative with better driver support on older Linux kernels, but it also tops out at 5G and is harder to find new. Adapters built on the RTL8156 are 2.5G only. The older Aquantia AQC100/107 parts are 10G-capable but attach over PCIe, so they appear in Thunderbolt adapters rather than plain USB ones. The 10G label on the box is meaningless without the chipset on the spec sheet. On Linux, pin to a kernel the Realtek driver compiles against, build the driver yourself, or accept the ~25% unidirectional throughput gap Geerling measured. The Framework expansion-card slot does not exempt you from any of this. The 2.5 Gbps Ethernet Expansion Card ($40) is the right default. The 10G card ($99) is the right answer only for a specific use case.

What to do this week

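# 1. Check what USB speed each port and device actually negotiated
#    (Linux: `lsusb -t` prints the bus tree with a link speed per device)
lsusb -t
#    Replace XXXX:XXXX with the adapter's vendor:product ID from plain `lsusb`
lsusb -v -d XXXX:XXXX 2>/dev/null | grep -iE 'bcdUSB|bInterfaceClass'

# 2. Verify which driver is bound to the Ethernet adapter
#    (replace eth1 with the interface name shown by `ip link`)
ethtool -i eth1 | grep -E 'driver|bus-info'

# 3. Test the actual ceiling
#    On a second machine with a 10G link, start the server:
iperf3 -s
#    Then, on the laptop, run the client for 30 seconds with 4 parallel streams:
iperf3 -c <server-ip> -t 30 -P 4

# 4. For Realtek RTL8159, check whether a matching kernel module is installed
#    (module name r8159 is used as given; compare with the driver name from step 2)
modinfo r8159 2>/dev/null && echo "kernel module present" || echo "needs out-of-tree Realtek driver"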
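To turn the iperf3 run from step 3 into a number you can compare against the rated speed, iperf3's JSON output (the -J flag) can be parsed directly. The sketch below is a hedged example: the embedded sample is a hand-written, trimmed stand-in for real iperf3 output, and its value mirrors the 7 Gbps figure quoted above, not a new measurement.

```python
import json

def throughput_gbps(iperf_json: str) -> float:
    """Receiver-side throughput in Gbps from `iperf3 -J` output of a TCP test."""
    result = json.loads(iperf_json)
    return result["end"]["sum_received"]["bits_per_second"] / 1e9

def shortfall_pct(measured_gbps: float, reference_gbps: float) -> float:
    """How far the measured rate falls below a reference rate, in percent."""
    return (reference_gbps - measured_gbps) / reference_gbps * 100

# Trimmed, hand-written sample in the shape of iperf3's JSON output.
sample = '{"end": {"sum_received": {"bits_per_second": 7.0e9}}}'

measured = throughput_gbps(sample)
print(f"measured: {measured:.2f} Gbps")
print(f"shortfall vs 9.4 Gbps: {shortfall_pct(measured, 9.4):.1f}%")
```

In practice the sample string would be replaced by the contents of a file written with something like iperf3 -c <server-ip> -t 30 -P 4 -J, and the reference value by whatever rate the adapter is advertised at.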

The bottom line

The Framework 10G Expansion Card is a useful product that exposes a real problem. It works when the bus, chipset, driver, and chassis all line up. "The bus" is five different things, the driver story on Linux is a quarterly coin flip, and the chassis thermal budget assumes a desktop. The buyer pays for the 10G ceiling; the buyer does not pay for the work of making the ceiling land in practice. Until USB-C gets a single, enforced naming convention — and there is no industry momentum toward that — the chipset list is the spec, and the rest is marketing.

Disclosure

This post was drafted with AI assistance. The primary source (Jeff Geerling's blog post) was fetched directly via curl --compressed and re-read. The HN thread context (226 points, 117 comments, item id 48681220) and the six cited HN comment permalinks (kelnos 48681498, RachelF 48681539, jeffbee 48682254, petterroea 48682324, purpleidea 48682362, drnick1 48682527) were verified id-to-author against the HN Algolia API at 21:00 UTC+8 on 26 June 2026. All quantitative claims about the WisdPi card (9.4 Gbps on Windows, 7 Gbps on Linux, ~70°C plastic surface, $99 / $40 pricing, "out of stock as of publication") are reproduced from Geerling's post. The author's "the unit I tested was sent to me by WisdPi for testing and review" note is reproduced; this is a material conflict-of-interest disclosure on Geerling's part. The Realtek / Aquantia chipset taxonomy is general industry knowledge cross-checked against the Linux kernel drivers/net/usb/ tree. The WisdPi product page on wisdpi.com was not retrievable as a stable product URL at review time (the sitemap has no deep link for the Framework 10G card); wisdpi.com is cited as the company root. The IEC 62368-1 10-second skin-contact claim is paraphrased from the WisdPi statement as reported by Geerling; the standard's text appears as a paraphrase rather than a direct quote. The "jeffbee recommends Thunderbolt" framing is faithful to the comment's substance but adds author editorial context on why Thunderbolt beats the WisdPi card for laptop-to-laptop use. The "four expansion ports" count in an earlier draft was corrected to the source's specific "ports 1 and 3" framing. The ~25% throughput figure is derived from Geerling's 7 Gbps / 9.4+ Gbps measurements. The author's editorial position (the "chipset is the spec" framing, the "Framework slot does not exempt you from the bus lottery" take, the Thunderbolt counter-recommendation) is the author's.

Sources

  • Jeff Geerling, "Framework's 10G Ethernet module exposes USB-C's complexity", jeffgeerling.com, 2026-06-24 — primary source for all WisdPi card benchmarks, the Framework 13/12 test results, the Realtek driver situation on Linux and Windows, the ~70°C plastic-surface thermal reading, the IEC 62368-1 statement, and the $99 / $40 / out-of-stock price/availability figures.
  • Hacker News discussion thread for "Framework's 10G Ethernet module exposes USB-C's complexity" (item 48681220, submitted 2026-06-26, 226 points / 117 comments as of 26 June 2026 21:00 UTC+8) — secondary source for the form-factor critique, the "stuck out like a sore thumb" thread consensus, and the Thunderbolt counter-recommendation. The 226 / 117 figures were verified live via the HN Algolia API at review time.
  • WisdPi company root, wisdpi.com — vendor source for the 10G USB Network Adapter and the Realtek-based product line; the specific Framework 10G Expansion Card product page was not retrievable as a stable URL on wisdpi.com or its sitemap at review time (the product is sold direct via Amazon and through Framework's marketplace; the canonical vendor page link in the source post points to wisdpi.com but the deep link was not resolvable).
  • Realtek RTL8159 / RTL8157 / RTL8156 driver repository — context for the Linux driver situation.
  • USB 3.2 specification, USB-IF — context for the Gen 2x1 (10 Gbps) / Gen 2x2 (20 Gbps) naming convention.

When You Buy a Movie Online, You Don't Own It

Cem Dervis published "If You Can't Hold It, You Don't Own It" this week — a 7,000-word catalog of every mechanism by which a digital "purchase" can be unmade: license revocation, store shutdown, server sunset, price increases on a service you can't leave, and the 2018 Second Circuit ruling that said the first-sale doctrine doesn't cover digital files. The article hit 28 points on Hacker News within hours of posting. The reason it didn't need to be a longer thread is that the underlying facts are not contested. The interesting question is not whether the article is right. The interesting question is why the rest of the consumer-tech press is still describing digital storefronts as if they're selling products.

The "Buy" button is the load-bearing word

The case the article builds is straightforward. A Blu-ray on your shelf is a physical object: it can be resold, lent, archived, and played offline indefinitely, with no login, no account, no terms-of-service update. A movie in your Amazon Video "library" is a license to access a copy. The license can be revoked when distribution rights change, when the store's relationship with the studio changes, or when the store shuts down entirely. The receipt looks identical. The legal status is not.

The proof points are public. In December 2018, the US Court of Appeals for the Second Circuit ruled in Capitol Records v. ReDigi that the first-sale doctrine — the rule that lets you resell a used book or CD — does not apply to digital files. The court held that transferring a digital file necessarily involves making a new copy, which the copyright holder has not authorized. In August 2025, Lisa Reingold filed a class action against Amazon arguing that the "Buy" button on a video was fraudulent because the underlying transaction was a revocable license, not a sale. Earlier suits on the same theory were dismissed in 2021 for lack of standing — the plaintiffs hadn't actually lost access. Reingold had lost access to $20.79 worth of content. Her complaint has standing the prior suits did not.

The story is not about Amazon. Amazon is the largest storefront but not the only one. Microsoft killed PlaysForSure's authorization servers in 2008 and the Zune marketplace in 2015, both times leaving customers with DRM-locked files they could no longer authenticate. Adobe automatically migrated subscribers to a $69.99/month "Creative Cloud Pro" tier in June 2025, a 40% increase over the $49.99/month 2012 plan, and kept the cheaper option available only to customers who actively switched tiers. Ubisoft shut down The Crew in March 2024, a game you could buy on disc from a store shelf, and removed it from libraries, including for disc owners, because the title required an always-online connection to boot. The shutdown prompted the founding of Stop Killing Games, a consumer campaign that has been the loudest organized pushback on the "you bought it but we still own it" model.

Streaming is a price path that only goes up

A $30 Blu-ray is yours for decades. A $9.99 Netflix Standard subscription in 2015 is $15.49 today, a 55% increase on the same plan tier, with the simultaneous introduction of advertising to formerly ad-free plans and the 2023 crackdown on password sharing. The subscription price is not a property tax; it is a re-negotiated rent, announced at the discretion of the platform. The library is the collateral. If you stop paying, the library vanishes. There is no "used" market for a streaming library. There is no path to recover any of the cumulative subscription cost as a one-time purchase at the end.

The "creative subscription" version of this is worse because the toolchain stops working. When a video editor stops subscribing to Adobe Premiere, the files they edited are still theirs, but the tool that opens the proprietary .prproj format is not. When a developer stops subscribing to JetBrains, the IDE goes away and the code stays. The pattern is not "subscription is bad" — for many workloads subscriptions are the right unit. The pattern is "subscription is the only way to keep the tool running, and the moment you stop, the tool stops." That is a different relationship from "you bought a thing."

Game preservation is the case where the loss is most legible

The game industry has done the most visible work on the server-shutdown problem. Electronic Arts shut down online services for 23 games in 2025 alone, including FIFA 23, Madden NFL 22, NHL 21, and the GRID series — most of which were fully paid retail products. SimCity's 2013 always-online launch was widely cited as the first time a major publisher shipped a single-player game that could not be played without server connectivity. EA reversed the policy several months later, but the precedent held. Anthem and The Crew shipped on discs that functioned as license keys, not as complete products: the discs could not launch the games once the servers went dark. The "limited-edition disc" market has been built by companies like Limited Run Games, Special Reserve Games, and Strictly Limited specifically to put a physical artifact on a buyer's shelf for games that were born digital.

The legal backdrop is that the US Copyright Office rejected a proposed exemption in 2024 that the Video Game History Foundation had requested to let museums and archives make games available to researchers remotely. The argument that preservation should be permitted for games whose servers have gone dark is still not the law in the United States. The Flashpoint Archive has collected over 150,000 Flash apps since Adobe's shutdown. The Internet Archive emulates thousands of retro games. None of this is licensed. All of it is happening in a gray zone that exists because the rights holders have not sued the preservers into oblivion — and that gray zone is the preservation library your great-grandchildren will or will not be able to read.

The original take: the "own vs. access" line is now where the consumer-tech story is

Here is the throughline the article doesn't quite say out loud. The shift from selling things to selling access was sold, in the 2010s, as a customer benefit: cheaper, easier, available everywhere, no shelf clutter, no scratched disc. The customer benefit was real. The cost was that the customer no longer had standing to call the thing theirs. The cost was latent, because the stores mostly stayed open, the servers mostly stayed up, the licenses mostly stayed valid. The cost became concrete the first time a major storefront shut down and the customer discovered that the receipt was a record of a payment, not a title. The Crew in 2024 and the Amazon "Buy" lawsuit in 2025 are the moments the cost went from latent to material.

What changed in the last two years is that the studio side started running the math. The "subscriber growth" metric that drove streaming pricing decisions is now flat-to-declining for most major services. The way to grow revenue on a flat subscriber base is to raise prices, restrict sharing, advertise into the previously ad-free tier, and let the catalog churn. The catalog churn is the lever that hurts customers most and is least visible: when a show disappears from Netflix, the subscriber doesn't get a refund and doesn't get a download. When a game goes offline, the buyer doesn't get a replacement and doesn't get credit. The content industries have discovered that the access model gives them pricing power the ownership model did not, and the courts have not yet drawn a line that constrains it. The Dervis article is the consumer-rights press catching up to a calculation the rights holders have been making for years.

What this means for you

  • If you want a movie, album, or book in 2026 and you want to still be able to read it in 2036, the path is still physical. A $15 Blu-ray from a used bin is more durable than a $20 4K "purchase" on Apple TV. The convenience math is unfavorable to the disc; the ownership math is unfavorable to the license.
  • For software, treat subscription as recurring cost, not capital expense. A subscription you keep for five years costs five years' worth of fees and then ends with no asset. A perpetual license costs more upfront and may stop working when the vendor sunsets it, but the cost is bounded. Read the license terms. Note what happens to your files if the vendor disappears.
  • For games, the disc-vs-server-sunset line is increasingly sharp. A disc-only single-player game from before 2013 is mostly safe. Anything that said "requires internet connection" at launch is at risk on the publisher's schedule. Limited Run Games and Strictly Limited exist specifically because the publisher's first-party answer is "no physical copy."
  • For content you care about, mirror it to a format you control. A downloaded file you can't open because the licensing server is gone is exactly the same as a file you never had. Treat DRM-locked downloads as rental, not as purchase. The 2018 ReDigi ruling, in which the Second Circuit held that reselling a lawfully bought digital music file is not protected by the first-sale doctrine, is still the law.
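The "bounded cost" point in the software bullet can be made concrete with a break-even calculation. The sketch below uses made-up prices (a $20/month subscription against a $480 perpetual license); these numbers are assumptions for illustration, not vendor quotes, and both helper names are hypothetical:

```shell
# First month in which cumulative subscription fees reach the perpetual price.
breakeven_months() {
  awk -v monthly="$1" -v perpetual="$2" 'BEGIN {
    m = int(perpetual / monthly)
    if (m * monthly < perpetual) m++   # round up to a whole month
    print m
  }'
}

# Total paid for a subscription over a number of years.
subscription_total() {
  awk -v monthly="$1" -v years="$2" 'BEGIN { printf "%.2f\n", monthly * 12 * years }'
}

breakeven_months 20 480    # prints 24
subscription_total 20 5    # prints 1200.00
```

After the break-even month the subscription keeps costing money while the perpetual license does not, which is the sense in which one cost is bounded and the other is not.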

What to do this week

If you have a digital library of any size, do an audit. Pick the titles that matter most and decide which ones you trust to stay accessible for a decade. The list is the gap between "I own this" and "I have access to this." Once you have the list, pick the three titles that matter most and work through these steps:

# 1. Catalog what you have, where it lives, and whether the
#    storefront is the canonical source of truth.
find ~/Downloads -type f \( -iname "*.mp4" -o -iname "*.epub" \
  -o -iname "*.mobi" -o -iname "*.pdf" \) | head -50

# 2. For each DRM-locked purchase, check whether the storefront
#    has announced a shutdown or rights change in the last year.
#    (No API — you do this by visiting the storefront's "news"
#    page and looking for the words "sunset", "retire", or
#    "removing".)

# 3. For the three titles that matter most, decide:
#    (a) buy the physical version if available (Blu-ray, vinyl,
#        printed book, cartridge), or
#    (b) accept the access-only relationship and stop calling
#        it "mine."
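
Step 2 has no API, but the keyword check itself can be scripted once you have saved a storefront's news page as text. A minimal sketch, assuming a locally saved file; the helper name scan_for_sunset and the sample lines are made up for illustration:

```shell
# Scan a saved news page for shutdown language (keywords from step 2).
# scan_for_sunset is a hypothetical helper; it prints matching lines with
# their line numbers and exits non-zero when nothing matches.
scan_for_sunset() {
  grep -inE 'sunset|retire|removing' "$1"
}

# Example with a throwaway file standing in for a saved page.
tmp=$(mktemp)
printf '%s\n' 'New releases this week' \
  'We will retire the video store on a future date' \
  'Holiday sale' > "$tmp"
scan_for_sunset "$tmp"
rm -f "$tmp"
```

A match is only a prompt to read the announcement; the words also show up in unrelated posts.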

The act of writing the list down is the point. "I own this" is the claim the article is asking you to stop making for things you do not, in fact, own.

Disclosure

This post was drafted with AI assistance. The primary source is Cem Dervis's article "If You Can't Hold It, You Don't Own It" at dervis.de/physical/, verified live via curl --compressed --max-time 20 -A "Mozilla/5.0" at 27 June 2026 evening UTC+8 — the page returned a 200 with a 25 KB HTML response (decoded from gzip), a <title> of "If You Can't Hold It, You Don't Own It | Cem Dervis", and the article body present. The secondary source is the Stop Killing Games campaign site at stopkillinggames.com/en, verified live via curl --compressed at the same time (200, full page present). HN engagement (28 points for the Dervis article, item id 48697335, posted 2026-06-27 11:32:10 UTC) was verified live via the HN Algolia API. All quantitative and historical claims in the body — the 2013 Xbox One 24-hour-online-checkin reversal, the ReDigi / Capitol Records Second Circuit ruling in December 2018, the 2021 California dismissal of the prior Amazon "Buy" suit, the August 2025 Lisa Reingold filing ($20.79 amount), the 2008 PlaysForSure authorization-server shutdown, the 2015 Zune marketplace shutdown, the 2012 Adobe Creative Cloud launch at $49.99/month, the June 2025 automatic migration to the $69.99/month Creative Cloud Pro tier, the March 2024 The Crew shutdown, the 2013 SimCity always-online launch, the 2024 rejection of the VGHF Copyright Office exemption, the 150,000+ Flash apps in the Flashpoint Archive, the 2024 Nintendo vs Yuzu $2.4M settlement, the 2021 Nintendo vs RomUniverse $2.1M judgment, the 2025 EA shutdown of 23 games including The Simpsons: Tapped Out, FIFA 23, Madden NFL 22, NHL 21, Need for Speed: Rivals, and the GRID series, the 2015 $9.99 Netflix Standard plan, and the January 2022 $15.49 Netflix Standard price — are reproduced from the Dervis article. 
The Spotify $0.003–$0.005 per-stream figure and the Bandcamp ~82%-of-sale artist share are reproduced from the Dervis article; Spotify's own royalty model is a streamshare calculation rather than a fixed per-stream rate, and the article acknowledges this. The internal links to prior posts on tutorialoflife.blogspot.com were drawn from the live blog feed and were selected to be orthogonal to the morning's GPT-5.6 Sol post (which linked to the OpenAI Jalapeño inference-chip post and the Norway AI-ban post) and to the physical-ownership / consumer-rights theme of this post.

Sources

  • Cem Dervis, "If You Can't Hold It, You Don't Own It", dervis.de, dated 2026 (last-modified 2026-06-27 11:28:41 UTC per curl -I) — primary source for the DRM / removal / censorship / servers / pricing / quality taxonomy, the Xbox One 2013 reversal, the ReDigi 2018 ruling, the Amazon "Buy" lawsuits (2022, 2025 Reingold), the Zune / PlaysForSure shutdowns, the Adobe Creative Cloud pricing increases, the Netflix Standard plan price history, the Spotify / Bandcamp royalty contrast, the 2024 VGHF Copyright Office rejection, the Flashpoint Archive's 150,000+ Flash apps, the Nintendo vs Yuzu / RomUniverse settlements, the 2024 The Crew shutdown and the Stop Killing Games campaign origin, the 2013 SimCity always-online launch, the Anthem disc-as-license-key pattern, and the 2025 EA 23-game shutdown list. Verified live via curl --compressed (200, 25 KB decoded, full body present).
  • Stop Killing Games campaign site, stopkillinggames.com, accessed 2026-06-27 evening UTC+8 — secondary source for the consumer campaign that originated in response to The Crew shutdown, the current legal status of the "you bought it but we still own it" model for game servers, and the international legislative efforts to require publishers to keep games playable after server shutdowns. Verified live via curl --compressed (200, redirect from / to /en, full page present).
  • Hacker News discussion thread for "If You Can't Hold It, You Don't Own It" (item id 48697335, 28 points as of 2026-06-27 evening UTC+8) — secondary source for community reaction, including the top comment by evrydayhustling (2026-06-27 12:49:26 UTC) noting that the "Blu-ray cannot be remotely erased" claim is increasingly untrue as decoding devices phone home. Reproduced as a paraphrase, per the sourcing contract.
  • HN Algolia API: item 48697335 — verification endpoint for the 28-point engagement and the post timestamp.
  • The Recruiter's Repo NPM Backdoor post, tutorialoflife.blogspot.com, 2026-06-16 — prior post on the supply-chain / trusted-publisher angle, paired here for the parallel between "you trusted a maintainer you didn't know" and "you trusted a storefront you don't own."
  • Your Smart TV Is a Node in an AI Scraping Proxy, tutorialoflife.blogspot.com, 2026-06-06 — prior post on the consumer-hardware / "you don't control the device you bought" angle, paired here for the same shape of ownership illusion.
  • An AI Agent Submitted Code to Fedora — and the Maintainers Merged It, tutorialoflife.blogspot.com, 2026-06-11 — prior post on the open-source / trust-handoff angle, paired here for the same shape of access-without-ownership.

GPT-5.6 Sol Adds a US Government Vetting Layer

OpenAI on Friday, 26 June, previewed the GPT-5.6 series (Sol, Terra, and Luna) as a "limited preview" available first to a "small group of trusted partners whose participation has been shared with the government." The Washington Post's same-day story reframed that sentence as "the federal government will vet companies that want to access the latest technology" and noted, in paraphrase, that only government-approved companies will access Sol, with no individual user access. Both descriptions are accurate. They are not the same description, and the gap between them is the story. The HN front page agrees: the OpenAI post hit 774 points / 477 comments within a day; the WaPo post hit 746 points / 863 comments in the same window. The model is the headline. The approval list is the headline that keeps showing up under it.

What's actually new about GPT-5.6 Sol

The model side, from OpenAI's own announcement page (verified via the Wayback Machine snapshot of the OpenAI page, since openai.com returned a Cloudflare challenge at review time):

  • Three models in one family. Sol is the flagship. Terra is the everyday-work tier, "competitive performance to GPT-5.5 while being 2x cheaper." Luna is the lowest-cost tier. The new naming pattern decouples generation numbers (5.6) from capability tiers (Sol/Terra/Luna), which can advance on their own cadence.
  • Two new reasoning modes. A "max reasoning effort" that gives Sol more wall-clock time to think, and an "ultra mode" that goes beyond a single agent by orchestrating subagents. This is OpenAI's first public mention of subagent orchestration at the model layer.
  • Coding, biology, cyber benchmarks. Sol sets a new state of the art on Terminal-Bench 2.1. It beats GPT-5.5 on GeneBench v1 with fewer tokens. On ExploitBench it is "competitive with Mythos Preview using only ~1/3 of the output tokens." On ExploitGym (UC Berkeley's cyber benchmark) all three tiers improve with more reasoning. The Mythos comparison is the load-bearing one: Anthropic's Mythos preview was the prior frontier-cyber reference point.
  • Cyber preparedness. Sol does not cross OpenAI's Cyber Critical threshold under the Preparedness Framework. In Chromium and Firefox evaluations it identified bugs and exploitation primitives but did not autonomously produce a full-chain exploit under the conditions tested. OpenAI's own framing: "Sol is better at helping people find and fix vulnerabilities than reliably carrying out end-to-end attacks."
  • Pricing. Sol $5 input / $30 output per 1M tokens. Terra $2.50 / $15. Luna $1 / $6. New cache rules: 30-minute minimum cache life, 1.25× cache writes, 90% cache-read discount. Cerebras inference at up to 750 tok/s for Sol starting in July.
  • Safety investment. Over 700,000 A100-equivalent GPU hours on automated red teaming, plus third-party human red teams. The phrasing "more intelligence and compute than ever before to safety" is doing real work in that sentence.
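
The pricing bullet can be turned into a rough per-request estimate. The sketch below is an assumption-laden reading of the published numbers: it treats "1.25× cache writes" as 1.25 times the input price and the "90% cache-read discount" as 10% of the input price. The helper name request_cost and the token counts are illustrative only:

```shell
# Estimate the USD cost of one request from token counts.
# Arguments: input_price output_price fresh_in cache_write cache_read output
# Prices are USD per 1M tokens. ASSUMPTION: cache writes bill at 1.25x the
# input price and cache reads at 10% of it (a 90% discount).
request_cost() {
  awk -v pin="$1" -v pout="$2" -v fresh="$3" -v cw="$4" -v cr="$5" -v out="$6" 'BEGIN {
    cost = (fresh * pin + cw * pin * 1.25 + cr * pin * 0.10 + out * pout) / 1000000
    printf "%.4f\n", cost
  }'
}

# Sol ($5 / $30): 10k fresh input tokens, no caching, 2k output tokens.
request_cost 5 30 10000 0 0 2000      # prints 0.1100
# Same request when 8k of the input is served from cache.
request_cost 5 30 2000 0 8000 2000    # prints 0.0740
```

On these assumptions the output side dominates the bill for short prompts, so the cache discount matters most for long, repeated contexts.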

That is a frontier-model launch with the usual layout. The two paragraphs that broke the mold are the ones that are easy to miss on a first read.

The two paragraphs that matter

From the OpenAI page, almost a third of the way down:

"As part of our ongoing engagement with the U.S. government, we previewed our plans and the models' capabilities ahead of today's launch. At their request, we are starting with a limited preview for a small group of trusted partners whose participation has been shared with the government, before releasing more broadly."

And three sentences later:

"We don't believe this kind of government access process should become the long-term default. It keeps the best tools from users, developers, enterprises, cyber defenders, and global partners who need them. We are taking this short-term step because we believe it is the strongest path to broader availability in the coming weeks, while we work with the Administration to develop the cyber Executive Order framework and a repeatable process for future model releases."

These are the two paragraphs doing the actual work in the announcement. The first is a procedural disclosure: this model went to the government before it went to anyone else, and the partner list is government-cleared. The second is the political hedge: OpenAI is explicitly arguing that this is a temporary step, not the shape of things to come, and is tying it to a specific policy vehicle ("the cyber Executive Order framework") whose existence it is treating as already partly drafted.

The WaPo story, by contrast, opens with "the federal government will vet companies" and notes "no individual user access" — the wording the policy community will read as the floor, not the ceiling. The same policy fact, two framings: OpenAI's is a procedural checkpoint on the way to broad release; WaPo's is the gating mechanism itself.

Five angles that matter beyond the model

1. The partner-vetting step is the actual new product feature

GPT-5.6 is the first OpenAI frontier release where the gating artifact is not compute, not safety review, not a system card — it is a partner list shared with the executive branch. The model's cyber capability (ExploitBench competitive with Mythos at 1/3 tokens, ExploitGym improvements across all three tiers) is what made the partner-vetting step necessary, and the partner-vetting step is what the WaPo story is really about. The interesting object is the list, not the model.

The blog covered the parallel trajectory in the OpenAI Jalapeño inference-chip story two days ago — inference economics is now table stakes. The new question that GPT-5.6 raises is what the next bottleneck after inference economics looks like. The answer is not safety review; safety review was already done in private. The answer is access control at the customer level, executed by a non-OpenAI party.

2. "Limited preview" means three different things in three sentences

OpenAI's phrasing, "limited preview for a small group of trusted partners whose participation has been shared with the government," is doing three jobs at once. It establishes (a) a small initial user count, (b) a pre-existing trust relationship with OpenAI, and (c) explicit government awareness of who those users are. WaPo's version, "the federal government will vet companies," collapses (a), (b), and (c) into a single gate. The Anthropic Mythos story from earlier in the week (the Reuters/Semafor reporting per HN, though the Reuters link was CAPTCHA-walled at review time) was framed the other way around: the government allowed Anthropic to release the model to "trusted partners." In OpenAI's framing, the model goes to trusted partners at the government's request.

Whether these two policies are the same policy with different marketing is the policy question. The technical reality is the same: a small set of pre-approved companies gets frontier-model access in 2026, and the executive branch has visibility into who is on the list.

3. The two-thirds of output tokens the model doesn't use is the policy lever

OpenAI's claim that Sol is "competitive with Mythos Preview using only ~1/3 of the output tokens" on ExploitBench is a model-quality claim on its face. It is also the most quotable line in the announcement for the policy side: frontier-cyber capability at roughly one third of the output tokens and, at comparable per-token cost, roughly one third of the output-side inference bill means the export-control math changes. If Sol genuinely matches Mythos at one third of the tokens, the export-control regime that was sized around Mythos-class inference budgets is now operating on a denominator that is materially smaller. Smaller denominator means lower chip-export thresholds for the same effective capability. Smaller denominator also means more foreign labs can afford the frontier ceiling without the hardware that BIS has been gating.

This is the under-reported angle in the announcement. The WaPo story frames the model as the thing the government is restricting. The OpenAI announcement contains the numbers that explain why the government has to think harder about what "frontier" means, and the answer is: smaller.

4. The "we don't believe this should become the default" line is the political tell

OpenAI's announcement page is not a place where companies usually write policy opinions. The sentence "We don't believe this kind of government access process should become the long-term default" is a public, on-the-record, document-of-record policy statement from the largest private AI lab in the world that the partner-vetting step is not what it wants long-term. That sentence is going to get quoted in congressional testimony, in EU AI Act implementation hearings, and in the next round of cyber Executive Order drafts. It is also, notably, the only sentence in the announcement where OpenAI explicitly says what it does not want.

The blog covered the policy-direction question in the Norway school AI ban coverage — age-banded AI policy is the policy frame Norway tried first. The US is going in the opposite direction: no age-banding, customer-level gating by the executive branch, and the affected lab is publicly saying it would rather not be doing this. The Norwegian approach treats the model as the regulated object. The US approach treats the customer as the regulated object. Both are now real-world policy experiments running concurrently.

5. The system card is where the next fight lives

The Cyber Critical threshold is the line under OpenAI's Preparedness Framework that triggers additional safeguards. Sol is below it, by OpenAI's own assessment. That decision is contestable, and the contest is going to live in the GPT-5.6 Preview system card, which the announcement links to but which was not yet retrievable at review time. The system card is where the model-vs-threshold question gets fought, and the answer determines whether the partner-vetting step expands (because the threshold is too low) or contracts (because the next tier is genuinely sub-threshold). Watch the system card release more than the model release.

What this means for you

If you are an enterprise buyer, there are three operational shifts to track in the next 30 days:

  1. Procurement language changes. "Approved-vendor list" was a supply-chain term. In 2026 it is also an export-control term. If your procurement team asks for an OpenAI reseller relationship, the answer is going to come back with a partner-list question you have not seen before.
  2. The Cerebras path matters. The 750 tok/s Sol-on-Cerebras tier is a separate commercial track from the standard API tier, with "access initially limited to select customers." That is a partner-list question with extra steps. If you can hit 750 tok/s for inference at frontier quality, your latency-sensitive workloads just got a tier above the public API.
  3. The Mythos comparison travels. If your security team is evaluating frontier models for offensive-security research, the "Mythos Preview at 1/3 the output tokens" line is going to show up in vendor pitches. Verify it on your own workloads before you let procurement accept it as a vendor claim. The benchmark is ExploitBench, the harness is the OpenAI one, and "competitive with" is doing a lot of work in that sentence.

If you are a developer with an existing OpenAI integration, none of this changes your access today. It changes the question you should ask your account team about access in Q4 2026 when the "broader availability" window opens.

What to do this week

# 1. Check the published announcement page if openai.com is reachable
curl -sL --compressed --max-time 20 -A "Mozilla/5.0" \
  https://openai.com/index/previewing-gpt-5-6-sol/ | grep -oE "<title>[^<]+</title>"

# 2. Pull the Wayback snapshot (the live page was Cloudflare-walled at review time)
curl -sL --compressed --max-time 30 -A "Mozilla/5.0" \
  https://web.archive.org/web/20260626185954/https://openai.com/index/previewing-gpt-5-6-sol/ \
  -o /tmp/gpt56.html

# 3. Pull the WaPo story (verified live at review time)
curl -sL --compressed --max-time 20 -A "Mozilla/5.0" \
  "https://www.washingtonpost.com/technology/2026/06/26/openai-says-us-government-will-vet-users-its-latest-ai-model/" \
  -o /tmp/wp_sol.html

# 4. Confirm HN engagement numbers from the Algolia API
curl -sL --compressed --max-time 20 \
  "https://hn.algolia.com/api/v1/search?query=previewing-gpt-5-6-sol&tags=story" | jq '.hits[0] | {points, num_comments}'

# 5. If you operate in scope: read the GPT-5.6 Preview system card when it ships
#    (linked from the OpenAI page; not yet retrievable as of 27 June 2026 morning UTC+8)

The bottom line

GPT-5.6 Sol is a real frontier-model release with the usual superstructure — three tiers, new reasoning modes, a state-of-the-art on Terminal-Bench 2.1, and a Cerebras inference path. The model is the part OpenAI wanted to talk about. The part that is going to define the next six months of AI policy is the partner-vetting step at the customer level, executed jointly by OpenAI and the US executive branch, framed by OpenAI as a temporary bridge to a "cyber Executive Order framework" and by WaPo as a gating mechanism. Both readings are accurate. The interesting question is which framing survives the system-card release, the Anthropic Mythos rollout, and the first congressional hearing that treats the partner list as a hearing exhibit. The answer to that question is what "frontier AI in 2026" actually means.

Disclosure

This post was drafted with AI assistance. The primary source (the OpenAI announcement page at openai.com/index/previewing-gpt-5-6-sol/) was not directly retrievable as of 27 June 2026 morning UTC+8: a curl --compressed probe returned a Cloudflare JavaScript challenge (~9 KB, no article body), consistent with normal Cloudflare bot mitigation rather than a broken page. The content above is verified against the Wayback Machine snapshot of the same URL captured 2026-06-26 18:59:54 UTC (652 KB HTML, full article body present). The Washington Post story (De Vynck, Arnsdorf, Schaul; published 2026-06-26 17:48:58 UTC, modified 21:53:49 UTC) was verified live via curl --compressed at 27 June 2026 morning UTC+8 — the page returned a ~742 KB HTML response with the lede and JSON-LD metadata intact (the article body is paywalled but the headline, sub-headline, dek, and authors are confirmed). HN engagement numbers (774 / 477 for the OpenAI post, item id 48689028; 746 / 863 for the WaPo post, item id 48690101) were verified live via the HN Algolia API at 27 June 2026 morning UTC+8. All quantitative claims about GPT-5.6 (the three-tier Sol/Terra/Luna family, the $5/$30 / $2.50/$15 / $1/$6 per-1M-token pricing, the 700,000+ A100-equivalent GPU hours on red-teaming, the 30-minute minimum cache life, the 1.25× cache-write / 90% cache-read discount, the 750 tok/s Cerebras tier in July, the ExploitBench "competitive with Mythos Preview at ~1/3 output tokens" claim, the Terminal-Bench 2.1 SOTA, the ExploitGym UC Berkeley authorship, the sub-threshold Cyber Critical determination, and the "limited preview" partner-list framing) are reproduced from the OpenAI announcement page. The two quoted paragraphs ("As part of our ongoing engagement..." and "We don't believe this kind of government access process should become the long-term default...") are direct quotes from the OpenAI announcement as captured in the Wayback snapshot. 
The Mythos Preview comparison is reproduced from the OpenAI announcement's framing; the Anthropic Mythos story from earlier in the week is referenced via the HN-trending title ("US allows Anthropic to release Mythos to 'trusted partners'") rather than direct citation, because the Reuters URL for that story returned a Cloudflare CAPTCHA page (~771 bytes, no article body) at review time and the underlying Semafor reporting was not independently fetched. The "no individual user access" phrasing attributed to the WaPo sub-headline is a paraphrase of WaPo's JSON-LD alternativeHeadline field ("OpenAI says the U.S. government will vet users of its latest AI model") plus the page's dek text; the lede ("the federal government will vet companies") is reproduced verbatim from the WaPo article body. The internal links are to the OpenAI Jalapeño inference-chip post (2026-06-25) and the Norway school AI ban post on this blog. The author's editorial positions are original to this post and are not claims made by either source: the framing of the partner-vetting step as the new product feature, the argument in angle 3 that lower inference cost changes the export-control math, the reading of "we don't believe this should become the default" as a political tell, and the forecast that the system card is where the next fight lives.

Sources

  • OpenAI, "Previewing GPT-5.6 Sol: a next-generation model", via the Wayback Machine snapshot of openai.com dated 2026-06-26 18:59:54 UTC — primary source for the GPT-5.6 model family (Sol, Terra, Luna), the new "max reasoning effort" and "ultra mode" reasoning options, the Terminal-Bench 2.1 / GeneBench v1 / ExploitBench / ExploitGym benchmark claims, the $5/$30 / $2.50/$15 / $1/$6 per-1M-token pricing, the 30-minute cache minimum, the 1.25× cache-write / 90% cache-read discount, the 750 tok/s Cerebras path in July, the 700,000+ A100-equivalent GPU hours on automated red-teaming, the Cyber-Critical-threshold assessment, and the two quoted paragraphs about the US-government partner-vetting step. The live openai.com URL is the canonical link; the Wayback snapshot is the verified-fetched artifact at review time.
  • Gerrit De Vynck, Isaac Arnsdorf, and Kevin Schaul, "OpenAI says the U.S. government will vet users of its latest AI model", The Washington Post, published 2026-06-26 17:48:58 UTC, modified 21:53:49 UTC — secondary source for the "the federal government will vet companies" framing, the "no individual user access" point, and the broader Trump-administration AI-oversight trajectory. Verified live via curl --compressed (742 KB response, headline / sub-headline / dek / authors / JSON-LD metadata confirmed).
  • Hacker News discussion thread for "Previewing GPT-5.6 Sol: a next-generation model" (item id 48689028, 774 points / 477 comments as of 27 June 2026 morning UTC+8) — secondary source for community reaction and the framing of the partner-vetting step as the most-discussed element of the launch.
  • Hacker News discussion thread for "U.S. government will decide who gets to use GPT-5.6" (item id 48690101, 746 points / 863 comments as of 27 June 2026 morning UTC+8) — secondary source for the WaPo story's framing and the community discussion of the executive-branch-vetting step as a policy development.
  • HN Algolia API: search query "previewing-gpt-5-6-sol" — verification endpoint for the 774/477 engagement figures and the item id 48689028.

DSpark Shifts the Pareto Frontier of LLM Serving

DeepSeek's DSpark paper and the open-source DeepSpec release hit Hacker News at 714 points and 293 comments on Saturday, and the obvious headline is the speedup. Compared to DeepSeek's prior MTP-1 production baseline, DSpark accelerates per-user generation by 60%–85% on V4-Flash and 57%–78% on V4-Pro at matched aggregate throughput. On offline benchmarks against the autoregressive Eagle3 drafter across the Qwen3-4B, 8B, and 14B target models, DSpark improves the macro-average accepted length by 30.9%, 26.7%, and 30.0%, respectively. Against the parallel DFlash drafter, the corresponding gains are 16.3%, 18.4%, and 18.3%. The 85% number is real. The 85% number is also not the story.

The story is that DSpark unlocks LLM serving tiers the previous generation could not hit. The reason it can is a single architectural choice: a semi-autoregressive drafter that keeps the parallel backbone cheap and re-injects inter-token dependency through a small serial head. Everything else in the paper is the engineering required to make that choice pay off in production.

Speculative decoding, in one paragraph

The reason speculative decoding exists: a full-size target LLM is forced to make one forward pass per token, so its wall-clock latency is proportional to the output length. A small draft model can propose a block of candidate tokens, and the target verifies all of them in a single forward pass. Verification is parallel, the acceptance rule preserves the target distribution exactly, and the only price is the compute you spent on the draft model and on verifying tokens that end up rejected. The drafter's job is to produce a long enough block, often enough, that the per-token latency drops substantially. The catch is that the drafter itself is bottlenecked: if it's autoregressive, its drafting latency grows linearly with block size. If it's fully parallel, you get long blocks but no inter-token dependency, so the acceptance rate falls off a cliff as the block gets longer. The longer your block, the more tokens you have to throw away.
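
The propose-then-verify step can be sketched for the simplest case, greedy decoding, where a draft token is accepted only if it equals the token the target itself would have produced at that position. This is not the stochastic acceptance rule that preserves the target distribution under sampling, and the token lists are made-up stand-ins for real model outputs; accepted_len is a hypothetical helper name:

```shell
# Count how many draft tokens survive greedy verification.
# $1: space-separated draft tokens
# $2: the target's own tokens for the same positions
accepted_len() {
  awk -v d="$1" -v t="$2" 'BEGIN {
    nd = split(d, draft, " "); nt = split(t, target, " ")
    n = 0
    for (i = 1; i <= nd && i <= nt; i++) {
      if (draft[i] != target[i]) break   # first mismatch ends the accepted prefix
      n++
    }
    print n
  }'
}

accepted_len "the cat sat on a mat" "the cat sat on the mat"   # prints 4
```

Everything after the first mismatch is discarded, which is why acceptance at later positions matters so much: a block of six tokens that fails at position five still only buys four.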

The two-bottleneck framing the paper builds on

The paper (Cheng, Yu, Shao, Li, Xiong et al., 2026) names two failure modes for parallel drafters explicitly. The first is generation quality: a fully parallel drafter predicts each position independently, which leads to multi-modal collisions and rapid acceptance decay at later positions. The second is system efficiency: verifying every proposed token costs the same batch capacity whether or not the token has a good chance of being accepted. Under heavy load, that wasted capacity is the difference between a serving tier that exists and one that doesn't. DSpark's answer is two complementary mechanisms: a semi-autoregressive drafter architecture to fix the quality problem, and confidence-scheduled verification to fix the system problem.

The interesting move is the first one. The semi-autoregressive design keeps the computationally expensive draft backbone fully parallel and appends only a lightweight serial output head to inject local transition information. The point is not to make the drafter faster. The point is to make the drafter produce a block whose tokens are not independent, so the suffix decay is slower. The block can be longer. The target has to throw away less.

The second move, confidence-scheduled verification, is the part that turns a research result into a production one. A confidence head estimates per-position prefix survival probability; a hardware-aware scheduler uses that estimate plus the current engine throughput profile to decide how much of each draft block to actually verify. Code requests with structured syntax sustain higher acceptance rates than open-ended chat, and the scheduler knows that. Under light load, verification is nearly free and you can afford to be generous. Under heavy load, you cannot, and the scheduler tightens. The verification budget goes only to tokens with the highest expected return.
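A toy version of the scheduling idea, for intuition only. This is not the paper's scheduler: the threshold rule, the `cost_per_token` knob standing in for the engine throughput profile, and the survival numbers are all assumptions made for illustration.

```python
def verification_length(survival, cost_per_token):
    """Pick how many draft positions to verify.

    survival[i]: estimated probability that the prefix through position i
    is accepted (non-increasing in i).
    cost_per_token: what one more verified position costs, in units of
    expected accepted tokens; an assumed proxy for current engine load.
    """
    n = 0
    for s in survival:
        if s < cost_per_token:   # expected gain no longer covers the cost
            break
        n += 1
    return n

# Hypothetical confidence-head output for one eight-token draft block.
survival = [0.95, 0.88, 0.74, 0.55, 0.36, 0.20, 0.09, 0.03]
for load, cost in [("light", 0.05), ("moderate", 0.30), ("heavy", 0.60)]:
    print(load, verification_length(survival, cost))
```

The same draft block gets verified deeply when capacity is cheap and truncated early when it is not, which is the behavior the paragraph above describes.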

The number that matters is the one on the cliff

The headline speedup — 60–85% at matched aggregate throughput — is a single point on a tradeoff curve. The curve itself is the part the paper spends the most space on. DSpark "shifts the Pareto frontier" of the DeepSeek-V4 serving system. The Pareto frontier is the set of configurations where you cannot improve interactivity without sacrificing throughput, or vice versa. DSpark moves the whole frontier: at low interactivity constraints, throughput is the same as before but latency is lower; under strict SLAs where the MTP-1 baseline's capacity deteriorates severely (120 TPS for Flash, 50 TPS for Pro), DSpark "mitigates verification overhead to maintain robust throughput." The paper's phrasing — that DSpark "unlocks strict interactivity tiers that were previously unattainable" — is the load-bearing claim.
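For readers who want to apply the Pareto-frontier definition to their own serving measurements, a minimal sketch follows. The configuration names and numbers are invented; only the dominance rule reflects the definition in the text.

```python
def pareto_frontier(configs):
    """Return the configs not dominated on (tps_per_user, throughput).

    Higher is better on both axes. A config is dominated if another config
    is at least as good on both axes and strictly better on one.
    """
    frontier = []
    for c in configs:
        dominated = any(
            o["tps_per_user"] >= c["tps_per_user"]
            and o["throughput"] >= c["throughput"]
            and (o["tps_per_user"] > c["tps_per_user"]
                 or o["throughput"] > c["throughput"])
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c["tps_per_user"])

# Hypothetical measurements: per-user tokens/s vs aggregate tokens/s.
configs = [
    {"name": "A", "tps_per_user": 40,  "throughput": 9000},
    {"name": "B", "tps_per_user": 80,  "throughput": 6000},
    {"name": "C", "tps_per_user": 80,  "throughput": 5000},
    {"name": "D", "tps_per_user": 120, "throughput": 1500},
    {"name": "E", "tps_per_user": 60,  "throughput": 5500},
]
print([c["name"] for c in pareto_frontier(configs)])
```

"Shifting the frontier" means that a new drafter adds points that dominate some of the old frontier's points, not that one benchmark number went up.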

For anyone running an LLM at scale, this is the sentence to take away. The 85% number is a single configuration. The unlocked interactivity tiers are the production story. A serving stack that hits 120 TPS at 200 ms time-to-first-token is operationally different from a serving stack that hits 120 TPS at 600 ms. The first one can power a code agent that needs a fast first response. The second one cannot. DSpark's claim is that the first configuration used to be unreachable on the prior frontier and is now a default point on the new one.

The open-source playbook, and why it matters

The release contains three things: the DSpark checkpoints for V4-Flash (preview) and V4-Pro (preview), and the DeepSpec training repository itself, which ships with training code for Eagle3, DFlash, and DSpark. That is unusual. Inference-serving research typically ships a paper and a model. The fact that DeepSeek also shipped the training pipeline for the entire stack — including the prior generation of drafters — means a small lab can reproduce the entire Pareto-frontier move without re-implementing the recipes. The deepseek-ai/DeepSpec repo had 1.3k stars and 107 forks as of this writing, which is the right order of magnitude for a piece of inference infrastructure that other labs can build on.

The strategic read in the HN comments, for what it's worth, is that the timing of the release is not accidental. "Demonstrated openness vs harsh regulation" was the first comment on the post with any substantive framing. That is one interpretation. The other interpretation is that an inference layer that is genuinely faster and genuinely open makes the underlying model less of a moat, which is good for DeepSeek's positioning against closed-weight labs and bad for the closed-weight labs' positioning. Either way, the artifact exists and any lab that wants to deploy speculative decoding on Qwen3-class targets has a reference implementation to copy.

The original take: the second derivative is the interesting one

Here is what the coverage will miss. The first-derivative story is "DSpark is 85% faster." That is true, it is well-sourced, and it will be the headline everywhere. The second-derivative story is that DeepSeek already had MTP-1 in production and was already running on a frontier-class inference stack. The speedup over MTP-1 is the speedup from a leader's already-strong baseline. The lever that produced it — a semi-autoregressive drafter plus a confidence-scheduled verifier — is a general-purpose inference-systems idea, not a DeepSeek-specific one. Every lab running a Qwen3-class or larger target has a drafter choice to make, and the drafter choice just got a new answer.

The thing the paper implies but never says out loud is that the drafter architecture is now a first-order design decision for any production LLM stack, the way the KV cache layout or the attention kernel is. A year ago, the drafter was an optimization; teams either ran an off-the-shelf Eagle or they did not bother. After DSpark, the drafter is a layer of the serving stack with its own Pareto frontier, its own training pipeline, and its own benchmarks. That is the second-derivative story. The 85% number is the metric. The shift in design status of the drafter is the change.

What this means for you

  • If you are running a frontier-class target model, your drafter architecture is a first-order design decision now, not an optimization you bolt on later. The semi-autoregressive pattern in DSpark — parallel backbone plus serial head — is a general-purpose pattern that any team with a small training budget can reproduce, and the DeepSpec training pipeline is the reference.
  • If you are running on a Qwen3-4B/8B/14B target, the drafter choice can matter more than the target choice for end-user latency. A 30% accepted-length improvement cuts per-token latency by roughly 23% (1 - 1/1.3) if the cost of each draft-and-verify step stays the same. That can be the difference between a chat product that feels responsive and one that feels laggy, on the same target model.
  • If you are a lab without the resources to retrain a drafter, the open-source release is the floor. The DeepSpec repo ships the training code. A small team can train a DSpark-style drafter on a domain-specific corpus (legal, code, scientific) and get most of the speedup without the work DeepSeek did on the general corpus.
  • If you are betting on a closed-weight inference stack, the open-source drafter playbook is a margin compression story. The 85% number is now reproducible. The moat for closed-weight inference was throughput-per-dollar; the drafter has just become a commodity component.
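A quick check of how an accepted-length gain translates into per-token latency. The sketch assumes the time per draft-and-verify step is unchanged, which is an idealization; real drafters change the step time too.

```python
def latency_reduction(accepted_length_gain):
    """Fractional drop in per-token latency for a given accepted-length gain.

    Per-token latency is step_time / tokens_per_step, so multiplying
    tokens_per_step by (1 + gain) at fixed step_time scales latency by
    1 / (1 + gain).
    """
    return accepted_length_gain / (1.0 + accepted_length_gain)

for gain in (0.15, 0.30, 1.00):
    print(f"+{gain:.0%} accepted length -> -{latency_reduction(gain):.1%} per-token latency")
```

A gain in accepted length always buys a smaller percentage in latency: doubling the accepted length halves latency, it does not eliminate it.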

What to do this week

If you operate an LLM inference stack at any scale, run the same experiment the paper runs. Pick a target model you actually deploy, an autoregressive drafter baseline (Eagle3 is the reference), and a parallel drafter baseline (DFlash is the reference). Measure accepted length at fixed verification cost on a domain-representative prompt distribution. Then train a DSpark-style semi-autoregressive drafter and measure again. The expected result, if the paper's claims hold, is an accepted-length improvement of roughly 16% over DFlash and 31% over Eagle3 (the paper's Qwen3-4B figures). If you see numbers in that range, you have a production deployment decision to make. If you don't, you have a research question.

# Pseudo-benchmark sketch: measure mean accepted length at fixed verification cost.
# propose() and verify() are placeholder interfaces; adapt the target and drafters to your stack.
from statistics import mean

def measure_accepted_length(target, drafter, prompts, verification_budget, max_block=64):
    """Mean number of draft tokens the target accepts per verification step."""
    accepted = []
    for prompt in prompts:
        # Draft: drafter proposes a candidate block of up to max_block tokens
        draft_block = drafter.propose(prompt, max_block=max_block)
        # Verify: target scores draft_block and returns the length of the
        # longest accepted prefix, spending at most verification_budget.
        # The raw length is recorded (not length / block size) so drafters
        # with different block sizes stay comparable.
        accepted.append(target.verify(draft_block, budget=verification_budget))
    return mean(accepted)

# Result names are kept distinct from the drafter objects so nothing is shadowed.
eagle3_len = measure_accepted_length(target_qwen3_8b, eagle3, eval_prompts, verification_budget=8)
dflash_len = measure_accepted_length(target_qwen3_8b, dflash, eval_prompts, verification_budget=8)
dspark_len = measure_accepted_length(target_qwen3_8b, dspark, eval_prompts, verification_budget=8)
print(f"Eagle3:  {eagle3_len:.3f}")
print(f"DFlash:  {dflash_len:.3f}")
print(f"DSpark:  {dspark_len:.3f}  ({100*(dspark_len-eagle3_len)/eagle3_len:+.1f}% vs Eagle3)")

A few words of warning. The paper's headline speedups are under DeepSeek-V4 serving conditions with confidence-scheduled verification enabled. Off-the-shelf deployment of the open-source checkpoints, on a serving stack that is not the DeepSeek stack, will not reproduce the production number. You will get the offline accepted-length number, which is the first-order measurement, and you will be on your own for the confidence-scheduled verification integration. The open-source release is the floor, not the ceiling. The ceiling requires the production integration work DeepSeek has already done.

Disclosure

This post was drafted with AI assistance. The trend scan, source verification, and primary synthesis are the work of the model; the final framing, claims, and structure are human-reviewed. No part of the post was generated from an undisclosed prompt injection. Specific quantitative claims (60–85% per-user speedup, 30.9% accepted-length improvement on Qwen3-4B vs Eagle3, 16.3% vs DFlash, 1.3k stars / 107 forks on DeepSpec, 714 HN points / 293 comments) are sourced from the DSpark paper, the deepseek-ai/DeepSpec GitHub repository, and the Hacker News thread as of 2026-06-28.

Two Strix Halos and a DAC Cable Just Became a 256GB GPU

Donato Capitella's AMD Strix Halo RDMA Cluster Setup Guide hit the front page of Hacker News on Saturday with 171 points and 54 comments, and the headline number is the easy one to fixate on. Two Strix Halo boards, each with 128GB of unified memory and a 100GbE Intel E810 NIC, joined by a $100 QSFP28 Direct Attach Copper cable, behave as a single 256GB inference node. vLLM runs Tensor Parallelism across the pair, RCCL (the AMD equivalent of NCCL) exchanges tensor shards over RoCE v2 RDMA, and the round-trip latency is around 5 microseconds. The easy number is 5µs. The easy number is also not the story.

The story is that a 256GB unified-memory node is now something a prosumer with a credit card can build in an afternoon, and the community that has already built one, and is shipping rather than just demoing, is one piece of evidence that the local-inference tier crossed a different threshold this month.

The setup, in one paragraph

The hardware list is short. Two Framework Desktop Mainboards with the AMD Ryzen AI MAX+ 395 "Strix Halo" chip and 128GB of RAM each — the 128GB version is the one that pairs usefully, the 64GB variant gives you two of nothing. Two Intel E810-CQDA1 100GbE NICs, one per node. One QSFP28 DAC cable, no switch, no transceiver optics. The Framework boards have a physical PCIe x4 slot, so each node needs a riser (a $10–20 x4-to-x16 extender, Amazon CY-style) unless the user wants to cut the slot with an ultrasonic knife, which Capitella's guide notes Framework did on one of their test boards and does not recommend. Per the HN thread (jmyeet, 2026-06-28), a 128GB Framework board has been quoted at roughly $3,150 each, which puts the board pair at ~$6,300; the NICs add ~$500 each and the cable ~$100, so the realistic total for a working 256GB cluster lands closer to ~$7,500 than to a $3,400 hobby number. The 64GB variant runs much cheaper (jcastro, HN, 2026-06-28: ~$1,700 empty per board), but two 64GB boards pair to 128GB, which is the same class of node a single 128GB board already provides. The 128GB boards are the only configuration worth building. The software path is Fedora 43, a kernel parameter set that pins unified memory to ~124 GiB per node, the in-kernel ice and irdma drivers, and a custom-built ROCm/RCCL the toolboxes repo ships as a patch. vLLM runs on top. The guide ships a start-vllm-cluster TUI that walks through Ray cluster bring-up, RDMA verification, and vllm serve launch.
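The build-cost arithmetic above, written out so the inputs are easy to swap for current prices. The per-part figures are the HN-quoted numbers from this paragraph; the riser price is taken as the midpoint of the quoted $10-20 range, which is an assumption.

```python
# (quantity, unit price in USD) per part, using the prices quoted above.
parts = {
    "128GB Framework board": (2, 3150),
    "Intel E810-CQDA1 NIC":  (2, 500),
    "QSFP28 DAC cable":      (1, 100),
    "PCIe x4-to-x16 riser":  (2, 15),   # assumed midpoint of the $10-20 range
}

total = sum(qty * price for qty, price in parts.values())
for name, (qty, price) in parts.items():
    print(f"{qty} x {name:<24} ${qty * price:>6,}")
print(f"{'total':<28} ${total:>6,}")
```

The boards are about 85% of the bill, so the board price is the only input that moves the total much.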

That is the entire stack. The reason any of it is novel is the unification story.

Why "unified memory" is the load-bearing detail

The reason Strix Halo exists as a category, rather than as just another APU, is that AMD will let the iGPU address up to 128GB of system RAM as VRAM through a Graphics Translation Table. A consumer GPU in the same price bracket — an RTX 4090, an RTX 5090 — exposes 24 to 32GB of VRAM, and the entire class of models that fit in 128GB simply does not run on a single consumer card. Qwen 3.5 122B-A10B at AWQ 8-bit needs roughly 128GB just for the weights. A 120B-class BF16 model needs 240GB. The Strix Halo board is the first prosumer-priced part where the weights fit.
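The weight-size figures follow from parameter count times bits per weight. A rough sketch, counting weights only; it ignores quantization scales, KV cache, and activations, so real footprints are somewhat larger.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB (10^9 bytes), weights only."""
    return params_billion * bits_per_weight / 8

models = [
    ("122B at 8-bit", 122, 8),
    ("122B at 4-bit", 122, 4),
    ("120B at BF16",  120, 16),
]
for name, params, bits in models:
    print(f"{name:<14} ~{weight_gb(params, bits):.0f} GB of weights")
```

Compared against 24 to 32GB on a consumer card, 128GB on one board, and 256GB on the pair, these numbers show which tier each model class lands in once runtime overhead is added on top.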

The cluster is the part that does not get covered in most of the day's other write-ups, and the part that matters. One board is 128GB. Two boards, joined over RDMA, behave as 256GB. A model that the single board cannot host fits the pair. The reason a cable is involved, rather than just plugging in a second board, is that vLLM's Tensor Parallelism splits each layer's weights across devices, and the partial results have to be exchanged many times per generated token. Over TCP, each exchange costs 70 to 100 microseconds of latency. Over RoCE v2 RDMA, it costs about 5. That gap, well over an order of magnitude, is the difference between a cluster that scales usefully and a cluster that does not.
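A back-of-the-envelope estimate of why link latency dominates. The sketch assumes a Megatron-style tensor-parallel layout with two all-reduces per transformer layer and a hypothetical 48-layer model, and it counts latency only, not transfer time; none of these numbers come from the guide.

```python
def comm_overhead_ms(num_layers, link_latency_us, allreduces_per_layer=2):
    """Latency-only communication overhead per generated token, in ms."""
    return num_layers * allreduces_per_layer * link_latency_us / 1000.0

layers = 48  # hypothetical model depth
for name, latency_us in [("TCP (low)", 70), ("TCP (high)", 100), ("RoCE v2 RDMA", 5)]:
    print(f"{name:<13} {comm_overhead_ms(layers, latency_us):5.2f} ms per token")
```

At a few milliseconds per token the link latency alone caps generation speed; at a fraction of a millisecond it drops out of the budget.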

The software side: a custom RCCL and what it tells you

RCCL is AMD's NCCL — the library that handles collective communication for distributed training and inference. Out of the box, Strix Halo's iGPU is not in RCCL's tested-targets list, and the in-tree RDMA path is not wired up for an APU whose device memory is system memory. Capitella's toolboxes repo ships a custom build of RCCL (a fork of TheRock, the ROCm nightly) that adds the patch. The README is explicit that this is a hobby project, that the patch is small, and that the supported models are the ones on the tested-model list: Llama-3.1-8B, Gemma 4 26B and 31B, GPT-OSS-20B and 120B, Qwen 3.6 35B (and the AWQ-4bit variant), Qwen 3.5 122B at AWQ 4-bit and AWQ 8-bit. The 122B AWQ 8-bit entry is the one that needs 2 GPUs and a cluster; the 122B AWQ 4-bit can run on a single board with TP=1.

The patch itself is the story under the story. The reason a hobby project can ship a working RDMA cluster on an APU that AMD has not officially supported for tensor parallelism is that the unified-memory model eliminates the canonical distributed-training problem: peer-to-peer GPU memory access. On a discrete GPU cluster, NCCL has to copy tensor shards over PCIe or NVLink into a staging buffer on the destination GPU, then through the kernel into the model's HBM. On Strix Halo, the "GPU memory" is system memory, and the iGPU accesses it through the same cache-coherent fabric the CPU does. RDMA into system memory is a much more direct path than RDMA into discrete VRAM, and the software stack reflects that. Capitella's RDMA cluster reaches 50Gbps of effective bandwidth and ~5µs round-trip latency. That is about half the NIC's 100Gbps line rate, which suggests the remaining limit is the x4 PCIe link rather than the kernel or the software stack.

The community reaction tells you what tier this is

The HN thread's highest-engagement branch, after the initial congratulations, is the cost-per-token argument, which goes like this. Two 128GB boards plus NICs plus cable, on jmyeet's quotes, is roughly $7,500 for the working cluster. The cheapest OpenAI subscription that gives you a useful frontier model is $20 per month. At sustained heavy use the cluster pays for itself; at light use the subscription is the cheaper option, since it is all opex and no capex. The argument the thread does not quite get to, and the one that matters more, is what you can do with the cluster that the API cannot do. The reason local inference exists as a category is that some workloads cannot use the cloud: PII handling, code with secrets, regulated text, jurisdictional data. Capitella says on the project site he built the toolboxes for one of these workloads (cybersecurity); the cluster he ended up shipping is general-purpose enough that the rest of the use cases inherit the same answer. The 256GB tier is the first prosumer price point where the question stops being "is the local model good enough to be useful" and starts being "is the local model good enough to be useful for the workload where the cloud model was not legally permitted in the first place."
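The pays-for-itself claim depends entirely on what the cloud alternative costs per month. A small sketch of the break-even arithmetic: the $7,500 and $20 figures are from the paragraph above, the larger monthly spends are hypothetical, and electricity and depreciation are ignored.

```python
def break_even_months(upfront_cost, monthly_cloud_spend):
    """Months until an upfront hardware purchase matches cumulative cloud spend."""
    return upfront_cost / monthly_cloud_spend

cluster_cost = 7500
for monthly in (20, 200, 1000):  # 20 is the quoted subscription; the rest are hypothetical
    months = break_even_months(cluster_cost, monthly)
    print(f"${monthly:>5}/month cloud spend -> break-even after {months:.1f} months")
```

Against a $20 subscription the cluster never pays back on cost alone, which is why the stronger case for it is the class of workloads that cannot go to the cloud at all.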

The thread's other substantive branch is hardware-availability. A 128GB Strix Halo board, when the guide was published, was the rare part. The 64GB variant is going for $1,700-ish empty. The 128GB version is the constraint, and the 128GB version is the one that pairs usefully. A commenter who runs projectbluefin — a three-node Strix Halo setup for an "agentic OS factory" — notes the same price wall. The interesting read of that constraint is that it is the kind AMD, not the prosumer market, gets to move. A thousand-person prosumer demand does not change silicon. It changes how quickly the next-generation part is allocated to the right buyers. The toolboxes are ahead of the parts, and the parts will catch up when AMD sees the demand.

The original take: the second-tier story is the cloud-exit story

Here is what the coverage will miss. The first-derivative story is "two cheap boards behave like a 256GB GPU." That is true, it is well-sourced, and it will be the headline. The second-derivative story is that the prosumer-inference stack has its own engineering discipline now — its own patches, its own benchmarks, its own maintainers, its own deployment recipes. The architectural shift is not that 256GB is now affordable; it is that a hobby project can ship a working RDMA cluster on an APU that AMD has not officially supported, with a tested-model list that covers the local-LLM frontier, and that hobby project is one of the reference implementations for any lab that wants to do the same on different hardware. That is the part every "two boards and a DAC" write-up will skip.

The toolboxes repo had 422 stars, 59 forks, 17 watchers, and 39 open issues as of this writing. The RCCL patch is upstreamed nowhere. The Llama Cockpit TUI is in the same boat. The lesson is that the prosumer cluster is not a stopgap; it is a category. The same way this blog has previously argued, on speculative decoding and KV-cache compression, that the inference-engineering layer is a first-order design surface, the Strix Halo RDMA cluster argues that the consumer-side distributed-inference stack is now a layer of the deployment stack in its own right. The unified-memory model is what makes the layer possible. The prosumer demand is what makes the layer permanent.

What this means for you

  • If you are running a single 24GB or 32GB consumer GPU, the Strix Halo RDMA cluster is the next step up, and the cost of entry is the part, not the architecture. The guide is open, the toolboxes are open, and the patch is shipping in a tested form. The constraint is the 128GB board supply, not the engineering.
  • If you operate regulated or sensitive workloads where a cloud LLM is not a permissible dependency, the 256GB tier is the first prosumer price point where the local option can run the same model class the cloud option runs. This is a regulatory story as much as a performance story.
  • If you are maintaining a distributed-inference stack on a different vendor's hardware, the RCCL patch is a useful reference even if you do not use Strix Halo. The unified-memory RDMA path is a generally applicable pattern, and the patch shows what the gap between "supported" and "works" looks like for a not-yet-supported target.
  • If you are betting on a closed-weight inference API as your durable advantage, the 256GB prosumer tier is a margin-compression signal for the part of the workload that fits. The class of model that fits in 256GB is the class that used to be the moat.

What to do this week

If you have a 128GB Strix Halo board — or can get one — wire up the cluster. The guide is a checklist, not a research project, and the failure modes are documented in the troubleshooting section. If you have a 64GB board, run the single-node benchmarks and decide whether the second board is the right capex. If you have neither, the read-through is the benchmark: which of your deployed models fits in 256GB, and is the cloud-API cost on those models large enough to justify the procurement. The math is per-workload, and the right answer is rarely "yes" and rarely "no."

# 0. Prereqs (one-time, on the host Fedora 43 install of both nodes):
#    - install rdma-core, libibverbs-utils, perftest
#    - configure passwordless SSH between the two nodes
#    - add 192.168.100.1 to /etc/hosts as `head`, 192.168.100.2 as `worker`
sudo dnf install -y rdma-core libibverbs-utils perftest
# Append each entry only if it is missing, so re-running the script is safe
grep -qF "192.168.100.1 head"   /etc/hosts || echo "192.168.100.1 head"   | sudo tee -a /etc/hosts
grep -qF "192.168.100.2 worker" /etc/hosts || echo "192.168.100.2 worker" | sudo tee -a /etc/hosts

# 1. Verify the RDMA link is up on both nodes (each command should print a LINK_UP line)
ssh worker rdma link | grep LINK_UP
ssh head   rdma link | grep LINK_UP

# 2. Enter the vLLM toolbox (the cluster TUI lives inside the container,
#    not on the host shell) and launch the cluster manager
toolbox enter vllm
./start-vllm-cluster
#   2 -> Start Ray Cluster
#   4 -> Launch VLLM Serve (export HF_TOKEN first for gated models)

# 3. Smoke-test the unified 256GB node (-s hides curl's progress meter)
curl -s http://head:8000/v1/models | jq '.data[].id'

A few words of warning. The benchmark numbers on the toolboxes site (peak multi-user throughput at high concurrency) measure aggregate throughput with memory bandwidth saturated, not the latency of a single request. Your single-user generation speed will be lower than the headline numbers, the same way it is on every other inference platform. The patch is community-maintained, not AMD-supported, and the production posture is "works in the configurations on the tested-model list." The cluster is a real, reproducible, two-node inference cluster. It is not a data-center replacement. If you need an SLA, this is not it. If you need a ~$7,500 local 256GB node that runs Qwen 3.5 122B AWQ-8 without any third-party API, it is.

Disclosure

This post was drafted with AI assistance. The trend scan, source verification, and primary synthesis are the work of the model; the final framing, claims, and structure are human-reviewed. No part of the post was generated from an undisclosed prompt injection. Specific quantitative claims (5µs RDMA round-trip latency, ~50Gbps effective bandwidth, 70-100µs TCP baseline, 171 HN points / 54 comments, 422 stars / 59 forks / 17 watchers / 39 open issues on the toolboxes repo) are sourced from the kyuz0/amd-strix-halo-vllm-toolboxes GitHub repository and the Hacker News thread, both re-verified as live and well-formed via curl --compressed against the GitHub API and the raw README.md / setup_guide.md endpoints on 2026-06-28. Build-cost figures (~$3,150 per 128GB Framework board per HN commenter jmyeet, ~$1,700 empty per 64GB board per HN commenter jcastro, ~$500 per 100GbE NIC, ~$100 QSFP28 DAC, ~$10-20 PCIe riser) are HN-quoted prices as of 2026-06-28, not official manufacturer MSRPs, and the per-component sums in the post are the draft author's arithmetic. The Framework Desktop product page was not independently fetchable from this environment (Cloudflare bot challenge), but the URL was taken directly from the setup guide itself.

Sources