Programming guides for beginners.
Any comments are welcome.
I hope it helps! Thanks for dropping by.

Wednesday, June 24, 2026

Swift Package Index Joins Apple. Who Indexes the Indexer?

Ted Kremenek, Dave Verwer, and Sven A. Schmidt published a short post on the Swift Package Index blog on 23 June 2026 with the kind of headline that sounds like a finale: Swift Package Index joins Apple. SPI, the search engine and metadata index that has been the de facto discovery layer for the Swift package ecosystem since 2020, is now an Apple project. The full SPI team, including Verwer and Schmidt, are now Apple employees. SPI Operations Limited, the UK company that operated the site (registered in England and Wales, company number 13466692; the corporate structure Dave Verwer built so the project could take sponsorships and run a real payroll), is now part of Apple.

This is being read in two opposite ways, and both readings are correct.

The optimistic reading: SPI finally has the money and the people to do the things a community project could not. The site has indexed 10,000+ Swift packages. It ran more than 3.5 million compatibility builds in the last year, across a CI matrix that includes macOS, iOS, tvOS, watchOS, Linux, visionOS, WebAssembly, and Android. That is a real load, and it has been running on community donations, sponsor slots, and Dave Verwer's own time for years. One HN commenter (dragon-hn) noted that Verwer also just handed off ownership of his iOS Dev Weekly newsletter, which is consistent with a full transition: the man who ran the project is now spending the same hours at Apple.

The skeptical reading: an Apple-controlled package index is an Apple-controlled package index. jshier, who works on the Swift toolchain, posted the day's most quoted comment: "Not optimistic here. While I'm glad the SPI guys are getting paid (that is, a full time job), Apple is pretty bad at open source and developer services both, and they explicitly call out developer identity as a future direction, which doesn't fill me with hope." Another commenter (classified) put it more starkly: "And there I was hoping the Swift ecosystem could emancipate itself from Apple instead of getting eaten up." Both comments are well-formed and not paranoid. Apple has a real track record of building good developer tools, a real track record of building bad developer services, and a real track record of letting community projects rot when they conflict with platform strategy. The three records are all true at once.

The interesting question is not which reading is right. The interesting question is what the indexer of an indexer looks like.

What the announcement actually says

A close read of the post turns up three structural commitments worth pinning down:

  1. The site continues to operate. "Swift Package Index will continue to operate as it does today. You can continue to rely on it to discover packages, check compatibility, and explore documentation." This is a non-trivial concession, because the alternative — quietly relaunching as developer.apple.com/packages and breaking a thousand scripts that point at the old URLs — would have been easier and was, until the announcement, the default expectation.

  2. The source stays open. "Swift Package Index will remain open source. ... Apple engineers will be contributing alongside the community as we build new features and improvements." This is the line that has to hold. SPI is a metadata index, a CI matrix, and a documentation crawler. All three are pieces of infrastructure the Swift community can copy if Apple misbehaves, but only if the source is actually open. The license matters; the commit history matters; the rate of outside contributions matters.

  3. The future is package signing and identity. "Over time, we plan to introduce new capabilities around areas like package signing and identity to add robustness and security to the ecosystem." This is the part jshier is right to be nervous about. Package signing is a feature Apple has wanted for a long time. The Swift Forums discussion of SPM trust, still deadlocked in 2024, had been a four-year stalemate because Apple could not agree with itself on whether to ship its own format. The 2026 forum discussion will presumably be different, because Apple now controls the index.

Six angles worth your attention

1. The community project was always a platform feature in disguise

SPI's reach is broader than its visibility. The site processes 3.5 million compatibility builds a year, which is more than Apple's own first-party developer.apple.com documentation search gets in that window. Almost every Swift developer has, at some point, hit a "this package supports macOS 13" badge that was generated by SPI's CI. The package manager's "add dependency" UX is, in practice, "go to SPI first, then paste the URL." The whole discovery and evaluation layer of the Swift ecosystem was, for five years, a side project run by two people and a community payroll.

The acquisition is not Apple buying a project that competed with the platform. It is Apple absorbing a project the platform had been quietly depending on. That is a different kind of deal, and the precedent is bad. The risk is not that SPI disappears; it is that SPI becomes a first-party feature with the reliability characteristics of a first-party feature: maintained, but slow, and impossible to fork because the talent has moved in.

2. The funding problem was the real problem, and it is now solved (or not)

The most charitable reading of this announcement is that Dave Verwer, who has been running SPI for five years on a combination of sponsorships, Patreon, and personal time, hit the limit of what a community project can fund. Three and a half million builds a year is a real AWS bill. The build matrix kept expanding (visionOS, WASI, Android), and every expansion added a new platform's worth of CI minutes. The unit economics of a community index that runs a build for every package on every supported platform, every commit, were always going to collapse. Apple buying SPI is Apple paying for the build matrix. That is a real benefit, and it is not a small one.

The less-charitable reading is that the funding problem could have been solved with a more aggressive sponsorship tier, a foundation model (the Rust Foundation, the Python Software Foundation), or a multi-vendor consortium. None of those happened. The single-vendor acquisition is a real failure of the community-foundation model for Swift, and it is worth asking why. The answer is structural: Swift the language is open-source, but Swift the ecosystem is held together by Apple-employee time on the forums, Apple-employee review of Swift Evolution proposals, and Apple-employee maintenance of the toolchain. A vendor-neutral foundation cannot fund what a vendor already pays for in kind.

3. Package signing is the actual fight

The 2024 Swift forum thread on SPM trust died in committee. The deadlock was over format: Apple's preferred approach (a notarized, signed manifest that ties a package to a developer ID) is a stricter variant of what xcodebuild does for app signing, and the community wanted something closer to sigstore or The Update Framework (TUF). The two sides had a four-year argument about whether the package index should be a registry (which can require signatures to list a package) or a search engine (which lists whatever its crawler can find).

The SPI announcement ends that argument. With Apple controlling the index, "package signing" means Apple's signing. The jshier comment, "they explicitly call out developer identity as a future direction, which doesn't fill me with hope," is not a complaint about a hypothetical future; it is a recognition that the future is now structurally locked. Swift packages will get the same identity story as iOS apps. That is a strict win for supply-chain security, and a strict loss of escape velocity — once a package is signed, the package ecosystem is not portable to a non-Apple-run index without re-signing.

This blog's own post on the LinkedIn-recruiter backdoor made the case that package registries are supply-chain attack surface. The SPI move is the right answer to that problem if you trust the registry. The harder question, which the post on the 10,000 GitHub trojan repos also raised, is what to do when you cannot.

4. The "two sites that look the same" question is now an Apple problem

A second thread in the HN comments (from frou_dh) surfaced a question many Swift users have quietly had: why are there two package sites, swiftpackageregistry.com and swiftpackageindex.com, that seem to be the same thing? The answer is that they are not the same thing. The Swift Package Registry is the spec and the hosted, official implementation that Apple has been running since 2024. SPI is the discovery and metadata layer, which has been running since 2020 and now sits on top of the registry. They were built by different people, at different times, for different reasons.

The acquisition collapses the distinction. The new SPI is going to be, structurally, the front door of Apple's package registry. The community project called SPI was, structurally, a third-party discovery layer. These are different jobs with different incentives, and the announcement's careful language ("will continue to operate as it does today") is going to run out of shelf life the first time the front door and the registry diverge.

5. The CI matrix is the part that was always going to break

A 3.5M-builds-a-year CI matrix that runs across macOS, iOS, tvOS, watchOS, Linux, visionOS, WASI, and Android is not a feature; it is infrastructure. Between them, the platforms need real Macs, iOS simulators, Linux VMs, visionOS devices or simulators, a WASI runtime, and Android devices. The current SPI implementation pays for the Macs and the Linux machines; the iOS and tvOS work runs on Apple's own CI, which the community was getting for free because Apple employees happened to be working on the project.

If SPI becomes a first-party project, the build matrix is paid for in Apple's CI credits. That is unambiguously good. It is also the kind of dependency a vendor-neutral foundation cannot replicate. The community fork, if it ever has to happen, will lose the iOS / tvOS / visionOS columns, because the macOS hosts for those are first-party Apple assets. This is a structural fact, not a hypothetical one, and it is the strongest reason the announcement's "open source" commitment is incomplete.

6. The "should have built it themselves" comment is the wrong take

One HN commenter (aaronvg) wrote, "kind of surprised Swift didn't launch with this by default, built in-house." This is the Apple-developer-services take, and it is wrong. Apple did try to build a package index, eventually. The original Swift Package Manager, in 2015, was a CLI that cloned packages from arbitrary git URLs, with no index at all. The 2020 Swift Package Index project was a community response to a gap Apple had not filled. Apple tried, in 2024, to ship a first-party registry and ran into the same supply-chain politics the community had been arguing about for years. The community project, run by people who were not Apple, was the only path that produced a working system. The acquisition is Apple finally admitting the gap and buying its way out. That is not, on the merits, a bad thing. It is the kind of thing Apple does well.

The original take

The acquisition is good for Swift developers and bad for the precedent it sets, and the most honest position is to hold both.

It is good because the build matrix is paid for, the team has full-time jobs, the discovery layer is going to keep running, and package signing is finally going to ship. None of these are small wins. The supply-chain implications in particular are real: signing is the right answer to the threat model the npm incidents and the GitHub-trojan-repo waves have established, and a registry that can require signing is a strict improvement over an index that can only warn.

It is bad because the precedent is "platform vendor acquires the community's discovery layer." The sponsorship model never funded a 3.5M-builds-a-year CI matrix, and no foundation or consortium stepped in to fund it either. The model that did fund it was a single-vendor acquisition. The next time a small language ecosystem faces the same problem, the only exit it will have seen work is to wait to be bought. That is a bad equilibrium, and it is going to be reproduced.

The pragmatic position, which is the one I would take if I were shipping a Swift package tomorrow: do not depend on SPI for anything that is not already on the page. Discovery: SPI. Build matrix: SPI. Documentation hosting: SPI. Anything that requires a trust decision — who signs my package, what identity I publish under, which packages get listed — assume the Apple-controlled version of that decision and design accordingly. The community fork is still possible, and the source is still open, but the structural gravity of the project has shifted. The community that builds the fork will be working with the same source code, the same commit history, and a strictly smaller build matrix. The community that runs the index, going forward, is Apple.

What this means for you

  • If you maintain a Swift package: your package's discoverability just got a permanent Apple-shaped tailwind. Plan for an SPI-hosted version of your README, an SPI badge, and (within 12 to 18 months, based on the announcement's pace) a signed release pipeline. Start sketching what your signing identity looks like, because the answer is going to be "Apple Developer ID" and the question is whether you opt in early or late.
  • If you consume Swift packages: nothing changes this week. The build matrix still runs. The site still works. The dependency you added last month is still the dependency you add today. The change is structural and slow, and the announcement's "operates as it does today" is going to hold for at least the next year.
  • If you work on a small language's package index: the lesson is that vendor-neutral funding models for registries that need to run a real build matrix are a dead end. Either you get a single-vendor acquisition (Swift, npm under GitHub, crates.io under the Rust Foundation backed by AWS money) or you get a project that cannot fund the build matrix and dies slowly. The 2020s answer is acquisitions. The 2030s answer is going to have to be different.
  • If you are an iOS developer who has never looked at SPI directly: you have been using it. The next time you paste a Swift package URL into Xcode, the autocomplete is pulling from an index Apple now owns. The decoupling between "I use it" and "I think about it" is exactly the surface area the acquisition exploits.

What to do this week

# 1. If you maintain a Swift package, add the SPI badge to your README.
#    The site is at https://swiftpackageindex.com and the badge is a
#    single Markdown image. Five minutes.

# 2. Pull the announcement's source directly. The Cloudflare front
#    door on swiftpackageindex.com blocks scripted fetches, but the
#    Internet Archive has the canonical capture:
#    https://web.archive.org/web/20260623190839/https://swiftpackageindex.com/blog/swift-package-index-joins-apple

# 3. Read the Swift forum thread on SPM trust
#    (https://forums.swift.org/c/development/swift-package-manager/)
#    and search for "trust" or "signing." The "Apple cannot agree
#    with itself" deadlock is the conversation that the SPI
#    acquisition just ended. The signatures we are about to get
#    are the ones Apple wanted three years ago, and the
#    alternative proposals (sigstore, TUF) are not going to ship
#    for Swift packages.

# 4. If you maintain a non-Apple package ecosystem (npm, PyPI,
#    crates.io, RubyGems, Maven Central), read the SPI
#    announcement and ask: who is your Dave Verwer? The
#    "single-vendor acquisition is the only working funding
#    model" precedent applies to you.

# 5. Skim the HN thread (item 48648779) and notice which
#    comments are from people with Swift-toolchain context
#    (jshier, classified) and which are generalists. The
#    informed skepticism is concentrated. The generalist
#    reactions are more positive. The pattern is familiar.
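For step 1 above, the badge markup can be generated rather than typed by hand. This is a minimal sketch, assuming the shields.io endpoint format that SPI package pages have offered: an `api/packages/<owner>/<repo>/badge?type=...` URL, percent-encoded and wrapped in `https://img.shields.io/endpoint?url=`. The helper name `spi_badge_markdown` and the example owner and repo are hypothetical; if the snippet shown on your own package's SPI page differs, that snippet wins.

```python
from urllib.parse import quote

SPI_BASE = "https://swiftpackageindex.com"
SHIELDS_ENDPOINT = "https://img.shields.io/endpoint?url="


def spi_badge_markdown(owner: str, repo: str, badge_type: str = "swift-versions") -> str:
    """Build a Markdown badge that links to a package's SPI page.

    Assumes the SPI badge endpoint layout described above; badge_type is
    expected to be "swift-versions" or "platforms".
    """
    if badge_type not in ("swift-versions", "platforms"):
        raise ValueError(f"unknown badge type: {badge_type}")
    # The SPI JSON endpoint that shields.io renders into an image (assumed layout).
    endpoint = f"{SPI_BASE}/api/packages/{owner}/{repo}/badge?type={badge_type}"
    # shields.io expects the endpoint URL percent-encoded as a query value.
    image_url = SHIELDS_ENDPOINT + quote(endpoint, safe="")
    package_page = f"{SPI_BASE}/{owner}/{repo}"
    return f"[![]({image_url})]({package_page})"


if __name__ == "__main__":
    # Hypothetical package; substitute your own GitHub owner and repo.
    print(spi_badge_markdown("example-owner", "example-package"))
    print(spi_badge_markdown("example-owner", "example-package", "platforms"))
```

Paste the printed line into your README; the second call produces the platform-compatibility variant.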

Disclosure

Drafted with AI assistance from MiniMax-M3 under editorial direction. Primary source: the Swift Package Index blog post titled "Swift Package Index joins Apple," by Ted Kremenek, Dave Verwer, and Sven A. Schmidt, dated 23 June 2026. The canonical URL (https://swiftpackageindex.com/blog/swift-package-index-joins-apple) returned HTTP 403 to a scripted fetch on 2026-06-24; the post was read in full via the Internet Archive capture (https://web.archive.org/web/20260623190839/https://swiftpackageindex.com/blog/swift-package-index-joins-apple). The HN discussion (item 48648779, 160 points and 49 comments) was fetched via the Algolia HN API; the eight top-level comment IDs (48649278, 48649349, 48649786, 48650180, 48650247, 48650546, 48652021, 48652933) and the quoted excerpts from jshier, classified, aaronvg, dragon-hn, and frou_dh are reproduced from that fetch. The "10,000 packages indexed" and "3.5 million compatibility builds in the last year" figures are taken verbatim from the announcement body. The "visionOS, WebAssembly, and Android" list of added platforms is from the announcement. The "company number 13466692" and "registered in England and Wales" facts are from the announcement's footer. The framing of the acquisition as a single-vendor acquisition, the structural argument about CI matrix lock-in, and the "single-vendor acquisition is the only working funding model" thesis are the post's original analysis. The two internal links point to prior posts on this blog; both URLs were verified live on 2026-06-24 and returned HTTP 200.

Sources

  • Ted Kremenek, Dave Verwer, and Sven A. Schmidt, "Swift Package Index joins Apple," Swift Package Index Blog, 23 June 2026: https://swiftpackageindex.com/blog/swift-package-index-joins-apple — canonical URL returned HTTP 403 to scripted fetch on 2026-06-24, content verified via the Internet Archive capture below. Primary source.
  • Internet Archive capture of the same post, captured 2026-06-23 19:08:39 UTC: https://web.archive.org/web/20260623190839/https://swiftpackageindex.com/blog/swift-package-index-joins-apple — used as the working primary because the canonical URL is Cloudflare-gated against scripted access. 21,117-byte HTML, full body content.
  • Hacker News discussion, item 48648779 ("Swift Package Index joins Apple," submitted by JDevlieghere, 160 points and 49 comments as of the morning of 2026-06-24, UTC+8): https://news.ycombinator.com/item?id=48648779. Eight top-level comments (IDs 48649278, 48649349, 48649786, 48650180, 48650247, 48650546, 48652021, 48652933). Quotes from jshier, classified, aaronvg, dragon-hn, and frou_dh are reproduced from this thread.
  • Related tutorialoflife.blogspot.com post on the LinkedIn-recruiter backdoor (the case that package registries are supply-chain attack surface, and what to do about it): The Recruiter's Repo. The npm install Was the Backdoor. — verified live, returned 200.
  • Related tutorialoflife.blogspot.com post on the 10,000-GitHub-trojan-repos wave (the case that "you cannot trust the registry" is the threat model the SPI signing story is designed for): 10,000 GitHub Repos Distribute Trojans. Reddit Saw It First. — verified live, returned 200.

Tuesday, June 23, 2026

The Coming Loop: Harness vs. Judgment in Agentic Coding

Armin Ronacher, whose resume (Flask, Jinja, Rye) gets taken seriously whenever he opens his mouth on agentic engineering, published a short essay this morning called The Coming Loop. It is, in order, a description, a confession, and a warning. The description is what an agentic coding harness is. The confession is that he doesn't yet trust himself to work that way. The warning is that the working method he doesn't trust is going to win anyway.

The post will be widely misread as either Luddite ("loops are bad") or capitulationist ("loops are inevitable, so we should just do them"). Neither reading is right. Ronacher has the diagnosis exactly right and the prescription exactly wrong, and the gap between the two is the actual story.

The two-loop frame

Ronacher splits the agent loop into two. The inner loop is what every coding agent already does: the model calls a tool, reads the result, calls another, eventually emits a final answer. The outer loop is the harness: code that watches the inner loop, decides whether its "I'm done" is actually done, and if it isn't, injects a new message, opens a fresh session, hands the task to a different machine, or keeps the same session alive. The outer loop is what Boris Cherny is talking about when he says — quoted at the top of Ronacher's essay — "I don't prompt Claude anymore. I have loops running that prompt Claude and figuring out what to do. My job is to write loops." The inner loop is the part where the model gets to be a model; the outer loop is the part where someone decides what the model is for. Ronacher's claim is that the outer loop is becoming the dominant abstraction, and that this is a serious change.
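The two-loop split can be made concrete with a toy harness. This is a minimal sketch under stated assumptions, not Ronacher's code or any real agent API: the hypothetical `run_inner_loop` callable stands in for a whole inner loop (tool calls and all), and `verify` stands in for whatever check the harness trusts. What it shows is that the model's "I'm done" is only a claim; the outer loop decides whether to accept it, re-prompt, or escalate.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class InnerResult:
    """What one run of the (hypothetical) inner loop hands back."""
    claims_done: bool  # the model's own "I'm done"
    output: str        # the artifact it produced


def outer_loop(
    task: str,
    run_inner_loop: Callable[[str], InnerResult],
    verify: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[str, int, List[InnerResult]]:
    """Re-run the inner loop until the verifier accepts its output.

    Returns ("accepted", attempt, history) or ("escalate", attempts, history).
    """
    history: List[InnerResult] = []
    prompt = task
    for attempt in range(1, max_attempts + 1):
        result = run_inner_loop(prompt)
        history.append(result)
        # The harness, not the model, decides what "done" means.
        if result.claims_done and verify(result.output):
            return "accepted", attempt, history
        # Inject a new message and go around again.
        prompt = f"{task}\n\nThe previous attempt was rejected by the verifier. Try again."
    # Out of budget: hand the whole run to a human.
    return "escalate", max_attempts, history


if __name__ == "__main__":
    # Fake inner loop: it claims success every time, but only the second answer is right.
    answers = iter(["draft", "final", "final"])

    def fake_inner(prompt: str) -> InnerResult:
        return InnerResult(claims_done=True, output=next(answers))

    status, attempts, _ = outer_loop("write the thing", fake_inner, lambda out: out == "final")
    print(status, attempts)  # accepted 2
```

The retry budget and the escalation path are design choices a real harness would tune; the point is that both live in the outer loop, out of the model's reach.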

Five angles worth your attention

1. The ultracode problem is a structural symptom

Ronacher's specific complaint is that Claude Code with Fable (Anthropic's longer-horizon run mode) produces code he doesn't like. The reason isn't aesthetic. It's structural: the model is asked to work uninterrupted for thirty minutes, and in that window it accumulates a stack of small local defenses. A model that sees a malformed input adds a try/except. The second time, it adds a try/except with a fallback. The third time, it adds a type-checker plugin. The line Ronacher borrows from Karpathy, that models are "mortally terrified of exceptions," is the same complaint in a different register.

The structural problem is that each iteration of the loop only sees the last failure. The harness sees the whole run. The fix the model picks is shaped by what the model is, not what the system needs. Put that behind a loop and you get a system that is locally defensive and globally fragile. The code "works" in the sense that no individual request returns an error, but the invariants have rotted. If you have shipped anything substantial with a coding agent in the last six months, you have almost certainly seen this.

2. The harness is the new compiler

The standard answer to "the code is bad" is "the human reviews it." In a harness-operated loop, the human is not reviewing every line — the human is reviewing the loop. The harness is what decides when work is done, when a session is dead, when to escalate. The human has become a meta-reviewer: someone who reads the spec for the loop, not the output of the loop.

The historical analogy is the compiler. The 1970 assembler programmer reviewed every instruction. The 1990 C programmer reviewed the source. The 2010 Python programmer reviewed the function. Each jump was a step away from the artifact, on the input side. The harness is a similar jump on the output side. The programmer is no longer reviewing the thing the machine produced; the programmer is reviewing the thing the machine will produce when it runs on this input. The artifact is one indirection further away.

That is the change, and it is permanent. Most of the "we need to keep humans in the loop" rhetoric of the last six months is framed as "human reviews machine output." That is not the loop we are entering. The human is reviewing machine output produced by another machine that reviews machine output. The intermediate "human review" step is being consumed by an automation layer — a point this blog's own post on the bigger-models-hallucinate trilemma makes from a different angle.

3. The Pi pattern is more general than it looks

Ronacher is generous about Pi, the assistant and harness that people are building on top of, but the pattern is the same everywhere. A queue of tasks, a machine that picks one up, a machine that judges whether the work is done, a machine that decides what to do next if it isn't. The pattern is not specific to coding: it is the same in research agents (run a subagent, judge the output, run another), in data engineering (run a pipeline, check the schema, re-run with a fix), in security (run a scan, triage findings, re-run with a different rule set), and in software testing (run a fuzzer, write a regression test, re-run).

The harness abstraction is not an AI abstraction. It is a work management abstraction that has been waiting for a substrate cheap enough to instantiate it. The substrate just got cheap. Anything that was always done by a human supervisor — judging, retrying, escalating — is now a candidate for being done by code.

4. The "done" signal is the part that breaks

In the inner loop, the model says "I'm done" and a human reviews. In the outer loop, the model says "I'm done" and another machine reviews. Ronacher's worry is exactly right: when both sides of the conversation are machines, "done" stops meaning "the human is satisfied" and starts meaning "the verifier's predicate returned true." That is a strictly smaller — and strictly more reproducible — definition of done.

The harness future is not going to be a future of fewer definitions of done; it is going to be a future of more, each of them narrower. A "tests pass" verifier. A "type-check" verifier. A "lint clean" verifier. A "no secrets in diff" verifier. The harness is going to be the place where all of these predicates live. The thing that will be lost is the unifying definition of done — the one in a human head that answers "is this the right thing to ship?" That definition doesn't get automated. It gets omitted by the architecture.
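Here is roughly what "more definitions of done, each narrower" looks like inside a harness. This is a minimal sketch assuming a hypothetical `Change` record that the harness fills in after each run; every predicate is illustrative rather than taken from a real tool, and the secret check recognizes only one well-known shape, a PEM private-key header.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Change:
    """Hypothetical summary of one agent-produced change."""
    tests_passed: bool
    typecheck_ok: bool
    lint_ok: bool
    diff: str


# Deliberately narrow: one well-known secret shape (a PEM private-key header).
PRIVATE_KEY = re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----")

# Each entry is one narrow definition of done.
VERIFIERS: Dict[str, Callable[[Change], bool]] = {
    "tests pass": lambda c: c.tests_passed,
    "type-check": lambda c: c.typecheck_ok,
    "lint clean": lambda c: c.lint_ok,
    "no secrets in diff": lambda c: PRIVATE_KEY.search(c.diff) is None,
}


def failed_verifiers(change: Change) -> List[str]:
    """Names of the predicates that reject this change, in registration order."""
    return [name for name, check in VERIFIERS.items() if not check(change)]


def is_done(change: Change) -> bool:
    # "Done" is now just the conjunction of the predicates above.
    return not failed_verifiers(change)


if __name__ == "__main__":
    clean = Change(True, True, True, diff="+ return total / count")
    leaky = Change(True, True, True, diff="+ -----BEGIN RSA PRIVATE KEY-----")
    print(is_done(clean), failed_verifiers(leaky))  # True ['no secrets in diff']
```

Note what the `clean` change gets away with: it passes every predicate even though `total / count` may divide by zero. No entry in `VERIFIERS` asks whether this is the right thing to ship, which is exactly the definition the architecture omits.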

5. The career question changes shape

This is where Ronacher's essay falls short. He frames his unease as a personal matter of taste and comprehension; he is being too modest. What he is really describing is a change in who reviews engineers. In a harness world, code is reviewed first by a verifier (cheap, fast, narrow) and only then by a human (expensive, slow, broad). The verifier is not optional; the harness needs it to function. The engineer is now optimizing for two reviewers. The verifier's predicates become a language you have to learn.

That is a real career change. The next generation of senior engineers will be people good at writing code that satisfies narrow verifiers while still being right in the broader sense. The current generation is good at satisfying broad human review. The skills don't transfer cleanly, and the corollary is uncomfortable: the people who are good at this now may not be the people who were good at engineering — which is, for what it's worth, the same inversion the local-models-vs-frontier economics story describes from the cost side.

The original take

Ronacher's diagnosis is correct. The harness is becoming the dominant abstraction, the verifier is replacing the reviewer, and the inner loop's "done" is being consumed by a machine that doesn't share your definition of done. He is right that present-day hands-off harnesses produce worse code than what we shipped last autumn, and that the failure mode is amplification of local fixes — a structural property of the loop, not of any individual model.

Where I disagree is the prescription. He frames the response as a personal matter of whether to adopt the loop and a collective matter of how to retain judgment while we do. Both framings are wrong because the loop is not something you can opt out of. The loop is not a tool; it is a layer of the stack. You are already inside it, at a different level. If you don't write the harness for your own work, the harness will be written by someone else — the platform, the IDE, the framework, the team you inherit.

The career move the harness future actually rewards is to learn to be the person who writes the verifiers. The people in demand in 2027 are the ones who can write a verifier that says "this code is the right shape, not just the right type." The people in trouble are the ones who insist that the only legitimate review is human review, because the harness is not going to wait for them.

Ronacher is right that the future is uncomfortable. He is wrong that the discomfort is the story. The story is the redistribution of who gets to define done. The human reviewer is not being replaced by the harness; the human reviewer is being demoted to one verifier among many, and the harness is the new reviewer of record. That is a worse outcome for the people who were good at being the reviewer, a better outcome for the people who were good at writing the verifier, and a neutral outcome for the code itself, which has never cared who reviewed it.

What this means for you

  • If you ship code for a living: the verifier is coming for your review process. Start writing the predicates that will judge your code. If you don't, the IDE will.
  • If you are a senior engineer in 2026: the skill about to be worth the most is the ability to specify, in a form a machine can evaluate, what right looks like.
  • If you are a tech lead: the question for your team is who on it writes the verifier. If the answer is "nobody," a vendor will.
  • If you write essays like Ronacher's: the right follow-up is "here is the verifier I would write for the kind of code I want to ship."

What to do this week

# 1. Read the essay (it's short, ~5 minutes)
curl -sL --compressed "https://lucumr.pocoo.org/2026/6/23/the-coming-loop/" | lynx -stdin -dump

# 2. Pick one piece of code you shipped in the last month that
#    a coding agent touched. Count the local defenses.
#    (try/except, isinstance checks, redundant null guards, etc.)
#    If the count is high, the agent's inner loop wrote it.
#    If the count is rising over time, a harness is writing it.

# 3. Write down, in English, what "done" means for that piece
#    of code. Then write down, in code or pseudo-code, the
#    predicate a verifier would check. If the two don't match,
#    the verifier will over-trust or under-trust your work.

# 4. If you maintain a project: add ONE verifier predicate to
#    your CI that is not "tests pass" or "linter clean." It
#    should encode something a human reviewer would notice.
#    Examples:
#    - "no public function returns Optional without a docstring"
#    - "no dependency added with fewer than 1k stars"
#    - "no commit > N lines without a justification in the body"

# 5. Re-read Ronacher's essay. Notice that the question he ends
#    on is "how do we not abdicate judgment?" The honest answer
#    is: by writing the verifiers. Judgment that lives only in
#    a human head is judgment that will be omitted by the next
#    architecture.

Disclosure

Drafted with AI assistance from MiniMax-M3 under editorial direction. Primary source: Armin Ronacher's The Coming Loop essay, fetched 2026-06-23 via curl -sL --compressed returning 26,058 bytes of HTML. The essay was read in full and the post paraphrases its claims rather than quoting at length. Two short quotes are reproduced verbatim: the Boris Cherny epigraph ("I don't prompt Claude anymore...") and the "mortally terrified of exceptions" line that Ronacher attributes to Karpathy. The Karpathy line's original source is not given in Ronacher's essay and I have not independently verified it; reproduced as Ronacher uses it. The "ultracode," "Fable," "Pi," and "Claude Code" references are direct from the essay. The disagreement with Ronacher's prescription is my own editorial position, not a paraphrase of his. The two internal links point to prior posts; both URLs returned HTTP 200 when re-fetched 2026-06-23.

Sources

  • Armin Ronacher, The Coming Loop (primary): https://lucumr.pocoo.org/2026/6/23/the-coming-loop/ — verified live on 2026-06-23 via curl -sL --compressed returning 26,058 bytes of HTML. Published 23 June 2026. License: Creative Commons Attribution-NonCommercial 4.0 International.
  • Related tutorialoflife.blogspot.com post on the trilemma that explains why bigger models can regress on narrow tasks: Bigger Models Hallucinate More. The Trilemma Explains. — verified live, returned 200.
  • Related tutorialoflife.blogspot.com post on the local-models-vs-frontier cost story: Your Local Model Is Faster Than Google and Cheaper Than OpenAI — verified live, returned 200.

Project Valhalla Lands in JDK 28. Twelve Years, Preview.

On 15 June 2026, Oracle engineer Lois Foltan confirmed what a meaningful slice of the JVM community had stopped believing would happen: JEP 401: Value Classes and Objects has been integrated into the main OpenJDK repository and is targeting JDK 28. The pull request adds more than 197,000 lines of code across 1,816 files. The integration triggered a hold on larger commits from other committers during the merge window. Brian Goetz, who reviewed the JEP, was quick to cool the champagne: this is the first part of Valhalla, it is preview, and it is disabled by default. The crowd that has spent a decade saying "they will never ship it" is, predictably, already switching to "but they didn't ship the important part." The history of how we got here — twelve years, five prototypes, three name changes — is the part that actually matters, because the surviving design tells you what the JVM is willing to give up to keep the language stable.

The problem the project was created to solve

Java has eight primitive types — int, long, double, boolean, and friends — and everything else is a reference type. When you write Point p = new Point(1, 2), p is not a point. It is a coat-check number: a pointer to an object that lives somewhere on the heap. Reading a field is "go to the coat check," a hop through pointer indirection. For a single object that is nothing. The cost starts at scale.

Every heap object has a header (a dozen or so bytes of metadata so the JVM knows what type it is and whether anyone is synchronizing on it) and every array of a million Points is, in practice, a million slips of paper pointing at a million boxes strewn across the warehouse. Brian Goetz calls such a layout "fluffy" — puffed up, bloated. The opposite is a "dense" layout where data lies side by side. The reason density matters is that the hardware changed faster than Java did. In 1995 a memory access cost roughly the same as a CPU operation. Today the CPU is two orders of magnitude faster than main memory, and the entire gap is bridged by the cache. The processor reads memory in 64-byte cache lines. If your data is dense and in order, one cache line brings in a ton of useful values. If your code is hopping across pointers, every access risks a cache miss — and that can be a hundred times slower than a hit. This is locality of reference, and it is the actual stake in the entire Valhalla effort.
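To put rough numbers on "fluffy" versus "dense," here is a back-of-envelope Java sketch that counts the 64-byte cache lines touched while scanning a million points. The assumptions are this sketch's own: 8-byte points (two ints), 4-byte compressed references, and a worst case in which every heap object lands on its own cache line. It ignores hardware prefetching and object headers, so read the ratio as an illustration, not a measurement.

```java
// Back-of-envelope count of 64-byte cache lines touched when scanning n points.
final class CacheLineMath {
    static final int LINE_BYTES = 64;

    // Dense layout: elements sit side by side, so lines = ceil(n * size / 64).
    static long denseLines(long n, int elementBytes) {
        return (n * elementBytes + LINE_BYTES - 1) / LINE_BYTES;
    }

    // Pointer layout, worst case: the reference array itself is dense, but each
    // object it points to sits on a different cache line somewhere on the heap.
    static long pointerLinesWorstCase(long n, int referenceBytes) {
        return denseLines(n, referenceBytes) + n;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        long dense = denseLines(n, 8);              // assumption: two 4-byte ints per point
        long fluffy = pointerLinesWorstCase(n, 4);  // assumption: 4-byte compressed references
        System.out.println("dense:  " + dense + " lines");   // 125000
        System.out.println("fluffy: " + fluffy + " lines");  // 1062500
    }
}
```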

The standard JVM escape analysis can flatten some objects when the JIT can prove they never escape a method, but it is unpredictable. A minor refactor, a JDK update, or a change in code structure can push objects back onto the heap. Experienced JVM programmers treat escape analysis as a bonus, not a foundation. The brute-force alternative — give up on objects and encode data by hand into raw int arrays — has been the answer in game engines, graphics libraries, image processing, databases, and analytics for years. The cost is safety and readability. Valhalla is the attempt to erase the dichotomy.
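What that hand-encoding workaround looks like in practice: a minimal sketch in plain Java that runs on any current JDK (the class name is hypothetical). Each point is an index into two parallel `int[]` arrays, so a scan walks contiguous memory. The price is that the type system no longer knows what a point is, and nothing stops a caller from pairing `xs[i]` with `ys[j]`.

```java
// The "raw int arrays" workaround: one array per field instead of one object per point.
final class PointColumns {
    private final int[] xs;
    private final int[] ys;

    PointColumns(int size) {
        xs = new int[size];
        ys = new int[size];
    }

    void set(int i, int x, int y) {
        xs[i] = x;
        ys[i] = y;
    }

    int x(int i) { return xs[i]; }
    int y(int i) { return ys[i]; }
    int size() { return xs.length; }

    // Sequential scan over a dense int[]: no headers, no pointer chasing.
    long sumX() {
        long sum = 0;
        for (int x : xs) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        PointColumns pts = new PointColumns(3);
        pts.set(0, 1, 2);
        pts.set(1, 3, 4);
        pts.set(2, 5, 6);
        System.out.println(pts.sumX()); // 9
    }
}
```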

The five prototypes that died on the way to L World

Officially, Project Valhalla started in 2014. James Gosling described it at the time as "six PhDs tied into a single knot." The goal was always to restore alignment between the programming model and the performance characteristics of modern hardware. The path was not. Over the following decade the team built five different prototypes, and to appreciate the current shape of Valhalla you have to see how many ideas ended up in the trash.

The earliest prototypes went in a direction that is now called "Q World." Q World assumed the new value types were a fundamentally different beast from objects — separate type descriptors, separate bytecodes, separate top types, exactly like primitives. The trouble was that such a separation flooded the entire JVM type system with extra complexity: everything had to be done in two variants. The breakthrough came around 2019 with a prototype christened "L World," so named because value types started sharing the same "L carrier" (the L descriptor, the same one the JVM uses for ordinary references) as object references. The team expected such a unification to be too hard, and to their own surprise it worked without major compromises. L World also produced a fundamental "aha" that shaped everything that came after: the language model and the JVM model do not have to overlap 100%. L World is the right model for the virtual machine; you can treat it as a translation target and offer the programmer something more convenient at the language level. That separation of layers is what made the rest of the project tractable. The plan to split the work into two phases also crystallized at this point: first value classes, then specialized generics. Specialized generics is the separate, harder problem, and we return to it below.

The naming rollercoaster is a history of rejected ideas

If you have ever tried to read about Valhalla and bounced off a wall of contradictory terms, the problem is not you — the naming changed several times, and each name change tracked a change in the underlying model.

Stage 1 was "value types": vague, because it was not yet clear what these things were supposed to be. Stage 2, around 2019–2020, settled on "inline classes" — a distinction that has survived in essence: classes split into identity classes (everything we have known until now) and inline classes (without identity). The slogan "codes like a class, works like an int" was coined then. Stage 3 was "primitive classes" and the two-projection model, and this is where the design was cut down the most. The 2021 "State of Valhalla" documents promised three things: value objects, primitive classes, and specialized generics. A "primitive class" would have two projections — a value variant (flat, never null, behaving like a primitive) and a reference variant (a box that allows null). Across iterations this was written as Point.val / Point.ref, and the team later experimented with Point! and Point? syntax. The model was powerful but mentally heavy. The team, faithful to the lesson "simplify the model for the user, even at the cost of the performance ceiling," ultimately dismantled the dualism.

Stage 4 — today — is "value classes" and "value objects." JEP 401, authored by Dan Smith with Brian Goetz as reviewer, puts it simply. There is one new thing: a value class, declared with the value modifier. Its instances are value objects: objects without identity. A value class is still a reference type. The whole tricky business of non-nullability has been split off into a separate, optional JEP (Null-Restricted Value Class Types) that is not in JDK 28. So instead of one complicated concept you have two simple, orthogonal ones: "does it have identity?" and, separately, for later, "does it allow null?" Twelve years was not twelve years of "writing code." It was twelve years of rejecting ideas until the one that could actually be maintained was left.

What you actually get in JDK 28

The change at the source level is exactly one word. A value class is declared by adding the value modifier:

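value class USDCurrency implements Comparable<USDCurrency> {
    private int cents;            // implicitly final in a value class
    public USDCurrency(int dollars, int cents) {
        this.cents = dollars * 100 + cents;
    }
    public USDCurrency plus(USDCurrency that) {
        return new USDCurrency(0, this.cents + that.cents);
    }
    @Override
    public int compareTo(USDCurrency that) {   // required by Comparable
        return Integer.compare(this.cents, that.cents);
    }
}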

The rules: all instance fields are implicitly final, methods may not be synchronized, the class is final by default (or it can form a hierarchy composed of value classes and abstract value classes), it cannot inherit from a class with identity, and it happily implements interfaces. Beyond these constraints it is an ordinary class.

The defining trait is no identity. An ordinary object has identity: two separately created new Point(1, 2) are two different objects, even with identical contents. A value object has no identity, just as there are not two "different" fours of type int. From this flow all the consequences. == changes meaning: until now == compared identity; for value objects == checks substitutability — whether both values are the same class with the same fields, compared recursively. That is why new USDCurrency(3, 95) == new USDCurrency(3, 95) returns true. It also ends the famous confusion with == on Integer. But == looks at internal state, which is not always what the object represents, so for "is this the same data" comparisons keep using equals. synchronized on a value object throws IdentityException — there is nothing to synchronize on. When you need to force identity, you have the new helpers Objects.requireIdentity and Objects.hasIdentity.
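The Integer confusion is easy to reproduce on any current JDK without the preview. In this small sketch (class name hypothetical), autoboxing goes through `Integer.valueOf`, which is required to cache values from -128 to 127. So `==` on two boxed 127s finds the same object, while two boxed 1000s are, on a default HotSpot configuration, two distinct objects. Per the paragraph above, with JEP 401's preview on, `Integer` becomes a value class and the second comparison turns into a substitutability check.

```java
// Today's Integer has identity, so == compares references, not values.
final class IntegerIdentityDemo {
    static boolean sameReference(Integer a, Integer b) {
        return a == b; // reference comparison for identity objects
    }

    public static void main(String[] args) {
        // Autoboxing calls Integer.valueOf, which caches -128..127 by default.
        System.out.println(sameReference(127, 127));            // true: same cached object
        System.out.println(sameReference(1000, 1000));          // false on a default HotSpot JVM
        System.out.println(Integer.valueOf(1000).equals(1000)); // true: equal values
    }
}
```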

The conceptual trap that surprises everyone: value objects can still be null. In the JDK 28 model, value class is a reference type, so USDCurrency d = null; is perfectly legal. Non-nullable types are a separate, future JEP. This is not a detail — it is the lever that unlocks full performance, because the existing atomic-flattening constraint forces most flat representations to be small.

How it sits in memory: scalarization and heap flattening

JEP 401 gives the JVM two main optimizations. Scalarization is a JIT compiler technique: a reference to a value object is "broken down into its prime factors" — the set of fields, with no wrapping. Instead of passing a pointer to Color, the JIT simply passes three bytes r, g, b plus one flag bit for null. Such an object is in practice free: no allocation, no work for the GC. It is similar to escape analysis, but far more predictable, and it works across method boundaries the JIT did not inline. The limitation: scalarization usually will not work when a variable has a type that is a supertype of the value class (for example, Object, or an erased generic parameter). Then the object has to be materialized on the heap.
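To make scalarization concrete, here is the transformation written out by hand in plain Java. No preview is needed: `Color` is an ordinary record standing in for a value class, and all names are hypothetical. The JIT performs the equivalent rewrite internally and invisibly, so you would never write the second overload yourself.

```java
// What scalarization does conceptually, written out by hand.
final class ScalarizationSketch {
    record Color(byte r, byte g, byte b) {}

    // Boxed calling convention: the caller passes one reference to a heap object.
    static int brightness(Color c) {
        return c == null ? -1 : (c.r() & 0xFF) + (c.g() & 0xFF) + (c.b() & 0xFF);
    }

    // "Scalarized" convention: the same information as three bytes plus a null flag,
    // with no object to allocate and nothing for the GC to track.
    static int brightness(boolean isNull, byte r, byte g, byte b) {
        return isNull ? -1 : (r & 0xFF) + (g & 0xFF) + (b & 0xFF); // & 0xFF: treat bytes as unsigned
    }

    public static void main(String[] args) {
        Color c = new Color((byte) 10, (byte) 20, (byte) 30);
        System.out.println(brightness(c));                          // 60
        System.out.println(brightness(false, c.r(), c.g(), c.b())); // 60
        System.out.println(brightness(null));                       // -1
    }
}
```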

Heap flattening is the second mechanism. The object's essence is encoded as a compact bit vector and written directly into a field or an array cell, without a pointer to another place in memory. This is where density and locality are born. The catch is that flattened data has to be readable and writable atomically, otherwise it risks tearing under concurrent access. On typical platforms "small enough" today means as little as 64 bits, including the null flag. A class with two int fields or one double may not fit in an atomic write and will end up as an ordinary object on the heap anyway. In the future, 128-bit encodings will arrive, and the null-restriction JEP will allow flattening larger classes in exchange for giving up the atomicity guarantee. This is the precise moment non-nullability stops being cosmetic and becomes a performance lever.
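The arithmetic behind "two ints or one double may not fit" comes down to one bit. The following is a deliberately simplified model with hypothetical names. The real JVM decision also depends on alignment, platform, and implementation details not modeled here, and the null-restricted case stands in for the future JEP.

```java
// Simplified model of the atomic-flattening budget described above.
final class FlatteningBudget {
    static final int ATOMIC_BITS = 64; // "as little as 64 bits" on typical platforms today

    // A nullable value object needs one extra bit to encode null.
    static int encodedBits(int payloadBits, boolean nullable) {
        return payloadBits + (nullable ? 1 : 0);
    }

    static boolean fitsAtomically(int payloadBits, boolean nullable) {
        return encodedBits(payloadBits, nullable) <= ATOMIC_BITS;
    }

    public static void main(String[] args) {
        System.out.println(fitsAtomically(3 * 8, true));   // Color, three bytes: 25 bits, true
        System.out.println(fitsAtomically(2 * 32, true));  // two ints: 65 bits, false
        System.out.println(fitsAtomically(64, true));      // one double: 65 bits, false
        System.out.println(fitsAtomically(2 * 32, false)); // two ints, null-restricted (future): 64 bits, true
    }
}
```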

The migration of the wrapper classes is the visible payoff. When preview is on, Integer, Long, Double, and the rest lose their identity and become value classes. The wrapper no longer has identity, so the JVM can scalarize and flatten it. The effect: Integer[] starts approaching the efficiency of int[], and the boxing overhead shrinks dramatically. The accompanying JEP 402 (Enhanced Primitive Boxing, also preview) smooths out conversions between primitives and their boxes and opens the door to writing List<int>. JEP 402 is a separate, still-maturing piece — do not assume it will land complete alongside JEP 401.

A practical example: before and after, step by step

Take the simplest possible case. Before Valhalla:

final class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}
Point[] points = new Point[1_000_000];

The array is a million pointers. Each pointer leads to a separate Point object somewhere on the heap. Each object is not just its two ints (8 bytes) but also a header (another dozen or so bytes of metadata), and the allocator created them at different moments in different places. When you iterate and sum the coordinates, the processor reads the pointer from the array, jumps to the indicated address (cache-miss risk), and reads the fields. A million times. After Valhalla:

value class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}
Point[] points = new Point[1_000_000];

The difference in source is exactly one word. The difference in memory is fundamental. The JVM can now store the values themselves in the array, laid out densely one after another: 8 bytes per point (plus a possible null flag), contiguous. No headers per element, no pointers, no jumping around the heap. Each 64-byte cache line immediately brings in several complete points. Summing a million coordinates can run at memory-bandwidth speed instead of choking on misses, and on data-intensive code the gain can be multiples, not percentages. One caveat from the flattening rules above: two ints plus a null flag is 65 bits, so in JDK 28 a nullable Point[] may still fall back to references, and the fully dense layout depends on wider atomic encodings or the future null-restriction JEP. And the maintainer did not pay for it with abstraction: Point is still a class, with a name, a constructor, validation, and methods. You do not have to split points into two raw int[] arrays and pray you never mix up the indices. That is the whole of Project Valhalla in a single example.
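A rough footprint comparison for the example, as a hypothetical Java sketch. The numbers assume typical 64-bit HotSpot defaults (4-byte compressed references, 12-byte object headers, 8-byte alignment). JDKs using compact object headers would differ. The "after" figure is the idealized flat layout, which a nullable two-int Point may not get in JDK 28.

```java
// Rough heap footprint of Point[1_000_000], before and after, ignoring the array header.
final class PointFootprint {
    static long alignUp(long bytes, int alignment) {
        return (bytes + alignment - 1) / alignment * alignment;
    }

    // Identity layout: one reference per slot plus one separately allocated object per point.
    static long identityBytes(long n, int refBytes, int headerBytes, int fieldBytes, int alignment) {
        return n * refBytes + n * alignUp(headerBytes + fieldBytes, alignment);
    }

    // Idealized flat layout: the fields themselves, back to back (null flags ignored).
    static long flatBytes(long n, int fieldBytes) {
        return n * fieldBytes;
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        // Assumptions: 4-byte references, 12-byte headers, two 4-byte ints, 8-byte alignment.
        long before = identityBytes(n, 4, 12, 8, 8);
        long after = flatBytes(n, 8);
        System.out.println(before + " bytes vs " + after + " bytes"); // 28000000 bytes vs 8000000 bytes
    }
}
```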

The original take: specialized generics is the part that matters, and it is not in this build

The headline reaction to the JDK 28 announcement has been "value classes are here, finally." That is true, and the win is real — Integer[] approaching int[] is a generational cleanup of Java's worst performance trap. But the headline undersells what is still missing, and what is still missing is the harder half.

Java implements generics through type erasure. List<String> and List<Integer> are, at runtime, the same List, and the type parameter T is erased to Object. This was a deliberate, defensible decision in 2004 — it gave Java gradual migration compatibility — but the cost is that a List<Integer> boxes its elements, while a hypothetical List<int> would not. Valhalla's specialized generics is the half that fixes this. Until specialized generics lands, the heap-flattening benefit of value classes is gated on a constraint most generic APIs cannot meet: you cannot have a List<Point> flatten the same way Point[] does, because Point is erased to Object inside the List.
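Erasure is easy to observe on any current JDK. In this small sketch (class name hypothetical), the two lists share one runtime class, and an int added to a `List<Integer>` comes back as a boxed `Integer` object.

```java
import java.util.ArrayList;
import java.util.List;

// Type erasure on a current JDK: the type argument is gone at runtime.
final class ErasureDemo {
    // List<String> and List<Integer> share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    // Adding an int to a List<Integer> boxes it; the list stores an Integer object.
    static Class<?> storedElementClass() {
        List<Integer> ints = new ArrayList<>();
        ints.add(42); // autoboxing via Integer.valueOf
        Object stored = ints.get(0);
        return stored.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());   // true
        System.out.println(storedElementClass()); // class java.lang.Integer
    }
}
```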

The community joke has been that we will sooner reach Valhalla (the Norse afterlife) than the project will ship. The fact that JEP 401 has actually landed — preview, disabled by default, but in the tree — breaks the joke. The follow-up joke is "they shipped the easy half." That one is also probably true. Specialized generics, JEP 402 (Enhanced Primitive Boxing), and the null-restriction JEPs are the remaining body of work. None of them have a target JDK yet. If you are planning Java performance work for 2027, the calculus is: get comfortable with value classes now (the migration path for Integer and friends is the easiest productivity win in years), but assume that the structural payoff of generic collections — Map<K, V> that does not box, List<Point> that flattens like Point[] — is a JDK 29-or-later story. Plan around the constraint, not the promise.

What this means for you

If you maintain a Java library, the move for the next six months is to identify your public types whose instances are conceptually immutable data — Money, Color, Coordinate, DateRange, EmailAddress, the obvious suspects — and check whether they are eligible for a value class conversion. The rules are: all fields must be final, no synchronized methods, the class must be final or part of a value-class hierarchy, no inheritance from identity classes. Many immutable DTOs and value objects already satisfy those constraints. The migration is source-compatible for callers; the cost is behavioral: code that synchronizes on an instance will now throw, and == no longer means identity. Migrate your internal data classes first. Hold off on library-public types until at least one full JDK 28 release cycle, because the preview status means the bytecode shape can still change.

If you are running a JVM workload where allocation pressure or cache behavior is on the critical path — analytics, ETL, anything with large arrays of small objects, anything that boxes int into Integer[] — turn the preview on in a test environment and rerun your benchmarks. The gains to look for are not "10% faster" but "data-intensive loops that were cache-miss-bound now run close to memory-bandwidth speed"; whether your workload gets there is exactly what the benchmark will tell you. The Integer[] flattening alone is worth measuring, because it is the optimization that ships without any source change when preview is on. JDK preview features are normally switched on with --enable-preview (passed to both javac and java); Valhalla early-access builds have additionally used -XX flags such as -XX:+EnablePrimitiveClasses and -XX:+EnableValhalla. Watch for the early-access churn: these flags have moved across EA builds, so follow the notes for the build you are running.

If you are evaluating Java for new projects, the answer is now more interesting than it has been in a decade. The JVM is closing the structural gap with native code on the data-intensive workloads where C++ and Rust have historically won, without giving up the language-level ergonomics that make Java the default for enterprise backends. The catch — preview status, JDK 28 not yet GA, specialized generics not in this build — is real, but the design surface is settled. The remaining work is engineering, not design.

What to do this week

STEP 1. Read JEP 401 end to end. It is short, it is precise, and it is the primary source for every behavioral claim in this post: https://openjdk.org/jeps/401. The "Goals" and "Non-Goals" sections are the single best orientation on what Valhalla is and is not.

STEP 2. Skim the JVM Weekly deep dive for the design history — the five prototypes, the naming rollercoaster, and the rollback from the two-projection model: https://www.jvm-weekly.com/p/project-valhalla-explained-how-a. It is the only public source that traces the rejected ideas in order.

STEP 3. Download an OpenJDK Valhalla early-access build and turn the preview on. The exact incantation has changed across EA builds; follow the instructions linked from the Valhalla project page (https://openjdk.org/projects/valhalla/). Run your most allocation-heavy benchmark with the preview enabled and without it. Record the difference, especially for Integer[]-shaped workloads.

STEP 4. Audit your public API surface for candidate value classes. For each candidate, check: are all fields final? Any synchronized methods? Does it inherit from a non-value class? Does anything call synchronized on an instance? The four-question checklist settles most eligibility decisions; abstract value-class hierarchies and unusual cases still need a closer read of the JEP.
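The first three questions can be checked mechanically against compiled classes. A hypothetical reflection-based helper follows. It simplifies question one by requiring the class to be final (abstract value classes are not modeled), and it flags any superclass other than Object for a manual look. It cannot answer the fourth question, which is about callers rather than the class itself.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Reflection-based pass over the first three checklist questions for a compiled class.
final class ValueClassAudit {
    static List<String> problems(Class<?> c) {
        List<String> found = new ArrayList<>();
        // Q1 (simplified): concrete value classes are final.
        if (!Modifier.isFinal(c.getModifiers())) {
            found.add("class is not final");
        }
        // Q1: every instance field must be final.
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (!Modifier.isStatic(m) && !Modifier.isFinal(m)) {
                found.add("non-final field: " + f.getName());
            }
        }
        // Q2: no synchronized methods.
        for (Method meth : c.getDeclaredMethods()) {
            if (Modifier.isSynchronized(meth.getModifiers())) {
                found.add("synchronized method: " + meth.getName());
            }
        }
        // Q3: any superclass other than Object needs a manual look.
        Class<?> sup = c.getSuperclass();
        if (sup != null && sup != Object.class) {
            found.add("extends " + sup.getName());
        }
        return found;
    }

    // Hypothetical candidate type used for the demo.
    static final class Money {
        private final long cents;
        Money(long cents) { this.cents = cents; }
        long cents() { return cents; }
    }

    public static void main(String[] args) {
        System.out.println(problems(Money.class)); // []
        System.out.println(problems(java.util.ArrayList.class));
    }
}
```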

STEP 5. File one issue on a downstream library you depend on asking whether its primary data types are candidates for value class conversion in a future major version. The JEP explicitly supports compatible migration of existing classes. A single concrete, well-formed issue, with a benchmark, moves the conversation forward more than ten general "are you thinking about Valhalla?" posts.

# Rough, grep-based first pass. Run from your project's root.
# Lists files that declare a final class, never mention synchronized,
# and never extend another class. It does NOT check that every field
# is final, so confirm that by hand before converting anything.

find src/main/java -name '*.java' -print0 \
  | xargs -0 grep -l 'final class' \
  | while IFS= read -r f; do
      if ! grep -q 'synchronized' "$f" \
         && ! grep -qE 'extends [A-Z]' "$f"; then
        echo "CANDIDATE: $f"
      fi
    done

# What you should see in your audit output (illustrative, your repo
# will differ; this is a sample from a Spring-Boot-style service):
# CANDIDATE: src/main/java/com/example/money/Money.java
# CANDIDATE: src/main/java/com/example/geo/Coordinate.java
# CANDIDATE: src/main/java/com/example/range/DateRange.java
# CANDIDATE: src/main/java/com/example/contact/EmailAddress.java

# Compare a benchmark against JDK 28 EA with the preview disabled vs enabled.
# Flag names have moved across EA builds; check the notes for your build.
java -XX:-EnablePrimitiveClasses -jar target/benchmarks.jar -wi 3 -i 5
java -XX:+EnablePrimitiveClasses -jar target/benchmarks.jar -wi 3 -i 5

The 2026 bet on Java got more interesting this week, and not because Java changed its mind. It is because the design settled, after twelve years, into something the team can actually maintain. The full payoff is still a few JDK releases out. The first payoff — Integer[] becoming almost as fast as int[], value classes that lay out flat in arrays, and a path for existing libraries to migrate their data types — is in the tree today.

Disclosure

This post was drafted with AI assistance. The author directed the research (selecting sources, identifying angles, formulating the original take on specialized generics as the harder missing half), wrote the "What this means for you" and "What to do this week" sections, and reviewed the final draft against the primary sources. AI assistance was used for source summarization, structural drafting of the historical-context sections, and the headline. Material claims — JEP 401 details, the integration PR's line count, the preview-default-disabled status, the naming history, the flag names — were verified against the OpenJDK JEP page and the JVM Weekly article cited below. Errors remaining are the author's. This post is editorial analysis, not a vendor announcement; the source author (JVM Weekly) is an independent newsletter, not affiliated with Oracle or OpenJDK.

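Sources

  • JEP 401: Value Classes and Objects (primary): https://openjdk.org/jeps/401
  • JVM Weekly deep dive on Valhalla's design history: https://www.jvm-weekly.com/p/project-valhalla-explained-how-a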

Moebius: 0.2B Inpainting Beats FLUX at 15× Speed

A team at Huazhong University of Science and Technology shipped Moebius this week — a 0.22B-parameter inpainting model that, on their project page, claims to match or beat FLUX.1-Fill-Dev (11.9B) across six benchmarks while running at 26 ms per step on a single GPU. Apache-2.0 license, weights on Hugging Face, code on GitHub, ECCV'26 acceptance, arXiv preprint all dated between June 16 and June 19, 2026. The numbers, if they hold up under independent replication, are a real shift in what counts as "good enough" for production inpainting.

This is not a press release. The interesting story is the architecture: a redesigned attention block plus a latent-only distillation strategy that gets 50× parameter compression without the usual quality cliff. Here's what the team actually did, what the benchmarks do and don't tell you, and why the "small specialist beats big generalist" pattern is becoming a recurring research theme.

The pitch, in one paragraph

Moebius is an inpainting model. You give it an image and a binary mask, it fills the masked region. The novelty is that the team at HUST restructured the diffusion U-Net around a custom attention block — they call it LλMI (Local-λ Mix Interaction) — and trained the 0.22B student against a much larger teacher (called PixelHacker in their ablation, a direct continuation of their previous paper) entirely in latent space. The result is a model smaller than even Stable Diffusion 1.5's roughly 860M-parameter U-Net, yet the authors report it as on-par with or surpassing FLUX.1-Fill-Dev on Places2, CelebA-HQ, and FFHQ. Six benchmarks total. Apache-2.0.

Five angles worth your attention

1. The LλMI block is the actual contribution

The architecture change is not "we quantized the model." The authors replaced both self-attention and cross-attention with two sub-modules, Local-λ and Interactive-λ, that summarize spatial context and global semantic priors into fixed-size linear matrices. The win is that you bypass the quadratic compute cost of vanilla attention over a high-resolution feature map. In diffusion U-Nets, attention is the thing that eats VRAM and slows inference the most. Replacing it with linear projections of fixed dimensionality is the kind of move that lets you trade a small amount of representational fidelity for a large amount of compute — which is exactly what they want for a single-task specialist.
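The project page does not publish the LλMI equations, so the following is not Moebius's block. It is a swapped-in, generic technique: the "content lambda" from LambdaNetworks (Bello, ICLR 2021), which the λ in the name may or may not allude to. It shows how summarizing context into a fixed-size matrix replaces the n×n attention map with work linear in n. Learned projections, multiple heads, and position terms are omitted, and all names are hypothetical.

```java
// Lambda-layer-style linear attention (content lambda only). NOT the Moebius LλMI block.
final class LambdaAttentionSketch {
    // Softmax over positions (rows), separately for each key dimension (column).
    static double[][] softmaxOverPositions(double[][] k) {
        int n = k.length, dk = k[0].length;
        double[][] out = new double[n][dk];
        for (int j = 0; j < dk; j++) {
            double max = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < n; i++) max = Math.max(max, k[i][j]);
            double sum = 0;
            for (int i = 0; i < n; i++) {
                out[i][j] = Math.exp(k[i][j] - max); // subtract max for numerical stability
                sum += out[i][j];
            }
            for (int i = 0; i < n; i++) out[i][j] /= sum;
        }
        return out;
    }

    // lambda = softmax(K)^T V: a dk x dv summary whose size does not depend on n.
    static double[][] lambda(double[][] k, double[][] v) {
        double[][] sk = softmaxOverPositions(k);
        int n = k.length, dk = k[0].length, dv = v[0].length;
        double[][] lam = new double[dk][dv];
        for (int i = 0; i < n; i++)
            for (int a = 0; a < dk; a++)
                for (int b = 0; b < dv; b++)
                    lam[a][b] += sk[i][a] * v[i][b];
        return lam;
    }

    // y_i = q_i^T lambda for every query: O(n * dk * dv), linear in n instead of O(n^2).
    static double[][] apply(double[][] q, double[][] lam) {
        int n = q.length, dk = lam.length, dv = lam[0].length;
        double[][] y = new double[n][dv];
        for (int i = 0; i < n; i++)
            for (int a = 0; a < dk; a++)
                for (int b = 0; b < dv; b++)
                    y[i][b] += q[i][a] * lam[a][b];
        return y;
    }

    public static void main(String[] args) {
        double[][] k = {{0, 0}, {0, 0}, {0, 0}, {0, 0}}; // uniform keys: softmax is 0.25 everywhere
        double[][] v = {{1}, {2}, {3}, {4}};
        double[][] q = {{1, 1}, {1, 0}};
        double[][] y = apply(q, lambda(k, v));
        System.out.println(y[0][0] + " " + y[1][0]); // 5.0 2.5
    }
}
```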

The result of that trade, per the project page: 226M parameters total, 26.01 ms/step on a single GPU (the highlights do not say which). "Single GPU" is doing a lot of work there; my guess is a 3090/4090-class card or better, but that is a guess. Anyone wanting exact hardware numbers will need to wait for the paper's main table.
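A quick check of the headline ratios, as a Java sketch that uses only the numbers quoted above. The step count is an assumption, since the highlights give a per-step latency, not a per-image one, and VAE encode and decode time is not included.

```java
// Sanity-checking the headline ratios from the project page's numbers.
final class MoebiusMath {
    static double parameterRatio(double largerParams, double smallerParams) {
        return largerParams / smallerParams;
    }

    static double msPerImage(double msPerStep, int steps) {
        return msPerStep * steps;
    }

    public static void main(String[] args) {
        // 11.9B (FLUX.1-Fill-Dev) vs 226M (Moebius): roughly 52.7x, i.e. the "50x".
        System.out.printf("%.1fx%n", parameterRatio(11.9e9, 226e6));
        // Assumption: 20 sampling steps; the step count is not stated in the highlights.
        System.out.printf("%.1f ms%n", msPerImage(26.01, 20));
    }
}
```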

2. Latent-space distillation is the unglamorous half that makes it work

Distillation alone rarely closes a 50× parameter gap without quality loss. The reason Moebius's results don't collapse is that the distillation strategy operates strictly in latent space — they never decode back to pixels during training. Pixel-space distillation is what most open inpainting recipes use, and it costs you because you have to push a full VAE decoder forward pass on every step. Latent-only distillation means the student never has to learn how to decode; it just learns the noise-prediction distribution in the same latent space the VAE gives you. That pairs naturally with the LλMI block: a compact student + a cheap training loop + a single VAE forward at inference.
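To show what "never decode back to pixels" means for the training loss, here is a minimal, hypothetical sketch. It uses one plain mean squared error in place of the unpublished "adaptive multi-granularity" losses, and the `NoisePredictor` interface is an illustration, not the repository's API. A real inpainting model would also receive the mask and masked-image latents as conditioning, which are omitted here.

```java
// One latent-space distillation loss term, with no VAE decoder anywhere in the loop.
final class LatentDistillSketch {
    // A noise predictor maps (noisy latent, timestep) to predicted noise in the same latent space.
    interface NoisePredictor {
        double[] predict(double[] noisyLatent, int timestep);
    }

    // Plain mean squared error between student and teacher predictions on the same input.
    static double latentMse(NoisePredictor student, NoisePredictor teacher,
                            double[] noisyLatent, int timestep) {
        double[] s = student.predict(noisyLatent, timestep);
        double[] t = teacher.predict(noisyLatent, timestep);
        double sum = 0;
        for (int i = 0; i < s.length; i++) {
            double d = s[i] - t[i];
            sum += d * d;
        }
        return sum / s.length;
    }

    public static void main(String[] args) {
        NoisePredictor teacher = (z, t) -> z.clone();            // toy stand-in
        NoisePredictor student = (z, t) -> new double[z.length]; // toy stand-in: predicts zeros
        double[] zt = {1.0, -1.0, 2.0, 0.0};
        System.out.println(latentMse(student, teacher, zt, 500)); // 1.5
    }
}
```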

The detail I'd want from the paper: which "adaptive multi-granularity" losses exactly. The page says they "dynamically balance multiple gradient-based losses to achieve high-fidelity alignment," which is project-page prose, not a recipe. The arXiv preprint should have the ablation table.

3. The benchmark set is honest, but narrow

The six benchmarks span natural scenes (Places2) and portrait scenes (CelebA-HQ, FFHQ). That's a sensible pair — Places2 tests compositing realism (where inpainting is most often used in practice, for object removal and outpainting) and the portrait sets test facial plausibility (where perceptual quality is highest-stakes). What the benchmark set does not cover is anything adversarial: text rendering in masked regions, inpainting on drawings or anime, products on plain backgrounds, masks with very thin geometries.

If you are shipping an inpainting feature for a photo editor, this set is probably what you'd care about. If you are doing e-commerce background swaps or comics restoration, the numbers will over-promise. Test on your own data before committing.

4. The "small specialist beats big generalist" pattern is now a research direction

Moebius is one of several recent papers pushing in the same direction: train a 100M-500M model that does one thing well, distilled from a 10B+ generalist — a pattern that ties into the trilemma behind why bigger models can actually regress on narrow tasks. The intuition is that the generalist's parameters are doing a hundred different jobs; a specialist can keep a fraction of them and still match the parent on the parent-distribution slice that matters. FLUX.1-Fill-Dev itself is already a fill-tuned variant of FLUX.1, so the headline comparison is specialist against specialist, not specialist against a raw generalist.

This is good news for inference economics. The interesting empirical question is whether the pattern generalizes across tasks. If yes, expect a wave of 0.1B-1B specialists for editing, depth estimation, segmentation, OCR, and similar narrow problems. If no, expect the small-model beat-the-generalist papers to cluster around problems where the parent generalist has clearly under-trained on the subtask.

5. Production reality: you still need the VAE

Moebius ships separately from its VAE. The README's setup section is explicit: download the VAE checkpoint into ./weight/vae, then download the inpainting checkpoint(s) into ./weight/Moebius. Four checkpoints are listed: a pretrained base plus Places2, CelebA-HQ, and FFHQ fine-tunes. So "Moebius" is really a base architecture plus a model zoo. Anyone planning to ship it should pick the fine-tune closest to their domain and budget for the VAE encode and decode on every image. The dependency stack is also recent: torch 2.7.1, diffusers 0.38.0, transformers 4.56.2, Python 3.14.4. If your production stack is six months old, this won't drop in.

The original take

The headline that Moebius "beats" FLUX.1-Fill-Dev is technically accurate for the benchmarks tested, but it misrepresents the dynamic. The 0.22B Moebius model is what you ship when you know your distribution and your mask pattern in advance. The 11.9B FLUX.1-Fill-Dev is what you ship when you don't. A photo editor that does object removal on user-uploaded JPEGs is in the first category; a general creative assistant that lets users type "fix this" with no context is in the second.

The honest framing is not "small models beat big models." It's "task-specific specialists now match generalists on the distribution the specialist was built for, with roughly 50× fewer parameters." That is a meaningful statement — it means the cost of standing up a competent inpainting feature dropped by roughly an order of magnitude this month. But it does not mean you can throw out FLUX.1. It means you can stop running FLUX.1 for the boring cases.

What this means for you

  • If you ship an inpainting feature: try Moebius against your current model. The fine-tune nearest your domain is the one to A/B against. Expect parity or a small win on perceptual quality and a large win on cost-per-image.
  • If you're a researcher: the arXiv preprint (submitted 17 June 2026 per arXiv; the README says June 18) is the version of record for now. The ECCV camera-ready will probably have ablation tables the page is hand-waving around. Read that, not the project page.
  • If you're investing in diffusion tooling: the latent-only distillation recipe is the generalizable trick. If you're building a small-specialist pipeline for any narrow task, this is a template, not just a model.

What to do this week

# 1. Pull the repo
git clone https://github.com/hustvl/Moebius
cd Moebius

# 2. Set up the env (Python 3.14.4 required)
conda create -n moebius python=3.14.4 -y
conda activate moebius
pip install -r requirements.txt

# 3. Download VAE + the inpainting checkpoint for your domain
#    Place them under ./weight/vae and ./weight/Moebius/ft_<domain>/

# 4. Run the example inpainting
python -m infer.infer_moebius \
  --model-config config/model_cfg/moebius.yaml \
  --model-weight weight/Moebius/ft_places2/diffusion_pytorch_model.bin \
  --input-image dataset.local/imgs/example.png \
  --input-mask dataset.local/masks/example.png \
  --output-dir ./results

A/B the output against whatever you're currently running on your own test set. Compare perceptual quality (your eyes, plus LPIPS against the original image if you have it), not FID; inpainting quality is not well summarized by FID.
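LPIPS needs a pretrained network and a library. As a dependency-free complement, here is a swapped-in pixel metric: PSNR computed only over the masked pixels, against the original image. It assumes you masked out regions of images you still have. It is not LPIPS and not perceptual (it can favor blurry fills), so use it as a sanity check next to your eyes, not as the verdict. All names are hypothetical.

```java
// Masked PSNR: a pixel-level sanity check restricted to the inpainted region.
final class MaskedPsnr {
    // ref/out hold 8-bit channel values (0..255), flattened; mask[i] is true inside the hole.
    static double maskedPsnr(int[] ref, int[] out, boolean[] mask) {
        double squaredError = 0;
        long count = 0;
        for (int i = 0; i < ref.length; i++) {
            if (!mask[i]) continue; // pixels outside the hole are ignored
            double d = ref[i] - out[i];
            squaredError += d * d;
            count++;
        }
        if (count == 0) throw new IllegalArgumentException("mask selects no pixels");
        double mse = squaredError / count;
        return mse == 0 ? Double.POSITIVE_INFINITY : 10 * Math.log10(255.0 * 255.0 / mse);
    }

    public static void main(String[] args) {
        int[] ref = {0, 0, 0, 0};
        int[] out = {99, 99, 10, 10};               // errors outside the mask do not count
        boolean[] mask = {false, false, true, true};
        System.out.printf("%.2f dB%n", maskedPsnr(ref, out, mask));
    }
}
```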

Disclosure

This post was drafted with AI assistance from MiniMax-M3 (a foundation model) under editorial direction. Primary source: the Moebius project page at hustvl.github.io/Moebius/, fetched and re-read on 2026-06-23 via curl --compressed. Secondary source: the GitHub repository at github.com/hustvl/Moebius, also fetched 2026-06-23. All quantitative claims (parameter counts, inference latency, benchmark names, license, ECCV acceptance, arXiv preprint date, GitHub star and fork counts) are drawn from those two pages and were verified by reading the page contents directly, not from memory. The ECCV'26 acceptance claim comes only from the GitHub README; the project page header still reads "In submission." arXiv's authoritative submission date for arXiv:2606.19195 is 17 June 2026; the README says 18 June. No third-party sources were used for the technical claims; the architecture description is paraphrased from the project page's abstract and method section rather than quoted.

Sources

  • Moebius project page (primary): https://hustvl.github.io/Moebius/ — verified live on 2026-06-23 via curl -sL --compressed returning 47 KB of HTML with the full abstract, method, and highlights sections. The page header reads "In submission" rather than naming ECCV; the ECCV'26 acceptance is asserted only in the GitHub README.
  • Moebius GitHub repository: https://github.com/hustvl/Moebius — Apache-2.0 license; verified live on 2026-06-23; README dates the initial GitHub submission to June 16, 2026, the arXiv preprint (arXiv:2606.19195) and ECCV'26 acceptance to June 18, 2026 (note: arXiv's authoritative "Submitted on" record is 17 June 2026; the README is off by one day), and the latest update to June 19, 2026 (Hugging Face No. 1 daily ranking). At fetch time the repo had 198 stars and 15 forks.
  • Related tutorialoflife.blogspot.com post on running a comparable GLM-class model locally: GLM-5.2 Hits 1M Context and Lands in Claude Code for $18
  • Related tutorialoflife.blogspot.com post on the "models hallucinate more when bigger" trilemma: Bigger Models Hallucinate More. The Trilemma Explains.
  • Related tutorialoflife.blogspot.com post on Python wheels landing on PyPI via Pyodide: Pyodide 314.0: Python Wheels Hit PyPI, Finally

Monday, June 22, 2026

Codex Logs Can Write 640 TB a Year to Your SSD

OpenAI shipped a release of the Codex CLI on 18 June 2026. The release notes mention a SQLite-related fix. They do not mention the bug. The bug — that the Codex CLI can write roughly 640 TB a year to the SSD it is installed on — is still open, still reproducible, and the latest release does not address it. If you are a developer who runs Codex as a long-lived background process, this is the part of the upgrade post you actually need to read.

The number, from the issue itself

Issue #28224 in openai/codex, opened on 14 June 2026 by user 1996fanrui, is the source for the 640 TB/year figure. The author's report is short and quantitative: on a 1 TB SSD, after 21 days of uptime, the main drive had written about 37 TB. Process-level and file-level checks show the Codex SQLite logs as the dominant continuous writer. Linear extrapolation: 37 TB in 21 days is roughly 1.76 TB per day, or 640 TB per year. On a 1 TB drive, that is 640 full-drive writes per year. Some consumer SSDs are warrantied at 600 TBW (terabytes written). The math is uncomfortable.
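The extrapolation is simple enough to check yourself. Here is a minimal Python sketch. It assumes decimal terabytes, a constant write rate, and the 600 TBW figure as an example warranty:

```python
# Linear extrapolation of the #28224 measurement (a sketch).
# Assumptions: decimal units (1 TB = 10**12 bytes), a constant write
# rate, and 600 TBW as an illustrative consumer-SSD warranty rating.
MEASURED_TB = 37.0    # TB written to the drive, per the issue report
MEASURED_DAYS = 21.0  # days of uptime, per the issue report
DRIVE_TB = 1.0        # capacity of the reporter's drive
WARRANTY_TBW = 600.0  # example warranty rating


def extrapolate(measured_tb: float, days: float) -> tuple[float, float]:
    """Return (TB per day, TB per year) assuming a constant write rate."""
    per_day = measured_tb / days
    return per_day, per_day * 365


def days_to_warranty(per_day_tb: float, warranty_tbw: float) -> float:
    """Days until cumulative writes reach the warranty TBW rating."""
    return warranty_tbw / per_day_tb


per_day, per_year = extrapolate(MEASURED_TB, MEASURED_DAYS)
print(f"{per_day:.2f} TB/day, {per_year:.0f} TB/year")
print(f"{per_year / DRIVE_TB:.0f} full-drive writes per year")
print(f"{days_to_warranty(per_day, WARRANTY_TBW):.0f} days to reach {WARRANTY_TBW:.0f} TBW")
```

Run as-is, the sketch reports about 643 TB/year, which is the article's "roughly 640." At that rate a 600 TBW drive reaches its rating in about 341 days. Substitute your own drive's SMART totals to get your own number.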

A second issue, #17320, opened on 10 April 2026, has the per-second view. The reporter observed sustained writes of approximately 5 MiB/s to ~/.codex/logs_2.sqlite-wal during model streaming, with peaks of up to 16 MiB/s in iotop. Treat 5 MiB/s as the floor, not the ceiling: it is the sustained rate, and the peaks run about three times higher. The 37 TB in 21 days from #28224 averages out to about 19 MiB/s around the clock. The gap between the two reports depends on workload.
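The two issues report in different units: MiB/s in one, TB over days in the other. A small conversion sketch puts them side by side. It assumes binary MiB, decimal TB, and a rate that holds 24 hours a day:

```python
# Convert between the two issues' units (a sketch).
# Assumptions: 1 MiB = 2**20 bytes, 1 TB = 10**12 bytes, a 365-day year,
# and a write rate that holds around the clock.
MIB = 2**20
TB = 10**12
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = SECONDS_PER_DAY * 365


def mib_s_to_tb_per_year(rate_mib_s: float) -> float:
    """TB written per year by a constant write rate in MiB/s."""
    return rate_mib_s * MIB * SECONDS_PER_YEAR / TB


def tb_over_days_to_mib_s(tb: float, days: float) -> float:
    """Average MiB/s implied by `tb` terabytes written over `days` days."""
    return tb * TB / (days * SECONDS_PER_DAY) / MIB


for rate in (5, 16):  # #17320: sustained and peak rates
    print(f"{rate} MiB/s around the clock = {mib_s_to_tb_per_year(rate):.0f} TB/year")
print(f"#28224 average: {tb_over_days_to_mib_s(37, 21):.1f} MiB/s")
```

At 5 MiB/s around the clock you would write about 165 TB a year. The #28224 average works out to about 19.4 MiB/s, and that rate produces the ~640 TB/year figure.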

The 18 June release (rust-v0.141.0) does not touch this. It does touch SQLite, but for a different bug.

The fix that shipped is not the bug you have

rust-v0.141.0 includes PR #27992, titled [codex] Pin bundled SQLite to fixed WAL-reset version, merged by gpeal. This is a real fix, and it matters — but for a different defect. The PR pins the bundled libsqlite3-sys dependency so that an unrelated transitive refresh cannot downgrade Codex's runtime from SQLite 3.51.3 back to 3.50.2. The 3.50.x line has a documented WAL-reset corruption bug; 3.51.3 is the patched version. Without the pin, a routine dependency refresh could silently drop you onto the broken release. The PR is defensive and correct.
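The pin is, in effect, a version floor. Purely as an illustration, here is that check in Python against the two versions named in the PR. It reads the SQLite linked into your Python interpreter, which is not the copy bundled inside Codex. Treat it as a sketch of the logic, not as a way to audit Codex itself:

```python
import sqlite3

# Per the PR #27992 description summarized above, 3.51.3 carries the
# WAL-reset fix and 3.50.2 does not. Treating every version at or above
# 3.51.3 as fixed is this sketch's assumption.
PATCHED = (3, 51, 3)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '3.51.3' into (3, 51, 3)."""
    return tuple(int(part) for part in text.split(".")[:3])


def has_wal_reset_fix(version_text: str, patched: tuple[int, ...] = PATCHED) -> bool:
    """True if the version is at or above the patched release."""
    return parse_version(version_text) >= patched


# This inspects the SQLite linked into *this Python*, not the copy
# bundled inside the Codex binary.
print(sqlite3.sqlite_version, has_wal_reset_fix(sqlite3.sqlite_version))
```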

It has nothing to do with the feedback-log write amplification. The write-amplification bug is not a SQLite version issue; it is a Codex logging-sink issue. Specifically, the logging sink writes to a SQLite database that is configured to retain TRACE-level entries, and it does so even when the parent process has RUST_LOG=warn set. The reporter on #17320 confirmed three things. First, /proc/<pid>/environ showed the process was spawned with the intended RUST_LOG setting. Second, strace showed the file descriptors were being written to. Third, a direct SQLite query showed that a single 50-token response added about 5,000 rows to logs_2.sqlite, which a rotation pass pruned after the response finished. The volume of TRACE entries is the issue: the reporter's SELECT level, COUNT(*) breakdown showed TRACE entries at 68% of total log volume, INFO at 27%, DEBUG at 4%, and WARN at 0.1%. The retention policy does not filter by level.
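To reproduce the level breakdown on your own machine, a read-only query is enough. The sketch below assumes a table named logs with a level column. Those names are hypothetical, so check the real schema of logs_2.sqlite first:

```python
import sqlite3
from pathlib import Path


def level_breakdown(conn: sqlite3.Connection, table: str = "logs",
                    column: str = "level") -> dict[str, float]:
    """Return {level: percent of rows} for one log table.

    `logs` and `level` are hypothetical names; check the real schema
    (for example `.schema` in the sqlite3 shell) before relying on them.
    """
    # Identifiers cannot be bound as parameters, so they are quoted here.
    query = f'SELECT "{column}", COUNT(*) FROM "{table}" GROUP BY "{column}"'
    rows = conn.execute(query).fetchall()
    total = sum(count for _, count in rows)
    if total == 0:
        return {}
    return {str(level): 100.0 * count / total for level, count in rows}


def open_read_only(path: Path) -> sqlite3.Connection:
    """Open an existing database read-only so inspection cannot modify it."""
    return sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)


if __name__ == "__main__":
    db = Path.home() / ".codex" / "logs_2.sqlite"
    if db.exists():
        conn = open_read_only(db)
        try:
            breakdown = level_breakdown(conn)
        finally:
            conn.close()
        for level, pct in sorted(breakdown.items(), key=lambda kv: -kv[1]):
            print(f"{level:>6}: {pct:5.1f}%")
```

Opening the file with mode=ro means the inspection cannot add writes of its own or modify the database.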

The two bugs are easy to confuse because both involve SQLite. They are not the same. If you read the release notes and assumed the SQLite fix was the fix for the SQLite write problem, you are relying on an assumption the release notes do not support.

What the issue is actually about

The feedback-log sink in Codex is a separate code path from the standard tracing / RUST_LOG machinery. Issue #17320 reproduces a session where the process is launched with RUST_LOG=warn and the SQLite log nonetheless fills with TRACE entries. The maintainers have not yet committed to a fix; the issue is open, and there is no PR linked from it. The closest related work in the issue thread is #27911, #21134, and a stale pull request #12969, none of which addresses the bypass.
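Here is an analogy that makes the bypass concrete. It uses Python's standard logging module and is not Codex code. Two sinks hang off one logger. The console honors an environment-style level, while a SQLite sink with no level of its own stores everything. Python has no TRACE level, so the sketch registers a custom one:

```python
import logging
import sqlite3

TRACE = 5  # Python has no TRACE level; register one below DEBUG (10)
logging.addLevelName(TRACE, "TRACE")


class SQLiteSink(logging.Handler):
    """Toy handler that stores every record it accepts in SQLite."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        super().__init__()  # level NOTSET: the handler accepts everything
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS logs (level TEXT, msg TEXT)")

    def emit(self, record: logging.LogRecord) -> None:
        self.conn.execute("INSERT INTO logs VALUES (?, ?)",
                          (record.levelname, record.getMessage()))


def build_logger(env_level: int, conn: sqlite3.Connection) -> logging.Logger:
    """Console honors env_level; the SQLite sink has no filter of its own."""
    logger = logging.getLogger("bypass-demo")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(TRACE)               # let every record reach the handlers
    console = logging.StreamHandler()
    console.setLevel(env_level)          # the RUST_LOG-style filter
    logger.addHandler(console)
    logger.addHandler(SQLiteSink(conn))  # separate path, no level check
    return logger


conn = sqlite3.connect(":memory:")
log = build_logger(logging.WARNING, conn)  # comparable to RUST_LOG=warn
log.log(TRACE, "token chunk")
log.debug("stream state")
log.warning("slow response")
# The console prints one line; the database still receives all three.
print(conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0])
```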

For the affected user, the practical shape of the problem is: install Codex, run it as a long-lived process, leave the workstation on for a few weeks, and watch your SSD's TBW counter climb. The issue tracker has the numbers. The fix is not in the most recent release. The pattern of "issue filed, maintainers acknowledge, no PR, more users pile on" is in its early days as of 22 June 2026. The HN thread reached 284 points and 158 comments in under twelve hours, which is high velocity for a thread about an open issue-tracker bug.

The original take: the right fix is RUST_LOG, not a SQLite pin

Here is the part I am willing to argue about. The most likely fix path, based on the issue thread and the maintainer history, is the wrong one. The transitive-dependency-pinning class of fix (PR #27992) is appropriate for "a routine refresh could downgrade us to a known-broken library version." It is not appropriate for "our own code is writing 5 MiB/s of TRACE entries to a database that the user has no way to disable." Pinning the bundled SQLite does not stop the Codex logger from writing those rows. It fixes a different bug.

The right fix is in the Codex logging crate. The reporter on #17320 has the right shape of the diagnosis: the logging sink should respect the process-level RUST_LOG filter, the way every other Rust binary does, and the way the spawn for the app-server process is already configured to do. It does not, because the SQLite sink is on a separate code path, configured with its own filter, and that filter does not consult RUST_LOG. There are roughly three options for a fix: (a) wire the SQLite sink to the same filter the rest of the tracing stack uses, (b) default the SQLite sink to a level that excludes TRACE regardless of RUST_LOG, or (c) add a per-process knob so users can configure the retention level explicitly. Option (a) is the lowest-friction option, the most consistent with the Rust ecosystem, and the most likely to land first. Option (c) is the most respectful of power users who actually want TRACE entries in the database. Option (b) is the most defensive and the easiest to ship.
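The three options differ only in where the SQLite sink gets its minimum level. Below is a minimal Python sketch of that decision, under two stated assumptions. First, RUST_LOG is read for bare level directives only; real tracing EnvFilter parsing handles module-scoped directives and much more. Second, an unset RUST_LOG defaults to error. SAFE_DEFAULT is a hypothetical stand-in for whatever non-TRACE level option (b) would pick:

```python
import logging
from typing import Optional

TRACE = 5  # custom level below DEBUG, standing in for Rust's TRACE
logging.addLevelName(TRACE, "TRACE")

_LEVELS = {"trace": TRACE, "debug": logging.DEBUG, "info": logging.INFO,
           "warn": logging.WARNING, "error": logging.ERROR,
           "off": logging.CRITICAL + 10}
SAFE_DEFAULT = logging.DEBUG  # hypothetical: any level above TRACE satisfies (b)


def global_level(rust_log: Optional[str], default: int = logging.ERROR) -> int:
    """Read only bare level directives ("warn", "info", ...) from RUST_LOG.

    Module-scoped directives such as "codex=debug" are skipped; real
    EnvFilter parsing is much richer than this simplification.
    """
    level = default
    for directive in (rust_log or "").split(","):
        directive = directive.strip().lower()
        if directive in _LEVELS:
            level = _LEVELS[directive]
    return level


def sink_level(rust_log: Optional[str], option: str,
               explicit: Optional[int] = None) -> int:
    """Minimum level the SQLite sink would store under each fix option."""
    if option == "a":  # inherit the process-wide RUST_LOG filter
        return global_level(rust_log)
    if option == "b":  # fixed default that never stores TRACE
        return SAFE_DEFAULT
    if option == "c":  # explicit user knob, falling back to (a)
        return explicit if explicit is not None else global_level(rust_log)
    raise ValueError(f"unknown option {option!r}")
```

Under option (a), RUST_LOG=warn yields WARNING, so the sink would drop the TRACE rows that currently make up most of the volume.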

The implementation cost of any of the three is small. The test cost is the part that will eat maintainer time. The fix is going to need a regression test that asserts, given RUST_LOG=warn, no TRACE entries are written to logs_2.sqlite during a 60-second idle session. If that test does not exist, the fix is not done. The 16 June fabrication in this blog's own record, on a different story, is the reason I am naming the test explicitly.
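Here is the shape of that regression test as a Python sketch. The command to launch, the logs table, and the level column are all hypothetical placeholders, not Codex's real invocation or schema. The test starts the process with RUST_LOG=warn, lets it idle, and then counts TRACE rows read-only:

```python
import os
import sqlite3
import subprocess
import time
from pathlib import Path
from typing import Sequence


def count_rows_at_level(conn: sqlite3.Connection, level: str,
                        table: str = "logs", column: str = "level") -> int:
    """Count rows whose level column equals `level` (hypothetical schema)."""
    query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" = ?'
    return conn.execute(query, (level,)).fetchone()[0]


def run_idle_session(cmd: Sequence[str], seconds: float = 60.0) -> None:
    """Start `cmd` with RUST_LOG=warn, leave it idle, then stop it."""
    env = dict(os.environ, RUST_LOG="warn")
    proc = subprocess.Popen(list(cmd), env=env)
    try:
        time.sleep(seconds)
    finally:
        proc.terminate()
        proc.wait(timeout=10)


def check_no_trace_rows(cmd: Sequence[str], db_path: Path,
                        seconds: float = 60.0) -> None:
    """Fail if the idle session left any TRACE rows in the log database.

    Start from a fresh database so rows from earlier runs do not count.
    Hypothetical usage: check_no_trace_rows(["<codex command>"],
    Path.home() / ".codex" / "logs_2.sqlite")
    """
    run_idle_session(cmd, seconds)
    conn = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
    try:
        trace_rows = count_rows_at_level(conn, "TRACE")
    finally:
        conn.close()
    assert trace_rows == 0, f"{trace_rows} TRACE rows written with RUST_LOG=warn"
```

Point the test at a fresh database. Rows from earlier sessions would otherwise fail the check for the wrong reason.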

What this means for you

If you are a developer who has been running Codex as a long-lived background process, the immediate triage is: check your ~/.codex/ directory and your SSD's TBW counter. The files to look for are logs_2.sqlite, logs_2.sqlite-wal, and logs_2.sqlite-shm. If they are large, the bug is affecting you. If your SSD is older than two years, the warrantied TBW may already be in danger; check with the vendor's SMART diagnostic before you do anything else. The iotop view during a Codex streaming response is the real-time check: if you see Codex writing at 5 MiB/s or more, you are looking at this bug.
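If iotop is not installed, Linux exposes a per-process counter you can sample directly: the write_bytes field of /proc/<pid>/io, which counts bytes the process caused to be sent to the storage layer. The minimal sampler below assumes Linux with task I/O accounting enabled. It also assumes you look up the pid yourself, for example with pgrep -f codex:

```python
import sys
import time
from pathlib import Path


def read_write_bytes(pid: int, proc_root: Path = Path("/proc")) -> int:
    """Return the write_bytes counter from <proc_root>/<pid>/io (Linux only)."""
    text = (proc_root / str(pid) / "io").read_text()
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "write_bytes":
            return int(value)
    raise KeyError(f"write_bytes missing for pid {pid}")


def rate_mib_s(before: int, after: int, interval_s: float) -> float:
    """Average write rate in MiB/s between two counter samples."""
    return (after - before) / interval_s / 2**20


def sample(pid: int, interval_s: float = 2.0) -> float:
    """Sample the counter twice, `interval_s` seconds apart."""
    before = read_write_bytes(pid)
    time.sleep(interval_s)
    return rate_mib_s(before, read_write_bytes(pid), interval_s)


if __name__ == "__main__" and len(sys.argv) == 2:
    pid = int(sys.argv[1])  # e.g. from `pgrep -f codex`
    mib_s = sample(pid)
    flag = "  <- in the range #17320 describes" if mib_s >= 5 else ""
    print(f"pid {pid}: {mib_s:.1f} MiB/s{flag}")
```

A longer interval smooths out the 16 MiB/s peaks and gives a reading closer to the sustained rate.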

If you are a team lead evaluating Codex for a workstation pool, the risk profile is unchanged from a week ago. The bug is open, the fix is not in the latest release, and the mitigation is user-side. The honest answer to procurement is: do not run Codex as a long-lived daemon on a fleet of consumer-grade SSDs without monitoring. Run it as a foreground process for individual tasks. Run it on enterprise SSDs with high TBW ratings. Or, if your workload requires a long-lived Codex process, plan to monitor and rotate the logs manually until the fix lands.

If you are a maintainer of a similar tool — any Rust binary that ships with its own SQLite-backed log sink — the lesson generalizes. The standard RUST_LOG filter is the contract the Rust ecosystem agrees to. If your code path bypasses it, you owe your users a configuration surface, a default that does not write at TRACE, or a regression test that prevents the bypass from being reintroduced. The Codex issue is one manifestation; the pattern is the story.

What to do this week

# 1. Check whether the bug is affecting you right now.
ls -lh ~/.codex/logs_2.sqlite* 2>/dev/null
# If logs_2.sqlite-wal is large (>100MB), the bug is active.

# 2. Live observation — see the write rate during a Codex session.
# Run this in a second terminal while you do a Codex task:
sudo iotop -b -o -d 2 -n 30 | grep -i codex
# Sustained 5+ MiB/s is the issue. Peaks to 16 MiB/s are common.

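# 3. Aggressive mitigation: reset the log database.
# Quit Codex first: moving the files while Codex holds them open does
# not stop its writes and risks corrupting the database. Codex will
# recreate the files on its next launch, so this frees disk space;
# it does not reduce the write rate. (Brace expansion needs bash/zsh.)
mkdir -p ~/.codex/logs_2.sqlite.bak
mv ~/.codex/logs_2.sqlite{,-wal,-shm} ~/.codex/logs_2.sqlite.bak/ 2>/dev/null || true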

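# 4. Partial mitigations: RUST_LOG=error for the app-server process
# quiets the standard tracing output, but #17320 shows the SQLite sink
# ignores RUST_LOG, so expect little change in these writes. Capping
# file size (logrotate or similar) limits disk usage, not bytes
# written; rotate only while Codex is stopped.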

# 5. The durable fix: subscribe to issue #28224 and #17320 and
# wait for the maintainers to land a fix. Do not assume
# rust-v0.141.0 fixed it; the release notes do not say so.
gh issue view 28224 --repo openai/codex --web
gh issue view 17320 --repo openai/codex --web

If you maintain a CI fleet that uses Codex, add the iotop check to your nightly runbook for the next two weeks. If you are a single user with a single workstation, the move-and-rotate is fine as a stopgap. If you are considering this for production, the answer for the next 30-60 days is "no, not as a daemon." The bug is open, no fix has landed, and the regression test has not yet been written.


Disclosure

Drafted with AI assistance. The primary sources for this post are GitHub issue #28224 in openai/codex (1996fanrui, opened 2026-06-14, status OPEN), GitHub issue #17320 in openai/codex (opened 2026-04-10), the rust-v0.141.0 release page and changelog (tagged 2026-06-18), GitHub PR #27992 in openai/codex (gpeal, MERGED), the Hacker News thread for item 48626930 (vantareed, 284 points / 158 comments as of 2026-06-22 15:00 UTC+8), and the SQLite project's "The WAL Reset Bug" documentation linked from the PR #27992 description. Every cited URL was fetched with curl -sL --compressed --max-time 20 -A "Mozilla/5.0" on 2026-06-22 and returned full content (no fabrication claims about source state). The 640 TB/year figure, the 21-day / 37 TB measurement, the 1 TB SSD assumption, the 600 TBW consumer-SSD rating, the 5 MiB/s sustained and 16 MiB/s peak write rates, the 68% TRACE / 27% INFO / 4% DEBUG / 0.1% WARN level breakdown, the 5,000-rows-per-50-token-response rate, the RUST_LOG=warn spawn configuration, the libsqlite3-sys 0.35.0 → 0.37.0 (SQLite 3.50.2 → 3.51.3) downgrade path, the rust-v0.141.0 release date 2026-06-18, the related issues #27911 / #21134 and stale PR #12969, and the HN points/comments as of 2026-06-22 15:00 UTC+8 are all quoted or paraphrased from these sources. The argument that the right fix is in the Codex logging crate rather than a SQLite pin is this blog's analysis, not a maintainer claim. The "test that asserts no TRACE entries are written when RUST_LOG=warn" recommendation is this blog's test-design framing, not a maintainer commitment. The iotop recipe and the move-aside mitigation for the logs_2.sqlite files are practical commands a developer can run today; the durable fix requires a maintainer patch.

Sources

  • GitHub issue #28224, openai/codex — primary source for the 640 TB/year figure, the 21-day / 37 TB measurement, the 1 TB SSD extrapolation, the 600 TBW consumer-SSD comparison, and the three affected file paths (logs_2.sqlite, logs_2.sqlite-wal, logs_2.sqlite-shm): https://github.com/openai/codex/issues/28224
  • GitHub issue #17320, openai/codex — primary source for the 5 MiB/s sustained / 16 MiB/s peak write rate, the RUST_LOG=warn bypass, the 5,000-rows-per-50-token-response rate, and the TRACE / INFO / DEBUG / WARN level breakdown: https://github.com/openai/codex/issues/17320
  • rust-v0.141.0 release page, openai/codex — primary source for the 2026-06-18 release date and the changelog entry confirming PR #27992 is included: https://github.com/openai/codex/releases/tag/rust-v0.141.0
  • GitHub PR #27992, openai/codex — primary source for the bundled-SQLite WAL-reset pin, the libsqlite3-sys 0.35.0 → 0.37.0 (SQLite 3.50.2 → 3.51.3) downgrade path, and the maintainer gpeal: https://github.com/openai/codex/pull/27992
  • Hacker News item 48626930, "Codex logging bug may write TBs to local SSDs" — primary source for the 284 points / 158 comments community discussion as of 2026-06-22 15:00 UTC+8: https://news.ycombinator.com/item?id=48626930

Apertus: Why 'Fully Open' Matters More Than Open Weights

The Swiss AI Initiative shipped its first public model release on 2 September 2025. The 70B and 8B variants, plus a "Mini" family at 0.5B / 1.5B / 4B, all dropped on Hugging Face under Apache 2.0, all gated behind a usage-policy click-through. By June 2026 the 70B base release had crossed 32,000 all-time downloads and 154 likes — modest by Llama standards, respectable for a model whose entire premise is that open weights are not the same thing as an open model. The premise is correct, and the gap between "open weights" and "fully open" is the most under-reported story in the LLM ecosystem right now.

What Apertus actually released

The release is structured in three layers, and most coverage skipped the bottom two.

The top layer is the weights, in safetensors format, Apache 2.0 licensed, gated. The gating is not for exclusivity: the license has no field-of-use restrictions, no revenue share, and no "acceptable use" clauses attached to it. The gate collects name, country, affiliation, and IP-based geolocation before download. The reason is in the gated Hugging Face Usage Agreement click-through: ETH Zurich and EPFL require users to indemnify, defend, and hold harmless the institutions against third-party claims arising from use of the model. That is a litigation hedge, not a licensing restriction, and it is presented at the moment of download rather than on the public model card. The Hugging Face model card also says: "we strongly advise downloading and applying this output filter from this site every six months." The filter reflects data-protection deletion requests and lets downstream users strip personal data from outputs. As of the model card's current state, no output filter has been published yet; the project's stated commitment is to publish one and have users check the site regularly. This is EU data-protection compliance machinery in practice.

The middle layer is the data. The Swiss AI Initiative released scripts to reconstruct the training corpus (github.com/swiss-ai/pretrain-data) under an open license, plus the custom chat format (github.com/swiss-ai/apertus-format). Reconstruction scripts, not a tarball. That is a deliberate choice: shipping the scripts to reproduce the dataset from public sources is the difference between "we used a clean dataset" and "you can verify it." The training mixture covers 1,800+ languages and uses only what the project calls "fully compliant" data, meaning data the project has the right to train on under EU law. The resulting model supports a long context of 65,536 tokens, per the Hugging Face README.

The bottom layer is the science. The arXiv paper (2509.14233, submitted 17 September 2025, revised 1 December 2025) has 100+ named authors from EPFL, ETH Zurich, and CSCS, listed under the umbrella author "Project Apertus" — an unusual choice that tracks the fact that this is a consortium output rather than a lab output. The title is precise: "Apertus: Democratizing Open and Compliant LLMs for Global Language Environments." "Democratizing" is doing work there. It does not mean "cheaper." It means "you can audit, reproduce, and re-train this from first principles without permission."

The Alpine supercomputer nobody outside HPC has heard of

The compute story got the smallest share of column inches. Apertus was trained on Alps, the flagship machine of the Swiss National Supercomputing Centre (CSCS) in Lugano. Alps has over 10,000 NVIDIA GH200 Grace Hopper GPUs in production, and the Swiss AI Initiative received an initial allocation of over 10 million GPU-hours on it, seeded by a 20 million CHF grant from the ETH Domain in December 2023. The initiative now counts over 800 affiliated researchers across 10+ Swiss academic institutions, 70+ of them AI-focused professors.

That footprint matters for two reasons. First, it is the reason Apertus exists at all. Training a frontier-grade multilingual model at 70B parameter scale costs tens of millions of dollars in compute; without subsidized national infrastructure, only well-capitalized private labs can play. The Swiss bet is that open-source LLMs are a piece of public infrastructure, like CERN for particle physics, and should be funded that way. Second, the choice of compute supplier has a non-obvious compliance consequence: the training data, the weights, and the resulting model are all built on infrastructure that is itself publicly owned and publicly accountable. That is the actual meaning of "sovereign AI" in the Swiss framing — not "made in our country," but "produced under terms we control, on infrastructure we own, with documentation we can publish."
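The allocation is easier to picture as arithmetic. This back-of-envelope sketch uses the quoted lower bounds. The per-GPU-hour rental prices are hypothetical placeholders, not figures from the project:

```python
# Back-of-envelope arithmetic on the Alps allocation (a sketch).
# GPU count and GPU-hours are the "over 10,000" / "over 10 million"
# lower bounds quoted above; the $/GPU-hour prices are hypothetical.
GPUS = 10_000
ALLOCATION_GPU_HOURS = 10_000_000


def whole_machine_days(gpu_hours: float, gpus: int) -> float:
    """Days the allocation lasts if every GPU works on it in parallel."""
    return gpu_hours / gpus / 24


def cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """What the same GPU-hours would cost at a given rental price."""
    return gpu_hours * usd_per_gpu_hour


print(f"{whole_machine_days(ALLOCATION_GPU_HOURS, GPUS):.1f} days of the full machine")
for price in (2.0, 3.0):  # hypothetical rental prices, USD per GPU-hour
    print(f"at ${price:.2f}/GPU-hour: ${cost_usd(ALLOCATION_GPU_HOURS, price) / 1e6:.0f}M")
```

The sketch prints about 41.7 days of the whole machine. At those assumed prices it prints $20M to $30M, which matches the "tens of millions" scale described above.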

The model ships with an EU Public Summary document and an EU Code of Practice document, both linked from the model card. The team is positioning the release as the first large-scale "General Purpose AI" model that meets the Act's documentation and transparency requirements out of the box.

The open-weights lie, in three parts

"Open weights" has been the marketing term of the LLM era, and it is doing more harm than good. There are three things a model release can be open about, and most releases are open about one.

The weights. Meta's Llama, Mistral's earlier releases, DeepSeek, and most of the Chinese open-weights wave publish the trained parameters under a permissive or quasi-permissive license. This lets you run the model, fine-tune it, and serve it. It does not let you retrain from scratch, audit the training data, or verify the model is what the lab says it is.

The training data. This is the harder one. The most-cited "open" models, including Llama 3 and DeepSeek-V3, keep their training-data recipes private or partially redacted. Some data is scraped under "fair use" claims that are untested in court. Some is licensed from publishers under non-public terms. Some is synthetically generated from other models. You cannot audit any of this. When a model emits text that looks like copyrighted lyrics, you cannot tell from the weights whether the lyrics were in the training data.

The training pipeline. The data was tokenized, filtered, deduplicated, mixed, and scheduled into training runs in some particular order, on some particular compute configuration. None of this is in the weights. The model card for a typical open-weights release will tell you something like "1.5T tokens, 8K context, AdamW," and that is often the entire pipeline disclosure you will get.

Apertus is open about all three. The weights are Apache 2.0. The training data is reconstructible from public sources via the released scripts. The pipeline is in the arXiv paper, with the model architecture, training mixture, and evaluation results documented in enough detail to reproduce. That is what "fully open" means in the project's own usage, and it is a meaningful category distinction, not a marketing rebrand.

Where the story is not as clean as the press release

Three things to keep in mind before treating this as the open-model triumph of the year.

The gating. Apache 2.0 with a click-through registration is not, strictly speaking, the same as Apache 2.0 without one. The Hugging Face extra_gated_prompt mechanism collects personal data before download, and the model card asks you to apply the institutions' deletion-request filter every six months. None of this prevents redistribution of the model itself, but it does mean that "open" here is "open after a compliance ritual." For academic and SME users this is fine. For casual downstream redistributors it is friction that ungated releases do not impose.

The licensing of the training data. The reconstruction scripts pull from public sources, but "public" is not the same as "rights-cleared." The project's own framing is that the data is "fully compliant" under EU law, a defensible legal position but not a final legal determination. If a rights-holder challenges the inclusion of a specific corpus, the legal exposure falls on the user, not the project, because the indemnification clause runs from the user to the institutions. Read the usage policy carefully before deploying at scale.

The evaluation. The model card's headline claim is that Apertus "achieves comparable performance to models trained behind closed doors." Comparable on what benchmarks, against what comparator set? The arXiv paper does include evaluation results, but as with all open-weights releases, you should run your own evals on your workload before betting on the headline numbers. Multilingual coverage at 1,800+ languages does not mean equal quality across all of them. Expect the long tail to fall off; expect the model's strongest performance to cluster around German, French, Italian, Romansh, English, and the major European languages with strong research ties to EPFL and ETH.

The original take: the EU AI Act is doing what it was designed to do

Here is the part I am willing to argue about. The conventional read of the EU AI Act is that it will kneecap European AI competitiveness: compliance costs will lock European startups out of the model market and hand the field to American and Chinese labs. Apertus is a counterexample to that read, and more than a token one.

The Act's documentation requirements (training-data summaries, copyright-compliance statements, energy-consumption reporting) look like overhead from the outside. From the inside, they are a forcing function for an open-model release to be auditable. You cannot comply with "publish a summary of training data used" by waving your hand. You need to know what the training data is. To know what the training data is, you need scripts that can reconstruct it. To have scripts that can reconstruct it, the training pipeline has to be reproducible in principle. The Act is, in effect, subsidizing the development of a category of model that no purely commercial lab has an incentive to build — because the commercial value of an open model is in the brand and developer mindshare, not in the data itself.

Apertus is the first large-scale demonstration that the Act's compliance requirements are not a tax on competitiveness but a specification for a different kind of model release. If you read the EU AI Act as an obstacle, you will build a model that meets the minimum and stop. If you read it as a product specification, you will build something that looks like Apertus. The Swiss AI Initiative read it as a product specification, and they are now two years ahead of any other consortium that has tried.

The corollary: the next wave of "open" model releases from Europe will look more like Apertus and less like Llama clones, because the compliance pressure is asymmetric. An American lab can treat the Act as the rulebook for one export market among many. A European lab cannot, because the EU is its home market. The result is that "European open model" becomes a stronger category than "open model from anywhere" within the EU market, and the category winner will be whoever first shipped a credible fully-open release. Apertus is that release.

What this means for you

If you are picking a model to deploy in an EU-regulated context (anything touching employment, education, law enforcement, biometric identification, or critical infrastructure), the "open weights from a non-EU lab" option now carries a worse risk profile than it did in 2024. The Act's obligations for general-purpose AI model providers have applied since August 2025, and the Commission's powers to enforce them start in August 2026. Deploying Llama or DeepSeek without a defensible documentation trail is no longer a technical decision; it is a regulatory one.

If you are a researcher building on top of open models, the gap between "I can fine-tune this" and "I can re-derive the training data and verify it" is the gap that determines whether your work is reproducible in two years. Apertus is one of very few 70B-class models where the answer is "yes, in principle, with effort."

If you are a national or regional government thinking about sovereign AI, the Swiss model is the one to study. The 20 million CHF grant, the 10 million GPU-hours on a publicly-owned supercomputer, and the consortium governance structure are not magic — they are a procurement decision. Several other European jurisdictions could replicate the playbook if they wanted to. Most have not.

What to do this week

# Pull the 8B Instruct under Apache 2.0 (no commercial restriction,
# but gated: accept the usage agreement on its Hugging Face page first).
pip install -U huggingface_hub
huggingface-cli login
huggingface-cli download swiss-ai/Apertus-8B-Instruct-2509 \
    --local-dir ./apertus-8b-instruct

# Or the 70B base, if your hardware supports it (about 140 GB of
# memory for the 16-bit weights alone, before KV cache).
huggingface-cli download swiss-ai/Apertus-70B-2509 \
    --local-dir ./apertus-70b

# Fetch the training-data reconstruction scripts. The install step
# assumes the repo is pip-installable; its README documents the actual
# pipeline. The subshell keeps the working directory unchanged.
git clone https://github.com/swiss-ai/pretrain-data
(cd pretrain-data && pip install -e .)

# Serve with vLLM (listed among the project's supported runtimes).
# --gpus all exposes the host GPUs to the container.
docker run --rm --gpus all --ipc=host -p 8000:8000 \
    -v "$PWD/apertus-8b-instruct:/model" \
    vllm/vllm-openai:latest \
    --model /model --served-model-name apertus-8b

If your GPU budget does not stretch to 70B, start with the 8B Instruct. It needs the least setup for downstream use, and its multilingual coverage is genuinely useful for any European-language product. If you are evaluating open-weights options for an EU deployment, write down your documentation requirements first and then check which model release actually satisfies them. The list is shorter than you think.


Disclosure

Drafted with AI assistance. The primary sources for this post are the Apertus project page (apertvs.ai), the Swiss AI Initiative page (swiss-ai.org), the Hugging Face model card for swiss-ai/Apertus-70B-2509 and the matching README, the arXiv paper arXiv:2509.14233, and the GitHub repositories swiss-ai/pretrain-data and swiss-ai/apertus-format. Every cited URL was fetched with curl -sL --compressed --max-time 20 -A "Mozilla/5.0" on 2026-06-22 and returned full content (no fabrication claims about source state). The release date of 2 September 2025, the 70B / 8B / 0.5B / 1.5B / 4B model lineup, the Apache 2.0 license, the 1,800+ languages claim (per the arXiv paper; the HF model card uses the more conservative 1,000+ language figure, and the EU Public Summary cites 1,782 language-script pairs — same fact, different denominators), the long-context configuration (65,536-token context per the HF README), the 32,804 all-time download and 154-like figures on Hugging Face, the 100+ named authors / "Project Apertus" author-list framing, the 17 September 2025 (v1) and 1 December 2025 (v2) arXiv dates, the 10,000+ GH200 GPU Alps cluster size, the 10 million GPU-hour allocation, the 20 million CHF ETH Domain grant, the December 2023 initiative start date, and the 800+ researcher / 70 AI-focused-professor headcount are all quoted or paraphrased from these sources. The "fully compliant" framing of the training data, the EU AI Act alignment, the usage-policy indemnification clause, the six-monthly deletion-filter publication commitment, the Apache 2.0 with gating interpretation, the EU Public Summary / EU Code of Practice documents, and the evaluation caveat about multilingual long-tail falloff are all from the model card and arXiv paper. The "sovereign AI" reframe (public infrastructure rather than national-champion framing) is this blog's analysis, not a quoted project claim. The "open-weights lie, in three parts" decomposition is this blog's framing. 
The argument that the EU AI Act functions as a product specification for auditable model releases is this blog's original take, supported by the project materials but not claimed by the project. The "CSCS in Lugano" location detail is common knowledge about the Swiss National Supercomputing Centre and is not directly sourced from any of the cited Apertus documents. The disclosure explicitly flags every blog-original framing above.

Sources

  • Apertus project page — primary source for the release framing, model lineup, EU AI Act positioning, and links to all compliance documents: https://apertvs.ai/
  • Apertus technical documentation page — primary source for the model lineup (Apertus 8B / 70B / Mini 0.5B-4B / upcoming 1.5), Apache 2.0 license terms, EU Public Summary, EU Code of Practice, and supported runtimes (LM Studio, vLLM): https://apertvs.ai/pages/documentation/
  • Swiss AI Initiative home page — primary source for the 10,000+ GH200 Alps supercomputer, 10 million GPU-hour allocation, 20 million CHF ETH Domain grant, December 2023 start date, and 800+ researcher / 70+ AI professor headcount: https://swiss-ai.org/
  • swiss-ai/Apertus-70B-2509 Hugging Face model card — primary source for the 2 September 2025 release date, 32,804 all-time downloads, 154 likes, Apache 2.0 license, gating mechanism, usage-policy indemnification clause, six-monthly deletion-filter commitment, 1,000+ language coverage, long-context configuration, and the EU AI Act compliance artifacts: https://huggingface.co/swiss-ai/Apertus-70B-2509
  • swiss-ai/pretrain-data GitHub repository — primary source for the reconstruction-script approach to training-data transparency: https://github.com/swiss-ai/pretrain-data
  • arXiv:2509.14233, "Apertus: Democratizing Open and Compliant LLMs for Global Language Environments" (Project Apertus et al., submitted 17 September 2025, v2 1 December 2025). Primary source for the 100+ author consortium list, the training-pipeline details, the evaluation results, and the "democratizing open and compliant" framing: https://arxiv.org/abs/2509.14233