Xe Iaso's "I hate compilers" hit the front page of Hacker News on 18 June 2026 with 111 points, and the title undersells what is actually a reproducible-build horror story dressed up as a WASM-to-JavaScript engineering writeup. Anubis — the proof-of-work reverse proxy that this blog covered recently as the de facto answer to the LLM-scraper DDoS problem — is moving its challenge logic from SHA-256 to WebAssembly so administrators can swap in custom PoW schemes. The goal is clean: define the check logic once, run the same bytes on both client and server. The reality is that getting the same bytes out of clang twice in a row is the actual hard part.
The lesson generalizes well beyond Anubis — to anyone shipping compiled artifacts (WASM modules, native binaries, LLVM bitcode, kernel modules) from CI and expecting the bytes to be stable.
Angle 1: Why your WebAssembly binary has a different hash on every rebuild
The first demonstration in Xe's post is the reproducible-builds thesis in twenty lines of C++. The example uses the predefined __DATE__ and __TIME__ macros, which the compiler expands to the date and time of compilation, then compiles the same hello.cpp twice in a row. The two outputs differ in the embedded timestamp. Identical source, different bytes, on every run, for a reason no one designing a "reproducible build" would have invented.
Compiler nondeterminism shows up in three places that the Anubis writeup hits in order: embedded timestamps via __DATE__ / __TIME__ (trivial); tooling the compiler shells out to, like Clang silently invoking wasm-opt from $PATH (surprising); and address-sensitive codegen, where pointer values leak into the order of try_table blocks in Clang's exception-handling path (genuinely hard). Xe observed the last one as a 29-byte drift between consecutive builds of the same wasm2js on the same machine with the same flags. Structurally meaningless, byte-for-byte meaningful.
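A 29-byte drift is easy to miss if all you compare is a hash. Below is a minimal sketch for measuring drift between two builds. It assumes the POSIX `cmp` and `wc` tools are available. The function name `byte_drift` is a hypothetical helper, not something from Xe's post.

```shell
#!/usr/bin/env bash
# byte_drift (hypothetical helper): count the bytes that differ between two
# builds of the same artifact, plus any change in total size.
byte_drift() {
  local a=$1 b=$2 size_a size_b differing
  size_a=$(wc -c < "$a")
  size_b=$(wc -c < "$b")
  # cmp -l prints one line per differing byte (offset and both values).
  # Bytes past the end of the shorter file are not listed (cmp reports EOF
  # on stderr instead), so the size delta is reported separately.
  # cmp exits 1 when the files differ; "|| true" keeps pipefail callers running.
  differing=$(cmp -l "$a" "$b" 2>/dev/null | wc -l || true)
  # $(( )) strips the whitespace padding some wc implementations add.
  echo "differing_bytes=$((differing)) size_delta=$((size_b - size_a))"
}

# Usage (hypothetical paths):
#   byte_drift build-1/module.wasm build-2/module.wasm
```

A nonzero `differing_bytes` with `size_delta=0` suggests the two builds contain the same content in a different order or with different embedded values. That is the shape you would expect from the `try_table` reordering, rather than from code being added or dropped.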
@pertymcpert identified the mechanism in the HN comments. On the code path that generates try_table blocks, Clang iterates over a DenseMap, a hash map whose iteration order follows the hashes of its keys. When those keys are pointers, that order shifts with memory addresses from one run to the next. The suggested fix is to swap in a MapVector, which iterates in insertion order at some runtime and memory cost. It is a one-line change in Clang. Until it ships, WASM binaries built from C++ with exception handling can drift from build to build.
Angle 2: The tooling supply chain is the actual attack surface
The most operationally alarming finding is the chain clang → wasm-opt → binaryen → wasi-sdk → Clang's bundled wasm2js. Every link has its own version, release schedule, and vendoring story. The wasm-opt Xe had on a DGX Spark ARM machine was version 108. The one on his x86 workstation, from Homebrew, was version 130. Which one Clang reaches for depends on $PATH. When the installed wasm-opt is too old to understand the WebAssembly exception-handling extension that wasi-sdk emits by default, the build fails without pointing at the real cause. It looks like a Clang bug, but it is a binaryen version mismatch.
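Before pinning anything, it helps to see every candidate the driver could pick. Here is a minimal bash sketch that lists each executable named `wasm-opt` on `$PATH`, in lookup order; it needs bash for `read -a`. `path_candidates` is a hypothetical helper. A compiler driver may also consult its own install directory before `$PATH`, so treat the output as a first approximation.

```shell
#!/usr/bin/env bash
# path_candidates (hypothetical helper): print every executable named $1 on
# $PATH, in the order a PATH lookup visits them. The first line is the copy a
# lookup by bare name would run.
path_candidates() {
  local name=$1 dir
  local -a dirs
  IFS=: read -r -a dirs <<< "$PATH"
  for dir in "${dirs[@]}"; do
    [ -z "$dir" ] && dir=.   # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
}

# Usage: print the version of every wasm-opt the build could end up running.
#   for p in $(path_candidates wasm-opt); do echo "$p: $("$p" --version)"; done
```

If two lines come back, which one wins depends on $PATH order alone. That is the 108-versus-130 situation from Xe's two machines.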
The lesson: the compiler's "implicit dependencies" are not in your lockfile. Nix picks this up — @crvdgc pointed out in the comments that Nix sets the build time to epoch to make hash calculation stable — but most CI pipelines do not. Pinning clang alone is insufficient; pin every binary the compiler can shell out to.
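One way to act on that is a preflight check that refuses to build with an unexpected toolchain. Here is a minimal sketch. `check_pin` is a hypothetical helper. The version strings in the commented usage are placeholders; replace them with the ones your known-good build reported.

```shell
#!/usr/bin/env bash
# check_pin (hypothetical helper): fail if the tool a bare-name lookup would
# run is not the pinned version.
# Usage: check_pin <tool> <substring expected in the first line of --version>
check_pin() {
  local tool=$1 expected=$2 path actual
  if ! path=$(command -v "$tool"); then
    echo "MISSING $tool" >&2
    return 1
  fi
  actual=$("$path" --version 2>&1 | head -n 1)
  case $actual in
    *"$expected"*)
      echo "OK $tool ($path): $actual" ;;
    *)
      echo "MISMATCH $tool ($path): got '$actual', want '$expected'" >&2
      return 1 ;;
  esac
}

# Placeholder pins, run before the build:
#   check_pin clang    'clang version 19.' &&
#   check_pin wasm-opt 'version 123'       || exit 1
```

The check matches a substring rather than the full line because distribution builds often prefix or suffix the upstream version string.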
For Anubis, where the WASM binary is the trust anchor for the entire proof-of-work challenge, the compiler's nondeterminism becomes a security problem. Reproducible builds are the property that lets an independent party rebuild your binary, compare hashes, and be confident they got what you shipped. Without them, the "is this WASM actually from the Anubis project?" question becomes unanswerable.
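Once the build is reproducible, the rebuild-and-compare step itself is small. Here is a minimal sketch, assuming GNU coreutils `sha256sum`; on macOS, `shasum -a 256` prints the same "hash  filename" format. `verify_rebuild` and both file paths are hypothetical, and nothing here reflects where Anubis actually publishes its modules.

```shell
#!/usr/bin/env bash
# verify_rebuild (hypothetical helper): compare the artifact you were served
# with one you rebuilt yourself from a clean checkout of the same commit.
verify_rebuild() {
  local shipped=$1 rebuilt=$2 want got
  want=$(sha256sum "$shipped" | cut -d' ' -f1)
  got=$(sha256sum "$rebuilt" | cut -d' ' -f1)
  if [ "$want" = "$got" ]; then
    echo "MATCH $got"
  else
    echo "DIFFER shipped=$want rebuilt=$got" >&2
    return 1
  fi
}

# Usage (hypothetical paths):
#   verify_rebuild downloaded/check.wasm build/check.wasm
```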
Angle 3: The fallback chain is more honest than most production stacks
The original WASM-based PoW challenge had one failure mode: a client with WebAssembly disabled (privacy settings, browser policy, an old embedded device, Tor Browser) cannot solve the challenge and gets locked out. Xe did not want to exclude those users, so:
- Primary: WASM check, runs on both client and server, fast.
- Fallback when WASM is disabled: wasm2js recompiles the same WASM module into JavaScript at build time. Slower, but it runs in any browser.
- Why both artifacts stay in lockstep: the JS is generated from the same WASM module, so the PoW logic is identical. The browser picks one.
The original-recipe implementation uses wasm2js from the Linux distribution's package manager. That's where the reproducibility problem comes in. Debian's version is too old, Homebrew's produces different output, and which copy the build picks up depends on $PATH. Xe's fix is to compile wasm2js itself to WASM with wasi-sdk and ship that copy inside the Anubis repo. Single architecture, single toolchain, byte-stable (modulo the Clang bugs above).
A generic "WASM is the answer" stack would ship the WASM-only path and add a "supported browsers" list. Xe's stack is "if you can't run WASM, run our slower JS port, and we keep both artifacts under the same reproducibility guarantee." The fallback is part of the product, not a TODO.
Angle 4: This is the second anti-AI-bot arms escalation that depends on toolchain trust
The first escalation was the original Anubis PoW: a SHA-256 challenge that proves the client spent CPU. It works because SHA-256 is in WebCrypto on every browser and the CPU cost is honest. The second escalation moves the challenge itself into a WASM module, giving the server operator control over the PoW scheme — memory-hard, GPU-unfriendly, custom preimage format, all without coordinating with the Anubis core team.
The new attack surface is the WASM module itself. With SHA-256, the trust chain was Anubis project → npm package → your server → browser. With WASM, it is Anubis project → WASM binary built by someone → mirrored to a CDN → loaded by the browser. The honest defense is reproducible builds. Xe's whole post openly admits that the reproducible-builds half of that defense is missing for his toolchain. It also serves as a working note on the patches he applied to close that gap.
Angle 5: The HN thread shows the canonical mistakes
Three top comments identify the three common wrong responses to "this build is non-deterministic":
- @charcircuit: byte-identical output is an arbitrary restriction; equivalent programs are equivalent regardless of the build hash, and the right defense is signature verification. Cryptographically correct in the narrow sense. Wrong for Xe's use case: Anubis is community-run, and the trust model is "anyone can rebuild and verify," not "trust the single signing-key holder."
- @dyauspitr: LLMs should be trained on binaries and output them directly. The "skip the compiler" position. The determinism problem goes away when the model is the compiler. Except it does not; it just moves.
- @ComputerGuru pushed back on the title as clickbait, noting that compilers literally made the project possible. The right read. Xe hates compilers the way a structural engineer hates gravity: gravity is a real force, and you design around it anyway.
All three replies are partially correct in isolation. None engages with the actual problem: "I need this WASM binary reproducible so downstream operators can verify it."
The original take: the compiler is the supply chain
The honest read of "I hate compilers" is that the modern compiled-artifact supply chain has the same trust properties as a software dependency graph, and most projects are not treating it that way. You pin npm versions. You audit container base images. You run cargo audit or npm audit. You do not, as a rule, audit your clang's implicit wasm-opt dependency.
The reproducible-builds community has been saying this for fifteen years. Debian's reproducible-builds project has been patching individual nondeterminism sources across the archive. Nix, Guix, and Bazel-with-remote-execution each take a swing at the hermetic-build problem. None of them is the default.
Xe's post is, in this reading, a public service announcement that the Anubis team is one of the few projects in the WASM ecosystem taking the question seriously. They ship their own vendored wasm2js, accept the 29-byte Clang-exception-handling drift as a known-unfixed upstream bug, and document the patch trail. That is not "I hate compilers." That is "I have read the source code of my compiler and I am not happy about what I found, but here is the patch."
What this means for you
If you ship a WASM module, native binary, or any compiled artifact that downstream parties verify, ask this week:
- Two consecutive builds on the same machine: same bytes? Build three times and sha256sum the outputs.
- Two different machines, both pinned: same bytes? Pin clang, pin wasm-opt, pin everything clang can shell out to. Run the build under strace -f -e execve and read what it invokes.
- If a downstream operator runs your build today, do they get the same bytes you got last month? If the answer is no, your signing story is the only thing standing between "trust us" and "trust us, plus our key." Decide before the audit asks.
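For the strace step, `strace -f -e trace=execve -o build.trace make my-wasm-module` records every program the build launches. The sketch below reduces that trace to the unique binaries that actually ran. It drops the failed execve calls that come from `$PATH` probing. `executed_programs` is a hypothetical helper. It skips calls that strace splits across "unfinished"/"resumed" lines under `-f`, so treat its list as a lower bound.

```shell
#!/usr/bin/env bash
# executed_programs (hypothetical helper): list the unique binaries a build
# actually ran, from a trace recorded with:
#   strace -f -e trace=execve -o build.trace make my-wasm-module
executed_programs() {
  local trace=$1
  # Keep only execve calls that returned 0. Failed calls (= -1 ENOENT) are
  # the PATH search probing directories that do not contain the program.
  grep -E 'execve\("[^"]*".* = 0$' "$trace" \
    | sed -E 's/.*execve\("([^"]*)".*/\1/' \
    | LC_ALL=C sort -u
}

# Usage:
#   executed_programs build.trace
```

Every path in the output is an implicit build dependency. Each one belongs in the pin list, whether or not your lockfile mentions it.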
If you are using Anubis (or any tool that ships a WASM PoW check), ask your vendor whether the WASM module you load is reproducible from a clean checkout. If they cannot answer, the "is this WASM actually from the project?" question is one CDN compromise from being unanswerable.
What to do this week
Pick a compiled artifact you ship and run this three times — same source, fresh build each time, hash the output:
make clean && make my-wasm-module
sha256sum my-wasm-module
make clean && make my-wasm-module
sha256sum my-wasm-module
make clean && make my-wasm-module
sha256sum my-wasm-module
If the three hashes disagree, the artifact is non-reproducible. The usual culprits, in order of frequency: embedded timestamps (__DATE__, __TIME__, build epoch); source paths in debug info (-ffile-prefix-map helps); compiler-shelled-out-to tooling (strace your build); address-sensitive codegen (MapVector vs DenseMap, etc.).
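Two common knobs cover the first two culprits. The first is SOURCE_DATE_EPOCH, which current GCC and Clang releases use for __DATE__ and __TIME__ when it is set. The second is -ffile-prefix-map, which rewrites the build directory in debug info and __FILE__. Below is a minimal sketch of that setup, kept as comments because it assumes a git checkout and a Makefile that honors CFLAGS/CXXFLAGS. It also adds a check that the absolute build directory did not leak into the artifact anyway. `embeds_path` is a hypothetical helper, and `my-wasm-module` is the placeholder target from the commands above.

```shell
#!/usr/bin/env bash
# Typical setup before a clean build (GNU make's built-in rules pick up
# CFLAGS/CXXFLAGS from the environment unless the Makefile assigns them itself):
#   export SOURCE_DATE_EPOCH="$(git log -1 --format=%ct)"  # last commit time, not wall clock
#   export CFLAGS="-ffile-prefix-map=$PWD=." CXXFLAGS="-ffile-prefix-map=$PWD=."
#   make clean && make my-wasm-module
#   embeds_path my-wasm-module "$PWD" && echo "build directory leaked"

# embeds_path (hypothetical helper): succeed if the artifact still contains the
# absolute build directory as a byte string.
embeds_path() {
  local artifact=$1 dir=$2
  # -a: treat binary data as text; -F: fixed string, not a regex; -q: status only.
  grep -qaF -- "$dir" "$artifact"
}
```

A hit usually means some compilation step bypassed the prefix map. Because the path differs per machine, it breaks the cross-machine comparison even when same-machine builds match.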
For Nix users the fix is partially built in:
nix-build -A my-wasm-module
nix-build -A my-wasm-module --check   # rebuild from scratch and compare with the first output; a plain second nix-build just returns the cached result
If the two builds disagree and you are not on Nix, the path forward is either Nix (heavy lift, real fix) or a hand-pinned toolchain inside a container with the tool versions frozen in the Dockerfile (lighter lift, recurring maintenance). Xe chose the second path for Anubis. Most projects do not choose either, and ship non-reproducible binaries anyway.
Disclosure
Drafted with AI assistance. Primary source (Xe Iaso's "I hate compilers") and the HN thread (item 48581070) were both retrieved via direct HTTP fetches on 2026-06-18 around 13:30 UTC. All quoted comments are paraphrased, not blockquoted; the compiler-nondeterminism claims (__DATE__ / __TIME__, Clang's silent wasm-opt shell-out, DenseMap vs MapVector for try_table ordering, the 29-byte drift) are sourced from Xe's writeup, with the MapVector mechanism confirmed in the comment by @pertymcpert. The 111-point HN figure is from the Algolia API at the fetch timestamp (live-page counter was 113 at the same moment; the API value is the canonical figure for citation). Xe Iaso is the author of Anubis; weight that into any verification claims about the toolchain.
The compiler is the supply chain. You are not auditing it.
Sources
- Xe Iaso, "I hate compilers" — the primary writeup, with the full reproducible-builds walkthrough (published 2026-06-18, 1665 words): https://xeiaso.net/notes/2026/anubis-wasm-vendor-binary/
- HN discussion, item 48581070, "I hate compilers" (111 points per Algolia API as of 2026-06-18 13:30 UTC fetch; live-page counter was 113 at the same moment): https://news.ycombinator.com/item?id=48581070
- Anubis project, the proof-of-work proxy whose WASM-port this post is about: https://github.com/TecharoHQ/anubis
- Binaryen / wasm2js, the WebAssembly-to-JavaScript transpiler Xe is vendoring for the deterministic-builds fix: https://github.com/WebAssembly/binaryen
- wasi-sdk, the WASI-flavored Clang toolchain Xe used to compile wasm2js to WASM: https://github.com/WebAssembly/wasi-sdk
- Related on this blog: "An AI Agent Burned $6,531 on AWS to Scan a Hobby Network Nobody Asked It To", which covers Anubis as the standard answer to LLM-scraper DDoS: https://tutorialoflife.blogspot.com/2026/06/an-ai-agent-burned-6531-on-aws-to-scan.html
- Related on this blog: "Linear Is Fast Because the Browser Is the Database" — different problem, same supply-chain-trust theme: https://tutorialoflife.blogspot.com/2026/06/linear-is-fast-because-browser-is.html