Monday, June 8, 2026

Linear Is Fast Because the Browser Is the Database

Disclosure: This post was researched and drafted with AI assistance. Primary source: Dennis Brotzky, "How's Linear so fast? A technical breakdown", performance.dev, 3 May 2026; surfaced on the HN front page the week of 8 June 2026. The sync-engine description, the Parcel → Rollup → Vite → Rolldown bundler arc, the React + TypeScript + MobX + Postgres + Redis + turbopuffer stack, the 50% / 30% / 59% / 70–80% build-pipeline numbers, the modulepreload + service-worker precache technique, the inlined boot script, the "render first, authenticate second" pattern, the per-property MobX observable + observer() granular re-render model, the 0.10s–0.35s transition variables, and the transform / opacity / paint / layout property tiering are all from that post. The author is an outside observer; he has never worked at Linear and has not seen their code. Architectural inferences in the "original take" section are the blog's synthesis. Stack entries and numbers were not independently verified.

A CRUD app takes 300ms to update an issue. Linear does the same update in a few milliseconds. The difference is a single architectural inversion: Linear does not treat the server as the source of truth for the UI. The server is a sync target. The database is in the browser. Almost every other optimization in Dennis Brotzky's reverse-engineering write-up — which hit the HN front page this week — is a downstream consequence of that one decision.

The architectural move worth studying in 2026 is the data layer. Everything else is downstream.

The local-first sync engine, in three parts

Brotzky's write-up is a tour, not a discovery, and the three pieces of the sync engine are the part most worth re-stating clearly.

1. The data is already there. When the app boots, it hydrates from IndexedDB into an in-memory MobX object pool, and every UI query hits that pool. There is no "loading issues" state because the issues are already on the user's machine. Heavy tables like Issue and Comment lazy-hydrate on demand: a 10,000-issue workspace boots about as fast as a 100-issue one because startup cost tracks workspace structure, not workspace size.

2. Mutations do not wait for the network. Changing a status updates the MobX observable, writes the change to a durable transaction queue in IndexedDB, and queues it for the server. The network is touched last. If the server rejects, the observable reverts and there is a brief flicker; in practice, this almost never happens because invalid mutations are caught before the transaction is even created.

3. One delta, one cell. When a server confirmation arrives — yours or a collaborator's — the client receives a small JSON envelope describing what moved and applies it by writing to the corresponding MobX observable. Because every property on every model is its own observable, MobX knows which components depend on which fields. A 50-issue update is 50 cell re-renders, not a list re-render.

Take any one of those three away and the app starts to feel slow. A local database without optimistic writes still spins on save. Optimistic writes without granular observables still jank on every update. Granular observables without a local database still wait on initial load. Linear's speed is a property of the system, not any single layer.
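To make the three parts concrete, here is a minimal sketch, not Linear's code. The Cell class is a hand-rolled stand-in for a MobX observable, the queue array stands in for the durable IndexedDB transaction queue, and every name (Cell, LocalPool, PendingChange, the "ENG-1" id) is hypothetical. It shows a write landing in one cell immediately, then a server rejection reverting only that cell.

```typescript
// A tiny per-property observable: a stand-in for a MobX observable, not MobX itself.
class Cell<T> {
  private current: T;
  private readonly listeners = new Set<(value: T) => void>();

  constructor(initial: T) {
    this.current = initial;
  }

  get(): T {
    return this.current;
  }

  set(next: T): void {
    if (Object.is(next, this.current)) return; // unchanged: nobody re-renders
    this.current = next;
    this.listeners.forEach((listener) => listener(next));
  }

  subscribe(listener: (value: T) => void): void {
    this.listeners.add(listener);
  }
}

interface IssueCells {
  title: Cell<string>;
  state: Cell<string>;
}

interface PendingChange {
  issueId: string;
  field: keyof IssueCells;
  previous: string;
  next: string;
}

class LocalPool {
  readonly issues = new Map<string, IssueCells>();
  // Hypothetical in-memory stand-in for the durable IndexedDB transaction queue.
  readonly queue: PendingChange[] = [];

  // Write the cell first, queue second; sending to the server happens later.
  mutate(issueId: string, field: keyof IssueCells, next: string): PendingChange {
    const issue = this.issues.get(issueId);
    if (issue === undefined) throw new Error(`unknown issue ${issueId}`);
    const change: PendingChange = { issueId, field, previous: issue[field].get(), next };
    issue[field].set(next);
    this.queue.push(change);
    return change;
  }

  // Server rejected the change: drop it from the queue and revert that one cell.
  reject(change: PendingChange): void {
    const index = this.queue.indexOf(change);
    if (index >= 0) this.queue.splice(index, 1);
    this.issues.get(change.issueId)?.[change.field].set(change.previous);
  }

  // A confirmed delta (yours or a collaborator's) touches exactly one cell.
  applyDelta(issueId: string, field: keyof IssueCells, value: string): void {
    this.issues.get(issueId)?.[field].set(value);
  }
}

const pool = new LocalPool();
pool.issues.set("ENG-1", { title: new Cell("Fix login"), state: new Cell("todo") });

const stateRenders: string[] = [];
const titleRenders: string[] = [];
pool.issues.get("ENG-1")?.state.subscribe((value) => stateRenders.push(value));
pool.issues.get("ENG-1")?.title.subscribe((value) => titleRenders.push(value));

const change = pool.mutate("ENG-1", "state", "done"); // UI updates immediately
pool.reject(change); // server said no: revert that one cell
```

Only the state subscriber fires, once for the optimistic write and once for the revert; the title subscriber never runs. That is the "one delta, one cell" property in miniature.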

The first-load pipeline is a separate engineering project

If the sync engine is the answer to "feels fast while you work," the loader is the answer to "feels fast when you arrive." Brotzky's account of Linear's build pipeline is a four-bundler arc (Parcel → Rollup → Vite → Rolldown), three migrations driven by the same goal each time: ship less code, faster. The numbers Linear published from their own migration: 50% less code shipped, 30% smaller after compression, cold-cache page loads 10 to 30% faster, time-to-first-paint of the active-issues view down 59% on Safari, and memory usage down 70 to 80%.

The bulk of the win came from dropping legacy browsers (no polyfills, no ES5 transpilation, no nomodule fallback), tighter dead-code elimination, and aggressive code splitting. Even after all of this, Linear still ships roughly 21 MB of minified JavaScript, but split into hundreds of route-level chunks fetched on demand. The entry page carries modulepreload tags for the whole critical path so the browser fetches those chunks in parallel before the entry script's first import resolves, collapsing the waterfall into a single parallel batch. A service worker with a precache manifest of about 1,200 hashed assets then pulls down the rest of the route chunks lazily after the first page load; within a few seconds of hitting the login screen, the full app is sitting in cache, and the app is offline-capable because the local-first sync engine already has the user's data in IndexedDB.
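As a rough illustration of the modulepreload half of that pair, here is a small helper that turns a list of critical-path chunk URLs into link tags for an HTML template. The function names and the chunk file names are hypothetical; only the link rel="modulepreload" element itself is standard HTML.

```typescript
// Escape the characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// One <link rel="modulepreload"> per unique chunk, so the browser can fetch
// the whole critical path in parallel instead of discovering it import by import.
function modulePreloadTags(chunkUrls: readonly string[]): string {
  const unique = Array.from(new Set(chunkUrls));
  return unique
    .map((url) => `<link rel="modulepreload" href="${escapeAttr(url)}">`)
    .join("\n");
}

// Hypothetical hashed chunk names, with one accidental duplicate.
const preloadHtml = modulePreloadTags([
  "/assets/react-abc123.js",
  "/assets/mobx-def456.js",
  "/assets/react-abc123.js",
]);
```

In a real build the list would come from the bundler's manifest rather than a hand-written array; that wiring is left out here.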

The boot script is the part most teams will copy first

The cheapest Linear trick to reproduce is also the one most likely to slip past you: the inlined boot logic in <head>. Before any bundle has parsed, the inline JavaScript reads localStorage.splashScreenConfig, restores the user's remembered shell tokens (sidebar background, base color, border color, sidebar width, dark mode), and applies them to document.documentElement.style. It checks whether localStorage.ApplicationStore exists. If it does, the user has used Linear in this browser before, which means their workspace is already in IndexedDB. If it does not, the shell flips to the logged-out layout and the login flow takes over.

The bundle never tries to be smart about authentication. The actual session token lives in a cookie. The next request (the WebSocket handshake, a sync delta, any HTTP call) is the thing that fails with a 401 if the session has gone stale, and the client redirects to login. Render first, authenticate second. The pattern is consistent with the rest of the architecture: trust the local copy for rendering, keep the server as the source of truth for correctness, and let the two reconcile asynchronously.
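A sketch of that boot decision as a pure function, so it can be read without a browser. The two storage keys (splashScreenConfig, ApplicationStore) are the ones named in the write-up; everything else is an assumption for illustration, including the idea that the config is stored as JSON, the token field names, and the function names.

```typescript
// The one method of localStorage the boot logic needs.
interface StorageLike {
  getItem(key: string): string | null;
}

// Hypothetical field names for the remembered shell tokens.
interface ShellTokens {
  sidebarBackground?: string;
  baseColor?: string;
  borderColor?: string;
  sidebarWidth?: number;
  darkMode?: boolean;
}

interface BootDecision {
  layout: "app-shell" | "logged-out";
  tokens: ShellTokens;
}

function decideBoot(storage: StorageLike): BootDecision {
  let tokens: ShellTokens = {};
  const raw = storage.getItem("splashScreenConfig");
  if (raw !== null) {
    try {
      const parsed: unknown = JSON.parse(raw); // assumption: the config is JSON
      if (typeof parsed === "object" && parsed !== null) tokens = parsed as ShellTokens;
    } catch {
      // Corrupt config: fall back to defaults rather than block first paint.
    }
  }
  // No network call here: the presence of the local store decides the layout.
  const hasLocalWorkspace = storage.getItem("ApplicationStore") !== null;
  return { layout: hasLocalWorkspace ? "app-shell" : "logged-out", tokens };
}

// Authentication is checked lazily: whichever request runs next reports a stale session.
function nextStepForStatus(httpStatus: number): "continue" | "redirect-to-login" {
  return httpStatus === 401 ? "redirect-to-login" : "continue";
}

const fakeStorage = new Map<string, string>([
  ["splashScreenConfig", '{"darkMode":true,"sidebarWidth":240}'],
  ["ApplicationStore", "{}"],
]);
const returningUser = decideBoot({ getItem: (key) => fakeStorage.get(key) ?? null });
const firstVisit = decideBoot({ getItem: () => null });
```

In the page itself this logic would run inline in the head and write the tokens to document.documentElement.style; the pure-function form just makes the decision table easy to see.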

Stack composition: a deliberate refusal of the modern default

The stack list in the write-up is interesting mostly because of what is not in it. React, TypeScript, MobX, Postgres, a CDN, a service worker, IndexedDB. No Next.js, no React Server Components, no TanStack Query, no edge database, no fancy framework. Brotzky calls out the simplicity as a feature, not an oversight: keeping the app entirely client-side removes the constant question of "am I on the server or the client" and gives a single mental model for the entire app.

Backend is Node.js + TypeScript, PostgreSQL on Cloud SQL with the issues table partitioned 300 ways, Memorystore Redis as event bus + cache + sync cursors, turbopuffer for similar-issue vector search, Kubernetes on GCP with one workload per concern, and Cloudflare Workers as a multi-region edge proxy. The two big concessions to the modern web are Rolldown-Vite (with plugin-react-oxc, not @vitejs/plugin-react) and the inline app shell in the head. Everything else is straight 2018-React-with-MobX, and that is a deliberate choice: the technology that ships the data fastest is the technology that ships the data.

The original take: the design is also the bottleneck

Most write-ups of Linear's performance end on the bundler or the sync engine. The post's most underrated observation is in the "Designed for speed" section: a perfectly built sync engine still loses to a slow input model. If the fastest path to an action requires a mouse, three menus, and a click, the user pays for those steps regardless of how fast the engine runs.

Single letters edit the focused issue. Two-letter combos navigate. ⌘ K opens a command palette that searches the local MobX object pool, not a server. Every common action has a shortcut, and every action can be done with a mouse. Engineering speed makes a single interaction fast. Design speed makes the path to each interaction short. For a tool used all day, the difference between a shortcut and a two-second mouse path compounds over every action.

The animation rules complete the same thesis. Browsers have three tiers of property changes: composited (transform, opacity), paint (color, background-color, border-color, fill), and layout (width, height, top, left, margin, padding). Linear only animates the first two. The margin-left: 2px; transition: all 0.2s example in the post is a perfect villain: a small visual change that recomputes the layout of every row beneath the hovered one, on every frame, for the full 200ms of the transition. Durations sit at 0.10s–0.35s, so even the shortest transitions finish right at the 100ms cause-and-effect threshold rather than below it, and Linear defaults to asymmetric timing: instant on enter, 150ms fade on exit.
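The tiering is easy to turn into a crude audit. The sketch below classifies the properties listed above and flags risky transition values. It is a simplification: it assumes the property name is the first token of each comma-separated transition entry, it only knows the properties named in this post, and the function names are made up for the example.

```typescript
type Tier = "composited" | "paint" | "layout" | "unknown";

// The three tiers as listed in the post; not an exhaustive map of CSS.
const TIER_BY_PROPERTY = new Map<string, Tier>([
  ["transform", "composited"], ["opacity", "composited"],
  ["color", "paint"], ["background-color", "paint"],
  ["border-color", "paint"], ["fill", "paint"],
  ["width", "layout"], ["height", "layout"], ["top", "layout"],
  ["left", "layout"], ["margin", "layout"], ["padding", "layout"],
]);

function tierOf(property: string): Tier {
  const key = property.trim().toLowerCase();
  const exact = TIER_BY_PROPERTY.get(key);
  if (exact !== undefined) return exact;
  // Longhands such as margin-left take the tier of their shorthand.
  const shorthand = key.split("-")[0] ?? "";
  return TIER_BY_PROPERTY.get(shorthand) ?? "unknown";
}

// Takes the value of a `transition` declaration, e.g. "all 0.2s".
function auditTransition(transitionValue: string): string[] {
  const warnings: string[] = [];
  for (const entry of transitionValue.split(",")) {
    const property = entry.trim().split(/\s+/)[0] ?? "";
    if (property === "all") {
      warnings.push("transition: all also animates layout properties");
    } else if (tierOf(property) === "layout") {
      warnings.push(`${property} forces layout on every frame`);
    }
  }
  return warnings;
}

const villain = auditTransition("all 0.2s");
const wellBehaved = auditTransition("opacity 0.15s, transform 0.15s");
```

Running something like this over a stylesheet will not catch everything, but it finds the transition: all cases quickly, which is where the post suggests starting.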

The synthesis most people will miss: the fast app is one where every layer is in the same conversation. The data is local, the mutations are optimistic, the observables are granular, the input is keyboard-first, the animations stay on the GPU, the loader ships less code, and the service worker fills in the gaps. None of those are the trick. The trick is the discipline of refusing to let any one layer leak latency into the next.

What this means for you

  • If your team treats the server as the source of truth for the UI: the cheapest single change is the optimistic update. SWR and TanStack Query both support it; SWR's mutate(key, optimistic, false) pattern, or an onMutate handler that calls setQueryData in TanStack Query, gets you surprisingly close to Linear's feel without rewriting the data layer.
  • If you maintain a Vite or Rollup config: the manualChunks pattern in the post — one chunk per npm package above ~3 KB, cached independently — is the move. Bump a single dependency, invalidate one chunk, not the whole vendor graph.
  • If you animate anything in a tool used all day: audit your CSS for transition: all. Replace margin and padding animations with transform. Default new transitions to 0.1s–0.25s, not 0.3s. The 100ms cause-and-effect threshold is real.
  • If you build for slow networks or emerging markets: the service-worker precache + modulepreload pair is the single highest-leverage combination in the post. It collapses a multi-second cold load into a single parallel batch and makes the rest of the app offline-capable for free.
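For the manualChunks bullet, a sketch of the per-package idea in the function form that Rollup's output.manualChunks accepts (a function from module id to chunk name). The vendor- prefix and the function name are assumptions, and the ~3 KB size cutoff from the post is not implemented, because a plain id-to-name function has no size information to work with.

```typescript
// Maps a module id to a chunk name: one chunk per npm package,
// undefined for application code so the bundler's default splitting applies.
function perPackageChunk(id: string): string | undefined {
  const normalized = id.replace(/\\/g, "/"); // tolerate Windows paths
  const marker = "/node_modules/";
  // lastIndexOf handles nested layouts such as pnpm's .pnpm/<pkg>/node_modules/<pkg>.
  const index = normalized.lastIndexOf(marker);
  if (index === -1) return undefined;

  const segments = normalized.slice(index + marker.length).split("/");
  const first = segments[0] ?? "";
  if (first === "") return undefined;

  // Scoped packages span two path segments: @scope/name.
  const packageName =
    first.startsWith("@") && segments.length > 1 ? `${first}/${segments[1]}` : first;
  return `vendor-${packageName.replace("@", "").replace("/", "-")}`;
}

const reactChunk = perPackageChunk("/app/node_modules/react/index.js");
const scopedChunk = perPackageChunk("/app/node_modules/@tanstack/query-core/build/index.js");
const appChunk = perPackageChunk("/app/src/main.ts");
```

Because each package lands in its own hashed file, bumping one dependency changes one file name and leaves the rest of the vendor chunks cached.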

What to do this week

# 1. If your app makes a /me or /api/user call before rendering:
#    - Add the inlined localStorage boot check to your <head>.
#    - If localStorage.<your-app-store> exists, render the shell
#      immediately and let the next request do the 401 detection.
#    - One inline script removes one round-trip from every cold load.

# 2. If you maintain a Vite config:
#    - Switch to per-package manualChunks above ~3 KB.
#    - Add <link rel=modulepreload> tags for the critical-path
#      vendor chunks in your index.html template.
#    - Add a service worker with a precache manifest of route chunks.
#      Warm the cache in the background after first paint.

# 3. If you build for slow networks or emerging markets:
#    - The service-worker precache + modulepreload pair is the
#      single highest-leverage combination. It collapses a
#      multi-second cold load into a single parallel batch and
#      makes the rest of the app offline-capable for free.

The bottom line

Linear feels fast because of a single architectural decision: the data the user came to edit is already on their machine. Rolldown-Vite, modulepreload, the service worker, MobX, the IndexedDB hydration, the boot script, the keyboard-first input model, the animation tiers — all downstream of it. If you want a fast web app, the question is "why is my CRUD waiting on the network at all," and the answer in 2026 is "it does not have to."

Sunday, June 7, 2026

Speculative KV Coding: 4× Lossless Cache Compression

Disclosure: This post was researched and drafted with AI assistance. Primary source: "kkm", "Speculative KV coding: losslessly compressing KV cache by up to ~4× using a predictor model", fergusfinn.com, posted 8 May 2026; surfaced on the HN front page on 4 June 2026. The arithmetic-coder framing, the 11-bits-per-scalar bf16 cache entropy number, the ~4× lossless / ~8× gross compression claim, and the analogy to Leviathan et al.'s speculative decoding (2022) are all from the post. The "predictor is the product" framing in the original-take section is the author's synthesis. The comments quoted in the discussion section are real HN comments on that thread, permalinked to the right authors; we did not paraphrase around them. Benchmarks were not independently reproduced.

In 2026, the bottleneck in long-context LLM serving is VRAM holding the KV cache and PCIe moving it — not flops. As agentic workflows (coding agents, long-document RAG, multi-hour research sessions) push average context windows past the 200K mark, the cache stops being "a little memory" and starts being the dominant line item on the inference bill. A new write-up from kkm on fergusfinn.com describes a method called Speculative KV coding that gets you up to ~4× lossless compression of the cache using a cheaper predictor model, stacking on top of the lossy FP8 compression everyone is already doing for a gross ~8× reduction. The post hit the HN front page on June 4 with 79 points and a comment thread that is, unusually, a real engineering discussion rather than a flame war. It deserves more attention than the ranking suggests.

The headline is "4×." The interesting number is buried in the setup cost.

What speculative KV coding actually does

The classical way to make a KV cache smaller is lossy quantization: drop K and V from bf16 to FP8 (or FP4), accept the quality hit, and run evals until your benchmarks stop screaming. TurboQuant is the most-discussed recent example of this family, and sits in the same conceptual neighborhood as the fergusfinn post.

Speculative KV coding is a different move. It is lossless — the reconstructed cache is bit-identical to the original — and it works by analogy with speculative decoding (Leviathan, Kalman, Matias, 2022):

  1. Pick a predictor model — a smaller, faster model whose forward pass on the same prompt gives a per-scalar guess μ and a calibrated uncertainty σ² of the target model's KV cache.
  2. Both the encoder (who has access to the target model) and the decoder (who will reconstruct the cache) run the predictor on the prompt. The predictor is cheap, so running it twice is fine. Both sides end up with the same (μ, σ) per scalar.
  3. The encoder runs the target model to get the real KV cache, then feeds (KV_full, μ, σ) into an arithmetic coder (the same family of coders behind rANS / tANS — the post links to prior work on both). The coder emits a bitstream whose length is bounded by the cross-entropy H(p, q) = H(p) + KL(p || q). Because the KV cache is a deterministic function of weights and prompt, its "true" entropy is zero; every bit the coder emits is pure KL against the predictor.
  4. The decoder consumes the bitstream alongside its locally reconstructed (μ, σ) and recovers KV_full exactly.

The point of the design is the split cost. The encoder pays one full target-model forward pass (it has to; that is the only way to get the real cache). The decoder pays a predictor forward pass per token and some arithmetic. The bandwidth between them is just the bitstream. In a long-context agent session the decoder side runs many times (the encoder is prefill-once, the decoder is decode-many), so that asymmetry is what makes the method pay off.
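A toy version of the four steps, with two loud simplifications. First, it does not implement an arithmetic coder; it computes the ideal code length an arithmetic coder would approach, using a Gaussian density times the grid step as the bin probability. Second, the lossless round trip is shown with plain residuals on an integer grid rather than on real bf16 bit patterns. All values, the grid step, and the function names are hypothetical.

```typescript
// Ideal code length in bits for one scalar under a Gaussian prediction (mu, sigma).
// The bin probability is approximated as pdf(x) * step, which assumes step << sigma.
function idealBits(actual: number, mu: number, sigma: number, step: number): number {
  const z = (actual - mu) / sigma;
  const logPdf = -0.5 * z * z - Math.log(sigma * Math.sqrt(2 * Math.PI)); // natural log
  return -(logPdf + Math.log(step)) / Math.LN2; // convert nats to bits
}

function streamBits(
  actual: readonly number[],
  mu: readonly number[],
  sigma: number,
  step: number,
): number {
  if (actual.length !== mu.length) throw new Error("length mismatch");
  return actual.reduce((sum, value, i) => sum + idealBits(value, mu[i] ?? 0, sigma, step), 0);
}

// Lossless round trip on integer grid indices: the encoder sends residuals,
// the decoder adds back the prediction that both sides computed independently.
const toResiduals = (truth: readonly number[], guess: readonly number[]): number[] =>
  truth.map((value, i) => value - (guess[i] ?? 0));
const fromResiduals = (residuals: readonly number[], guess: readonly number[]): number[] =>
  residuals.map((value, i) => value + (guess[i] ?? 0));

const kvTrue = [64, -160, 96, 256];  // hypothetical cache values, as grid indices
const kvGuess = [66, -157, 95, 251]; // what the shared predictor produced on both sides
const rebuilt = fromResiduals(toResiduals(kvTrue, kvGuess), kvGuess);

const gridStep = 1 / 128; // hypothetical quantization step
const scaled = (indices: readonly number[]): number[] => indices.map((i) => i * gridStep);

// A predictor that tracks the target (tight sigma) versus no predictor at all (wide sigma).
const closeBits = streamBits(scaled(kvTrue), scaled(kvGuess), 0.05, gridStep);
const blindBits = streamBits(scaled(kvTrue), [0, 0, 0, 0], 1.2, gridStep);
```

The rebuilt array matches kvTrue exactly, and the stream under the close predictor is shorter than the stream under the blind one. Both properties are what the method relies on: exact reconstruction from shared predictions, and a bitstream whose length shrinks as the predictor improves.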

The numbers that matter

The post gives three numbers worth keeping in your head.

  • bf16 KV cache is about 11 bits per scalar of bytewise entropy, roughly 30% smaller than the raw 16-bit format. So even a perfect general-purpose entropy coder, with no model of the cache at all, gets you ~1.45×. That is the floor.
  • ~4× lossless compression of a bf16 cache with the predictor-model approach. The author is explicit that this stacks on top of any lossy FP8 quantization you were already doing: the bf16→FP8 step already saves 2× on its own, so losslessly compressing the resulting FP8 cache gives a gross ~8× reduction in cache size.
  • The bitrate is ~½ log₂(2πe σ²) bits per scalar in expectation (the natural-log form gives the same quantity in nats), which is just the log of the typical error magnitude plus a constant. Better predictor → smaller typical error → fewer bits. The marginal cost of a smarter predictor is paid in flops; the marginal benefit is collected in VRAM and bandwidth. The arbitrage is in the ratio.

That last equation is the reason this is interesting. The cost of a forward pass through a predictor model scales with the predictor's parameters. The savings scale with how well that predictor's μ matches the target's KV_full. There is a break-even point, and the post is careful to say it does not yet know exactly where.
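Written out in base 2, so the result is in bits, the relationship looks like the following. The quantization step Δ is an assumption added here to turn the continuous entropy into a count of bits at a fixed resolution; it is the standard fine-quantization approximation, not a formula quoted from the post.

```latex
\begin{align}
\text{bits emitted} &\approx H(p, q) = H(p) + \mathrm{KL}(p \,\|\, q),
  \qquad H(p) = 0 \text{ for a deterministic cache} \\
h\big(\mathcal{N}(\mu, \sigma^2)\big) &= \tfrac{1}{2}\log_2\!\big(2\pi e\,\sigma^2\big)
  = \log_2 \sigma + \tfrac{1}{2}\log_2(2\pi e) \\
\text{bits per scalar at resolution } \Delta &\approx
  \tfrac{1}{2}\log_2\!\big(2\pi e\,\sigma^2\big) - \log_2 \Delta \\
\sigma \to \sigma/2 \;&\Rightarrow\; \text{exactly one bit saved per scalar}
\end{align}
```

The last line is the useful rule of thumb: every halving of the predictor's typical error buys one bit per cached scalar, regardless of what the error was to begin with.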

The comment thread is the real story

The HN discussion is roughly a dozen comments long and unusually high-signal. Three exchanges in particular are worth quoting at length.

wongarsu lays out the cost curve: "The tradeoff gets better the bigger your primary model, and probably with bigger batch sizes. The KV cache can consume a lot of expensive VRAM, and the VRAM and compute costs of the predictor model become a small fraction of the cost of the primary model. For serving a 1T model with 16 concurrent requests this could make a lot of sense. For a 8B model with a single request far less so."

That is the post in one sentence. The economics only flip in your favor when the cache is genuinely expensive, which today means frontier models in production serving, not your laptop running Llama 8B.

0-_-0 raises the obvious follow-up: "You can use the original model to compress the kv cache and get ∞x compression, since the prediction is perfect. The cost is time, and I don't see how this could be worth it." That is the trivial upper bound the post walks you through, and yes, paying a full target forward pass to predict your own forward pass is silly. The author's framing is that the predictor needs to be cheaper than the target — and the choice of predictor is the cost-versus-bits tradeoff the whole post is organized around.

saagarjha makes the cleaner point: "Speculation is only worth it if you can profit from it. Not every context allows this or has a similar idea of what can be speculated." A predictor model only helps if its forward pass on the prompt is correlated with the target's forward pass. If you pick a predictor that is just a bag of weights with no shared structure, you get the floor (the 11-bit entropy, ~1.45×). The post's choice — "an optimised version of the same model" — is the obvious, principled answer: same architecture, same prompt, same attention pattern, just a cheaper optimization. That is what makes the conditional entropy H(KV_full | M_pred(prompt)) actually small.

The original take: the cheap predictor is the product, not the compression

Most coverage of compression releases treats the codec as the product and the predictor as a black box. That framing has it exactly backwards, and the inversion is the part most people will miss.

The codec is two pages of rANS. It is the part that has been solved for twenty years. The predictor is the part that has just become cheap enough to use, because in 2026 you can serve a small open-weights model in a few hundred milliseconds on a single GPU. The cost of running a 1B-parameter predictor model on a 200K-token prompt, in 2026, is in the range of seconds. The cost of not compressing your 1T-parameter target model's KV cache is in the range of not-fitting-it-in-memory.

That cost curve is what makes the method timely. Two years ago, the predictor would have been a quarter of the target's flops and the arbitrage would not have closed. Two years from now, the predictor will be a single forward pass of a distilled version of the same model trained specifically to predict the target's cache, and the 4× number will probably be 6×. The interesting number is the one we will get when someone trains that predictor end-to-end.

Expect the first production deployments to stay within a single model family — same architecture, same tokenizer, same training data. Cross-family prediction is possible (the arithmetic coder is still lossless) but the bitrate will be much higher because the conditional entropy gets larger as the predictor and target diverge.
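To put a size on "not fitting in memory," here is the usual back-of-envelope KV cache calculation: keys plus values, for every layer and KV head, per token. The model dimensions below are illustrative numbers for a 70B-class model with grouped-query attention, not figures from the post, and the 4× ratio is the post's claimed lossless factor applied on top of FP8.

```typescript
interface KvShape {
  layers: number;
  kvHeads: number;
  headDim: number;
  bytesPerScalar: number; // 2 for bf16, 1 for FP8
}

// Keys and values (the factor of 2) for every layer and KV head, per token.
function kvBytesPerToken(shape: KvShape): number {
  return 2 * shape.layers * shape.kvHeads * shape.headDim * shape.bytesPerScalar;
}

function kvCacheGiB(shape: KvShape, tokens: number, losslessRatio = 1): number {
  return (kvBytesPerToken(shape) * tokens) / losslessRatio / 2 ** 30;
}

// Hypothetical dimensions for a 70B-class model with grouped-query attention.
const bf16Shape: KvShape = { layers: 80, kvHeads: 8, headDim: 128, bytesPerScalar: 2 };
const fp8Shape: KvShape = { ...bf16Shape, bytesPerScalar: 1 };

const rawGiB = kvCacheGiB(bf16Shape, 200_000);          // one 200K-token sequence, bf16
const compressedGiB = kvCacheGiB(fp8Shape, 200_000, 4); // FP8 plus the claimed ~4× lossless
```

With these assumed dimensions a single 200K-token sequence is roughly 61 GiB of bf16 cache, and the combined FP8 plus lossless path brings it to an eighth of that. Multiply by the number of concurrent sequences to see why batch size dominates the economics.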

What this means for you

Four reader profiles, four different calls:

  • If you serve frontier models in production (≥ 70B, long contexts, batched traffic): the 4× lossless number is real and the 8× gross number is the one that matters for your VRAM bill. This is the deployment profile wongarsu describes, and it is the only profile where the cost curve is unambiguously in your favor. The integration cost is one predictor-model forward pass per request, which is in the noise relative to a 1T-class prefill.
  • If you run smaller models locally (8B–13B, single-user, sub-100K context): you are on the wrong side of the break-even. The predictor model would cost you a meaningful fraction of the target's flops, and the cache is not the bottleneck yet. Hold off.
  • If you build agentic systems: this is the workload that should make you care the most. An agent loop that holds a 500K-token context across many turns is paying the cache cost on every decode. The 4× compression is bandwidth between your LLM provider and your agent runtime, which today is the single biggest cap on agent session length. Watch for vendor support here first; it will land in the inference stacks that already do speculative decoding (vLLM, TensorRT-LLM, SGLang) before it shows up in any closed API.
  • If you build ML systems for a living: the predictor-quality story is the next thing to pay attention to. A predictor model trained end-to-end to minimize the cross-entropy of (target KV | predictor forward pass) is a much smaller research project than the codec work was, and the marginal value is large. The post is essentially an open call for that work.

What to do this week

# 1. If you maintain a vLLM / TensorRT-LLM / SGLang fork:
#    - Find the existing speculative-decoding code path.
#    - The (encoder, decoder) asymmetry it implements is structurally
#      identical to what Speculative KV coding needs.
#    - The 4× number is a VRAM win, not a flops win. Plan the benchmark
#      for batched traffic, not single-stream.

# 2. If you serve a frontier model with > 200K context windows:
#    - Measure the share of your inference cost that is cache storage
#      and cache transfer. If it is < 20%, skip. If it is > 40%, this
#      is worth a prototype.
#    - Start with same-family predictor (e.g., target = Llama-3 70B,
#      predictor = Llama-3 8B at INT4). Cross-family is a research
#      project, not a deploy.

# 3. If you build agents: be ready to switch inference providers
#    the day one of them ships this. An 8× cache-size win is the
#    difference between a 30-minute session and a 4-hour session
#    on the same hardware, and whichever provider gets there first
#    is the one whose API key ends up in your framework's default.

# 4. If you write ML systems posts: do not lead with "4× lossless
#    compression." Lead with "the predictor model is the product."
#    That is the framing nobody else has and it is the part of
#    the post that will still be true in 2028.

The bottom line

Speculative KV coding is not a clever codec trick. It is a cost-curve observation: that a 1B-parameter predictor model in 2026 is cheap enough to run as a side computation, that a frontier model's KV cache is expensive enough to make that side computation worth it, and that the gap between those two facts has been closing for three years and will continue to close. The 4× number is real. The interesting question is what the number will be in twelve months, when the predictor is trained end-to-end against the target's actual cache distribution, and the answer is almost certainly "larger than 4×, and the predictor itself is the thing someone ships as a model."

This is the post to send to the engineer on your team who keeps saying "we'll just quantize harder." It is also the post to send to the person who keeps saying "VRAM is the new FLOPS." Both of them are right. The 2026 argument is about which side of the cost curve you are on, and this Speculative KV coding write-up is the cleanest published version of that argument I have read.

Related reads from this blog

  • Microsoft Just Put a Workflow Engine Inside Postgres — same week, different bottleneck: durable execution in the database. The structural similarity is that both moves relocate work from where it is expensive (a separate orchestrator, separate VRAM) to where it is already paid for (the database, the predictor model).
  • Redis 8.8: Your Lua Rate Limiter Is Now Obsolete — both posts are about a vendor deciding your separate layer is now their default. Redis 8.8 ate the rate-limiter; whoever ships Speculative KV coding in vLLM eats your cache-cost budget.

Meta's AI Chatbot Reset 20,225 Instagram Passwords

Disclosure: This post was researched and drafted with AI assistance. Primary source: Zack Whittaker, "Meta confirms thousands of Instagram accounts were hacked by abusing its AI chatbot", ~this week in security~, June 6, 2026, cross-referenced against the Hacker News thread (349 points, 127 comments at time of writing), the original 404 Media and TechCrunch reporting from June 1, and Meta's data-breach notice filed with the Maine Attorney General. All numbers (20,225 affected accounts, ~30 in Maine, April 17 through early June window) and Meta's quoted breach-notice language are taken directly from Whittaker's write-up, which is itself based on the filing. Analysis and framing are the author's.

The number is finally on the record. Meta has told the Maine Attorney General that at least 20,225 people had their Instagram accounts hijacked between roughly April 17 and the first week of June, via a single, embarrassing bug: the company's own AI support chatbot could be talked into resetting the password of any account that didn't have two-factor authentication turned on. You didn't need a phishing kit. You didn't need a SIM swap. You typed "reset the password for [target account], send the link to [email you control]," and the chatbot did it. The data-breach notice, which Meta filed late Friday and which ~this week in security~ obtained, confirms what the original 404 Media and TechCrunch reporting (June 1) first claimed. Nearly seven weeks of hijackings, and the headline fix was to disable the chatbot entirely.

The interesting part is not the bug. The interesting part is what the bug tells us about how Meta is shipping AI features right now.

The mechanism, in one sentence

The "AI-assisted account recovery system" that Meta built into Instagram did not check that the email address you asked it to send the reset link to actually matched the email address on the account. So you gave it your own Gmail, asked for a reset, and it mailed the link to you. From there it was a normal password reset flow on a clean, authenticated browser. No exploits, no zero-days, no 2FA prompt to fail closed.

That is the whole vulnerability. In Meta's own words from the notice: "due to a bug in a separate code path, the system did not properly verify that the email address provided by the individual requesting a password reset matched the email address associated with that user's Instagram account. As a result, when an individual provided an email address not previously associated with the account, the system incorrectly sent a password reset link to that unassociated email rather than rejecting the request."

If you've ever written a password-reset endpoint, you know exactly which check is missing. The "match the email" rule is the load-bearing one. Drop it, and the entire flow degrades to "anyone can request a reset, the system has no way to know it shouldn't." Which is what happened, for about seven weeks, at scale.

Meta's breach notice is careful to push the failure into "a separate code path" rather than the LLM — "The tool itself worked properly and functioned as intended; however due to a bug in a separate code path, the system did not properly verify that the email address provided … matched the email address associated with that user's Instagram account." The LLM did what it was asked. The broken check was downstream, in a deterministic code path that should have rejected the request before any link ever went out.
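For readers who have not written one, this is roughly what the missing check looks like. It is a generic sketch of the invariant described in the notice, not Meta's code; the record shape, the function names, and the example.com addresses are all hypothetical.

```typescript
interface AccountRecord {
  username: string;
  emailOnFile: string;
}

type ResetOutcome =
  | { ok: true; sendTo: string }
  | { ok: false; reason: string };

const normalizeEmail = (email: string): string => email.trim().toLowerCase();

// The load-bearing rule: the requested address must equal the address on file.
function authorizeReset(account: AccountRecord, requestedEmail: string): ResetOutcome {
  if (normalizeEmail(requestedEmail) !== normalizeEmail(account.emailOnFile)) {
    return { ok: false, reason: "email does not match the address on file" };
  }
  // Send to the stored address, never to the caller-supplied string.
  return { ok: true, sendTo: account.emailOnFile };
}

const victim: AccountRecord = { username: "alice", emailOnFile: "alice@example.com" };
const attackerAttempt = authorizeReset(victim, "attacker@example.net");
const ownerAttempt = authorizeReset(victim, " Alice@Example.com ");
```

The attacker's request is rejected before any link is generated, and the owner's request resolves to the stored address even though the typed one differs in case and whitespace.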

The original take: the AI is the interface, not the bug

The pre-AI version of this — the plain web form — has had the "verify the email matches" check baked in for fifteen years. Every framework ships it. Every junior developer has written it. The whole reason a password reset is even a half-secure flow is that the one thing it's supposed to verify is that the requester controls the email on file.

What Meta did, in pursuit of an "AI-assisted" support experience, was wrap that flow in an LLM and lose the check. The LLM is the conversational interface through which an attacker phrases the request. The shape that matters, though, is what enabled the missing check to ship: the AI layer was treated as a service that could call the password-reset primitive, and the hard server-side invariant ("the email on the request must equal the email on file") stopped being load-bearing. It became one of several "validations" the AI could route around by rephrasing. The interface, in other words, is the policy. Whether Meta wants to call that "the tool worked" or "a bug in a separate code path" is a phrasing preference; the structural fact is that the AI layer is the only thing standing between an arbitrary prompt and a password-reset email.

This is the same shape as the smart-TV residential-proxy SDK story from yesterday, and the reason it keeps showing up is the same: a feature was added on top of an existing surface, the integration loosened a check that the underlying surface was relying on, and the failure mode was quiet. The proxy SDK didn't need to be malicious. The chatbot didn't need to be jailbroken. They just needed to be less careful than the thing they were augmenting.

The 2FA detail is the only thing that limited the blast radius

If you want to know how bad this could have been, count the people who didn't get hit. The whole reason the number is 20,225 and not 20,225,000 is that the attack only worked against accounts without two-factor authentication enabled. Anyone with a TOTP authenticator, a hardware key, or even SMS-based 2FA turned on would have hit a second wall the attacker couldn't get past, because the password reset alone wouldn't be enough.

This is a useful data point. It is also the only one. Most consumer services do not publish what fraction of their user base has 2FA on, but the honest internal number at most large consumer apps is in the low single digits for the methods that actually stop this kind of attack. SMS 2FA is the most common form by far, and it has its own bypass ecosystem. The accounts the attackers hit were not 2FA-on accounts; they were the long tail of accounts whose owners never opened the security settings screen.

Meta did not, in the breach notice, disclose how many of the 20,225 victims had been notified that 2FA was available. The notice instructs users to "reset passwords and re-authenticate through secure, verified channels"; turning on 2FA is the obvious next step, and the only one the framing of the notice points users toward. That framing places the cost of the company's architectural mistake on the user.

The layoffs are not a coincidence

The original 404 Media piece on the bug, and Whittaker's follow-up, both land the observation that the hack came shortly after Meta laid off thousands of employees while continuing to reward top executives with stock incentives. The instinct is to read that as a one-line "context aside." It isn't. On our read, it is the causal mechanism.

An account-recovery system that wraps a password reset in an LLM is the kind of feature that gets greenlit by a product manager who needs an "AI-powered" demo for a quarterly review, and that gets shipped by an engineering team that is two reorganizations smaller than it was a year ago. The team that would have caught the missing email-match check in code review is one of the teams that has been told, in 2026, that its function is being consolidated. The security team that would have flagged "we are letting a non-deterministic model arbitrate a security primitive" is the team whose headcount was cut to make the margin number. The result is exactly what you would predict: a feature that, in a smaller and more cautious Meta, would not have shipped, did ship, and shipped wrong. This is our read, not Meta's — but it is a read that the breach notice conspicuously does not refute.

This is not a story about a single bug. It is a story about the kind of bug a company ships when its incentive structure rewards "AI features shipped" over "AI features shipped safely." Meta's quarterly calls are full of AI capability announcements; the risk-disclosure language is, by a wide margin, the shorter section. The 20,225 figure is what the gap looks like when it finally shows up in a regulatory filing.

What Meta actually did about it

Three things, in the notice:

  1. Disabled the AI chatbot for now.
  2. Removed the code path that allowed the chatbot to reset user accounts.
  3. Said it is "checking other chatbots across its platforms to prevent a repeat incident."

Item 3 is the one to watch. If "checking other chatbots" turns into "we also removed password-reset capabilities from our other AI support surfaces," that is a real fix. If "checking" turns into "we reviewed the prompts and added a system message," that is a security-theater answer — a language-model guardrail on a security primitive that should be enforced in code. The history of these incidents is that the second answer is much more common than the first, because the second answer is faster and the executives who set the security budget are the same executives who set the AI-ship budget. There is no organizational structure that resolves this without a regulator forcing it.

What this means for you

If you have an Instagram account and you do not have 2FA on, turn it on today. Use a TOTP authenticator (Authy, 1Password, Google Authenticator) rather than SMS — SMS-based 2FA is bypassable by carrier-port attacks, and you do not want your second factor to be weaker than the bug that broke the first one. If you run any consumer-facing service with a password-reset flow, audit it this week for the exact check Meta forgot: that the address the reset link is sent to is the address on file, and that the change-of-email path requires the existing email to confirm. The pre-LLM-era server-side check. The boring one. It still matters, and the fact that a company with the resources of Meta missed it is not a reason to skip it — it's a reason to add it explicitly to your test plan.

If you are on a product team that is being asked to wrap an existing security-sensitive flow in an LLM: refuse, on the record, and copy the security team. The cost of being the person who said "this should not be a model decision" when the postmortem gets written is much lower than the cost of being the person who didn't say it. A language model is the wrong place to enforce an invariant that has to hold every time. Use a model to interpret the request, then call a hard server-side check that is deterministic, reviewable, and covered by a test that has been there since 2015.

What to do this week

# 1. Audit your password-reset flow for the missing email-match check.
#    In your reset handler, the logic must include:
#
#    if requested_email != on_file_email:
#        reject("email does not match account")
#
#    If you can find a path where this check is missing, you have the bug.
#
# 2. If you have AI in any password, account-recovery, or 2FA path,
#    confirm the model is *advisory* and the server enforces the rule.
#    Grep your repo for "openai", "anthropic", "claude", "llm" near
#    auth/, login/, reset/, recovery/, 2fa/, otp/.
#
# 3. Turn on TOTP 2FA on every account that supports it.
#    SMS 2FA is better than nothing; TOTP is the floor.
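
A minimal sketch of the check in item 1, written as a standalone Python function. The names (`Account`, `ResetRejected`, `resolve_reset_address`) are hypothetical, and the case-insensitive comparison is an assumption; adapt both to however your own handler stores addresses.

```python
from dataclasses import dataclass


class ResetRejected(Exception):
    """Raised when a reset request fails the server-side email check."""


@dataclass(frozen=True)
class Account:
    user_id: str
    on_file_email: str


def normalize(email: str) -> str:
    # Assumption: addresses are compared case-insensitively after trimming.
    return email.strip().casefold()


def resolve_reset_address(account: Account, requested_email: str) -> str:
    """Return the only address a reset link may be sent to.

    Any caller (a support form, a chatbot, an internal tool) may suggest
    an address, but the server sends to the address on file or not at all.
    """
    if normalize(requested_email) != normalize(account.on_file_email):
        raise ResetRejected("email does not match account")
    return account.on_file_email
```

The function never returns the caller-supplied string, so a model that passes through an attacker's address cannot redirect the reset link even if the comparison is later loosened by mistake.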

The Meta breach is going to age into a textbook case, the same way three stories from earlier this month will: "Cloudflare Just Bought the Build Tool That Runs the Web", "Redis 8.8: Your Lua Rate Limiter Is Now Obsolete", and "Gemma 4 12B Just Killed the Multimodal Encoder". The category of bug is new: "we let a model be the policy." The lesson is older than the technology. The technology just made it cheaper to ship the wrong version of it.

Who pays the cost when the org chart says the security team is too expensive to keep around, and the product team is too important to slow down?

Saturday, June 6, 2026

Your Smart TV Is a Node in the AI Scraping Economy

Disclosure: This post was researched and drafted with AI assistance. Primary source: buchodi / Include Security, The Smart TV in Your Living Room Is a Node in the AI Scraping Economy (June 5, 2026), cross-referenced against the Hacker News front-page discussion (85 points, 19 comments at time of writing). All claims, framework versions, endpoint hostnames, and per-country bandwidth tiers are taken directly from the buchodi write-up, which itself documents the reverse-engineering of a consent-installed partner app over 30 days. Analysis and framing are the author's.

The write-up of the week is buchodi's at Include Security: a forensic look at Bright Data's "consent SDK" for residential proxying, and an argument — backed by reverse-engineered binaries and 30 days of captured traffic — that the connected TV in your living room is the ideal exit node for the AI training data economy. The interesting part is not the SDK itself, but that the legal supply side of the residential-proxy market has been engineered to be invisible to the people whose homes it runs in. Most of the existing press is looking at the illegal supply side and missing it.

Why the TV, not the phone

The reason CTV (connected TV — any TV with a built-in internet connection and apps, including Roku, Apple TV, Fire TV, and smart TVs from Samsung, LG, etc.) matters more than the mobile phone — where the same SDK already lives in apps like EarnApp and XYO COIN — is form factor:

Factor                       | Mobile phone             | Smart TV / CTV
Power                        | Battery most of the day  | Always plugged in
Network                      | WiFi + cellular          | Always WiFi, high-speed
Uptime                       | Intermittent             | 24/7 in standby
Bandwidth ceiling            | Low (cellular caps)      | Effectively unlimited
User attention               | Actively used            | Often unattended
Corporate / family oversight | Higher (MDM, mobile EDR) | Virtually none

A phone hits 1% battery, gets locked, jumps networks, and has EDR (endpoint detection and response — software that monitors a device for suspicious behavior, common on corporate and BYOD phones) watching it. A TV in your guest room doesn't. Once the SDK is past its install screen, it owns a residential IP that is online every night while the user is asleep, on a fast unmetered connection, in a household that has no idea it's running.

How the SDK actually works

The protocol design is the part most people will find surprising, because the implementation choices are deliberately aimed at the mobile app-security tooling that would normally catch this kind of behavior.

The config endpoint is unauthenticated. On every launch, the SDK calls https://clientsdk.bright-sdk.com/sdk_config_ios.json?appid=<bundle>&ver=<sdk-version>&uuid=sdk-ios-<32hex>. The server only gates on appid (a bundle ID you can read off the App Store listing) and ver (an SDK version string). Pass any random UUID, get the same config a real device gets: feature flags, idle thresholds, country bandwidth caps, and the partner manifest.

The peer tunnel is a plain WebSocket. After config fetch, the SDK opens a persistent wss://proxyjs.brdtnet.com:443. The TLS cert is CN=*.luminatinet.com, a domain from the Luminati name Bright Data used before its 2021 rebrand. Active SDK infrastructure still runs on the legacy cert, which is a clean detection pivot: any *.luminatinet.com or *.brdtnet.com traffic on your network is specifically the peer-tunnel plane, not customer-side Bright Data usage.

No message signing, no client certificate, no device attestation. The server filters peers by IP reputation. The IPC envelope is plain JSON with commands like tunnel_init, cid_set, status_get, and cmd_tun. Once the device reports favorable idle state, the server pushes a cmd_tun frame, which the SDK executes as a real HTTP request against a third-party site, sourced from your residential IP.

The idle rules are not what you think they are

The config ships an explicit rulebook for when the device is eligible to relay someone else's traffic:

"idle_metrics": {
  "ignore_screen_on": true,
  "ignore_on_call": true,
  "max_bw_ratio": 1,
  "min_battery": 0.2,
  "wifi_on_battery": true,
  "min_battery_wifi": 0.2,
  "max_cpu_usage": 70,
  "max_mem_usage": 90,
  "mem_screen_off": true,
  "idle_timeout": 30,
  "not_idle_timeout": 10
}

The ignore_screen_on and ignore_on_call flags are the important ones. In the SDK's rulebook, "idle" means the device's CPU, memory, and battery are within thresholds — not that the user is away. A user actively on a phone call, reading the screen, counts as idle. So does a TV in the background during dinner.
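
To make the consequence concrete, here is a small sketch of how such a rulebook could be evaluated. The threshold values are copied from the config quoted above; everything else is an assumption, including the `DeviceState` type, the `is_relay_eligible` function, the units, and the way the flags combine. It is one plausible reading of the config, not the SDK's actual logic.

```python
from dataclasses import dataclass

# Values copied from the idle_metrics block quoted above.
IDLE_METRICS = {
    "ignore_screen_on": True,
    "ignore_on_call": True,
    "min_battery": 0.2,
    "max_cpu_usage": 70,
    "max_mem_usage": 90,
}


@dataclass
class DeviceState:
    screen_on: bool
    on_call: bool
    battery: float    # 0.0 to 1.0 (assumed unit)
    cpu_usage: float  # percent (assumed unit)
    mem_usage: float  # percent (assumed unit)


def is_relay_eligible(state: DeviceState, cfg: dict = IDLE_METRICS) -> bool:
    """Hypothetical reading of the rulebook, not the SDK's real code."""
    # Screen and call state only block relaying when the ignore flags are off.
    if state.screen_on and not cfg["ignore_screen_on"]:
        return False
    if state.on_call and not cfg["ignore_on_call"]:
        return False
    return (
        state.battery >= cfg["min_battery"]
        and state.cpu_usage <= cfg["max_cpu_usage"]
        and state.mem_usage <= cfg["max_mem_usage"]
    )


# A user mid-call with the screen lit still passes every resource threshold.
mid_call = DeviceState(screen_on=True, on_call=True, battery=0.8,
                       cpu_usage=35, mem_usage=60)
print(is_relay_eligible(mid_call))
```

Under this reading, the only way the user's own activity blocks relaying is indirectly, by pushing CPU or memory past the thresholds.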

"Consent" is a TV-remote problem

This is where most coverage is going to get the framing wrong. Petflix — a Roku app documented by The Verge and cited by buchodi as a representative consent-dialog example (not a partner-manifest entry) — has a consent screen that reads:

"To enjoy Petflix for free with fewer ads, you are allowing Bright Data to occasionally use your device's free resources and IP address to download public web data from the internet. Bright Data will only use your IP address for approved business-related use cases. None of your personal information is accessed or collected except your IP address. Period."

The word "occasionally" does a lot of work. The same SDK's publicly queryable config sets max_bw_monthly_wifi: 200,000,000,000 bytes — a 200 GB default monthly WiFi budget. Privacy-policy disclosure on a TV navigated by arrow keys is the wrong control surface.

The VPN bypass is the actual problem for security teams

The single technical finding that should change how enterprise security teams think about this SDK is the use_netifs flag, which triggers code in the binary that constructs its NWConnection with a specific requiredInterface, en0 (WiFi) or pdp_ip0 (cellular), rather than the system default route. On iOS, this bypasses any configured VPN's tun0 (the virtual network interface a VPN creates on the device) entirely. The peer tunnel does not cross a user-configured VPN, even when the rest of the app's HTTPS traffic does.

Buchodi verified this empirically with transparent TLS interception: every HTTPS call the SDK made was captured except the peer tunnel to proxyjs.brdtnet.com:443, despite port 443 being explicitly redirected to the inspector.

The SDK uses two independent inspection bypasses, one per plane:

  • Control plane (config fetch, telemetry): built on CFHTTPMessage primitives rather than URLSession. This defeats URLSession-level instrumentation (swizzling, network extensions, URLProtocol subclasses) commonly used in mobile app-security tooling.
  • Data plane (peer tunnel): built on NWConnection with requiredInterface set to the physical interface. This is what defeats VPNs and ensures the scraping is executed from a residential IP.

Both choices are legitimate Apple APIs. The combination is the interesting artifact: the data plane is invisible to VPN-based inspection and the control plane is invisible to URLSession-based hooks. Researchers who rely on either single technique see only half the SDK's behavior. For enterprise security teams running MDM (mobile device management — software that lets an organization enforce policy on phones and tablets, typically installed on company-issued or BYOD devices), corporate-VPN traffic inspection, or home-router parental controls: the most sensitive channel this SDK operates is designed to go around your visibility layer.

The original take: legal ≠ invisible

The wider story this drops into is the AI training data economy. Cloudflare's pay-per-crawl program, the Gemma 4 multimodal encoder consolidation we covered a few days ago, the rise of rate-limited retrieval-augmented agents — all of this is downstream of an LLM training pipeline that depends on scraping data that increasingly has owners who would prefer not to give it up. Residential proxies are how scrapers route around that resistance. They are the load-bearing infrastructure of the post-Cloudflare web.

Most of the press on residential proxies has focused on the illegal supply side: botnets like Aisuru and Kimwolf, trojanized apps like the HUMAN Security PROXYLIB disclosure, pre-infected IoT hardware in the Google/Mandiant IPIDEA takedown. The FBI issued a formal advisory earlier this year. These are the bad actors. They are also the ones that get reported on, because they have obvious victims and obvious villains.

Bright Data is the legal supply side. The SDK ships as a documented commercial product. The "consent" comes from a publisher that put it in their app's EULA. The user is told the device is being monetized, in language designed to be skimmed past on a TV. The scraping jobs that go through the network are bound to be "approved business-related use cases" because Bright Data is also the customer side and gets to define what that means.

What this changes is the defensive posture itself: the press, the takedowns, the FBI advisories have implicitly assumed the supply side is a thing that gets installed on a victim's device by an adversary, not a thing the victim consented to. The defensive posture does not currently distinguish between a TV that has been rooted by a botnet herder and a TV that has been enrolled in a "free ad-supported app." From the perspective of network telemetry, both are the same: an iOS device on a residential IP, opening a long-lived WebSocket to proxyjs.brdtnet.com, executing inbound HTTP jobs. The detection signal is the same. The remediation story is harder.

What this means for you

Home / small business / school network you control — the buchodi write-up gives you five DNS hostnames to block at the router. They will not affect any customer who legitimately uses Bright Data's customer-facing proxy service on a different domain.

# Block at your router's DNS — Pi-hole, NextDNS, Cloudflare Gateway, OpenWrt+dnsmasq, etc.
proxyjs.brdtnet.com
proxyjs.luminatinet.com
proxyjs.bright-sdk.com
clientsdk.bright-sdk.com
clientsdk.brdtnet.com

For deeper inspection: TLS SNI (Server Name Indication — the unencrypted hostname field in a TLS handshake, readable at the network boundary without decrypting the traffic) filtering on *.brdtnet.com, *.luminatinet.com, *.luminati.io works at the network boundary without TLS interception. The *.brdtnet.com and *.luminatinet.com TLS certificate fingerprints are stable until the next Sectigo rotation (current certs valid through mid-2026, per the write-up).
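
If your filtering layer takes a script rather than a static list, the matching rule can be sketched as below. The hostnames and wildcard domains come from the lists above; the function name and the decision to treat each wildcard as "any subdomain, but not the bare domain" are assumptions for illustration.

```python
# Wildcard domains named for SNI filtering, plus the two listed
# bright-sdk.com hostnames that those wildcards do not cover.
BLOCKED_WILDCARD_DOMAINS = ("brdtnet.com", "luminatinet.com", "luminati.io")
BLOCKED_EXACT_HOSTS = {"proxyjs.bright-sdk.com", "clientsdk.bright-sdk.com"}


def is_peer_tunnel_host(hostname: str) -> bool:
    """Return True if an observed DNS name or SNI value should be blocked."""
    host = hostname.strip().rstrip(".").lower()
    if host in BLOCKED_EXACT_HOSTS:
        return True
    # "*.example.com" is read as: any name ending in ".example.com".
    return any(host.endswith("." + domain) for domain in BLOCKED_WILDCARD_DOMAINS)


for name in ("proxyjs.brdtnet.com", "clientsdk.bright-sdk.com", "example.org"):
    print(name, is_peer_tunnel_host(name))
```

Matching on the leading dot avoids false positives on unrelated domains that merely end in the same characters.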

Corporate security stack relying on VPN-based traffic inspection or MDM with URLSession-level instrumentation — the use_netifs + CFHTTPMessage combination is built to defeat both. Add a host-based or app-store binary check for the Swift symbols BrdWebSocketFacade and BrdNetwork.DNSResolver to your managed-fleet scanning.

If you build consumer apps or CTV platforms — the most uncomfortable finding is the per-country bandwidth tier table, which suggests deliberate market segmentation:

Country             | Min battery to relay | Daily cap | Monthly cap
Uzbekistan          | 1%                   | 1 GB      | 30 GB
Oman                | 1%                   | 1 GB      | 30 GB
Qatar               | 20%                  | 40 MB     | 250 MB
UAE                 | 20%                  | 40 MB     | 250 MB
Default (worldwide) | 20%                  | 50 MB     | 500 MB

Uzbekistan and Oman devices are permitted to relay down to 1% battery, with daily caps 20× the default and monthly caps 60× the default. The default-worldwide allowance still permits 500 MB of someone else's traffic per month over the user's home internet. There is a market design choice being made here that the consumer-facing copy does not describe.
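
The multipliers are easy to re-derive from the table. The sketch below assumes decimal units (1 GB = 1000 MB), which is the convention under which the 20× and 60× figures come out exact; the dictionary layout is only for illustration.

```python
MB = 1
GB = 1000 * MB  # decimal units assumed

# (daily cap, monthly cap) per tier, taken from the table above.
TIERS = {
    "Uzbekistan": (1 * GB, 30 * GB),
    "Oman": (1 * GB, 30 * GB),
    "Qatar": (40 * MB, 250 * MB),
    "UAE": (40 * MB, 250 * MB),
    "Default": (50 * MB, 500 * MB),
}

default_daily, default_monthly = TIERS["Default"]

# Express every tier as a multiple of the worldwide default.
ratios = {
    country: (daily / default_daily, monthly / default_monthly)
    for country, (daily, monthly) in TIERS.items()
}

for country, (daily_x, monthly_x) in ratios.items():
    print(f"{country}: daily {daily_x:g}x, monthly {monthly_x:g}x of default")
```

The same calculation shows the other direction of the segmentation: Qatar and the UAE sit below the default, at 0.8× the daily cap and 0.5× the monthly cap.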

What to do this week

The 30-day experiment in the buchodi write-up is reproducible with off-the-shelf tooling. On a spare iOS device with mitmproxy and a partner app installed (XYO COIN is publicly named in the research), you can capture the same clientsdk.bright-sdk.com config fetch, the same wss://proxyjs.brdtnet.com:443 upgrade, and the same JSON envelopes: ipc_call with cmd=tunnel_init / cmd=cid_set. You will also see, in your own network logs, that the tunnel does not cross the iOS device's VPN if you have one configured. That is the part that is hard to argue with.

The bigger question — whether the consent-dialog model for residential-proxy enrollment survives the moment a regulator or a major platform holder decides to look at the SDK's actual config vs. its marketing copy — is one this post is not going to answer. But the buchodi write-up is now the public artifact that lets the question be asked in concrete terms, and that is the part that is going to matter.


Related on the blog: Cloudflare Just Bought the Build Tool That Runs the Web (the upstream half of the scraping-detection story), Redis 8.8: Your Lua Rate Limiter Is Now Obsolete (where rate-limited scrape traffic ends up), and Gemma 4 12B Just Killed the Multimodal Encoder (where the scraped data is going).

Key terms used in this post: CTV = connected TV (a TV with built-in internet and apps, including Roku, Apple TV, Fire TV, and most smart TVs); MDM = mobile device management (software that lets an organization enforce policy on phones and tablets, common on company-issued and BYOD devices); EDR = endpoint detection and response (software that monitors a device for suspicious behavior, common on corporate endpoints); SNI = Server Name Indication (the unencrypted hostname field in a TLS handshake, visible at the network boundary without decryption); tun0 = the virtual network interface a VPN creates on a device, which most traffic-inspection tools rely on for visibility.

Microsoft Just Put a Workflow Engine Inside Postgres

Disclosure: This post was researched, drafted, and edited with AI assistance. Microsoft's pg_durable GitHub repository and README were the primary source; the HN announcement thread (281 points, 72 comments at time of writing) was the secondary source. Opinions, framing, and analysis are the author's.

Microsoft open-sourced pg_durable on June 5th and most coverage will focus on the SQL DSL, the ~> and |=> operators, and the question of whether writing workflows as SQL strings is a good idea. That's the wrong story. The real story is that the author of pg_durable is the same person who built the orchestration layer for Durable Task Framework — the framework that has been running Microsoft-internal workflows and Azure Durable Entities for close to a decade — and the team is now putting that capability inside Postgres. If you've ever told someone "we need a workflow engine for this," and the answer was Temporal, or Airflow, or Step Functions, that answer just got weaker.

What pg_durable actually does

A pg_durable function is a graph of SQL steps that Postgres executes and checkpoints as it goes. If the database crashes, restarts, or a step fails, execution resumes from the last durable checkpoint instead of forcing you to reconstruct state by hand. You start one with a one-liner:

SELECT df.start(
  'SELECT id FROM documents WHERE processed = false LIMIT 100' |=>
  'batch' ~>
  'UPDATE documents SET processed = true WHERE id = ANY($batch)'
);

The runtime checkpoints between steps, so a restart in the middle of a long job doesn't rerun work that already succeeded. Status and results are queryable from standard Postgres tables (the README points to df.instances) — same auth model, same backup model, same observability tooling. There is no Redis, no Temporal cluster, no separate queue service. It installs as a PostgreSQL extension and ships as a Debian package for PG 17 and 18 on amd64.
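
The checkpoint-and-resume idea can be shown with a toy runner. This is a conceptual sketch only: it is not pg_durable's or duroxide's implementation, the `run_durable` function and step names are hypothetical, and a plain dict stands in for what would be a durable table in a real system.

```python
from typing import Any, Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def run_durable(steps: List[Step], checkpoints: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, skipping any whose result is already recorded."""
    for name, fn in steps:
        if name in checkpoints:
            continue  # finished before the restart, so it is not rerun
        # The result is recorded only if the step returns without raising.
        checkpoints[name] = fn(checkpoints)
    return checkpoints


calls = {"select": 0, "update": 0}
crash = {"armed": True}


def select_batch(state: Dict[str, Any]) -> List[int]:
    calls["select"] += 1
    return [1, 2, 3]


def update_rows(state: Dict[str, Any]) -> int:
    calls["update"] += 1
    if crash["armed"]:
        crash["armed"] = False
        raise RuntimeError("simulated crash mid-workflow")
    return len(state["batch"])


steps: List[Step] = [("batch", select_batch), ("updated", update_rows)]
store: Dict[str, Any] = {}

try:
    run_durable(steps, store)  # first attempt dies in the second step
except RuntimeError:
    pass

run_durable(steps, store)      # the "restart" resumes after the checkpoint
print(calls, store)
```

The select step runs once across both attempts, which is the property the README is describing: work that already succeeded is read back from the checkpoint rather than repeated.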

Under the hood, pg_durable is built on duroxide, a Rust-based durable execution runtime that handles deterministic replay, checkpoints, sub-orchestrations, and timers. pg_durable is the Postgres-flavored wrapper (PostgreSQL License); duroxide is the engine (MIT). The two components carry different licenses.

The "Postgres is enough" thesis just got real

There's been a persistent argument in the Postgres community for years — most visibly at postgresisenough.dev — that you can replace a lot of operational machinery with Postgres if you reach for the right extensions. pg_durable is the most ambitious version of that argument yet: it claims that durable execution, the thing that has historically required a separate orchestrator like Temporal, is just another primitive the database should provide.

The README's own list of "what you're probably doing today" makes the displacement target explicit:

  • pg_cron plus a jobs table, status columns, retry counters, and a polling worker
  • An external orchestrator (Airflow, Temporal, Step Functions, Argo) calling back into Postgres
  • A queue plus workers plus a separate state table to coordinate retries
  • A plpgsql procedure that works until a crash or long-running transaction forces you to start over

That's the menu. If pg_durable works as advertised, several of those menu items become the same thing, and the "we need Temporal for this" justification gets harder to make.

The maintainer of postgresisenough.dev is already asking for a PR to add pg_durable to the site. That's the tell — the people who've been arguing "Postgres is enough" see this as a real entry in the catalog, not a marketing stunt.

The Microsoft stake is bigger than it looks

Two things are easy to miss. First, the lead committer, affandar, is also the author of Durable Task Framework, the orchestration library that has powered Azure Durable Functions and Durable Entities. This isn't a new team learning the durable-execution category. It's the same team shipping their next move in the open.

Second, the same repo's documentation points at Azure HorizonDB, Microsoft's new PostgreSQL cloud service, as the place to try pg_durable — and notes that it's "engineered for performance and built with pg_durable inside." This isn't a one-off OSS contribution. It's a positioning move. Microsoft is betting that the database is the right substrate for workflow orchestration, and the database they want to bet on is Postgres, not a proprietary service they control end-to-end. That tells you something about where they think the leverage is.

The honest counterargument: the SQL DSL is awkward

The most consistent pushback in the HN thread is that the workflow syntax is hard to read. One commenter, looking at the README example, called it "bizarre." Another pointed out that embedding SQL strings inside other SQL strings — which is what the df.start(...) syntax essentially is — is a maintainability hazard waiting to happen.

Both criticisms are fair, and the maintainers know it. gdecandia, a contributor, said: "Agree that the DSL ergonomics can be improved. Our pipelines use a higher level language and therefore simplified, but pg_durable is meant to solve a wider array of problems. We're happy to take suggestions for improvements." A committer also noted that the state-provider layer is an extensibility point — they're open to alternative backends like a pgmq-based state provider, rather than the default PostgreSQL one.

The DSL awkwardness is the price you pay for putting workflows inside a SQL-shaped runtime. The tradeoff is real: pure SQL workflows are more constrained than Temporal's TypeScript SDK, but they force the architecture into a shape that survives database restarts, which is the whole point. If you've been writing Temporal workflows in TypeScript and never worrying about the underlying state store, you may not feel the pain pg_durable is solving. If you've been writing plpgsql procedures and losing work to transaction timeouts, you will.

What it's not

  • It's not a replacement for Airflow if your workflows fan out across heterogeneous systems (S3 + Spark + Slack + a database). The README explicitly says: "if the workflow mostly lives outside Postgres and spans many heterogeneous systems," reach for a general-purpose orchestrator.
  • It's not a sub-millisecond request handler. It's for durable background work, not synchronous request paths.
  • It's not available everywhere. The first-class deployment is Azure HorizonDB. If you're on AWS RDS, Aurora, Supabase, or Neon, you'll need to install the extension yourself and check whether your provider's PG build allows it.
  • It's not the first durable-execution project on Postgres. pg-boss, pg-workflows, and several others have been filling this niche for years. pg_durable is the most ambitious and the first with a major-vendor seal.

The performance and architecture story that's still developing

The README lists workloads (vector embedding pipelines, ingest pipelines, scheduled maintenance, fan-out aggregation, external API workflows) but doesn't publish benchmark numbers as of the v0.2.2 release. That's reasonable for an early OSS drop, but it means the "is this faster than my current setup" question is one you'll have to answer with your own load tests. The engine is Rust (duroxide) and the integration is in-PG, so there's no obvious reason it should be slow — but the early numbers will tell.

The architectural claim most worth testing is the parallel-fanout story. The README says pg_durable supports "fan-out aggregation: run independent queries in parallel, then join the results." If this works inside a single Postgres connection without an external worker pool, it's a real differentiator from the queue-plus-workers pattern.

The original take: the orchestrator is being absorbed into the database

pg_durable doesn't beat Temporal feature-for-feature today. Temporal has sub-orchestrations, versioning, signals, queries, and a TypeScript SDK that a generation of developers has already learned. pg_durable does not match that list: of those, only sub-orchestrations appear among the duroxide capabilities described earlier. The interesting question is what happens if a category of workflow tools gets pulled into the database itself over the next three to five years. Microsoft shipping pg_durable as a PG extension, embedded in their new cloud Postgres, is a strong signal that the answer to "where does the orchestrator live?" is shifting from "separate service" back to "the database." If this pattern holds, expect to see competing extensions in MySQL, MariaDB, and DuckDB within 24 months. The durable-execution category as a standalone product category gets thinner with each one.

The counter-trend is the continued rise of general-purpose orchestrators with mature SDKs (Temporal, Restate, Inngest) and the assumption that workflows will increasingly be written in application code, not SQL. If you're betting on that future, pg_durable is a 2026 data point, not a trend reversal. If you're betting on the database-absorbs-orchestration future, this is the most significant open-source release of the year so far.

What to do this week

-- Check what your current workflow stack actually is
SELECT count(*) FROM information_schema.tables
WHERE table_name IN ('jobs', 'job_runs', 'workflow_state', 'scheduled_tasks');
-- If you have more than 2 of these, you have a homegrown orchestrator.

-- Look at what extensions your Postgres allows
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('pg_cron', 'pg_durable', 'pgmq');
-- If pg_durable shows up, your provider ships it; a non-NULL
-- installed_version means it is already created in this database.
-- If it doesn't show up, ask them when it will.

If you have a Temporal deployment that's mostly doing "fetch some rows, update some rows, wait, update some more rows" — that's exactly the workload pg_durable is for, and it's worth a one-week prototype to see if you can drop the orchestrator from your architecture diagram.

If you're on Azure and you've been waiting for "modern" Postgres features to land on Azure, the HN commenter who said "I'm trapped on Azure" is the user you should be listening to. Azure HorizonDB is the response to that complaint, and pg_durable is one of the first things it ships with.

If you're a maintainer of an existing pg-boss or pg-workflows-style project: now is the time to make sure your README has a "how this compares to pg_durable" section. The displacement question is going to come up in every HN thread for the next quarter.

What this means for you

The story of pg_durable is that the most valuable open-source workflow orchestration capability — the kind that was, until now, the reason to deploy a separate service — is now an install command away from every team that already runs Postgres. The deployment cost of "I need durable execution" just went from "spin up a cluster" to "apt install pg-durable-postgresql-17." That's the same kind of leverage shift that Redis 8.8's array data type brought to in-memory data structures, and the same pattern Cloudflare applied in acquiring VoidZero — own the substrate, and the layers above it become someone else's problem to defend. (For more on what "owning the substrate" looks like on the model side, see how Gemma 4 12B dropped the multimodal encoder — different substrate, same play.)

The next time someone tells you "we need Temporal for this," the better question is: do you need a workflow engine, or do you need Postgres to remember what it was doing?

Friday, June 5, 2026

Redis 8.8: Your Lua Rate Limiter Is Now Obsolete

Disclosure: This post was researched, drafted, and edited with AI assistance. Redis's official announcement was the primary source; benchmark numbers and feature claims were verified against the markdown source of their post. Opinions, framing, and analysis are the author's.

Redis 8.8 shipped on June 2nd with six new features, and most coverage will lead with the array data type. That's a mistake. The real story is that Redis has quietly crossed the line from "in-memory data structure server" into "a different kind of database," and two of these features do most of the work to get it there.

The new array data type (and why it isn't the real story)

The new array data type is going to get most of the attention. It's an index-addressable, dynamic, sparse-friendly container that supports server-side SUM, MIN, MAX aggregations over index ranges and can act as a ring buffer with a single command (ARRING). For random-element access at 100K elements with 1KB values, the benchmarks show arrays running 5x faster than lists and 8–15% faster than hashes. For ring-buffer operations, ARRING is twice the throughput of the RPUSH+LTRIM idiom everyone has been using for years.

That's all real and worth knowing about. But the data type is the easy part. The hard part is the implicit claim embedded in the design: that the right place to do sliding-window aggregations, log-line searches, and sensor-data sum/min/max is inside Redis, not in your application code. That's a much bigger architectural shift than a new container.

The story nobody's writing: INCREX ends a decade of Lua

If you've built a production rate limiter in Redis at any point in the last eight years, you wrote a Lua script. Some combination of INCR, EXPIRE, conditional logic, maybe a sliding window via a sorted set, and a Lua wrapper to keep the whole thing atomic. It's the kind of code you copy from a 2014 blog post and never look at again.

Redis 8.8 introduces INCREX, a new generalized INCR-family command that does this natively:

INCREX key
       [<BYFLOAT|BYINT> increment]
       [LBOUND lowerbound] [UBOUND upperbound] [SATURATE]
       [EX sec | PX msec | EXAT unix-time-sec | PXAT unix-time-msec | PERSIST]
       [ENX]

Three things make this more than just "another increment command." First, it returns both the new counter value and the actual increment applied, so the caller knows immediately whether the request was allowed or rejected. Second, the ENX flag sets the expiration only if no expiration is already set, which means a window's TTL is anchored to its first request and not silently reset by every later call — a subtle bug that has bitten a lot of production rate limiters. Third, the SATURATE flag with UBOUND lets you clamp the counter at the limit rather than reject, which is the difference between a strict rate limiter and a graceful one.
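Those three properties are easier to see in a small model. The sketch below is an in-memory Python simulation of the behaviour as described in this post, not a Redis client call. The keyword arguments mirror option names from the syntax above, `CounterStore` is a hypothetical name, and the rule that an over-limit increment without SATURATE applies nothing is an inference from the wording here, not something verified against the server.

```python
from dataclasses import dataclass, field


@dataclass
class CounterStore:
    """In-memory model of the INCREX behaviour described above (not Redis itself)."""

    now: float = 0.0                                  # injectable clock, in seconds
    values: dict = field(default_factory=dict)
    expires_at: dict = field(default_factory=dict)

    def increx(self, key, by=1, ubound=None, saturate=False, ex=None, enx=False):
        # Drop the key if its TTL has passed.
        if key in self.expires_at and self.now >= self.expires_at[key]:
            self.values.pop(key, None)
            self.expires_at.pop(key, None)

        current = self.values.get(key, 0)
        target = current + by
        if ubound is not None and target > ubound:
            # Assumption: SATURATE clamps to the bound; without it nothing is applied.
            target = ubound if saturate else current
        self.values[key] = target

        # ENX: leave an existing TTL alone so the window stays anchored.
        if ex is not None and not (enx and key in self.expires_at):
            self.expires_at[key] = self.now + ex

        return target, target - current   # (new value, increment actually applied)


store = CounterStore()
log = []
for second in (0.0, 10.0, 20.0, 30.0):
    store.now = second
    log.append(store.increx("rl:user:42", ubound=3, ex=60, enx=True))

print(log)                                # [(1, 1), (2, 1), (3, 1), (3, 0)]
print(store.expires_at["rl:user:42"])     # 60.0, still anchored to the first request
print(CounterStore().increx("quota", by=5, ubound=3, saturate=True))   # (3, 3)
```

An applied increment of 0 is the "rejected" signal, so the caller never needs a second round trip to find out whether the request was allowed.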

If you maintain a Redis-backed rate limiter in production: your Lua script is now a one-liner. The pattern is no longer worth its complexity.

The "real" message queue story: XNACK

For years the most-cited reason not to use Redis Streams as a serious message queue was the failure-recovery story. A consumer that couldn't process a message had two options: ACK it (lying about success) or leave it pending and wait for XAUTOCLAIM to redistribute it after the idle timeout. For anything latency-sensitive, the second option was a non-starter.

Redis 8.8 adds XNACK, a real negative-ack command with three modes designed for three failure patterns:

  • SILENT — failure was unrelated to the message (consumer shutting down, transient network error). The delivery counter is decremented, undoing the original increment. The message becomes immediately available to other consumers.
  • FAIL — message is too expensive for this consumer but might succeed elsewhere. Delivery counter stays incremented; the message returns to the head of the queue.
  • FATAL — poison message, malformed, or potentially malicious. Delivery counter is set to LLONG_MAX, making it easy to detect and route to a dead-letter queue downstream.

This is the missing piece. It transforms Redis Streams from "queue-ish, with caveats" into "queue, full stop," because the failure-handling primitives now match what RabbitMQ or Kafka consumers take for granted. If you were weighing Redis Streams against a heavier queue service for a new project, that calculation just changed.
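The effect of the three XNACK modes on the delivery counter can be sketched with a toy queue. This is a pure-Python model of the semantics listed above, not the Redis Streams API: `StreamModel`, `Entry`, and `drain_poison` are hypothetical names, and putting the entry back at the head of the queue in every mode is a simplifying assumption (the post only states that explicitly for FAIL).

```python
from collections import deque
from dataclasses import dataclass

LLONG_MAX = 2**63 - 1   # LLONG_MAX for a 64-bit long long


@dataclass
class Entry:
    msg_id: str
    delivery_count: int = 0


class StreamModel:
    """Toy consumer-group queue; it only tracks order and delivery counts."""

    def __init__(self, msg_ids):
        self.queue = deque(Entry(m) for m in msg_ids)

    def read(self) -> Entry:
        entry = self.queue.popleft()
        entry.delivery_count += 1        # every delivery bumps the counter
        return entry

    def xnack(self, entry: Entry, mode: str) -> None:
        if mode == "SILENT":
            entry.delivery_count -= 1    # undo the increment from read()
        elif mode == "FAIL":
            pass                         # counter stays incremented
        elif mode == "FATAL":
            entry.delivery_count = LLONG_MAX
        else:
            raise ValueError(f"unknown mode: {mode}")
        # Assumption: in every mode the entry goes back to the head of the queue.
        self.queue.appendleft(entry)

    def drain_poison(self) -> list:
        """Downstream step: pull FATAL-marked entries out for a dead-letter queue."""
        poison = [e for e in self.queue if e.delivery_count == LLONG_MAX]
        self.queue = deque(e for e in self.queue if e.delivery_count != LLONG_MAX)
        return poison


stream = StreamModel(["1-0", "2-0"])
counts = []
for mode in ("SILENT", "FAIL", "FATAL"):
    entry = stream.read()
    stream.xnack(entry, mode)
    counts.append(entry.delivery_count)

print(counts == [0, 1, LLONG_MAX])                  # True
print([e.msg_id for e in stream.drain_poison()])    # ['1-0']
print([e.msg_id for e in stream.queue])             # ['2-0']
```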

What the new array type is actually for

Two concrete things you can build with arrays + streams + 8.8 features:

  1. A self-hosted log aggregator. Arrays hold the last N lines per service, server-side SUM/MIN/MAX handles totals and extremes such as count-by-severity, and XNACK FATAL marks lines that crash the parser so they can be routed to a dead-letter queue (SILENT is for failures that have nothing to do with the message). Percentiles are not among the listed aggregations, so those would still be computed in the application. No Elasticsearch, no ClickHouse, no managed SaaS, and the same Redis instance you already operate for caching carries the workload.
  2. A sensor pipeline ingest layer. Array-as-ring-buffer holds the last 60 seconds of readings, and SUM/MIN/MAX over an index range gives you windowed stats without bolting on a separate TSDB. Useful for the "alert me when max latency in the last 30 seconds crosses X" pattern that currently needs Prometheus or InfluxDB. A true p99 alert still needs a percentile computation that SUM/MIN/MAX alone does not provide.

This is what I mean by "a different kind of database." Redis used to be a cache you put in front of your real database. With 8.8, you can plausibly make it the system of record for narrow, time-bounded use cases where you used to reach for something heavier.
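The sensor-pipeline idea above amounts to a bounded buffer plus range aggregation. The sketch below models it in plain Python, so the aggregation runs in the client here, whereas the point of the array type is that Redis would do it server-side. It alerts on the window maximum, not on a percentile, because SUM, MIN, and MAX are the only aggregations named in this post. `window_stats`, the synthetic latency series, and the 200 ms threshold are all illustrative assumptions.

```python
from collections import deque
from itertools import islice


def window_stats(readings: deque, last_n: int) -> dict:
    """SUM/MIN/MAX over the newest `last_n` readings (must be non-empty)."""
    start = max(len(readings) - last_n, 0)
    window = list(islice(readings, start, None))
    return {"sum": sum(window), "min": min(window), "max": max(window)}


readings = deque(maxlen=60)               # one reading per second, last 60 kept
for second in range(90):
    # Synthetic latencies with a single spike at second 85.
    latency_ms = 250 if second == 85 else 20 + (second % 7)
    readings.append(latency_ms)

stats = window_stats(readings, last_n=30)
THRESHOLD_MS = 200                        # hypothetical alert threshold
print(stats["min"], stats["max"])         # 20 250
print("ALERT" if stats["max"] > THRESHOLD_MS else "ok")   # ALERT
```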

The performance numbers worth quoting

Beyond features, the 8.8 release is also a serious performance update. From the official benchmarks:

  • MGET pipelined with I/O-threads: up to 68% throughput improvement
  • XREADGROUP with COUNT 100: up to 83% improvement
  • ZADD/ZINCRBY/ZRANGEBYSCORE (sorted set operations): up to 74% improvement
  • Persistence and full synchronization: up to 60% faster
  • JSON numeric arrays (introduced in 8.4): up to 92% memory reduction, with new explicit control over BF16/FP16/FP32/FP64 storage for vector indexing needs

That last one is the AI angle nobody is connecting yet. Vector storage in Redis is now substantially cheaper than the marketing typically suggests, and the new precision control means you can store embeddings in the exact format your model expects — no casting, no precision loss, no awkward BF16 conversion layer. (For more on the model-side tradeoff, see how Gemma 4 12B dropped the multimodal encoder for the parallel argument that unified token spaces simplify AI plumbing.)
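As a rough sense of scale for the precision choice, the arithmetic below computes the raw payload size of a vector set at each of the four listed widths. The 768 dimensions and one million vectors are illustrative assumptions, and the figures cover only the numeric payload: they exclude any per-key or index overhead and are not how the 92% figure in the benchmarks was derived.

```python
# Bytes per stored value for each precision named in the release notes.
BYTES_PER_VALUE = {"FP64": 8, "FP32": 4, "FP16": 2, "BF16": 2}


def raw_vector_bytes(dimensions: int, count: int, precision: str) -> int:
    """Payload size only: dimensions x vectors x bytes per value."""
    return dimensions * count * BYTES_PER_VALUE[precision]


for precision in BYTES_PER_VALUE:
    mib = raw_vector_bytes(768, 1_000_000, precision) / 2**20
    print(f"{precision}: {mib:,.0f} MiB")
```

Halving the width halves the payload, which is why the storage format matters more as the vector count grows.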

The meta-story: how the maintainers actually built it

There's been discussion on Hacker News (the announcement thread, 78 points at time of writing) about whether the array data type was implemented with LLM assistance. I won't make stronger claims about that than the public record supports: the announcement credits @antirez as the author, and the deeper "how it was built" question is best answered by reading the maintainer's own posts rather than by an outside observer guessing. Worth noting for context, but take second-hand claims with a grain of salt.

What's clear from the announcement itself is that the Redis project shipped a substantial new feature, benchmarked it, documented it, and put it in a numbered release. The takeaway for engineering managers who are still working out their AI policy isn't "use AI to write your database" — it's that AI is a tool, the verification step is the work, and a maintainer with a real test suite and benchmark suite can ship a major feature in a way that's documented and reproducible.

The trade-offs you should know about

  • Arrays are not free. They use about 18% more memory per element than a list. If your bottleneck is memory, not CPU, a list might still be the right choice. The benchmarks measure throughput, not footprint.
  • The new features are open-source-only. Redis 8.8 is the open-source release; managed Redis services (AWS ElastiCache, Azure Cache, Redis Cloud) will roll out these features on their own timelines. If you depend on a managed service, check the roadmap before planning around INCREX or XNACK.
  • The 92% JSON numeric array reduction is for a specific workload (homogeneous numeric arrays, especially vector embeddings). It's not a general-purpose JSON storage improvement.
  • The announcement thread on Hacker News was solid, not viral (78 points, 33 comments at time of writing — see the full discussion). Search volume for "Redis 8.8" will be real but bounded. The high-intent long-tail keywords (rate limiter, sliding window, streams NACK, array data type) are the realistic targets for organic search.

For comparison on what a more focused single-feature announcement looks like, see Cloudflare's recent VoidZero acquisition post — different topic, but the same pattern of one large headline news item generating a deeper, narrower technical conversation over the following week.

What to do this week

If you have a Lua rate limiter in production:

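# Check whether this exact script is cached on the server. (Running SCRIPT LOAD first
# would make SCRIPT EXISTS return 1 every time, so hash the file locally instead.)
SHA=$(sha1sum < rate_limiter.lua | cut -d' ' -f1)
redis-cli SCRIPT EXISTS "$SHA"
# If it comes back 1, this server has your Lua rate limiter loaded.
# (A 0 can also mean your app sends different bytes, e.g. without the trailing newline.)
# Read the INCREX docs and start planning the migration.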

If you're building anything message-queue-shaped and avoiding Redis Streams because of the failure-recovery story: that objection just got answered. Run the same load test against RabbitMQ and against Redis Streams + XNACK and see how close the numbers are.

If you're storing vectors in Redis: check what precision you're actually using and whether the new BF16/FP16/FP32/FP64 control lets you cut memory without losing model quality. For many embedding models the loss from dropping to a 16-bit format is small, but measure retrieval quality on your own data before switching.
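One cheap first check is to round-trip a vector through half precision and see how far it moves. The sketch below does that with Python's standard `struct` module (format character `e` is IEEE 754 half precision). The vector is synthetic random data, not a real embedding, and BF16 is not covered because `struct` has no format for it. Treat this as a sanity check only; the test that matters is retrieval quality on your own data.

```python
import math
import random
import struct


def to_fp16_and_back(vector: list) -> list:
    """Round-trip each value through IEEE 754 half precision."""
    packed = struct.pack(f"<{len(vector)}e", *vector)
    return list(struct.unpack(f"<{len(vector)}e", packed))


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


random.seed(0)
embedding = [random.uniform(-1.0, 1.0) for _ in range(768)]   # synthetic stand-in
halved = to_fp16_and_back(embedding)

max_error = max(abs(x - y) for x, y in zip(embedding, halved))
print(f"cosine(original, fp16 round-trip) = {cosine(embedding, halved):.6f}")
print(f"max absolute error per component  = {max_error:.6f}")
```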

What this means for you

The story of Redis 8.8 isn't "here are six new features." It's that the project is now competing on three fronts it wasn't competing on a year ago: as a primary database for narrow, time-bounded use cases; as a message queue with proper failure handling; and as a vector store with explicit precision control. None of those is going to displace the best-in-class tool for any single use case. But the combination — one system you already operate that now does all three — is exactly the kind of leverage small teams have been waiting for.

The next time someone tells you Redis is "just a cache," ask them which cache ships its own sliding-window database, message queue, and vector store in a single binary.