A research firm called depthfirst ran an autonomous security agent across FFmpeg's source and came back with 21 zero-days, 8 of them now assigned CVEs, with a total compute bill of roughly $1,000. Anthropic's Mythos scan of the same codebase ran ten times that. FFmpeg is one of the most heavily fuzzed open-source C codebases in the world, and the oldest of depthfirst's bugs has been in the tree since 2003. The number to argue about is not 21, and the comparison to argue about is not $1k versus $10k. The interesting number is the 23-year latency, and the interesting question is what the agent is actually finding that the last twenty years of fuzzing wasn't.
The bug that ships in one RTSP command
The one that makes security people stop what they are doing is a heap buffer overflow in FFmpeg's AV1 RTP depacketizer, in libavformat/rtpdec_av1.c. It is reachable from the network with no flags, no authentication, and no special media setup. A victim runs ffmpeg -i rtsp://attacker/stream, the most ordinary FFmpeg command that exists, and a single 183-byte packet is enough to redirect execution. depthfirst's write-up shows the cursor poisoning step by step. When the depacketizer sees a Temporal Delimiter OBU, the spec says to "ignore and remove" it. The code skips the OBU but still advances the write cursor by the attacker-declared obu_size, without allocating any memory for that advance. The next OBU is then written past the end of the heap buffer and into the adjacent AVBuffer struct, where the free callback lives, at offset 152 from the start of the data buffer. By tuning the math so the overflow hits the function pointer but leaves the refcount intact at 1, the exploit gets a reliable call through the hijacked function pointer on the next buffer release. The post shows the release-build crash with #0 0x00000000deadbeef in ?? (). That is the ceiling of what a memory-corruption bug can offer: a controlled offset, a controlled value, and a controlled trigger.
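The cursor bug is easier to see in miniature. What follows is a toy model, not FFmpeg's code: the struct, the function names, and the type tag are all hypothetical, and the real depacketizer is considerably more involved. It shows the safe shape, where a dropped OBU leaves the write cursor alone and every copy is checked against the space that remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tag for this toy; not taken from FFmpeg. */
enum { TOY_OBU_TEMPORAL_DELIMITER = 2 };

typedef struct {
    uint8_t *buf;   /* output buffer owned by the caller */
    size_t   cap;   /* bytes allocated for buf */
    size_t   pos;   /* write cursor: next byte to fill (pos <= cap) */
} toy_out;

/* Append one OBU to the output, or drop it if it is a temporal delimiter.
 * Returns 0 on success, -1 if the OBU does not fit.
 *
 * The buggy shape described in the write-up corresponds to putting
 * `out->pos += obu_size;` in the skip branch: the cursor moves by an
 * attacker-declared amount although nothing was written or reserved,
 * so the next copy starts past `cap`. */
int toy_push_obu(toy_out *out, int obu_type,
                 const uint8_t *payload, size_t obu_size)
{
    if (obu_type == TOY_OBU_TEMPORAL_DELIMITER)
        return 0;                        /* dropped: cursor must not move */

    if (obu_size > out->cap - out->pos)  /* compare against remaining space */
        return -1;

    memcpy(out->buf + out->pos, payload, obu_size);
    out->pos += obu_size;
    return 0;
}

/* Usage sketch: a delimiter declaring a huge size changes nothing. */
void toy_push_demo(void)
{
    uint8_t storage[16];
    uint8_t data[8] = {0};
    toy_out out = { storage, sizeof storage, 0 };

    assert(toy_push_obu(&out, TOY_OBU_TEMPORAL_DELIMITER, data, 4096) == 0);
    assert(out.pos == 0);
    assert(toy_push_obu(&out, 6, data, sizeof data) == 0);
    assert(out.pos == 8);
    assert(toy_push_obu(&out, 6, data, 9) == -1);  /* 9 > 16 - 8 */
}
```

The check is written as obu_size > cap - pos rather than pos + obu_size > cap, so a very large declared size cannot wrap the addition and slip through.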
The path to the bug is also why the post is getting attention on HN. The classes of systems that run ffmpeg -i rtsp://attacker/stream against untrusted or partially-trusted URLs are not obscure: media-ingest pipelines that accept user-supplied stream URLs, surveillance and CCTV gateways pulling RTSP feeds, transcoding services processing remote AV1-over-RTP sources, and a long tail of "convert this link for me" web tools. As HN commenter nemothekid put it: "Wow this is actually pretty serious - I'm even surprised its being published. There are several services where I can imagine this is exploitable today." A heap write primitive against a function pointer, on a network-reachable code path, with a 183-byte proof of concept. That is not a finding the FFmpeg team wants published.
Twenty years of fuzzing, and a 23-year-old bug
Eight of the 21 findings have CVE numbers, drawn from the range CVE-2026-39210 through CVE-2026-39218; the other thirteen are fixed but pending identifiers. The list is, by itself, a tour of the things that have always been wrong with C parsers: missing length checks, signed-to-unsigned wraparounds, integer overflows bypassing bounds checks, a strlen-of-an-empty-string producing SIZE_MAX, a return value of -1 used as an array index, a size - 4 computed without verifying size >= 4. Every one is a class of bug fuzzers have been finding in other projects for a decade.
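For readers who do not write C daily, here is what the checked form of each pattern looks like. This is a minimal sketch with hypothetical helper names; none of it is taken from FFmpeg or from the depthfirst write-up, and it only illustrates the bug classes named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* `size - 4` on an unsigned size wraps to a huge value when size < 4,
 * so test the precondition before subtracting. */
int payload_len_after_header(size_t size, size_t *out_len)
{
    if (size < 4)
        return -1;
    *out_len = size - 4;
    return 0;
}

/* strlen("") - 1 is SIZE_MAX; guard the empty string before indexing
 * the last character. Returns the character, or -1 for an empty string. */
int last_char(const char *s)
{
    size_t n = strlen(s);
    return n == 0 ? -1 : (unsigned char)s[n - 1];
}

/* A lookup that signals "not found" with -1 ... */
int find_id(const int *ids, int count, int wanted)
{
    for (int i = 0; i < count; i++)
        if (ids[i] == wanted)
            return i;
    return -1;
}

/* ... must have its result checked before it is used as an index. */
int value_for_id(const int *ids, const int *values, int count,
                 int wanted, int *out_value)
{
    int idx = find_id(ids, count, wanted);
    if (idx < 0)
        return -1;
    *out_value = values[idx];
    return 0;
}

/* `offset + len <= size` can be bypassed when the addition overflows;
 * comparing against the remaining space avoids the addition entirely. */
int range_in_bounds(size_t offset, size_t len, size_t size)
{
    return offset <= size && len <= size - offset;
}

/* Usage sketch covering the failing and passing case of each helper. */
void checked_helpers_demo(void)
{
    size_t n = 0;
    int v = 0;
    const int ids[] = { 7, 9 };
    const int values[] = { 70, 90 };

    assert(payload_len_after_header(3, &n) == -1);
    assert(payload_len_after_header(10, &n) == 0 && n == 6);
    assert(last_char("") == -1);
    assert(last_char("ab") == 'b');
    assert(value_for_id(ids, values, 2, 5, &v) == -1);
    assert(value_for_id(ids, values, 2, 9, &v) == 0 && v == 90);
    assert(range_in_bounds(8, SIZE_MAX, 16) == 0);
    assert(range_in_bounds(8, 8, 16) == 1);
}
```

Each helper is one comparison away from its unsafe twin, which is part of why the pattern keeps reappearing in new code.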
What is interesting is the latency. The SDT (Service Description Table) bug in mpegts.c was introduced in 2003, in the original SDT implementation. The MPEG-4 AAC RTP depacketizer bug in rtpdec_mpeg4.c dates to 2005, a 21-year latency the write-up calls "over two decades." The SDP parser, TS demuxer, swscale, and LATM bugs all date to 2010. The JPEG depacketizer, RTMP SWF hash, and RTSP ANNOUNCE bugs are from 2012, 2012, and 2021, respectively. The recent regressions (the VP9 decoder buffer miss, the AVIF overlay path, and the option parser regression, all from 2025) show that the project is still introducing memory-safety bugs at a steady rate. Latency here is not a story about ancient code rotting; it is a story about the same bug class still being introduced through the same patterns that produced it twenty years ago.
This is where the comparison to Google's Big Sleep and Anthropic's Mythos matters. Both have produced public findings on FFmpeg. depthfirst's claim is not that their agent is "smarter." The claim is that it produces concrete, reproducible PoC inputs at a fraction of the cost — $1k versus the $10k Anthropic is reported to have spent. The agent found the same kinds of bugs the fuzzers were finding, plus the regressions, plus the latent ones, in a single pass with reproducible PoCs across the set. The bet is that the cost-per-finding is the variable the industry needs to move, not the cleverness of the auditor.
The threat model the agent builds
A security agent is not a coding agent with a security hat. A coding agent is interactive: a human gives it a task, it writes code, it stops. A security agent has a narrower objective: find real, exploitable security issues in an existing system, without specific instructions. It starts by threat-modeling the codebase — identifying the exposed parsers and protocol handlers, mapping where attacker-controlled input enters — and then audits the attack surface code directly, following data flow through the components instead of treating the repository as a flat collection of files. The "concrete, reproducible PoC input" framing is what makes the result actionable. The agent does not just point at a line of code and say "this looks suspicious." It builds a 183-byte RTSP packet, sends it at a vulnerable ffmpeg -i rtsp://... invocation, and produces a backtrace that points at the function pointer it just corrupted. A finding without a reproducer is a suggestion. A finding with a reproducer is work for someone, and the amount of work is bounded.
The HN discussion surfaced the obvious pushback. wavemode notes that the hijacked function pointer on its own does not give arbitrary code execution in the presence of ASLR and modern mitigations: "You would need there to be some writable and executable page of memory lying around." fizzynut adds the general complaint about LLM overconfidence. Both are right, and both miss the point. A reproducible PoC against a real, network-reachable invocation is a different kind of artifact from a prose finding that asserts "the root cause is simple." The pushback reads as: a PoC is not yet an exploit chain. That is true, and the write-up is careful to call the finding a "primitive" rather than a "weaponized RCE."
The original take: latency is the product, not the cost
The $1k-versus-$10k comparison is the headline depthfirst wants. It is also the wrong argument. A 23-year-old bug in a codebase with continuous Google fuzzing for a decade is not a story about how cheaply an LLM can find bugs. It is a story about what those audits are actually doing differently from the fuzzers. Two possibilities, with very different implications.
The first: the agent is finding bugs the fuzzers are not finding, by reading the code instead of throwing inputs at it. The 23-year latency on the SDT bug, the 21-year latency on the AAC RTP depacketizer, the 16-year latency on the SDP control-URI handling, the 16-year latency on the LATM depacketizer — those are not bugs a fuzzer was going to find. Fuzzers excel at code that takes an attacker-controlled buffer and does arithmetic on it. They struggle with code that takes a long-lived attacker-influenced stream and accumulates state across many frames, which is most of what a media demuxer does. If depthfirst's agent is good at stateful parser bug classes that fuzzers have structurally missed, the implication is that the industry has been under-investing in semantic analysis of media parsers for fifteen years.
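A toy example makes the stateful case concrete. The reassembler below is a sketch under assumptions: the names and both size limits are hypothetical, and it does not model any real FFmpeg demuxer. Every fragment passes validation when looked at in isolation; only the check against state left behind by earlier fragments stops the overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_FRAME_CAP    32u  /* hypothetical per-frame buffer size */
#define TOY_MAX_FRAGMENT 24u  /* hypothetical per-packet payload limit */

typedef struct {
    uint8_t frame[TOY_FRAME_CAP];
    size_t  filled;            /* bytes accumulated from earlier fragments */
} toy_reasm;

/* Validation that looks at one packet in isolation. */
int toy_fragment_ok(size_t len)
{
    return len > 0 && len <= TOY_MAX_FRAGMENT;
}

/* Append a fragment to the frame under construction.
 * Returns 0 on success, -1 if the fragment is rejected.
 *
 * The second check depends on `filled`, i.e. on what earlier packets
 * did. Without it, no single packet is malformed, yet two of them in
 * sequence write past the end of `frame`. */
int toy_add_fragment(toy_reasm *r, const uint8_t *data, size_t len)
{
    if (!toy_fragment_ok(len))
        return -1;
    if (len > TOY_FRAME_CAP - r->filled)
        return -1;

    memcpy(r->frame + r->filled, data, len);
    r->filled += len;
    return 0;
}

/* Usage sketch: two individually valid fragments, one invalid sequence. */
void toy_reasm_demo(void)
{
    toy_reasm r = { {0}, 0 };
    uint8_t pkt[TOY_MAX_FRAGMENT] = {0};

    assert(toy_fragment_ok(20));                 /* fine on its own */
    assert(toy_add_fragment(&r, pkt, 20) == 0);  /* 20 of 32 bytes used */
    assert(toy_add_fragment(&r, pkt, 20) == -1); /* 20 > 32 - 20 */
    assert(r.filled == 20);
}
```

Reaching the rejected call requires a specific sequence of inputs, which is the kind of condition the paragraph above argues is harder to hit by mutating single buffers and easier to spot by reading the code.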
The second: the agent is finding the same bugs, cheaper. The 2025 regressions in the VP9 decoder, the AVIF overlay path, and the option parser are exactly the kind of bugs a fuzzer would catch quickly. If that is the case, the headline is still correct as an economic story but the strategic one is uninteresting: the supply of bug classes in FFmpeg is essentially infinite, the cost of finding them was always the bottleneck, and a $1k tool is just a $10k tool with cheaper electricity.
The bet worth making is the first one, and the bet worth hedging is the second. The way to tell them apart over the next year is the regression rate: if LLM-driven audits keep finding bugs the previous fuzzer campaigns did not, the field has been structurally under-audited. If they mostly find 2025 regressions at $1k each, the field has been correctly audited and we are just spending less to do it. The depthfirst write-up contains both long-latency bugs and fresh regressions, so it cannot settle the question by itself, but the next 6-12 months of public findings will.
The framing the security industry will reach for is "LLMs help human auditors." That framing is wrong, and the FFmpeg run is the receipt. The agent threat-modeled the codebase, picked its own attack surface, audited the attack-surface code directly, generated its own test inputs, ran them, and produced a backtrace. The human in the loop wrote the prompt and published the write-up. The work the auditor used to do is what the agent did; the work the human auditor now does is reviewing the PoC, deciding which findings are worth a CVE, and writing the disclosure. The economic story is not "auditors are 10x more productive." It is "the auditor's job moved up the stack, and the floor of the new job is reviewing reproducible PoCs, not generating them." A team that could afford to disclose ten FFmpeg-class bugs a year can now find and disclose two hundred. The bottleneck is no longer finding the bug. The bottleneck is fixing the class, which is a C-language problem and a code-review problem and a "stop introducing signed-to-unsigned wraparound" problem. None of those bottlenecks are agent-shaped. The next twenty-one zero-days are already in the tree, in 2003, in 2010, in 2025, waiting to be found by whichever $1k audit run gets to them first.
What this means for you
- If you run ffmpeg on untrusted media, assume the process is hostile. Run it in a sandbox. gVisor, a dedicated VM, or a bwrap/Landlock-seccomp profile is the floor. HN commenter jacobgold put it directly: "I can't think of a program more worthy of sandboxing when run with untrusted input than ffmpeg."
- If you ship a service that transcodes user-submitted URLs, the ffmpeg -i rtsp://attacker/stream pattern is what you need to defend, not the file-upload path. The interesting threat model in 2026 is the "paste a link and we will transcode it" web tool. The network-reachable code path is the under-defended one.
- If you maintain a C parser, the bug class is the same as it was in 2003: missing length checks, signed/unsigned wraparound, return values used as indices, strlen of empty strings, size - N without verifying size >= N. The list is so consistent across the depthfirst findings that it is worth a project-wide audit pattern, not a per-bug one. The next 21 zero-days will be the same shape as the last 21.
- If you are a security vendor or CISO, the cost-per-finding is the metric that just moved. The pitch is no longer "we have a research team." The pitch is "we have a research team with a $1k cost-per-CVE and reproducible PoCs for each." The RFP question is now "what is your cost per confirmed, reproducible zero-day in code we care about, and what is your regression rate on re-audit." The question is going to get specific fast.
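For the "paste a link" case, one cheap layer in front of the sandbox is refusing URL schemes the service has no reason to fetch. The sketch below is an assumption-heavy illustration, not something from the depthfirst write-up: the function names and the one-entry allowlist are hypothetical. It only inspects the top-level URL, so it does nothing about redirects or playlists that reference other protocols, and it does not replace the sandbox.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy: this service only ever needs to fetch https URLs. */
static const char *const allowed_schemes[] = { "https" };

/* Case-insensitive comparison of url[0..scheme_len) against a lowercase name. */
static int scheme_equals(const char *url, size_t scheme_len, const char *name)
{
    if (strlen(name) != scheme_len)
        return 0;
    for (size_t i = 0; i < scheme_len; i++)
        if (tolower((unsigned char)url[i]) != (unsigned char)name[i])
            return 0;
    return 1;
}

/* Returns 1 if the URL has an explicit "scheme://" prefix and the scheme
 * is on the allowlist, 0 otherwise. Anything without "://" is rejected,
 * including bare paths and single-colon forms. */
int url_scheme_allowed(const char *url)
{
    const char *sep = strstr(url, "://");
    if (sep == NULL || sep == url)
        return 0;

    size_t scheme_len = (size_t)(sep - url);
    size_t count = sizeof allowed_schemes / sizeof allowed_schemes[0];
    for (size_t i = 0; i < count; i++)
        if (scheme_equals(url, scheme_len, allowed_schemes[i]))
            return 1;
    return 0;
}

/* Usage sketch: the RTSP URL from the write-up's scenario is refused. */
void url_scheme_demo(void)
{
    assert(url_scheme_allowed("rtsp://attacker/stream") == 0);
    assert(url_scheme_allowed("https://example.com/clip.mp4") == 1);
    assert(url_scheme_allowed("HTTPS://example.com/clip.mp4") == 1);
    assert(url_scheme_allowed("clip.mp4") == 0);
}
```

A default-deny list keeps the decision simple: a protocol handler the service never reaches is one whose parser bugs it does not have to care about on the day they are published.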
What to do this week
# 1. Find every place you invoke ffmpeg on a URL or file whose
# source you do not fully control. ffmpeg is also linked
# into VLC, Audacity, OBS, Kodi, HandBrake, Streamlink.
which -a ffmpeg
grep -r "avformat_open_input\|avformat_network_init" \
--include='*.c' --include='*.go' --include='*.rs' \
--include='*.py' --include='*.ts' /srv 2>/dev/null | head -20
# 2. If you maintain a media-ingest pipeline, the defensive
# change is a sandbox boundary, not an ffmpeg upgrade. The
# exploits being published in 2026 reach the function
# pointer, not the integer check; a patch closes the
# specific primitive but not the class. Sandbox the binary.
# Minimum: seccomp + Landlock + non-root user.
# Better: a gVisor runsc container per ingest.
# Best: a firecracker microVM with no network egress.
# 3. If you maintain libavformat, the list of 21 bugs is your
# project-level checklist. Every finding is a "we forgot to
# bounds-check X" pattern; a project-wide audit against
# "every place that subtracts before bounds-checking" and
# "every place that takes a return value as an array index
# without checking for -1" will find more of the same.
# 4. If you evaluate an LLM-driven security product, the
# question to ask is not "what did you find in FFmpeg." The
# question is "what did you find in our codebase that a
# fuzzer campaign would not have found in the same wall-
# clock time, and can you produce a reproducer for each
# one." Reproducer-first is the new bar.
Disclosure
Drafted with AI assistance. Primary source: depthfirst, "21 Zero-Days in FFmpeg," 2 June 2026, https://depthfirst.com/research/21-zero-days-in-ffmpeg. HN thread: https://news.ycombinator.com/item?id=48510046 (53 points, 24 comments at fetch time). The 21 zero-day count, the $1k cost figure, the $10k comparison to Anthropic's Mythos run, the 23-year latency on CVE-2026-39214, the 21-year latency on DFVULN-122, the eight CVE identifiers (CVE-2026-39210 through CVE-2026-39218), and the 183-byte AV1 RTP depacketizer PoC are all from the depthfirst write-up. The internal tracking IDs for the fixed-but-pending-CVE findings (DFVULN-116 through DFVULN-127) are also from the write-up. The Google Big Sleep team and Anthropic Mythos references are also from the write-up; the exact count of 13 vulnerabilities disclosed by Big Sleep is from the write-up, not from a separate Google source I verified. The HN comments quoted — nemothekid on the seriousness of public disclosure, wavemode on ASLR, fizzynut on LLM confidence, jacobgold on sandboxing — are taken from the HN thread as fetched on 13 June 2026. The gVisor / firecracker / Landlock / seccomp recommendations in the "What to do this week" section are the author's defensive recommendations, not from the depthfirst write-up.
Sources
- depthfirst, "21 Zero-Days in FFmpeg," 2 June 2026 — https://depthfirst.com/research/21-zero-days-in-ffmpeg
- HN discussion, item 48510046 — https://news.ycombinator.com/item?id=48510046
- NVD entries for the eight assigned CVEs (not yet indexed at the time of writing; the CVE IDs are from the depthfirst write-up)
- Google Project Zero Big Sleep disclosures on FFmpeg (general) — referenced by depthfirst, not directly cited
- Anthropic Mythos security-audit work (general) — referenced by depthfirst, not directly cited
- gVisor (application kernel for containers) — https://gvisor.dev/
- Firecracker microVM — https://firecracker-microvm.github.io/
Related reads
- AMD's AutoUpdate: The Bug Bounty Says It's Not a Bug — the same "the disclosure process is the product" frame, applied to a vendor patch that did the cheapest defensible thing
- Miasma Worm: Your Settings.json Is a Shell Prompt Now — the supply-chain frame, applied to a worm that turned trusted config files into an attack surface
- Miasma Worm Just Hit Microsoft Azure. The 6/8 Post Was the Trailer. — the disclosure-timeline frame, applied to a worm that exposed the gap between "we have a PSIRT" and "we have a fix"