Friday, June 19, 2026

10,000 GitHub Repos Distribute Trojans. Reddit Saw It First.

A solo investigator who goes by the handle "theorchid" published a forensic writeup on 18 June 2026 documenting 10,000 GitHub repositories that distribute Trojan malware. The campaign is not new. A Reddit thread in r/github from February 2025 — sixteen months earlier — describes the same scheme, with the same file layout, and the same "this is the second time I've seen a clone of my repo with a malicious link in the README" complaint. GitHub has had the pattern on its own platform, in plain English, for over a year. The writeup is on Hacker News as item 48583928 (635 points, 144 comments as of 19 June 2026 09:00 UTC+8 via the Algolia API). The numbers that matter are in the article, and the gap between the warning and the response is the story.

The pattern, exactly

Each malicious repository is a clean clone of a real, recently-created public repository. The commits, contributor list, and project description are preserved verbatim. Two to ten times a day, a single automated commit is pushed: it deletes the previous README and re-pushes a new one that is byte-identical except for one change — a link to a ZIP archive, hosted off-platform, added inline to the description. The commit message is "Update README.md" every time. The commit author is the cloned repo's owner, whose credentials have been compromised, or a fresh account that has been added as a contributor.
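The "byte-identical except for one link" property is easy to check mechanically. Here is a minimal sketch, assuming you already have the text of both READMEs in hand; the function name and the URL regex are this blog's illustrations, not something from the OrchID tooling.

```python
import difflib
import re

# Loose URL matcher: stops at whitespace and common Markdown closers.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def added_urls(original_readme: str, suspect_readme: str) -> list[str]:
    """Return URLs that appear only on lines added in the suspect README."""
    diff = difflib.ndiff(original_readme.splitlines(), suspect_readme.splitlines())
    # ndiff marks lines present only in the second input with a "+ " prefix.
    added_lines = [line[2:] for line in diff if line.startswith("+ ")]
    original_urls = set(URL_RE.findall(original_readme))
    found: list[str] = []
    for line in added_lines:
        for url in URL_RE.findall(line):
            if url not in original_urls and url not in found:
                found.append(url)
    return found


if __name__ == "__main__":
    mine = "# demo\nA small WebSocket client.\n"
    clone = "# demo\nA small WebSocket client. [Download](https://files.example.invalid/demo.zip)\n"
    print(added_urls(mine, clone))
```

An empty result does not clear a clone; it only means this particular README pair does not differ by a link.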

The ZIP archive contains four files. The names vary per campaign wave, but the structure is stable:

  • Application.cmd or Launcher.cmd — a Windows batch file that runs the executable
  • loader.exe, luajit.exe, or another .exe — the actual payload, typically a LuaJIT-compiled dropper
  • random_name.cso or random_name.txt — an encrypted/encoded blob, opaque to static scanning
  • lua51.dll — the LuaJIT runtime the executable depends on
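The four-file layout above can be turned into a cheap triage check that reads only the archive's file listing and never extracts or runs anything. This is a sketch under assumptions: the function names are hypothetical, and a layout match is a heuristic signal, not a malware verdict, since a legitimate LuaJIT bundle could look the same.

```python
import io
import zipfile
from pathlib import PurePosixPath


def matches_campaign_layout(names: list[str]) -> bool:
    """True if a file listing has the .cmd + .exe + blob + lua51.dll shape."""
    suffixes = [PurePosixPath(n).suffix.lower() for n in names]
    basenames = [PurePosixPath(n).name.lower() for n in names]
    has_launcher = ".cmd" in suffixes                 # Application.cmd / Launcher.cmd
    has_executable = ".exe" in suffixes               # loader.exe, luajit.exe, ...
    has_blob = any(s in (".cso", ".txt") for s in suffixes)
    has_runtime = "lua51.dll" in basenames            # LuaJIT runtime
    return has_launcher and has_executable and has_blob and has_runtime


def zip_matches_campaign_layout(data: bytes) -> bool:
    """Inspect the ZIP's central directory only; nothing is written to disk."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return matches_campaign_layout(archive.namelist())
```

Handle any real sample on an isolated machine; even listing a hostile archive is safer done away from your daily driver.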

The trick the malware authors care about: the link in the README looks clean to most scanners. The OrchID investigator submitted the link itself to VirusTotal and got back zero detections. The same investigator submitted the file the link points to and got back multiple hits for a Trojan. The URL-as-delivery-vector is the gap. Anyone who checks the README link with a URL scanning service gets a clean "this URL is safe" verdict, and the ZIP still lands on disk with the executable waiting to run.
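One way to see why the two verdicts can disagree: a URL and the file behind it are looked up under different keys. The sketch below derives both keys locally and makes no network calls. It assumes, per my reading of VirusTotal's v3 API documentation, that a URL can be identified by its unpadded URL-safe base64 encoding and a file by its SHA-256 hash; the function names are illustrative.

```python
import base64
import hashlib


def vt_url_id(url: str) -> str:
    """URL identifier: URL-safe base64 of the URL with '=' padding stripped."""
    return base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")


def sha256_hex(data: bytes) -> str:
    """File identifier: SHA-256 of the downloaded bytes."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    # Placeholder URL and bytes; the two keys are unrelated to each other,
    # so a clean URL report says nothing about the file report.
    print(vt_url_id("https://files.example.invalid/demo.zip"))
    print(sha256_hex(b"archive bytes would go here"))
```

The practical consequence is the one the investigator found: check the file, not just the link.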

This is the same pattern Hexastrike's Maurice Fielenbach documented on 18 April 2026 in a parallel campaign ("Cloned, Loaded, and Stolen: How 109 Fake GitHub Repositories Delivered SmartLoader and StealC"): 109 repos at that point, with the SmartLoader/StealC infostealer chain attached to the LuaJIT runtime. The OrchID writeup, published two months later, found the pattern at roughly 100× the scale (10,000 repos against 109, about 92×) and traced it to a much wider set of payload families, not just SmartLoader/StealC. Two independent researchers, two months apart, two orders of magnitude apart in scope, the same scheme.

Why the campaign clones new repositories, not popular ones

The targeting decision is the part that should change how you think about GitHub discovery. The campaign does not clone torvalds/linux, facebook/react, or kubernetes/kubernetes. It clones new repos with no stars, no contributors, and project names that match low-volume long-tail search terms — exactly the population of repositories that Google and Bing surface for searches where the searcher is the only person who has ever made that exact query. The campaign does not need to outcompete react. It needs to outcompete the three other one-week-old projects with similar names.

The "high rank for low-volume terms" strategy is the SEO weaponization. A new repo with a unique name, a stolen commit history, and a clean contributor list is, to a search engine, indistinguishable from a legitimate new repo. The README link to the malware ZIP is, to the search engine, just a link. The user who clicks it is the target — and the user is typically a developer who is early in the search funnel, looking for an off-the-shelf implementation of something they want to build. The malware authors are not trying to phish the open-source-curious. They are trying to phish the developer who Googled "C++ WebSocket client implementation" at 11 PM and clicked the first result that was not a Stack Overflow answer.

This is also why the contributor list and commit history are preserved. When you visit a repository, the first thing you see is "Contributors: 4, Commits: 47." A real-looking contributor graph is the trust signal. The campaign's authors are not building a community — they are building a profile. The bot is doing the same work that a real maintainer does, on a tighter schedule, with the malware payload stapled to the README.

The Reddit thread that flagged it 16 months ago

The pattern is not novel. In February 2025, a Reddit thread in r/github titled "If you're creating new repositories, they are being spoofed to host malware" was posted (linked from the OrchID writeup, "Update 3"). The thread describes the same scheme: a developer's brand-new repo gets cloned, a malicious commit is added, the clone is reachable via the same long-tail search. The thread received comments, the comments received upvotes, GitHub Support was tagged in the thread by multiple commenters, and the campaign continued.

The 16-month gap between the Reddit thread and the OrchID writeup is the substantive part of the story. The pattern is recognizable, has been publicly named, and has been sitting on a platform GitHub actively moderates. The malware authors have not changed tactics. The defenders have not built a detector. The gap is not technical. The gap is organizational.

GitHub's automated abuse detection is good at catching the things it has been trained on: phishing landing pages in repo descriptions, secret-token commits, dependency-confusion attacks. The OrchID campaign slips through because the content of the README is clean — it is the same README as the cloned legitimate repo, plus a single URL. The URL is not on the GitHub platform. The download is not on the GitHub platform. From GitHub's perspective, the repository contains a README, source code, and a commit history. That is what a repository is.

The original take: rate limits are the wrong frame for the defender

The OrchID investigator's tooling is a strong read on the scale of the problem, and also a tell on what the real defender capability is. The investigator worked within the public GitHub API's 5,000 requests-per-hour rate limit, used gharchive.org to filter the event stream down to "repos with 1-24 commits per 24 hours from a non-bot author," and then made targeted API calls. The result: 10,000 matches out of 40,000 candidate repos, which is 25% of the high-frequency-commit population. The investigator is explicit: the script does not cover the long tail. The real number is larger.
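The filter-then-query approach described above can be sketched as follows. This is not the investigator's script. It assumes GH Archive's newline-delimited JSON event format (one event per line with type, actor.login, and repo.name fields), counts push events as a stand-in for commits, and treats a "[bot]" login suffix as the bot marker. The one-request-per-candidate figure in the arithmetic helper is also an assumption.

```python
import json
from collections import Counter
from typing import Iterable


def high_frequency_repos(lines: Iterable[str], min_pushes: int = 1,
                         max_pushes: int = 24) -> dict[str, int]:
    """Count non-bot PushEvents per repo and keep repos inside the band."""
    counts: Counter[str] = Counter()
    for line in lines:
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue
        if event.get("actor", {}).get("login", "").endswith("[bot]"):
            continue  # skip GitHub App / bot accounts
        counts[event["repo"]["name"]] += 1
    return {repo: n for repo, n in counts.items() if min_pushes <= n <= max_pushes}


def hit_rate(matches: int, candidates: int) -> float:
    """Fraction of candidates that matched, e.g. 10,000 of 40,000."""
    return matches / candidates


def hours_at_rate_limit(requests: int, per_hour: int = 5000) -> float:
    """Wall-clock hours needed if every request counts against the limit."""
    return requests / per_hour


if __name__ == "__main__":
    print(hit_rate(10_000, 40_000))      # share of candidates that matched
    print(hours_at_rate_limit(40_000))   # assumes one API call per candidate
```

The point of the pre-filter is that the expensive, rate-limited calls are spent only on the repos the archive data already flagged.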

GitHub, the investigator notes, does not have a 5,000-requests-per-hour rate limit. GitHub can scan all 500 million repositories, enumerate the URLs in every README, fetch every linked archive, and submit every archive to every antivirus engine. The cost of running that scan once is, in 2026, on the order of a single engineer-week (this blog's estimate, not a sourced figure). The cost of not running that scan is, conservatively, the same 10,000 repos re-pushed every week for the next year.
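The shape of that scan job is small enough to sketch. The version below only does the enumeration step and takes the actual "fetch and AV-scan" step as an injected callable, because that part depends on infrastructure this post cannot see. The host list, the archive suffixes, and every name here are assumptions for illustration, not GitHub's or OrchID's implementation.

```python
import re
from typing import Callable
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")
ARCHIVE_SUFFIXES = (".zip", ".rar", ".7z")  # assumed set of interesting targets


def is_github_host(host: str) -> bool:
    """Treat github.com and its content subdomains as on-platform."""
    return host == "github.com" or host.endswith((".github.com", ".githubusercontent.com"))


def off_platform_archive_links(readme: str) -> list[str]:
    """URLs in a README that point at an archive hosted outside GitHub."""
    links: list[str] = []
    for url in URL_RE.findall(readme):
        parsed = urlparse(url)
        if is_github_host((parsed.hostname or "").lower()):
            continue
        if parsed.path.lower().endswith(ARCHIVE_SUFFIXES) and url not in links:
            links.append(url)
    return links


def scan_readmes(readmes: dict[str, str],
                 is_malicious: Callable[[str], bool]) -> dict[str, list[str]]:
    """Map repo name to the off-platform archive links the scanner flags."""
    flagged: dict[str, list[str]] = {}
    for repo, readme in readmes.items():
        hits = [url for url in off_platform_archive_links(readme) if is_malicious(url)]
        if hits:
            flagged[repo] = hits
    return flagged
```

Passing the scanner in as a callable keeps the enumeration testable without touching the network, and it makes the point that the hard part is the decision to run the job, not the job itself.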

The investigator is asking, correctly, for someone with direct access to the security team to forward the article. The investigator also acknowledges in "Update 2" that, by the time the writeup went to press, GitHub had begun deleting the repos the script found. The automated sweep is happening. It is happening 16 months after the first public report, and it is happening on a list a single investigator built with a public API key. The right takeaway is that the capability was always there. The decision to deploy it is the news.

What this means for you

If you ship open-source code, the immediate action is short. Pick the most recent repo you created — something from the last six months — and search for it on Google and Bing. If you find a clone with the same name, the same description, and a README that is "your README plus one link," that is the campaign. The link is the giveaway. Do not click it. The fix is the same one you would use for any other malicious clone: report it via the GitHub abuse form, link to the original repo, and explicitly call out the README-link as the vector. The "Update 2" in the OrchID writeup suggests the current response time, once a report is filed, is "weeks, not days." Build that into your timeline.

If you are a developer searching for code to use, the defensive move is to treat the first search-engine result for a niche term as a candidate, not a recommendation. The campaign specifically targets the population of searches where the legitimate answer is low-volume and the searcher is willing to click a result that is "good enough." Check the contributor graph, check the commit count, check the age of the repo. A repo that is three days old, with a clean commit history and a download link in the README, is the danger profile. Walk away, or git clone into a sandbox.
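The danger profile in that paragraph (very new repo plus a download link in the README) can be written down as a rough check. This is a sketch, not a detector: the seven-day threshold, the archive extensions, and the RepoSnapshot type are all this blog's assumptions, and you would fill the snapshot from whatever metadata you have on hand.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Matches a link ending in a common archive extension (assumed list).
DOWNLOAD_LINK_RE = re.compile(r"https?://\S+\.(?:zip|rar|7z)\b", re.IGNORECASE)


@dataclass
class RepoSnapshot:
    created_at: datetime  # timezone-aware creation time of the repository
    readme: str           # raw README text


def is_danger_profile(repo: RepoSnapshot, now: datetime,
                      max_age: timedelta = timedelta(days=7)) -> bool:
    """True when the repo is both very new and links to a downloadable archive."""
    is_new = now - repo.created_at <= max_age
    has_download_link = DOWNLOAD_LINK_RE.search(repo.readme) is not None
    return is_new and has_download_link


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    suspect = RepoSnapshot(now - timedelta(days=3),
                           "Get it here: https://files.example.invalid/demo.zip")
    print(is_danger_profile(suspect, now))
```

A hit means "look harder before cloning," nothing more; plenty of honest projects are new and ship a ZIP.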

If you are a security team at a platform that hosts user content, the OrchID writeup is a public audit of a specific failure mode, and the failure mode generalizes. The 16-month delay is not a fluke. It is what happens when a platform's automated abuse pipeline is trained on the previous generation of attacks, the public report of the new generation is not on a channel the security team is monitoring, and the abuse team has no public metric for "repos with URLs in their README." The fix is not more of the scanning that already exists. The fix is one engineer spending a week on a "for every README URL, fetch and AV-scan the target" job, and then turning it on by default. The cost of doing it is small. The cost of not doing it is on a measurable clock.

What to do this week

STEP 1. Audit your own recent repos for clones you didn't make. Google "[your project name] github" and look for results that are not your repo. Click through. If the README is yours plus a link, that is the campaign. (Reference: the OrchID writeup, "Introduction" section, on what the comparison looks like in practice.)

STEP 2. Run the git-malware-finder script against a topic you care about. The investigator published the detection script as github.com/orchidfiles/git-malware-finder. It is read-only — it produces a list, it does not take action on the listed repos.

STEP 3. If you find a clone, file an abuse report. The pattern is identical across all 10,000 repos in the current set, so one good report is reusable as a template. Confirm the suspect with gh repo view <user>/<repo>, then file at github.com/contact/report-content → "Malicious content on a repository" → paste the repo URL, the original repo URL, and a "this README link is the vector" note. Reference the OrchID writeup (orchidfiles.com/github-repositories-distributing-malware/) as the campaign's public documentation.

STEP 4. For platform security teams: spend the time. The 16-month gap is a known, named, repeatedly-reported failure mode. The detection job is a one-engineer-week. The next campaign will not wait for another solo investigator to publish a list.

STEP 5. If your CI runs a git clone of a third-party repo as part of an integration test, sandbox it. The current campaign's loaders are Windows executables, but the next campaign's may not be. The cost of running an untrusted git clone inside a container with no network egress and a read-only filesystem is small. The cost of running it in your CI host's working directory is exposure to the same 10,000 repos the campaign is currently trying to get you to clone.
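For STEP 5, here is a minimal sketch of what "no network egress and a read-only filesystem" can look like with Docker. It only builds the command line; it does not run anything. One assumption to flag loudly: the clone itself needs network access, so this sketch assumes the checkout was fetched in a separate step and that only the commands touching the untrusted code run inside the locked-down container. The function name, the /src mount point, and the image name in the example are hypothetical.

```python
from pathlib import Path


def sandboxed_run_argv(checkout: Path, image: str, command: list[str]) -> list[str]:
    """Build a `docker run` argv that executes `command` against an untrusted checkout."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                          # no network egress
        "--read-only",                                # read-only root filesystem
        "--cap-drop", "ALL",                          # drop Linux capabilities
        "--security-opt", "no-new-privileges",
        "--tmpfs", "/tmp",                            # scratch space that vanishes on exit
        "--volume", f"{checkout.resolve()}:/src:ro",  # checkout mounted read-only
        "--workdir", "/src",
        image, *command,
    ]


if __name__ == "__main__":
    argv = sandboxed_run_argv(Path("third_party/untrusted"), "python:3.12-slim",
                              ["python", "-m", "pytest", "-q"])
    print(" ".join(argv))
```

Hand the list to subprocess.run in your CI wrapper. Tests that need to write files will have to write under /tmp, which is the intended friction.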

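# Copy-pasteable audit sketch (run from a clean machine). Set these two values first.
handle="your-handle"   # your GitHub username
repo="your-repo"       # the repository to audit
gh repo view "$handle/$repo"
# Exact-phrase search restricted to github.com; spaces in the name become '+'.
google_search="https://www.google.com/search?q=%22$(printf '%s' "$repo" | tr ' ' '+')%22+site%3Agithub.com"
# Google may answer scripted requests with a consent or CAPTCHA page; if the
# candidates file comes back empty, run the same query in a browser instead.
curl -sL --compressed --max-time 20 -A "Mozilla/5.0" "$google_search" \
  | grep -oE 'github\.com/[A-Za-z0-9_-]+/[A-Za-z0-9_.-]+' \
  | grep -viF "github.com/$handle/" \
  | sort -u > /tmp/clone-candidates.txt
# Review /tmp/clone-candidates.txt by hand. Your own repos are already
# filtered out; anything left is a clone candidate. If its README has a
# download link, file an abuse report.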

Disclosure

Drafted with AI assistance. Primary source: "I discovered a large-scale malware distribution campaign on GitHub," OrchID Files (handle: theorchid), 18 June 2026 — curl -sL --compressed on 2026-06-19. The 10,000 / 40,000 / 25% figures, the 5,000 requests-per-hour rate-limit note, the four-file ZIP layout (cmd / exe / cso-or-txt / lua51.dll), the VirusTotal link-vs-file detection-gap finding, the 16M-commit-pushes / 3,000 high-frequency-candidates figures, and the "Update 2" GitHub-sweep confirmation are all from the OrchID writeup. Hacker News item 48583928, "I found 10k GitHub repositories distributing Trojan malware," 635 points and 144 comments as of 2026-06-19 09:00 UTC+8 via the Algolia HN Search API (/api/v1/search endpoint; the /api/v1/items/<id> endpoint returns num_comments: null and only points, so the comment count was sourced from the search endpoint, not the items endpoint); the original HN submission timestamp is 2026-06-18T11:45:43Z. Secondary source: Maurice Fielenbach, "Cloned, Loaded, and Stolen: How 109 Fake GitHub Repositories Delivered SmartLoader and StealC," Hexastrike Cybersecurity, 18 April 2026 — 109 repos, SmartLoader/StealC infostealer, LuaJIT + Polygon-based C2. The Reddit thread (r/github, February 2025, "If you're creating new repositories, they are being spoofed to host malware") is linked from the OrchID writeup's "Update 3" but was not re-fetched for this post; the date and title are from the OrchID citation. The git-malware-finder script is referenced from the OrchID writeup; the script URL (github.com/orchidfiles/git-malware-finder) is the same. The "one engineer-week" cost estimate in the "What this means for you" section is this blog's directional read of the README-URL scan job, not a sourced claim from the OrchID article or from GitHub. 
The "weeks, not days" response-time figure is this blog's read of the OrchID timeline, where the original report took "two weeks" for an initial non-response and a further month-plus for the initial repo deletion; that is a sample size of one, not a verified SLA. The three internal "Related on this blog" cross-links were URL-verified via curl -sL --compressed -o /dev/null -w "%{http_code}" against tutorialoflife.blogspot.com on 2026-06-19; the Anubis, Miasma, and Recruiter URLs all returned HTTP 200.

Sources

  • "I discovered a large-scale malware distribution campaign on GitHub," OrchID Files, 18 June 2026, 10,000-repo forensic writeup, with the search pattern, the file layout, the VirusTotal link-vs-file test, the API rate-limit discussion, and the full repos list (linked from the article): https://orchidfiles.com/github-repositories-distributing-malware/
  • Hacker News, item 48583928, "I found 10k GitHub repositories distributing Trojan malware," 635 points and 144 comments as of 2026-06-19 09:00 UTC+8 (Algolia API value; numbers move as the thread ages) — https://news.ycombinator.com/item?id=48583928
  • Algolia HN Search API metadata for item 48583928 (points and the 2026-06-18T11:45:43Z submission timestamp; this endpoint returns num_comments: null, so the comment count came from the /api/v1/search endpoint): https://hn.algolia.com/api/v1/items/48583928
  • Maurice Fielenbach, "Cloned, Loaded, and Stolen: How 109 Fake GitHub Repositories Delivered SmartLoader and StealC," Hexastrike Cybersecurity, 18 April 2026 — 109 repos, SmartLoader/StealC, LuaJIT + Polygon-based C2 (the prior, smaller-scale documentation of the same pattern): https://hexastrike.com/resources/blog/threat-intelligence/cloned-loaded-and-stolen-how-109-fake-github-repositories-delivered-smartloader-and-stealc/
  • git-malware-finder, the detection script OrchID published alongside the writeup, plus the full 10,000-repo list (read-only tooling, no automated action against the listed repos): https://github.com/orchidfiles/git-malware-finder
  • Related on this blog: "The Recruiter's Repo. The npm install Was the Backdoor." — supply-chain malware precedent on a different vector (npm, not git clone); the trust model failure is the shared theme: https://tutorialoflife.blogspot.com/2026/06/the-recruiters-repo-npm-install-was.html
  • Related on this blog: "Miasma Worm Just Hit Microsoft Azure. The 6/8 Post Was the Trailer." — the largest hyperscaler-side supply-chain compromise to date, same trust-model failure at a different layer (config files, not repos): https://tutorialoflife.blogspot.com/2026/06/miasma-worm-just-hit-microsoft-azure-68.html
  • Related on this blog: "Anubis Moved PoW to WebAssembly. The Compiler Broke It." — the reproducible-builds angle, distinct problem, same supply-chain-trust framing: https://tutorialoflife.blogspot.com/2026/06/anubis-moved-pow-to-webassembly.html
