We stopped AI bot spam in our GitHub repo using Git's --author flag
A few months ago, GitHub celebrated record-breaking contribution metrics, highlighting that a new developer joins every second and AI drove TypeScript to the top spot. However, the team at Archestra felt this celebration missed a critical point: the severe degradation in contribution quality. They experienced this firsthand when they posted a $900 bounty for a new feature. While legitimate contributors initially engaged with the issue, AI bots soon flooded the conversation with 253 comments, burying real discussions under noise and even displaying aggression toward maintainers. This spam extended across the repository, forcing the team to spend significant time cleaning up untested pull requests and hallucinated issues, ultimately making the project hostile to genuine contributors.
To combat this "AI slop," the team first attempted to build a reputation bot to identify legitimate users, followed by an "AI sheriff" to auto-close suspicious activity, though it occasionally caught real contributors in the net. Realizing these measures were insufficient, they implemented a "nuclear option": mandatory contributor onboarding. Today, new users must complete a five-step process involving ethical AI rules and a CAPTCHA on the Archestra website before they can interact with the repository. This approach prioritizes quality over the inflated metrics that VC-backed startups are often measured by, ensuring the project remains a safe space for responsible AI users and engineers.
Since GitHub lacks a straightforward whitelist feature, the team had to engineer a workaround using the "Limit to prior contributors" setting. Git lets a commit name an author other than the person who creates it (the `--author` flag), so they built a GitHub Action that automatically commits to the main branch with the new user recorded as the author. GitHub then treats that user as a "prior contributor," which allows them to comment and submit pull requests. The full flow has three parts: onboarding on the website, a GitHub Action that adds the user to an external contributors file, and an automated commit pushed to main. Together these effectively whitelist the user.
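The commit step of that flow can be sketched as follows. This is a minimal illustration, not the team's actual Action: the file name `EXTERNAL_CONTRIBUTORS.txt`, the function names, and the commit message are hypothetical, and using GitHub's `ID+login@users.noreply.github.com` address as the author email is an assumption about how the commit gets linked to the account.

```python
import subprocess
from pathlib import Path
from typing import List


def author_identity(login: str, user_id: int) -> str:
    """Format a Git author string from a GitHub login and numeric user ID.

    Assumption: the noreply address form is enough for GitHub to link the
    commit to the account; the original article does not say which email is used.
    """
    return f"{login} <{user_id}+{login}@users.noreply.github.com>"


def build_commit_command(login: str, user_id: int) -> List[str]:
    """Build the `git commit` argv that records the new user as the author."""
    return [
        "git",
        "commit",
        f"--author={author_identity(login, user_id)}",
        "-m",
        f"chore: onboard external contributor {login}",  # hypothetical message
    ]


def onboard(repo: Path, login: str, user_id: int,
            contributors_file: str = "EXTERNAL_CONTRIBUTORS.txt") -> None:
    """Append the user to the contributors file, then commit and push to main.

    The committer identity (user.name / user.email) must already be
    configured in the environment, e.g. for the automation account.
    """
    with (repo / contributors_file).open("a", encoding="utf-8") as fh:
        fh.write(f"{login}\n")
    subprocess.run(["git", "add", contributors_file], cwd=repo, check=True)
    subprocess.run(build_commit_command(login, user_id), cwd=repo, check=True)
    subprocess.run(["git", "push", "origin", "main"], cwd=repo, check=True)
```

Only `--author` changes who the commit is attributed to. The committer remains the automation account, so the history still shows which commits were produced by the onboarding flow.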
While GitHub reports massive growth driven by AI-generated activity, open source teams like Archestra are left doing the heavy lifting to maintain legitimacy. The "slop" not only demotivates real contributors but also introduces security risks, as seen in other projects like LiteLLM, where attackers used AI bots to manipulate discussions. The team urges the community to have a serious conversation about the negative impact AI is having on the open source ecosystem. They believe it is time to prioritize quality and safety over quantity, so that open source remains a comfortable environment for everyone.
218 comments
• A $10 deposit per pull request, refunded upon acceptance, is proposed as a simple economic filter to deter low-effort or bot-generated PRs, with the added benefit of funding triage if the PR is rejected.
• Charging a small fee to submit PRs mirrors a proven strategy used by the SomethingAwful forum in 2004 to reduce bots and trolls by introducing financial friction.
• Skeptics argue that bad-faith actors submitting AI-generated slop won't be deterred by monetary barriers and may escalate their tactics when faced with rejection.
• An ELO-like reputation system for contributors is suggested, where scores reflect the quality and impact of past contributions across projects, helping prioritize high-signal PRs and issues.
• Reputation systems like ELO are vulnerable to manipulation, especially by coordinated actors or AI, and could be exploited to elevate malicious contributors once initial access is gained.
• Tying reputation to a paid currency—earned by contributing to other projects—could create a decentralized trust network, but raises questions about who issues the initial currency and how bootstrapping works.
• Such systems risk replicating known flaws in platforms like StackExchange, including elitism, moderator cliques, and barriers to entry for newcomers.
• Security concerns exist around GitHub's contributor permissions: even a single merged innocuous PR grants elevated access, which could be exploited by malicious actors.
• GitHub has little incentive to curb AI-generated PRs because its parent company, Microsoft, profits from AI tools like Copilot that generate such submissions.
• One team implemented a CAPTCHA-gated onboarding flow via CI that co-authors a tiny commit after verification, successfully blocking hundreds of bots in the first week.
• While effective at reducing PR spam, this approach shifts noise into commit history—over 10% of recent commits in one repo are automated onboarding artifacts—though they don't trigger notifications or clutter issue trackers.
• Critics note that whitelisting via repeated CAPTCHA solves is fragile: it adds friction rather than providing a permanent solution, since determined bots or humans can still complete the steps.
• The root problem may be self-inflicted: mixing casual contributors with core maintainers in the same workflow creates unnecessary exposure; older models like git's email-based patch submission offer finer control.
• Bounty systems attract spam by design—offering money for under-specified issues invites low-effort attempts, even without AI, as seen in the backlash against implementation plan requests.
• Some suggest GitHub should limit accounts with extremely high PR rejection rates, similar to rate-limiting abusive behavior, though account creation remains easy.
• Financial incentives in open source can distort behavior; recognition and respect may be more sustainable motivators than bounties, especially for anonymous or unknown contributors.
• There's irony in a company using a .ai domain while complaining about AI misuse, though others argue it reflects pragmatism rather than hypocrisy.
• A working example of automated contributor vetting exists in tools like Tangled, which use social vouching to gate contributions, showing alternative trust models beyond GitHub's native features.
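To make the ELO-like reputation idea from the comments concrete, here is a minimal sketch of the standard Elo update applied to pull requests. Treating each reviewed PR as a "match" between the contributor's rating and a per-project bar is an assumption made for illustration. The commenters did not specify a formula, and the names `project_bar` and `update_rating` are hypothetical.

```python
def expected_score(rating: float, opponent: float) -> float:
    """Standard Elo expectation: probability-like score in (0, 1)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def update_rating(rating: float, project_bar: float, merged: bool,
                  k: float = 32.0) -> float:
    """Return the contributor's new rating after one reviewed PR.

    Assumption: a merged PR counts as a win (1.0) and a rejected PR as a
    loss (0.0) against the project's bar; k controls how fast ratings move.
    """
    outcome = 1.0 if merged else 0.0
    return rating + k * (outcome - expected_score(rating, project_bar))
```

With these defaults, a contributor rated 1500 whose PR is merged into a project with a bar of 1500 moves to 1516, and drops to 1484 on a rejection. The sketch does nothing to address the manipulation concerns raised in the thread, such as coordinated accounts merging each other's PRs.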
The discussion reveals a widespread concern over AI-generated pull request spam and a search for practical, low-friction deterrents that don't alienate genuine contributors. Economic barriers, reputation systems, and CAPTCHA-based onboarding emerge as popular ideas, but each faces criticism for either being easily gamed, creating new noise, or failing to address deeper structural incentives. Many commenters highlight the tension between GitHub's business interests—profiting from AI coding tools—and the community's need for spam controls, suggesting platform-level solutions are unlikely without external pressure. Ultimately, the conversation underscores that no single fix exists; instead, layered friction, better workflow design, and a shift toward non-monetary recognition may offer the most sustainable path forward.