Project Glasswing: what Mythos showed us
Cloudflare recently participated in Project Glasswing, a collaboration with Anthropic to test Anthropic's new security-focused LLM, Mythos Preview. The model was tasked with scanning more than fifty of Cloudflare's own codebases for vulnerabilities. The results showed a significant leap forward in AI-assisted security research, particularly in the model's ability to construct exploit chains and generate working proofs of concept.
Mythos Preview demonstrated a unique capability to combine multiple low-severity bugs into a single, high-severity exploit. Unlike previous models that might identify a bug but leave the question of exploitability open, Mythos Preview can reason through the steps required to chain primitives together. It also features a "proof generation" loop where it writes code to trigger a suspected bug, compiles it, and runs it to confirm the vulnerability, adjusting its approach if the initial attempt fails.
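The "proof generation" loop described above can be pictured roughly as follows. This is a minimal sketch, not Anthropic's or Cloudflare's implementation: `proof_loop`, `propose`, and `build_and_run` are hypothetical names, and the two `fake_*` functions are stand-ins so the example runs without a model or a compiler.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Attempt:
    source: str      # proof-of-concept source proposed for this attempt
    confirmed: bool  # True if running it triggered the suspected bug
    output: str      # build/run output, fed back into the next attempt


def proof_loop(
    propose: Callable[[Optional[str]], str],
    build_and_run: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> List[Attempt]:
    """Propose a PoC, build and run it, and retry with feedback until confirmed."""
    attempts: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        source = propose(feedback)                 # hypothetical model call
        confirmed, output = build_and_run(source)  # hypothetical sandboxed compile + run
        attempts.append(Attempt(source, confirmed, output))
        if confirmed:
            break
        feedback = output  # the failed run informs the next proposal
    return attempts


# Stand-ins (assumptions for illustration only).
def fake_propose(feedback: Optional[str]) -> str:
    return "poc_v1" if feedback is None else "poc_v2"


def fake_build_and_run(source: str) -> Tuple[bool, str]:
    triggered = source == "poc_v2"
    return triggered, "crash observed" if triggered else "no crash"


history = proof_loop(fake_propose, fake_build_and_run)
```

Here the first attempt fails, its output is passed back to the proposer, and the second attempt confirms the bug. The `max_attempts` cap is a design choice of this sketch so that an unprovable suspicion is eventually dropped rather than retried forever.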
However, the research highlighted several challenges. One major issue is "model refusals," where the AI organically pushes back on legitimate security research requests. These refusals were found to be inconsistent, often depending on how a task was framed or the specific context provided. Additionally, the signal-to-noise ratio remains a hurdle: models tend to over-report potential issues in hedged language such as "possibly" or "potentially," which can overwhelm human triage teams.
The team discovered that simply pointing a generic coding agent at a repository is ineffective for comprehensive coverage. Instead, they developed a specialized "harness" to manage the AI's execution. This harness breaks down the process into narrow, parallel tasks rather than one exhaustive search. It includes stages for reconnaissance, hunting with scoped prompts, adversarial validation to reduce noise, and tracing to confirm if a bug is actually reachable by an attacker.
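The four stages named above (reconnaissance, scoped hunting, adversarial validation, reachability tracing) could be wired together along these lines. This is a sketch under stated assumptions, not Cloudflare's harness: `run_harness`, `Finding`, and every stage callable are hypothetical, and the `fake_*` stages exist only to make the example runnable.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Finding:
    scope: str  # the narrow area of the codebase that was searched
    claim: str  # what the hunter believes is wrong


def run_harness(
    repo: str,
    recon: Callable[[str], List[str]],
    hunt: Callable[[str], List[Finding]],
    validate: Callable[[Finding], bool],
    trace: Callable[[Finding], bool],
    workers: int = 4,
) -> List[Finding]:
    """Run narrow hunts in parallel, then filter by validation and reachability."""
    scopes = recon(repo)  # 1. reconnaissance: split the repo into narrow scopes
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(hunt, scopes))  # 2. one scoped hunt per scope
    candidates = [f for batch in batches for f in batch]
    survivors = [f for f in candidates if validate(f)]  # 3. adversarial validation
    return [f for f in survivors if trace(f)]  # 4. keep only attacker-reachable bugs


# Stand-in stages (assumptions for illustration only).
def fake_recon(repo: str) -> List[str]:
    return ["parser", "auth"]


def fake_hunt(scope: str) -> List[Finding]:
    return [Finding(scope, "possible overflow"), Finding(scope, "confirmed overflow")]


def fake_validate(finding: Finding) -> bool:
    return "possible" not in finding.claim  # a skeptical reviewer rejects hedged claims


def fake_trace(finding: Finding) -> bool:
    return finding.scope == "parser"  # pretend only the parser sees attacker input


reported = run_harness("example-repo", fake_recon, fake_hunt, fake_validate, fake_trace)
```

Of the four candidate findings, validation drops the two hedged ones and tracing drops the one in the unreachable scope, so a single finding reaches human triage. Keeping validation and tracing as separate filters after the parallel hunt mirrors the order in which the summary lists the stages.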
Looking ahead, Cloudflare emphasizes that simply patching faster is not a sufficient defense against these advancing AI capabilities. It advocates architectural changes that make exploitation harder, such as placing robust defenses in front of applications and designing systems in which a single flaw cannot compromise the entire infrastructure. Cloudflare plans to share more in the future on how its products will help customers implement these defensive strategies.
130 comments
• The blog post is criticized as a vague, AI-generated advertisement that rehashes the Mythos announcement without providing concrete data, making it difficult to compare the new model with its predecessors.
• There is skepticism regarding the "step function" improvement claims, with some attributing the model's success to compute-intensive, always-on operation rather than a fundamentally different base model.
• Commenters highlight a growing trend of corporate blogs being written by LLMs, leading to a loss of distinct voice and an increase in "load-bearing" terminology and bland corporate style.
• The discussion raises concerns about "model collapse" or a feedback loop where AI-generated content becomes training data, potentially stifling human creativity and diversity of expression.
• While Mythos is praised for its ability to chain low-severity bugs into complex exploits, critics note that it still requires significant human expertise to verify findings and avoid false positives or hallucinated issues.
• The post's "lessons learned" are viewed as obvious, though the concept of adversarial review is noted as a valuable takeaway for improving AI workflows in various fields.
• Security professionals emphasize that while AI lowers the cost of finding vulnerabilities, the solution is not more AI, but rather writing secure code and using memory-safe languages to mitigate inevitable human error.
• The cURL maintainer's evaluation of Mythos is cited as evidence that the model may be underwhelming against well-audited codebases, though it remains a significant threat to the vast amount of "shitty" or insecure code in the wild.
• There is frustration over the lack of transparency and hard numbers regarding the severity of vulnerabilities found and the ratio of false positives, which fuels skepticism about the tool's actual efficacy.
• The conversation touches on the irony of Anthropic, a company focused on AI safety, potentially using AI for their own marketing while their products like Claude Code still exhibit significant bugs.
The discussion reflects a deep-seated skepticism toward corporate AI announcements, particularly when they lack empirical evidence or appear to be marketing-driven. While there is an acknowledgment that AI represents a significant shift in offensive security capabilities, the community stresses the irreplaceable role of human expertise in verification and the necessity of fundamental secure coding practices. The thread also reveals a cultural fatigue with the homogenization of writing styles caused by the widespread adoption of LLMs in professional communication.