This is a curated list of major books on CUDA programming, covering resources from beginner to advanced levels in both C++ and Python, with a focus on architecture, optimization, and recent releases from 2024 to 2026. The list is organized into categories: beginner guides, core architecture, practical hands-on guides, advanced optimization, Python and high-level CUDA, and modern releases.
Notable beginner books include "CUDA by Example" (2010), "Learn CUDA Programming" (2019), and "CUDA for Engineers" (2016), all of which are example-driven and suitable for newcomers. The core architecture category features "Programming Massively Parallel Processors" (4th edition, 2022), described as the definitive GPU architecture text and used in universities worldwide.
Practical guides include "Programming in Parallel with CUDA" (2022), with real-world scientific examples; "Professional CUDA C Programming" (2014), for production-level multi-GPU and streams work; and "GPU Parallel Program Development Using CUDA" (2018), which focuses on libraries such as cuBLAS and Thrust. Advanced references include "The CUDA Handbook" (2013) for deep API details, "CUDA Programming" (2013) for parallel algorithms and optimization, and "CUDA Application Design and Development" (2011) for research applications.
Python-focused books are "Hands-On GPU Programming with Python and CUDA" (2018), covering Numba and CuPy, and "GPU Programming with C++ and CUDA" (2024), covering modern C++20 and Python interop. Modern releases from 2022 to 2026 include updated editions and specialized titles such as "CUDA C++ Optimization" (2024), "CUDA C++ Debugging" (2024), and "High-Performance Computing with C++26 and CUDA 13" (2026).
Because CUDA changes rapidly, the list recommends pairing these books with the free official CUDA C++ Programming Guide (v13.x, 2026). Contributions are welcome via pull requests, with a preference for post-2018 books or relevant classics that include substantial code and examples. The repository is part of the Awesome series and includes related lists for CUDA tools, GPU resources, and parallel computing.
The author argues that AI will not automatically make organizational processes faster, challenging the common assumption that simply adding AI tools will improve throughput. Drawing on insights from re-reading "The Toyota Way" and "The Goal," the article emphasizes that process optimization efforts are often too simplistic and misdirected. The core issue is not speeding up the slowest step, but understanding why it is slow in the first place.
The author uses a Gantt chart to illustrate a typical project timeline, where software development appears to be the longest phase. While this might seem like the obvious bottleneck, the real problem often lies upstream. The lengthy development time is frequently due to unclear or incomplete requirements, such as vague feature requests like "send mail to a user once a sale is completed." Without detailed specifications on content, error handling, and completion criteria, developers spend significant time clarifying the problem rather than solving it.
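To make the "send mail" example concrete, here is a minimal sketch of the decisions that such a request leaves open, written as a checklist. The class name `SaleEmailRequirement`, its three fields, and the sample answers are hypothetical and chosen only to mirror the three gaps named above (content, error handling, completion criteria); they are not taken from the article.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SaleEmailRequirement:
    """Hypothetical checklist for 'send mail to a user once a sale is completed'."""
    content: Optional[str] = None              # what the mail actually says
    error_handling: Optional[str] = None       # what happens when sending fails
    completion_criteria: Optional[str] = None  # when the feature counts as done


def open_questions(req: SaleEmailRequirement) -> list[str]:
    """Return the names of the decisions that nobody has made yet."""
    return [f.name for f in fields(req) if getattr(req, f.name) is None]


# The request exactly as it was handed over: every decision is still open.
vague = SaleEmailRequirement()

# The same request after someone has answered the questions (illustrative answers).
specified = SaleEmailRequirement(
    content="Order confirmation with the invoice attached",
    error_handling="Retry three times, then alert support",
    completion_criteria="Mail accepted by the provider within one minute of the sale",
)

print(open_questions(vague))
print(open_questions(specified))
```

Each name returned for the vague request is a question a developer would otherwise have to chase down before writing any code.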
The article critiques the expectation that AI-generated code will bypass these issues, allowing developers to become project managers. However, AI still requires extremely detailed and precise instructions to produce correct code. The author points out that the time saved by AI coding is often offset by the increased need for detailed documentation and handholding from domain experts. This mirrors the longstanding desire of human developers for clearer, more comprehensive project outlines.
Ultimately, the author concludes that speeding up processes requires ensuring that workers have all the necessary means to do their jobs effectively. This means providing high-quality, predictable inputs to bottlenecks, as emphasized in "The Goal." Whether in legal approvals or software development, adding more people or AI tools won't help if the underlying issue is incomplete or unclear information. The first step in process automation should be improving the quality and clarity of inputs, not just accelerating the execution.
• Vague requirements have always been the bottleneck in software development, and LLMs amplify this problem rather than solving it. Just as human developers need detailed specifications to build the right thing, LLMs require equally precise instructions. The difference is that human teams typically push back on ambiguous requirements, while LLMs will happily generate plausible-looking code that misses the mark entirely.
• LLMs differ from human developers in their response to ambiguity. When faced with a vague request like "get data and give it to the user," a human developer would ask clarifying questions, while an LLM will simply produce code based on assumptions. This can be beneficial for rapid prototyping where users can immediately see and react to something tangible, but it's dangerous for critical under-the-hood concerns like security and durability that aren't visible to end users.
• Some newer LLMs like ChatGPT 5.5 have started asking clarifying questions when given vague prompts, requesting specifics about data sources, formats, and other requirements. This represents an improvement, though it still requires users to know what questions to answer and what details matter.
• Product people often love LLMs precisely because these tools don't challenge vague requirements the way human developers do. While a programmer would probe edge cases and demand clarity, an LLM accepts ambiguous inputs and produces outputs that look convincing until examined closely. This creates a dangerous dynamic where poor requirements get transformed into plausible-looking but potentially wrong implementations.
• The fundamental problem extends beyond just requirement clarity. Even with good specifications, LLMs still produce vague interpretations of problems. The promise that this technology would eliminate the need for precise thinking hasn't materialized. Everything becomes a mediocre compromise rather than an exceptional product, because the technology can't bridge the gap between human intent and implementation without human guidance.
• This pattern was predicted decades ago by Fred Brooks in his 1986 essay "No Silver Bullet," which described how expert systems and automatic programming would show initial promise in narrow domains but deliver only modest productivity gains as they expanded. The current experience with LLMs closely matches his predictions.
• LLMs excel at reproducing patterns from existing code but require developer-like specification and task breakdown to do so effectively. They work best when there's abundant training data for the problem at hand. This means they're most useful for well-understood problems with existing solutions, not for novel challenges requiring creative problem-solving.
• A practical example demonstrates both the potential and limitations of AI-assisted development. One developer recreated a Hacker News clone in weeks rather than years using Claude, achieving performance within a factor of five of production. However, this required careful management of the AI's output to prevent the codebase from becoming an unreadable mess, and the result still lacked around a hundred features present in the original.
• The value of LLMs varies dramatically based on organizational context. For large companies that can already hire specialists for every role, AI provides relatively inconsequential benefits. But for small teams and independent developers, being able to do a mediocre job at five different roles represents a huge leap over having no capability in those areas at all.
• AI's greatest impact may come not from speeding up development itself, but from enabling organizations to run leaner with fewer people, thereby reducing the misalignment and communication problems that plague large corporations. The productivity gains come from organizational simplification rather than raw coding speed.
• The current approach to AI in software development resembles either waterfall methodology or autocomplete, neither of which represents an optimal pairing experience. True pair programming with AI, where the human and machine collaborate iteratively, remains elusive but would likely improve both speed and accuracy.
• Real-world experience with AI coding assistants suggests more modest benefits than the hype implies. After initial productivity gains from using AI to get back up to speed in an unfamiliar language, developers often find themselves in a phase where the last 10% of work takes 90% of the time. Overall speedups of 10-20% are common; they are valuable but fall short of revolutionary claims.
The discussion reveals a tension between AI's genuine capabilities and the inflated expectations surrounding it. There's broad agreement that LLMs can accelerate certain aspects of development, particularly for well-defined problems and for small teams lacking specialized roles. However, the technology amplifies existing challenges around requirements clarity and organizational dysfunction rather than solving them. The most successful uses of AI appear to involve experienced developers who can provide tight oversight and domain expertise, treating the tool as a force multiplier rather than a replacement for human judgment. The broader organizational impact may ultimately matter more than raw coding speed, with AI enabling leaner structures that reduce the misalignment inherent in large enterprises.
When it comes to running large language models locally on Apple Silicon, the real cost isn't the electricity. It's the hardware. The author breaks down the economics of running a model like Gemma 4 31b on an M5 MacBook Pro with 64GB of RAM, which retails for $4,299. At 50-100 watts under load and electricity costs around $0.18-0.20 per kWh, the power bill comes out to roughly $0.02 per hour, or about $0.48 per day if running inference at full tilt. That's negligible. The real expense is the machine itself, and how quickly you depreciate it.
The author walks through three possible lifespans for the hardware: 3, 5, and 10 years. At 5 years, which seems like a reasonable middle ground, the hourly cost of the machine comes to about $0.098. Add electricity and you're looking at roughly $0.12 per hour. The question then becomes how many tokens you can squeeze out of that time. For a serious model like Gemma4:31b, the M5 Max seems to manage somewhere between 10 and 40 tokens per second. At the low end of 10 tokens per second, that's 36,000 tokens per hour, which works out to somewhere between $1.61 and $4.79 per million tokens depending on your assumed lifespan. At the optimistic end of 40 tokens per second and a 10-year lifespan, you could get down to around $0.40 per million tokens.
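The arithmetic above can be sketched in a few lines. This is a minimal cost model, not the author's own code: it assumes straight-line depreciation over every hour of the lifespan (24 hours a day, 365 days a year) and uses the low-end electricity inputs of 50 W at $0.18 per kWh, which are the values that appear to reproduce the quoted range of about $4.79 (3 years) down to $1.61 (10 years) per million tokens at 10 tokens per second, and about $0.40 at 40 tokens per second. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24


def hardware_cost_per_hour(price_usd: float, lifespan_years: float) -> float:
    """Spread the purchase price evenly over every hour of the assumed lifespan."""
    return price_usd / (lifespan_years * HOURS_PER_YEAR)


def electricity_cost_per_hour(watts: float, usd_per_kwh: float) -> float:
    """Convert a power draw in watts into an hourly electricity cost."""
    return watts / 1000 * usd_per_kwh


def cost_per_million_tokens(price_usd: float, lifespan_years: float, watts: float,
                            usd_per_kwh: float, tokens_per_second: float) -> float:
    """Hourly cost (hardware plus electricity) divided by hourly token output."""
    hourly = (hardware_cost_per_hour(price_usd, lifespan_years)
              + electricity_cost_per_hour(watts, usd_per_kwh))
    tokens_per_hour = tokens_per_second * 3600
    return hourly / tokens_per_hour * 1_000_000


# Assumed inputs: $4,299 laptop, 50 W under load, $0.18 per kWh.
for years, tps in [(3, 10), (5, 10), (10, 10), (10, 40)]:
    usd = cost_per_million_tokens(4299, years, 50, 0.18, tps)
    print(f"{years:>2} years at {tps} tok/s: ${usd:.2f} per million tokens")
```

Swapping in 100 W and $0.20 per kWh raises every figure slightly, which is why the hourly total of roughly $0.12 quoted for the 5-year case sits a little above what the low-end inputs give.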
Compare that to OpenRouter, where Gemma4 31b runs about $0.38 to $0.50 per million tokens. On the most optimistic assumptions, the MacBook Pro barely breaks even with cloud pricing. On more realistic assumptions, local inference on Apple Silicon runs about three times more expensive than just renting compute from OpenRouter. And OpenRouter providers are pushing 60-70 tokens per second, roughly 1.5 to 7 times what the M5 Max manages locally.
The conclusion is pretty straightforward from a pure cost perspective. For someone using a work laptop, their salary dwarfs the token costs by a factor of about a thousand, so paying Anthropic or using OpenRouter makes far more financial sense than trying to run everything locally. That said, the author finds it remarkable that a consumer laptop can run models approaching Anthropic Sonnet-level performance at all, even if the economics don't quite pencil out yet.
• Frontier AI companies are selling inference at a massive loss, burning through hundreds of billions of dollars to capture market share before they're forced to raise prices, making it irrational for individuals to compete on cost alone.
• Cloud providers achieve far superior efficiency through industrial electricity rates, wholesale hardware pricing, multi-tenant utilization, and specialized chips, making it nearly impossible for consumer hardware to compete on a pure cost-per-token basis.
• The entire AI inference stack is subsidized by venture capital, with companies like OpenRouter raising at $1.3B valuations and Chinese models like DeepSeek and Qwen pricing aggressively because Beijing-adjacent capital prioritizes market share over margins, meaning current low prices are not a stable equilibrium.
• Claims of profitability from companies like Anthropic and OpenAI ring hollow when they ignore the billions required for continuous model training, capital costs, depreciation, and user churn, making "profitable inference" a misleading ringfencing of expenses.
• The analogy of growing oranges breaks down: inference is not the farm but the selling of oranges, while model building is planting the orchard. The real dynamic is a treadmill, where stopping training means obsolescence, rather than a one-time investment.
• Local inference makes economic sense primarily when the hardware is already owned for other purposes, as the marginal cost of running models on an existing laptop is essentially just electricity, not the full purchase price of a new machine.
• The primary value of local models isn't cost savings but control, privacy, confidentiality, data sovereignty, resilience against outages, and freedom from model deprecation or unexpected pricing changes, which are benefits that can't be captured in a simple cost-per-token comparison.
• For typical agentic workloads, input tokens dominate costs by a large margin (often 10x output costs), and local inference makes input tokens essentially free while prompt caching is more reliable on local hardware, significantly shifting the cost calculus in favor of local for these use cases.
• The comparison between a MacBook Pro and cloud services is flawed because it allocates the entire laptop cost to inference when most users already own the hardware for other purposes, and a laptop provides general-purpose computing value beyond just token generation.
• Smaller open models like Qwen 3.6 27B are closing the gap with larger frontier models on many benchmarks while running at usable speeds on consumer hardware, making them a compelling option for local inference that challenges the assumption that cloud is always superior.
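The "hardware is already owned" argument in the points above can be checked with a rough marginal-cost sketch. It treats the laptop purchase as sunk and counts only electricity, using the power, price, and speed ranges quoted in the article summary (50-100 W, $0.18-0.20 per kWh, 10-40 tokens per second). Applying one generation rate to all tokens is a simplifying assumption, and the function name is illustrative.

```python
def marginal_cost_per_million_tokens(watts: float, usd_per_kwh: float,
                                     tokens_per_second: float) -> float:
    """Electricity-only cost per million tokens; the hardware is treated as sunk."""
    usd_per_hour = watts / 1000 * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000


# Best case from the quoted ranges: low power draw, cheap electricity, fast generation.
best = marginal_cost_per_million_tokens(50, 0.18, 40)
# Worst case from the quoted ranges: high power draw, pricier electricity, slow generation.
worst = marginal_cost_per_million_tokens(100, 0.20, 10)

print(f"best case:  ${best:.4f} per million tokens")
print(f"worst case: ${worst:.4f} per million tokens")
```

Under these assumptions the marginal cost spans roughly $0.06 to $0.56 per million tokens, so it undercuts the quoted OpenRouter price of $0.38 to $0.50 in the fast case but not in the slowest, most power-hungry one.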
The discussion reveals a fundamental tension between pure cost economics and broader value considerations in the local versus cloud AI inference debate. On a straight cost-per-token basis, cloud inference wins decisively due to economies of scale, industrial efficiency, and heavy venture subsidization that makes current pricing unsustainable in the long term. However, participants consistently emphasize that reducing the comparison to cost alone misses the point for many users. Privacy, data sovereignty, resilience against outages, control over model behavior, and freedom from vendor lock-in represent significant non-monetary values that cloud services cannot provide. The most nuanced perspective acknowledges that local inference makes the most sense when hardware is already owned for other purposes, when workloads are privacy-sensitive, or when input-heavy agentic tasks dominate, while cloud remains superior for raw performance, access to frontier models, and users who prioritize convenience over control. The consensus suggests that the choice isn't purely economic but depends heavily on individual priorities, with cost being just one factor among many that include trust, confidentiality, and long-term predictability.
The author, a veteran macOS and iOS developer with nearly two decades of experience, shares a candid reflection on the limitations of native Apple frameworks when building rich text interfaces. He recounts his attempt to implement a simple Markdown-supported chat feature using pure Swift and SwiftUI, only to encounter persistent roadblocks. While SwiftUI performs adequately for basic screens, it falls short for complex text handling, such as selecting an entire Markdown document built from its primitives, a limitation inherent to its design.
Faced with these constraints, the author explores alternative native solutions, starting with `NSTextView` and TextKit 2. However, integrating these with existing SwiftUI code proves difficult, and streaming text, a standard feature in modern chat applications, causes significant CPU spikes. He then turns to `NSCollectionView`, a mature and performant AppKit component, but discovers that cells blink unpredictably, another design-level issue. Even a lower-level prototype using pure TextKit 2 yields acceptable performance but still struggles with text streaming and compatibility with modern development practices.
The core of the author's frustration lies in the immense effort required to achieve basic macOS text behaviors. Replicating expected functionalities like context menus, dictionary lookups, smooth selection, accessibility, and intuitive text interactions would take months of work. This stark reality leads him to experiment with WebKit for Markdown rendering, which performs well with good typography and control, but it is his final experiment that proves most revealing. In a moment of desperation, he generates a simple Electron project, expecting compromise, but instead finds that text operations, Markdown rendering, and typography work flawlessly out of the box, with performance surpassing his best native implementations.
This experience forces a difficult conclusion. Despite his deep expertise in SwiftUI, AppKit, TextKit, and WebKit, he cannot build a functional, feature-rich chat interface using Apple's native tools. The author argues that for applications centered on long-form rich text and flexible typography, the dominant interface pattern of the current era, native Apple SDKs are no longer an advantage but a constraint. He acknowledges that Swift remains excellent for performance-critical tasks, but contends that frameworks like Electron or React Native offer comparable performance through native interoperability while providing a far superior text and rendering model. The post ends with a sobering realization that for this specific, crucial use case, there is no viable native alternative to web-based technologies.
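• Rendering Markdown with WebKit on macOS is technically a good fit: Markdown is ultimately converted to HTML, and WebKit is the platform's native HTML renderer. However, each WKWebView instance carries significant memory and performance overhead.
• The maturity gap between web and native UI toolkits stems from investment patterns: a large share of development effort flows to web technologies because they "just work," creating a positive feedback loop that leaves native frameworks with less attention and less polish.
• The historical context of iOS development matters: even in the Objective-C/UIKit era, basic tasks such as making a link inside a paragraph tappable were very difficult, so developers hoped SwiftUI would reach parity with the web in text handling.
• HTML/CSS remains the most productive and performant GUI system: the web's origins as a document presentation mechanism give it a natural advantage in text-heavy applications.
• For AI chat apps that need to stream Markdown and support text selection, native approaches remain problematic: several text-editing components stutter and lock up the UI while rendering, which makes web views the more pragmatic choice despite their memory and performance costs.
• The public TextKit 2 API on macOS has major problems; developers have to work around issues that should not exist, and some are exploring approaches that bypass TextKit entirely.
• The modern Apple development stack is awkward for architecting chat-style UIs: NSTextView does not integrate naturally with SwiftUI, forcing developers either to lean heavily on AppKit or to fight SwiftUI's data model.
• Cross-platform native GUI development remains a niche compared with web development. Chrome is the best-funded software project in the world, which explains why web technologies "just work" while native frameworks struggle to match them in consistency and polish.
• Electron performs poorly on older hardware. Native UI frameworks have problems of their own, but developers pursuing high-performance text rendering may need to lower their expectations for font complexity and avoid JSON-based configuration that carries JavaScript-ecosystem baggage.
• The definition of performance depends on the use case: some developers value a stable frame rate and smooth scrolling during streaming over maximum memory efficiency, arguing that sacrificing user experience to save hundreds of megabytes or even several gigabytes of RAM is not worthwhile for an app used several hours a day.
• The developer experience of native mobile development remains poor: simple tasks such as creating a smoothly scrolling element can take hours of workarounds and extensive Stack Overflow research, which pushes developers toward web technologies that are more productive despite their overhead.
The discussion shows that the performance gap between native and web technologies has narrowed markedly in many scenarios, but key differences remain. Browser rendering engines now handle complex text rendering and layout tasks that many native frameworks still struggle with, especially rich text, Markdown, and streaming content. That does not make native development obsolete: memory efficiency on low-powered devices and specialized applications that need deep platform integration still favor the native approach. The core tension is that web technologies lead in text-handling capability and ecosystem maturity, while native development leads in resource efficiency and platform integration. SwiftUI is criticized for performance problems and incomplete APIs, yet it remains a convenient choice for some scenarios and can be supplemented with AppKit/UIKit where needed. For text-heavy applications such as AI chat interfaces, web views have become the pragmatic choice, although native development remains preferred in resource-constrained environments or where deep platform integration is required.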
• Browser rendering engines have matured significantly with GPU acceleration and over a decade of stress-testing by complex web apps, making the traditional performance argument for native APIs less compelling than it once was.
• SwiftUI specifically faces criticism for poor performance, with Apple's own System Preferences app showing lag when switching sections, though some argue this is a SwiftUI issue rather than a native development problem overall.
• The distinction between SwiftUI and native development matters, as Qt C++/QML applications demonstrate significantly better performance and lower RAM usage compared to similar web apps when properly engineered.
• SwiftUI's performance has reportedly improved in the OS 26 releases (iOS 26, macOS 26, and the rest of that cycle), and most non-trivial SwiftUI apps still incorporate UIKit or AppKit for capabilities SwiftUI does not yet offer.
• Native UI development remains challenging across platforms, with wxWidgets suggested as a more stable long-term choice for developers committed to native development who want their apps to run for decades without complete rewrites.
• Despite browser engine maturity, there remains a stark performance difference between native apps and browser-based applications, particularly on low-powered devices like older Chromebooks where browsers run poorly.
• SwiftUI struggles specifically with incremental changes to large data sets, which is why Apple has historically lacked a usable SwiftUI text view component, though improvements arrived in 2025 with the enhanced TextEditor.
• RAM usage has become the primary reason to prefer native APIs over web views, as memory efficiency remains a significant differentiator even when raw performance gaps have narrowed.
• Complex applications like VS Code still hit performance ceilings lower than native apps, suggesting that while simple web apps may match native performance, demanding applications still benefit from native development.
• The V8 JavaScript engine's approach of making JavaScript extremely fast rather than finding ways to run native code in browsers proved successful, and similar progress has occurred on the rendering side, making web technologies viable for most GUI needs except specialized hardware-intensive applications.
• AppKit still beats both SwiftUI and web rendering on speed and resource consumption, suggesting that Apple's older frameworks remain the better choice for demanding applications.
• Rich text rendering with proper bidirectional text support, glyph shaping, mixed content, and natural selection remains one of software's hardest problems, with browser engines being the only implementations that come close to handling all complexities correctly.
• Using WebKit to render Markdown on macOS is technically appropriate since Markdown transpiles to HTML and WebKit is the native HTML renderer, though each WKWebView instance carries significant memory and performance overhead.
• The maturity gap between web and native UI toolkits stems from investment patterns, with most development effort flowing to web technologies because they "just work," creating a cycle where native frameworks receive less attention and remain less polished.
• Historical context matters for iOS development, as even basic tasks like rendering clickable links within paragraphs were extremely difficult in Objective-C/UIKit, and developers expected SwiftUI to provide better text handling parity with web capabilities.
• HTML/CSS remains the most powerful GUI system for productivity and performance, with the web's fundamental nature as a document presentation mechanism giving it inherent advantages for text-rich applications.
• For AI chat applications requiring streaming Markdown with text selection, native solutions remain problematic, with developers reporting slow, laggy rendering and UI lockups across multiple text editor components, making web views the pragmatic choice despite their overhead.
• TextKit 2 on macOS has significant issues with its public API, requiring developers to implement workarounds for problems that shouldn't exist, though some developers are researching solutions that bypass TextKit entirely.
• The modern Apple development stack creates awkward architectural choices for chat-style UIs, as NSTextView doesn't integrate naturally with SwiftUI, forcing developers to choose between deep AppKit integration or fighting SwiftUI's model.
• Cross-platform native GUI development remains niche compared to web development, with Chrome representing the best-funded software effort globally, explaining why web technologies "just work" while native frameworks struggle with consistency.
• Electron performs poorly on older hardware, and while native UI frameworks have their own issues, developers seeking performant text rendering may need to reduce expectations regarding font complexity and avoid JSON-based configuration that carries JavaScript ecosystem baggage.
• Performance definitions vary by use case, with some developers prioritizing stable FPS and smooth scrolling during streaming over raw memory efficiency, arguing that sacrificing UX to save hundreds of megabytes or even gigabytes of RAM isn't worthwhile for apps used hours daily.
• The developer experience for native mobile development remains poor, with simple tasks like creating smooth scrollable elements requiring hours of workarounds and Stack Overflow research, driving developers toward web technologies despite their overhead.
The discussion reveals a nuanced landscape where the performance gap between native and web technologies has narrowed significantly for many use cases, yet important distinctions remain. Browser rendering engines have indeed matured to handle complex text rendering and layout tasks that native frameworks still struggle with, particularly for rich text, Markdown, and streaming content. However, this doesn't render native development obsolete, as performance on lower-powered devices, memory efficiency, and specialized applications still favor native approaches. The core tension appears to be between web technologies' superior text handling capabilities and mature ecosystem versus native development's resource efficiency and platform integration. SwiftUI specifically draws criticism for performance issues and incomplete APIs, though some defend it as a convenient option that can be supplemented with AppKit/UIKit when needed. The conversation suggests that for text-heavy applications like AI chat interfaces, web views have become the pragmatic solution despite their overhead, while native development remains preferable for resource-constrained environments and applications requiring deep platform integration.
Every major AI provider is currently losing money on enterprise subscriptions, and they are doing it deliberately. OpenAI, Anthropic, Google, and others are running an unprecedented industry-wide loss-leader program, selling powerful AI capabilities at prices that bear no relation to the actual cost of serving them. The gap between what companies pay for AI subscriptions and what those subscriptions truly cost to deliver is not a minor discrepancy. It is a massive gulf, and every organization that has built critical workflows on top of these subsidized prices is standing right on the edge of it.
The math is stark. Claude Pro costs $20 a month, but a knowledge worker using it heavily for document analysis, report drafting, and data work can easily burn through several million tokens per week. At actual API rates, that same workload would run between $200 and $400 per seat per month. Microsoft was reportedly losing over $20 per user per month on GitHub Copilot, with power users costing $80 a month on a $10 subscription. One analysis found Anthropic users consuming upwards of $8 in compute for every $1 of subscription revenue. ChatGPT Plus has been $20 a month for three years, even as the models became dramatically more capable and feature-rich. The price never moved, and enterprise buyers who locked in rates during this window got a deal that cannot last.
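The ratios behind these figures are easy to check. The sketch below is only illustrative: the dollar amounts are the ones quoted in the paragraph above, while the helper name `subsidy_multiple` is hypothetical and introduced here for the example.

```python
def subsidy_multiple(cost_to_serve: float, price: float) -> float:
    """Return how many dollars of cost sit behind each dollar of subscription price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return cost_to_serve / price


# Figures as quoted in the article (USD per seat per month).
claude_pro_price = 20
claude_api_equivalent = (200, 400)  # heavy-use workload priced at API rates
copilot_price = 10
copilot_power_user_cost = 80

# Low and high end of the Claude Pro range, then the Copilot power-user case.
claude_range = tuple(
    subsidy_multiple(cost, claude_pro_price) for cost in claude_api_equivalent
)
copilot_multiple = subsidy_multiple(copilot_power_user_cost, copilot_price)

print(f"Claude Pro heavy user: {claude_range[0]:.0f}x to {claude_range[1]:.0f}x the price")
print(f"Copilot power user: {copilot_multiple:.0f}x the price")
```

On the quoted numbers, a heavy Claude Pro seat costs 10 to 20 times its subscription price to serve, and a Copilot power user costs 8 times.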
Every major provider is playing the same game. Google bundles Gemini Advanced into Google One at consumer prices while charging developers real API rates for the same models. Meta gives away Llama for free, subsidizing hundreds of millions of queries entirely through ad revenue. xAI's Grok undercuts everyone on API pricing at $0.20 per million input tokens, a number that only makes sense as a market-share grab funded by losses. The pattern across the board is identical. Price for adoption, lock organizations in, make AI a load-bearing part of daily workflows, and worry about the bill later. For enterprises, "later" is arriving. OpenAI is reportedly pivoting away from consumer bets toward enterprise, where the unit economics are slightly less ruinous, and the company missed key revenue and user targets in its sprint toward an IPO.
The shift to agentic AI has turned bad subsidy math into catastrophic math. When AI was a chatbot, token consumption was relatively predictable. A conversation might run a few thousand tokens. But agentic sessions run autonomously for extended periods, burning through tokens at rates that dwarf conversational usage. Users have reported exhausting five-hour rate limit windows in under 90 minutes. GitHub is moving to usage-based billing on June 1, 2026, specifically because the flat-fee model collapsed under agentic workloads. When multiple AI agents work in parallel on a single project, the token burn is not a multiple of chat usage. It is an order of magnitude more, while the subscription price on that seat has not changed.
Most enterprises are not prepared for what is coming. Over the past two years, thousands of companies have woven AI subscriptions deep into operations across marketing, engineering, research, customer success, and finance. These are not experiments anymore. They are load-bearing workflows, and most companies are still budgeting at current subscription prices. A team of 50 on Claude Pro costs $1,000 a month, a rounding error on the P&L. But the equivalent API usage for that same team would run between $15,000 and $40,000 a month. When prices adjust, the companies that treated $20-a-month AI as a permanently cheap input will get hit with bills they never budgeted for, at a time when the workflows are too embedded to rip out. KPMG found U.S. organizations projecting average AI spending of $207 million over the next 12 months, nearly double the previous year, while a Goldman Sachs survey found many large companies already overrunning their AI budgets by orders of magnitude.
The specific mechanism forcing repricing is already in motion. Both OpenAI and Anthropic are preparing for IPOs. Anthropic has reportedly surpassed $30 billion in annualized revenue, up from $9 billion at the end of 2025, while OpenAI is on pace for roughly $25 billion. But the cost side tells a different story. OpenAI projects $115 billion in cumulative cash burn through 2029 and has committed to $665 billion in compute spending by 2030. Oracle took on $43 billion in debt in a single fiscal year to build data centers for OpenAI. When these companies go public, the pressure to close the gap between subscription price and actual cost becomes existential. Public markets demand margins, analysts demand unit economics, and investors demand a path to profitability that does not depend on infinite fundraising. The fastest way to close that gap is to raise prices, impose usage caps, or shift to consumption-based billing.
The signals are already visible. GitHub is moving to usage-based billing on June 1, replacing flat-rate premium requests with token-based AI Credits. Microsoft has raised Microsoft 365 prices twice in four years, with the latest round tied directly to AI infrastructure costs. OpenAI has introduced a $100 Pro tier positioned as the new real price for heavy users. Anthropic's Max tier at $200 a month provides a preview of what committed usage will actually cost when subsidies end. As one industry executive put it, the AI land-grab is on a colossal scale, and the price tag for dominating this new world is equally colossal. Monetizing these services and recouping investment is going to force significant changes in business models and pricing, and those changes are likely to happen fast.
Enterprise leaders need to act now. That means auditing actual token consumption across teams, not just counting seats. It means modeling what AI costs look like at two, five, or ten times current prices. It means building vendor optionality into the stack so no single provider's pricing change can blow up the budget overnight. And it means having an honest conversation with the finance team before the bill arrives. The gap between what organizations pay for AI today and what they will pay in 18 months is going to be one of the most disruptive line-item increases most companies have ever absorbed. The subsidy era is ending, the clock is running, and most enterprises have not even started the conversation.
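One way to start the scenario modeling suggested above is a small table of monthly cost at each price multiple. This is a minimal sketch, assuming the 50-seat, $20-a-month team used as an example earlier in the article; `monthly_ai_cost` and the constants are hypothetical names, and a real model would be driven by audited token consumption rather than seat counts.

```python
def monthly_ai_cost(seats: int, price_per_seat: float, multiplier: float = 1.0) -> float:
    """Monthly spend for a team if the per-seat price is scaled by `multiplier`."""
    return seats * price_per_seat * multiplier


SEATS = 50             # team size from the article's example
PRICE_PER_SEAT = 20.0  # current subscription price, USD per month
MULTIPLIERS = (1, 2, 5, 10)

# Map each price multiple to the resulting monthly bill.
scenarios = {m: monthly_ai_cost(SEATS, PRICE_PER_SEAT, m) for m in MULTIPLIERS}

for multiple, cost in scenarios.items():
    print(f"{multiple:>2}x current price: ${cost:,.0f}/month, ${cost * 12:,.0f}/year")
```

For comparison, the article's API-equivalent estimate of $15,000 to $40,000 a month for the same team corresponds to 15x to 40x the current bill, beyond the largest scenario here.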
• The core argument that AI subscriptions are a "ticking time bomb" is challenged on multiple fronts. Several commenters point out that running frontier-quality models locally requires extraordinary hardware (e.g., 1.5–6 TiB of VRAM), making cloud hosting far more cost-efficient for the foreseeable future. Others note that local models currently lag behind frontier models by 6–18 months, and while efficiency improvements are expected, the hardware cost floor remains high.
• There is significant skepticism about whether AI companies are currently profitable on token sales. While one commenter cites Brad Gerstner's claim that tokens aren't sold at a loss, others counter that this ignores massive R&D, training, and infrastructure costs. Evidence suggests that up to 70% of compute spending goes to R&D, and companies like Anthropic are still burning cash despite high valuations.
• The business model critique centers on subsidized adoption: AI labs are pricing below cost to lock in enterprise users, with the expectation of raising prices once dependence is established. However, some argue this is less a "time bomb" for enterprises and more a risk for investors, who may never recoup trillions in capital if the market doesn't consolidate or monetize effectively.
• Enterprise usage patterns complicate the subscription narrative. Many companies already use metered API billing (e.g., via Azure or Bedrock), not flat-rate subscriptions. Subscriptions are more common among SMEs or via shadow IT, but large enterprises typically negotiate usage-based contracts. The real risk lies in organizations that deeply integrate subsidized AI into core workflows without planning for future cost shifts.
• Open-source and Chinese models (e.g., GLM, Kimi, DeepSeek) are seen as competitive pressures, but adoption in Western enterprises is limited by geopolitical, legal, and trust concerns. Even when technically comparable, these models face resistance due to data sovereignty and regulatory risks, leaving a gap in affordable, trusted alternatives.
• Efficiency gains in model architecture (e.g., smaller active parameters, better quantization) are expected to reduce costs over time. Examples like Qwen 27B A3B performing nearly as well as much larger models suggest that capability-per-dollar will improve, potentially enabling more on-premise or edge deployment and reducing reliance on centralized providers.
• The discussion reflects broader skepticism about AI hype, with multiple commenters dismissing the original article as "AI slop"—overly dramatic, poorly argued, and possibly generated by AI itself. Criticisms include repetitive phrasing, lack of evidence, and failure to distinguish between consumer subscriptions and enterprise billing models.
• Some observers draw parallels to historical tech cycles: land-grab pricing followed by metered billing, similar to cloud computing's evolution. Others warn of a potential debt crisis if current funding models (equity-fueled, circular investment) collapse before sustainable revenue is achieved.
• There's a cultural critique embedded in the discussion: the use of AI-generated language (e.g., "load-bearing," "the unlock") is seen as performative signaling within corporate environments, often encouraged by leadership but viewed with disdain by technical staff. This reflects tensions around authenticity, skill depreciation, and the commodification of communication.
• Despite concerns about cost and sustainability, there's acknowledgment that AI tools provide significant value—especially in coding assistants like Claude Code or Codex—even if heavily subsidized. The challenge for enterprises is to leverage these tools strategically without becoming irreversibly dependent on opaque, potentially volatile pricing models.
The discussion reveals deep skepticism toward the original article's alarmist framing, with participants emphasizing the complexity of AI economics, the distinction between consumer and enterprise billing, and the ongoing evolution of model efficiency. While concerns about future price increases and vendor lock-in are valid, many argue that the real risks lie with investors rather than end users, especially as open-source alternatives and hardware improvements continue to reshape the landscape. The conversation also highlights growing fatigue with AI-generated content and corporate jargon, underscoring a desire for more substantive, evidence-based discourse.
The UK's Department for Science, Innovation and Technology is currently consulting on measures to help young people navigate the digital world, particularly in light of users circumventing age assurance systems required under the Online Safety Act. One proposal under consideration is age-gating virtual private networks, or VPNs. Mozilla has responded to this consultation with a clear stance: VPNs are essential privacy and security tools that should not be restricted, especially for young people.
Mozilla's position is rooted in its core mission that the internet must remain open and accessible, with privacy and security online recognized as fundamental human rights. While the organization acknowledges that protecting young people online is one of the most pressing challenges of our time, it argues that blunt interventions like mandatory age assurance and restricting VPN access are ineffective at actually improving safety. These measures, Mozilla warns, undermine the fundamental rights of all users without addressing the real problems.
VPNs serve critical functions for users of all ages. By hiding IP addresses, they protect users' locations, reduce tracking, and prevent IP-based profiling. People use VPNs for a wide range of legitimate purposes, from connecting to school or employer networks remotely to avoiding censorship. While these tools are especially vital for vulnerable groups like activists, dissidents, and journalists, they improve baseline online protection for everyone.
Young people face particular vulnerabilities online, including tracking, targeted advertising, and the risks that come from personal data being collected and processed commercially without adequate consent or transparency. Since young people are engaging with digital technologies from increasingly early ages, restricting their access to privacy-protecting tools like VPNs actually conflicts with the goal of equipping them to navigate the internet safely and competently. To develop agency and responsible digital habits, young people need to be introduced to best practices and key safety and privacy tools as they engage with the online world.
Rather than age-gating technologies like VPNs, Mozilla believes regulators should focus on addressing the root causes of online harm. This means holding platforms accountable, encouraging responsible use of parental controls, and investing in digital skills education through a whole-of-society approach to digital wellbeing. Mozilla has submitted its full response to the Department for Science, Innovation and Technology, urging policymakers to pursue solutions that protect young people without compromising the open web or undermining essential privacy tools.
• The Australian government paradoxically promotes VPN usage through its eSafety Commissioner's guide while simultaneously enforcing age verification laws that VPNs help circumvent, creating an inherent contradiction in their digital policy approach.
• The UK's Online Safety Act represents a shift from child protection rhetoric to explicit "controlling online discourse," with Ofcom admitting this goal just one day after the act passed, revealing the true authoritarian intent behind the legislation.
• VPNs are essential privacy tools that governments increasingly want to ban because they enable citizens to bypass surveillance infrastructure disguised as child protection measures, with the EU also considering similar restrictions.
• Mozilla's advocacy for VPN usage looks conflicted given its role as a VPN reseller. Separating the foundation from the corporation mitigates the concern, but greater transparency about the relationship would be more ethical.
• Data fusion techniques using JavaScript browser fingerprinting can de-anonymize users even when they use VPNs, undermining the privacy benefits these tools provide against sophisticated tracking systems.
• The internet has fundamentally changed from a decentralized, user-driven space to a predatory environment engineered by adtech companies employing psychologists and programmers to maximize addiction and engagement through triggering content, making it incomparable to the earlier, less harmful version.
• Two generations who grew up with early internet access did not turn out "fine," as evidenced by declining social skills, reduced sexual activity among Gen Z, and widespread avoidance of adult responsibilities, contradicting the narrative that internet exposure was benign.
• Historical patterns show that blanket internet restrictions gain public support until individuals experience their direct impact, with HN users previously advocating for regulation until ID requirements made the surveillance implications tangible.
• UK elites have consistently maintained an authoritarian streak, previously masked by subtlety but exposed during Covid lockdowns, with public support for nanny-state policies reflecting a societal desire to be ruled rather than genuine concern for welfare.
• The campaign that defeated PIPA and SOPA demonstrated collective action against overreach, but current digital surveillance infrastructure suggests such resistance has diminished or become ineffective against coordinated corporate-government alignment.
The discussion reveals a fundamental tension between stated policy goals and actual implementation, where child protection serves as pretext for expanding surveillance infrastructure across Western democracies. Participants consistently identify patterns of regulatory capture, with corporations like Meta influencing government actions while maintaining plausible deniability. The evolution of the internet from a decentralized, user-driven space to a predatory adtech ecosystem has created genuine harms that complicate simplistic "internet good/bad" narratives. There's broad recognition that current generations face unprecedented psychological manipulation, though disagreement persists about whether this constitutes a novel crisis or merely the latest iteration of moral panics surrounding new technologies. The conversation ultimately reflects deep skepticism toward institutional motives, with participants noting how quickly protective rhetoric gives way to control objectives once legislation passes.
Tesla's Solar Roof, once heralded as a revolutionary product that would transform residential solar by replacing entire roofs with beautiful, integrated solar tiles, is effectively on life support. Elon Musk unveiled the concept in 2016 with ambitious promises, including a target of 1,000 installations per week by the end of 2019 and claims that it would cost less than a conventional roof plus traditional panels. Nearly a decade later, Tesla has installed only about 3,000 Solar Roof systems total, stopped reporting deployment numbers entirely, and has quietly shifted its focus to conventional solar panels.
The gap between Tesla's promises and reality is stark. The company didn't reach even small-scale volume production until 2020, three years behind schedule, and at its peak in Q2 2022, it deployed only about 23 roofs per week, 97.7% short of the 1,000-per-week target. Tesla's overall solar deployments declined for at least four consecutive quarters after Q4 2022, and in Q1 2024, the company simply removed solar deployment figures from its quarterly reports. Since then, Tesla has virtually stopped mentioning the Solar Roof tiles in any public communications.
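The 97.7% figure follows directly from the two rates quoted above. A minimal sketch of the arithmetic, where the constant and function names are illustrative and not taken from any Tesla report:

```python
# Figures quoted in the paragraph above.
TARGET_PER_WEEK = 1_000   # promised installation rate by the end of 2019
PEAK_PER_WEEK = 23        # approximate peak rate, Q2 2022


def shortfall_percent(target: float, actual: float) -> float:
    """Return how far `actual` falls short of `target`, as a percentage."""
    return (target - actual) / target * 100


print(f"{shortfall_percent(TARGET_PER_WEEK, PEAK_PER_WEEK):.1f}% short of target")
```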
For existing Solar Roof owners, the situation is particularly frustrating. Tesla has largely exited direct installation, no longer providing online quotes and instead directing customers to a small network of third-party certified installers. In some regions, like Florida, Tesla has canceled solar projects entirely, with all available crews devoted to repairs. This third-party model creates a structural problem where installers blame Tesla's design, Tesla blames the installers, and customers are left in the middle. Customer service complaints are widespread, with Tesla Energy holding a 2.6 out of 5 rating on SolarReviews and forums filled with reports of months-long service waits and unreachable support teams.
The product itself has unresolved technical issues. Tesla's Solar Roof uses string inverters rather than micro-inverters or power optimizers, meaning partial shading on any section can shut down production for an entire string, a problem competing installers address with panel-level optimization technology. Owners have reported systems underperforming contracted estimates by 20% or more, and Tesla has reportedly declined some service requests, attributing underperformance to weather conditions. The economics were also problematic from the start, with an average Solar Roof costing approximately $106,000 before incentives compared to roughly $60,000 for a traditional roof plus conventional panels, resulting in a payback period of 15-25 years versus 7-12 years for traditional panels. Tesla settled a $6 million class-action lawsuit in 2023 after customers accused the company of bait-and-switch pricing.
Tesla's own behavior confirms the strategic pivot. The last dedicated Solar Roof post on Tesla's official X account was in June 2023, while the company regularly promotes Powerwall, Megapack, and its new conventional solar panels on social media. On earnings calls, Solar Roof barely registers: when Tesla's VP of Energy Engineering announced a new residential solar product during the Q3 2025 earnings call, it was the TSP-420 conventional panel, not a Solar Roof update.
Tesla has now fully committed to conventional panels. In early 2026 it launched the TSP-420, assembled at Gigafactory New York in Buffalo, with an 18-zone power optimization system that, ironically, addresses the shading problem plaguing Solar Roof's string inverter architecture. Elon Musk announced at Davos that Tesla aims to build 100 GW per year of US solar manufacturing capacity, and the company is reportedly in talks to buy $2.9 billion in Chinese solar equipment to reach that goal. Tesla has also expanded its solar team for the first time in five years and launched a new solar lease product, all focused on conventional panel manufacturing rather than Solar Roof tiles.
While the pivot to conventional panels is likely the right business decision given their lower manufacturing costs, faster installation, and better consumer economics, it doesn't change the fact that Tesla made specific promises to Solar Roof customers about production levels, energy independence, and lifetime durability, then quietly walked away from those commitments without public acknowledgment. The company stopped reporting numbers when they became embarrassing, shifted installations to third parties, and redirected its energy team to different products. Solar Roof isn't officially dead, but it's been left to fade away while Tesla pursues its next headline.
• The Tesla Solar Roof's fundamental economics are deeply flawed, with a $106,000 price tag creating a $46,000 premium over traditional roof plus panel combinations, resulting in payback periods of 15-25 years versus 7-12 years for conventional solar.
• The product appears to have been rushed to market in 2016 primarily to justify Tesla's acquisition of SolarCity, a failing company run by Elon Musk's cousins, with development continuing post-acquisition using shareholder money.
• Customer service has been consistently poor, with Tesla Energy holding a 2.6/5 rating, and the company settled a $6 million class-action lawsuit over bait-and-switch pricing where one customer's contract price doubled from $72,000 to $146,000.
• The small tile design creates significant technical challenges including numerous interconnections impacting reliability, complex installation requiring specialized labor, and higher costs compared to simply retrofitting standard panels onto existing roofs.
• Standard solar panels have become so cheap and efficient that integrated solutions struggle to compete economically, with panels now costing around £1,000 for a 9.2kW system in the UK, making artisanal solar tiles economically unviable for mass markets.
• The only legitimate use case for integrated solar roofs appears to be areas with strict heritage or HOA aesthetic constraints, though even regular solar takes a decade to break even, making Tesla's premium pricing impossible to justify.
• Alternative integrated solutions exist from companies like Sunstyle, Invisible Solar, and Roofit.solar, offering larger tiles or panels that integrate flush with roofing materials while maintaining better economics.
• Regional solar economics vary significantly, with UK installations achieving 14-month payback periods due to tax advantages, while Irish installations benefit from government grants, and Australian systems cost around $4,500-6,000 for 6.6kW setups.
• The broader pattern of Tesla product announcements, from Solar Roof to self-driving cars to tunnels, suggests a tendency to announce revolutionary products prematurely, often appearing to serve stock price manipulation rather than genuine product readiness.
• Tesla's closed ecosystem approach extends to products like the PowerWall, where accessing real-time data requires complex workarounds through custom APIs, limiting user control over their own energy data.
The discussion reveals widespread skepticism about the Tesla Solar Roof's viability, with participants identifying both fundamental economic flaws and questionable business motivations behind its development. The consensus suggests that while integrated solar roofing has aesthetic appeal, the rapid commoditization of standard solar panels has made premium integrated solutions economically unjustifiable for most consumers. Multiple commenters draw connections between the Solar Roof's troubled history and broader patterns in Tesla's business practices, including rushed product launches, poor customer service, and closed ecosystems that limit user autonomy. The conversation also highlights how regional factors like government incentives, electricity costs, and climate significantly impact solar economics, with some areas achieving payback periods under two years while others struggle to justify the investment.
When Fisker Inc. filed for Chapter 11 bankruptcy in June 2024, it left roughly 11,000 Ocean SUV owners with vehicles that cost between $40,000 and $70,000 but were rapidly losing the software functionality that made them work. The company, once positioned as a Tesla rival with over 31,000 reservations worth $1.7 billion in potential revenue, had produced just 11,000 vehicles before collapsing under more than $1 billion in debt. The problem was architectural. Fisker had built what digital rights activist Cory Doctorow called a "software-based car," where virtually every subsystem, from brakes and airbags to battery management and door locks, needed periodic connections to Fisker's cloud servers. When those servers went dark, the cars lost critical functionality, not just infotainment features.
What happened next became one of the most remarkable stories in electric vehicle history. Instead of accepting that their cars would become useless, Fisker Ocean owners organized into the Fisker Owners Association, a nonprofit that quickly grew to 4,000 members and began operating as something between a car club, a tech startup, and an independent automaker. They hired independent tech experts to reverse-engineer Fisker's proprietary software patches, taught each other how to flash firmware, and organized bulk purchases of replacement parts, negotiating key fob prices down from roughly $1,000 each to a fraction of that through coordinated group buys. In Europe, they created a "Flying Doctors" program where technically skilled members travel to help other owners maintain their vehicles.
The technical work evolved into a genuine open-source ecosystem. On GitHub, developer MichaelOE reverse-engineered the API behind Fisker's official mobile app and built a Home Assistant integration that exposes every cloud API value as a sensor, with 135 commits and 20 releases under an Apache 2.0 license. Community members published CAN bus files for the Fisker Ocean, including DBC files for filtering and processing, systematically mapping the multiple CAN buses that run at 500 kbps. Majr Srour documented how to sniff CAN traffic and decode Diagnostic Trouble Codes, aiming to put diagnostic capabilities into mobile apps so owners can run their own scans without relying on dealer tools that no longer exist now that the company is defunct.
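To give a sense of what decoding a Diagnostic Trouble Code involves, here is a minimal sketch of the generic two-byte OBD-II encoding, in which the top two bits select the system letter and the remaining bits hold the digits. It is an assumption that the Ocean's modules report codes in this generic format. The function name is hypothetical and is not taken from the community's tooling.

```python
SYSTEM_LETTERS = "PCBU"  # powertrain, chassis, body, network


def decode_dtc(high: int, low: int) -> str:
    """Decode one two-byte OBD-II trouble code into its text form."""
    if not (0 <= high <= 0xFF and 0 <= low <= 0xFF):
        raise ValueError("each DTC byte must be in 0..255")
    letter = SYSTEM_LETTERS[high >> 6]   # top two bits select the system
    first_digit = (high >> 4) & 0x3      # next two bits: 0-3
    second_digit = high & 0xF            # low nibble of the high byte
    # The low byte supplies the last two hexadecimal digits.
    return f"{letter}{first_digit}{second_digit:X}{low:02X}"


print(decode_dtc(0x01, 0x43))  # P0143
print(decode_dtc(0xC1, 0x00))  # U0100
```

A real scanner would also need the request and response framing around these bytes, which this sketch leaves out.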
The community's path hit a major obstacle in October 2024 when Fisker's remaining inventory was sold to American Lease, which spent an extra $2.5 million to acquire access to Fisker's proprietary source code and cloud services. American Lease agreed to extend connected services to private Ocean owners through a deal with the FOA, but the agreement was never formally signed; it rested on a handshake. The relationship collapsed when American Lease asked the FOA to cover 58% of all operational costs, including LTE connectivity and Microsoft Cloud services, while refusing to provide itemized invoices. The result was devastating: Ocean owners lost remote connectivity, cloud features were cut, and a pending software recall was blocked.
The Fisker Ocean saga is not an isolated incident. Nikola also filed for bankruptcy, leaving its owners in a similar situation, while Canoo and Arrival are headed for liquidation auctions. Analysts expect more EV startups to follow as the industry consolidates. Consumer advocates are now pushing for structural changes, including mandatory software escrow funds to keep vehicle software running even if the manufacturer disappears, open-source mandates in bankruptcy proceedings, and shared repair data requirements. Oregon's Right to Repair bill already bans the "parts pairing" that makes independent repair so difficult, and European automakers like Volkswagen, BMW, and Mercedes-Benz signed a memorandum in 2025 to develop a shared open-source automotive software platform.
The question isn't whether more EV companies will fail; that is inevitable. The question is whether systems will be in place to prevent tens of thousands of functional vehicles from becoming e-waste when they do. Ethereum co-founder Vitalik Buterin captured the mood when he wrote that the auto industry needs much more open-source thinking, noting how sad it is that "if the manufacturer disappears, the car is useless now" has become a default. The Fisker Owners Association has proven that a dedicated community can keep orphaned EVs on the road, reverse-engineering firmware, mapping CAN buses, building integrations, and running mobile repair programs. But they shouldn't have had to. The industry needs mandatory software escrow and open-source fallback provisions for any vehicle that depends on cloud connectivity. If a manufacturer dies, the software should be released to the public. The next time an EV startup goes under, owners shouldn't need to become hackers and parts brokers just to keep driving the cars they already paid for.
• Fisker's cloud-dependent vehicle design made it uniquely vulnerable, but the broader issue of software-dependent cars affects all manufacturers, not just EVs, and requires systemic solutions like the open-source automotive platform being developed by European automakers.
• Open-source software could have saved Fisker by allowing owners to maintain and update their vehicles independently, creating a sustainable ecosystem even after the company's collapse.
• The article's writing style, particularly phrases like "the irony reads," has been criticized as AI-generated slop, raising questions about the role of AI in journalism and whether AI-assisted content can still be considered quality journalism.
• Critical safety systems like brakes and steering should never be software-controlled without mechanical fail-safes, as demonstrated by terrifying real-world experiences where engine shutdowns caused brake and steering assist failures in older vehicles.
• Modern cars' dependence on software and cloud connectivity creates unacceptable risks, including the potential for mandatory over-the-air updates that could change vehicle behavior without owner consent.
• The Fisker situation highlights a recurring pattern of companies burning customers, with this being the second time Fisker has gone bankrupt and left owners stranded with unsupported vehicles.
• There's significant demand for vehicles with open-source software that owners can control and modify, with many consumers willing to pay premium prices to avoid corporate surveillance and control.
• The leasing company that bought Fisker's source code for $2.5 million appears to have done so primarily to serve their own commercial interests (leasing to Uber drivers) rather than supporting the broader owner community.
• The proliferation of unnecessary software in modern vehicles has created absurd complexity and cost, with features like memory seats requiring multiple motors and CAN bus integration to replace simple mechanical levers.
• AI detection tools are unreliable and often produce false positives, as demonstrated by their tendency to flag human-written content as AI-generated based on superficial markers like em dash usage.
The discussion reveals deep concerns about the automotive industry's increasing reliance on software and cloud connectivity, with Fisker's collapse serving as a cautionary tale about the risks of software-dependent vehicles. There's a tension between those advocating for more open-source solutions to give owners control and those who believe the fundamental problem is too much software in vehicles altogether. The conversation also touches on broader issues of AI in journalism and the challenges of detecting AI-generated content, with participants expressing skepticism about both the quality of AI writing and the reliability of AI detection tools. Safety concerns about software-controlled critical systems emerge as a recurring theme, with participants sharing personal experiences that highlight the potential dangers of removing mechanical fail-safes from essential vehicle functions.
The author revisits a project from 2019 where they built a voltmeter clock, a timepiece that uses analog panel voltmeters instead of traditional clock faces to display hours, minutes, and seconds. While the original version worked well, the author decided to create a revised design that would be both more elegant and better documented. The new version uses three generic 90-degree panel voltmeters purchased from Amazon, which were disassembled and fitted with custom-printed decals on adhesive paper. The hour gauge features 13 divisions (0 to 12), while the minute and second gauges have 61 divisions (00 to 60), allowing for continuous motion of the hands rather than discrete jumps.
The enclosure represents a significant departure from the first version. Rather than building it by hand, the author opted to machine the front and back faces from maple lumber using a CNC mill. To achieve a seamless curved side wall, the author cut internal notches into a flat piece of wood, allowing it to flex more easily around a shaped template. The wood was moistened, clamped into shape, and left to dry before being glued to the front and back faces using a plywood template for precision. After sanding and applying a coat of nitrocellulose lacquer, the finished enclosure has a clean, polished appearance with a recessed decorative pattern that hides the voltmeters' unsightly plastic flanges.
The electronics are straightforward, centered around an AVR128DB28 microcontroller powered by a wall wart and synchronized with an 8 MHz crystal. The three voltmeters are connected directly to digital output pins, and two pushbuttons on the back allow for time setting. Notably, the design doesn't require digital-to-analog converters. Instead, it uses high-frequency 1-bit digital pulse trains, relying on the mechanical inertia of the meter movements and the inductance of their coils to settle at intermediate positions based on the duty cycle of the software-controlled signal. The code is compact and well-commented, using a timer interrupt to advance a 10 Hz counter while the main loop computes and toggles the appropriate duty cycles for each meter.
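The duty-cycle idea can be modeled in a few lines. The sketch below is a Python illustration, not the author's firmware, which runs on the AVR microcontroller. The function names, the 256-tick PWM period, and the choice to show 12 o'clock as 0 on the hour gauge are all assumptions made for the example.

```python
def duty_cycles(hours: int, minutes: int, seconds: int, tenths: int = 0):
    """Return (hour, minute, second) duty cycles in the range 0.0-1.0."""
    sec = seconds + tenths / 10      # the 10 Hz counter gives sub-second steps
    mins = minutes + sec / 60        # fractional minutes keep the hand moving
    hrs = (hours % 12) + mins / 60   # assumed: hour gauge spans 0-12
    return hrs / 12, mins / 60, sec / 60


def pulse_train(duty: float, period: int = 256):
    """Yield one PWM period of 1-bit samples, high for duty * period ticks."""
    high_ticks = round(duty * period)
    for tick in range(period):
        yield 1 if tick < high_ticks else 0


hour_duty, minute_duty, second_duty = duty_cycles(6, 30, 0)
# The meter movement averages the pulses; here the mean stands in for that.
average = sum(pulse_train(minute_duty)) / 256
print(minute_duty, average)  # 0.5 0.5
```

Because the needle cannot follow individual pulses, it settles near the average of the train, so a 50% duty cycle holds the minute hand at mid-scale.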
The author includes a video showing the clock's behavior around midnight, capturing the dramatic rollover effect. In response to comments on Hacker News, the author explains that the slight drop and bounce of the hands during the transition is intentional, adding visual flair to the clock. They compare this to the retrograde mechanisms in luxury wristwatches, where such theatrics command a premium. The project reflects the author's philosophy that electronic design often involves as much craftsmanship in woodworking and physical construction as it does in circuit design and programming.
• A builder shares their experience creating a similar analog meter clock after finding surplus meters at Princess Auto for just over $1 each, noting that while their version isn't as polished, it effectively displays time and draws mild fascination from observers.
• An enthusiast working on an analog computer project describes using both digital LCD displays (via ESP32) and actual panel meters to visualize simulation results, finding that seeing a real physical meter move in sync with the analog computation was particularly satisfying and authentic.
• A professional furniture maker suggests that makerspaces with CNC routers could handle most of the woodworking, noting that skipping the rabbets around the gauges would simplify the front panel to a single-sided job, potentially achievable with basic tools.
• Several commenters address the meter needle's overshoot and bounce behavior during transitions, with one explaining that ramping down the PWM duty cycle gradually rather than instantaneously would prevent this, while another expresses concern that repeated shocks might damage cheaper panel meters.
• One person humorously notes that the natural bounce of analog meters is something developers often try to replicate digitally with extra code, appreciating the organic quality of the physical movement.
• A technical explanation clarifies how PWM (Pulse Width Modulation) works for voltage control, where transistors pulse at specific duty cycles to simulate intermediate voltages, and how electronic voltage measurement relies on timing capacitor charge rates rather than the mechanical spring-and-electromagnet mechanism of analog meters.
• Commenters express admiration for the craftsmanship, with some noting the project inspires them to learn 3D modeling or woodworking, though one mentions struggling with overly ambitious modeling projects and receives a suggestion to start with Tinkercad.
• One observer initially expected the seconds hand to move more smoothly, questioning whether 10Hz control is too slow, while another expresses slight disappointment that the meters don't literally increase voltage throughout the day before realizing the PWM approach effectively simulates this.
The discussion reveals a strong appreciation for the intersection of analog aesthetics and digital control, with participants valuing the tactile, physical representation of data that panel meters provide. Technical conversations center on PWM implementation and the trade-offs between authentic analog behavior and equipment longevity. The community offers practical advice for overcoming barriers to similar projects, from utilizing shared makerspace resources to starting with simpler tools and projects. There's a recurring theme of finding satisfaction in physical computing that purely digital displays can't replicate.
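One commenter's suggestion, ramping the duty cycle rather than changing it instantly, can be sketched as a rate limiter. This is a hypothetical illustration of that suggestion, not the author's code (the author describes the bounce as intentional). The name `ramp_step` and the step size are assumptions.

```python
def ramp_step(current: float, target: float, max_step: float = 0.02) -> float:
    """Move `current` toward `target` by at most `max_step` per update."""
    delta = target - current
    if abs(delta) <= max_step:
        return target  # close enough: land exactly on the target
    return current + max_step if delta > 0 else current - max_step


# Example: a full-scale hand returning to zero over several updates.
duty = 1.0
while duty != 0.0:
    duty = ramp_step(duty, 0.0, max_step=0.25)
    print(duty)
```

Called once per control tick, this spreads the rollover across several updates, so the needle is driven back gradually instead of dropping all at once.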
The author recently started offering an MCP Server for their main work tool, which has been an interesting experience sitting at the intersection of deterministic and non-deterministic systems. Despite MCP being what they consider a poorly designed specification, the real problem has been practical: customers keep reporting the server isn't working. The issue is straightforward: when users open the MCP endpoint URL in a browser, they get a 401 Unauthorized error with a raw JSON blob. Users see this and immediately file support tickets saying the link is broken, when in reality they need to paste that URL into their LLM client. Nobody thinks that far ahead during onboarding.
The obvious but painful solution would be to package the server as a connector or plugin for every LLM client on the market. This approach is slow, tedious, and becomes an endless game of whack-a-mole, especially as more customers start building their own embedded clients within their organizations. Instead, the author took a simpler, slightly hacky approach. When a request comes in for GET /mcp with an Accept header that includes text/html but not application/json or text/event-stream, the server returns an HTML page explaining that the user is trying to view an MCP server and needs to add it to their client. This small change has been remarkably effective. Support ticket volume has dropped significantly, customer success is happier, customers are getting set up faster, and the author no longer has to explain that not all error messages are actual errors. It is a win all around with no negative side effects observed so far.
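The rule described above is small enough to express as a pure function. The sketch below is framework-agnostic and is not the author's implementation. The function names are hypothetical, and the parsing is deliberately simple: it ignores `q=` weights and treats the header as a plain list of media types.

```python
def accepted_types(accept_header: str) -> set[str]:
    """Return the lowercase media types listed in an Accept header."""
    types = set()
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()  # drop q= params
        if media_type:
            types.add(media_type)
    return types


def should_serve_help_page(method: str, path: str, accept_header: str) -> bool:
    """True when a browser-style request should get the HTML explanation."""
    if method.upper() != "GET" or path != "/mcp":
        return False
    types = accepted_types(accept_header)
    machine_types = {"application/json", "text/event-stream"}
    return "text/html" in types and not (types & machine_types)


browser = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
print(should_serve_help_page("GET", "/mcp", browser))             # True
print(should_serve_help_page("GET", "/mcp", "text/event-stream")) # False
```

Under this rule a client that sends only `*/*` still gets the normal response, since `text/html` has to be listed explicitly.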
The author wishes the MCP specification had built-in handling for this kind of user experience issue, but like everything in the current AI-era landscape, the approach has been to move fast and hope that AI can fix bugs faster than they accumulate. The post was first written on May 16th, 2026, and includes a sidebar with various personal stats and updates from the author's homepage, covering everything from music listening habits to running distances and recent writing projects.
• Returning an HTML explanation page when a browser requests the /mcp endpoint is a reasonable use of HTTP content negotiation via the Accept header, not a hack, as it appropriately informs users that the resource is not meant for direct browser viewing.
• The MCP specification has significant gaps, particularly around authentication, where it relies on complex and less common OAuth 2.0/2.1 features like DCR and token exchange. Recent revisions are improving this, and gateways can ease the burden on servers by handling token exchange and access control.
• Gateways lack a formal definition in the MCP spec, leading to varied implementations, but they can act as proxies that manage authentication, expose tools based on roles, and perform upstream token exchange, simplifying server-side auth concerns.
• The MCP Contributors Discord is active and welcomes participation in working groups to help evolve the spec, with ongoing efforts to address enterprise needs through standards like XAA/ID-JAG & CIMD.
• Some users face issues accessing certain sites due to Cloudflare blocking, which can occur even without VPN use, highlighting potential overzealous security measures.
• The /mcp endpoint's necessity is questioned, as REST APIs with Swagger docs offer more flexibility, but MCP provides consistency and is useful for tools without existing APIs, though some prefer using system prompts for tool calls.
• Serving HTML when the Accept header includes text/html is a practical approach, similar to how other services like ipinfo.io and Kubernetes APIs handle content negotiation, though handling clients with wildcard Accept headers like `Accept: */*` remains a consideration.
• The MCP spec's poor initial design and broken links reflect a "worse is better" adoption pattern, where being first and having reach outweighs quality, similar to historical tech adoptions.
• UX improvements, like providing a copy-to-clipboard button for MCP URLs instead of clickable links, can prevent user confusion, as users naturally click links or paste URLs into address bars.
• Packaging MCP servers as client-specific connectors contradicts MCP's goal of a universal protocol, though it may be necessary due to current client limitations.
60 comments
• 'CUDA Programming: A Developer's Guide to Parallel Computing with GPUs' is recommended as the best introductory book, while 'Programming Massively Parallel Processors: A Hands-on Approach' is criticized for numerous errors and confusing explanations, and 'CUDA by Example' is considered too simplistic and too abstracted from the hardware architecture.
• A new CUDA book is being developed that takes a bottom-up approach, starting from hardware engineering and progressing to optimization on NVIDIA hardware, covering all major algorithms except graph algorithms, based on a successful university course.
• Despite being published in 2012, the first recommended book remains relevant because GPU hardware and CUDA language have not changed significantly, providing a solid foundation for learning modern features through other resources.
• Warp is suggested as a modern alternative for Python-based CUDA development, allowing direct CUDA kernel writing in Python with an easy learning curve, though it may be too new for book coverage.
• There is interest in resources covering newer paradigms like cuTile, indicating a gap in current educational materials for emerging GPU programming techniques.
• NVIDIA insiders increasingly advise against writing custom CUDA kernels unless it's a full-time job at NVIDIA, recommending higher-level libraries instead, though this advice is seen by some as promoting vendor lock-in.
• The recommendation to avoid custom CUDA kernels is compared to suggesting avoiding C in favor of Python or licensing Unreal instead of building a graphics engine, highlighting the importance of choosing the right tool for specific needs.
• NVIDIA has not released working kernels for sm120 (non-data-center GPUs) even though Blackwell has shipped, which shows that NVIDIA doesn't always prioritize all hardware segments equally and makes reliance on NVIDIA's own tools risky.
• The decision to write custom CUDA kernels should be based on specific needs: use higher-level libraries when they suffice, but write custom kernels for learning, low-level control, micro-optimization, or kernel fusion to reduce memory traffic.
• 'AI Systems Performance Engineering' is mentioned as a relevant resource, even though it's not strictly focused on CUDA, suggesting broader performance engineering knowledge is valuable.
• The OLCF CUDA training series is recommended as a good introductory resource that covers fundamentals and makes subsequent books easier to understand.
• There is a broken link to the 3rd edition of 'Programming Massively Parallel Processors', with the 4th edition being the current version.
• The pressure to use LLMs for immediate productivity raises questions about finding time for in-depth study through traditional book reading, reflecting industry trends toward prompt engineering over fundamental coding skills.
• There's a sense that corporations prefer prompt engineering over traditional coding skills, creating tension between productivity demands and deep technical learning.
The discussion reveals a tension between the enduring value of foundational GPU programming knowledge and the industry's push toward higher-level abstractions and LLM-driven productivity. While several books are recommended for learning CUDA, there's consensus that writing custom kernels remains important for specific use cases like optimization and learning, despite vendor advice to use higher-level libraries. The community shows interest in newer tools like Warp and cuTile, indicating evolving practices, while also expressing concern about vendor lock-in and NVIDIA's inconsistent hardware support. The pressure to adopt LLMs for immediate productivity creates a conflict with the deep, time-intensive learning that mastering GPU programming requires.
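The kernel-fusion argument raised in the discussion can be made concrete with a rough back-of-the-envelope model. The sketch below is not from the thread: it assumes an idealized chain of three elementwise float32 operations (for example scale, add, ReLU), assumes every kernel reads one array from global memory and writes one back, and ignores caches entirely. The function name `elementwise_traffic_bytes` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4


def elementwise_traffic_bytes(n_elements: int, n_kernels: int) -> int:
    """Global-memory traffic for a chain of elementwise kernels.

    Idealized assumption: each kernel launch reads one n-element array
    and writes one n-element array, with no cache reuse between launches.
    """
    reads_and_writes_per_kernel = 2
    return n_kernels * reads_and_writes_per_kernel * n_elements * BYTES_PER_FLOAT32


n = 1 << 20  # about one million elements

# Three separate kernels: every intermediate result round-trips through memory.
unfused = elementwise_traffic_bytes(n, n_kernels=3)

# One fused kernel: intermediates stay in registers, so only the input is
# read and the final output is written.
fused = elementwise_traffic_bytes(n, n_kernels=1)

print(f"unfused: {unfused} bytes, fused: {fused} bytes, ratio: {unfused / fused}")
```

For memory-bound elementwise work, this kind of reduction in traffic is the usual reason given for writing a fused custom kernel when a library exposes only the separate operations.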