632 points
• 5 days ago
There are several reasons why I moved my code from GitHub to a self-hosted Forgejo instance, and they all boil down to one thing: ownership. GitHub logged 257 incidents in the year leading up to April 2026, 48 of them major, and the CTO publicly apologized, saying capacity needs to scale 30x to keep up with AI-driven load. But the outages are a symptom, not the cause. The real shift happened in August 2025 when GitHub stopped having its own CEO and was absorbed into Microsoft's CoreAI division, the same group building Copilot. When you push code to GitHub now, you're pushing it to a unit of Microsoft's AI organization, and that changes the relationship fundamentally.
The training-data default flip in April 2026 made this concrete. GitHub changed its privacy statement so that interaction data from Copilot Free, Pro, and Pro+ users is now used for AI training by default, with no repository-level opt-out. As a maintainer, I can't tell GitHub not to train on interactions inside my codebase. The opt-out is set per user account, so my code can become training material whenever someone who hasn't opted out uses Copilot on it, regardless of how I license it. The exemption for private repositories is also narrower than it sounds: "code snippets and interaction context" generated while Copilot is active are still collected. Only Copilot Business and Enterprise customers are exempt, and only because they're governed by separate Data Protection Agreements.
Then there's the jurisdiction problem. GitHub and Microsoft are US companies, which means anything they hold falls under FISA Section 702 and the CLOUD Act, regardless of where the data physically sits. GitHub's EU data residency option, launched in October 2024, solves data location but not jurisdiction, since CLOUD Act exposure follows corporate control, not geography. Microsoft's own attorney told a French Senate hearing under oath that he couldn't guarantee EU data stored in European datacenters was safe from silent US government access. The Dutch government reached the same conclusion and soft-launched code.overheid.nl, a self-hosted Forgejo instance, in April 2026, specifically because it needed to publish source code on a platform it actually owns.
I chose Forgejo over GitLab for two reasons. First, licensing. GitLab is open core, with many production features locked behind enterprise tiers. Forgejo relicensed to GPLv3+ in August 2024 with the explicit goal of staying copyleft and resisting commercial capture. Second, governance. Forgejo lives under Codeberg e.V., a non-profit registered in Berlin with a member-elected board and public budgets. The fork from Gitea in December 2022 happened precisely because Gitea Ltd took control of trademarks without community consent, and the project's structure now reflects that lesson. Forgejo v15.0 LTS shipped on April 16, 2026, with long-term support through July 2027.
My setup runs on a single Intel NUC with 64 GB of RAM. Forgejo v15 LTS, Postgres 17, and Traefik live inside Docker, and an Incus-managed KVM virtual machine hosts the Actions runner. The runner is where the real engineering lives, since it has to execute untrusted code like npm and pip installs against my repositories on a daily Renovate schedule. I use five overlapping isolation layers: a KVM virtual machine so the host kernel isn't shared, gVisor as the Docker runtime inside that VM to intercept system calls in user space, a weekly destructive rebuild that destroys and recreates the entire VM from a fresh base image, an nftables egress filter that blocks access to my LAN, and scope-bound runner tokens that can't be used outside their registered scope. None of these primitives are novel, but wiring them together for a single-user homelab on one NUC is what makes the setup work.
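Of the five layers, the egress filter is the easiest to show concretely. The following is a hypothetical Python helper, not the author's actual configuration. It renders an nftables ruleset for the host's forward hook that drops traffic arriving from the runner VM's bridge when that traffic is headed for private address ranges. Several details are assumptions: the bridge name `incusbr0` (Incus's default), treating the RFC 1918 ranges as "the LAN", the table name, and the optional allow-list.

```python
import ipaddress

# Assumption: RFC 1918 private ranges stand in for "the LAN"; adjust to yours.
DEFAULT_LAN = ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")


def _rules(addrs, verdict, bridge):
    """One rule per address family, matching traffic that enters from `bridge`."""
    out = []
    for family, version in (("ip", 4), ("ip6", 6)):
        group = [str(a) for a in addrs if a.version == version]
        if group:
            out.append(f'    iifname "{bridge}" {family} daddr {{ {", ".join(group)} }} {verdict}')
    return out


def egress_ruleset(lan_cidrs=DEFAULT_LAN, allow=(), bridge="incusbr0",
                   table="runner_egress"):
    """Render an nftables ruleset for the host's forward hook.

    Traffic forwarded from the runner VM's bridge to any LAN range is dropped.
    Addresses in `allow` (e.g. a LAN package cache) are accepted first, because
    nftables evaluates rules in order. Everything else, including the internet,
    falls through to the chain's accept policy.
    """
    # ip_network/ip_address raise ValueError on malformed input (strict CIDRs).
    lan = [ipaddress.ip_network(c) for c in lan_cidrs]
    hosts = [ipaddress.ip_address(a) for a in allow]
    lines = [
        f"table inet {table} {{",
        "  chain forward {",
        "    type filter hook forward priority 0; policy accept;",
        *_rules(hosts, "accept", bridge),
        *_rules(lan, "drop", bridge),
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(egress_ruleset(), end="")
```

To apply it, write the output to a file and load it with `nft -f`. The sketch filters on the host rather than inside the VM on purpose: code that gains root inside the VM could flush the VM's own rules, but not the host's. Packets addressed to the host itself, such as the runner talking to Forgejo, traverse the input hook rather than forward, so this rule does not cut the runner off from the forge.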
There are real tradeoffs. Discovery and the social graph are the biggest cost, since contributors expect to find me on GitHub. My plan is to archive each public GitHub repository once the migration is complete and point its README at code.jorijn.com, keeping the discovery path intact. Forgejo Actions aims for familiarity with GitHub Actions rather than compatibility, so some things break silently (workflow-level permissions blocks, for example) and some actions need to be pinned or forked. Forgejo has no Dependabot, so I run Renovate on the same self-hosted runner instead. And there's no 24/7 vendor support, just an issue tracker and a chat room, which is fine for a one-person operation but might not scale to larger teams.
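The two Actions pitfalls named above, workflow-level permissions blocks and unpinned actions, can be caught before a workflow ever reaches the runner. Below is a hypothetical stdlib-only Python check. It assumes the workflow-level `permissions:` key starts at column 0. It scans line by line rather than parsing YAML, so it is a heuristic, not a validator, and it treats any ref that is not a full 40-character commit SHA as unpinned.

```python
import re

# Heuristic: a workflow-level "permissions:" key starts at column 0.
TOP_LEVEL_PERMISSIONS = re.compile(r"^permissions\s*:")
# "uses: owner/repo@ref" (optionally a list item, optionally quoted);
# docker:// references are skipped. Group 1 is the action, group 2 the ref.
USES = re.compile(r"""^\s*(?:-\s*)?uses:\s*['"]?(?!docker://)([^@\s'"]+)@([^\s'"#]+)""")
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def lint_workflow(text):
    """Return (line_number, message) warnings for one workflow file's text."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if TOP_LEVEL_PERMISSIONS.match(line):
            warnings.append(
                (lineno, "workflow-level permissions block; may break silently"))
        m = USES.match(line)
        if m and not FULL_SHA.match(m.group(2)):
            warnings.append(
                (lineno, f"{m.group(1)}@{m.group(2)} is a mutable ref; "
                         "consider pinning a full commit SHA"))
    return warnings
```

One way to use it is to run it over each file in the repository's `.forgejo/workflows/` directory as a pre-commit step. Pinning to a full SHA is the stricter choice here because a tag can be moved upstream, while a commit SHA fixes exactly what the runner executes. That matters on a runner that already handles untrusted installs.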
This move isn't for everyone. If your team has no appetite for running infrastructure, if you're heavily invested in GitHub-specific features like Codespaces or Advanced Security, or if your contributor base depends entirely on the GitHub social graph, the friction may not be worth it. The Dutch government's approach is a good model: code.overheid.nl is a soft-launch platform, not a wholesale replacement. My setup has the same shape: Forgejo is canonical for my work, GitHub is a mirror, and I'm willing to revisit that later. The key point is that a defensible self-hosted Forgejo deployment is achievable on a single NUC, but the runner is the part that requires real care. If you're not prepared to think about KVM isolation, gVisor, nftables, and weekly rebuilds, run your CI jobs on a managed runner host or stay on GitHub.
361 comments
• Git was designed to be decentralized, but GitHub became the de facto central hub thanks to its superior tooling, scalability, and maintenance. Many argue that while moving away from GitHub is possible, keeping a mirror there is crucial for discoverability and longevity, since projects hosted only on niche or self-hosted platforms often fade into obscurity once their mirrors disappear.
• A major reason for leaving GitHub is the use of public repositories to train commercial AI models like Copilot without explicit consent from developers. This has led to a sense of betrayal, as the social contract of open source, which includes attribution and license compliance, was violated for profit.
• The legal landscape around AI training is shifting in favor of BigTech, with courts often ruling that training on publicly available data is acceptable. This has led some to believe the battle to prevent AI scraping is already lost, pushing developers to consider gated access or new licenses, though these are seen as impractical or ineffective.
• Beyond hosting, GitHub's real value lies in its integrated project management tools, such as issue tracking, CI/CD, and access control. Moving a repository is easy, but migrating the entire project surface, including issues and CI configurations, is the primary barrier to leaving.
• Forgejo is highlighted as a genuinely hackable and community-governed alternative. Unlike GitHub, its architecture is designed for customization, allowing users to implement features like "showcase" modes for private repos with minimal effort, providing actual freedom rather than just theoretical access to source code.
• The "social component" of GitHub, including stars, followers, and collaboration tools, is seen by some as a double-edged sword. While it aids discoverability, it also creates vanity metrics and a sense of entitlement among users and contributors, which can be detrimental to the core principles of FOSS.
• Decentralization is often discussed, but in practice most systems tend to centralize over time. The challenge is not just to move away from GitHub but to avoid simply creating a new center; some see that risk in the rise of Forgejo, which could itself become a new dependency.
• For those who do not want to self-host, finding a simple, low-cost git hosting service is difficult. Most alternatives are priced for enterprises or bundle features many users don't need, like CI/CD and AI tools, and the market lacks a bare-bones, affordable option similar to GitHub's early $7 plan.
• The push to leave GitHub is driven by a convergence of factors: frequent outages, anti-AI sentiment, and a general distrust of US-based tech giants due to political instability and supply chain risks. This has made European alternatives like Codeberg and self-hosted solutions more attractive.
• Despite the movement away from GitHub, it remains the dominant platform for open source due to its network effects, free resources, and discoverability. For many developers, especially those early in their careers, a GitHub profile is still essential for employment, making a complete exodus unlikely in the near term.