Quack: The DuckDB Client-Server Protocol
383 points
• 6 days ago
The DuckDB team has introduced Quack, a new client-server protocol that allows multiple DuckDB instances to communicate with each other. Traditionally, DuckDB has been an in-process database, meaning it runs within a single process without a client-server architecture, similar to SQLite. While this design excels in interactive analytics and embedded use cases, it falls short when multiple processes need to modify the same database simultaneously. Many users resorted to workarounds like custom RPC solutions, third-party implementations using Arrow Flight SQL, or even running DuckDB inside PostgreSQL. Quack addresses this gap directly, enabling a full multi-user experience where separate processes can read and write data concurrently.
From a user perspective, Quack is straightforward to set up. Both the client and the server are DuckDB instances, and the functionality ships as the Quack extension, available in DuckDB v1.5.2. To connect, a user starts a Quack server on one instance and connects to it from another using a simple authentication token. Once connected, querying a remote table is as easy as querying a local one. The protocol also supports creating tables on the remote server from the client side, and even shipping complex, verbatim SQL queries for remote execution with the `query` function.
The design of Quack builds on proven technologies and is optimized for performance. It uses HTTP as its transport layer, which brings ubiquitous, highly optimized infrastructure that is easy to manage with tools such as load balancers and firewalls, and which lets DuckDB-Wasm instances connect natively from a browser. Protocol messages use a custom MIME type, `application/duckdb`, which reuses DuckDB's own efficient, well-tested serialization primitives. By default, security comes from auto-generated authentication tokens and binding to localhost; when the server is exposed to the internet, the team strongly recommends placing an HTTP proxy such as nginx in front of it for SSL termination. For performance, Quack can send a query and return its initial results in a single round trip, which matters for latency-sensitive operations, while remaining highly efficient at bulk data transfer.
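The request flow described in the two paragraphs above (a token-authenticated connection, remote DDL, verbatim SQL, and first results returned in the same round trip) can be modeled in a few lines. This is a toy sketch under loud assumptions, not Quack's API or wire format: the standard-library `sqlite3` module stands in for the DuckDB engine, plain method calls stand in for HTTP requests, JSON or `application/duckdb` serialization is skipped entirely, and the names `ToyQueryServer`, `query`, `fetch`, `batch_size` and `run` are all hypothetical.

```python
import hmac
import secrets
import sqlite3
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Response:
    """One hypothetical reply: column names, a batch of rows, a cursor id."""
    columns: List[str]
    rows: List[tuple]
    cursor_id: Optional[int]  # None means the result is complete


class ToyQueryServer:
    """Toy model only: sqlite3 plays the engine, each method call plays
    one HTTP request/response pair. Not Quack's real interface."""

    def __init__(self, batch_size: int = 1000) -> None:
        # Auto-generated secret, mirroring the token-based default auth.
        self.token = secrets.token_urlsafe(32)
        self.batch_size = batch_size
        self.round_trips = 0
        self._db = sqlite3.connect(":memory:")
        self._cursors: Dict[int, sqlite3.Cursor] = {}
        self._next_id = 0

    def _authenticate(self, token: str) -> None:
        self.round_trips += 1  # every request counts as one round trip
        # Constant-time comparison avoids leaking the token through timing.
        if not hmac.compare_digest(token.encode(), self.token.encode()):
            raise PermissionError("invalid token")

    def query(self, token: str, sql: str) -> Response:
        """Run verbatim SQL and return the first batch in the same reply."""
        self._authenticate(token)
        cur = self._db.execute(sql)
        if cur.description is None:  # DDL/DML: persist, nothing to stream
            self._db.commit()
            return Response([], [], None)
        return self._next_batch(cur)

    def fetch(self, token: str, cursor_id: int) -> Response:
        """Follow-up request, needed only when the previous batch was full."""
        self._authenticate(token)
        return self._next_batch(self._cursors.pop(cursor_id))

    def _next_batch(self, cur: sqlite3.Cursor) -> Response:
        columns = [d[0] for d in cur.description]
        rows = cur.fetchmany(self.batch_size)
        if len(rows) < self.batch_size:
            return Response(columns, rows, None)
        # Full batch: more rows may remain. A row count that is an exact
        # multiple of batch_size costs one extra, empty fetch in this sketch.
        self._next_id += 1
        self._cursors[self._next_id] = cur
        return Response(columns, rows, self._next_id)


def run(server: ToyQueryServer, token: str, sql: str) -> List[tuple]:
    """Client side: send the query, then drain any remaining batches."""
    resp = server.query(token, sql)
    rows = list(resp.rows)
    while resp.cursor_id is not None:
        resp = server.fetch(token, resp.cursor_id)
        rows.extend(resp.rows)
    return rows


if __name__ == "__main__":
    server = ToyQueryServer(batch_size=3)
    tok = server.token
    run(server, tok, "CREATE TABLE t (x INTEGER)")
    run(server, tok, "INSERT INTO t VALUES (1), (2), (3), (4), (5), (6), (7)")
    server.round_trips = 0
    print(run(server, tok, "SELECT x FROM t WHERE x <= 2 ORDER BY x"),
          server.round_trips)  # [(1,), (2,)] 1
```

With `batch_size=3`, the two-row query finishes in one round trip, while a full scan of the seven rows needs three. The design point being illustrated is that small, latency-sensitive queries never pay for a second request; only large results trigger follow-up fetches.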
Benchmarks against PostgreSQL and Arrow Flight SQL are encouraging. For bulk data transfer, Quack moved 60 million rows in under 5 seconds, faster than Arrow Flight SQL and dramatically faster than PostgreSQL. For small writes, which stress round-trip efficiency, Quack surprisingly outperforms PostgreSQL at up to 8 parallel threads, reaching around 5,500 transactions per second. This performance unlocks new use cases for DuckDB, such as centralizing telemetry data or driving dashboards from multiple data producers, and moves it beyond its niche as an in-process analytics tool into a more central role in data architecture.
The release of Quack is the first step in a larger roadmap. Future plans include integrating Quack into DuckLake to enable remote catalog servers, and improving DuckDB's core to handle higher transaction throughput with more parallel threads. The team is also considering extending the protocol to allow custom messages via extensions, and adding replication for read replicas. The protocol will be refined for a production release alongside DuckDB v2.0 in the fall of 2026. The team acknowledges the influence of projects like MotherDuck and GizmoSQL. They chose not to use Arrow Flight SQL in order to keep control over serialization and to avoid its mandatory extra round trips, but they still see its value as an interchange format.
83 comments
• DuckDB has become an essential tool for many workflows, including sensor data ingestion, LLM interactions, analytics, and data pipelines, with users praising its versatility and performance.
• The Quack protocol solves a common frustration: the inability to inspect a DuckDB database while another process has it locked. It enables concurrent access without having to build a custom server layer.
• Quack enables horizontal scaling for DuckDB-based applications, making it more viable for production use cases like internal analytics platforms and observability data systems.
• DuckLake combined with Quack could replace heavier systems like Mimir or ClickHouse with significantly less operational complexity, especially for teams already invested in the DuckDB ecosystem.
• Some users express confusion about DuckDB's expanding scope, comparing it to SQLite's well-defined role, though others argue that DuckDB's evolution as an embedded analytics engine with optional extensions is coherent and tasteful.
• One practical use case is generating .duckdb files in data pipelines and serving them from S3, which gives applications BigQuery- or ClickHouse-like performance on datasets of around 30 GB without the matching infrastructure cost.
• Quack is best understood not as DuckDB becoming a traditional RDBMS like Postgres, but as a cleaner way to integrate DuckDB as an execution layer within larger data workflows, handling remote access and shared compute resources.
• MotherDuck has its own proprietary protocol and is separate from Quack, though it may support Quack in the future; Quack is designed as a general-purpose protocol for any DuckDB client-server communication.
• For small-scale multi-user applications with modest concurrency needs (a few thousand records, 2-3 users), options like Firebird or MySQL are suggested as simpler alternatives to Postgres, though DuckDB is generally not recommended for transactional multi-user workloads.
• Quack enables a true client-server model for DuckLake, allowing remote clients to query data without direct access to the underlying storage, since the remote DuckDB instance handles both catalog and compute.
• The choice of HTTP/2 for the Quack protocol is debated, with critics arguing it's suboptimal for large data transfers and streaming, while supporters note it enables native browser-based access via DuckDB-WASM and simplifies deployment behind reverse proxies.
• A benchmark showing a 76 GB CSV transferred in 4.6 seconds over a 15 Gbps network is questioned for lacking detail on the compressed size and encoding, with some estimating that the actual transfer rate may be underwhelming relative to the hardware limit.
• WASM compatibility is highlighted as a key feature: DuckDB running in a browser can connect directly to remote DuckDB instances via Quack, keeping behavior consistent across environments.
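The skepticism about the 76 GB / 4.6 s / 15 Gbps figure can be checked with simple arithmetic. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Gbps = 10^9 bit/s) and ignores protocol overhead, so the link capacity is an upper bound. Both are assumptions, not details given in the discussion. The function name `wire_budget` is hypothetical.

```python
def wire_budget(csv_gb: float, seconds: float, link_gbps: float) -> dict:
    """Compare an apparent throughput with what the link could carry.

    Assumes decimal units and no protocol overhead, so the link figure
    is an upper bound on bytes actually sent.
    """
    csv_bits = csv_gb * 1e9 * 8
    link_bits = link_gbps * 1e9 * seconds  # most the link moves in `seconds`
    return {
        "apparent_gbps": csv_bits / seconds / 1e9,
        "max_wire_gb": link_bits / 8 / 1e9,
        "min_compression_ratio": csv_bits / link_bits,
    }


if __name__ == "__main__":
    b = wire_budget(csv_gb=76, seconds=4.6, link_gbps=15)
    print(f"apparent rate:         {b['apparent_gbps']:.1f} Gbps")       # 132.2
    print(f"max bytes on the wire: {b['max_wire_gb']:.1f} GB")           # 8.6
    print(f"min compression ratio: {b['min_compression_ratio']:.1f}x")   # 8.8
```

If the whole transfer crossed that link, at most about 8.6 GB could have been sent in 4.6 seconds. The data on the wire must therefore have been at least roughly 8.8x smaller than the CSV. Whether the result is impressive depends on that encoding, which is exactly the detail commenters asked for.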
The discussion reveals a strong enthusiasm for DuckDB's expanding capabilities, particularly around the Quack protocol enabling client-server architectures and remote access. While some users express concern about scope creep, the prevailing view is that DuckDB's core value as an embedded analytics engine remains intact while new features like Quack and DuckLake extend its utility for production workloads. The protocol design choices, especially the use of HTTP/2, spark debate between practical deployment benefits and theoretical performance trade-offs. There's clear interest in DuckDB replacing heavier infrastructure for analytics use cases, though it's generally not seen as suitable for traditional transactional multi-user applications where databases like Postgres or MySQL remain more appropriate.