What It Does
ClickHouse is an open-source columnar OLAP (Online Analytical Processing) database designed for real-time analytics on large datasets. It was developed at Yandex, open-sourced in 2016, and spun out as an independent company (ClickHouse Inc.) in 2021. The database excels at fast aggregation queries over billions of rows, making it suitable for observability, data warehousing, real-time dashboards, and machine learning feature stores.
ClickHouse Inc. offers both the open-source self-hosted database (Apache-2.0) and ClickHouse Cloud, a managed service. In November 2025, ClickHouse acquired LibreChat to build the “Agentic Data Stack” — a natural-language interface for querying analytical data via LLMs. As of January 2026, the company is valued at approximately $15 billion after a $400M Series D round.
Key Features
- Columnar storage with vectorized execution: Processes analytical queries orders of magnitude faster than row-oriented databases by reading only required columns and using SIMD instructions
- Real-time ingestion: Handles millions of rows per second on commodity hardware with asynchronous inserts and background merges (MergeTree engine family)
- SQL-compatible: Standard SQL interface with extensions for analytical functions, materialized views, and approximate query processing (HyperLogLog, quantiles)
- Horizontal scaling: Distributed query execution across shards with configurable replication via ClickHouse Keeper (ZooKeeper replacement)
- ClickHouse Cloud: Managed service with auto-scaling, separation of storage and compute, and usage-based pricing
- Broad ecosystem integration: Kafka, S3, PostgreSQL, MySQL connectors; native Grafana, Superset, and Metabase support
- MCP server: Official Model Context Protocol server for LLM-driven analytical queries (post-LibreChat acquisition)
- Materialized views: Incrementally updated pre-aggregations for sub-second dashboard queries
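The MergeTree engine and analytical SQL extensions above can be sketched with a minimal table and query. Table and column names here are illustrative, not from the source:

```sql
-- Hypothetical events table using the MergeTree engine.
-- ORDER BY defines the sparse primary index; PARTITION BY controls part layout.
CREATE TABLE events
(
    event_time  DateTime,
    user_id     UInt64,
    event_type  LowCardinality(String),
    duration_ms UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)   -- coarse, low-cardinality monthly partitions
ORDER BY (event_type, event_time);

-- Columnar execution reads only the referenced columns; uniq() and quantile()
-- are approximate aggregates (HyperLogLog- and sampling-based).
SELECT
    event_type,
    count()                     AS events,
    uniq(user_id)               AS approx_users,
    quantile(0.95)(duration_ms) AS p95_ms
FROM events
WHERE event_time >= now() - INTERVAL 1 DAY
GROUP BY event_type
ORDER BY events DESC;
```

The `ORDER BY` clause doubles as the primary index, so filters and aggregations aligned with it skip large ranges of data without a secondary index.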
Use Cases
- Real-time observability: Log and trace analysis at scale (used by Cloudflare, Uber, GitLab for observability pipelines)
- Product analytics: User behavior tracking and funnel analysis with sub-second query times on billions of events
- Data warehousing: Cost-effective alternative to Snowflake/BigQuery for teams comfortable with self-hosting or ClickHouse Cloud
- AI-driven analytics: Post-LibreChat acquisition, positioned as the backend for natural-language data querying via the “Agentic Data Stack”
- Time-series analytics: High-cardinality metrics storage and querying as an alternative to specialized time-series databases
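For the dashboard and product-analytics cases above, the usual pattern is an incrementally updated materialized view feeding an AggregatingMergeTree target. A minimal sketch, with hypothetical table names:

```sql
-- Hypothetical raw table of page views.
CREATE TABLE pageviews
(
    ts      DateTime,
    page    String,
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY (page, ts);

-- Target table holding partial aggregate states, merged in the background.
CREATE TABLE pageviews_hourly
(
    hour  DateTime,
    page  String,
    views AggregateFunction(count),
    users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (page, hour);

-- Incremental materialized view: runs on every insert into pageviews.
CREATE MATERIALIZED VIEW pageviews_hourly_mv TO pageviews_hourly AS
SELECT
    toStartOfHour(ts)  AS hour,
    page,
    countState()       AS views,
    uniqState(user_id) AS users
FROM pageviews
GROUP BY hour, page;

-- Dashboard query finalizes the partial states over a tiny pre-aggregated table.
SELECT page, hour, countMerge(views) AS views, uniqMerge(users) AS users
FROM pageviews_hourly
GROUP BY page, hour
ORDER BY hour;
```

The dashboard query touches only hourly rollups rather than raw events, which is what makes sub-second latency on billions of underlying events feasible.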
Adoption Level Analysis
Small teams (<20 engineers): Poor fit for self-hosted deployments. ClickHouse clusters require dedicated operations expertise for shard management, replication, and capacity planning. ClickHouse Cloud reduces this burden but introduces costs that may not be justified at small scale. DuckDB or SQLite are better fits for small analytical workloads.
Medium orgs (20-200 engineers): Good fit, particularly via ClickHouse Cloud. The managed service handles the operational complexity while providing the performance benefits. Self-hosted deployments are feasible but require at least one engineer with ClickHouse expertise. The “too many parts” failure mode (see Notes) is a common pitfall that requires understanding of ClickHouse internals.
Enterprise (200+ engineers): Strong fit. ClickHouse is battle-tested at Cloudflare, Uber, Spotify, and many other large-scale deployments. The $15B valuation and $1B+ total funding provide long-term viability. However, enterprise deployments require dedicated data platform teams. Data rebalancing when adding shards is a known pain point — ClickHouse does not automatically redistribute data, requiring manual intervention with limited tooling.
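Because ClickHouse does not redistribute data when shards are added, a common manual-rebalancing workaround is an `INSERT ... SELECT` through the `remote()` table function, moved partition by partition. A sketch only; host names, database, and table are illustrative:

```sql
-- Copy one partition's rows from an existing shard to a newly added one.
-- Run on the destination shard, per partition, during a low-traffic window.
INSERT INTO events
SELECT *
FROM remote('old-shard-1:9000', 'default', 'events')
WHERE toYYYYMM(event_time) = 202512;

-- After verifying row counts on the destination, drop the data on the source:
-- ALTER TABLE events DROP PARTITION 202512;  -- executed on the source shard
```

This is the kind of manual, partition-at-a-time intervention that makes upfront capacity planning important.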
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| Snowflake | Fully managed, separation of storage/compute, mature governance | You need zero-ops analytics with strong enterprise governance and can afford premium pricing |
| DuckDB | Embedded, single-node, zero infrastructure | Your analytical workloads fit on a single machine and you want the simplest possible setup |
| Apache Druid | Better for high-concurrency low-latency queries, native time-series support | You need sub-second queries at very high concurrency for user-facing dashboards |
| Databricks | Unified analytics and ML platform, Delta Lake | You need combined ETL, analytics, and ML in one platform |
| TimescaleDB | PostgreSQL extension, familiar SQL, better for mixed OLTP/OLAP | You want to add analytics to an existing PostgreSQL deployment |
Evidence & Sources
- Trigger.dev ClickHouse “too many parts” post-mortem
- ClickHouse cluster silent failure post-mortem (Medium)
- ClickHouse challenging journey in production (Maxilect)
- Contentsquare: scaling out ClickHouse cluster
- ClickHouse acquires LibreChat (official blog)
- Bloomberg: ClickHouse lands $15B valuation (Jan 2026)
- ClickHouse raises $350M Series C (May 2025)
- 13 common ClickHouse mistakes (official blog)
Notes & Caveats
- “Too many parts” is the most common production failure mode: When ingestion patterns create too many small data parts in a partition (default limit: 3,000), inserts are rejected. This has caused data loss incidents at Trigger.dev and others. Partition key design is critical and non-obvious for newcomers.
- No automatic data rebalancing: Adding shards to a ClickHouse cluster does not redistribute existing data. The available rebalancing utilities have “limitations in terms of performance and usability.” This makes capacity planning important upfront.
- ZooKeeper/Keeper dependency: Replicated setups require ClickHouse Keeper (or ZooKeeper), adding operational complexity. Keeper metadata corruption can cascade to cluster-wide read-only states.
- License is genuinely open: Apache-2.0, not source-available or BSL. This is a positive differentiator vs. some competitors that have changed licenses.
- LibreChat acquisition strategic risk: The “Agentic Data Stack” vision ties an AI chat UI to an OLAP database. Hacker News commenters flagged that LLMs are still unreliable for business-critical SQL generation, with hallucination and accuracy concerns even with extensive schema documentation.
- Revenue growth is strong but from a low base: ~$160M ARR in 2025 (estimated by Sacra), up 256% YoY. The $15B valuation implies a ~94x revenue multiple, which is aggressive even for high-growth infrastructure.
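The "too many parts" failure mode usually traces back to partition key design and unbatched inserts. A sketch of the anti-pattern and a safer alternative, with an illustrative table:

```sql
-- Anti-pattern: partitioning by a high-cardinality key creates many tiny parts
-- and quickly hits the per-partition limit (parts_to_throw_insert, default
-- 3,000), after which inserts are rejected.
-- PARTITION BY user_id   -- do not do this

-- Safer pattern: coarse, low-cardinality time partitions.
CREATE TABLE logs
(
    ts      DateTime,
    user_id UInt64,
    message String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ts)
ORDER BY (user_id, ts);

-- With many small writers, let the server batch inserts instead of creating
-- a new part for every INSERT statement:
INSERT INTO logs SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 42, 'example');
```

Each `INSERT` without batching produces at least one new part per partition, so high-frequency small writes are exactly the ingestion pattern that triggers this failure.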