What It Does
Langfuse is an open-source LLM engineering platform that covers the full lifecycle of LLM application development and production: distributed tracing of LLM calls and agent steps, evaluation via LLM-as-a-judge or human annotation, prompt management with versioning and A/B testing, and dataset management for systematic testing. It is framework-agnostic, integrating via SDKs (Python, TypeScript/JS), OpenTelemetry, and native integrations with LangChain, LlamaIndex, LiteLLM, OpenAI SDK, and more.
Langfuse was founded in 2023 by Marc Klingen, Max Deichmann, and Clemens Rawert (YC W23). With its v3 release in late 2024 it moved trace and analytics storage onto ClickHouse. In January 2026, ClickHouse acquired Langfuse alongside announcing its $400M Series D at a $15B valuation, making Langfuse part of ClickHouse’s AI observability platform strategy. The project ended 2025 with 21k+ GitHub stars, 26M+ monthly SDK installs, 8,000+ monthly active self-hosted instances, and customers including 19 of the Fortune 50 and 63 of the Fortune 500.
Key Features
- Distributed tracing: Captures traces of every LLM call, tool invocation, retrieval step, and chain segment with latency, token counts, cost, and model metadata. Supports OpenTelemetry for integration with existing observability stacks.
- LLM-as-a-judge evaluation: Built-in evaluation pipelines that score traces against custom criteria (faithfulness, relevance, quality) using configurable LLM judges, with no separate evaluation library required.
- Human annotation queues: Route sampled production traces to human annotators for labeling, creating feedback loops for continuous improvement.
- Prompt management: Version-controlled prompt registry with A/B testing, rollback, and production/staging environments. Because prompt versions are linked to the traces that use them, prompt changes can be tracked alongside their downstream metric impact.
- Datasets and experiments: Create evaluation datasets from production traces, run experiments against them, and compare results across model/prompt/chain configurations.
- Self-hosting: Docker Compose deployment in under 5 minutes. Kubernetes Helm chart available. All data stays in the operator’s infrastructure. 8,000+ active self-hosted instances.
- Framework-agnostic SDK: Python and TypeScript SDKs with callback-based auto-instrumentation for LangChain and LlamaIndex, or manual decorator-based instrumentation for arbitrary applications.
Use Cases
- LLM application observability: Complete visibility into production LLM application behavior: which prompts fire, which models are called, what each request costs and how long it takes, and whether output quality is degrading.
- RAG pipeline debugging: Trace individual retrieval and generation steps to identify where in the pipeline quality problems originate.
- Prompt optimization: Version and A/B test prompts in production, tracking downstream metric impact via integrated evaluation.
- Compliance and audit: Full trace history for organizations that need to audit LLM decisions (e.g., financial services, healthcare), particularly when a self-hosted deployment keeps data on-premises.
- Team-scale LLM development: Shared trace history, annotation queues, and experiment comparison for teams where multiple engineers iterate on the same application.
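Prompt optimization via A/B testing combines three pieces: versioned prompts, a stable assignment of users to versions, and an evaluation score per version. The sketch below wires these together using hypothetical names (`REGISTRY`, `AB_SPLIT`, `assign_version`, `fake_llm`, `toy_judge`). It does not use Langfuse's prompt-management API and assumes the application, not the platform, picks the variant; the judge is a hard-coded stub standing in for an LLM-as-a-judge call.

```python
from __future__ import annotations

import hashlib
import statistics
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str


# Hypothetical registry holding two versions of one prompt.
REGISTRY = {
    ("support-answer", 1): PromptVersion("support-answer", 1, "Answer briefly: {question}"),
    ("support-answer", 2): PromptVersion("support-answer", 2,
                                         "Answer step by step, citing the docs: {question}"),
}
# Hypothetical traffic split: (version, share of users); shares sum to 1.
AB_SPLIT = {"support-answer": [(1, 0.5), (2, 0.5)]}


def assign_version(prompt_name: str, user_id: str) -> PromptVersion:
    """Deterministically bucket a user so they always see the same prompt version."""
    digest = hashlib.sha256(f"{prompt_name}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
    cumulative = 0.0
    for version, share in AB_SPLIT[prompt_name]:
        cumulative += share
        if bucket < cumulative:
            return REGISTRY[(prompt_name, version)]
    # Float rounding can push bucket to 1.0; fall back to the last version.
    return REGISTRY[(prompt_name, AB_SPLIT[prompt_name][-1][0])]


def fake_llm(rendered_prompt: str) -> str:
    """Stand-in for a model call."""
    if "citing the docs" in rendered_prompt:
        return "Per the docs, restart the worker."
    return "Restart it."


def toy_judge(output: str) -> float:
    """Stand-in for an LLM-as-a-judge call: rewards answers that cite docs."""
    return 1.0 if "docs" in output else 0.4


def run_ab_test(user_ids: list[str]) -> dict[int, float]:
    """Return the mean judge score per prompt version."""
    scores: dict[int, list[float]] = defaultdict(list)
    for uid in user_ids:
        prompt = assign_version("support-answer", uid)
        output = fake_llm(prompt.template.format(question="How do I fix a stuck job?"))
        scores[prompt.version].append(toy_judge(output))
    return {version: statistics.mean(s) for version, s in sorted(scores.items())}


print(run_ab_test([f"user-{i}" for i in range(1000)]))
```

Hashing the user ID, rather than drawing a random number per request, keeps each user on one version across requests, so per-version score differences are not muddied by users switching variants. Version 2 wins here only because the stub judge rewards the word "docs"; with a real judge, the per-version means would come from scored production traces.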
Adoption Level Analysis
Small teams (<20 engineers): Strong fit. Langfuse Cloud offers a free tier, and self-hosting via Docker Compose is genuinely simple (documented as a 5-minute setup). The framework-agnostic SDK and LangChain/LlamaIndex auto-instrumentation mean most small teams can add Langfuse in under an hour with minimal code changes.
Medium orgs (20–200 engineers): Strong fit. Because Langfuse bundles tracing, evaluation, prompt management, and datasets in one product, medium teams avoid stitching together several separate tools. The self-hosting option addresses data residency concerns that cloud-only alternatives cannot. The prompt management feature is particularly valuable for teams that iterate rapidly on prompts without a formal release process.
Enterprise (200+ engineers): Reasonable fit. 19 of the Fortune 50 reportedly use Langfuse, and the ClickHouse acquisition provides organizational backing for enterprise roadmap commitments. Enterprise-specific features (SCIM, advanced audit logs, dedicated support) are commercially licensed. SSO (SAML/OAuth) is MIT-licensed and available in self-hosted deployments. The ClickHouse acquisition may accelerate analytics capabilities but introduces strategic dependency on ClickHouse’s roadmap priorities.
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| LangSmith | Native LangChain tracing, tighter LangGraph integration | You are fully committed to LangChain/LangGraph with no self-hosting requirement |
| TruLens | Feedback functions injected into traces, stronger RAG diagnostic focus | You need span-level RAG pipeline diagnosis rather than a full platform |
| RAGAS | Pure evaluation library without tracing | You want only metrics without observability infrastructure |
| DeepEval | Pytest-native, CI/CD enforcement focus, 50+ metrics | You prioritize deployment gate enforcement over production observability |
| Arize Phoenix | Open-source; covers both ML and LLM observability, with dataset analysis | You need combined traditional ML + LLM observability |
Evidence & Sources
- Langfuse GitHub — 21k+ stars, MIT license
- ClickHouse acquires Langfuse announcement — Acquisition details
- Langfuse joining ClickHouse post — Company perspective on acquisition
- LLM Evaluation Frameworks Compared (Atlan 2026) — Independent comparison with RAGAS and TruLens
- Best LLM Observability Tools 2026 (Firecrawl) — Independent market review
- Langfuse self-hosting documentation — Technical self-hosting reference
Notes & Caveats
- ClickHouse acquisition (January 2026): Langfuse was acquired as part of ClickHouse’s $400M Series D. The acquisition is positioned as “roadmap stays the same, open source commitment maintained.” However, all acquisitions carry strategic risk — if ClickHouse’s priorities shift or the product is embedded more deeply into the ClickHouse commercial platform, the independent neutral positioning may erode. Monitor the open-source changelog for feature gatekeeping changes post-acquisition.
- Enterprise features are commercially licensed: SCIM and audit logs require a commercial enterprise license, while standard SSO (SAML/OAuth) remains MIT-licensed. Teams requiring SCIM for large user bases need to budget for commercial licensing.
- ClickHouse dependency in self-hosting: Langfuse’s self-hosted deployment requires a ClickHouse instance, in addition to Postgres, Redis, and S3-compatible blob storage. This is a meaningful infrastructure prerequisite: teams self-hosting need ClickHouse operational expertise or must rely on the simplified Docker Compose bundle, whose single-node ClickHouse is not suitable for very high trace volumes.
- Evaluation is secondary to tracing: While Langfuse has solid LLM-as-a-judge evaluation features, its strengths are tracing and prompt management. Teams that need the deepest evaluation capabilities (50+ metric types, adversarial testing) will likely still pair Langfuse with RAGAS or DeepEval.