Langfuse

★ New
trial
AI / ML open-source MIT freemium

At a Glance

Open-source LLM engineering platform (MIT-licensed, 21k+ GitHub stars) covering tracing, evaluation, prompt management, and datasets. Self-hostable via Docker Compose in minutes. Acquired by ClickHouse in January 2026.

Type
open-source
Pricing
freemium
License
MIT
Adoption fit
small, medium, enterprise

What It Does

Langfuse is an open-source LLM engineering platform that covers the full lifecycle of LLM application development and production: distributed tracing of LLM calls and agent steps, evaluation via LLM-as-a-judge or human annotation, prompt management with versioning and A/B testing, and dataset management for systematic testing. It is framework-agnostic, integrating via SDKs (Python, TypeScript/JS), OpenTelemetry, and native integrations with LangChain, LlamaIndex, LiteLLM, OpenAI SDK, and more.

Langfuse was founded in 2023 by Marc Klingen, Max Deichmann, and Clemens Rawert (YC W23) and uses ClickHouse as its analytics backend. In January 2026, ClickHouse acquired Langfuse alongside its $400M Series D at a $15B valuation, making Langfuse part of ClickHouse’s AI observability platform strategy. The project ended 2025 with 21k+ GitHub stars, 26M+ monthly SDK installs, 8,000+ monthly active self-hosted instances, and customers including 19 of the Fortune 50 and 63 of the Fortune 500.

Key Features

  • Distributed tracing: Captures traces of every LLM call, tool invocation, retrieval step, and chain segment with latency, token counts, cost, and model metadata. Supports OpenTelemetry for integration with existing observability stacks.
  • LLM-as-a-judge evaluation: Built-in evaluation pipelines scoring traces against custom criteria (faithfulness, relevance, quality) using configurable LLM judges without requiring a separate evaluation library.
  • Human annotation queues: Route sampled production traces to human annotators for labeling, creating feedback loops for continuous improvement.
  • Prompt management: Version-controlled prompt registry with A/B testing, rollback, and production/staging environments. Prompt changes tracked alongside their downstream metric impact.
  • Datasets and experiments: Create evaluation datasets from production traces, run experiments against them, and compare results across model/prompt/chain configurations.
  • Self-hosting: Docker Compose deployment in under 5 minutes. Kubernetes Helm chart available. All data stays in the operator’s infrastructure. The project reports 8,000+ monthly active self-hosted instances.
  • Framework-agnostic SDK: Python and TypeScript SDKs with callback-based auto-instrumentation for LangChain and LlamaIndex, or manual decorator-based instrumentation for arbitrary applications.
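
To make the tracing and decorator-based instrumentation items above more concrete, here is a minimal sketch of what such instrumentation records: nested spans with latency and metadata such as token counts. It is written in plain Python and is not the Langfuse SDK; the names `traced`, `annotate`, `Span`, and `finished_traces` are hypothetical, and the model name and token counts are made up.

```python
import functools
import time
from contextvars import ContextVar
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Span:
    """One traced step: an LLM call, a retrieval, a tool invocation, etc."""
    name: str
    children: list["Span"] = field(default_factory=list)
    latency_ms: float = 0.0
    metadata: dict[str, Any] = field(default_factory=dict)


# The span currently executing, so nested calls can attach to their parent.
_current_span: ContextVar[Optional[Span]] = ContextVar("current_span", default=None)
# Completed root spans; a stand-in for exporting to a tracing backend.
finished_traces: list[Span] = []


def traced(name: str) -> Callable:
    """Hypothetical decorator: records a span around each call of the wrapped function."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = Span(name=name)
            if parent is not None:
                parent.children.append(span)
            token = _current_span.set(span)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                span.latency_ms = (time.perf_counter() - start) * 1000
                _current_span.reset(token)
                if parent is None:
                    finished_traces.append(span)
        return wrapper
    return decorator


def annotate(**metadata: Any) -> None:
    """Attach metadata (model name, token counts, ...) to the active span."""
    span = _current_span.get()
    if span is not None:
        span.metadata.update(metadata)


@traced("retrieve")
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]


@traced("generate")
def generate(query: str, docs: list[str]) -> str:
    # A real application would call a model here; these values are illustrative.
    annotate(model="example-model", input_tokens=42, output_tokens=7)
    return f"answer to {query!r} using {len(docs)} doc(s)"


@traced("rag_pipeline")
def answer(query: str) -> str:
    return generate(query, retrieve(query))
```

Calling `answer("langfuse")` leaves one root span named `rag_pipeline` in `finished_traces`, with `retrieve` and `generate` as children. A hosted or self-hosted tracing backend would receive this tree instead of an in-memory list.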

Use Cases

  • LLM application observability: Complete visibility into production LLM application behavior: which prompts fire, which models are called, what the latency and cost per request are, and whether output quality is degrading.
  • RAG pipeline debugging: Trace individual retrieval and generation steps to identify where in the pipeline quality problems originate.
  • Prompt optimization: Version and A/B test prompts in production, tracking downstream metric impact via integrated evaluation.
  • Compliance and audit: Full trace history for organizations that need to audit LLM decisions (financial services, healthcare). This is particularly useful with a self-hosted deployment that keeps data on-premises.
  • Team-scale LLM development: Shared trace history, annotation queues, and experiment comparison for teams where multiple engineers iterate on the same application.
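
The prompt optimization use case above can be sketched as follows. This is an illustrative, library-independent example of A/B testing prompt versions: each user is deterministically assigned to one variant, and evaluation scores are then averaged per version. The names `PromptVersion`, `assign_variant`, and `mean_score_by_version` are hypothetical and do not correspond to the Langfuse API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptVersion:
    name: str
    version: int
    template: str


def assign_variant(user_id: str, variants: list[PromptVersion], experiment: str) -> PromptVersion:
    """Deterministically map a user to one variant so repeat requests see the same prompt."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def mean_score_by_version(scores: list[tuple[int, float]]) -> dict[int, float]:
    """Average evaluation scores per prompt version; input is (version, score) pairs."""
    grouped: dict[int, list[float]] = {}
    for version, score in scores:
        grouped.setdefault(version, []).append(score)
    return {version: sum(vals) / len(vals) for version, vals in grouped.items()}


variants = [
    PromptVersion("support-answer", 1, "Answer the question: {question}"),
    PromptVersion("support-answer", 2, "Answer briefly and cite sources: {question}"),
]
```

Hashing the experiment name together with the user ID keeps assignments stable across requests while letting different experiments split the same users differently.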

Adoption Level Analysis

Small teams (<20 engineers): Strong fit. The cloud-hosted tier has a free plan, and self-hosting via Docker Compose is documented as a 5-minute setup. The framework-agnostic SDK and LangChain/LlamaIndex auto-instrumentation mean most small teams can add Langfuse in under an hour with near-zero code changes.

Medium orgs (20–200 engineers): Strong fit. Langfuse’s unified product (tracing + evaluation + prompt management + datasets) means medium teams avoid stitching together several separate tools. The self-hosting option addresses data residency concerns that cloud-only alternatives cannot. The prompt management feature is particularly valuable for teams iterating rapidly on prompts without a formal release process.

Enterprise (200+ engineers): Reasonable fit. 19 of the Fortune 50 reportedly use Langfuse. Enterprise-specific features (SCIM, advanced audit logs, dedicated support) are commercially licensed, while SSO (SAML/OAuth) is MIT-licensed and available in self-hosted deployments. The ClickHouse acquisition provides organizational backing for enterprise roadmap commitments and may accelerate analytics capabilities, but it also introduces a strategic dependency on ClickHouse’s roadmap priorities.

Alternatives

Alternative | Key Difference | Prefer when…
LangSmith | Native LangChain tracing, tighter LangGraph integration | You are fully committed to LangChain/LangGraph with no self-hosting requirement
TruLens | Feedback functions injected into traces, stronger RAG diagnostic focus | You need span-level RAG pipeline diagnosis rather than a full platform
RAGAS | Pure evaluation library without tracing | You want only metrics without observability infrastructure
DeepEval | Pytest-native, CI/CD enforcement focus, 50+ metrics | You prioritize deployment gate enforcement over production observability
Arize Phoenix | Open-source, strong on ML observability + LLM, dataset analysis | You need combined traditional ML + LLM observability

Notes & Caveats

  • ClickHouse acquisition (January 2026): Langfuse was acquired alongside ClickHouse’s $400M Series D. The acquisition is positioned as “roadmap stays the same, open source commitment maintained.” The strategic risk is that ClickHouse’s priorities shift or that the product is embedded more deeply into the ClickHouse commercial platform, either of which could erode Langfuse’s independent, neutral positioning. Monitor the open-source changelog for feature gatekeeping changes post-acquisition.
  • Enterprise features are commercially licensed: SCIM and audit logs require a commercial enterprise license. Regular SSO (SAML/OAuth) remains MIT-licensed. Teams requiring SCIM for large user bases need to budget for commercial licensing.
  • ClickHouse dependency in self-hosting: Langfuse’s self-hosted deployment requires a ClickHouse instance. This is a meaningful infrastructure prerequisite: teams self-hosting need ClickHouse operational expertise or must use the simplified Docker Compose bundle, which embeds a single-node ClickHouse and is not suitable for very high trace volumes.
  • Evaluation is secondary to tracing: While Langfuse has solid LLM-judge evaluation features, its strength is tracing and prompt management. Teams that need the deepest evaluation capabilities (50+ metric types, adversarial testing) will likely still want RAGAS or DeepEval alongside Langfuse.
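
For context on the LLM-judge evaluation mentioned in the last caveat, the following is a minimal sketch of the general pattern: format a rubric prompt, send it to a judge model, and parse a numeric score from the reply. It is not Langfuse’s implementation. The prompt wording, the 1-to-5 scale, and the names `judge`, `parse_score`, and `fake_model` are assumptions made for illustration, and `fake_model` stands in for a real model call.

```python
import re
from typing import Callable, Optional

# Illustrative rubric prompt; real judges usually carry more detailed criteria.
JUDGE_PROMPT = (
    "Rate the answer for {criterion} on a scale from 1 to 5.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with 'Score: <n>' followed by a one-sentence reason."
)


def parse_score(reply: str) -> Optional[int]:
    """Extract a 1-5 score from the judge's reply, or None if it is missing or out of range."""
    match = re.search(r"Score:\s*([1-5])\b", reply)
    return int(match.group(1)) if match else None


def judge(question: str, answer: str, criterion: str,
          call_model: Callable[[str], str]) -> Optional[int]:
    """Ask a judge model (any callable from prompt to reply) to score one answer."""
    prompt = JUDGE_PROMPT.format(criterion=criterion, question=question, answer=answer)
    return parse_score(call_model(prompt))


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call so the sketch runs offline.
    return "Score: 4. The answer is mostly supported by the context."
```

Passing the model as a callable keeps the scoring logic independent of any provider. Returning `None` for unparseable replies lets the caller decide whether to retry or discard the sample.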
