LiteLLM

★ New
assess
AI / ML · open-source · MIT

What It Does

LiteLLM is an open-source Python SDK and proxy server (AI Gateway) that provides a unified OpenAI-compatible API for calling 100+ LLM providers including OpenAI, Anthropic, Azure, AWS Bedrock, Google Vertex AI, Cohere, HuggingFace, vLLM, and NVIDIA NIM. It translates requests from a single API format into provider-specific formats, handling authentication, cost tracking, load balancing, fallbacks, rate limiting, and virtual key management.
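The translation step can be pictured with a toy sketch (stdlib only, not LiteLLM's internals): an OpenAI-format request is reshaped into Anthropic's Messages format, where the system prompt moves from a message with role "system" to a top-level field.

```python
# Toy illustration of request translation. This is NOT LiteLLM's actual code;
# it only shows the kind of reshaping the gateway performs per provider.

def to_anthropic(openai_request: dict) -> dict:
    """Reshape an OpenAI-style chat request into Anthropic Messages shape."""
    system_parts = [m["content"] for m in openai_request["messages"]
                    if m["role"] == "system"]
    chat = [m for m in openai_request["messages"] if m["role"] != "system"]
    translated = {
        "model": "claude-3-5-sonnet-20240620",        # provider-side model name
        "max_tokens": openai_request.get("max_tokens", 1024),
        "messages": chat,
    }
    if system_parts:
        # Anthropic takes the system prompt as a top-level field, not a message.
        translated["system"] = "\n".join(system_parts)
    return translated

req = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Hi"},
    ],
}
print(to_anthropic(req)["system"])  # prints: Be terse.
```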

The project is maintained by BerriAI (YC W23, $2.1M raised) and has significant community adoption with 41k+ GitHub stars and 1,300+ contributors. LiteLLM can be used as a Python library (from litellm import completion) or deployed as a containerized proxy server that acts as a drop-in replacement for the OpenAI API endpoint. Enterprise features (SSO, audit logs, custom SLAs) are available under a separate commercial license.
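The drop-in property can be sketched with nothing but the standard library: any OpenAI-style client works against the proxy once the base URL and key are swapped. The URL and virtual key below are placeholders, and the actual network call is left commented out so the sketch runs without a deployed proxy.

```python
import json
from urllib import request

# Placeholders: a locally deployed LiteLLM proxy and a virtual key it issued.
PROXY_URL = "http://localhost:4000/v1/chat/completions"
VIRTUAL_KEY = "sk-my-virtual-key"  # proxy-issued key, not a provider key

# The payload is plain OpenAI chat-completions format; the proxy routes it
# to whichever provider backs the requested model alias.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}],
}

req = request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json",
    },
)
# request.urlopen(req) would return an OpenAI-schema response; omitted here
# so the sketch does not require a running proxy.
```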

Key Features

  • Unified OpenAI-compatible API: Single endpoint format for 100+ LLM providers — existing OpenAI SDK code works by changing the base URL.
  • Cost tracking and budget controls: Spend attribution per virtual key, user, team, or organization with configurable budget limits.
  • Load balancing: Distributes requests across multiple model deployments with configurable routing strategies.
  • Automatic fallbacks: Switches to backup models/providers when primary fails (5xx errors, rate limits, timeouts).
  • Rate limiting: Configurable RPM/TPM (requests/tokens per minute) limits per key, team, or model.
  • Virtual key management: Issue API keys with per-key budgets, model access controls, and expiration.
  • Observability integrations: Built-in support for Langfuse, Arize Phoenix, OpenTelemetry, and logging to S3/GCS.
  • Guardrails: Content moderation and prompt injection detection (basic in OSS, advanced in Enterprise).
  • Prompt formatting: Automatic translation for HuggingFace model prompt templates.
  • Docker deployment: Official container image (ghcr.io/berriai/litellm) with PostgreSQL and optional Redis for state management.
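Several of these features are declared in the proxy's config file. A hypothetical config.yaml sketch follows; the field names track LiteLLM's documented config format, but treat the exact keys and values as illustrative placeholders rather than a verified configuration:

```yaml
model_list:
  - model_name: gpt-4o                 # alias that clients request
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o                 # second deployment of the same alias,
    litellm_params:                    # so requests are load-balanced
      model: azure/my-gpt4o-deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  fallbacks:
    - {"gpt-4o": ["anthropic/claude-3-5-sonnet-20240620"]}
```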

Use Cases

  • Platform team LLM governance: Centralizing all LLM access through a single gateway with cost controls, key management, and audit logging across multiple teams and projects.
  • Multi-provider failover: Applications that need automatic fallback from one provider to another (e.g., OpenAI -> Anthropic -> Azure) without application-level changes.
  • Cost optimization and tracking: Organizations needing granular spend visibility per team, project, or individual developer.
  • Model experimentation: Rapidly testing different LLM providers and models through a consistent API without code changes.
  • Self-hosted AI gateway: Organizations that cannot send prompts through a third-party SaaS gateway (e.g., OpenRouter) for data privacy reasons.
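The multi-provider failover behavior the gateway automates can be sketched in plain Python. The provider calls below are stand-ins, not real API clients; the point is the try-next-on-retryable-error loop that would otherwise live in application code.

```python
# Pattern sketch: automatic fallback across providers on retryable failures.
# call_openai / call_anthropic are stubs standing in for real provider calls.

class RateLimitError(Exception):
    """Stand-in for a provider 429 response."""

def call_openai(prompt: str) -> str:
    raise RateLimitError("429 from primary provider")

def call_anthropic(prompt: str) -> str:
    return f"anthropic: {prompt}"

def complete_with_fallback(prompt: str, providers) -> str:
    """Try each provider in order; move on only for retryable error classes."""
    last_err = None
    for call in providers:
        try:
            return call(prompt)
        except (RateLimitError, TimeoutError) as err:
            last_err = err  # retryable: fall through to the next provider
    raise last_err

print(complete_with_fallback("hello", [call_openai, call_anthropic]))
# prints: anthropic: hello
```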

Adoption Level Analysis

Small teams (<20 engineers): Good fit when used as the Python SDK (from litellm import completion) — minimal setup is required for basic multi-provider access. However, deploying the proxy server adds operational overhead (PostgreSQL, optional Redis, container management) that may be excessive for very small teams. Import time is slow (3-4 seconds), which is noticeable in scripts.

Medium orgs (20-200 engineers): Strong fit for the core use case — platform teams managing LLM access for multiple development teams. Virtual key management, cost attribution, and rate limiting are genuinely valuable at this scale. However, operational challenges emerge: PostgreSQL log storage degrades at 1M+ entries (hit within 10 days at 100k requests/day), Python GIL limits throughput under high concurrency, and memory leaks require worker recycling (max_requests_before_restart). Requires a dedicated platform engineer to operate.

Enterprise (200+ engineers): Fit is questionable without the Enterprise license and significant operational investment. The March 2026 supply chain attack (compromised PyPI packages harvested credentials) raises serious trust concerns for security-critical infrastructure. The company’s small size ($2.1M raised, <20 employees) creates sustainability and support risk. At sustained traffic above 500 RPS, Python-native performance limitations become material. Enterprises should evaluate Portkey or build a custom gateway on top of Go/Rust-based infrastructure.

Alternatives

| Alternative | Key difference | Prefer when… |
| --- | --- | --- |
| OpenRouter | Fully managed SaaS, 300+ models, 5% markup | You want zero infrastructure overhead and can tolerate a third-party intermediary |
| Portkey AI | Enterprise-grade managed gateway, now open-source, Go-based performance | You need production-grade throughput, guardrails, and enterprise governance |
| Vercel AI Gateway | Integrated with Vercel ecosystem, budget controls | You are already in the Vercel ecosystem |
| AWS Multi-Provider Gen AI Gateway | Native AWS integration, managed service | You are AWS-native and want a first-party solution |
| Direct provider APIs | No intermediary, maximum control, volume discounts | You use 1-2 providers and want direct SLAs and pricing |

Notes & Caveats

  • CRITICAL: March 2026 supply chain attack. PyPI packages v1.82.7 and v1.82.8 were compromised on March 24, 2026, containing credential-harvesting malware that exfiltrated SSH keys, cloud credentials, Kubernetes tokens, and database passwords. Packages were live for ~40 minutes. Docker image users were unaffected. BerriAI engaged Mandiant for forensics and rebuilt their CI/CD pipeline. Any team that installed via pip install litellm during the window must assume full credential compromise and rotate all secrets.
  • PostgreSQL log storage bottleneck. Request logs stored in PostgreSQL degrade performance significantly after 1M+ entries. At 100k requests/day, this threshold is hit within 10 days. Requires manual log rotation or archival — not handled automatically.
  • Python GIL throughput ceiling. As a Python application, LiteLLM inherits the Global Interpreter Lock constraint. At sustained traffic above 500 RPS, latency spikes are reported. Go-based alternatives (Portkey) maintain single-digit microsecond overhead at the same load.
  • Memory leaks require worker recycling. Production deployments need max_requests_before_restart configuration to periodically recycle workers, adding operational complexity.
  • Rapid release cadence creates stability risk. Multiple releases per day are common. This is good for feature velocity but creates a moving target for production pinning. The supply chain attack exploited this rapid release pattern.
  • Slow import time. from litellm import completion takes 3-4 seconds due to heavy dependencies. This is painful for scripts and CLI tools.
  • Small company risk. BerriAI has raised only $2.1M and employs fewer than 20 people. For infrastructure that sits on the critical path of all LLM API calls, the bus factor and support capacity are concerning.
  • Enterprise license is separate. SSO, audit logs, custom SLAs, and advanced guardrails require the Enterprise tier with custom pricing. The open-source version lacks these features.
  • Downstream ecosystem impact. LiteLLM is a transitive dependency of DSPy, MLflow, CrewAI, OpenHands, and other major AI frameworks. The supply chain attack demonstrated that a compromise of LiteLLM propagates across the ecosystem.
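The slow-import caveat above can be worked around in scripts and CLI tools with a lazy wrapper: a sketch, assuming the code only needs litellm.completion, that defers the 3-4 second import to the first call instead of paying it at startup.

```python
import importlib
import sys

# Lazy wrapper: importing this module is instant; litellm itself is only
# imported the first time completion() is actually called.
_completion = None

def completion(*args, **kwargs):
    """Drop-in stand-in for litellm.completion with deferred import."""
    global _completion
    if _completion is None:  # first call: pay the import cost now
        _completion = importlib.import_module("litellm").completion
    return _completion(*args, **kwargs)
```

This keeps CLI startup fast at the cost of a one-time delay on the first LLM call; it does nothing about the dependency weight itself.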