What It Does
LiteLLM is an open-source Python SDK and proxy server (AI Gateway) that provides a unified OpenAI-compatible API for calling 100+ LLM providers including OpenAI, Anthropic, Azure, AWS Bedrock, Google Vertex AI, Cohere, HuggingFace, vLLM, and NVIDIA NIM. It translates requests from a single API format into provider-specific formats, handling authentication, cost tracking, load balancing, fallbacks, rate limiting, and virtual key management.
The project is maintained by BerriAI (YC W23, $2.1M raised) and has significant community adoption with 41k+ GitHub stars and 1,300+ contributors. LiteLLM can be used as a Python library (from litellm import completion) or deployed as a containerized proxy server that acts as a drop-in replacement for the OpenAI API endpoint. Enterprise features (SSO, audit logs, custom SLAs) are available under a separate commercial license.
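The provider-prefix convention the SDK uses for routing can be illustrated with a minimal sketch (this is a conceptual illustration, not LiteLLM's actual parser; the function name is ours):

```python
def split_provider(model: str, default_provider: str = "openai"):
    """Split a LiteLLM-style model string such as "anthropic/claude-3-opus"
    into (provider, model_name). Bare model names default to OpenAI."""
    if "/" in model:
        provider, _, name = model.partition("/")
        return provider, name
    return default_provider, model

# The same completion() call shape then targets different backends:
#   completion(model="gpt-4o", ...)                      -> OpenAI
#   completion(model="anthropic/claude-3-5-sonnet", ...) -> Anthropic
print(split_provider("bedrock/anthropic.claude-v2"))
```

The point is that switching providers is a string change, not a client-library change; LiteLLM handles the per-provider request translation behind this naming scheme.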
Key Features
- Unified OpenAI-compatible API: Single endpoint format for 100+ LLM providers — existing OpenAI SDK code works by changing the base URL.
- Cost tracking and budget controls: Spend attribution per virtual key, user, team, or organization with configurable budget limits.
- Load balancing: Distributes requests across multiple model deployments with configurable routing strategies.
- Automatic fallbacks: Switches to backup models/providers when primary fails (5xx errors, rate limits, timeouts).
- Rate limiting: Configurable RPM/TPM (requests/tokens per minute) limits per key, team, or model.
- Virtual key management: Issue API keys with per-key budgets, model access controls, and expiration.
- Observability integrations: Built-in support for Langfuse, Arize Phoenix, OpenTelemetry, and logging to S3/GCS.
- Guardrails: Content moderation and prompt injection detection (basic in OSS, advanced in Enterprise).
- Prompt formatting: Automatic translation for HuggingFace model prompt templates.
- Docker deployment: Official container image (ghcr.io/berriai/litellm) with PostgreSQL and optional Redis for state management.
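Several of these features are configured through the proxy's config file. A minimal illustrative config.yaml is sketched below; the model aliases and environment-variable references are examples, and the exact schema should be checked against the LiteLLM documentation:

```yaml
model_list:
  - model_name: gpt-4o                # alias that clients request
    litellm_params:
      model: openai/gpt-4o            # provider-prefixed target
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-backup
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  fallbacks:
    - gpt-4o: ["claude-backup"]       # on 5xx / rate limit / timeout, retry here
```

With a config like this, clients keep sending OpenAI-format requests for "gpt-4o" and the gateway transparently handles provider credentials and failover.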
Use Cases
- Platform team LLM governance: Centralizing all LLM access through a single gateway with cost controls, key management, and audit logging across multiple teams and projects.
- Multi-provider failover: Applications that need automatic fallback from one provider to another (e.g., OpenAI -> Anthropic -> Azure) without application-level changes.
- Cost optimization and tracking: Organizations needing granular spend visibility per team, project, or individual developer.
- Model experimentation: Rapidly testing different LLM providers and models through a consistent API without code changes.
- Self-hosted AI gateway: Organizations that cannot send prompts through a third-party SaaS gateway (e.g., OpenRouter) for data privacy reasons.
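The multi-provider failover pattern above can be sketched in plain Python. This is a conceptual illustration of what the gateway does internally, not LiteLLM's implementation; all names here are ours:

```python
def call_with_fallbacks(providers, prompt):
    """Try each (name, call_fn) pair in order; return the first success.

    Mirrors the gateway pattern: retriable failures (5xx, rate limits,
    timeouts) trigger a fallback to the next deployment in the chain.
    """
    last_error = None
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except Exception as exc:  # a real gateway retries only retriable errors
            last_error = exc
    raise RuntimeError("all providers failed") from last_error


# Usage with stub callables standing in for real provider clients:
def flaky_primary(prompt):
    raise TimeoutError("primary timed out")

chain = [("openai", flaky_primary), ("anthropic", lambda p: f"answer to {p!r}")]
print(call_with_fallbacks(chain, "hello"))
```

The value of the gateway is that this retry chain lives in one place instead of being re-implemented inside every application.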
Adoption Level Analysis
Small teams (<20 engineers): Good fit when used as the Python SDK (from litellm import completion); minimal setup is needed for basic multi-provider access. The proxy server deployment, however, adds operational overhead (PostgreSQL, optional Redis, container management) that may be excessive for very small teams. Import time is slow (3-4 seconds), which is noticeable in scripts.
Medium orgs (20-200 engineers): Strong fit for the core use case — platform teams managing LLM access for multiple development teams. Virtual key management, cost attribution, and rate limiting are genuinely valuable at this scale. However, operational challenges emerge: PostgreSQL log storage degrades at 1M+ entries (hit within 10 days at 100k requests/day), Python GIL limits throughput under high concurrency, and memory leaks require worker recycling (max_requests_before_restart). Requires a dedicated platform engineer to operate.
Enterprise (200+ engineers): Fit is questionable without the Enterprise license and significant operational investment. The March 2026 supply chain attack (compromised PyPI packages harvested credentials) raises serious trust concerns for security-critical infrastructure. The company’s small size ($2.1M raised, <20 employees) creates sustainability and support risk. At sustained traffic above 500 RPS, Python-native performance limitations become material. Enterprises should evaluate Portkey or build a custom gateway on top of Go/Rust-based infrastructure.
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| OpenRouter | Fully managed SaaS, 300+ models, 5% markup | You want zero infrastructure overhead and can tolerate a third-party intermediary |
| Portkey AI | Enterprise-grade managed gateway, now open-source, Go-based performance | You need production-grade throughput, guardrails, and enterprise governance |
| Vercel AI Gateway | Integrated with Vercel ecosystem, budget controls | You are already in the Vercel ecosystem |
| AWS Multi-Provider Gen AI Gateway | Native AWS integration, managed service | You are AWS-native and want a first-party solution |
| Direct provider APIs | No intermediary, maximum control, volume discounts | You use 1-2 providers and want direct SLAs and pricing |
Evidence & Sources
- GitHub: BerriAI/litellm — 41k+ stars, 1,300+ contributors
- LiteLLM official documentation
- TrueFoundry: LiteLLM Review 2026 — independent review with pros/cons
- DEV Community: 5 Real Issues With LiteLLM (2026)
- DEV Community: LiteLLM Issues in Production
- LiteLLM security update: March 2026 supply chain incident
- Trend Micro: Inside the LiteLLM Supply Chain Compromise
- HeroDevs: The LiteLLM Supply Chain Attack
- InfoWorld: LiteLLM open-source gateway
- Y Combinator: LiteLLM company page
Notes & Caveats
- CRITICAL: March 2026 supply chain attack. PyPI packages v1.82.7 and v1.82.8 were compromised on March 24, 2026, containing credential-harvesting malware that exfiltrated SSH keys, cloud credentials, Kubernetes tokens, and database passwords. The packages were live for ~40 minutes. Docker image users were unaffected. BerriAI engaged Mandiant for forensics and rebuilt its CI/CD pipeline. Any team that installed via pip install litellm during the window must assume full credential compromise and rotate all secrets.
- PostgreSQL log storage bottleneck. Request logs stored in PostgreSQL degrade performance significantly after 1M+ entries. At 100k requests/day, this threshold is hit within 10 days. Requires manual log rotation or archival; this is not handled automatically.
- Python GIL throughput ceiling. As a Python application, LiteLLM inherits the Global Interpreter Lock constraint. At sustained traffic above 500 RPS, latency spikes are reported. Go-based alternatives (Portkey) maintain single-digit microsecond overhead at the same load.
- Memory leaks require worker recycling. Production deployments need the max_requests_before_restart setting to periodically recycle workers, adding operational complexity.
- Rapid release cadence creates stability risk. Multiple releases per day are common. This is good for feature velocity but creates a moving target for production pinning. The supply chain attack exploited this rapid release pattern.
- Slow import time. from litellm import completion takes 3-4 seconds due to heavy dependencies, which is painful for scripts and CLI tools.
- Small company risk. BerriAI has raised only $2.1M and employs fewer than 20 people. For infrastructure that sits on the critical path of all LLM API calls, the bus factor and support capacity are concerning.
- Enterprise license is separate. SSO, audit logs, custom SLAs, and advanced guardrails require the Enterprise tier with custom pricing. The open-source version lacks these features.
- Downstream ecosystem impact. LiteLLM is a transitive dependency of DSPy, MLflow, CrewAI, OpenHands, and other major AI frameworks. The supply chain attack demonstrated that a compromise of LiteLLM propagates across the ecosystem.
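Given the supply chain incident noted above, teams installing from PyPI commonly pin an exact, pre-incident version with hash verification. An illustrative requirements.txt fragment (the version and hash placeholder below are examples, not vetted values):

```text
# Install with: pip install --require-hashes -r requirements.txt
litellm==1.82.6 \
    --hash=sha256:<checksum copied from a trusted build or lockfile>
```

Note that --require-hashes demands a hash for every transitive dependency as well; a lockfile generator such as pip-compile --generate-hashes (from pip-tools) produces the full pinned set.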