LiteLLM

★ New
assess
AI / ML · open-source · MIT

What It Does

LiteLLM is an open-source Python SDK and proxy server (AI Gateway) that provides a unified OpenAI-compatible API for calling 100+ LLM providers including OpenAI, Anthropic, Azure, AWS Bedrock, Google Vertex AI, Cohere, HuggingFace, vLLM, and NVIDIA NIM. It translates requests from a single API format into provider-specific formats, handling authentication, cost tracking, load balancing, fallbacks, rate limiting, and virtual key management.
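The translation step can be pictured with a toy sketch (stdlib only, not LiteLLM's internals): an OpenAI-format request is reshaped into Anthropic's Messages format, where the system prompt moves from a message with role "system" to a top-level field.

```python
# Toy illustration of request translation. This is NOT LiteLLM's actual code;
# it only shows the kind of reshaping the gateway performs per provider.

def to_anthropic(openai_request: dict) -> dict:
    """Reshape an OpenAI-style chat request into Anthropic Messages shape."""
    system_parts = [m["content"] for m in openai_request["messages"]
                    if m["role"] == "system"]
    chat = [m for m in openai_request["messages"] if m["role"] != "system"]
    translated = {
        "model": "claude-3-5-sonnet-20240620",        # provider-side model name
        "max_tokens": openai_request.get("max_tokens", 1024),
        "messages": chat,
    }
    if system_parts:
        # Anthropic takes the system prompt as a top-level field, not a message.
        translated["system"] = "\n".join(system_parts)
    return translated

req = {
    "model": "gpt-4o-mini",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Hi"},
    ],
}
print(to_anthropic(req)["system"])  # prints: Be terse.
```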

The project is maintained by BerriAI (YC W23, $2.1M raised) and has significant community adoption with 41k+ GitHub stars and 1,300+ contributors. LiteLLM can be used as a Python library (from litellm import completion) or deployed as a containerized proxy server that acts as a drop-in replacement for the OpenAI API endpoint. Enterprise features (SSO, audit logs, custom SLAs) are available under a separate commercial license.
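The drop-in property can be sketched with nothing but the standard library: any OpenAI-style client works against the proxy once the base URL and key are swapped. The URL and virtual key below are placeholders, and the actual network call is left commented out so the sketch runs without a deployed proxy.

```python
import json
from urllib import request

# Placeholders: a locally deployed LiteLLM proxy and a virtual key it issued.
PROXY_URL = "http://localhost:4000/v1/chat/completions"
VIRTUAL_KEY = "sk-my-virtual-key"  # proxy-issued key, not a provider key

# The payload is plain OpenAI chat-completions format; the proxy routes it
# to whichever provider backs the requested model alias.
payload = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}],
}

req = request.Request(
    PROXY_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json",
    },
)
# request.urlopen(req) would return an OpenAI-schema response; omitted here
# so the sketch does not require a running proxy.
```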

Key Features

  • Unified OpenAI-compatible API: Single endpoint format for 100+ LLM providers — existing OpenAI SDK code works by changing the base URL.
  • Cost tracking and budget controls: Spend attribution per virtual key, user, team, or organization with configurable budget limits.
  • Load balancing: Distributes requests across multiple model deployments with configurable routing strategies.
  • Automatic fallbacks: Switches to backup models/providers when primary fails (5xx errors, rate limits, timeouts).
  • Rate limiting: Configurable RPM/TPM (requests/tokens per minute) limits per key, team, or model.
  • Virtual key management: Issue API keys with per-key budgets, model access controls, and expiration.
  • Observability integrations: Built-in support for Langfuse, Arize Phoenix, OpenTelemetry, and logging to S3/GCS.
  • Guardrails: Content moderation and prompt injection detection (basic in OSS, advanced in Enterprise).
  • Prompt formatting: Automatic translation for HuggingFace model prompt templates.
  • Docker deployment: Official container image (ghcr.io/berriai/litellm) with PostgreSQL and optional Redis for state management.
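Several of these features are declared in the proxy's config file. A hypothetical config.yaml sketch follows; the field names track LiteLLM's documented config format, but treat the exact keys and values as illustrative placeholders rather than a verified configuration:

```yaml
model_list:
  - model_name: gpt-4o                 # alias that clients request
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o                 # second deployment of the same alias,
    litellm_params:                    # so requests are load-balanced
      model: azure/my-gpt4o-deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY

router_settings:
  routing_strategy: simple-shuffle
  fallbacks:
    - {"gpt-4o": ["anthropic/claude-3-5-sonnet-20240620"]}
```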

Use Cases

  • Platform team LLM governance: Centralizing all LLM access through a single gateway with cost controls, key management, and audit logging across multiple teams and projects.
  • Multi-provider failover: Applications that need automatic fallback from one provider to another (e.g., OpenAI -> Anthropic -> Azure) without application-level changes.
  • Cost optimization and tracking: Organizations needing granular spend visibility per team, project, or individual developer.
  • Model experimentation: Rapidly testing different LLM providers and models through a consistent API without code changes.
  • Self-hosted AI gateway: Organizations that cannot send prompts through a third-party SaaS gateway (e.g., OpenRouter) for data privacy reasons.
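The multi-provider failover behavior the gateway automates can be sketched in plain Python. The provider calls below are stand-ins, not real API clients; the point is the try-next-on-retryable-error loop that would otherwise live in application code.

```python
# Pattern sketch: automatic fallback across providers on retryable failures.
# call_openai / call_anthropic are stubs standing in for real provider calls.

class RateLimitError(Exception):
    """Stand-in for a provider 429 response."""

def call_openai(prompt: str) -> str:
    raise RateLimitError("429 from primary provider")

def call_anthropic(prompt: str) -> str:
    return f"anthropic: {prompt}"

def complete_with_fallback(prompt: str, providers) -> str:
    """Try each provider in order; move on only for retryable error classes."""
    last_err = None
    for call in providers:
        try:
            return call(prompt)
        except (RateLimitError, TimeoutError) as err:
            last_err = err  # retryable: fall through to the next provider
    raise last_err

print(complete_with_fallback("hello", [call_openai, call_anthropic]))
# prints: anthropic: hello
```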

Adoption Level Analysis

Small teams (<20 engineers): Good fit when used as the Python SDK (from litellm import completion) — minimal setup is required for basic multi-provider access. However, deploying the proxy server adds operational overhead (PostgreSQL, optional Redis, container management) that may be excessive for very small teams. Import time is slow (3-4 seconds), which is noticeable in scripts.

Medium orgs (20-200 engineers): Strong fit for the core use case — platform teams managing LLM access for multiple development teams. Virtual key management, cost attribution, and rate limiting are genuinely valuable at this scale. However, operational challenges emerge: PostgreSQL log storage degrades at 1M+ entries (hit within 10 days at 100k requests/day), Python GIL limits throughput under high concurrency, and memory leaks require worker recycling (max_requests_before_restart). Requires a dedicated platform engineer to operate.

Enterprise (200+ engineers): Fit is questionable without the Enterprise license and significant operational investment. The March 2026 supply chain attack (compromised PyPI packages harvested credentials) raises serious trust concerns for security-critical infrastructure. The company’s small size ($2.1M raised, <20 employees) creates sustainability and support risk. At sustained traffic above 500 RPS, Python-native performance limitations become material. Enterprises should evaluate Portkey or build a custom gateway on top of Go/Rust-based infrastructure.

Alternatives

| Alternative | Key difference | Prefer when… |
| --- | --- | --- |
| OpenRouter | Fully managed SaaS, 300+ models, 5% markup | You want zero infrastructure overhead and can tolerate a third-party intermediary |
| Portkey AI | Enterprise-grade managed gateway, now open-source, Go-based performance | You need production-grade throughput, guardrails, and enterprise governance |
| Vercel AI Gateway | Integrated with Vercel ecosystem, budget controls | You are already in the Vercel ecosystem |
| AWS Multi-Provider Gen AI Gateway | Native AWS integration, managed service | You are AWS-native and want a first-party solution |
| Direct provider APIs | No intermediary, maximum control, volume discounts | You use 1-2 providers and want direct SLAs and pricing |

Notes & Caveats

  • CRITICAL: March 2026 supply chain attack. PyPI packages v1.82.7 and v1.82.8 were compromised on March 24, 2026, containing credential-harvesting malware that exfiltrated SSH keys, cloud credentials, Kubernetes tokens, and database passwords. Packages were live for ~40 minutes. Docker image users were unaffected. BerriAI engaged Mandiant for forensics and rebuilt their CI/CD pipeline. Any team that installed via pip install litellm during the window must assume full credential compromise and rotate all secrets.
  • PostgreSQL log storage bottleneck. Request logs stored in PostgreSQL degrade performance significantly after 1M+ entries. At 100k requests/day, this threshold is hit within 10 days. Requires manual log rotation or archival — not handled automatically.
  • Python GIL throughput ceiling. As a Python application, LiteLLM inherits the Global Interpreter Lock constraint. At sustained traffic above 500 RPS, latency spikes are reported. Go-based alternatives (Portkey) maintain single-digit microsecond overhead at the same load.
  • Memory leaks require worker recycling. Production deployments need max_requests_before_restart configuration to periodically recycle workers, adding operational complexity.
  • Rapid release cadence creates stability risk. Multiple releases per day are common. This is good for feature velocity but creates a moving target for production pinning. The supply chain attack exploited this rapid release pattern.
  • Slow import time. from litellm import completion takes 3-4 seconds due to heavy dependencies. This is painful for scripts and CLI tools.
  • Small company risk. BerriAI has raised only $2.1M and employs fewer than 20 people. For infrastructure that sits on the critical path of all LLM API calls, the bus factor and support capacity are concerning.
  • Enterprise license is separate. SSO, audit logs, custom SLAs, and advanced guardrails require the Enterprise tier with custom pricing. The open-source version lacks these features.
  • Downstream ecosystem impact. LiteLLM is a transitive dependency of DSPy, MLflow, CrewAI, OpenHands, and other major AI frameworks. The supply chain attack demonstrated that a compromise of LiteLLM propagates across the ecosystem.
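The slow-import caveat above can be worked around in scripts and CLI tools with a lazy wrapper: a sketch, assuming the code only needs litellm.completion, that defers the 3-4 second import to the first call instead of paying it at startup.

```python
import importlib
import sys

# Lazy wrapper: importing this module is instant; litellm itself is only
# imported the first time completion() is actually called.
_completion = None

def completion(*args, **kwargs):
    """Drop-in stand-in for litellm.completion with deferred import."""
    global _completion
    if _completion is None:  # first call: pay the import cost now
        _completion = importlib.import_module("litellm").completion
    return _completion(*args, **kwargs)
```

This keeps CLI startup fast at the cost of a one-time delay on the first LLM call; it does nothing about the dependency weight itself.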