Fish Audio
Website: fish.audio | API Docs: docs.fish.audio | Company: 39 AI, Inc.
What It Does
Fish Audio is the commercial product by 39 AI, Inc. that wraps the Fish Speech open-source model into a managed TTS and voice cloning API. It offers a marketplace of 2M+ community-contributed voice profiles alongside a developer API for real-time streaming speech synthesis and zero-shot voice cloning. The company positions itself as a substantially cheaper alternative to ElevenLabs, citing approximately 6x lower per-character pricing.
The service provides a Python SDK with async support, a RESTful streaming API, sub-500ms latency for interactive applications, and a unified endpoint for both catalog voices and user-cloned voices. The managed API is also the route to commercial use of Fish Speech for organisations that do not want to self-host.
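The sub-500ms figure is a vendor claim, so it is worth measuring from your own network before relying on it. The sketch below is one hedged way to time any streaming response that yields audio as byte chunks. It is not tied to the Fish Audio SDK: `measure_time_to_first_audio` and `fake_stream` are hypothetical names, and `fake_stream` merely stands in for a real streaming call.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_time_to_first_audio(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Consume a stream of audio chunks and time it.

    Returns (seconds_to_first_chunk, total_seconds, total_bytes).
    `chunks` is any iterable of bytes, not a specific SDK object.
    """
    start = clock()
    first = None
    total_bytes = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty chunks; they carry no audio
        if first is None:
            first = clock() - start  # time until the first non-empty chunk
        total_bytes += len(chunk)
    total = clock() - start
    if first is None:
        raise ValueError("stream produced no audio")
    return first, total, total_bytes


def fake_stream(first_delay: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(first_delay)  # simulated synthesis and network delay
    for _ in range(n_chunks):
        yield b"\x00" * 1024  # 1 KiB of placeholder audio


if __name__ == "__main__":
    ttfa, total, size = measure_time_to_first_audio(fake_stream())
    print(f"first audio after {ttfa * 1000:.0f} ms, {size} bytes in {total * 1000:.0f} ms")
```

With a real client, the timer only captures the full round trip if the request is sent lazily, that is, on the first iteration of the stream. If the client sends the request eagerly, start the clock before making the call instead.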
Key Features
- Pay-as-you-go API at $15 per million UTF-8 bytes (~12 hours of speech per $15)
- 2M+ community voice marketplace with official and user-contributed voices
- Zero-shot voice cloning from 10 seconds of reference audio via API
- Streaming TTS with sub-500ms time-to-first-audio
- 70+ language support (with declared Tier 1/2/3 quality tiers)
- Official Python SDK with async support; standard REST for other languages
- Free tier for Playground (non-commercial); paid plans starting from ~$5.50/month
- Separate commercial license available for self-hosting the Fish Speech model weights
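Because the pay-as-you-go rate is quoted per UTF-8 byte rather than per character, the same number of characters costs more in scripts that encode to several bytes each. The sketch below estimates cost and rough speech duration using only the two figures quoted in the list above ($15 per million bytes, roughly 12 hours per $15). The function names are hypothetical, and the duration estimate is a back-of-envelope derivation from the vendor's approximate figure, which will not hold well for multi-byte scripts.

```python
PRICE_USD_PER_MILLION_BYTES = 15.0  # pay-as-you-go rate quoted above
HOURS_PER_MILLION_BYTES = 12.0      # vendor's approximate figure, not measured


def utf8_bytes(text: str) -> int:
    """Billable size of the text, assuming billing counts UTF-8 bytes."""
    return len(text.encode("utf-8"))


def estimate_cost_usd(text: str) -> float:
    """Estimated pay-as-you-go cost of synthesising `text`."""
    return utf8_bytes(text) / 1_000_000 * PRICE_USD_PER_MILLION_BYTES


def estimate_speech_seconds(text: str) -> float:
    """Rough duration implied by the '~12 hours per million bytes' figure."""
    bytes_per_second = 1_000_000 / (HOURS_PER_MILLION_BYTES * 3600)
    return utf8_bytes(text) / bytes_per_second


if __name__ == "__main__":
    for sample in ("Hello, world!", "你好，世界！"):
        print(
            f"{sample!r}: {len(sample)} chars, {utf8_bytes(sample)} bytes, "
            f"${estimate_cost_usd(sample):.6f}"
        )
```

The English sample is 13 characters and 13 bytes, while the Chinese sample is 6 characters and 18 bytes, so per-character cost comparisons against other vendors depend on the language mix.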
Use Cases
- Indie developers and small product teams: Cost-effective TTS API for apps, games, or content tools where ElevenLabs pricing is prohibitive
- Multilingual content production: Batch voiceover generation across many languages using a single API
- Voice cloning pipelines: Generating personalised voices for accessibility tools, content creators, or interactive media
- Evaluation before self-hosting: Testing Fish Speech model quality via managed API before investing in self-hosted GPU infrastructure
Adoption Level Analysis
Small teams (<20 engineers): Good fit. Low barrier to entry, pay-as-you-go with no minimum commitment, Python SDK, and reasonable quality. The free Playground allows evaluation before paying. Suitable for MVPs, hobby projects, and content tools.
Medium orgs (20–200 engineers): Fits with caveats. The API is production-capable and competitively priced, but depending on a single-vendor managed service with no published SLA is a risk. The company is an early-stage startup (no disclosed funding rounds were found), which adds longevity risk for mission-critical use.
Enterprise (200+ engineers): Does not fit well in its current state. There is no enterprise SLA, no on-premises option without negotiating a separate commercial license, no disclosed SOC 2 or ISO 27001 certification, and a limited public track record at enterprise scale. Regulated industries (healthcare, finance) should wait until the offering matures.
Alternatives
| Alternative | Key Difference | Prefer when… |
|---|---|---|
| ElevenLabs | More polished API, larger model selection, enterprise SLA | You need production reliability and enterprise compliance |
| Cartesia Sonic | Ultra-low latency (<100ms), focused on real-time voice agents | You’re building real-time conversational AI |
| PlayHT | Voice cloning API; more established commercial track record | You need a more mature vendor with published SLA |
| Microsoft Azure TTS | Enterprise-grade, SOC 2, vast language support | You need enterprise compliance and already have an Azure contract |
| Kokoro TTS (self-hosted) | Apache 2.0, small model, CPU-viable | You need fully open-source, no third-party dependency |
Evidence & Sources
- Fish Audio API pricing and plans
- Fish Audio Review 2026 — AI Tool Analysis (independent)
- Best TTS APIs 2026 — developer comparison
- Open Source TTS Models 2026 — SiliconFlow guide
Notes & Caveats
- Startup risk: No disclosed VC funding rounds or revenue figures were found, so vendor longevity is an open question for a mission-critical TTS integration. Coqui AI, the most prominent earlier open-source TTS company, announced its shutdown in early 2024 after running out of runway despite a reported $3.3M in funding.
- License duality: The underlying Fish Speech model weights are licensed under the Fish Audio Research License (non-commercial). Deploying the weights commercially without using the managed API requires a separate written license from Fish Audio. This creates a lock-in dynamic: evaluate for free, pay for production.
- No published SLA: The public documentation does not disclose uptime guarantees, support tiers, or data retention and deletion policies. These gaps matter for enterprise procurement.
- Training data provenance: The vendor's claim of “10M+ hours” of training audio is unaudited. No data card or third-party audit of speaker consent and copyright status has been published.
- Commercial API separate from open-source model: Despite sharing the same underlying research, the API is a distinct commercial product. Updates to the open-source model and the API may diverge over time.