Fish Audio

Website: fish.audio | API Docs: docs.fish.audio | Company: 39 AI, Inc.

What It Does

Fish Audio is the commercial product by 39 AI, Inc. that wraps the Fish Speech open-source model into a managed TTS and voice cloning API. It offers a marketplace of 2M+ community-contributed voice profiles alongside a developer API for real-time streaming speech synthesis and zero-shot voice cloning. The company positions itself as a substantially cheaper alternative to ElevenLabs, citing approximately 6x lower per-character pricing.

The service provides a Python SDK with async support, a RESTful streaming API, sub-500ms latency for interactive applications, and a unified endpoint for both catalog voices and user-cloned voices. It also serves as the commercial licensing path for organisations that want to use Fish Speech commercially without self-hosting.
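One way to check the sub-500ms figure against your own network path is to time the first audio chunk from a streaming response. The sketch below is a minimal illustration, not the Fish Audio SDK: measure_first_chunk and fake_stream are hypothetical names, and fake_stream is only a stand-in for whatever iterable of byte chunks your HTTP client or SDK returns.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, bytes]:
    """Consume a chunked audio stream.

    Returns (seconds until the first non-empty chunk, all audio bytes).
    """
    start = time.perf_counter()
    first_chunk_s = None
    audio = bytearray()
    for chunk in chunks:
        if chunk and first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        audio.extend(chunk)
    if first_chunk_s is None:
        raise ValueError("stream produced no audio")
    return first_chunk_s, bytes(audio)


def fake_stream(delay_s: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Hypothetical stand-in for a real streaming TTS response."""
    for _ in range(n_chunks):
        time.sleep(delay_s)  # simulate network and synthesis delay
        yield b"\x00" * 4


ttfa_s, audio = measure_first_chunk(fake_stream())
print(f"time to first audio: {ttfa_s * 1000:.0f} ms, {len(audio)} bytes received")
```

For a real measurement, start the request inside the iterable you pass in (a generator does this naturally), so that connection setup is included in the timing.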

Key Features

Pay-as-you-go API at $15 per million UTF-8 bytes (~12 hours of speech per $15)
2M+ community voice marketplace with official and user-contributed voices
Zero-shot voice cloning from 10 seconds of reference audio via API
Streaming TTS with sub-500ms time-to-first-audio
Support for 70+ languages, which the vendor groups into Tier 1/2/3 quality levels
Official Python SDK with async support; standard REST for other languages
Free tier for Playground (non-commercial); paid plans starting from ~$5.50/month
Separate commercial license available for organisations that want to self-host the Fish Speech model weights
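Because the pay-as-you-go rate is metered in UTF-8 bytes rather than characters, the same number of characters can cost more in scripts that need multiple bytes per character. The sketch below estimates cost from the $15 per million bytes figure quoted above; the function names are illustrative, and the implied speech rate is simply derived from the "~12 hours per $15" figure rather than measured.

```python
PRICE_PER_MILLION_BYTES_USD = 15.0  # pay-as-you-go rate quoted in this entry

# Implied by "~12 hours of speech per $15": bytes of text per second of audio.
IMPLIED_BYTES_PER_SECOND = 1_000_000 / (12 * 3600)


def utf8_bytes(text: str) -> int:
    """Length of the text as billed: UTF-8 bytes, not characters."""
    return len(text.encode("utf-8"))


def estimate_cost_usd(text: str,
                      price_per_million: float = PRICE_PER_MILLION_BYTES_USD) -> float:
    """Estimated charge for synthesising the text at the given rate."""
    return utf8_bytes(text) * price_per_million / 1_000_000


samples = {
    "English": "Hello, world!",
    "Japanese": "こんにちは世界",
}
for label, text in samples.items():
    print(f"{label}: {len(text)} chars, {utf8_bytes(text)} bytes, "
          f"${estimate_cost_usd(text):.6f}")
```

The Japanese sample is 7 characters but 21 bytes, so multilingual batch jobs are better budgeted by byte count than by character count.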

Use Cases

Indie developers and small product teams: Cost-effective TTS API for apps, games, or content tools where ElevenLabs pricing is prohibitive
Multilingual content production: Batch voiceover generation across many languages using a single API
Voice cloning pipelines: Generating personalised voices for accessibility tools, content creators, or interactive media
Evaluation before self-hosting: Testing Fish Speech model quality via managed API before investing in self-hosted GPU infrastructure

Adoption Level Analysis

Small teams (<20 engineers): Good fit. Low barrier to entry, pay-as-you-go with no minimum commitment, Python SDK, and reasonable quality. The free Playground allows evaluation before paying. Suitable for MVPs, hobby projects, and content tools.

Medium orgs (20–200 engineers): Fits with caveats. The API is production-capable and the pricing is competitive, but depending on a single-vendor managed service with no published SLA details is a risk. The company is an early-stage startup with no disclosed funding rounds, which adds longevity risk for mission-critical use.

Enterprise (200+ engineers): Does not fit well in its current state. No enterprise SLA, no on-premises option without a separate commercial license negotiation, no disclosed SOC2 or ISO 27001 certification, and limited public track record at enterprise scale. Regulated industries (healthcare, finance) should wait until the vendor is more mature.

Alternatives

Alternative	Key Difference	Prefer when…
ElevenLabs	More polished API, larger model selection, enterprise SLA	You need production reliability and enterprise compliance
Cartesia Sonic	Ultra-low latency (<100ms), focused on real-time voice agents	You’re building real-time conversational AI
PlayHT	Voice cloning API; more established commercial track record	You need a more mature vendor with published SLA
Microsoft Azure TTS	Enterprise-grade, SOC2, vast language support	You need enterprise compliance and existing Azure contract
Kokoro TTS (self-hosted)	Apache 2.0, small model, CPU-viable	You need fully open-source, no third-party dependency

Evidence & Sources

Notes & Caveats

Startup risk: No disclosed VC funding rounds or revenue figures were found. For a mission-critical TTS integration, vendor longevity is an open question. Coqui AI, a prominent earlier open-source TTS company, announced its shutdown abruptly in January 2024 after running out of runway despite $3.3M in funding.
License duality: The underlying Fish Speech model weights are licensed under the Fish Audio Research License (non-commercial). Deploying the weights commercially without using the managed API requires a separate written license from Fish Audio. This creates a lock-in dynamic: evaluate for free, pay for production.
No published SLA: The public documentation does not disclose uptime guarantees, support tiers, or data retention/deletion policies. These gaps matter for enterprise procurement.
Training data provenance: The vendor's “10M+ hours” training-data claim is unaudited. No data card or third-party audit of speaker consent and copyright status has been published.
Commercial API separate from open-source model: Despite sharing the same underlying research, the API is a distinct commercial product. Updates to the open-source model and the API may diverge over time.

Related

Fish Speech

HeyGen