Fish Audio

★ New
assess
AI / ML | vendor | Proprietary (commercial API); Fish Audio Research License (model weights, non-commercial only) | freemium

At a Glance

Commercial AI voice platform by 39 AI, Inc. offering a TTS and voice cloning API backed by the open-source Fish Speech model, with a marketplace of 2M+ voices and pay-as-you-go pricing positioned as a lower-cost ElevenLabs alternative.

Type: vendor
Pricing: freemium
License: Proprietary
Adoption fit: small, medium

Website: fish.audio | API Docs: docs.fish.audio | Company: 39 AI, Inc.

What It Does

Fish Audio is the commercial product by 39 AI, Inc. that wraps the Fish Speech open-source model into a managed TTS and voice cloning API. It offers a marketplace of 2M+ community-contributed voice profiles alongside a developer API for real-time streaming speech synthesis and zero-shot voice cloning. The company positions itself as a substantially cheaper alternative to ElevenLabs, citing approximately 6x lower per-character pricing.

The service provides a Python SDK with async support, a RESTful streaming API, sub-500ms latency for interactive applications, and a unified endpoint for both catalog voices and user-cloned voices. It also serves as the commercial licensing path for organisations that want to use Fish Speech commercially without self-hosting.
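The sub-500ms figure is a vendor claim, so it is worth measuring from your own environment. The following is a minimal sketch, assuming the streaming response is available as an iterable of byte chunks. The actual SDK or HTTP client call is deliberately left out; `time_to_first_chunk`, `fake_stream`, and `LATENCY_BUDGET_S` are hypothetical names used only for illustration.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple

LATENCY_BUDGET_S = 0.5  # the "sub-500ms" claim, expressed in seconds


def time_to_first_chunk(chunks: Iterable[bytes]) -> Tuple[float, Iterator[bytes]]:
    """Measure seconds until the first non-empty chunk arrives.

    Returns the elapsed time and an iterator that yields the first chunk
    followed by the rest of the stream. Empty leading chunks are dropped.
    """
    start = time.perf_counter()
    it = iter(chunks)
    first: Optional[bytes] = None
    for chunk in it:
        if chunk:
            first = chunk
            break
    elapsed = time.perf_counter() - start

    def replay() -> Iterator[bytes]:
        if first is not None:
            yield first
        yield from it

    return elapsed, replay()


def fake_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    time.sleep(delay_s)      # simulated network and synthesis delay
    yield b"\x00" * 320      # first audio chunk
    yield b"\x00" * 320      # second audio chunk


if __name__ == "__main__":
    elapsed, audio = time_to_first_chunk(fake_stream())
    total = sum(len(c) for c in audio)
    print(f"first audio after {elapsed * 1000:.0f} ms, {total} bytes total")
    print("within budget:", elapsed < LATENCY_BUDGET_S)
```

In a real evaluation, replace `fake_stream()` with the chunk iterator returned by whichever client you use, and repeat the measurement across regions and text lengths rather than relying on a single sample.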

Key Features

  • Pay-as-you-go API at $15 per million UTF-8 bytes (~12 hours of speech per $15)
  • 2M+ community voice marketplace with official and user-contributed voices
  • Zero-shot voice cloning from 10 seconds of reference audio via API
  • Streaming TTS with sub-500ms time-to-first-audio
  • Support for 70+ languages, grouped into declared quality tiers (Tier 1/2/3)
  • Official Python SDK with async support; standard REST for other languages
  • Free tier for Playground (non-commercial); paid plans starting from ~$5.50/month
  • Commercial licensing path for teams that want to self-host the Fish Speech model weights
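Because the pay-as-you-go rate is quoted per UTF-8 byte rather than per character, a rough budget can be computed directly from the encoded length of the text. The sketch below uses only the two figures quoted in the list above ($15 per million bytes, roughly 12 hours of speech per million bytes); the function names are illustrative, and the duration figure is an approximation that will vary with language and speaking rate.

```python
PRICE_USD_PER_MILLION_BYTES = 15.0     # pay-as-you-go rate quoted above
APPROX_HOURS_PER_MILLION_BYTES = 12.0  # vendor's rough figure, not a guarantee


def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def estimate_cost_usd(text: str) -> float:
    """Estimated API cost for synthesising the text once."""
    return utf8_bytes(text) * PRICE_USD_PER_MILLION_BYTES / 1_000_000


def estimate_speech_seconds(text: str) -> float:
    """Very rough audio duration, scaled from the ~12 hours per 1M bytes figure."""
    seconds_per_byte = APPROX_HOURS_PER_MILLION_BYTES * 3600 / 1_000_000
    return utf8_bytes(text) * seconds_per_byte


if __name__ == "__main__":
    script = "Hello, world! " * 1000  # 14 ASCII bytes repeated 1000 times
    print(f"bytes:   {utf8_bytes(script)}")
    print(f"cost:    ${estimate_cost_usd(script):.2f}")
    print(f"speech:  ~{estimate_speech_seconds(script) / 60:.1f} minutes")
```

Subscription plans may price differently, so treat this as an estimate for the pay-as-you-go tier only.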

Use Cases

  • Indie developers and small product teams: Cost-effective TTS API for apps, games, or content tools where ElevenLabs pricing is prohibitive
  • Multilingual content production: Batch voiceover generation across many languages using a single API
  • Voice cloning pipelines: Generating personalised voices for accessibility tools, content creators, or interactive media
  • Evaluation before self-hosting: Testing Fish Speech model quality via managed API before investing in self-hosted GPU infrastructure
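For the multilingual use case, note that byte-based billing means the same number of characters costs different amounts depending on the script: ASCII text takes one byte per character in UTF-8, Cyrillic two, and Japanese kana or CJK ideographs three. The sketch below illustrates this with four short sample strings, assuming the $15 per million UTF-8 bytes rate quoted earlier; the sample texts and function names are illustrative only.

```python
PRICE_USD_PER_MILLION_BYTES = 15.0  # pay-as-you-go rate quoted in Key Features

SAMPLES = {
    "English": "Hello",
    "German": "Straße",
    "Russian": "Привет",
    "Japanese": "こんにちは",
}


def bytes_per_char(text: str) -> float:
    """Average UTF-8 bytes per character (code point) in the text."""
    return len(text.encode("utf-8")) / len(text)


def cost_per_1000_chars_usd(text: str) -> float:
    """Cost of 1,000 characters with the same byte density as the sample."""
    return bytes_per_char(text) * 1000 * PRICE_USD_PER_MILLION_BYTES / 1_000_000


if __name__ == "__main__":
    for language, sample in SAMPLES.items():
        n_bytes = len(sample.encode("utf-8"))
        print(
            f"{language:<9} chars={len(sample):>2} bytes={n_bytes:>2} "
            f"usd_per_1000_chars={cost_per_1000_chars_usd(sample):.4f}"
        )
```

This is not a cost-per-minute comparison: languages also differ in how many characters are spoken per second, so a three-byte script is not necessarily three times as expensive per hour of audio.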

Adoption Level Analysis

Small teams (<20 engineers): Good fit. Low barrier to entry, pay-as-you-go with no minimum commitment, Python SDK, and reasonable quality. The free Playground allows evaluation before paying. Suitable for MVPs, hobby projects, and content tools.

Medium orgs (20–200 engineers): Fits with caveats. The API is production-capable and the pricing is competitive. Depending on a single-vendor managed service with no published SLA is a risk. The company is an early-stage startup (no disclosed funding rounds found), which adds longevity risk for mission-critical use.

Enterprise (200+ engineers): Does not fit well in its current state. No enterprise SLA, no on-premises option without a separate commercial license negotiation, no disclosed SOC2 or ISO 27001 certification, and limited public track record at enterprise scale. Regulated industries (healthcare, finance) should wait for more maturity.

Alternatives

| Alternative | Key Difference | Prefer when… |
| --- | --- | --- |
| ElevenLabs | More polished API, larger model selection, enterprise SLA | You need production reliability and enterprise compliance |
| Cartesia Sonic | Ultra-low latency (<100ms), focused on real-time voice agents | You’re building real-time conversational AI |
| PlayHT | Voice cloning API; more established commercial track record | You need a more mature vendor with published SLA |
| Microsoft Azure TTS | Enterprise-grade, SOC2, vast language support | You need enterprise compliance and existing Azure contract |
| Kokoro TTS (self-hosted) | Apache 2.0, small model, CPU-viable | You need fully open-source, no third-party dependency |

Notes & Caveats

  • Startup risk: No disclosed VC funding rounds or revenue figures were found. For a mission-critical TTS integration, vendor longevity is an open question. Coqui AI, the main prior open-source TTS company, announced its shutdown abruptly in January 2024 after running out of runway despite $3.3M in funding.
  • License duality: The underlying Fish Speech model weights are licensed under the Fish Audio Research License (non-commercial). Deploying the weights commercially without using the managed API requires a separate written license from Fish Audio. This creates a lock-in dynamic: evaluate for free, pay for production.
  • No SLA published: The public documentation does not disclose uptime guarantees, support tiers, or data retention/deletion policies. These gaps matter for enterprise procurement.
  • Training data provenance: The vendor's claim of “10M+ hours” of training audio is unaudited. No data card or third-party audit of speaker consent and copyright status has been published.
  • Commercial API separate from open-source model: Despite sharing the same underlying research, the API is a distinct commercial product. Updates to the open-source model and the API may diverge over time.
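One common way to limit the vendor-longevity and lock-in risks listed above is to keep TTS calls behind a thin provider interface so another service can be swapped in or used as a fallback. The following is a minimal sketch of that pattern, not anything Fish Audio or the other vendors provide; `SpeechProvider`, `synthesize_with_fallback`, and the two fake providers are hypothetical names, and real adapters would wrap each vendor's own client.

```python
from typing import List, Protocol, Sequence, Tuple


class SpeechProvider(Protocol):
    """Hypothetical interface each vendor adapter would implement."""

    name: str

    def synthesize(self, text: str) -> bytes: ...


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider raised an error."""


def synthesize_with_fallback(
    text: str, providers: Sequence[SpeechProvider]
) -> Tuple[str, bytes]:
    """Try providers in order; return (provider name, audio bytes) from the first success."""
    errors: List[str] = []
    for provider in providers:
        try:
            return provider.name, provider.synthesize(text)
        except Exception as exc:  # adapters may raise arbitrary client errors
            errors.append(f"{provider.name}: {exc}")
    raise AllProvidersFailed("; ".join(errors) or "no providers configured")


class FailingProvider:
    """Fake adapter that simulates an outage."""

    name = "primary"

    def synthesize(self, text: str) -> bytes:
        raise ConnectionError("service unavailable")


class StaticProvider:
    """Fake adapter that returns placeholder bytes instead of real audio."""

    name = "secondary"

    def synthesize(self, text: str) -> bytes:
        return b"AUDIO:" + text.encode("utf-8")


if __name__ == "__main__":
    used, audio = synthesize_with_fallback(
        "hello", [FailingProvider(), StaticProvider()]
    )
    print(f"served by {used}, {len(audio)} bytes")
```

Note that cloned voices and marketplace voice IDs are vendor-specific, so a fallback provider will generally not reproduce the same voice; the pattern mainly protects availability, not voice identity.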
