A cutting-edge open-weight text-to-speech model trained on over 200,000 hours of multilingual speech data.
Zonos TTS generates highly natural speech from text prompts, conditioned on speaker embeddings or audio prefixes. Only a few seconds of reference audio are needed for high-quality voice cloning.
The model offers precise control over speech parameters such as speaking rate, pitch variation, audio quality, and emotional coloring (happiness, fear, sadness, anger). Zonos TTS natively outputs 44 kHz audio for top-tier sound quality.
Get started
Generate high-quality TTS output from your text and a 10-30 second sample of the target speaker.
Improve speaker matching by adding an audio prefix to the text input; prefixes also enable behaviors such as whispering (see the sketch after the quickstart below).
Supports English, Japanese, Chinese, French, and German with natural pronunciation.
Precisely adjust speaking rate, pitch, audio quality, and emotional expression.
import torch
import torchaudio
from zonos.model import Zonos
from zonos.conditioning import make_cond_dict

# Initialize model
model = Zonos.from_pretrained("Zyphra/Zonos-v0.1-transformer", device="cuda")

# Load audio sample
wav, sampling_rate = torchaudio.load("assets/exampleaudio.mp3")
speaker = model.make_speaker_embedding(wav, sampling_rate)

# Generate speech
cond_dict = make_cond_dict(
    text="Hello, world!",
    speaker=speaker,
    language="en-us"
)
conditioning = model.prepare_conditioning(cond_dict)
codes = model.generate(conditioning)

# Save output
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("sample.wav", wavs[0], model.autoencoder.sampling_rate)
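To steer delivery with an audio prefix (for example, whispering), the prefix clip is encoded into audio codes and passed to generation. The sketch below continues from the quickstart above; the resampling step, the encode call, and the audio_prefix_codes keyword mirror the project's Gradio interface and should be verified against the current repository, and whisper_prefix.wav is a hypothetical file name.

# Continues from the quickstart: `model` and `conditioning` already exist.
# Assumption: the prefix must be mono and resampled to the autoencoder's rate.
prefix_wav, prefix_sr = torchaudio.load("assets/whisper_prefix.wav")  # hypothetical clip
prefix_wav = prefix_wav.mean(0, keepdim=True)
prefix_wav = torchaudio.functional.resample(
    prefix_wav, prefix_sr, model.autoencoder.sampling_rate
)
prefix_codes = model.autoencoder.encode(prefix_wav.unsqueeze(0).to("cuda"))

# Assumed keyword from the Gradio interface: generation continues in the
# style of the prefix audio (e.g. whispered delivery).
codes = model.generate(conditioning, audio_prefix_codes=prefix_codes)
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("prefixed_sample.wav", wavs[0], model.autoencoder.sampling_rate)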
To launch the Gradio web interface instead:

uv run gradio_interface.py
# or: python gradio_interface.py
Zonos TTS GitHub >>
Zonos TTS currently supports English, Japanese, Chinese, French, and German.
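The language is selected with the language tag passed to make_cond_dict. The tags below follow eSpeak NG conventions and are assumptions; if the import below fails, check zonos.conditioning for the authoritative list.

# Assumption: supported_language_codes is exported by zonos.conditioning.
from zonos.conditioning import make_cond_dict, supported_language_codes

print(supported_language_codes)  # inspect the exact accepted tags

# `speaker` is the embedding from the quickstart above; "ja" is an
# assumed eSpeak-style tag for Japanese.
cond_dict = make_cond_dict(text="こんにちは、世界!", speaker=speaker, language="ja")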
You can fine-tune the emotional tone by adjusting parameters such as happiness, anger, sadness, and fear in the settings.
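As a minimal sketch of emotion and prosody control, assuming make_cond_dict accepts an emotion weight vector along with speaking_rate and pitch_std (the parameter names and eight-way emotion ordering mirror the sliders in the Gradio interface and may differ between versions):

# Continues from the quickstart: `model` and `speaker` already exist.
# Assumed ordering: happiness, sadness, disgust, fear, surprise, anger,
# other, neutral. Values are relative weights, not probabilities.
cond_dict = make_cond_dict(
    text="I can't believe we actually won!",
    speaker=speaker,
    language="en-us",
    emotion=[0.8, 0.05, 0.05, 0.05, 0.3, 0.05, 0.1, 0.2],  # mostly happy, a bit surprised
    speaking_rate=15.0,  # assumed unit: phonemes per second
    pitch_std=60.0,      # higher values give more pitch variation
)
conditioning = model.prepare_conditioning(cond_dict)
codes = model.generate(conditioning)
wavs = model.autoencoder.decode(codes).cpu()
torchaudio.save("happy_sample.wav", wavs[0], model.autoencoder.sampling_rate)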
The real-time factor of Zonos TTS is approximately 2x on an RTX 4090, meaning it generates audio about twice as fast as it plays back.
Zonos TTS can be easily installed and deployed using the Docker files provided in our repository.
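Assuming the repository ships a standard Compose setup (the service layout and GPU wiring are not verified here), deployment would typically look like:

git clone https://github.com/Zyphra/Zonos.git
cd Zonos
docker compose up    # assumed to build the image and start the bundled interface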
Please refer to our license terms for information on commercial use.