Sonus

A massively multilingual zero-shot text-to-speech synthesis system

Overview

Sonus is an advanced multilingual zero-shot text-to-speech synthesis system supporting over 600 languages. Built on a novel architecture, it delivers high-quality speech generation with superior inference speed, supporting voice cloning and voice design capabilities.

Key Features

600+ Languages Supported: Broad language coverage for zero-shot TTS
Voice Cloning: High-quality voice cloning from short reference audio
Voice Design: Control voices via speaker attributes (gender, age, pitch, accent, etc.)
Fine-grained Control: Support for non-verbal symbols and pronunciation correction
Fast Inference: Optimized for real-time and batch processing

Installation

pip install torch torchaudio
pip install transformers

Quick Start

from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained("comethrusws/sonus", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("comethrusws/sonus", trust_remote_code=True)

# Load to device
model = model.to("cuda")

# Generate speech
text = "Hello, this is a test of voice synthesis."

Model Specifications

Architecture: Diffusion language model-style
Parameters: 0.6B
Sampling Rate: 24 kHz
Languages: 600+

License

This project is available under a custom license.

Non-commercial use: Free for personal projects, research, and educational purposes
Commercial use: Requires explicit permission. Contact inquiry@sagea.space for licensing inquiries

See LICENSE file for full terms.

Disclaimer

Users are prohibited from using this model for unauthorized voice cloning, impersonation, fraud, or any illegal activities. Ensure compliance with applicable laws and ethical standards.

Downloads last month: -

Safetensors

Model size

0.6B params

Tensor type

I64

F32