🎤 ZipVoice: Zero-shot Vietnamese Text-to-Speech Synthesis using Flow Matching with only 123M parameters.

The model was trained on approximately 2,500 hours of data using an RTX 3090 GPU.

Enter text and upload a sample voice to generate natural speech.
