[GitHub Trending] OpenBMB/VoxCPM

7.9 relevance

Tokenizer-free TTS model; technically deep, but not core to the reader's interests.

2026-06-01 Open Source github.com

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

Summary

OpenBMB released VoxCPM2, a 2B-parameter tokenizer-free TTS model that uses a diffusion-autoregressive architecture on a MiniCPM-4 backbone and was trained on more than 2 million hours of speech across 30 languages. It supports voice design from text descriptions and controllable cloning from short audio clips, and it outputs 48 kHz studio-quality audio via AudioVAE V2 with built-in super-resolution. Streaming synthesis reaches a real-time factor (RTF) of about 0.3 on an RTX 4090, dropping to about 0.13 with Nano-vLLM or vLLM-Omni acceleration. The model is fully open source under Apache-2.0.
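
To put the RTF figures in context: real-time factor is conventionally defined as synthesis time divided by the duration of the audio produced, so any value below 1.0 means speech is generated faster than it plays back. The sketch below is plain arithmetic on the two figures quoted in the summary. It does not call VoxCPM2, and the function names and the 10-second clip length are illustrative assumptions.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Invert the definition: seconds of compute needed for a clip."""
    return rtf * audio_seconds


if __name__ == "__main__":
    clip_seconds = 10.0  # hypothetical clip length, chosen for illustration

    # RTF values as quoted in the summary above; not measured here.
    for label, rtf in (("RTX 4090", 0.3), ("with acceleration", 0.13)):
        compute = synthesis_time(rtf, clip_seconds)
        speedup = 1.0 / rtf  # how many times faster than playback
        print(f"{label}: {compute:.1f} s of compute for {clip_seconds:.0f} s "
              f"of audio ({speedup:.1f}x faster than real time)")
```

For a voice agent, the headroom below 1.0 is what matters: it is the budget left over for network latency, the language model's own response time, and audio buffering.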

Key Takeaways

Evaluate VoxCPM2 for voice agent pipelines requiring real-time, high-fidelity multilingual speech synthesis with minimal infrastructure overhead.

Why it matters

For a Solutions Architect building AI agents or voice-enabled services, VoxCPM2 provides a production-ready, commercially permissive TTS engine with low-latency streaming, multilingual support, and controllable voice design, all of which matter for conversational AI, content generation, and accessibility tools.

Author

OpenBMB