[GitHub Trending] OpenBMB/VoxCPM
Tokenizer-free TTS model; technically deep, but not core to the reader's interests.
OpenBMB released VoxCPM2, a 2B-parameter tokenizer-free TTS model that uses a diffusion-autoregressive architecture on a MiniCPM-4 backbone and was trained on more than 2 million hours of speech data spanning 30 languages. It supports voice design from text descriptions and controllable cloning from short audio clips, and it outputs 48 kHz studio-quality audio via AudioVAE V2 with built-in super-resolution. Streaming synthesis runs at a real-time factor (RTF) of about 0.3 on an RTX 4090, dropping to about 0.13 with Nano-vLLM or vLLM-Omni acceleration. The model is released as open source under the Apache-2.0 license.
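To put the RTF figures in concrete terms: RTF is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The sketch below is plain arithmetic on the two approximate figures quoted above and does not call VoxCPM2 itself. The function names and the 10-second clip length are illustrative assumptions, not part of any VoxCPM API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Expected compute time for a clip of the given length at a given RTF."""
    return rtf * audio_seconds


# Approximate figures from the summary (RTX 4090); treat them as rough.
BASELINE_RTF = 0.3
ACCELERATED_RTF = 0.13

clip_seconds = 10.0  # illustrative clip length, an assumption

baseline_seconds = synthesis_time(BASELINE_RTF, clip_seconds)
accelerated_seconds = synthesis_time(ACCELERATED_RTF, clip_seconds)
speedup = BASELINE_RTF / ACCELERATED_RTF

print(f"baseline:    {baseline_seconds:.2f} s of compute for {clip_seconds:.0f} s of audio")
print(f"accelerated: {accelerated_seconds:.2f} s of compute for {clip_seconds:.0f} s of audio")
print(f"speedup:     {speedup:.2f}x")
```

At these figures, a 10-second utterance would take roughly 3 seconds to synthesize at baseline and roughly 1.3 seconds with acceleration, a bit over a 2x speedup. Actual latency in a voice agent also depends on time to first audio chunk, which RTF alone does not capture.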
- Evaluate VoxCPM2 for voice agent pipelines requiring real-time, high-fidelity multilingual speech synthesis with minimal infrastructure overhead.
For a Solutions Architect building AI agents or voice-enabled services, VoxCPM2 provides a production-ready, commercially permissive TTS engine with low-latency streaming, multilingual support, and controllable voice design, all of which matter for conversational AI, content generation, and accessibility tools.
OpenBMB