[GitHub Trending] OpenBMB/VoxCPM
A TTS model release: technically deep, but not aligned with the reader's focus.
OpenBMB released VoxCPM2, a 2B-parameter, tokenizer-free TTS system built on a diffusion-autoregressive architecture and a MiniCPM-4 backbone. It supports 30 languages, voice design from natural-language descriptions, and controllable voice cloning from short reference clips. It outputs 48 kHz studio-quality audio via AudioVAE V2, which uses asymmetric encoding and decoding with built-in super-resolution, and reaches a real-time factor (RTF) of ~0.13 when accelerated by vLLM-Omni's PagedAttention. The model is fully open-sourced under Apache-2.0, which permits commercial use.
- Evaluate VoxCPM2 as a drop-in open-source TTS component for agent pipelines that need multilingual, controllable speech output at 48 kHz.
This gives platform engineers a production-ready, open-source TTS engine that can be self-hosted (RTF of ~0.13 on an RTX 4090 via vLLM-Omni) and integrated into agentic workflows for multilingual voice synthesis without relying on proprietary APIs.
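To make the RTF figure concrete: real-time factor is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below only does that arithmetic with the reported ~0.13 value. The function names are hypothetical and are not part of VoxCPM or vLLM-Omni, and the estimate assumes a single stream with no startup or queueing overhead.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated compute time needed to produce `audio_seconds` of speech."""
    return audio_seconds * rtf


def audio_per_compute_second(rtf: float) -> float:
    """Seconds of audio produced per second of compute (inverse of RTF)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# Reported figure from the summary above; treat it as approximate.
REPORTED_RTF = 0.13

if __name__ == "__main__":
    clip = 10.0  # seconds of speech to synthesize (illustrative value)
    print(f"Estimated synthesis time: {synthesis_seconds(clip, REPORTED_RTF):.2f} s")
    print(f"Audio per compute second: {audio_per_compute_second(REPORTED_RTF):.1f} s")
```

Under these assumptions a 10-second clip would take roughly 1.3 seconds to generate. Measured RTF on other hardware, batch sizes, or longer inputs may differ, so it is worth re-measuring with `real_time_factor` on your own deployment.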
OpenBMB