[GitHub Trending] OpenBMB/VoxCPM

7.5 relevance
Score Breakdown
technical depth: 8
novelty: 8
actionability: 5
community: 7
strategic: 5
personal: 3

Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.

TTS model, technically deep but not aligned with reader's focus.

2026-06-02 Open Source github.com
VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning - OpenBMB/VoxCPM
Summary

OpenBMB released VoxCPM2, a 2B-parameter tokenizer-free TTS system built on a diffusion-autoregressive architecture with a MiniCPM-4 backbone. It supports 30 languages, voice design from natural-language descriptions, and controllable voice cloning from short reference clips. Output is 48 kHz studio-quality audio, produced by an AudioVAE V2 with asymmetric encode/decode and built-in super-resolution. The reported real-time factor (RTF, synthesis time divided by the duration of the audio produced) is about 0.13 when accelerated by vLLM-Omni's PagedAttention. The project is fully open-sourced under Apache-2.0, which permits commercial use.

Key Takeaways
  • Evaluate VoxCPM2 as an open-source TTS component for agent pipelines that need multilingual, controllable speech output at 48 kHz.
Why it matters

This gives platform engineers an open-source, Apache-2.0-licensed TTS engine they can self-host (a reported RTF of about 0.13 on an RTX 4090 with vLLM-Omni acceleration) and integrate into agentic workflows for multilingual voice synthesis without relying on proprietary APIs.
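
To put the RTF figure in concrete terms, here is a minimal Python sketch of the arithmetic. It assumes the standard definition of real-time factor (synthesis time divided by the duration of the audio produced) and uses the reported 0.13 and 48 kHz values as constants. The function names are hypothetical and not part of VoxCPM or vLLM, and the concurrent-stream estimate is a naive upper bound that assumes throughput scales linearly, which real serving stacks may not match.

```python
import math

REPORTED_RTF = 0.13        # figure quoted in the summary; not measured here
SAMPLE_RATE_HZ = 48_000    # output sample rate quoted in the summary


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesising / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock time to generate audio_seconds of speech."""
    return audio_seconds * rtf


def max_realtime_streams(rtf: float) -> int:
    """Naive upper bound on streams kept at real time by one device.

    Assumption: throughput scales linearly and batching, memory limits,
    and scheduling overhead are ignored.
    """
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return math.floor(1.0 / rtf)


if __name__ == "__main__":
    clip_seconds = 10.0
    print(f"synthesis time: {synthesis_time(clip_seconds, REPORTED_RTF):.2f} s")
    print(f"samples in clip: {int(clip_seconds * SAMPLE_RATE_HZ)}")
    print(f"naive stream bound: {max_realtime_streams(REPORTED_RTF)}")
```

An RTF below 1.0 means speech is generated faster than it plays back, which is the precondition for streaming output in an interactive agent. Actual figures depend on hardware, batch size, and text length, so they should be measured on the target deployment.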

Author

OpenBMB