JetBrains open-sources Mellum2 to go where Claude Code can’t

9.2 relevance

JetBrains open-sources Mellum2, a new 12B MoE coding model, directly relevant to AI coding tools.

2026-06-02 AI/ML thenewstack.io

Summary

JetBrains open-sourced Mellum2, a 12B-parameter MoE model with 2.5B active parameters per token. It targets agentic infrastructure tasks (routing, retrieval, sub-agent coordination) and private on-premises deployment, which is where the headline says Claude Code can't go. Mellum2 succeeds Mellum, a 4B code-completion model. It reaches 192 tokens/sec on a single H100, pulls 21% ahead of Qwen2.5-7B under concurrent load, and scores 78.4% on EvalPlus function-level code generation, though it concedes broader reasoning benchmarks (GPQA, MMLU-Redux) to frontier models. Two variants ship: "instruct" for direct answers and "thinking" for explicit reasoning traces in multi-step agentic tasks.
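
To put the headline figures in perspective, here is a rough back-of-the-envelope sketch using only the numbers quoted above. It assumes the 192 tokens/sec figure can be read as a steady decode rate, which the summary does not confirm, and the 500-token response length is a hypothetical value chosen for illustration.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS_B = 12.0    # total parameters, in billions
ACTIVE_PARAMS_B = 2.5    # parameters active per token, in billions
TOKENS_PER_SEC = 192.0   # reported rate on a single H100


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for each token."""
    return active_b / total_b


def ms_per_token(tokens_per_sec: float) -> float:
    """Average time per generated token, in milliseconds."""
    return 1000.0 / tokens_per_sec


def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Time to generate `tokens` tokens at a steady rate (an assumption)."""
    return tokens / tokens_per_sec


fraction = active_fraction(TOTAL_PARAMS_B, ACTIVE_PARAMS_B)
per_token = ms_per_token(TOKENS_PER_SEC)
# Hypothetical 500-token response, not a figure from the article.
response_time = seconds_for(500, TOKENS_PER_SEC)

print(f"Active share per token: {fraction:.1%}")
print(f"Time per token: {per_token:.2f} ms")
print(f"500-token response: {response_time:.2f} s")
```

Under these assumptions, roughly a fifth of the parameters are active for any given token, which is consistent with the summary's emphasis on throughput over reasoning breadth.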

Key Takeaways

Evaluate Mellum2 as a specialized component for private on-premises agentic pipelines where latency and control over code intelligence matter more than general reasoning breadth.

Why it matters

For architects building AI-augmented SDLC pipelines, Mellum2 offers a cost-effective, high-throughput focal model that can be deployed on your own infrastructure for agentic sub-tasks without sacrificing inference control or latency.

Author

Paul Sawers