
Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model

8.4 relevance
Score Breakdown
technical depth: 8
novelty: 9
actionability: 8
community: 8
strategic: 8
personal: 10

Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.

Needle: a distilled tool-calling model that aligns closely with AI agent orchestration and open-source interests.

2026-05-13 ai/ml Hacker News (100+)
26M function-calling model that runs on incredibly small devices - cactus-compute/needle
Summary

Needle is a 26M-parameter open-source model distilled from Gemini for single-shot tool calling. It uses a Simple Attention Network with an encoder-decoder layout, GQA+RoPE, ZCRMSNorm, and tied embeddings. It was pretrained on 16 TPU v6e for 200B tokens, then fine-tuned on 2B tokens of function calls. On Cactus it reaches 6000 tokens/sec prefill and 1200 tokens/sec decode, and it outperforms FunctionGemma-270m and Qwen-0.6B on tool calling while remaining fine-tunable on a Mac or PC.
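To make the throughput figures concrete, here is a rough back-of-the-envelope sketch in Python. It assumes latency is simply prefill time plus decode time at the quoted rates, which ignores model loading, tokenization, and any hardware differences from the Cactus benchmark setup. The function name and the example token counts are hypothetical.

```python
# Throughput figures quoted in the summary above (tokens per second on Cactus).
PREFILL_TOKS_PER_SEC = 6000
DECODE_TOKS_PER_SEC = 1200


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency: prefill time plus decode time, ignoring all other overhead."""
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_time = prompt_tokens / PREFILL_TOKS_PER_SEC
    decode_time = output_tokens / DECODE_TOKS_PER_SEC
    return prefill_time + decode_time


# Hypothetical request: a 600-token prompt (tool schemas plus the user query)
# and a 60-token generated tool call.
latency = estimate_latency_seconds(prompt_tokens=600, output_tokens=60)
print(f"Estimated latency: {latency:.3f} s")
```

Under these assumptions the example request comes to about 0.15 seconds (0.10 s of prefill plus 0.05 s of decode). Real numbers will depend on the device and runtime.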

Key Takeaway

Integrate Needle into lightweight agent pipelines for fast, on-device tool calling, or fine-tune it on your own function schemas.
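As a minimal sketch of what the pipeline side of single-shot tool calling might look like, the Python below parses one tool call and dispatches it to a registered function. It assumes the model emits a JSON object with `name` and `arguments` fields; that format, the tool names, and the `dispatch_tool_call` helper are all hypothetical and are not taken from the Needle repository. The model call itself is replaced by a hard-coded string.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tools; the names and signatures are illustrative only.
def get_weather(city: str) -> str:
    return f"Weather lookup requested for {city}"


def set_timer(seconds: int) -> str:
    return f"Timer set for {seconds} seconds"


TOOLS: Dict[str, Callable[..., Any]] = {
    "get_weather": get_weather,
    "set_timer": set_timer,
}


def dispatch_tool_call(model_output: str) -> Any:
    """Parse one JSON tool call and run the matching registered function."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("expected a JSON object describing the tool call")
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return TOOLS[name](**args)


# Stand-in for text the model might emit; the real output format may differ.
example_output = '{"name": "set_timer", "arguments": {"seconds": 90}}'
result = dispatch_tool_call(example_output)
print(result)
```

Validating the tool name and argument shape before calling anything keeps a malformed or unexpected model output from reaching application code, which matters more with a very small model.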

Why it matters

For a senior engineer building agentic systems, this is a tiny, high-throughput, open model specialized for tool use that can run locally and be adapted to custom tools without cloud dependencies.