Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model
Scored daily by a customisable AI persona to surface the most relevant engineering leadership news.
Needle is a distilled tool-calling model, closely aligned with AI agent orchestration and open source.
Needle is a 26M-parameter open-source model distilled from Gemini for single-shot tool calling. It is built on a Simple Attention Network architecture with an encoder-decoder layout, GQA+RoPE, ZCRMSNorm, and tied embeddings. Pretrained on 16 TPU v6e for 200B tokens, then fine-tuned on 2B tokens of function calls, it achieves 6,000 tokens/sec prefill and 1,200 tokens/sec decode on Cactus. It outperforms FunctionGemma-270m and Qwen-0.6B on tool calling and can be fine-tuned on a Mac or PC.
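To put the quoted throughput in perspective, here is a back-of-the-envelope latency estimate in Python. It uses only the two figures from the summary (6000 tokens/sec prefill, 1200 tokens/sec decode); the function name and the example token counts are illustrative assumptions, and the model ignores startup, tokenization, and any other overhead.

```python
# Throughput figures quoted in the summary (tokens per second on Cactus).
PREFILL_TOKS_PER_SEC = 6000
DECODE_TOKS_PER_SEC = 1200


def estimate_latency_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Rough latency for one tool call: prefill time plus decode time.

    Hypothetical helper: assumes throughput is constant and that there is
    no fixed overhead, which will not hold exactly on real hardware.
    """
    prefill = prompt_tokens / PREFILL_TOKS_PER_SEC
    decode = output_tokens / DECODE_TOKS_PER_SEC
    return prefill + decode


# Example with assumed sizes: a 600-token prompt (tool schemas plus the
# user query) and a 60-token generated function call.
latency = estimate_latency_seconds(600, 60)
print(f"estimated latency: {latency * 1000:.0f} ms")
```

Under these assumptions, prefill takes about 100 ms and decode about 50 ms, so a single tool call would complete in roughly 150 ms.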
Integrate Needle into lightweight agent pipelines for fast, on-device tool calling, or fine-tune it with your own function schemas.
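As a minimal sketch of what the receiving side of such a pipeline might look like, the Python below parses a single tool call and dispatches it to a registered function. The JSON shape (`name` plus `arguments`), the `tool` decorator, and `get_weather` are all hypothetical; the summary does not describe Needle's actual output format or API, so the model call itself is left out and replaced by a hard-coded string.

```python
import json
from typing import Any, Callable, Dict

# Registry of callable tools, keyed by function name (hypothetical design).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function so a model-emitted call can be routed to it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def get_weather(city: str) -> str:
    # Placeholder tool body; a real tool would query a weather service.
    return f"Weather lookup requested for {city}"


def dispatch(raw_call: str) -> Any:
    """Parse one tool call (assumed JSON format) and invoke the tool."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for model output; in a real pipeline this string would come
# from the model's single-shot generation.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_output))
```

Rejecting unknown tool names before the call is a deliberate choice here: a small model can emit a name that is not in the schema, and failing loudly is easier to debug than silently ignoring the call.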
For a senior engineer building agentic systems, this is a tiny, high-throughput, open model specialized for tool use that can run locally and be adapted to custom tools without cloud dependencies.