The State of Agentic Tooling 2026

I love reports like State of JS. They tell you what the industry is actually using: what's being adopted, what's being abandoned, which packages and frameworks are settling, and where the survey hype outran the production reality. They interview engineers and developers and get the word on the street. They are the closest thing the industry has to a non-marketing read on what is real.

Engineering teams that ignore reports like this do so at their own cost. The Not Invented Here pattern, where a team convinces itself that its situation is too unusual to follow the industry-standard convention, is a fool's errand almost every time. The industry converges on patterns because the patterns work. Standing apart is occasionally inspired and usually expensive.

This article is a synthesis, anchored to objective research and guided by real-world experience with startup teams. The aim is to summarise where the field has settled, where it has not, and to give engineers and business leaders a shared vocabulary to work from. It is as much a starting point for the conversation about where we go in 2027 as a snapshot of where we are now.

What's inside

Landscape. Three maturity levels, from prompt engineering through composable tooling to autonomous swarms. Five primitives every major runtime now ships. Five failure modes already visible in production: Agent Sprawl, Prompt Debt, Schema Drift, Context Rot, and Codebase Litter.

Nomenclature. Eight terms a business reader needs to follow the rest of the conversation. Agent, skill, tool, permission, hook, MCP, CLI, sub-agent. Each one has a one-sentence definition and a worked example.

Configuration. The most important shift of the last six months sits here. Microsoft's Agent Package Manager treats agent configuration the way npm treats JavaScript dependencies. One manifest, two commands, the right shape emitted for whichever runtime your engineers use. The configuration table in the deck proves the underlying claim: Cursor, Claude Code, Gemini CLI, and OpenAI Codex ship the same shape under four different naming conventions. APM is what makes that fact actionable rather than academic.

Agents and Skills. The cleanest formulation of what is actually reusable in this stack. Skills are inert knowledge with progressive disclosure baked in. Agents are the runtimes that load them. The pair is the universal interface across vendors, and SKILL.md is now adopted by more than thirty products. The same SKILL.md teaches an agent how to ship a feature, serves as the spec for product, acts as the contract for tests, and reads as the manual for the next engineer. One file. Four readers.
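
Progressive disclosure is easier to see in code than in prose. What follows is a minimal sketch, not any vendor's loader. It assumes a SKILL.md that opens with a front-matter block carrying name and description fields, and the helper names (parse_skill, catalogue, load) are hypothetical. The agent sees one line per skill up front and pulls the full body into context only when a task calls for it.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions, kept out of context until requested


def parse_skill(text: str) -> Skill:
    """Parse a SKILL.md-style document: '---' delimited front matter, then the body.

    Assumption: front matter is simple 'key: value' lines. A real loader
    would use a proper YAML parser.
    """
    _, front, body = text.split("---", 2)
    meta = {}
    for line in front.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta["description"], body.strip())


def catalogue(skills: list[Skill]) -> list[str]:
    """The cheap view shown to the agent up front: one line per skill."""
    return [f"{s.name}: {s.description}" for s in skills]


def load(skills: list[Skill], name: str) -> str:
    """The expensive view: the full body, loaded only on demand."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)
```

The split between catalogue and load is the whole point: the index stays small however many skills a team accumulates, and the cost of a skill is paid only by the tasks that use it.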

Memory. The context-capability paradox. More instruction makes an agent smarter and dumber at the same time. The fix is separation across cognition, memory, and tool invocation, not bigger context windows. Portable memory with a base file and project-specific overrides is the pattern holding up under load. The deck maps how the four major providers handle this, and which ones lock memory inside their platform versus letting it travel with the team.
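
The base-plus-overrides pattern can be sketched in a few lines. This is an illustration under stated assumptions rather than how any provider implements it: memory is modelled as nested dictionaries, and merge_memory is a hypothetical helper. The base file travels with the team, and a project overrides only the keys it needs to.

```python
def merge_memory(base: dict, override: dict) -> dict:
    """Return base memory with project-specific overrides applied.

    Nested dictionaries are merged key by key. Any other value in the
    override replaces the base value outright. Neither input is mutated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_memory(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical team-wide base memory and one project's overrides.
base = {"style": {"language": "en-GB", "tone": "terse"}, "review": "two approvals"}
project = {"style": {"tone": "friendly"}}

effective = merge_memory(base, project)
```

Here the project changes only the tone. The language and review rules are inherited from the base untouched, which is what lets the same memory move between projects.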

Orchestration. LangGraph, Microsoft Agent Framework, and Pydantic AI are the three orchestrators serious teams ship in 2026. Three protocols sit beneath them. MCP for agent to tool, A2A for agent to agent, AG-UI for agent to human. Compose them, regardless of orchestrator choice. The orchestration section also covers durable execution, which matters once human-in-the-loop approvals start taking days rather than minutes.

Real-world stacks. Three concrete compositions. Pick by constraint, not by hype. The default stack pairs LangGraph with Claude Code, Linear, and Slack. The self-hosted purist swaps in Plane, Zulip, Letta, and Windmill. The Microsoft-native bet runs MAF, GitHub Projects, Teams, and Copilot. Each comes with a trade-off statement so the choice is conscious.

Economics. Token tiering across local, mid-tier cloud, and frontier models. The cost lens any small team needs before committing to an architecture. Local for verifiable work. Mid-tier for standard tasks. Frontier only for expensive-error decisions. Fallback chains degrade gracefully when rate limits hit.
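
A rough sketch of what tiering with a fallback chain can look like in code. Everything named here is an assumption for illustration: the Task fields, the RateLimited exception, and the choice to fall back by stepping down to cheaper tiers. The clients are plain callables standing in for whatever model APIs a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable

# Cheapest first. The tier names follow the three tiers described above.
TIER_ORDER = ["local", "mid", "frontier"]


class RateLimited(Exception):
    """Raised by a client when its provider is rate limiting (hypothetical)."""


@dataclass
class Task:
    prompt: str
    verifiable: bool = False  # output can be checked mechanically
    error_cost: str = "low"   # "low" or "high"


def pick_tier(task: Task) -> str:
    """Frontier only for expensive-error decisions, local for verifiable work."""
    if task.error_cost == "high":
        return "frontier"
    if task.verifiable:
        return "local"
    return "mid"


def fallback_chain(tier: str) -> list[str]:
    """Start at the chosen tier, then step down through the cheaper ones."""
    index = TIER_ORDER.index(tier)
    return list(reversed(TIER_ORDER[: index + 1]))


def run(task: Task, clients: dict[str, Callable[[str], str]]) -> tuple[str, str]:
    """Try each tier in the chain and return (tier used, response)."""
    for tier in fallback_chain(pick_tier(task)):
        try:
            return tier, clients[tier](task.prompt)
        except RateLimited:
            continue  # degrade to the next tier instead of failing the task
    raise RuntimeError("every tier in the fallback chain was rate limited")
```

Returning the tier alongside the response keeps the per-task cost visible, so a team can see how often expensive-error work is actually being served by a cheaper model.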

Who this is for

Any AI-forward stakeholder making capital decisions about agent investment. The takeaways are framed for them: why packaging matters at team scale, where AI investment compounds versus where it burns, and what good looks like by 2027. Technical slides are highlighted so engineers can go deeper where necessary.

Both audiences are reading the same artifact. That is intentional. The handoff cost between engineering and the business is half the problem in this space. If product and engineering describe agents differently, every decision downstream pays for that gap.

What to walk away with

Config standardisation is ongoing, but there is reason for hope: Cursor, Claude Code, Gemini CLI, and OpenAI Codex all ship the same five primitives, and the concepts map cleanly. This is the structural fact that makes packaging both necessary and possible, and it is the single most important thing to internalise. There is still work to do. Agent Package Manager (APM) isn't perfect, and a variety of Skills package managers add to the mix. Even so, the effort gives hope that true standards will emerge. They are essential if we are to share knowledge across internal teams. For now, hire the best DevOps leaders you can to improve your AI development experience.

Packaging is the change that makes team-scale work compound rather than fragment. Without it, every engineer runs a different setup, every project rebuilds the same scaffolding, and every audit needs a Slack thread to reconstruct what was running last quarter. With it, the same skills, hooks, and MCP servers move cleanly from project to project and survive personnel turnover.

The winning move is not picking a platform. It is composing layers cleanly around standardised protocols. Anyone telling you to commit fully to one vendor's stack is selling something. Anyone telling you the standards are not ready is reading the announcements from six months ago. The real intellectual work in 2026 is normalising the worker interface, building the deliberation layer, and getting a per-task cost lens. Everything else is glue, and the glue got substantially better in the last six months.

Get the full report

This is the first State of Agentic Tooling report. The deck runs through all eight sections in detail, with the source citations and the engineering-specific tables behind every claim. Updates will follow as the field moves, probably twice a year, with reader feedback shaping what gets tracked next.

The full deck is around forty slides. The email form below sends it to your inbox.