The Wire — 2026-05-31

Claude vs Gemini Across 4 Security Domains: A Dead Heat — and the Hardening 63% of AI Code Skips

Across four security domains, Gemini 2.5 Flash and Claude Sonnet 4.6 produced nearly identical results: both leaked password hashes from MongoDB queries that lacked a projection, both skipped audience validation on JWTs, and neither added audit logging or input rate limiting. Among 700 AI-generated functions tested against CWE-mapped ESLint plugins, 63% shipped a vulnerability, which suggests that model choice matters far less than the hardening steps both models omit.

Why it matters

For platform engineers shipping AI-generated auth or data-access code, this confirms that code produced from ordinary prompts routinely skips non-functional security guardrails. Standard code review fails to catch the gaps because the missing patterns (audience checks, projections, rate limits) are invisible unless explicitly linted.
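
As a rough illustration of two of the omitted patterns, here is a minimal Python sketch using only the standard library. The field names and helper functions are hypothetical and are not taken from the benchmark. The projection dict follows MongoDB's exclusion syntax, and the audience check assumes the JWT signature has already been verified elsewhere.

```python
from typing import Any, Mapping

# Hypothetical sensitive field names; adjust to the real user schema.
SENSITIVE_FIELDS = ("password_hash", "reset_token")


def safe_projection(sensitive: tuple = SENSITIVE_FIELDS) -> dict:
    """Build a MongoDB-style exclusion projection, e.g. {"password_hash": 0}.

    With a driver such as pymongo this dict would be passed as the second
    argument to find() or find_one().
    """
    return {field: 0 for field in sensitive}


def apply_projection(doc: Mapping[str, Any], projection: Mapping[str, int]) -> dict:
    """Emulate an exclusion projection locally, so the effect is visible
    without a running database."""
    excluded = {key for key, flag in projection.items() if flag == 0}
    return {key: value for key, value in doc.items() if key not in excluded}


def audience_ok(claims: Mapping[str, Any], expected_audience: str) -> bool:
    """Check the `aud` claim of already-verified JWT claims.

    Per RFC 7519, `aud` may be a single string or a list of strings.
    A missing claim is treated as a failure.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, (list, tuple)):
        return expected_audience in aud
    return False


user = {"email": "dev@example.com", "password_hash": "not-a-real-hash"}
public_user = apply_projection(user, safe_projection())
```

Neither check changes what the endpoint does functionally, which is why their absence tends to go unnoticed in review.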

Open Source / techcrunch.com

‘What a joke’: GitHub Copilot’s new token-based billing spurs consternation among devs

Microsoft's GitHub Copilot switches to token-based billing on June 1, with reported cost jumps from $29 to $750 and from $50 to $3,000 per month, drawing intense backlash from developers. Critics attribute excessive token consumption to 'vibe coding' practices, while others argue Microsoft encouraged heavy usage before changing the pricing model. The shift exposes the unsustainable economics of flat-rate AI coding assistance and creates budget uncertainty for small teams and individual developers.

AI/ML / dev.to

Inference Theft Is the New AI App Security Bug: How to Protect Your LLM Endpoints

Inference theft leverages work amplification—converting a single HTTP request into expensive model calls, tool invocations, and agent loops—to drain budgets via unauthenticated AI endpoints. Effective defense requires per-request budget checks tracking input/output tokens and tool calls (e.g., estimateCostCents with token prices) before invoking models, plus hard limits on prompt size (8K chars), output tokens (800), and agent steps (5). These controls must run on every AI request, not just at login, to prevent abuse even from authenticated users.
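
The per-request check described above could look something like the following Python sketch. The three hard limits come from the summary (with "8K chars" read as 8,000); the token prices, the four-characters-per-token heuristic, and the `AIRequest` shape are assumptions for illustration. `estimate_cost_cents` is a Python-style rendering of the `estimateCostCents` helper the article names, not its actual implementation.

```python
import math
from dataclasses import dataclass

# Hard limits quoted in the summary ("8K chars" read here as 8,000).
MAX_PROMPT_CHARS = 8_000
MAX_OUTPUT_TOKENS = 800
MAX_AGENT_STEPS = 5

# Assumed prices, in cents; real values depend on the model provider.
INPUT_CENTS_PER_1K_TOKENS = 0.3
OUTPUT_CENTS_PER_1K_TOKENS = 1.5
TOOL_CALL_CENTS = 0.5
CHARS_PER_TOKEN = 4  # rough heuristic, not a real tokenizer


@dataclass
class AIRequest:
    prompt: str
    max_output_tokens: int
    agent_steps: int
    tool_calls: int = 0


def estimate_cost_cents(req: AIRequest) -> float:
    """Worst-case cost estimate, computed before any model call is made."""
    input_tokens = math.ceil(len(req.prompt) / CHARS_PER_TOKEN)
    return (
        input_tokens / 1000 * INPUT_CENTS_PER_1K_TOKENS
        + req.max_output_tokens / 1000 * OUTPUT_CENTS_PER_1K_TOKENS
        + req.tool_calls * TOOL_CALL_CENTS
    )


def check_request(req: AIRequest, remaining_budget_cents: float) -> tuple:
    """Return (allowed, reason). Intended to run on every AI request."""
    if len(req.prompt) > MAX_PROMPT_CHARS:
        return False, "prompt too large"
    if req.max_output_tokens > MAX_OUTPUT_TOKENS:
        return False, "output limit too high"
    if req.agent_steps > MAX_AGENT_STEPS:
        return False, "too many agent steps"
    if estimate_cost_cents(req) > remaining_budget_cents:
        return False, "budget exceeded"
    return True, "ok"
```

Calling `check_request` before the model is invoked means a rejected request costs nothing beyond the check itself.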

DevTools / infoq.com

DuckDB Quack: Client/Server Protocol over HTTP for Multi-User Analytics

DuckDB's new Quack protocol brings client-server capabilities over HTTP to the previously local embedded database, enabling multi-user analytics with concurrent access and remote dataset sharing. Released under the MIT license, Quack achieves 3.5x faster data transfer than Arrow Flight and can complete small queries in a single network roundtrip by using DuckDB's native format, a choice that also gives the team full control over protocol evolution. The open-source project plans to integrate Quack with DuckLake and ship it as part of DuckDB 2.0 later this year. Community reaction has been positive, crediting Quack with solving horizontal scaling without sacrificing lightweight deployment.

Open Source / techmeme.com

With Microsoft's GitHub Copilot shifting to token-usage billing on June 1, many developers bemoan massive cost increases and the end of flat-rate subscriptions (Lucas Ropek/TechCrunch)

GitHub Copilot moves to token-usage billing on June 1, replacing flat-rate subscriptions. Developers report significant cost increases, and many see the shift as favoring enterprise users over individual developers.

AI/ML / dev.to

Top API Gateways for AI Applications and Agentic Workflows

API gateways are evolving to handle AI-specific traffic patterns like streaming responses, token-level rate limiting, and prompt cost tracking. Products such as Kong and Portkey now offer native support for agentic workflows, including multi-step orchestration and fallback routing. Specialized gateways like Helicone provide observability into LLM calls, while general-purpose gateways add AI-focused plugins for caching and safety checks.
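
Token-level rate limiting, as opposed to counting requests, can be sketched as a token bucket metered in LLM tokens. The Python below is a generic illustration and not the API of Kong, Portkey, or Helicone. The class name, the per-minute budget, and the injectable clock are all assumptions.

```python
import time
from typing import Callable


class TokenRateLimiter:
    """Token bucket whose units are LLM tokens rather than HTTP requests."""

    def __init__(self, tokens_per_minute: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = float(tokens_per_minute)
        self.refill_per_second = tokens_per_minute / 60.0
        self.available = self.capacity
        self.clock = clock
        self.last_refill = clock()

    def allow(self, llm_tokens: int) -> bool:
        """Spend `llm_tokens` from the bucket if enough are available."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Refill in proportion to elapsed time, never above capacity.
        self.available = min(self.capacity,
                             self.available + elapsed * self.refill_per_second)
        self.last_refill = now
        if llm_tokens <= self.available:
            self.available -= llm_tokens
            return True
        return False
```

A gateway would typically keep one such bucket per API key or tenant, so a single large prompt can exhaust a caller's budget even when its request count is low.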

AI/ML / dev.to

I Turned Hermes Agent into a Verifiable Agent Operating System

Hermes Agent was extended into a verifiable agent operating system by separating agent state into durable layers: Hermes memory for stable facts, Hermes skills for reusable procedures, repo files for project conventions (AGENTS.md/CLAUDE.md), Multica for task ownership, session search for history, and a human approval gate for external side effects. Instead of treating all memory as a single bucket, the system routes each piece of state to the lowest durable layer suitable for its lifetime, preventing drift and enabling verification via an evidence loop (Intent → Action → Artifact → Verification → Report). The approach replaces buried task state with explicit ownership, promotes reusable fixes into skills, and enforces project rules in repo-local files — turning a chat assistant into a disciplined agent runtime with verifiable outcomes.

AI/ML / thenewstack.io

Replit’s vibe coding platform just got a Visa-backed identity layer for AI agents — and it changes how agents spend money

Replit's partnership with Visa embeds payment primitives (tokenization, authentication, wallet management) directly into its AI agent-building environment via the Trusted Agent Protocol, a cryptographic identity registry for agent verification. Developers can now build agents that transact natively, with Visa's onboarding and certification defining 'Visa-trusted' agents. The integration, backed by a strategic investment and over 1,000 Visa employees already using Replit, targets agentic commerce and low-value machine-to-machine payments.

AI/ML / dev.to

I Made My AI Models Argue, Then Let Hermes Be the Judge

Council lets three AI models (free OpenRouter models and local ones served via Ollama) debate a judgment call in two rounds, then uses Hermes to deliver a single verdict, a confidence score, and a breakdown of why they disagreed. Each verdict feeds a council memory that re-weights future juror trust, and the entire orchestration runs locally with zero API costs via Hermes' -z interface. The system exposes a single question box, with the raw dissent hidden behind a confidence dial that triggers when models split 2-1.
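
The summary does not say how Council combines votes, so the following Python is only one plausible sketch of a trust-weighted tally. The function name, the default trust of 1.0, and the definition of confidence as the winning share of total trust are all assumptions, not the author's implementation.

```python
from collections import defaultdict


def tally(votes: dict, trust: dict) -> tuple:
    """Combine juror answers into (verdict, confidence, split).

    votes maps juror name -> answer; trust maps juror name -> weight.
    Jurors missing from `trust` get an assumed default weight of 1.0.
    """
    weights = defaultdict(float)
    for juror, answer in votes.items():
        weights[answer] += trust.get(juror, 1.0)
    # Winning answer is the one with the largest total trust behind it.
    verdict = max(weights, key=weights.get)
    confidence = weights[verdict] / sum(weights.values())
    # Any disagreement counts as a split, e.g. a 2-1 vote among three jurors.
    split = len(weights) > 1
    return verdict, confidence, split


verdict, confidence, split = tally(
    {"juror_a": "ship", "juror_b": "ship", "juror_c": "hold"},
    {"juror_a": 1.0, "juror_b": 1.0, "juror_c": 1.0},
)
```

In this sketch a 2-1 split with equal trust yields a confidence of two thirds, the kind of signal a confidence dial could use to decide when to surface the dissent.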

AI/ML / dev.to

Your AI Agent Should Text You First

Hermes Agent reframes AI assistants as always-on, proactive 'chiefs of staff' that text first via messaging gateways (Telegram, Discord, Slack), triggering long-running workflows with cron jobs, persistent memory, and MCP tool integrations. Its 5-step loop (Watch, Verify, Produce, Report, Learn) moves beyond reactive chatbots to autonomous operations that improve over time by saving successful workflows as skills. This shift from browser-tab interactions to resident processes running on personal servers or cloud backends reduces context re-explanation and enables agents to act without human initiation.