
Agents

Tool-using systems, autonomy, and what works vs. what's still demoware.


CURSOR / SHAREUHACK·2026-05-22

Cursor Composer 2.5 becomes the in-IDE default — Build in Parallel + cloud agent dev environments + MS Teams clear the procurement bar

Cursor's Composer 2.5 (May 18 release) matched Opus 4.7 and GPT-5.5 on coding benchmarks at $0.50/M input / $2.50/M output. The new version added cloud agent dev environments, Microsoft Teams integration, and Build in Parallel — concurrent sub-agent execution on the same git working tree. The combination is the strongest model-agnostic in-IDE offer currently available.
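
To make the quoted rates concrete, here is a minimal cost sketch. It assumes "/M" means per million tokens and that input and output are billed separately at the prices above; the function name and the token counts are hypothetical, chosen only for illustration.

```python
# Per-million-token prices quoted in the item above (USD).
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the quoted rates."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical session: 200k tokens of context read, 40k tokens generated.
session_cost = estimate_cost(200_000, 40_000)
print(f"${session_cost:.2f}")
```

Under these assumptions the session splits evenly, about $0.10 for input and $0.10 for output, which shows how quickly the 5x output rate catches up with a much larger input volume.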

agents · tools
COGNITION / LUSHBINARY·2026-05-22

Devin 3 hits 90% on SWE-bench Verified — Cognition completes Windsurf acquisition at $250M and bundles Devin inside the IDE

Cognition's Devin 3 model now clears 90% on SWE-bench Verified, the first score consistently above the 90% threshold from any autonomous engineering agent. Cognition has completed its acquisition of Windsurf (the remaining stake after Google's earlier $2.4B acqui-hire of the founders) for $250M. The combination bundles Devin Cloud and Devin Terminal CLI inside the Windsurf IDE; Windsurf Pro rises to $20/month, with a new $200/month Max tier.

agents · tools
GOOGLE / TECHCRUNCH·2026-05-22

Gemini 3.5 Flash becomes default in the Gemini app and AI Mode in Search — Google bets the next wave on agents, not chatbots

Google flipped Gemini 3.5 Flash to default across both the Gemini app and AI Mode in Search globally this week. The model outperforms 3.1 Pro on coding and agentic benchmarks while running 4× faster on output tokens per second. The default-tier flip is the operational signal Google has been telegraphing since I/O — the new product surface is agentic, and Flash is the price point Google wants users to inhabit.

frontier-models · agents
GOOGLE / BLOG.GOOGLE·2026-05-22

Gemini Spark runs on dedicated cloud VMs — the persistent personal agent moves from local extension to always-on cloud service

Google's Gemini Spark, the personal AI agent introduced at I/O, runs on dedicated virtual machines in Google Cloud and stays available 24/7 — even when the user's device is off. Spark is powered by Gemini 3.5 Flash via the full Antigravity pipeline, has cross-app access to the user's Gmail, Calendar, Drive, Photos, and YouTube history, and autonomously runs multi-step tasks on the user's behalf.

agents · frontier-models
INDUSTRY / MCP ECOSYSTEM·2026-05-22

MCP server registry explosion continues — over 800 production MCP servers indexed as the agent-tool integration protocol consolidates

The Model Context Protocol (MCP) server registry now indexes over 800 production-quality MCP servers across enterprise SaaS, devtools, cloud infrastructure, and internal tooling integrations. The 2026 H1 cadence has been roughly 100-150 new servers per month — MCP has effectively become the OAuth-for-AI-agents standard, with most enterprise software vendors now shipping or planning an MCP integration as the default agent-access surface.

tools · agents
COGNITION / WINDSURF / TOOLRADAR·2026-05-22

Windsurf 2.0 + Devin bundling clarifies — quota-priced autonomous engineering vs per-token model routing now the defining IDE-tools dichotomy

Windsurf 2.0 ships with Devin Cloud and Devin Terminal CLI bundled inside the IDE; Pro rises from $15 to $20/month, and a new Max tier at $200/month includes unlimited Devin Cloud agent runs. The Adaptive Model Router auto-selects between Devin and the IDE's standard coding models based on task complexity. The Cognition-Windsurf integration is the cleanest 'autonomous engineering as a bundled SKU' offer currently on the market.

agents · tools
SOURCE·2026-05-22

The default agent tier shifts — Gemini 3.5 Flash becomes the always-on model behind Spark, Search, and Antigravity

Google flipped Gemini 3.5 Flash to default in the Gemini app and AI Mode in Search globally. Spark runs on dedicated cloud VMs powered by 3.5 Flash. Antigravity 2.0 already ships Flash as its default backend. Three product surfaces, one model: Google's bet is that the agent layer wins by making the cheapest model the universal default.

analysis · agents
GOOGLE / ANTIGRAVITY·2026-05-21

Google Antigravity 2.0 bundles Gemini 3.5 Flash by default — Google enters the in-IDE agent category seriously

Google's Antigravity 2.0 release bundles Gemini 3.5 Flash as the default backend and lands as a credible third entrant to the in-IDE agent category alongside Cursor and Windsurf. The pairing of Antigravity's IDE workflow with Flash-tier pricing makes Google the first major-lab vendor to package model and IDE as a single subscription rather than as separate procurement decisions.

tools · agents · industry
GOOGLE / ANTIGRAVITY·2026-05-21

Google Antigravity 2.0 wires Gemini 3.5 Flash as default backend — first major-lab IDE-plus-model bundled SKU

Google's Antigravity 2.0 IDE now ships with Gemini 3.5 Flash as the default backend, bundling model and IDE under a single Google AI subscription. The pairing makes Google the first major-lab vendor to integrate model and IDE as one procurement decision rather than two. With Flash hitting 76.2% Terminal-Bench, the bundling is no longer a capability compromise.

tools · agents
CURSOR·2026-05-21

Cursor 2.5 ships Build in Parallel + Microsoft Teams integration — coding-agent UX consolidates around concurrent execution

Cursor's 2.5 release added Build in Parallel (concurrent sub-agent execution on the same code state) and Microsoft Teams integration, and it matched Opus 4.7 and GPT-5.5 on benchmarks at $0.50/M input / $2.50/M output. The Teams integration is the procurement-friendly part of the release: enterprise buyers running M365 get IDE collaboration without a separate identity layer.

agents · tools
CURSOR·2026-05-21

Cursor Composer 2.5 ships multi-agent orchestration — parallel sub-agents for refactor, test, doc generation in one IDE session

Cursor's Composer 2.5 update adds multi-agent orchestration: a planner agent decomposes a task into sub-tasks, then dispatches parallel sub-agents for refactor, test-writing, and documentation generation against the same code state. The update lands as a direct competitive response to Claude Code's terminal-native multi-agent workflows and Devin's cloud-agent pattern.
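
The planner-plus-parallel-sub-agents pattern described above can be sketched in a few lines. This is only an illustration of the general shape, not Cursor's implementation: `plan`, `run_sub_agent`, and `orchestrate` are hypothetical names, and the sub-agent is a stand-in that returns a string where a real one would call a model and edit files.

```python
from concurrent.futures import ThreadPoolExecutor


def plan(task: str) -> list[str]:
    """Hypothetical planner: split a task into the three sub-task kinds named above."""
    return [f"{kind}: {task}" for kind in ("refactor", "test", "docs")]


def run_sub_agent(sub_task: str) -> str:
    """Stand-in for a sub-agent; it only reports that the sub-task was handled."""
    return f"done [{sub_task}]"


def orchestrate(task: str) -> list[str]:
    """Plan the task, then dispatch every sub-task to a concurrent worker."""
    sub_tasks = plan(task)
    with ThreadPoolExecutor(max_workers=len(sub_tasks)) as pool:
        # map() returns results in input order even though workers run concurrently.
        return list(pool.map(run_sub_agent, sub_tasks))


results = orchestrate("rename UserStore to AccountStore")
for line in results:
    print(line)
```

The sketch leaves out the hard part of running against the same code state: real sub-agents writing to one working tree need some way to avoid or merge conflicting edits, which the item does not describe.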

agents · tools
GOOGLE / CNBC·2026-05-21

Gemini Spark personal agent enters beta — Google launches 24/7 task-running agent across connected apps

Google launched Gemini Spark, a 24/7 personal AI agent that can reason across connected Google apps, into beta this week alongside Gemini 3.5 Flash. Initial availability is restricted to Google AI Ultra subscribers and a small trusted-tester cohort. Spark joins OpenAI's Operator and Anthropic's Claude Cowork in the same-week launch cadence — the personal-agent tier is now a saturated market.

agents · frontier-models
MCP ECOSYSTEM·2026-05-21

MCP server registry crosses 4,000 published servers — protocol-level lock-in compounds

The Model Context Protocol server registry crossed 4,000 published servers in May 2026 — roughly a 6× growth since the start of the year. The vast majority are open-source and community-maintained, covering everything from cloud-provider APIs to enterprise SaaS integrations. The growth confirms MCP as the de facto integration standard for agentic tooling.

tools · agents
COGNITION / WINDSURF·2026-05-21

Windsurf 2.0 Cascade agents + Spaces task management mature — pricing pivots to quota-based at $20/mo Pro, $200/mo Max

Cognition's Windsurf 2.0 — launched April 15 and refined through May — now ships Cascade agents and Spaces task management as the default workflow surface. The pricing model also pivoted from credit-based to quota-based on March 19: $20/month Pro (up from $15), with a new $200/month Max tier. Devin Cloud and Devin Terminal CLI ship bundled into every paid tier.

tools · agents
COGNITION / WINDSURF·2026-05-21

Windsurf 2.0 bundles Devin Cloud + Devin Terminal CLI into the IDE — autonomous agents become a default IDE feature

Cognition's Windsurf 2.0 release bundles Devin Cloud and Devin Terminal CLI inside the IDE itself. The change makes autonomous cloud agents a first-class IDE feature rather than a separate product. After Devin's price drop to $20/month Core + ACU usage, the bundled experience eliminates the friction that kept most developers on Cursor's editing-first workflow.

agents · tools · industry
SOURCE·2026-05-21

Agent surface bifurcation — three distinct moats, three different races

Gemini Spark ships personal agents to consumers. Cursor 2.5 ships parallel sub-agents to IDEs. Windsurf 2.0 ships autonomous cloud agents bundled with Devin. Three product categories, three different moats, three different races. The 'agent market' is becoming three markets.

analysis · agents
GITHUB / MICROSOFT·2026-05-20

GitHub Copilot agent mode reaches GA on JetBrains — multi-IDE agentic coding now baseline

GitHub Copilot's agent mode is now generally available on JetBrains in addition to VS Code, completing the multi-IDE rollout that started in late 2025. Combined with the March 2026 agentic code review release, Copilot now spans context-gathering, autonomous PR drafting, and review-stage gating across the two largest IDE ecosystems.

agents · tools · industry
INDUSTRY ANALYSIS·2026-05-20

The 2026 default developer stack: Cursor for editing + Claude Code for autonomous tasks

Professional-developer survey data converges on a clear 2026 default: Cursor for in-IDE editing, Claude Code as a terminal-native agent for complex multi-file tasks. The single-tool-rules-all framing has dissolved into a multi-tool workflow where each agent owns a different surface area.

tools · agents · industry
GOOGLE / CNBC·2026-05-20

Google ships Gemini 3.5 Flash and Spark agent — finally a credible answer to ChatGPT and Claude

Google used the May 19-20 I/O keynote to ship Gemini 3.5 Flash (half-to-one-third the price of frontier peers, now default in the Gemini app and AI Mode in Search globally) plus Gemini Spark, a general-purpose agent that reasons across connected apps and takes action on the user's behalf. Spark is in beta for Google AI Ultra subscribers and trusted testers starting next week.

frontier-models · agents · google
INDUSTRY / MCP ECOSYSTEM·2026-05-20

MCP-native becomes the new baseline for agent tooling — Claude Code, Cursor, Codex all support; Copilot partial

Model Context Protocol (MCP) support has become the baseline qualifier for serious agent tooling in 2026. Claude Code is fully MCP-native; Cursor and Codex support MCP servers via config; GitHub Copilot has partial support; most autonomous agents (Devin, Replit Agent) are still building their MCP layers. The protocol is consolidating into a de facto standard.

tools · agents
SWE-BENCH / AGGREGATED·2026-05-20

SWE-bench Verified leaderboard: Mythos 93.9%, GPT-5.5 88.7%, Opus 4.7 87.6%, Cursor 86%

The May 2026 SWE-bench Verified leaderboard now has 44 evaluated models. Claude Mythos Preview leads at 93.9%, the first model to clear 90% on the canonical real-GitHub-issue-fix benchmark. GPT-5.5 follows at 88.7%, Claude Opus 4.7 (Adaptive) at 87.6%, Cursor's Composer 2.5 at around 86%, and GPT-5.3-Codex at 85.0%.

agents · benchmark
ANTHROPIC / LEADERBOARDS·2026-05-19

Claude Code leads SWE-bench Verified at 78.4%, ahead of Codex, Cursor, Devin, Replit

Updated SWE-bench Verified leaderboards confirm Claude Code at 78.4%, meaningfully ahead of OpenAI Codex at 71.0%, Cursor agent at 67.2%, Devin at 60.8%, and Replit Agent 3 at 54.1%. The 7.4-point gap to second place is the widest single-agent lead the benchmark has seen.

frontier-models · agents · benchmark
GITHUB / MICROSOFT·2026-05-19

GitHub Copilot Pro and Pro+ move to AI Credits flex billing on June 1

GitHub Copilot Pro and Pro+ will move to AI Credits-based flex billing on June 1, 2026 — preserving the $10/month Pro and $39/month Pro+ price points but switching from unlimited usage to credit pools that draw against a monthly allocation.

tools · agents
CURSOR / BLOOMBERG·2026-05-18

Cursor's revenue doubles in 90 days; $50B valuation trajectory emerging

Bloomberg reports that Cursor's revenue doubled in the most recent 90-day window, with active subscription seats well into the seven figures. Internal projections cited by sources suggest a $50B valuation in any 2026 fundraise, which would make Cursor the highest-valued private dev tools company.

agents · industry · tools
BLOGS.NVIDIA.COM·2026-05-12

NVIDIA and SAP partner on specialized enterprise agents

Joint effort to build specialized AI agents for enterprise workflows, with a stated emphasis on trustworthiness and reliability — the practical blockers slowing real production agent deployment.

agents · enterprise
ANTHROPIC.COM·2026-05-05

Anthropic ships 10 financial-services agents + Claude Opus 4.7, plus $1.5B Blackstone-led JV

Anthropic launched a 10-agent finance pack deployable as Claude Cowork plugins, Claude Code, or headless Managed Agents — paired with Claude Opus 4.7 (64.37% on Vals AI Finance Agent benchmark, ahead of GPT-5.5's 59.96% and Gemini 3.1 Pro's 59.72%). One day earlier: a $1.5B JV with Blackstone, Hellman & Friedman, and Goldman Sachs.

agents · industry
NIST·2026-02-15

NIST launches dedicated standards initiative for autonomous AI agents

In February 2026, NIST opened a dedicated initiative to develop standards for autonomous AI agents — systems that take real-world actions without continuous human oversight. The framing is a direct response to incidents involving autonomous agents creating security vulnerabilities at scales existing frameworks weren't designed for.

policy · agents