// blog · analysis · multimodal · 2026-05-23 · 5 min read

The omni bet — Google goes for unified multimodal IO as Sora exits

Gemini Omni accepts text, image, audio, and video as input and produces video, edited photos, and digital avatars as output. Sora was discontinued on April 26. The video-AI category is no longer about who has the best generator; it's about who controls the unified multimodal surface that absorbs every adjacent modality.

The competitive landscape in generative video looked stable a year ago: OpenAI's Sora was the consumer category-definer, with Veo, Runway, Kling, and others competing for the prosumer and enterprise tiers. May 2026 has rewritten that landscape twice over.

First, OpenAI discontinued Sora on April 26. The category-definer that headlined OpenAI's modality expansion just 18 months ago is gone. The replacement market is fragmented and feature-differentiated: Veo 3.1 dominates synchronized-audio generation, Kling owns 2-minute clip length, Runway controls editor workflow integration. None is dominant; all are commercially viable.

Second, Google shipped Gemini Omni at I/O on May 19. It is a unified multimodal model that accepts text, image, audio, and video in a single prompt and produces video, edited photos, and digital avatars as output. Omni Flash started rolling out to AI Plus, Pro, and Ultra subscribers the same day.

Omni's strategic move is the unified IO surface, not generation quality. Veo 3.1 still wins on synchronized audio. Kling still wins on clip length. Omni's bet is that the production workflow (script + reference photo + audio track + storyboard → finished cut) is the unit of work users actually want, and that owning that workflow beats winning any single sub-task. That's a credible bet, and it's one Google has more resources to execute than anyone else.

The competitive reframe: the "best video model" question is giving way to the "best multimodal workflow surface" question. Labs that ship best-in-class single-modality models will continue to exist as suppliers. Labs that ship unified multimodal surfaces will define the consumer category: Google with Omni, possibly Anthropic with whatever follows its current vision capabilities, while OpenAI's path is unclear after Sora. The market is moving from "which model generates the best clip" to "which surface absorbs the most of my creative process."

The throughline: the consolidation of capability into unified surfaces is the same pattern we've covered in agents (Agent 365 as the control plane) and in coding (Claude Code vs Antigravity for the developer agent layer). The model labs ship capability; the workflow layer absorbs it. Google's bet on Omni is the same bet Microsoft is making on Agent 365: own the user-facing surface, treat the capability as the supplied input.

TechCrunch, "Google's Gemini Omni turns images, audio, and text into video" · eWeek, "Sora Is Gone: Here Are 6 AI Video Tools Filling the Void in 2026"