// news · multimodal · video · model · 2026-05-18 · source: google deepmind

Google Veo 3.1 ships with image + video reference inputs for conversion workflows

Google has released Veo 3.1, the latest version of its Veo video generation line. The headline feature is support for one or two image references plus one or two video clip references per generation. The feature is tuned for conversion-oriented production work rather than raw realism.

Veo 3.1 doesn't match Sora 2 on photorealism, but it is meaningfully more stable, more predictable, and more useful in commercial pipelines. The reference-input system lets production teams keep visual consistency across many generations without prompt-engineering acrobatics.

The release confirms Google's pivot to "video model as production tool" rather than "video model as novelty demo." That pivot puts pressure on OpenAI's post-Sora-shutdown roadmap.

Further reading: Google DeepMind Veo · MagicHour, "Multimodal video APIs 2026"