// news · multimodal · 2026-06-11 · source: pixflow / llm-stats / videoany

Alibaba's Happy Horse 1.0 takes #1 on text-to-video leaderboard at 2074 arena score — first Chinese model to claim the public text-to-video crown over Veo and Kling

Alibaba's Happy Horse 1.0 leads the public text-to-video arena leaderboard at a 2074 arena score, ahead of Kling v3 (1987) and LTX-2 Fast (1935). It's the first Chinese model to top the public text-to-video ranking since Veo entered the leaderboard in 2025, and it lands while OpenAI is sunsetting Sora and Veo 3.1 holds the audio-synchronization niche.

The leaderboard win is the substantive headline. Public arena-style scoring, in which humans vote blind between paired generations, has been Veo's and Kling's territory through 2025-2026. Happy Horse passing both, with an 87-point margin over Kling v3 (2074 vs. 1987), puts an Alibaba model at the head of the consumer video-generation comparison set. For the broader generative-video category, it is the first time a Chinese open-platform offering has held the top leaderboard position against full-stack frontier US and European competitors.
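For a rough sense of what those score gaps mean per vote, here is a minimal sketch that assumes the arena uses a standard Elo-style logistic curve with a 400-point scale. The leaderboard's actual rating method is not stated in the sources, so the function name `expected_win_rate`, the `scale` default, and the resulting percentages are illustrative assumptions rather than published figures.

```python
# Assumption: Elo-style logistic curve with a 400-point scale.
# The leaderboard's real rating formula is not given in the sources.
def expected_win_rate(score_a: float, score_b: float, scale: float = 400.0) -> float:
    """Probability that model A is preferred over model B in one blind vote."""
    return 1.0 / (1.0 + 10.0 ** ((score_b - score_a) / scale))


# Arena scores as reported in the article.
scores = {"Happy Horse 1.0": 2074, "Kling v3": 1987, "LTX-2 Fast": 1935}
leader = "Happy Horse 1.0"

for name, score in scores.items():
    if name == leader:
        continue
    gap = scores[leader] - score
    p = expected_win_rate(scores[leader], score)
    print(f"{leader} vs {name}: gap {gap}, expected win rate {p:.1%}")
```

Under that assumption, an 87-point lead corresponds to the leader being preferred in roughly 62% of head-to-head votes against Kling v3, and the 139-point lead over LTX-2 Fast to roughly 69%. That is a clear preference, not a sweep.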

The Sora sunset reshapes the competitive context. OpenAI announced in March 2026 that the Sora web and app experiences end April 26, with API discontinuation on September 24, effectively conceding the standalone text-to-video product to specialized competitors. DiffusionGemma's parallel-decoding architecture is the open-weight architectural pivot from the same week; multimodal generation is splitting into specialist runtimes rather than generalist flagships.

See our analysis →

Pixflow — Best AI Video Generator in 2026: Runway, Veo, Seedance → · LLM-Stats — Best AI for Video Generation in 2026 — Ranked by Blind Human Votes → · VideoAny — Best AI Video Generators in 2026: Sora, Veo 2, Runway & More Compared →