// blog · analysis · multimodal · 2026-06-11 · source: analysis / ai-blogs.org

Alibaba's Happy Horse 1.0 takes the text-to-video crown — China holds the public leaderboard while US/EU labs split the production market

Happy Horse 1.0 now leads the public text-to-video leaderboard with an arena score of 2074, ahead of Kling v3 and LTX-2 Fast. The headline is that a Chinese model tops the public arena vote for the first time; the more substantive story is the split between leaderboard rankings and production workflows.

Alibaba's Happy Horse 1.0 leads the public text-to-video arena leaderboard at 2074, about 90 points ahead of Kling v3 and LTX-2 Fast. It's the first Chinese model to claim the top of the consumer video-generation comparison set, and it arrives while OpenAI is sunsetting Sora and while Veo 3.1 holds the audio-synchronization niche.

What the leaderboard actually measures

Public arena scoring uses blind human votes between paired generations from a standardized prompt set. It measures "which generation does the average viewer prefer," which is a different question from "which model fits a production workflow." Happy Horse's win on the arena vote is a meaningful capability signal, but it doesn't automatically translate to Hollywood-studio or brand-marketing adoption.
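To make the scoring mechanics concrete, here is a minimal sketch of how pairwise blind votes can be turned into a rating. It assumes a standard Elo-style update; the leaderboard's actual method is not stated here and may differ (Bradley-Terry fits are also common for arena-style rankings). The 400-point scale, the K-factor of 32, the 1500 starting rating, and the function names are all illustrative assumptions, not the leaderboard's published parameters.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Probability that A is preferred over B under the Elo logistic model.

    The 400-point scale is the conventional Elo choice, assumed here.
    """
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def update_ratings(votes, k: float = 32.0, initial: float = 1500.0) -> dict:
    """Apply sequential Elo updates for a list of (winner, loser) votes.

    Each vote is one blind pairwise comparison between two generations.
    k and initial are hypothetical values chosen for illustration.
    """
    ratings = defaultdict(lambda: initial)
    for winner, loser in votes:
        # How surprising was this result given the current ratings?
        e_win = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - e_win)
        # Zero-sum update: the winner gains exactly what the loser drops.
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # What a ~90-point gap implies under this assumed model.
    # 1984 is derived from the article's "about 90 points" figure.
    print(f"{expected_score(2074, 1984):.3f}")
```

Under these assumptions, a 90-point lead corresponds to the leader being preferred in roughly 63% of head-to-head votes: a clear edge, but far from a sweep, which fits the reading that the arena score is a preference signal rather than a verdict on workflow fit.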

Why Veo and Runway aren't displaced

Veo 3.1 holds the native-audio-sync niche; Runway anchors the marketer/brand-consistency workflow. Both compete on integrated workflow features — audio sync at generation time, reference image controls, character continuity across multiple clips, editor toolchains — that the arena vote doesn't surface. Production buyers care about those features more than they care about the median consumer preference between isolated generations.

What the Sora sunset does to the competitive frame

OpenAI effectively conceded the standalone text-to-video product when it announced that Sora's web and app versions would be discontinued. That removes one US generalist competitor from the public-leaderboard surface and concentrates the category around specialists: Veo for production, Runway for marketing, Kling and Happy Horse for the consumer arena, LTX for throughput. The market is arguably structurally healthier with specialists than it was with one frontier-lab generalist trying to do everything.

The China-frontier angle

Happy Horse joining DeepSeek, Qwen, Kling, and Moonshot at the top of public benchmark categories means the China-frontier story is now multimodal, not just text-and-code. DiffusionGemma's parallel-decoding architecture is the open-weight US/EU architectural pivot; Happy Horse is the closed-source Chinese capability pivot. Both reshape the multimodal landscape in the same week. The interesting empirical question is whether the architectural advances and the capability advances converge in the same model class, or stay separated by jurisdiction and license.

LLM-Stats, "Best AI for Video Generation in 2026 - Ranked by Blind Human Votes" · Pixflow, "Best AI Video Generator in 2026"