Seedance 2.0 accepts twelve mixed inputs (images + video clips + audio) per generation
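Seedance 2.0 ships unified multimodal video generation with up to twelve mixed inputs per generation, drawn from at most 9 images, 3 video clips, and 3 audio files. The per-type caps sum to 15, so the twelve-input total is the binding limit when all three types are used heavily. That flexibility makes it one of the most controllable video models on the market.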
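To make the input limits concrete, here is a minimal sketch of a client-side check for a proposed input mix. It is not part of any Seedance API: the names `PER_TYPE_CAPS`, `TOTAL_CAP`, and `validate_inputs` are hypothetical, and it assumes the per-type figures are caps that apply alongside a combined limit of twelve.

```python
from collections import Counter

# Limits as stated in the text; the names here are illustrative only
# and do not come from any official SDK.
PER_TYPE_CAPS = {"image": 9, "video": 3, "audio": 3}
TOTAL_CAP = 12  # assumed to apply across all input types combined


def validate_inputs(kinds):
    """Return a list of problems for a proposed input mix (empty list = OK).

    `kinds` is a list of strings such as ["image", "image", "audio"].
    """
    problems = []
    counts = Counter(kinds)

    # Flag any input type the limits above do not cover.
    for kind in counts:
        if kind not in PER_TYPE_CAPS:
            problems.append(f"unknown input type: {kind}")

    # Check each per-type cap.
    for kind, cap in PER_TYPE_CAPS.items():
        if counts[kind] > cap:
            problems.append(f"too many {kind} inputs: {counts[kind]} > {cap}")

    # Check the combined cap.
    if len(kinds) > TOTAL_CAP:
        problems.append(f"too many inputs in total: {len(kinds)} > {TOTAL_CAP}")

    return problems
```

Under these assumptions, 9 images plus 3 video clips passes, while maxing out every type at once (9 + 3 + 3 = 15) fails only the combined check.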
The headline use case is "match this style, transition to this scene, with this voice" in a single generation, a workflow that has historically required multiple stages of generation and editing. The model handles temporal continuity across the inputs.
Pricing is competitive with Sora 2 and undercuts Veo 3.1 for unified workflows. The convergence on a twelve-input limit suggests the model is hitting capacity boundaries in the underlying compute infrastructure.