Alibaba Wan 2.1: the new AI star for generating photos and videos?


Less publicized than its compatriot DeepSeek on the artificial intelligence front, Alibaba is nonetheless very active in the sector. After Qwen 2.5, the Chinese giant has now announced the open-source release of its artificial intelligence model Wan 2.1, which specializes in image and video generation. Originally presented under the name Wanx in January, the model is very promising, notably for its ability to generate good-quality visual content from simple text descriptions or reference images. This newfound ease of access to Wan 2.1 should spark considerable interest in the solution.

Promising performance

Especially since, on paper, Wan 2.1 also stands out as a reference in terms of performance. With a score of 86.22% on the VBench leaderboard, it surpasses competing models such as Sora (84.28%) and Luma (83.61%). It excels in particular at handling multi-object interactions, an essential capability for generating complex videos. As always, these results must be taken with a grain of salt, as AI benchmarks are regularly skewed by the vendors themselves.


One of Wan 2.1's major assets is its lightweight T2V-1.3B version, which runs with only 8.19 GB of video memory. That makes it compatible with consumer hardware, producing a 5-second 480p video in about four minutes. For professional applications, Alibaba offers a more robust version, T2V-14B, with 14 billion parameters and capable of generating 720p videos.


Four different models

Alibaba Cloud provides several Wan 2.1 variants, suited to different uses:

• T2V-14B: video generation from text.
• T2V-1.3B: a version optimized for less powerful hardware.
• I2V-14B-720P: video generation from images, in 720p.
• I2V-14B-480P: same principle, but in 480p.

These models are available on platforms like Hugging Face and ModelScope, making them easy to integrate for researchers, developers, and businesses. You can even try them yourself (see the sketch below), provided you are a bit of a tinkerer and have a machine powerful enough to run one of these models.
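As an illustration, here is what running the lightweight text-to-video model might look like through Hugging Face's diffusers library. This is a minimal sketch, assuming the Wan 2.1 integration in diffusers (the WanPipeline class and the "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" checkpoint); check the model card on Hugging Face for the exact API and recommended settings.

```python
# Minimal sketch: text-to-video with the lightweight Wan 2.1 T2V-1.3B model.
# Assumes the diffusers integration (WanPipeline) and the
# "Wan-AI/Wan2.1-T2V-1.3B-Diffusers" checkpoint exist as named here.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")  # the 1.3B model is meant to fit in roughly 8 GB of VRAM at 480p

# Generate a short 480p clip from a plain text description.
frames = pipe(
    prompt="A cat walking through tall grass at sunset, cinematic lighting",
    height=480,
    width=832,
    num_frames=81,  # roughly 5 seconds at 16 fps
).frames[0]

export_to_video(frames, "wan21_demo.mp4", fps=16)
```

On consumer hardware, expect generation times in the range the article cites (a few minutes for a 5-second 480p clip); the 14B variants require considerably more memory.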

New capabilities

Wan 2.1 also promises fairly innovative features, which give it a theoretical competitive edge. It is notably the first open-source model able to handle text effects in Chinese and English, enabling the dynamic integration of subtitles and artistic fonts directly into videos.

Its technical improvements also include better handling of complex motion, improved pixel quality, and greater fidelity to physical principles. This combination of characteristics has made it the only open-source model ranked among the top five on Hugging Face.

In addition, it supports several tasks, such as video generation from text (T2V) or from images (I2V), as well as video editing. An audio-from-video generation feature (V2A) is also included, ensuring smooth synchronization between image and sound.

A reasoning model on the way

According to Alibaba, Wan 2.1 can cover a wide range of uses, from content generation for social networks to special effects in cinema, by way of advertising and education. It can also serve industrial needs, particularly in product design or the visualization of architectural processes.

In the wake of Wan 2.1's launch, Alibaba also unveiled a still-in-development version of its reasoning model, QwQ-Max, which should likewise be released as open source once it officially launches. A choice that clearly contrasts with the policy of companies such as OpenAI and Google, which favor closed models.

Finally, to support the launch of all these models, Alibaba also plans to invest 380 billion yuan (around 50 billion euros) over the next three years in its cloud and AI infrastructure.
