Alibaba launches an AI that makes any photo sing realistically

Generative AI models now span nearly every medium: text with ChatGPT, video with Sora, images with Midjourney. The technology is even tackling music generation, with Adobe launching Project Music GenAI Control. Meanwhile, in China, Alibaba has unveiled a striking new tool: EMO.

EMO can make any portrait sing

In a research paper dated February 27, 2024, Alibaba’s research arm unveiled an AI model called EMO, which is capable of turning photos into video clips. Simply put, all you have to do is give it a portrait and, through “advanced audio and video synthesis”, the person in it starts singing.

The demonstrations speak for themselves: the model was used to make Joaquin Phoenix sing as the Joker from the film of the same name, along with Leonardo DiCaprio, Audrey Hepburn and even the Mona Lisa.

What is impressive about EMO is that it does not just move the lips. Facial expressions, blinking and lip sync are all very realistic. Alibaba is satisfied with its technology and test results. For the company that owns AliExpress, the videos are “convincing”.

An AI powered by an audio-video database

To build EMO, Alibaba’s research arm explains that it relied on an audio-video dataset comprising 250 hours of footage and 150 million images. By coupling audio data with facial movement information, the AI can generate realistic facial expressions.

As Stability AI did at the launch of Stable Diffusion 3, Alibaba says it is aware of the ethical issues EMO could raise. The AI prompts legitimate concerns about disinformation and the misuse of other people's likenesses, for example in an electoral context. The company also says it is committed to developing methods to detect fake videos, such as those generated by the tool.

