Dancing on Douyin no longer needs a real person: a single photo can produce a high-quality video! ByteDance's new technology has even been tried out by the CTO of Hugging Face.


2025-07-12 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 12/24 report

Look! Here are four young women dancing in front of you:

Do you think this was posted by some streamer on a short-video platform?

No, no, no. The real answer: it's fake. It was generated, and from nothing more than a single picture!

Here is how it was actually made:

This is MagicAnimate, the latest research from the National University of Singapore and ByteDance.

What it does can be summed up in a simple formula: one picture + a sequence of motions = a seamless, natural-looking video.

As soon as it was announced, the technology made waves in the tech community, and many tech heavyweights and geeks tried it out for themselves.

Even the CTO of Hugging Face tried it with his own profile picture:

He also cracked a joke:

This is a workout, right? I can skip the gym this week.

Some users kept up with the times and animated characters from the newly released GTA6 trailer:

Even emoji became targets for users' experiments.

MagicAnimate has grabbed the attention of the entire tech community, prompting some users to joke:

OpenAI can take a break.

It's hot [thumbs up], really hot.

So how do you get your hands on MagicAnimate, the tool that turns a single picture into such a popular dance?

Without further ado, let's walk through it step by step.

The project team has set up an online demo page on Hugging Face:

It is very simple to use and takes only three steps:

Upload a still photo of a person.

Upload a video demonstrating the motion you want to generate.

Adjust the parameters, then click "Animate".

For example, here is my photo along with a clip of "Subject 3", the dance that has recently swept the internet:

△ Video source: Douyin (ID: QC0217)

You can also pick one of the templates provided at the bottom of the page:

Note, however, that because MagicAnimate is so popular right now, the demo may go down during generation:

Even if you do get in, you may have to wait in a queue.

(That's right! As of press time, we still haven't received our result!) In addition, MagicAnimate provides instructions on GitHub for running it locally. If you're interested, give it a try.

So the next question is:

How does it work? Overall, MagicAnimate is built on a diffusion-model framework that aims to improve temporal consistency, faithfully preserve the reference image, and raise animation fidelity.
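As general background (standard DDPM notation, not a detail taken from the MagicAnimate paper), a diffusion model is trained by corrupting a clean sample $x_0$ with Gaussian noise at a random timestep $t$ and teaching a network $\epsilon_\theta$ to predict that noise. Here $\bar\alpha_t$ is the cumulative noise schedule, and $c$ is a conditioning signal, which for MagicAnimate we assume would cover the reference image and the target motion sequence:

```latex
x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1 - \bar\alpha_t}\,\epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L} = \mathbb{E}_{x_0,\, \epsilon,\, t}\left[ \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert_2^2 \right]
```

At generation time the process runs in reverse: starting from pure noise, the model repeatedly removes its predicted noise, guided by $c$, until the output emerges.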

To this end, the team first developed a video diffusion model (temporal consistency modeling) that encodes temporal information.

It does so by adding temporal attention modules to the diffusion network, which keeps the frames of the animation consistent with one another over time.
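To make temporal attention concrete, here is a minimal pure-Python sketch, assuming a toy video feature map stored as `video[frame][position]` vectors. Each spatial position attends only to its own features in the other frames, which is what lets information flow along the time axis. The function names and the simplifications (no learned query/key/value projections, a single head, no residual connection) are our own assumptions, not MagicAnimate's code.

```python
import math
from typing import List

Vector = List[float]
Video = List[List[Vector]]  # video[frame][position] -> feature vector


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(video: Video) -> Video:
    """Self-attention along the time axis, computed separately per spatial position.

    Simplifying assumptions: queries, keys and values are the raw features
    (no learned projections), a single head, and no residual connection.
    """
    num_frames = len(video)
    num_positions = len(video[0])
    dim = len(video[0][0])
    scale = 1.0 / math.sqrt(dim)
    out: Video = [[[] for _ in range(num_positions)] for _ in range(num_frames)]
    for p in range(num_positions):
        # Features seen at spatial position p in every frame, in time order.
        seq = [video[f][p] for f in range(num_frames)]
        for f, query in enumerate(seq):
            # Scaled dot-product scores of frame f against every frame.
            scores = [scale * sum(q * k for q, k in zip(query, key)) for key in seq]
            weights = softmax(scores)
            # Output = attention-weighted average of the same position over time.
            out[f][p] = [sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)]
    return out
```

Because every output frame becomes a weighted average of the same position across all frames, abrupt frame-to-frame changes are smoothed out. In a real network such a layer would have learned projections and sit alongside the usual spatial attention.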

Second, to keep appearance consistent across frames, the team introduced a novel appearance encoder that retains the intricate details of the reference image.

Unlike previous methods that rely on CLIP encoding, this encoder extracts dense visual features to guide the animation, which better preserves identity, background, clothing, and other details.
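The difference between a single global embedding and dense features can be sketched in a few lines of Python. This is a hedged illustration: `global_embedding` is a mean-pooling stand-in for a CLIP-style image vector, and `reference_self_attention` assumes that dense reference tokens are injected by appending them to the keys and values of spatial self-attention, a common design in reference-guided diffusion that we use here for illustration only. All names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # unnormalized softmax
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def global_embedding(ref_tokens: List[Vector]) -> Vector:
    """Stand-in for a CLIP-style image vector: mean-pool all reference tokens.

    Pooling keeps one summary vector and discards where each detail was.
    """
    n = len(ref_tokens)
    return [sum(tok[d] for tok in ref_tokens) / n for d in range(len(ref_tokens[0]))]


def reference_self_attention(frame_tokens: List[Vector], ref_tokens: List[Vector]) -> List[Vector]:
    """Spatial self-attention whose keys/values also include dense reference tokens.

    Assumed injection scheme: the frame's own tokens and the reference tokens
    are concatenated into one context, so each frame token can match any
    individual reference token rather than a pooled summary.
    """
    context = frame_tokens + ref_tokens
    return [attend(q, context, context) for q in frame_tokens]
```

With a pooled vector, the model only sees the average appearance of the reference; with dense tokens, a query for, say, a sleeve region can match the specific reference token for that sleeve.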

Building on these two components, the team also adopted a simple video fusion technique to smooth the transitions in long video animations.
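Here is a minimal sketch of the fusion idea, assuming it works by splitting a long video into overlapping segments, generating each segment separately, and averaging the predictions wherever segments overlap. Frames are toy vectors here; in a diffusion pipeline the averaging would presumably be applied to per-frame predictions during denoising. `segment_windows` and `fuse_segments` are hypothetical helpers, not functions from the MagicAnimate repository.

```python
from typing import List, Tuple

Frame = List[float]  # toy stand-in for one frame's prediction


def segment_windows(num_frames: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Split [0, num_frames) into [start, end) windows.

    Requires 0 < stride <= window; stride < window gives real overlap.
    The final window is shifted back so it ends exactly at num_frames.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    if window >= num_frames:
        return [(0, num_frames)]
    starts = list(range(0, num_frames - window + 1, stride))
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return [(s, s + window) for s in starts]


def fuse_segments(num_frames: int, segments: List[Tuple[int, List[Frame]]]) -> List[Frame]:
    """Average each frame over every segment that covers it.

    `segments` holds (start_frame, frames) pairs; every frame index in
    [0, num_frames) must be covered by at least one segment.
    """
    dim = len(segments[0][1][0])
    sums = [[0.0] * dim for _ in range(num_frames)]
    counts = [0] * num_frames
    for start, frames in segments:
        for offset, frame in enumerate(frames):
            t = start + offset
            counts[t] += 1
            for d in range(dim):
                sums[t][d] += frame[d]
    return [[s / counts[t] for s in sums[t]] for t in range(num_frames)]
```

For example, with 5 frames, a window of 3 and a stride of 2, the windows are (0, 3) and (2, 5); frame 2 is covered by both segments, so its fused value is the mean of the two, which softens the seam between them.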

Finally, experiments on two benchmarks show that MagicAnimate substantially outperforms previous methods.

In particular, on the challenging TikTok dancing dataset, MagicAnimate outperforms the strongest baseline by more than 38% in video fidelity!
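To make the "more than 38%" figure concrete: assuming video fidelity is measured with a lower-is-better distance such as FID-VID (an assumption here; see the paper for the exact metrics), the relative improvement over a baseline would be

```latex
\Delta_{\text{rel}} = \frac{m_{\text{baseline}} - m_{\text{MagicAnimate}}}{m_{\text{baseline}}} \times 100\%
```

Under that reading, the claim corresponds to $\Delta_{\text{rel}} > 38\%$, that is, MagicAnimate's score is below 62% of the strongest baseline's.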

The qualitative comparison given by the team is as follows:

And here are the results compared with state-of-the-art (SOTA) baselines in the cross-identity setting:

One More Thing

It has to be said that projects like MagicAnimate have been quite hot lately.

Shortly before MagicAnimate debuted, an Alibaba team released a project called Animate Anyone, which likewise needs only "one picture" and "the desired motion":

This prompted some users to ask:

Looks like a showdown between MagicAnimate and Animate Anyone. Which one is better?

Paper address:

https://arxiv.org/abs/2311.16498

Reference link:

[1] https://github.com/magic-research/magic-animate

[2] https://twitter.com/cocktailpeanut/status/1732052908227588263

[3] https://twitter.com/ProductHunt/status/1732116454647136449

[4] https://twitter.com/Gradio/status/1731992981715231162

[5] https://twitter.com/dylan_ebert_/status/1732152096621813954
