Q&A: Stanford engineers discuss digital doubles
In terms of sheer digital ability, filmmakers today have more creative power than ever.
Motion capture makes it possible to use Josh Brolin’s movements to realistically animate the Marvel villain Thanos. Photogrammetry rigs with dozens to hundreds of cameras snap actors from countless angles and illuminations, creating 3D computer models of their faces that can then produce scenes that were never captured in reality. These “digital doubles” allow filmmakers to save money on stunt doubles, portray actors in extreme scenarios, create animated sequences an actor never actually performed, or even digitally replicate actors who have passed away.
“People think this is a new thing, but this technology has been used in film visual effects for the past five to ten years. Pretty much any Hollywood movie today incorporates this,” said Gordon Wetzstein, associate professor of electrical engineering. “To a regular person, it’s very hard to tell whether something is real or a digital double.”
Now, emerging technologies like generative AI are making digital doubles even more powerful and accessible, potentially disrupting both the film industry and society at large while also offering new tools for applications like medical diagnostics, biomechanics and 3D teleconferencing. To learn more, Stanford News sat down with Wetzstein, whose research includes creating digital facial likenesses, and Karen Liu, professor of computer science, who studies the digital replication of physical human motion.
This interview has been edited for length and clarity.
You both study emerging techniques for creating digital doubles. What do those consist of, and how do they differ from the techniques the entertainment industry already uses?
Wetzstein: Instead of using photogrammetry rigs, my group designs generative AI that learns to create 3D digital people. We scrape random, single-view images from the internet that are publicly available and use them to train AI models to generate faces that look realistic but don’t actually exist. And then those faces can be edited and animated from different perspectives. We’re not doing anything now that we couldn’t do before with enough expertise and expensive tools, but generative AI automates the process.
Liu: My research focuses on generating movements. This includes the musculoskeletal system, which has to correctly model the body type of whoever you’re trying to mimic: a top-heavy person moves differently from a bottom-heavy person.
It also involves modeling a person’s decision-making process that maps perception to action. If I want a digital double to mimic your movements, I need to know how you would react to a particular perception — would you run away or step aside or sit down? That requires lots of data so that when your model is in a situation it has never seen before, it will do the right thing.
How might those techniques change how digital doubles are used?
Liu: As the datasets, generative models, and physics simulators continue to expand and improve, it is possible to train a digital twin of you that predicts the way you move and the actions you would take based on your observation of the world.
Wetzstein: Access to some of these technologies will also expand. So far, replicating a person’s likeness with photogrammetry has been exclusive to high-cost production settings. But that’s already changing. The rise of generative AI tools like Midjourney and DALL·E 2 gives anybody the ability to create any images they want. Soon enough, this is going to be happening for videos, too, and it’s going to be indistinguishable from film content.
Wow. So, will the film industry even need actors at that point?
Wetzstein: That’s the question, right? Replacing extras in the background seems like the first application for AI-generated identities. What’s the worth of the actor if you can just create a digital double and edit it any way people want? Does the actor own the copyright to that? And if they do, if you edit the digital double, is that still the actor or is that a different identity that doesn’t belong to anybody?
I don’t think the need for good actors will ever be completely replaced. And with the right legal framework, this could potentially benefit actors, too. If you have control over your digital double and sell it to different places, you could shoot a hundred movies at the same time, virtually.
Liu: We can’t create perfect digital doubles at a large scale yet. You’d need a lot of data from actors. If I only have data on Benedict Cumberbatch walking around, I can only recreate walking motion; I can’t recreate him talking like Sherlock Holmes. And when it comes to 3D human motion, data acquisition is really challenging. You need special devices to do that.
So movies would be hard, because actors do so many movements. But if I just want a short commercial of someone walking towards a chair, sitting down and drinking a beer, that could be done with a pretty reasonable amount of data.
What could we use this technology for outside of the entertainment industry?
Wetzstein: These models can take a single image of a person and extrapolate how they would plausibly look from different angles. That allows you to do things like 3D teleconferencing or photo editing: how many times have you taken a picture with friends or family and somebody isn’t looking at the camera, or has their eyes closed? You can do small edits like that.
Liu: I’m interested in digital doubles mostly for biomechanical and medical reasons. They could be a really good diagnostic tool: if we can accurately model your spine and combine that with secondary information, such as asking when it hurts as you walk and measuring your muscle activation, among other things, then we can probably solve an inverse problem to figure out why something hurts, or what the cause of your illness is.
Do people have anything to worry about as digital doubles technology becomes more powerful and accessible?
Liu: If I were Benedict Cumberbatch, I would not let people collect a lot of data about me, let’s just put it that way. Once someone has enough data, you never know what kind of model they could build.
Wetzstein: People always get scared about new technologies. And that’s okay: with these emerging capabilities, you can make a digital double do anything. We need to be very careful about this because it has the potential to tarnish reputations or spread misinformation. It’s important to start conversations with lawmakers about how to make sure these tools don’t fall into the wrong hands. We’re not at a point where you could use them to change the political landscape of the world, but I think we’re not too far away.
But what we’re seeing now is part of a natural progression. Twenty years ago, when the first “Toy Story” movie came out, computer graphics were rudimentary, and now they’ve achieved photo-realism. Is there a big outcry about that? No, because it’s being used in mostly responsible ways. We just need to make sure we’re aware of what’s going on.