The character Dex from the 2019 cinematic trailer for CD Projekt Red's Cyberpunk 2077. (Source: Goodbye Kansas)
Dave Connely is supervisor and head of scanning at Goodbye Kansas Studios.
Digital humans have advanced over the decades, and their realism has improved by leaps and bounds. We’re used to seeing them in games and trailers, and increasingly in feature films. With the impending metaverse, they will assume a much wider role as lifelike avatars in those online worlds. More and more, studios are relying on 3D scanning to help generate this future digital population.
3D scanning technology is evolving fast, but there remains a people problem that is tough to overcome. We’re variable, uncontrollable, and unique. For example, every person’s face moves in a slightly different way; it’s why the Rock’s signature single-eyebrow raise is so recognizable.
3D scanning, or photogrammetry, is the art and science of extracting 3D information from overlapping photographs of objects, structures, or spaces, and converting them into digital models. The process began with surveyors and architects creating topographic maps but has since been honed to capture the minute details of the human face.
A facial 3D scanner, then, is in its simplest form a tool for measuring people in real life, from which to model their digital counterparts. Scanners provide a fixed, accurate base from which to create a digital double’s unique physicality, from gestures to expressions.
The current standard for a human scanner works by comparing multiple photographs to find common points between images. From the differences between those photos, the scanner recovers the depth and perspective of a face, in much the same way our own eyes perceive depth. This interpreted depth yields a three-dimensional scan: the data is translated into a cloud of measurement points, which is then converted into a mesh by joining up the dots. Think of it almost like a dot-to-dot puzzle, but in three-dimensional space with millions of dots.
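To make that pipeline concrete, here is a minimal two-view sketch of the same idea using OpenCV: match features between two overlapping photos, recover the relative camera pose, and triangulate the matches into a point cloud. The camera intrinsics `K`, the ORB feature choice, and the function name are illustrative assumptions, not Goodbye Kansas’ production software.

```python
import cv2
import numpy as np

def triangulate_pair(img1_path, img2_path, K):
    """Match features between two overlapping photos and triangulate
    them into an N x 3 point cloud. K is the 3x3 camera intrinsics."""
    img1 = cv2.imread(img1_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(img2_path, cv2.IMREAD_GRAYSCALE)

    # 1. Find common points between the two images.
    orb = cv2.ORB_create(5000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # 2. Recover the relative camera pose from the parallax between views
    #    (the "depth and perspective" step described above).
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # 3. Triangulate each match into one "dot" of the point cloud.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T
```

A production photogrammetry solve extends this to dozens of synchronized cameras refined with bundle adjustment, but the two-view case shows where the "dots" come from.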
At VFX/CG facility Goodbye Kansas, we have our in-house facial scanner and a body scanner, along with an array of 80 Nikon DSLR cameras with 300 individually controllable LEDs—invaluable items in our digital human tool kit. A typical scan session with talent at the studio captures on average 70 facial shapes in two hours. Our teams use scanning workflows largely for film, television, and game cinematics, including this past summer’s Ubisoft cinematic trailer for Skull and Bones and capturing Maisie Williams for an H&M commercial.
A digital human created by Goodbye Kansas for the 2022 cinematic for Krafton’s PUBG Battlegrounds. (Source: Goodbye Kansas)
The team at Goodbye Kansas Studios uses millisecond-timed shutters, at speeds in the region of 1/100th of a second, to digitally capture people. Even then, slight movement happens between frames, requiring custom software to correct it. It’s incredibly difficult for a human being to keep still, and these and other human-induced inaccuracies in the data are where we hit what we call the “problem with people.”
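Correcting that inter-frame movement is, at its core, a point-set alignment problem. As a hedged sketch (the textbook Kabsch algorithm, not the studio’s custom software), rigid alignment of two captures with corresponding points looks like this:

```python
import numpy as np

def kabsch_align(P, Q):
    """Find rotation R and translation t that best map point set P onto Q.

    P, Q: N x 3 arrays with row-to-row correspondence (e.g., the same
    facial landmarks in two frames). Minimizes the summed squared
    distance between R @ p + t and q over all corresponding points.
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t                              # aligned = P @ R.T + t
```

Faces also deform between frames, so real tools typically layer non-rigid registration on top of this rigid core; the sketch only conveys the principle.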
Movement, though, is only one symptom of our humanness. Every person behaves differently in front of the cameras, and it’s this that can hinder the creation of digital doubles and digital humans if not managed correctly. Just being a living, breathing subject can cause inaccuracies in the data.
Scanning departments can mitigate the “problem with people” by shooting as fast as possible and preparing the actor for the process and conditions the scan requires. However, specific facial expressions are difficult for a person to perform on demand without looking robotic, and not everyone is even capable of producing the same range of expressions. Sometimes a 3D recording is taken to see how the muscle, fat, and skin slide over one another and interact while making these shapes.
You might wonder, how is this “problem with people” being combatted? The answer involves tackling it on three fronts—direction, hardware, and AI.
Creating a digital human is a technical process, but while technology is mechanical, humans are not. There’s an art to applying technology to something as variable as a person, and the role of communication when directing your human model is not to be undervalued. The way to get the best out of the talent is different for every person, and this itself directly influences the final digital human. Once the model’s face has been scanned numerous times, we can combine those captures into an accurate map of their face: muscles, fat, wrinkles, pores, blood flow, and hair.
Hardware has also been a driving factor in improving scans and dealing with the problem with people. Scanning demands processing power in large quantities, whether portable processing in Lidar units, which are now lighter and capture more points faster than ever, or photogrammetry solutions, which require heavy post-processing of the raw images to get the results required. In some cases, what used to take days now takes seconds.
However, it is becoming less about what is captured and more about how the data is processed. And with the major tech companies also taking an interest in creating the best possible avatars, this is helping to accelerate research and advance the field. What’s more, improvements in AI technology are likely to make scanning equipment more accessible, meaning that those willing to master the craft can operate a digital scanner without prior technical expertise.
There are many variables involved in scan accuracy, including subsurface scattering, image blur, resolution, and movement, all of which introduce noise into the final mesh. Our scanners meet very strict parameters to help us get the most accurate result we can while allowing for these variations. For example, we use LEDs built in-house to reproduce skin tones with an almost perfect match. We would like to think we capture 96.4% of what is there. The remaining 3.6% is incredibly hard to get, but we're working on it.
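As an aside on how such color accuracy is typically checked, calibration often reduces to fitting a small correction matrix against a reference chart photographed under the scan lighting. The sketch below shows that generic least-squares idea; the function and its inputs are illustrative assumptions, not our actual LED calibration procedure.

```python
import numpy as np

def fit_color_matrix(measured, reference):
    """Least-squares 3x3 matrix M such that measured @ M ≈ reference.

    measured, reference: N x 3 arrays of linear RGB values for the same
    chart patches, one sampled under the scan lighting, the other the
    chart's published values.
    """
    M, *_ = np.linalg.lstsq(measured, reference, rcond=None)
    return M

# Usage (hypothetical values):
#   corrected_pixels = raw_pixels @ fit_color_matrix(measured, reference)
```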
We're only one part of the package—the first step to a digital human. Our team of artists and pipeline developers are the real stars of the show when it comes to making a convincing digital double. We get them what they need to do what they do best.
A character from the 2021 trailer for Ubisoft’s Assassin’s Creed Valhalla. (Source: Goodbye Kansas)
A time to scan, a time not to scan
We're fighting physics when using remote sensing. Most things can be scanned; even flowing liquids can be captured with enough high-speed cameras, but scanning isn't always the best approach. Most scanners will always struggle with solid-color or shiny surfaces, which need adequate preparation, and the same goes for transparent materials. Usually, the decision boils down to the time, stylization, and budget of a project.
Digital scanning gives a ground-truth representation that can't be achieved any other way, taking a lot of the guesswork out of the process for artist teams and for anyone building a realistic digital object. It's also incredibly useful for rapidly representing a set, for example as Lidar data used to track cameras, something that would be very difficult to do manually without such data.
Ultimately, developing great digital humans takes artistic expertise. It’s the marriage of creative intuition with the technical knowledge to operate high-quality tools, matched with the expertise and experience of the team. All of this together makes it possible to gauge how much work must be done to achieve great results for clients.
The key is still the artist. Talent remains the defining factor between a good digital human and a brilliant one. Technology plays a part, but the evolving innovations in the digital human production pipeline simply enable these artists to do what they do best.
The body scanner used at the studio. (Source: Goodbye Kansas)