Building the Open Metaverse

Digital Humans

Dr. Mark Sagar, CEO of Soul Machines, and Vladimir Mastilović, VP of Digital Human Technology at Epic Games, join hosts Patrick Cozzi (Cesium) and Marc Petit (Epic Games) to discuss their approaches to creating digital humans and the role they will play in the metaverse.

Guests

Mark Sagar
CEO and Co-Founder, Soul Machines
Vladimir Mastilović
VP, Digital Humans Technology, Epic Games


Announcer:

Today on Building the Open Metaverse.

Vladimir Mastilović:

Recreating reality should be as automatic as possible. Intervening with reality is where we see the creative opportunity. So we want to make the first thing automatic and the second thing easy.

Announcer:

Welcome to Building the Open Metaverse, where technology experts discuss how the community is building the open metaverse together. Hosted by Patrick Cozzi from Cesium and Marc Petit from Epic Games.

Marc Petit:

Hello, everybody, and welcome to our show, Building the Open Metaverse, the podcast where technologists share their insights on how the community is building the metaverse together. Hello, my name is Marc Petit from Epic Games, and my co-host today as usual is Patrick Cozzi from Cesium. Hey, Patrick.

Patrick Cozzi:

Hi, Marc. Hi, everybody.

Marc Petit:

Hey, guys. So today we'll talk about digital humans in the metaverse, and we have invited two of the industry’s luminaries to discuss the topic, so, Patrick, expect some deep thinking around digital humans. Our first guest is Vladimir Mastilović. He's the CEO and co-founder of 3Lateral, also my colleague at Epic Games, where he's our VP in charge of digital human technology. Vlad, welcome to the podcast.

Vladimir Mastilović:

Thanks, Marc. A very kind introduction and really good to be here.

Marc Petit:

Good to have you here. And we also have with us Dr. Mark Sagar, the CEO of Soul Machines. Mark was with us at the SIGGRAPH Birds of a Feather meeting where this podcast originated. So welcome back, Mark.

Mark Sagar:

Yeah. Nice to see you guys again.

Marc Petit:

Mark, you're the CEO of Soul Machines. You're also the director of the Laboratory for Animate Technologies at the Auckland Bioengineering Institute. I read that your team is developing autonomously animated virtual humans with virtual brains and virtual nervous systems, capable of highly expressive face-to-face interaction and real-time learning. So we're talking about characters that can show emotions and empathy. And I also read, Mark, that your goal is to humanize the interface between people and machines, which I think is a fascinating topic. But before we get there, please tell us about your path into the metaverse through computer graphics and the work that you did at Weta, particularly on King Kong and Avatar, and how it got you Academy Award recognition in the process.

Mark Sagar:

So I guess my path started with a mixture of both science and art. I used to study physics and things like that, but I would also do portraits of people. Anyway, I ended up working on an eye surgery simulator, which was basically combining physical models and computer graphics. So for my PhD, I started making that into a general anatomy simulator. And one of the pieces of anatomy is the human face, a very complicated piece. And so I got really focused on that, and that led into building digital actors. And then I was involved in some startup companies, one called Life Effects, this was the late 1990s, early 2000s, where we were basically creating interactive digital models. That's actually a long time ago.

Mark Sagar:

And we were also trying to create realistic virtual humans. This was for film. So we did a few projects, one called The Jester and another, Young at Heart, which were about creating digital humans that didn't make you think of them as artificial characters. You just went straight to, oh, what's that person's life history? What are they thinking? So I wanted to create digital humans that you would just accept as being in a world, without worrying about the artifice of it. Anyway, all that led into various things. I was in the visual effects industry for a long time, most of the time at Sony and at Weta, and that was creating digital characters.

Mark Sagar:

I was looking at both how to create realistic characters. So I did work with Paul [inaudible] for Spider-Man, where we were using the light stage to create Dr. Octopus and people like that. And at Weta, during that time, I'd also been working on ways to do motion capture, because at Life Effects and places like that, we were going, "Okay, we can capture actors," because we had built our own high-resolution HD motion capture system, and we were working on projects with people like Jim Carrey. We were trying to turn him into a fish for a particular project. But one of the big challenges there was the amount of data that you are capturing. And how do you actually make that into something that isn't like a video playback? Like you have now with 3D video: you've got tons of data, but you can't manipulate it. So it was really about how do we make this type of technology and this data manipulable and animatable.

Mark Sagar:

So I started developing methods for, I guess, transferring motion data onto characters which were different. And that was starting to really look into, okay, what's the essence of facial expression and interaction? It's almost like transcribing music: you've captured it from one instrument, say a piano, and now you need to turn it into a guitar. So you have to transcribe it between very different instruments. And so this led to building FACS-based systems. This is particularly useful for characters like King Kong, where you have a performer with one facial geometry and a character like King Kong who has very different facial geometry. His eyebrows will roll up rather than go up. So there are all kinds of nonlinearities in it.
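
[Editor's note: a minimal sketch of the FACS-style retargeting idea Mark describes, where an expression is captured as action-unit activations on one face and "transcribed" onto a character with very different geometry. All action-unit names, control names, and curves below are hypothetical illustrations, not any production system.]

```python
# Solved from capture data on the human performer: activation in [0, 1]
# for each FACS action unit (AU1 = inner brow raiser, AU12 = lip corner puller).
source_activations = {"AU1_brow_raiser": 0.8, "AU12_lip_corner_puller": 0.3}

# Per-character transfer curves encode the nonlinearity: the same action unit
# can drive a very different motion on the target (e.g. brows that roll up
# rather than translate upward).
def kong_brow_curve(x: float) -> float:
    # smoothstep ease-in: most of the "roll" arrives late in the activation
    return x * x * (3.0 - 2.0 * x)

transfer = {
    "AU1_brow_raiser": ("kong_brow_roll", kong_brow_curve),
    "AU12_lip_corner_puller": ("kong_lip_stretch", lambda x: min(1.0, 1.4 * x)),
}

def retarget(activations: dict[str, float]) -> dict[str, float]:
    """Transcribe performer AU activations into target rig control values."""
    controls = {}
    for au, value in activations.items():
        if au in transfer:
            control_name, curve = transfer[au]
            controls[control_name] = curve(value)
    return controls

print(retarget(source_activations))
# {'kong_brow_roll': 0.896, 'kong_lip_stretch': 0.42}
```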

Mark Sagar:

Anyway, all that led to creating those sorts of models for those sorts of characters. And then we started making that into real-time systems for Avatar. So we had real-time systems where the actors on the stage of Avatar were basically driving characters. James Cameron would be watching the Na'vi characters live, like he was actually in a video game, but he had a virtual camera. So anyway, through all this work, I really got interested in, okay, we've got actors performing these roles, but then it's a three-year-plus process to get that to screen. And that gives you one story. If you could have the computer characters create their own acting, act for themselves, you've got infinite stories. It's completely infinite what can be done with them.

Mark Sagar:

I'd done work in bioengineering before, and I'd also been very interested in neuroscience and AI, all the different things that were going on, and in physiology. So I was really interested in the combination of all these things: how do you actually create a digital character that can create its own expressions and act? But of course that's a big rabbit hole. Basically you're now having to make a digital brain. After I thought about this, I realized I couldn't go back. This was too exciting. So I left the visual effects industry and started a lab, which then spun out into a company, Soul Machines. Along the course of that, we started making the very first fully autonomous character, which was Baby X, a digital baby that could learn, that you could interact with and do all these types of things with, almost like a blank slate. So that's a very short version of the story about... And then of course now, with Soul Machines and the metaverse coming around, it's like, how do you create autonomous digital characters for the metaverse? Because we are going to be interacting with all kinds of things, and not everything is going to be driven by a live human.

Mark Sagar:

For example, for a business, once there's scale, you don't want every single avatar necessarily driven by a human, because it's impossible to scale. That's where autonomous digital characters start making that possible. And then there's games and entertainment, where if you have characters that can actually think for themselves and do different things, that's infinite. All the types of incredible, crazy games and interactions we'll be able to make will be just so fascinating. And every time you play the game, you are going to have a different experience, you know? I think that's all so exciting. So that's basically a compressed version.

Patrick Cozzi:

Dr. Sagar, really amazing journey and really high tech work you've been doing. As we bring 3D to everyone, I think operating at this scale, realism is going to be super important. Then turning over to you, Vlad, I believe your journey involved a lot of AAA game work. So tell us about your journey through the metaverse.

Vladimir Mastilović:

Yeah, my background was originally in games. The journey started maybe about 20 years ago, so it's a bit scary when I say it like that. It was motivated, similar to Mark, by a combination of art and technology. I could never really decide which one I liked better. And back in those days in Serbia, formal education didn't really accommodate a multidisciplinary approach, so I went my own path. And for a while, I felt a little bit lost, to be honest, because it was a quiet world back then in gaming if you wanted to do high-end characters. For a few years, I was doing it on my own. And honestly it felt more like a hobby than a job, but then, I guess as a result of very true dedication, a few people noticed the initial results. I became a technical director in a company called Image Metrics back in 2005 and had the honor to prototype, two console generations before this one, on PlayStation 3, the first next-gen game back then for Rockstar Games.

Vladimir Mastilović:

I came up with this crazy character design, the rig for the deformations and everything. And people were fascinated by the craziness of it, but told me that it was about 15 times over budget; still, they were quite impressed. I started kind of leading many... Back in those days, Image Metrics was doing a lot of work with Rockstar Games. I had the honor, for more than 10 years, to lead facial rigging systems for Rockstar Games through Image Metrics. And on that path, I mean, back in those days people were really telling me, "Vlad, nobody cares about high-fidelity characters. We care about gameplay. It's nice what you're doing, and we appreciate your enthusiasm, but just tame it down."

Vladimir Mastilović:

So I had a little bit of time to think about scanning, and I sort of sensed in 2006, 2007 that a big change was coming and that doing art manually was going to change. Not that it doesn't have a future, but it wasn't going to be about manually pushing vertices. I invested a lot of my thinking into how we can acquire real-world data and how we can actually make it useful for real time. So around 2012, I founded 3Lateral. And in the first days we were very focused on the pipeline: how do we ingest large amounts of data, how do we process it, how do we make it useful? And it was actually quite fortunate for us.

Vladimir Mastilović:

I would like to say I knew what would happen, but there was a bit of luck in there as well. And it just so happened that for the PlayStation 4 generation of games, story-driven games were a big thing. Scanning was a big thing, and we were ready, actually much more so than the rest of the industry. So that allowed 3Lateral to grow very quickly. And it allowed us to start thinking about collecting big data sets. This is something that we intuitively wanted to do because we were always attracted to ordered systems and efficiency and all of that. But the fruit of that labor was actually databases. It started being quite convenient to run initial tests on how a digital human would look and how we could build models that would enable us to do a character rig much faster, and so on and so on.

Vladimir Mastilović:

So that collaboration with Epic Games started in 2015, where we kicked it off really well; there was a lot of positive energy, and we were part of the campaign that announced the engine becoming free. And then for a few years after that, we were really pioneering a lot of real-time motion capture with Hellblade, and then later on with the Siren project, which showcased a real-time captured digital human, leading to demos like Osiris Black, which mapped Andy Serkis' performance onto an alien character. And then at that point in time, Epic Games and 3Lateral had an in-depth conversation about the future. And we felt it was such a good time to be doing it together, with a lot of complementary skills being invested in all of that, that we decided to join forces, and basically this time push not only the fidelity but also the democratization of this capability.

Vladimir Mastilović:

At the same time, our customers were thrilled with our quality and service, but also frustrated with us because we weren't available. We did our best, but the need grew so much that we became booked for many years in advance. And even though that's a success story, essentially, it was still a problematic one because we couldn't really address the needs of the industry. So that's how we formulated the plans for the MetaHuman product, which is in its essence about enabling others to do what we can do, while we continue to push the boundaries of what we can do. And it feels much more fulfilling than showcasing a demo that only we can do. So, just to give you a number that I'm very proud of.

Vladimir Mastilović:

Back in 2017, we did an analysis of the global market of rigs that were created that year. And we estimated that about 50,000 digital humans were created across all of the games and all of the productions in the world. In just the last year, there were about a million MetaHumans created. Just MetaHumans; I can't imagine the rest of the productions. And I'm very excited about that exponential increase in productivity worldwide. The last year has been very fulfilling. We could see many creators being able to tell stories that they otherwise wouldn't be able to. And I'm very excited to push this to even greater capability and many more characters being created in the future. And we are interested in reducing the skill level required to the point where you don't even have to think about technology. If you want to tell a story, we want to help everybody tell their stories through our products. And hopefully we can actually work together with other technologies, like Mark's, for example, so that we can connect different capabilities and basically help everybody create this kind of new media, I suppose.

Marc Petit:

Great. So thank you, guys. The reason we wanted the two of you together was to try to educate ourselves, because we're fascinated by the space. We get a lot of questions from our audience as well, and we want to set some expectations and boundaries for digital humans in the metaverse. So it's interesting to compare and contrast your approaches and your backgrounds: somebody from visual effects, somebody from games. And you seem to be tackling the problem from different perspectives. So maybe let's start with you, Vlad, on the MetaHuman technology. Your approach relies a lot on capturing massive amounts of data and using machine learning. Can you walk us through the approach here?

Vladimir Mastilović:

Yeah. I've partially already covered our fascination with collecting real-world data. I guess our view is that recreating reality should be as automatic as possible; intervening with reality is where we see the creative opportunity. So we want to make the first thing automatic and the second thing easy. Our approach is what we call the spiral of knowledge. We start with acquiring real-world data. We then decompose this data into what we call the atomic particles of that data. Then we build our virtual models. Then we build systems for synthesizing new data, on which we train our systems, on synthetic data as well as the real data. And that enables us to capture and process new amounts of real data faster and to a greater level of precision.

Vladimir Mastilović:

So in every spin of this spiral, we are able to increase our capability by orders of magnitude, either in fidelity or in the amount of data that we capture. And I would say we're still at the beginning. There's still much more to learn and to see; we're still very much focused on appearance, but there is a lot to do on the behavior side of things. And the whole field is incredibly, incredibly complex. It feels like a lifetime search for something that cannot be obtained fully, even in an infinite amount of time. So, anyway, that's our approach. It's data-driven, with a sort of spiral of knowledge, and then building collections of tools around it to make it more efficient in the next round.
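
[Editor's note: a minimal sketch of the "spiral of knowledge" loop Vlad describes. Every function body here is a hypothetical stand-in; the point is the shape of the cycle, where each pass trains on real plus synthetic data and raises the capacity of the next pass.]

```python
def acquire_real_data(capture_capacity: int) -> list[str]:
    # stand-in for scanning real people at the current capture capacity
    return [f"scan_{i}" for i in range(capture_capacity)]

def decompose(scans: list[str]) -> list[str]:
    # break raw scans into "atomic" components (shapes, poses, textures...)
    return [f"{s}:component" for s in scans]

def build_model(components: list[str]) -> dict:
    return {"components": components}

def synthesize(model: dict, n: int) -> list[str]:
    # generate synthetic training examples from the current model
    return [f"synthetic_{i}" for i in range(n)]

def train(model: dict, real: list[str], synthetic: list[str]) -> dict:
    # training on real + synthetic data improves processing capability
    model["training_examples"] = len(real) + len(synthetic)
    return model

capacity = 10
model: dict = {}
for spin in range(3):  # each spin of the spiral
    scans = acquire_real_data(capacity)
    components = decompose(scans)
    model = build_model(components)
    synthetic = synthesize(model, n=capacity * 10)
    model = train(model, scans, synthetic)
    capacity *= 10  # next pass can ingest an order of magnitude more data
    print(f"spin {spin}: processed {len(scans)} scans, capacity now {capacity}")
```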

Patrick Cozzi:

Vlad, I love that you're trying to bring this to everyone. We see with the metaverse that there are just going to be so many more consumers, and so many more creators as well. And if I'm understanding correctly, it sounds like we're going to be able to build lots of different characters, whether something really realistic or something cartoon-like, and you're trying to make that as low-cost and as easy as possible. Could you tell us more?

Vladimir Mastilović:

Yeah, absolutely. We are starting from reality because, and this may sound surprising, that's the easiest thing. The fantasy world is much more complicated, I guess for obvious reasons: there are so many realities that people can imagine. So since it's founded in reality, that's also our connection point to the actor when we do motion capture. If we understand reality, then we can understand how that reality maps onto a virtual character. And if we understand how that virtual character maps into different styles, then there is a pipeline, a sense of a pipeline. It seems nice and elegant.

Patrick Cozzi:

Great. So let's turn to Dr. Sagar. So Vlad's approach is collecting real-world data and then using machine learning, and you're also building a virtual brain and nervous system so that your characters can handle social interactions. Can you tell us about your tech?

Mark Sagar:

Yeah, so there are probably a couple of components to our technology. One is really the whole behavioral system, which I covered earlier. And the other part is basically creating the digital human bodies that people use. For that, our approach is more about creating ready-made models. A lot of our customers want... because our focus is really the digital workforce. So these are professional roles and things like that. And so our customers will want to appeal to particular demographics, so you have a digital human that fits that particular role. And to that point, we've also been scanning lots of people to build up databases of different anthropological data, in a way, and then combining them.

Mark Sagar:

But the other part of our company is really focused on driving, autonomous driving, of the characters. And that applies to any animated character. Actually, at the moment we've got a project where we're going to be connecting our brain up to a MetaHuman, so that will be really fun. The thing there is the real goal: how do you bring life to technology? Effectively it's the essence of animation. How do you bring a character to life? It doesn't have to be a human character; it could be a talking strawberry or something like that. And I think in the middle of this, when we get into it, we're going to have professional applications where you want to appear realistic.

Mark Sagar:

If I'm a digital doctor, you don't want me looking like a dinosaur, right? But if I'm in a social setting, then it's like a fancy dress party, and you're going to want to have the most creative characters possible. The challenge, like Vlad's saying, is enormous there. Probably one of the most inspiring things that I saw in that space was Spore, the game Spore, where they had the character creator. You could make a whole lot of different things. It was very fun, and anybody could do it; a kid could do it. And I imagine that we'll need that sort of level of tooling as we get more into the metaverse, to allow complete creative freedom.

Mark Sagar:

Now, the other thing that we do is digital celebrities, and this ends up being much more... You're trying to get a model of a real person, so then all the accuracy matters. Sometimes it's real people as they exist now, like the Will.i.am model we did, for example, but we're also doing some celebrities where we're de-aging them. So we're working on some projects at the moment where we're putting people back in their heyday. There's artwork involved in that; it's a lot of work. So we are looking at ways that we can automate a lot of these processes as we go on. On the behavioral side, there's a sliding scale from a fully autonomous character like Baby X, which just does its own stuff, to a more controllable one that might be a digital doctor or financier or something like that, where the company wants full control over their model.

Mark Sagar:

They don't want it doing something random. If you're doing a brand ambassador, for example, you're creating a curated experience. And so what we have in the model is almost like a dial: you can set the degree of autonomy that you actually want the character to have. So in terms of the technology, we've got the Baby X side, where we're really trying to model the essence of behavior, building cognitive models, core emotional models, all these things coming together. And then we have the case where it's more of a corporate or customer-focused thing. Then it's like, okay, we can take existing technologies and allow them to plug in. For example, if we can plug OpenAI or something like that in there, then we can have NLG driving the characters. But a lot of the customers will actually want much more standard NLP-type stuff, where it's a curated experience, because they don't want it to go off in some unexpected direction.

Mark Sagar:

So we try to cater across those areas. The thing that excites me the most, of course, is the fully autonomous work. Now, all the models share the same technology, but they use different elements of it. So for example, the emotional system, which is autonomous, is constrained by how you might interact. If it's for a very professional application, then you don't want your doctor suddenly starting to cry, right? Or a concierge getting upset at somebody. But the key thing that we're really aiming for is empathy. We're trying to get a degree of empathetic connection. So that means that I have to acknowledge your emotion in an interaction, because you want to be heard. I guess what we're trying to do in the interaction with digital humans is create models that acknowledge the whole person, if you like. So when you interact with a character, you want information, you want them to help you, but you also want to be felt, I guess. You want your emotions to be accounted for, and then to be responded to in an appropriate way.
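
[Editor's note: a minimal sketch of the "degree of autonomy" dial Mark describes, where the same autonomous emotional system runs underneath but its expressed range is constrained by the role. All names and numbers are hypothetical illustrations, not Soul Machines' actual system.]

```python
from dataclasses import dataclass

@dataclass
class Persona:
    name: str
    autonomy: float         # 0.0 = fully scripted, 1.0 = fully autonomous
    emotion_ceiling: float  # cap on expressed emotional intensity for the role

def express(persona: Persona, internal_emotion: float, scripted: float) -> float:
    """Blend the autonomous emotional response with the curated one,
    then clamp to what the role allows (a digital doctor shouldn't cry)."""
    blended = persona.autonomy * internal_emotion + (1 - persona.autonomy) * scripted
    return min(blended, persona.emotion_ceiling)

baby_x = Persona("Baby X", autonomy=1.0, emotion_ceiling=1.0)
doctor = Persona("digital doctor", autonomy=0.3, emotion_ceiling=0.4)

# The same strong internal reaction is expressed very differently per role.
print(express(baby_x, internal_emotion=0.9, scripted=0.1))  # 0.9
print(express(doctor, internal_emotion=0.9, scripted=0.1))  # 0.34
```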

Mark Sagar:

Then of course one of our big goals with the autonomous technology is to create AI that you can collaborate with. And so in the Baby X project, one of the key elements we're looking at is the essence of cooperation. How do humans cooperate in different tasks? Face-to-face interactions are the most effective interactions that people have. This is why we're using video calls; this isn't a text message or a phone call type thing. It's actually much more effective because you just get so much more information. So all of these factors are basically information; they're all valuable. They amplify, they control your attention, what matters. So there's a lot there.

Mark Sagar:

We try to take advantage of, I guess, the whole visual medium, because if I have a face-to-face conversation with technology, I can show you things, I can express things; lots can happen. And it's the opposite of 2001, where you had HAL: basically a red light which is watching and lip-reading people, and they don't even know about it. We've kind of got that same technology with a lot of the home speakers, like the Echos and so forth, because they're sitting in your room listening to you, but you don't really know them and it's a one-way interaction. Whereas the face-to-face interaction is much more in the open. It's honest. If you had a digital human there watching you, and you can tell it's listening, you might go into another room or you might turn it off. It's those types of things where we can see... It's just respect for privacy, I think, which will matter more and more. Because ultimately, for us to use these technologies, we have to trust them. That's the only way they're ever going to be accepted by society.

Marc Petit:

So that's interesting. Vlad, you mentioned the millions of MetaHumans, or at least one million or more. So how far away are we from making these photorealistic characters available to all creators? And then I'll have the same question for Mark about autonomy. Can you give us a sense of how far we are from this being widely available?

Vladimir Mastilović:

Well, I guess it depends on the definition of widely available. Do you mean in the hands of-

Marc Petit:

Yeah. I mean, we're heading towards a creator economy, and I'm sure a lot of people in the metaverse will want to create their own fantastical or realistic representation.

Vladimir Mastilović:

I would say that in three to five years, we're going to start seeing some real-world usage of digital humans being used by the wider population to tell a story. I sometimes imagine the metaverse like an interactive YouTube, right? I remember a show from '96; I was so disappointed to hear an expert on television say that computers would never be able to play video because that's just too much information for computers. I was really in a bad mood for days after hearing that, because I loved the idea of dealing with video and computers. And fortunately, that was so wrong. And then I also remember the time when posting a video to the internet was something like science fiction.

Vladimir Mastilović:

I try to imagine what the metaverse is going to be, and I sort of see it as an interactive YouTube, not literally, but a place where people can create interactive content and invite others to participate in it. I guess it's impossible to imagine all the use cases that will imply, but I would imagine that people will be building virtual worlds easily, parametrically or by description. And then, I don't know, I guess the Holodeck in Star Trek is a good analogy. Again, through a description of some kind, creating people as well, and then defining some high-level actions that these people will be doing. For example, not so high level, not like "go angrily into the street," but more like "walk from here to there and open the door, and then say this or react to this person."

Vladimir Mastilović:

So I guess it's a little bit down-to-earth, I realize, but we're talking about a timeline of three to five years. I truly believe in Mark's approach, where at some point these characters are able to simulate intelligence. And I think simulated intelligence is the key thing. I don't think that they will ever be intelligent, but maybe in some more distant future. But I do believe that it's going to be quite interesting to see these virtual simulated spaces where even the characters are simulated and have their own personality. I'm sure that Mark will pleasantly surprise us with how awesome that's going to be. But I would say, maybe I'm wrong, that's probably five-plus years out, that kind of future. I think this simpler future in three to five years is still going to be very engaging and very fun. And if we enable some sort of agency for the user to participate in that space together with these NPCs and their friends, I see that as an opportunity for a lot of fun.

Marc Petit:

Yeah, I can attest to that quickly, Vlad. The minute we put the MetaHumans in the hands of our Unreal Fellowship participants, the variety of the stories and the quality of the stories shot up to the moon. These people, in five weeks, were able to create multicharacter stories, very emotional, in virtually no time. So I think you're right; maybe that's even too modest. Thanks to MetaHuman Creator, we've already gone a long way there. So Mark, what's the roadmap to some level of autonomy? What do you think is ahead of us?

Mark Sagar:

Oh yeah. Before I answer that, I just want to say how cool those videos are that have been made with the MetaHumans; they look awesome. That's fantastic. I saw that one of them even had Mike Seymour pop up in it, which was quite funny. So it's really cool to actually see people creating these quality experiences with the digital humans. What I'm really excited about is making that interactive. If you get that quality and it's interactive, that's super exciting. And you can see all the stepping stones to make that happen, so I don't think we're too far off. Touching on some of the things: we currently have a lot of our digital humans already out there working in different settings.

Mark Sagar:

At the moment it's more web-based, but because the technology's all 3D, you can run it in an engine, in a virtual space or an augmented reality space. So we've been doing various projects around that. I guess Vlad's example of "walk angrily to the whatever" is the sort of thing which we are really looking at. It becomes a virtual character that you direct. But there are different use cases, because you'll have cases where you want to create a story, and the layers of control that you run in a digital character I think reflect what happens with, say, a director: I want to tell a story and I'm just giving high-level commands to actors, but the actors are then interpreting them and doing their own thing. And then the director will say, "Oh, can you do that again, but more happy," or something like that.

Mark Sagar:

Then if a director's working with an animator, the animator's putting different expression and control in there. So I think that the multi-level approach is the right one to take here, where you've got very high-level commands for the non-animator, the non-technical user, "I just want my digital human to go and do that," versus, okay, I actually want to create a very refined experience and I'm going to direct each part of it, like I'm creating an animation. And I think there's room for all of that. And especially when we start looking at personality models, the way in which something behaves really conveys so much personality. When we first did motion capture of Jim Carrey, this was probably about 24, 25 years ago or something like that, we took his motion capture data and we put it onto a... We had Peter Jackson coming to visit at the time, so we put it on a Weta cave troll, because they'd sent over a model.

Mark Sagar:

The personality just came completely through on the cave troll. It was a totally different appearance, but the essence of the person was really coming through. So that whole area, the fidelity of the animation, is really exciting. Now, you need to have control over the different aspects of the animation depending on your task. And when we talk about autonomy, there are many layers to that. A fully autonomous character, in my mind, has its own mind and values. Depending on where you draw the line, that can be a long way off. As Vlad was saying, truly intelligent characters, that could be real AGI-type technology, and that could be 50 years away. But we will get a lot of technologies which do some very impressive things sooner than that, I think.

Mark Sagar:

It's also about the way that you relate to information. If you think about the internet, we've got all this information out there, and especially in the metaverse, everything is going to be connected. I see digital humans as the nexus. They connect to all of that, because how do you make everything going on in the internet meaningful? Normally we're used to somebody telling us about something or showing us things. We need a way to simplify information. That's why we talk to other people: what's going on, show me this. So if you take Vlad's example of the interactive YouTube channel, I think one of the most incredible things about YouTube is the educational side. If you want to fix something, you just go on YouTube and it tells you how to play a guitar or fix a sink or anything like that.

Vladimir Mastilović:

You know, not enough people talk about that. Everybody who's got YouTube has free education now, and nobody's excited about it. I mean, not literally nobody but not enough people. I'm really glad that you mention it.

Mark Sagar:

I think it's massive. With digital humans, it's about interactively showing you that. So if you are training or learning something, you now have an assistant; you've basically created that interactive YouTube video. And a lot of our focus, a lot of the technology we've been developing, is to create exactly that sort of thing at the moment, so you can have this back and forth, this feedback. When you're learning something, you might get stuck on something or you might want to go back to something, and all these types of things I think are really important, because everybody learns at a different pace, and you also want to get the user involved. We did a language tutor model where you are actually getting the person to speak to a digital person, because that's the closest they'll get to speaking to a real person before they go out on the street. They have to basically put themselves out there a little bit. We've got projects going on in healthcare, so we're doing therapy assistants and things like that, where people will actually disclose more to a digital human than they will to a real person, because they don't feel judged. There's all kinds of really interesting potential here.

Patrick Cozzi:

Yeah, I agree. Really exciting opportunities. So Vlad, Dr. Sagar, I want to change gears a little bit. Dr. Sagar, you may remember from the SIGGRAPH BOF we did last summer, which really kicked off the podcast, there was a theme across that entire event around open standards and interoperability in the metaverse. How do we build an ecosystem where many different participants can contribute? How do we have lots of different pieces of software that each become the best at what they do, and then have everything work together? This is a topic really dear to Marc and me; I've done a lot of work in open standards myself. And when we look at where we are and where we need to be for the complexities of digital humans, I'd love to hear both of your perspectives. Vlad, maybe do you want to start? Or Dr. Sagar, please start.

Mark Sagar:

Vlad, do you want to go first?

Vladimir Mastilović:

No, no, it's fine. We're probably going to say the same sort of things.

Mark Sagar:

I mean, one of the things is that we are really looking at how we connect the autonomous animation system to any other model. As I mentioned earlier, we're currently connecting one up to a MetaHuman, looking at what it takes to basically connect it to a MetaHuman model. We're also looking at completely nonhuman characters: what happens if it's got a completely different skeleton and face and different types of things like that? And of course you've got all the different platforms that you might be working on. There's the animation connection, which I think is actually pretty easy, really, because it's a mapping problem. So you can do that for different people's rigs and things like that.

Mark Sagar:

To make it really democratized, for anybody to use, we work back from "okay, I've got some technical skill" to "hey, I just want to have a digital human in my application, and I just want to drop it in and make it do this stuff." That's where these high-level controls come in. That is to say, you start thinking about behaviors and information sources, and this is where it can get quite complex. And this is at the heart of the problem that we're trying to solve: you are combining complex information from different sources, and you are having to create an interactive experience out of that. There are layers and layers to that.

Mark Sagar:

To that end, we're really looking at a sort of animation API that can be plugged into lots of different models, backwards and forwards. And I think that for that, open standards are really key. So it's something I'm very keen to be involved in. Just another thing about MetaHumans: with a MetaHuman, you get all these amazing hairstyles, you get clothing, you get all of this. These are other things, because you need to communicate that as well. There's so much choice that's possible. This is a really cool thing: how do you make those a sort of standard? Is it a fashion standard? I need to wear exactly this shirt? What happens if I've got a really... And we start getting into people purchasing things on the internet: I've got an NFT thing and I've bought some fancy shirt, and now I want to put that shirt on a MetaHuman. How do I transfer that data and have it work? Because I've paid for it. I own this. But I now need it to work on a MetaHuman.

Mark Sagar:

I think we're going to have those types of things, where we might have universal rigs with whole facial animation systems and things like that. At a core level, as with a MetaHuman, you're moving the different facial regions and muscles around. We do the same sort of thing, and those are your musical notes.

Mark Sagar:

As long as the animation systems are controlling that... Things like ARKit, for example: people have a basis of facial animation there, and it's close to a FACS-type thing, which makes sense, because these are the muscles that you can drive. Getting into body animation and things like that, there are two approaches. There's a data-driven approach, where say I've got a walk cycle or particular motions that I'm trying to blend, versus one where I'm creating the motion. That's a more physics-driven or intent-driven model: okay, I'm moving my arm to pick up a thing, and I'm actually controlling the body. One of the things we've been exploring is muscle-driven animation, so it's animated from the inside out. But you've got both, and it's all very... they have different degrees of complexity and speed to get things going.
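
[Editor's note: a minimal sketch of driving a character rig from ARKit-style blendshape coefficients, which Mark notes are close to a FACS-type basis. The source coefficient names below are real ARKit ARFaceAnchor blendshapes; the target rig controls and mapping weights are hypothetical.]

```python
# One frame of capture: coefficient in [0, 1] per blendshape.
arkit_frame = {
    "jawOpen": 0.6,
    "browInnerUp": 0.2,
    "mouthSmileLeft": 0.4,
    "mouthSmileRight": 0.35,
}

# Each target control is a weighted combination of source coefficients,
# so an asymmetric capture can drive a simpler (or very different) rig.
rig_mapping = {
    "mouth_open": [("jawOpen", 1.0)],
    "brow_raise": [("browInnerUp", 0.8)],
    "smile":      [("mouthSmileLeft", 0.5), ("mouthSmileRight", 0.5)],
}

def drive_rig(frame: dict[str, float]) -> dict[str, float]:
    """Map one frame of ARKit coefficients onto target rig controls."""
    return {
        control: min(1.0, sum(frame.get(src, 0.0) * w for src, w in sources))
        for control, sources in rig_mapping.items()
    }

print(drive_rig(arkit_frame))
# {'mouth_open': 0.6, 'brow_raise': 0.16, 'smile': 0.375}
```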

Marc Petit:

So, Vlad, you extract your parametric representation from a lot of scan data. Do you think that could become more generic, as some sort of synthetic representation of a digital human?

Vladimir Mastilović:

Yeah. So when we were starting, we didn't know what we would end up with, right? And how do you standardize something that you don't know? I think this is the same problem now. Our decision was smaller and simpler: we wanted to standardize the human face, and we were thinking, well, what will not change about the human face? And that is anatomy. So we created this abstraction layer, which we are calling Rig Logic. And we said, Rig Logic is somewhat representative of anatomy, and what lies underneath is going to change, but this thing, this interface, will not. So that's how we split the problem. We've evolved the underlying complexity of Rig Logic while keeping the interface the same. That, in time, became the standard interface for MetaHumans too.

Vladimir Mastilović:

Now, we are quite proud of what we've achieved with MetaHumans, but we are not happy yet. They're not photoreal. So Rig Logic will continue to evolve into various kinds of machine learning matrices and models and combinations of different things, but we will try to keep this interface the same. I think this is one of the steps towards that standardization. Physics is certainly another one, because physics, the rules of physics, again will not change. And as we extend to the body, I would expect that won't change any time soon either. So I think this is a good way, a good philosophy, to go.
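
[Editor's note: a minimal sketch of the "stable interface, evolving implementation" split Vlad describes: the anatomy-inspired control interface stays fixed while the machinery underneath can be swapped. All names here are hypothetical illustrations, not the actual Rig Logic API.]

```python
from abc import ABC, abstractmethod

class FaceRigBackend(ABC):
    """The fixed contract: anatomy-style controls in, mesh deltas out."""

    @abstractmethod
    def evaluate(self, controls: dict[str, float]) -> list[float]:
        ...

class LinearBackend(FaceRigBackend):
    """Today's machinery: a toy linear combination of per-control deltas."""

    def evaluate(self, controls: dict[str, float]) -> list[float]:
        deltas = {"jaw_open": [0.0, -1.0, 0.0], "brow_raise": [0.0, 0.5, 0.1]}
        out = [0.0, 0.0, 0.0]
        for name, value in controls.items():
            for i, d in enumerate(deltas.get(name, [0.0, 0.0, 0.0])):
                out[i] += value * d
        return out

class LearnedBackend(FaceRigBackend):
    """Tomorrow's machinery: a stand-in for a learned model, same interface."""

    def evaluate(self, controls: dict[str, float]) -> list[float]:
        return [sum(controls.values()) * 0.1] * 3

# Callers depend only on the interface, so backends can be swapped freely.
for backend in (LinearBackend(), LearnedBackend()):
    print(backend.evaluate({"jaw_open": 1.0, "brow_raise": 0.5}))
```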

Vladimir Mastilović:

I love the idea about the clothing and owning your asset across the spaces. We are definitely thinking in that direction. That's still a little bit further out, but I definitely support it. I think the big challenge will be that, although I do think this is going in the right direction, building an open metaverse will require a certain openness from the companies involved in building it, and at the same time, it's a competitive space as well. So that's going to be a challenge. But I am more optimistic than I would be in other industries because of the mentality of the people in this industry, I suppose. And again, understanding that the metaverse is exciting only if it's connected; if it's not, then it's kind of a boring place. So, yeah.

Marc Petit:

So you think that the work that you've done on faces, you can extend to the entire human body and create those kinds of standardized interfaces to interact with?

Vladimir Mastilović:

I believe so, yeah. I think muscle-inspired locomotion, physics-inspired locomotion that takes certain inputs from the real world, is definitely the way to go. And I hope that people like Mark will then work on the logic of how we excite these muscles and why, so that we can bring that sort of autonomy to these virtual beings.

Marc Petit:

So Mark, on this idea of exchanging brains or combining brains, do you see ways to express those learnings and those, you know-

Mark Sagar:

Yeah. Well, I mean, we are thinking about that a lot at the moment. We have a whole technology initiative, which we're calling Ubiquity, which is really about how we connect to other people's models. Because I totally agree with Vlad: the metaverse is going to be so big that people are going to want all kinds of models. Some people will want to look like a Minecraft character, other people will want to look like a photorealistic Tyrannosaurus rex, and other people will want to look like a celebrity. So these are all very, very different controls. But I think the one common factor that people are going to want is expressivity.

Mark Sagar:

So the ability to really express emotion, to communicate, to get really good lip sync, all these types of things, that's the common factor across the... But saying that, you'll probably still have some South Park sort of characters as well, if you wanted to; people will have all of these. So I think it's a land of plenty in a way, because everybody's going to want different things for different cases. I mean, the original Snow Crash book was really interesting on this, because certain people could afford a high-quality avatar, others would have more pixelated ones, others could do all kinds of different things. And then you've got the whole programmable component of what they can do, too. So these are exciting.

Mark Sagar:

We've also got the problem of standards of interaction with elements in the virtual worlds. If I'm picking up a virtual object, we have to make sure that the digital human's body, the hands, don't go through the object; there's got to be collision detection, there's got to be those types of things. Even collision detection: a lot of people do things like touch their face, or they go, "Hmm." Now, for a digital human, expressing that is actually a hard problem, because unless you're doing full-on collision detection of absolutely everything, and that's changing the way the face behaves, if I do this or hold my mouth, my expressions are different. That's actually a pretty hard problem to solve. So we will get to those points as the fidelity goes up and up. And I think we'll get there much, much faster, whether it's a physics-based approach or a machine learning approach, because you can do this stuff with lots of data too. We will just keep adding to the awesomeness of what's able to be achieved there.

Marc Petit:

Well, as expected, Patrick, this is a very deep and endless topic, but I think we should wrap up for the sake of keeping this at an acceptable length. So one question I would like to ask, especially as each of you has plans down the line: are there any topics that we haven't discussed today that we really should have evoked? Maybe Vlad?

Vladimir Mastilović:

Well, there are many things that we didn't mention. Like you say, it's difficult to pack it all into about one hour. I'm deeply interested in the process of forming an identity and how we reach decisions. I think that's a very deep rabbit hole that Mark is obviously jumping into, but identity, I think, is what the metaverse is all about, right? It's about expressing yourself through it. And I think it has very deep implications for parts of the industry that are not necessarily considered metaverse; we briefly mentioned garments and clothing. Even though that's not computer graphics, it is a way in which we express our identity. I think there are going to be many other industries which will eventually be drawn into the metaverse, regardless of whether they want it or not. So I guess we did mention it for a bit, but it deserves one more mention.

Mark Sagar:

One of the things I'm interested in is that we've got what I'd call the continuous metaverse, and then a possible bubble metaverse. So for example, you could potentially have an experience where you're interacting with a character, and at one moment you are running in Unreal Engine and you are in a full 3D environment. Then you may pop out of that and go into, let's say, Minecraft or something like that, and you're in a very different engine, which has got its own rules and so forth. And then you may even be in a cell phone augmented reality experience. For these things, do we have multiple instances of a character? So you have your Unreal version, you have your Minecraft version, which is a different geometry because it's simplified, and then you've got the one which is optimized for AR, for example, or a webpage, whatever it is.

Mark Sagar:

In terms of the metaverse question, do we want to have the same engine driving all of those, or are they engines built for the particular world that's created? I see this with some of the various Web3 projects: if I go into, say, a Decentraland or Sandbox type model, or Roblox or something, then the character that you put in there has rules that it has to operate by. Now, you may want to do more than they're going to allow you to, and the owner of that particular metaverse might not allow you to do everything. So then, for example, do you go into Unreal, where you basically have your entire game controller and can create your own experience in any different way?

Mark Sagar:

So we're going to have those types of questions: if we swap a character across, what are the abilities of the character in these environments? And then we get into things like communication between characters. If I'm talking to another character in Unreal and I'm having a conversation, how does that work versus if I was in Roblox with little speech bubbles or something like that? So we've got not only the creation of the models, but we've also got to think about how they communicate, how we interact with an object: I want to do a Minecraft thing versus something in a high-fidelity Unreal model. These things are really going to matter, because this is part of the question of how do I transfer my character, or do I have multiple identities? Do I exist as one model in Unreal and another model in Roblox? Am I the same or different?

Marc Petit:

In a previous episode we talked to Kim Davidson, the CEO of SideFX, about proceduralism. He postulated that we could actually have a parametric representation of the world and generate it at run time, slightly differently in different environments. And I guess with a parametric representation of a human, if you can carry the essence of its personality and its knowledge, you could regrow that human in a different manner in the Lego world than you would for the photorealistic world. Do you think we could get there?

Mark Sagar:

Yeah. I think that's a really cool approach. That's a really neat way to think about the problem.

Vladimir Mastilović:

Do you ever think about what kind of hardware in the world will be required to run all of these virtual worlds?

Marc Petit:

We did have a conversation with Bill Vass, and we are far from it, from the kind of compute we will need. But, well, we're probably going to get there at some point.

Mark Sagar:

This is one of the things, if you're mining Bitcoin, then that's kind of wasted energy. So maybe all the Bitcoin mining could go towards powering virtual characters in the metaverse and do something more useful.

Vladimir Mastilović:

Maybe still with the same outcome, you know?

Mark Sagar:

Yeah. That's right. But see, that actually results in value; it's a more globally thoughtful use of compute power.

Patrick Cozzi:

Yeah. I think there's no doubt that the more compute power that becomes available, the more interesting things we're going to immediately want to do with it for the metaverse. So to wrap things up, we always like to end the episode with a shout out, if there is an individual, an organization that you'd like to give a shout out to. Dr. Sagar, would you like to start?

Mark Sagar:

Well, I'd like to shout out to everybody at my company, Soul Machines; they're doing an incredible job in all kinds of different areas. And I'd also like to shout out to pretty much everybody who's worked on virtual humans, because you guys really know what a challenge this is and how difficult it is. When you get down to the details of an eye or an eyelid, or just how the lips are moving, all those types of things, it's such a complex art form in a way. You guys know the pain of producing absolutely beautiful results, and they just keep getting better and better each year.

Vladimir Mastilović:

That's a tough one to follow. I have to say, I didn't think of it that way, Mark, but I really support it. I guess in order to do this, what we do, especially back in the day when it wasn't so exciting, you had to be a believer beyond reason a little bit. So, yeah, definitely, I will join that. I will also shout out to our whole team, who often don't get the spotlight, often because they don't want it, because they're a bunch of freaks in the best possible way, focused on their part of the whole story. But yeah, there are hundreds of them, and that's what I'll do. Hey, guys and girls and everybody else.

Marc Petit:

Dr. Sagar, Vlad, thank you so much for those insights. This is one of the most important topics in the metaverse: human representation, human interaction, empathy. Thank you so much for your insight. We know it's the beginning of a long road. I also want to thank our audience. We're getting good feedback on the podcast. Thank you, everybody. Keep telling us what you want to hear about. Keep supporting us. It motivates us to keep going, right, Patrick?

Patrick Cozzi:

Absolutely.

Marc Petit:

All right. With this, again, thank you, everybody. Dr. Sagar, Vladimir Mastilović, thank you so much for being here with us. Have a good day, everybody.

Patrick Cozzi:

Thank you, everybody. Thanks for listening.