The perfect Virtual YouTuber setup for Metaverse VTubing

June 27, 2023

Virtual YouTubers (VTubers) are on the rise. A trend that started in Japan in 2018 is now spilling over to the rest of the world, with English-speaking VTubers like CodeMiko and Cory Strassburger’s Blu, but also upcoming ones like Lil Happa. Many of them have a big following, some with millions of subscribers, which makes them very interesting for merchandise such as Nendoroids, avatar skins for the Metaverse, brand partnerships and collaborations with other YouTubers.

But what is the technology behind these virtual characters, which mostly take the form of anime boys and girls but increasingly also Pixar-style 3D characters?

The setup of a Virtual YouTuber mostly involves face tracking, motion/gesture recognition and animation software. There is the classic minimalistic VTuber setup using software like VTube Studio or Live2D, where a camera recognizes the VTuber's gestures using AI and machine learning and these then trigger predefined animations, but this is not the setup we want to discuss here. As we are focusing on VTubing in the Metaverse, we need to go one step further and look into the production approach using motion capture suits and gloves, an iPhone X for face tracking, a real-time rendering engine and a VTuber 3D space.

Example setup with XSens, Manus gloves and iPhone X

And here is a very good example of how far you can push graphics quality already.

1. The avatar / digital character

First of all, you need a digital character (avatar) for your VTuber career. It’s not the easiest part, but today there are many good options for this. The avatar needs to act and move naturally, and it should be a bit unique. It needs to be fully 'rigged', with a bone rig for the body and blendshapes for the face, before it can move in a natural way. The easiest way to start is to download a model from websites like TurboSquid, Sketchfab or CGTrader. You can also use MetaHuman Creator from Unreal Engine or create a character at [...]. If you use Yabal for VTubing, you can use all of these options, and you can also use our anime character and tweak it to your needs.

2. Motion Capture System

To control your digital character with your body in real-time you need a motion capture rig. There are different vendors of motion capture systems we already worked with:

XSens: This is by far the most expensive system, but in our opinion also the best for VTubing. With the Awinda solution you only wear straps instead of having to get into a full suit, which is perfect for long VTubing sessions in particular. It is also not affected by magnetic interference. You need to invest more than €6k to get the Awinda. If you want to use gloves to animate your fingers, you need to add the Manus gloves. We prefer this setup because it gives the best animation quality, but if you are on a budget you should look in the Rokoko direction.

Rokoko Smartsuit Pro: The suit from Rokoko is much more affordable than the XSens. They also offer gloves. It starts from €2,745 without gloves. In January 2022 they will start shipping the new version called Smartsuit Pro II. We hope this one will be more immune to magnetic interference. We can say that the current version reacts strongly to magnetic fields, and you will end up recalibrating every 30 minutes or so.

Perception Neuron: This is also a very interesting option. As they also offer a version with just straps instead of a suit, it is well suited to VTubing. We might order the Perception Neuron 3 and will write a review about it. According to their website it’s not immune to magnetic fields, but we will see.

3. Face capture with iPhone X

The best solution for animating the face is to use an iPhone X or newer. The ARKit face capture works quite well and is reliable. Of course, your digital character needs to have blendshapes that support ARKit. The company Polywink offers a 24h service to create those blendshapes for you. It costs €299.
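
Before paying for new blendshapes, it can help to check which ones your model already has. The following is a minimal sketch that compares a model's blendshape names against the 52 blend shape locations ARKit reports. The helper name `missing_arkit_blendshapes` and the example list are hypothetical, and the sketch assumes your model uses ARKit's own naming, which not every exporter does.

```python
# The 52 blend shape locations reported by ARKit face tracking.
ARKIT_BLENDSHAPES = frozenset("""
browDownLeft browDownRight browInnerUp browOuterUpLeft browOuterUpRight
cheekPuff cheekSquintLeft cheekSquintRight
eyeBlinkLeft eyeBlinkRight eyeLookDownLeft eyeLookDownRight
eyeLookInLeft eyeLookInRight eyeLookOutLeft eyeLookOutRight
eyeLookUpLeft eyeLookUpRight eyeSquintLeft eyeSquintRight
eyeWideLeft eyeWideRight
jawForward jawLeft jawOpen jawRight
mouthClose mouthDimpleLeft mouthDimpleRight mouthFrownLeft mouthFrownRight
mouthFunnel mouthLeft mouthLowerDownLeft mouthLowerDownRight
mouthPressLeft mouthPressRight mouthPucker mouthRight
mouthRollLower mouthRollUpper mouthShrugLower mouthShrugUpper
mouthSmileLeft mouthSmileRight mouthStretchLeft mouthStretchRight
mouthUpperUpLeft mouthUpperUpRight
noseSneerLeft noseSneerRight tongueOut
""".split())


def missing_arkit_blendshapes(model_names):
    """Return the ARKit blendshape names absent from the model, sorted."""
    return sorted(ARKIT_BLENDSHAPES - set(model_names))


if __name__ == "__main__":
    # Hypothetical list of blendshape names exported from a character model.
    example_model = ["jawOpen", "eyeBlinkLeft", "eyeBlinkRight"]
    missing = missing_arkit_blendshapes(example_model)
    print(f"{len(missing)} of {len(ARKIT_BLENDSHAPES)} ARKit blendshapes are missing")
```

An empty result means every ARKit blendshape has a counterpart on the model. Anything listed would have to be added or remapped by name before the face capture can drive it.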

4. Real-time engine to bring it all together

To pull it all together you need 3D animation software such as Unreal, Unity 3D or iClone. It is really easy to stream motion capture data into all major 3D animation software systems. The mocap vendors offer plugins for the real-time engines to stream data directly in. 
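
The vendor plugins handle the streaming for you, but the underlying idea can be sketched in a few lines. The example below assumes a made-up packet format (JSON over UDP, with a `bones` dictionary mapping bone names to a position and a rotation quaternion). Real mocap systems use their own protocols, so the names `BonePose`, `parse_frame` and `receive_frames`, as well as port 9000, are purely illustrative.

```python
import json
import socket
from dataclasses import dataclass


@dataclass
class BonePose:
    position: tuple  # (x, y, z)
    rotation: tuple  # quaternion (x, y, z, w)


def parse_frame(packet: bytes) -> dict:
    """Decode one hypothetical JSON mocap packet into {bone_name: BonePose}."""
    data = json.loads(packet.decode("utf-8"))
    poses = {}
    for name, bone in data["bones"].items():
        position = tuple(float(v) for v in bone["pos"])
        rotation = tuple(float(v) for v in bone["rot"])
        if len(position) != 3 or len(rotation) != 4:
            raise ValueError(f"malformed pose for bone {name!r}")
        poses[name] = BonePose(position, rotation)
    return poses


def receive_frames(host="127.0.0.1", port=9000):
    """Yield parsed frames from incoming UDP packets (blocks while waiting)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            packet, _sender = sock.recvfrom(65535)
            yield parse_frame(packet)
```

Inside an engine, each frame's poses would then be applied to the matching bones of the rigged character, which is the part the vendor plugins do for you.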

5. Metaverse space for you and your followers

So now you are able to stream your animated character on Twitch or YouTube. But this is not yet a Metaverse show. To bring this to the Metaverse, you need a virtual, simulated 3D space in which your character lives and which your fans can join with their own avatars. Just imagine how cool it would be if your followers were in your space together with you. Sure, you can invite them to join a Minecraft session or something like that. But hey, how would you show up there with your digital character, animating it in real time?

When you use Yabal for VTubing, you automatically have your own virtual space that your followers can join with their avatars. You can allow your followers to alter the space and also modify your character, either for free or as a way to let them support you and pay you some money. This can also be integrated with Twitch Bits.

6. One more thing

While you can certainly put all the pieces together yourself, it can be overwhelming to create a character, set up the motion capture pipeline, set up Unreal Engine or Unity and so on, just because you want to entertain your followers as a digital character. That’s why we offer all of this as a VTuber-space-as-a-service. You get your character, spaces, an environment for fans to join with their own avatars, Twitch integration, effects, virtual multicam support and more.

We are preparing our first Metaverse VTuber accelerator, which includes a creator account for Yabal, free mocap suit equipment for the duration of the accelerator and more. If you are interested, please join our Discord server.