Did you ever solve the neck/head seam issue? That's the one that always gets me with combined body and face captures. The body solver and face solver both have opinions about the neck and they don't agree. We ended up blending between them with a manual weight mask and just accepted that the neck was going to be a manual cleanup job every time.
combining body and face mocap data without losing your mind
This came up in a project I wrapped last month and I want to get a proper thread going on it because most of the advice I found online was either five years old or assumed a full studio pipeline I don't have access to.
The setup I was working with: Rokoko Smartsuit Pro II for body, Live Link Face on an iPhone 12 Pro for facial capture. Both recording simultaneously to separate apps, both synced to the same timecode source via a Tentacle Sync E clipped to the actor. Body exports to FBX, face exports as a CSV of ARKit blendshape values. The goal is getting them onto a single MetaHuman rig inside Unreal Engine 5 without the neck becoming a crime scene.
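Since both streams share the Tentacle timecode, lining them up is mostly timecode arithmetic. Here's a minimal sketch, assuming 30 fps non-drop timecode and made-up start values for the two streams (a real pipeline would read these out of the FBX and the CSV header; drop-frame timecode needs different math):

```python
def tc_to_frames(tc: str, fps: int = 30) -> int:
    """Convert an 'HH:MM:SS:FF' timecode string to an absolute frame count."""
    hh, mm, ss, ff = (int(p) for p in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff

# Hypothetical start timecodes for the Rokoko FBX and the Live Link CSV.
body_start = tc_to_frames("14:03:22:11")
face_start = tc_to_frames("14:03:22:17")

# Positive offset: trim this many frames off the front of the face data
# (or pad the body) so frame 0 lines up in both streams.
offset = face_start - body_start
print(offset)  # 6
```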
The main headache is that head rotation lives in two places at once. Your body solver is tracking the head bone from the suit's neck sensor, and the Live Link Face export also carries head rotation as its own channels alongside the 52 ARKit blendshapes: the headYaw, headPitch, and headRoll values ride in the same CSV, but they're rigid rotations, not deformation blendshapes. If you drive both simultaneously you get a double-transform situation and the head starts doing things no human neck can do.
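A quick sanity check is to partition the CSV header into head-rotation channels versus actual blendshapes before merging anything. A sketch using only the stdlib, with a hypothetical two-row excerpt standing in for a real export (column names and values are assumptions based on my files):

```python
import csv
import io

# Hypothetical excerpt of a Live Link Face export; real files have far
# more columns (the 52 blendshapes plus head and eye rotation channels).
sample = io.StringIO(
    "Timecode,JawOpen,MouthSmileLeft,HeadYaw,HeadPitch,HeadRoll\n"
    "14:03:22:11,0.12,0.30,-4.1,2.7,0.5\n"
)
reader = csv.DictReader(sample)
header = reader.fieldnames

# Anything starting with "head" is rigid head rotation, not a
# deformation blendshape; these are the channels that double-drive the
# neck if the body solve keeps its own head rotation.
head_cols = [c for c in header if c.lower().startswith("head")]
shape_cols = [c for c in header if c not in head_cols and c != "Timecode"]
print(head_cols)   # ['HeadYaw', 'HeadPitch', 'HeadRoll']
print(shape_cols)  # ['JawOpen', 'MouthSmileLeft']
```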
My current fix: strip the head rotation channels from the Rokoko bake entirely and let Live Link Face own all head movement. This works pretty well because the iPhone's ARKit head tracking is honestly more stable than the suit's neck sensor anyway, especially for subtle dialogue-range motion. The downside is that any extreme head movement during bigger body actions needs to be checked carefully, since you're relying entirely on what the phone camera can see.
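The strip step itself doesn't need anything MotionBuilder-specific. Here's a minimal sketch assuming the body bake has been flattened to a channel-name-to-keyframes dict (a hypothetical data model; the channel names are made up, and a real pipeline would pull this out of the FBX):

```python
# Channels the body solver should no longer drive (hypothetical names).
HEAD_CHANNELS = {"head.rotateX", "head.rotateY", "head.rotateZ"}

def strip_head_rotation(channels: dict) -> dict:
    """Zero the body solver's head rotation so Live Link Face owns it.

    Zeroing (rather than deleting) keeps the channel count stable for
    tools that expect every bone to have curves.
    """
    return {
        name: ([0.0] * len(keys) if name in HEAD_CHANNELS else keys)
        for name, keys in channels.items()
    }

bake = {
    "head.rotateY": [1.2, 3.4, 5.6],
    "spine.rotateY": [0.1, 0.2, 0.3],
}
stripped = strip_head_rotation(bake)
print(stripped["head.rotateY"])   # [0.0, 0.0, 0.0]
print(stripped["spine.rotateY"])  # [0.1, 0.2, 0.3]
```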
I tried MetaHuman Animator briefly as an alternative to the DIY approach. The face solve quality is noticeably better than raw ARKit blendshapes — the jaw and lip region in particular gets more nuance — but the workflow for bringing in external body data alongside it is clunky and the documentation assumes you're using their full capture setup.
Faceware's real-time tracker is another option I've heard recommended for the face side of this, but it's priced for teams and I haven't had a chance to evaluate it. Anyone here used Faceware alongside suit-based body capture? Curious how the merge workflow compares.
Happy to share the Python script I wrote to strip and re-merge the channels in MotionBuilder if anyone wants it — it's rough but functional.
Combining body and face mocap is such a pain, pun intended. The timing drift between two separate capture sessions always bites you. We ended up doing a manual sync pass in MotionBuilder using a specific reference gesture the actor did at the start of each take. Tedious but gave us clean data. Are you capturing simultaneously or in separate sessions?
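That reference-gesture pass can be semi-automated: extract a signal both streams see (head yaw is the obvious candidate here, since both solvers track it) and find the frame offset that best correlates them. A rough sketch under those assumptions, with synthetic traces standing in for real head-yaw data:

```python
def best_offset(a, b, max_lag=120):
    """Return the lag (in frames) of b relative to a that maximizes
    the average dot-product correlation of the overlapping samples."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [
            (a[i], b[i + lag])
            for i in range(len(a))
            if 0 <= i + lag < len(b)
        ]
        if len(pairs) < 2:
            continue
        score = sum(x * y for x, y in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Synthetic head-yaw traces: the "face" stream is the "body" stream
# delayed by 3 frames (the spike is the reference gesture).
body = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0, 0, 0]
face = [0, 0, 0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
print(best_offset(body, face, max_lag=5))  # 3
```

With real data you'd run this on a short window around the slate gesture rather than the whole take, since drift over a long take can smear the correlation peak.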