Moonjump builds its own animation technology, both hardware and software. Our R&D group has developed a markerless motion-capture pipeline, a custom wearable, and the sensor fusion that combines them into clean, retargetable character motion. This is the work that moves the characters in our games.
Traditional motion capture means an optical stage, marker suits, calibrated cameras, and days of manual cleanup per shot. It is out of reach for most independent teams, and slow even for the ones who can afford it.
We wanted a different starting point: any animator on our team should be able to capture believable, usable motion from footage they already have (a single video clip) and get an animation that drops straight onto a game rig. Getting there meant solving the parts that off-the-shelf pose estimators leave behind: depth and world placement, a clean skeleton a game engine can use, and motion stable enough to survive a director's eye rather than just a metrics table.
Our pipeline is built in-house and proprietary. This page describes the ideas behind it, not the implementation.
We generate a large, physically grounded corpus of synthetic motion and renders, with far more variety in bodies, cameras, lighting and movement than any single capture session could provide, so the model generalises instead of memorising.
We estimate full-body 3D structure over time and retarget it onto a standard skeleton, so the output is rig-ready and consistent across clips rather than a loose cloud of 2D joints.
A dedicated refinement stage removes jitter, stabilises ground contacts and suppresses foot-slide: the artefacts that make raw estimates unusable in a real shot.
One clip in, a finished FBX or BVH out. Every stage is automated; the refinement stage is where most of our research lives.
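To make the refinement idea concrete, here is a minimal Python sketch of two textbook cleanup steps: a moving average to damp jitter, and a simple contact lock that pins a foot in place while it is near the ground. It is not our proprietary method. The function names, the y-up coordinate convention, the metre units and the 3 cm contact threshold are all assumptions chosen for illustration.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # (x, y, z) in metres, y is up (assumed convention)


def moving_average(values: List[float], window: int = 5) -> List[float]:
    """Centred moving average; the window shrinks at the clip edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def lock_foot_contacts(
    foot: List[Vec3],
    ground_y: float = 0.0,
    contact_height: float = 0.03,  # hypothetical threshold: 3 cm above ground
) -> List[Vec3]:
    """Pin the foot to its first contact position for the length of each contact.

    A frame counts as "in contact" when the foot is within contact_height of
    the ground. While in contact, x and z are held at the values seen on the
    first contact frame and y is snapped to the ground plane.
    """
    locked = []
    anchor = None  # (x, z) of the current contact, or None while airborne
    for x, y, z in foot:
        if y - ground_y <= contact_height:
            if anchor is None:
                anchor = (x, z)
            locked.append((anchor[0], ground_y, anchor[1]))
        else:
            anchor = None
            locked.append((x, y, z))
    return locked


if __name__ == "__main__":
    noisy_height = [0.0, 10.0, 0.0, 10.0, 0.0]
    print(moving_average(noisy_height, window=3))

    foot_track = [(0.0, 0.10, 0.0), (0.10, 0.02, 0.0), (0.12, 0.01, 0.01), (0.30, 0.12, 0.0)]
    print(lock_foot_contacts(foot_track))
```

A hard lock like this would pop visibly at the start and end of each contact. A production system would typically blend in and out of the lock and solve the leg with inverse kinematics so the knee and hip follow the corrected foot.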
A single camera is accessible, but vision alone has limits: it loses joints to occlusion and has to guess at depth. So we built our own answer, a lightweight custom wearable, and fused it with the camera so each covers the other's blind spots.
The wearable is a lightweight inertial device that we design and build in-house. Wireless and comfortable, with no optical markers and no stage, it reads the fast, subtle motion a camera can miss.
The camera anchors global position and the look of the shot; the wearable supplies high-frequency motion and the joints vision cannot see. We fuse the two into one clean, drift-free take.
Through occlusion, fast motion and awkward lighting, the fused signal holds where camera-only or sensor-only capture would drift or drop out entirely.
Vision and inertial sensing each have a failure mode. Fused, they cover for one another, giving capture you can trust without a marker suit or a dedicated stage.
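One common way to combine a slow but globally anchored signal with a fast but drifting one is a complementary filter. The Python sketch below shows that general technique in one dimension. It is an illustration under stated assumptions, not a description of our fusion: `fuse_position`, the blend weight `alpha`, and the idea that the wearable already provides a velocity estimate are all hypothetical.

```python
from typing import List, Optional


def fuse_position(
    camera_pos: List[Optional[float]],  # per-frame camera position, None when occluded
    imu_velocity: List[float],          # per-frame velocity from the wearable (assumed given)
    dt: float,                          # frame interval in seconds
    alpha: float = 0.9,                 # weight on the inertial prediction (assumed value)
) -> List[float]:
    """1D complementary filter.

    Each frame, the previous estimate is advanced with the inertial velocity
    (responsive, but drifts), then pulled toward the camera sample (anchored,
    but sometimes missing). During occlusion the filter coasts on the
    inertial prediction alone.
    """
    fused = []
    estimate = None
    for cam, vel in zip(camera_pos, imu_velocity):
        if estimate is None:
            # Start from the first camera sample, or the origin if it is missing.
            estimate = cam if cam is not None else 0.0
        else:
            predicted = estimate + vel * dt
            if cam is None:
                estimate = predicted
            else:
                estimate = alpha * predicted + (1.0 - alpha) * cam
        fused.append(estimate)
    return fused


if __name__ == "__main__":
    dt = 0.1
    # Subject moves at 1 m/s; the camera is occluded for frames 20 to 29.
    camera = [None if 20 <= i < 30 else i * dt for i in range(50)]
    # The inertial velocity carries a constant bias, so integrating it alone drifts.
    velocity = [1.2] * 50
    print(fuse_position(camera, velocity, dt)[-1])
```

With a biased inertial signal, this filter keeps the error bounded instead of letting it grow without limit, and it keeps producing output through the occluded frames. Real fusion of body motion works on 3D orientations and positions and usually relies on a proper state estimator, which this sketch does not attempt.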
Accuracy matters, but believability is decided by the small failures: sliding feet, high-frequency jitter, contacts that float. Our refinement stage targets exactly those.
| Metric | Raw estimate | After refinement | Change |
|---|---|---|---|
| Foot-slide (cm/s) | full | low | −82% |
| Jitter / acceleration error | full | low | −68% |
| Contact stability | unstable | grounded | +74% |
| Joint position error | baseline | tighter | −41% |
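For readers unfamiliar with how a foot-slide figure and a percentage change of this kind are usually computed, here is a short Python sketch of one common definition: the mean horizontal speed of the foot over frames where it should be planted. The function names and the toy trajectories are hypothetical and exist only to show the arithmetic. They are not our evaluation code or our data.

```python
import math
from typing import List, Tuple


def foot_slide_speed(
    foot_xz_cm: List[Tuple[float, float]],  # horizontal foot position per frame, in cm
    in_contact: List[bool],                 # True where the foot should be planted
    fps: float,
) -> float:
    """Mean horizontal foot speed (cm/s) over consecutive in-contact frame pairs."""
    total, count = 0.0, 0
    for i in range(1, len(foot_xz_cm)):
        if in_contact[i] and in_contact[i - 1]:
            dx = foot_xz_cm[i][0] - foot_xz_cm[i - 1][0]
            dz = foot_xz_cm[i][1] - foot_xz_cm[i - 1][1]
            total += math.hypot(dx, dz) * fps
            count += 1
    return total / count if count else 0.0


def relative_change_percent(before: float, after: float) -> float:
    """Signed change from before to after, as a percentage of before."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    contact = [True, True, True, True]
    # Toy trajectories, invented for illustration only.
    raw = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
    refined = [(0.0, 0.0), (0.25, 0.0), (0.5, 0.0), (0.75, 0.0)]
    before = foot_slide_speed(raw, contact, fps=30.0)
    after = foot_slide_speed(refined, contact, fps=30.0)
    print(before, after, relative_change_percent(before, after))
```

Under this convention a negative change means the artefact shrank, which is how the foot-slide, jitter and joint-error rows read. A stability score moves the other way, so a positive change is the improvement there.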
The pipeline powers character animation across Moonjump's projects and continues to evolve. A short technical report is available to partners and collaborators on request.