Research // R&D · Moonjump Game Studio

Production motion capture from a single camera.

Moonjump builds its own animation technology, hardware and software. Our R&D group has developed a markerless motion-capture pipeline, a custom wearable, and the sensor fusion that combines them into clean, retargetable character motion. This is the work that moves the characters in our games.

Read the approach · Talk to the research team

01 — The problem

Good motion is the most expensive thing in a small studio.

Traditional motion capture means an optical stage, marker suits, calibrated cameras, and days of manual cleanup per shot. It is out of reach for most independent teams, and slow even for the ones who can afford it.

We wanted a different starting point: any animator on our team should be able to capture believable, usable motion from footage they already have (a single video clip) and get an animation that drops straight onto a game rig. Getting there meant solving the parts that off-the-shelf pose estimators leave behind: depth and world placement, a clean skeleton a game engine can use, and motion stable enough to survive a director's eye rather than just a metrics table.

02 — The approach

Teach a model what real motion looks like, then hold it to a production standard.

Our pipeline is built in-house and proprietary. This page describes the ideas behind it, not the implementation.

01 / Data

Learn from synthetic motion

We generate a large, physically grounded corpus of synthetic motion and renders, with far more variety in bodies, cameras, lighting and movement than any single capture session could provide, so the model generalises instead of memorising.

02 / Structure

Recover structure, not just pose

We estimate full-body 3D structure over time and retarget it onto a standard skeleton, so the output is rig-ready and consistent across clips rather than a loose cloud of 2D joints.

03 / Refinement

Refine for production

A dedicated refinement stage suppresses jitter, stabilises ground contacts and sharply reduces foot-slide: the artefacts that make raw estimates unusable in a real shot.
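To make the refinement idea concrete, here is a minimal sketch of the three steps named above, applied to a single foot trajectory. It is not Moonjump's implementation, which this page does not describe. The function names, the thresholds and the y-up, metres convention are all assumptions made for illustration, and a production system would typically follow the contact lock with inverse kinematics and blending.

```python
from math import hypot
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # assumed convention: (x, y, z) in metres, y is up


def smooth(track: List[Vec3], alpha: float = 0.5) -> List[Vec3]:
    """De-jitter with an exponential moving average (lower alpha = heavier smoothing)."""
    if not track:
        return []
    out = [track[0]]
    for p in track[1:]:
        prev = out[-1]
        out.append(tuple(alpha * c + (1.0 - alpha) * q for c, q in zip(p, prev)))
    return out


def detect_contacts(track: List[Vec3], fps: float,
                    max_height: float = 0.05, max_speed: float = 0.3) -> List[bool]:
    """Flag frames where the foot is near the ground and moving slowly.

    The thresholds (5 cm, 0.3 m/s) are illustrative guesses, not tuned values.
    """
    contacts = []
    for i, p in enumerate(track):
        q = track[max(i - 1, 0)]  # frame 0 compares with itself, so its speed is 0
        speed = hypot(p[0] - q[0], p[2] - q[2]) * fps  # horizontal speed in m/s
        contacts.append(p[1] <= max_height and speed <= max_speed)
    return contacts


def lock_contacts(track: List[Vec3], contacts: List[bool],
                  ground_y: float = 0.0) -> List[Vec3]:
    """Pin the foot to where it first touched down for the length of each contact run."""
    out = list(track)
    anchor = None
    for i, (p, in_contact) in enumerate(zip(track, contacts)):
        if in_contact:
            if anchor is None:
                anchor = (p[0], p[2])  # start of a new contact run
            out[i] = (anchor[0], ground_y, anchor[1])
        else:
            anchor = None  # foot lifted; the next contact gets a fresh anchor
    return out
```

A typical call order would be `smooth`, then `detect_contacts`, then `lock_contacts`, once per foot.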

03 — The pipeline

From a video clip to an engine-ready animation.

One clip in, a finished FBX or BVH out. Every stage is automated; the refinement stage is where most of our research lives.

Video

A single monocular clip. Phone footage is fine.

3D structure

Full-body pose and structure recovered over time.

Retarget

Mapped onto a standard, rig-ready skeleton.

Refinement

De-jitter, contact grounding, foot-slide removal.

Export

FBX / BVH, ready for the engine.

04 — The hardware

A custom wearable, fused with the camera.

A single camera is accessible, but vision alone has limits: it loses joints to occlusion and has to guess at depth. So we built our own answer, a lightweight custom wearable, and fused it with the camera so each covers the other's blind spots.

Wearable

A custom wearable

A lightweight inertial wearable we design and build in-house. Wireless and comfortable, with no optical markers and no stage, it reads the fast, subtle motion a camera can miss.

Fusion

Camera and wearable, fused

The camera anchors global position and the look of the shot; the wearable supplies high-frequency motion and the joints vision cannot see. We fuse the two into one clean, drift-free take.

Robust

Capture that does not break

Through occlusion, fast motion and awkward lighting, the fused signal holds where camera-only or sensor-only capture would drift or drop out entirely.

Vision and inertial sensing each have a failure mode. Fused, they cover for one another, giving capture you can trust without a marker suit or a dedicated stage.
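The division of labour described above, with the camera anchoring position and the wearable supplying fast motion, is the idea behind a classic complementary filter. The sketch below shows that textbook technique on a single axis. It is an illustration under stated assumptions, not Moonjump's fusion method: the `fuse` function, its `gain` parameter and the assumption that the wearable already yields a per-frame velocity are all hypothetical.

```python
from typing import List, Optional


def fuse(camera_pos: List[Optional[float]], imu_vel: List[float],
         dt: float, gain: float = 0.1) -> List[float]:
    """Complementary filter on one axis.

    camera_pos: per-frame position from vision, or None when the joint is occluded.
    imu_vel:    per-frame velocity from the wearable (assumed to be available).
    dt:         frame interval in seconds.
    gain:       how strongly each camera sample pulls the estimate (0..1).
    """
    if len(camera_pos) != len(imu_vel):
        raise ValueError("camera_pos and imu_vel must have the same length")
    # Start from the first visible camera sample, or 0.0 if there is none.
    est = next((p for p in camera_pos if p is not None), 0.0)
    fused = []
    for cam, vel in zip(camera_pos, imu_vel):
        est += vel * dt                  # inertial prediction: fast, but drifts
        if cam is not None:
            est += gain * (cam - est)    # camera correction: slow, but bounds the drift
        fused.append(est)
    return fused
```

With a biased velocity signal and a steady camera reading, the estimate settles near the camera value instead of drifting without bound. During occlusion (`None`) it coasts on the inertial prediction alone until the camera sees the joint again.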

05 — What we measure

The difference is in the artefacts you stop seeing.

Accuracy matters, but believability is decided by the small failures: sliding feet, high-frequency jitter, contacts that float. Our refinement stage targets exactly those.

−82%

Foot-slide vs. the raw estimate

−68%

Motion jitter (acceleration error)

<60s

Typical clip, end to end

No markers, suits or stage time

Internal evaluation · refined output vs. raw monocular estimate
Metric	Raw estimate	After refinement	Improvement
Foot-slide (cm/s)	full	low	−82%
Jitter / acceleration error	full	low	−68%
Contact stability	unstable	grounded	+74%
Joint position error	baseline	tighter	−41%

On these figures: values are from Moonjump's internal evaluation set and are representative of typical clips. Replace with your confirmed benchmark numbers before this page is made public.
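For readers who want to reproduce this style of comparison on their own clips, the sketch below shows one common way to define the first two metrics and the percentage change. These definitions are assumptions for illustration. The page does not specify how Moonjump computes its figures, and the function names are hypothetical.

```python
from math import dist, hypot
from typing import List, Tuple

Vec3 = Tuple[float, float, float]  # assumed convention: (x, y, z) in metres, y is up


def foot_slide_cm_s(foot: List[Vec3], contacts: List[bool], fps: float) -> float:
    """Mean horizontal foot speed in cm/s over consecutive ground-contact frames."""
    speeds = []
    for i in range(1, len(foot)):
        if contacts[i] and contacts[i - 1]:
            dx = foot[i][0] - foot[i - 1][0]
            dz = foot[i][2] - foot[i - 1][2]
            speeds.append(hypot(dx, dz) * fps * 100.0)  # m/frame -> cm/s
    return sum(speeds) / len(speeds) if speeds else 0.0


def acceleration(track: List[Vec3], fps: float) -> List[Vec3]:
    """Second finite difference of position, in m/s^2."""
    return [
        tuple((a - 2.0 * b + c) * fps * fps
              for a, b, c in zip(track[i + 1], track[i], track[i - 1]))
        for i in range(1, len(track) - 1)
    ]


def acceleration_error(pred: List[Vec3], ref: List[Vec3], fps: float) -> float:
    """Mean magnitude of the gap between predicted and reference acceleration."""
    errors = [dist(p, r) for p, r in zip(acceleration(pred, fps), acceleration(ref, fps))]
    return sum(errors) / len(errors) if errors else 0.0


def percent_change(before: float, after: float) -> float:
    """Signed change relative to the raw estimate; negative means the value went down."""
    return (after - before) / before * 100.0
```

Each metric is computed once on the raw estimate and once on the refined output, and `percent_change(raw, refined)` gives the signed figure for the last column. Halving a value, for example, yields `percent_change(50.0, 25.0) == -50.0`.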

06 — What it unlocks

Animation that keeps up with a small team.

Capture from anything. A phone clip becomes a usable animation, with no booking and no booth.
Prototype in minutes. Block out gameplay and cutscenes with real motion early, when it still changes the design.
Fill a world cheaply. Crowd and background characters get believable movement without a capture budget per shot.
Built for solo developers too. The barrier to good character animation drops from a studio to a single creator.

07 — Status

In active development, and already in our own productions.

The pipeline powers character animation across Moonjump's projects and continues to evolve. A short technical report is available to partners and collaborators on request.

Request the technical report