motion matching feels like magic but is it actually feasible outside of AAA

74 views 10 replies

Been going down a rabbit hole on motion matching after watching some GDC talks and digging through the UE5 motion matching docs. The locomotion results look incredible. No more weird blend tree tuning, no more fighting with foot sliding at transition edges, just smooth responsive movement that actually looks like the mocap intended.

But the more I dig in, the more it feels like it's built around assumptions I can't meet. The database size recommendations I keep seeing are 10–30+ minutes of motion data minimum for a convincing locomotion set. That's a lot of capture time, a lot of cleanup, and a lot of storage. UE5's implementation is reasonably accessible, but it's Epic's tooling in Epic's engine. For Godot or Unity projects you're either rolling your own pose matching system or porting something, which is... a lot.

I tried prototyping a stripped-down version: nearest-neighbor pose matching against a small database (about 4 minutes of walk/run/stop/start data) and the results were fine? Better than I'd expect from a naive blend tree, but nowhere near the "it just works" smoothness the demos promise. My guess is I'm just data-starved.
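For concreteness, the core of what I prototyped is roughly this (a minimal NumPy sketch; the random database here is a stand-in for real per-frame feature vectors extracted from clips, and the sizes are made up):

```python
import numpy as np

# Stand-in database: one row per animation frame, columns are
# pose/trajectory features (joint positions, velocities, etc.).
rng = np.random.default_rng(0)
database = rng.standard_normal((2000, 24))   # ~2000 frames, 24 features

def best_match(query: np.ndarray, db: np.ndarray) -> int:
    """Brute-force nearest neighbor: squared L2 distance to every frame."""
    dists = np.sum((db - query) ** 2, axis=1)
    return int(np.argmin(dists))

# Query with a slightly perturbed copy of frame 137, as a sanity check.
query = database[137] + 0.01 * rng.standard_normal(24)
idx = best_match(query, database)
```

That's the whole search at small scale. Everything else was feature extraction and crossfading into the matched frame.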

The thing that keeps nagging at me: is the quality improvement from motion matching as a technique, or mostly from the sheer volume and quality of the underlying data? Because if it's the latter, a well-designed blend tree with good mocap probably closes most of the gap at indie scale. And the pipeline cost of motion matching might not be worth it unless you're already sitting on a huge motion library.

Anyone actually shipped something using motion matching outside of Unreal? What did your database end up looking like, and do you think it was worth the investment over a traditional approach?

Replying to ZenithThorn: the algorithm being simpler is true but i'd push back slightly on the KD-tree pa...

Yeah, once you're past ~8 feature dimensions KD-tree query time basically slides toward linear anyway, curse of dimensionality doing its thing. Worth looking at HNSW (hierarchical navigable small world) graphs as an alternative. Many modern ANN libraries use them internally and they hold up much better at higher dimensions. You can prototype offline with hnswlib in Python, and query speed is real-time capable as long as your database stays manageable. The approximation trade-off is totally fine for motion matching: "close enough pose" is literally the design goal.

the GDC talks consistently undersell how much work goes into authoring a motion matching database that actually performs well. the algorithm is elegant but you still need enough motion variety that the system has good options to match against, and capturing, cleaning, and tagging that data at any real scale is not trivial. the ue5 demos look seamless because they're backed by enormous capture budgets. for a small team i'd pick one specific locomotion problem to solve with it rather than reaching for it as a general solution. walk/run with good blending? totally achievable. full-coverage locomotion plus combat plus environmental reactions? that's where the database scope quietly becomes the whole project.

Something that hasn't come up in this thread and bites almost everyone rolling their own: feature vector normalization. When building the pose database, the raw numerical scale of your features matters enormously. Joint positions in world units, joint velocities in units/sec, trajectory directions as unit vectors. Completely different scales. Without per-feature normalization to zero mean and unit variance, the matching silently over-indexes on whichever features happen to have larger raw values, regardless of what weights you set.

The fix is mean/variance normalization baked into the database build step, with per-feature weights applied after normalization. The reason this doesn't come up in GDC talks is that most AAA implementations have it handled at the engine level and presenters just don't think to mention it. If you're rolling your own and tuning weights seems to do basically nothing, that's almost certainly why.
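A minimal sketch of what "baked into the build step" means in practice (NumPy, assumed shapes; the key detail is that weights multiply *normalized* features):

```python
import numpy as np

def build_normalized_db(features: np.ndarray):
    """Per-feature zero-mean/unit-variance normalization, done once at build time."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std < 1e-8] = 1.0                  # guard against constant features
    return (features - mean) / std, mean, std

def match(query, db_norm, mean, std, weights):
    """Normalize the query with the *database's* stats, then apply weights."""
    q = (query - mean) / std
    return int(np.argmin(np.sum(weights * (db_norm - q) ** 2, axis=1)))
```

Store `mean` and `std` alongside the database so runtime queries are normalized with the same statistics. If you normalize the query with its own stats, or skip it, the weights stop meaning anything again.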

Replying to HexRunner: Something that hasn't come up in this thread and bites almost everyone rolling t...

One thing I'd add on normalization: it gets you to a level playing field, but that's often not what you want. After normalizing it's worth adding explicit per-feature weights, scalar multipliers applied before the distance computation. In practice, trajectory matching and foot contact phase tend to need higher weighting than raw pose similarity for locomotion, and the right balance is pretty game-specific.

What I've found useful: normalize first, ship with equal weights, then tune based on where transitions feel wrong. Foot contact phase weight is almost always too low on the first pass. Once the weight vector is a tunable parameter you can iterate without touching the underlying database at all.
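One way to keep that tunable: define scalars per feature *group* rather than per individual dimension, and expand them into the full weight vector. The layout below is hypothetical, just to show the shape of the idea:

```python
import numpy as np

# Hypothetical feature layout: which columns describe what.
FEATURE_GROUPS = {
    "pose":       slice(0, 12),   # joint positions/velocities
    "trajectory": slice(12, 18),  # future root positions/directions
    "foot_phase": slice(18, 20),  # contact phase features
}

def make_weights(dim: int, group_scale: dict) -> np.ndarray:
    """Expand per-group scalars into a per-feature weight vector."""
    w = np.ones(dim)
    for name, sl in FEATURE_GROUPS.items():
        w[sl] = group_scale.get(name, 1.0)
    return w

# Start equal, then boost trajectory/foot phase once transitions feel wrong.
weights = make_weights(20, {"pose": 1.0, "trajectory": 2.0, "foot_phase": 3.0})
```

Three or four scalars are tunable by feel; twenty individual dimension weights are not.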

Replying to BinaryMist: One thing I'd add on normalization: it gets you to a level playing field, but th...

The weight tuning workflow matters almost as much as the weights themselves. Hardcoding them as constants is fine to start, but you end up recompiling every time you want to tweak feel. Worth pulling them into a config file or exposing them in an in-game debug UI. Being able to scrub a weight slider and watch transitions respond in real time is the fastest way to understand what each feature is actually contributing to the cost function. Without that feedback loop you're basically tuning blind and hoping it holds up.
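Even without a debug UI, the cheapest version of this is reading the weights from a file every few seconds so edits apply live. Sketch below, with a hypothetical filename and group names:

```python
import json
from pathlib import Path

WEIGHTS_FILE = Path("mm_weights.json")   # hypothetical config path

DEFAULTS = {"pose": 1.0, "trajectory": 2.0, "foot_phase": 3.0}

def load_weights(path: Path = WEIGHTS_FILE) -> dict:
    """Re-read tunable weights; fall back to defaults if the file is absent."""
    if path.exists():
        return {**DEFAULTS, **json.loads(path.read_text())}
    return dict(DEFAULTS)
```

Poll it on a timer (or use a file watcher) and you get most of the slider-scrubbing feedback loop without building any UI.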

One thing the AAA framing tends to obscure: motion matching doesn't have to cover the whole body. Locomotion-only with a hand-keyed upper body layer is actually pretty manageable. You're covering walk, run, stop, and turns, maybe 8–12 minutes of clip data if you're disciplined about coverage, and the upper body handles all actions separately through the animation tree.

The database size scales with locomotion variety, not with every action in your game. Once you frame it that way it stops feeling like a AAA problem. Still non-trivial to implement, but it's not "we need a team of technical animators" territory.

Replying to ChronoLeap: the GDC talks consistently undersell how much work goes into authoring a motion ...

The authoring cost point is spot on, and I'd add one more thing GDC talks consistently skip: transition coverage density. It's not just having enough motion variety overall, it's having enough coverage at every possible exit point from every motion state. If your walk and run data are solid but mid-range speed transitions are thin, the matching will either lock into the nearest clean loop or produce a weird hesitation while it hunts for a pose. The database ends up needing clips authored specifically for transitions, not just the core locomotion states, and that multiplies the actual authoring cost well beyond what any conference talk implies.

Replying to NebulaPike: Feasible outside AAA, but the data requirements are the real barrier, not the al...

the algorithm being simpler is true but i'd push back slightly on the KD-tree part — once your feature vector dimensions climb, search can get hairy in ways that aren't obvious. had some really spiky blending artifacts that turned out to be un-normalized features wrecking the distance metric. bone velocity in world units and trajectory direction in a 0–1 range have totally different scales, and without normalization they don't contribute equally to the match at all.

once everything was normalized the results got way more stable. worth calling out explicitly because the basic writeups usually gloss over it.

The data requirements question is worth unpacking, because the AAA framing really does warp the perception of what's actually achievable. Daniel Holden's original motion matching write-up benchmarked brute-force nearest-neighbor search on a database of around 1000 poses at well under a millisecond on a single CPU core, with no spatial indexing at all. The algorithm isn't the expensive part at indie scale. It's the motion database, because you need variety of transitions, speeds, and directions to cover your game's movement space, not just raw clip volume.

For a single character with a bounded locomotion set (walk, jog, stop, turn, idle), a functional dataset is achievable in a structured day of capture. The genuinely hard part is defining feature descriptors that match your game's specific movement style and feel. That takes iteration and there's no shortcut I've found for it, but it's an animation and design problem, not a budget problem.

Feasible outside AAA, but the data requirements are the real barrier, not the algorithm. The core matching loop (pose + trajectory feature query, nearest-match search, crossfade to result) is actually pretty approachable. Daniel Holden's blog series covers the math clearly, and there are standalone Godot and Unity implementations on GitHub worth reading through.

The hard part is that motion matching needs dense, continuous coverage of every locomotion state, because it's literally searching your capture library. AAA databases run into thousands of seconds of mocap. You don't need that scale for an indie game, but sparse data produces jarring nearest-match transitions and the magic evaporates quickly.

Practical middle ground: use motion matching for the locomotion loop where you have solid coverage, and fall back to traditional blend trees for everything else. The seam between the two systems takes some work but it's manageable, and the locomotion feels noticeably better even with limited data.
