Great writeup — I went down this same rabbit hole last year. One thing I'd add: raw MediaPipe output is pretty jittery, especially on fast movements, and feeding that directly into Blender produces unpleasant noise in the curves. I had good results running a simple one-euro filter over the joint positions before export — it's a low-latency smoothing filter designed exactly for this use case, much better than a plain moving average because it adapts to velocity.
There's a Python implementation that drops in easily if you're scripting the pipeline. The parameters need tuning per joint type — fingers need different settings than hips — but once dialed in the difference is dramatic.
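In case it's useful, here's a minimal single-value sketch of the filter (parameter names follow the original Casiez et al. formulation; the defaults below are just starting points, not tuned values). You'd run one instance per coordinate per joint:

```python
import math

class OneEuroFilter:
    """One-euro filter: an adaptive exponential low-pass.
    Smooths heavily at low speeds (kills jitter) and backs off
    at high speeds (avoids lag on fast movements)."""

    def __init__(self, freq, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.freq = freq              # sample rate in Hz (e.g. video fps)
        self.min_cutoff = min_cutoff  # baseline cutoff for slow movement
        self.beta = beta              # speed coefficient: raise to reduce lag
        self.d_cutoff = d_cutoff     # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    def _alpha(self, cutoff):
        # smoothing factor for a first-order low-pass at this cutoff
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.freq
        return 1.0 / (1.0 + tau / te)

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # smoothed estimate of the signal's derivative
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # cutoff adapts to speed: slow -> more smoothing, fast -> less
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

The two knobs that matter in practice are `min_cutoff` (lower = smoother when holding still) and `beta` (higher = more responsive on fast moves), which matches the per-joint tuning point above: fingers want a higher `beta` than hips.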
Also worth noting: MediaPipe's world landmark coordinates drift pretty badly during lateral movement since there's no absolute positional tracking. If you need the character to actually travel across the scene rather than staying rooted, you'll want to integrate something like optical flow from the video to estimate translation, or just accept that you'll be manually adjusting root motion in Blender afterward.
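If you do go the estimate-translation-from-video route, you don't necessarily need full dense optical flow: for a mostly static camera, plain phase correlation between consecutive grayscale frames gives you a global 2D translation estimate cheaply. A rough numpy sketch (`estimate_shift` is my name for it, and this assumes the subject dominates the frame — a real pipeline would crop to the subject's bounding box first, otherwise you mostly measure the background):

```python
import numpy as np

def estimate_shift(prev_frame, next_frame):
    """Estimate the global (dx, dy) translation from prev_frame to
    next_frame via FFT phase correlation. Frames are 2D float arrays
    (grayscale); returns signed pixel shifts."""
    f0 = np.fft.fft2(prev_frame)
    f1 = np.fft.fft2(next_frame)
    # normalized cross-power spectrum; its inverse FFT peaks at the shift
    cross = np.conj(f0) * f1
    cross /= np.abs(cross) + 1e-12
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # FFT indices are circular; wrap them into signed shifts
    h, w = prev_frame.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dx), int(dy)
```

You'd accumulate these per-frame shifts into a root-motion curve and convert pixels to scene units using a known reference length (e.g. hip width). It won't match real positional tracking, but it can get you in the ballpark before the manual root-motion cleanup in Blender.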


