wrote a Blender script to detect rotation velocity spikes in mocap takes, no more mystery frame pops


After cleaning up a batch of fight choreography takes last week, I kept running into this: the animation looked fine at first glance, but on render you'd catch a single frame where a wrist or shoulder snapped slightly before settling back into the motion. Tiny. Invisible at 24fps in the viewport, but very obvious in a final render at full quality.

My standard approach was a simple delta check: if the frame-to-frame rotation difference exceeded a threshold, flag it. But that misses single-frame outliers that are small in magnitude yet represent genuine acceleration spikes. They're not large, they're fast. The problem isn't the size of the value change, it's the rate of change of the rate of change.

So I wrote a script that computes the second derivative (acceleration) of each rotation fcurve, finds frames where that value is more than N standard deviations from the mean for that curve, and marks them in the timeline with labels showing which bones are affected.

import bpy
import math
from collections import defaultdict


def get_stats(values):
    # Population standard deviation and mean of a list of floats.
    if len(values) < 2:
        return 0.0, 0.0
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance), mean


def detect_rotation_spikes(action, threshold_sigma=2.5, target_bones=None):
    # Maps frame number -> list of bone names flagged at that frame.
    spike_frames = defaultdict(list)

    for fcurve in action.fcurves:
        data_path = fcurve.data_path
        # Only pose-bone rotation channels (euler or quaternion).
        if 'rotation' not in data_path or 'pose.bones["' not in data_path:
            continue

        bone_name = data_path.split('pose.bones["')[1].split('"]')[0]
        if target_bones is not None and bone_name not in target_bones:
            continue

        kps = fcurve.keyframe_points
        # Need at least four keys to get two acceleration samples.
        if len(kps) < 4:
            continue

        frames = [kp.co[0] for kp in kps]
        values = [kp.co[1] for kp in kps]

        # First and second finite differences. This assumes a key on
        # every frame (typical for baked mocap); sparse keys would need
        # the deltas divided by the frame gap.
        vel = [values[i + 1] - values[i] for i in range(len(values) - 1)]
        accel = [vel[i + 1] - vel[i] for i in range(len(vel) - 1)]

        std, mean = get_stats(accel)
        if std == 0.0:
            continue

        # accel[i] is centred on the (i + 1)-th keyframe of the curve.
        for i, a in enumerate(accel):
            if abs(a - mean) > threshold_sigma * std:
                spike_frames[int(frames[i + 1])].append(bone_name)

    return spike_frames


def run(threshold_sigma=2.5):
    obj = bpy.context.active_object
    if not obj or not obj.animation_data or not obj.animation_data.action:
        print('No active action found.')
        return

    action = obj.animation_data.action
    spikes = detect_rotation_spikes(action, threshold_sigma)

    if not spikes:
        print('No spikes detected.')
        return

    # Clear markers from previous runs so the timeline doesn't accumulate.
    scene = bpy.context.scene
    for m in [m for m in scene.timeline_markers if m.name.startswith('SPIKE:')]:
        scene.timeline_markers.remove(m)

    # One marker per flagged frame, naming up to three affected bones.
    for frame, bones in sorted(spikes.items()):
        unique = list(dict.fromkeys(bones))
        label = 'SPIKE: ' + ', '.join(unique[:3])
        if len(unique) > 3:
            label += f' +{len(unique) - 3}'
        scene.timeline_markers.new(name=label, frame=frame)

    print(f'Flagged {len(spikes)} frames in action: {action.name}')
    for frame, bones in sorted(spikes.items()):
        bone_str = ', '.join(dict.fromkeys(bones))
        print(f'  Frame {frame:4d} -- {bone_str}')


run(threshold_sigma=2.5)

A threshold of 2.5 sigma catches most real artifacts without too much noise on my data. On a 600-frame fight take I typically get 10–20 flagged frames, and maybe 3–4 are actually worth touching. The rest are intentional fast moves that just look statistically unusual relative to the rest of the take.

A few things I know are imperfect:

  • Works per-channel, not per-bone. A spike in just the Y rotation channel triggers independently, so intentional fast rotations on a bone that's otherwise slow can false-positive
  • Only analyzes keyframe points, not the interpolated curve. If you've already baked and smoothed, it can miss things between explicit keys
  • Standard deviation is sensitive to outliers, which is somewhat circular. One big spike inflates std and can mask nearby smaller ones

I tried IQR instead of std, but it was too sensitive on sparse takes where there aren't enough keyframes to establish a solid baseline. Anyone found a better threshold approach? Also wondering whether analyzing quaternion curves instead of Euler would make the velocity math cleaner. Quaternion distances are more geometrically meaningful for rotation, but I haven't gone down that path yet.
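In case anyone wants to poke at the quaternion idea, here's the distance math on its own, no bpy, quaternions as plain (w, x, y, z) tuples. Function names are just for illustration:

```python
import math

def quat_angle(q1, q2):
    # Angular distance between two unit quaternions (w, x, y, z).
    # The absolute dot product handles the double cover: q and -q
    # represent the same rotation.
    dot = abs(sum(a * b for a, b in zip(q1, q2)))
    return 2.0 * math.acos(min(1.0, dot))

def quat_velocities(quats):
    # Per-step angular speed in radians per frame, one number per bone
    # instead of three independent euler channels.
    return [quat_angle(quats[i], quats[i + 1]) for i in range(len(quats) - 1)]

# 90-degree rotation about Z over one frame step, then a hold.
h = math.sqrt(0.5)
quats = [(1, 0, 0, 0), (h, 0, 0, h), (h, 0, 0, h)]
vels = quat_velocities(quats)
print([round(v, 4) for v in vels])  # first step ~pi/2, second ~0
```

The nice part is that this sidesteps the per-channel false positives entirely, since the whole rotation collapses to a single scalar speed.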

Replying to BlazeFlare: The threshold calibration problem is the crux of this whole approach. A fixed ab...

The statistical approach makes sense here. If you have any clean reference takes, build a per-bone velocity profile (mean + N standard deviations across all frames) and use that as a dynamic threshold instead of a fixed absolute value. Even one good take per actor gives you a better baseline than hand-tuning constants and hoping they generalize.

Also worth trying: flag relative spikes rather than absolute ones. If frame N has 10x the velocity of its immediate neighbors, that's suspicious regardless of whether the raw value is high or low. Catches the same class of blown frames and survives different action types without needing separate configs per take category.
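A minimal sketch of that neighbor-ratio check, assuming you already have per-frame speeds as a flat list (the names here are placeholders):

```python
def relative_spikes(vel, ratio=10.0, floor=1e-6):
    # Flag indices whose speed is `ratio` times the average of their
    # immediate neighbors. `floor` keeps near-zero neighbors from
    # blowing up the division.
    flagged = []
    for i in range(1, len(vel) - 1):
        local = max((abs(vel[i - 1]) + abs(vel[i + 1])) / 2.0, floor)
        if abs(vel[i]) / local >= ratio:
            flagged.append(i)
    return flagged

# One blown sample in otherwise calm motion.
vel = [0.01, 0.012, 0.5, 0.011, 0.009]
print(relative_spikes(vel))  # → [2]
```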

Replying to VelvetFlare: One thing worth trying on the threshold calibration problem: rather than global ...

rolling window is a solid middle ground tbh. fixed thresholds break on any legitimately fast motion, and full statistical profiling needs clean reference data you don't always have at hand. the main thing i'd watch: window size matters a lot on short takes. 5-7 frames works fine on a 200-frame clip but on a 40-frame take it starts flagging peaks in genuinely fast moves. worth making the window size a configurable parameter rather than hardcoding it. one value will not fit all clip lengths.

One thing worth trying on the threshold calibration problem: rather than global per-bone thresholds, compute a rolling window average over a small frame range (5–7 frames works well) around each sample and flag only when the instantaneous velocity deviates significantly from that local average, usually 2–3 standard deviations. It's local outlier detection rather than global thresholding. The window adapts to context automatically, so a wrist mid-punch won't have the same sensitivity as a wrist at rest. Genuine blown frames are almost always discontinuous relative to their immediate neighbors regardless of absolute velocity.
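A rough sketch of what I mean, pure Python over a flat list of per-frame speeds, with the window size and sigma both tunable (names are illustrative):

```python
import statistics

def local_outliers(vel, window=5, sigma=2.5):
    # Flag samples that deviate from the mean of a small window centred
    # on them by more than `sigma` local standard deviations.
    half = window // 2
    flagged = []
    for i in range(len(vel)):
        lo, hi = max(0, i - half), min(len(vel), i + half + 1)
        neighbours = vel[lo:i] + vel[i + 1:hi]  # exclude the sample itself
        if len(neighbours) < 2:
            continue
        mean = statistics.mean(neighbours)
        std = statistics.pstdev(neighbours)
        if std > 0 and abs(vel[i] - mean) > sigma * std:
            flagged.append(i)
    return flagged

vel = [0.1, 0.11, 0.09, 0.95, 0.1, 0.12, 0.1]
print(local_outliers(vel))  # → [3]
```

Excluding the sample itself from its own window matters: with it included, a big spike inflates the local std and can hide itself.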

Replying to DriftVale: The statistical approach makes sense here. If you have any clean reference takes...

Not having a clean reference take is a real problem in indie workflows, since you're often processing the only data you have. One thing that's helped when reference takes aren't available: if you have multiple takes of the same action, pool the velocity data across all of them and use that combined distribution as your baseline. Noise tends to be idiosyncratic (specific frames, specific bones, specific actors) while real velocity patterns repeat consistently across takes. You're essentially using the full session as its own reference. Not perfect, but gets you most of the way there without needing a dedicated clean pass.

Curious how you're handling threshold calibration across different bone types. A wrist during a fight take has legitimately high rotation velocity that might trip the same threshold you're using to catch a busted shoulder frame. Are you using a single global threshold and accepting some false positives, or is there per-bone or per-bone-group config somewhere? I've been doing something similar for position delta spikes, and the per-bone tuning ended up being close to half the total work. Worth it, but more to maintain than I expected going in.

The threshold calibration problem is the crux of this whole approach. A fixed absolute value just doesn't generalize. Wrist velocity that flags a blown frame on a dialogue take is completely normal during a sword swing or fast reach.

One approach worth trying: instead of a global threshold, compute the per-bone velocity distribution across the whole take first, then flag anything above the 98th or 99th percentile as a candidate. Outliers tend to stand out sharply regardless of the take's overall intensity, and the threshold adapts to the actual data rather than requiring per-take manual tuning. Doesn't handle the case where an entire take is pathologically noisy, but it's a better starting point than any hardcoded number.
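A nearest-rank version of that percentile cut, as a sketch; no interpolation, which is fine at typical take lengths:

```python
import math

def percentile(values, pct):
    # Nearest-rank percentile: the value at rank ceil(pct/100 * N).
    ranked = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ranked)))
    return ranked[rank - 1]

# 96 calm samples, two mildly fast ones, two blown frames.
speeds = [0.1] * 96 + [0.2, 0.3, 2.5, 3.0]
cut = percentile(speeds, 98.0)
candidates = [v for v in speeds if v > cut]
print(cut, candidates)  # cut = 0.3, flags the two extreme samples
```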

Replying to NovaLattice: rolling window is a solid middle ground tbh. fixed thresholds break on any legit...

The window size and multiplier interact in a way that's easy to underestimate. A wider window averages the local baseline more aggressively, which means you can tighten the multiplier without triggering on legitimately fast motion, but if the window gets too wide, a real spike gets averaged into the baseline and you miss it entirely. I've settled on 5–7 frames with a 2.8x multiplier as a reasonable starting point, but I end up re-tuning every project because the motion style changes what "normal" velocity actually looks like for that specific take. There's no universal default that survives contact with different performers.

Replying to StealthHawk: Curious how you're handling threshold calibration across different bone types. A...

Per-bone-type categorization is how I've handled this. Rather than one global threshold, I keep a small lookup table keyed on bone name patterns (spine, arm, wrist, finger) with a different multiplier applied to a shared base value. Wrists get a higher multiplier on fight takes because reference velocity there is genuinely elevated during that kind of action. Fingers get a tighter one because spikes there almost always indicate noise rather than intent.

It's not fully automatic, but far more maintainable than hunting for a single value that doesn't produce false positives everywhere. Tune each category against one clean reference take and it tends to hold across sessions.
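The lookup table is nothing fancy, roughly this shape (the multiplier values here are illustrative, not my production numbers):

```python
# Substring patterns matched against bone names; first match wins,
# so order them from most to least specific.
BONE_MULTIPLIERS = [
    ('finger', 0.8),   # tighter: spikes here are almost always noise
    ('wrist', 1.6),    # looser: legitimately fast on fight takes
    ('arm', 1.3),
    ('spine', 1.0),
]
DEFAULT_MULTIPLIER = 1.0

def threshold_for(bone_name, base=2.5):
    name = bone_name.lower()
    for pattern, mult in BONE_MULTIPLIERS:
        if pattern in name:
            return base * mult
    return base * DEFAULT_MULTIPLIER

print(threshold_for('wrist.R'))      # 4.0
print(threshold_for('finger_01.L'))  # 2.0
print(threshold_for('head'))         # falls through to the base 2.5
```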

Replying to BlazeFlare: The threshold calibration problem is the crux of this whole approach. A fixed ab...

One approach that's worked for me without clean reference takes: normalize thresholds against each bone's own velocity history within the same take. Compute the mean and standard deviation of that bone's velocity across all its frames, then flag anything beyond N standard deviations from its own baseline. Self-calibrating per take — a fast wrist in a fight sequence sets a high baseline for itself, and a slow wrist in a dialogue take sets its own low one. You won't catch artifacts that are consistently bad throughout the whole take, but those are rare. Most blown frames are isolated spikes that stand way out against the surrounding motion.

Replying to DriftVale: The statistical approach makes sense here. If you have any clean reference takes...

The statistical approach is the right idea, but as you said, a clean reference take often just doesn't exist, especially in indie workflows where you're processing the only data you have. What's worked for me as a fallback: run a first pass over the take itself to estimate a per-bone velocity distribution, but exclude the top 2% of samples from that calculation before computing mean and standard deviation. Genuine blown frames are rare enough that they won't significantly skew the baseline even when included in the raw data, so excluding the extreme tail gets you a reasonable distribution without needing external reference.

It's technically circular, using the data to calibrate the filter that cleans the data, but in practice it produces thresholds grounded in the actual motion rather than a fixed global value. The main failure mode is takes where a large fraction of frames are corrupted, because then the 2% exclusion cutoff isn't aggressive enough and the baseline drifts high. For the typical case of isolated frame pops scattered through an otherwise clean take, it works well.
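A sketch of that trimmed first pass, over a flat list of per-frame speeds (the helper name is made up):

```python
import statistics

def trimmed_baseline(speeds, trim_pct=2.0):
    # Drop the top `trim_pct` percent of samples before computing
    # mean/std, so the spikes don't calibrate their own filter.
    ranked = sorted(speeds)
    keep = len(ranked) - int(len(ranked) * trim_pct / 100.0)
    kept = ranked[:max(2, keep)]
    return statistics.mean(kept), statistics.pstdev(kept)

# 98 calm samples with mild natural variation, plus two blown frames.
speeds = [0.08, 0.12] * 49 + [5.0, 6.0]
mean, std = trimmed_baseline(speeds, trim_pct=2.0)
# With the tail excluded, the baseline reflects the calm motion and
# both spikes land far outside mean + 2.5*std.
print(all(v > mean + 2.5 * std for v in (5.0, 6.0)))  # → True
```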
