
Game Dev Mechanics: Skeletal Animation & Skinning — How It Works

[Interactive demo: a four-bone chain (Bone 0 — Root through Bone 3 — Tip). Drag to rotate; the mesh deforms as the bones animate.]

Every time a character in a video game walks, jumps, attacks, or waves — the motion you see is powered by skeletal animation, one of the most foundational techniques in real-time game development. Rather than animating every individual vertex on a mesh by hand, skeletal animation uses a hierarchical structure of virtual bones to drive mesh deformation. It's efficient, flexible, and elegantly mathematical.

In this article, we'll break down how skeletal animation and skinning actually work under the hood — from bone hierarchies and transform matrices, to the per-vertex math that deforms your mesh in real time, to keyframe interpolation and animation blending. Whether you're writing your own animation system from scratch or simply want to understand what's happening inside Unity, Unreal, or Godot, this is the place to start.

What Is a Skeleton?

In skeletal animation, a skeleton (also called a rig) is a hierarchical tree of transforms — each node called a bone. Despite the name, bones are not geometry; they're invisible, logical transforms that define local coordinate spaces. A human character might have 60–100 bones: one for the pelvis, one for each spine segment, two for collarbones, upper arms, forearms, hands, and fingers, down to toes and facial joints.

The hierarchy is the key insight. Each bone has a parent, and transforms propagate downward through the tree. Move the upper arm bone, and the forearm and hand automatically follow — just like the real world. This is what makes skeletal animation so powerful: you only need to pose a small number of logical transforms, and the full character mesh deforms as a consequence.

Bones are typically represented by two pieces of data:

  • Local transform — the bone's position, rotation, and scale relative to its parent bone
  • World transform — the bone's position, rotation, and scale in world space, derived by combining all ancestor transforms up the chain

Transforms and the Bone Hierarchy

Each bone's transform is represented as a 4×4 matrix — or equivalently, a translation vector, a rotation quaternion, and a scale vector (often abbreviated TRS). For a given bone $b$ with parent $p$, its world-space transform $M_b^{world}$ is:

$$M_b^{world} = M_p^{world} \cdot M_b^{local}$$

And each local matrix is the product of translation, rotation, and scale components:

$$M_b^{local} = T_b \cdot R_b \cdot S_b$$
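As a concrete sketch, a local TRS matrix can be assembled like this in Python with NumPy (the function name and the (w, x, y, z) quaternion convention are illustrative, not taken from any particular engine):

```python
import numpy as np

def trs_matrix(translation, rotation_quat, scale):
    """Compose a 4x4 local transform as T * R * S.

    rotation_quat is a unit quaternion in (w, x, y, z) order.
    """
    w, x, y, z = rotation_quat
    # 3x3 rotation matrix from the unit quaternion
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    M = np.eye(4)
    M[:3, :3] = R * np.asarray(scale)  # R @ diag(scale): scales each column
    M[:3, 3] = translation
    return M
```

Because the scale sits innermost, it is applied along the bone's own axes before the rotation, matching the $T \cdot R \cdot S$ order above.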

Walking the entire hierarchy from root to leaf and multiplying matrices at each step is called forward kinematics. Starting from the root (e.g., the hips), each child bone's world transform depends only on its parent's world transform and its own local transform. This makes a simple recursive traversal — or equivalently, a top-down iteration over bones sorted by depth — sufficient to update the entire skeleton.

def compute_world_transforms(bones):
    # bones must be ordered parent-before-child
    world_transforms = {}
    for bone in bones:
        local = bone.local_transform  # 4x4 matrix
        if bone.parent is None:
            world_transforms[bone.name] = local
        else:
            parent_world = world_transforms[bone.parent.name]
            world_transforms[bone.name] = parent_world @ local
    return world_transforms

In practice, game engines perform this traversal every frame for every animated character, so it needs to be fast. Engines like Unreal use SIMD-accelerated matrix multiplication and process entire skeletons in a few microseconds using job systems that parallelize across characters.

The Bind Pose and the Inverse Bind Matrix

Before we can animate anything, we need to understand the bind pose (also called the rest pose or T-pose). This is the exact pose the skeleton was in when the mesh was originally sculpted and rigged — typically a neutral standing pose with arms outstretched. All vertex positions in the mesh file are defined relative to this pose.

When animating, we need a way to express: how much has this bone moved from where it was in the bind pose? We capture this with the inverse bind matrix.

Let $B_i$ be the world-space matrix of bone $i$ in the bind pose. Its inverse $B_i^{-1}$ transforms a vertex from bind-pose world space into bone $i$'s local coordinate frame. During animation, when bone $i$ has a new world-space matrix $M_i$, the skinning matrix is:

$$S_i = M_i \cdot B_i^{-1}$$

This composed matrix does exactly what we need: it first un-does the bind pose transform (moving the vertex into the bone's local space as if it were a child of that bone), then re-applies the new animated world transform. Without the inverse bind matrix, you'd be double-transforming every vertex — a common beginner mistake that results in a wildly exploding mesh.

The inverse bind matrices are computed once at load time and never change. Only $M_i$ changes each frame as the skeleton is posed.
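In code, that split between load-time and per-frame work looks something like this (a minimal Python sketch using NumPy, with the same dictionary-of-matrices layout as the compute_world_transforms example above):

```python
import numpy as np

def compute_inverse_bind_matrices(bind_world_transforms):
    """Load time, once: invert each bone's bind-pose world matrix."""
    return {name: np.linalg.inv(B)
            for name, B in bind_world_transforms.items()}

def compute_skinning_matrices(world_transforms, inverse_bind_matrices):
    """Every frame: S_i = M_i @ B_i^-1 for each posed bone."""
    return {name: world_transforms[name] @ inv_bind
            for name, inv_bind in inverse_bind_matrices.items()}
```

A handy sanity check falls out of the math: if the animated pose equals the bind pose ($M_i = B_i$), every skinning matrix is the identity and the mesh renders exactly as authored.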

Skinning: Binding Vertices to Bones

Now we have a posed skeleton with skinning matrices. But how do we actually move the mesh vertices? This is skinning.

During rigging, each vertex is assigned to one or more bones with associated blend weights. A weight $w_i$ is a value between 0 and 1 representing how strongly bone $i$ influences that vertex, with all weights summing to 1:

$$\sum_{i=0}^{n-1} w_i = 1$$

A vertex at the elbow crease might be 50% upper arm and 50% forearm. A vertex in the middle of the thigh might be 100% thigh bone. A vertex at the shoulder might be split across three bones to achieve smooth rolling deformation. These weights are painted by artists using a weight painting tool in software like Blender or Maya — it's part science, part craft.

In real-time rendering, vertices are typically limited to 4 bone influences (some modern engines support 8). This constraint keeps per-vertex data size predictable — you store exactly 4 bone indices and 4 weights per vertex — and maps cleanly to vec4 types in shader code.
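Asset importers typically enforce that limit by keeping only the strongest influences and renormalizing the survivors so they still sum to 1. A small Python sketch (the helper name is hypothetical):

```python
def limit_and_normalize(influences, max_influences=4):
    """Reduce one vertex's bone influences to the strongest few.

    influences: list of (bone_index, weight) pairs; returns at most
    max_influences pairs whose weights again sum to 1.
    """
    # Keep the largest weights, dropping the rest
    top = sorted(influences, key=lambda iw: iw[1], reverse=True)[:max_influences]
    # Renormalize so the remaining weights sum to 1 again
    total = sum(w for _, w in top)
    return [(i, w / total) for i, w in top]
```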

Linear Blend Skinning — The Core Algorithm

Linear Blend Skinning (LBS), also called smooth skinning or matrix palette skinning, is the standard algorithm that deforms mesh vertices at runtime. Given a vertex $v$ in bind-pose space, its animated position $v'$ is the weighted sum of where each influencing bone would independently place it:

$$v' = \sum_{i=0}^{n-1} w_i \cdot S_i \cdot v$$

This is an elegant idea: each bone "votes" on where the vertex should end up, and the votes are weighted by influence strength. The result is smooth, continuous deformation across bone boundaries — far better than the rigid, segmented look of older skeletal systems that assigned each vertex to exactly one bone.

Because this computation happens per-vertex and is entirely independent between vertices, it's a perfect fit for the GPU. Here's a typical LBS vertex shader in GLSL:

// Bone matrices uploaded as a uniform array (up to 64 bones here)
uniform mat4 u_BoneMatrices[64];

// Per-vertex attributes (packed in a VBO)
in vec3  a_Position;
in vec3  a_Normal;
in ivec4 a_BoneIDs;     // indices into u_BoneMatrices
in vec4  a_BoneWeights; // must sum to 1.0

uniform mat4 u_ProjectionView;

void main() {
    // Accumulate the weighted bone transforms into one skin matrix
    mat4 skinMatrix  = u_BoneMatrices[a_BoneIDs.x] * a_BoneWeights.x;
    skinMatrix      += u_BoneMatrices[a_BoneIDs.y] * a_BoneWeights.y;
    skinMatrix      += u_BoneMatrices[a_BoneIDs.z] * a_BoneWeights.z;
    skinMatrix      += u_BoneMatrices[a_BoneIDs.w] * a_BoneWeights.w;

    vec4 skinnedPos    = skinMatrix * vec4(a_Position, 1.0);
    vec3 skinnedNormal = normalize(mat3(skinMatrix) * a_Normal);

    gl_Position = u_ProjectionView * skinnedPos;
}

Note that normals must also be skinned. If you transform vertex positions but not normals, your lighting will look wrong whenever a character deforms — the normals will point in stale bind-pose directions while the mesh has moved. Normals are transformed by the 3×3 upper-left submatrix of the skin matrix. For uniform scales (which most characters have), the regular 3×3 is fine; for non-uniform scales, you'd technically need the inverse-transpose.

The Candy Wrapper Problem

LBS has one well-known failure mode: the candy wrapper artifact (also called the "collapsing elbow" problem). When two bones rotate in opposite directions about a shared twist axis — like a forearm twisting — the averaged skinning matrices can cause the mesh volume to collapse inward, looking exactly like a twisted candy wrapper.

The root cause is that LBS blends matrices, and matrix interpolation is not geometrically meaningful for rotations. Adding two rotation matrices together doesn't give you the rotation halfway between them — it gives you something in between that may not even be a valid rotation matrix.
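A quick NumPy check makes the failure concrete: averaging the rotation matrices for 0° and 180° about the same axis produces a matrix that flattens space instead of rotating it, which is exactly the volume collapse seen at a twisted forearm.

```python
import numpy as np

# Rotation matrices for 0 and 180 degrees about the z axis
R0   = np.eye(3)
R180 = np.array([[-1.0,  0.0, 0.0],
                 [ 0.0, -1.0, 0.0],
                 [ 0.0,  0.0, 1.0]])

# A 50/50 linear blend, as LBS produces at an evenly weighted vertex
blended = 0.5 * R0 + 0.5 * R180

# Not a rotation: x and y collapse to zero and the determinant is 0
# (a proper rotation matrix always has determinant 1)
print(blended @ np.array([1.0, 0.0, 0.0]))  # the vertex is squashed onto the axis
print(np.linalg.det(blended))
```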

Common solutions include:

  • Corrective blend shapes — artist-authored morph targets that activate at specific joint angles to push the mesh back into a plausible shape
  • Dual quaternion skinning — a mathematical improvement that interpolates in a rotation-preserving space
  • Additional twist bones — inserting extra bones along forearm and upper arm segments to distribute the twist, reducing the error at any single joint

Keyframe Animation and Interpolation

A skeleton can be posed statically, but to animate it we need animation clips — time-sampled sequences of bone transforms. Rather than storing a full skeleton pose at every frame, animation clips use keyframes: sparse pose samples at important moments, with interpolation filling the frames in between.

For translations and scales, linear interpolation (LERP) works well:

$$v(t) = (1 - t) \cdot v_0 + t \cdot v_1, \quad t \in [0, 1]$$

For rotations, however, LERP on Euler angles suffers from gimbal lock, non-constant angular velocity, and visual pops. Instead, rotations are stored as unit quaternions and interpolated using Spherical Linear Interpolation (SLERP):

$$\text{slerp}(q_0, q_1, t) = \frac{\sin((1-t)\,\Omega)}{\sin\Omega}\, q_0 + \frac{\sin(t\,\Omega)}{\sin\Omega}\, q_1$$

where $\Omega = \cos^{-1}(q_0 \cdot q_1)$ is the angle between the two quaternions on the unit hypersphere. SLERP produces smooth, constant-speed rotation along the shortest arc — giving natural, artifact-free motion for any joint at any angle.
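Translated directly into Python (quaternions as (w, x, y, z) tuples), with the two practical details every implementation needs: a sign flip so interpolation takes the shortest arc, and a normalized-LERP fallback when the quaternions are nearly parallel and $\sin\Omega$ approaches zero:

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    # q and -q encode the same rotation; flip one to take the shortest arc
    if dot < 0.0:
        q1 = tuple(-c for c in q1)
        dot = -dot
    # Nearly parallel: sin(omega) ~ 0, so fall back to normalized LERP
    if dot > 0.9995:
        q = tuple((1 - t) * a + t * b for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    omega = math.acos(dot)
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))
```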

BonePose SampleClip(AnimationClip clip, string boneName, float time) {
    var track = clip.GetTrack(boneName);

    // Find the two keyframes surrounding 'time'
    Keyframe kf0 = track.GetKeyframeBefore(time);
    Keyframe kf1 = track.GetKeyframeAfter(time);

    // Normalized blend factor [0..1] between the two keyframes
    // (guard against coincident keyframes to avoid division by zero)
    float duration = kf1.time - kf0.time;
    float t = duration > 0.0f ? (time - kf0.time) / duration : 0.0f;

    return new BonePose {
        position = Vector3.Lerp(kf0.position, kf1.position, t),
        rotation = Quaternion.Slerp(kf0.rotation, kf1.rotation, t),
        scale    = Vector3.Lerp(kf0.scale,    kf1.scale,    t)
    };
}

In modern engines, keyframe data is often further compressed — storing only the channels that actually animate (many bones are static for most of a clip), using quantized 16-bit values for rotations, and encoding curves as cubic Hermite splines rather than dense linear keyframe arrays. This can reduce animation memory by 5–10x with virtually no visual difference.

Animation Blending

Real characters almost never play a single animation at a time. They transition between walk and run, overlay an upper-body aim pose on top of a lower-body locomotion cycle, or react to a hit mid-swing. This is solved through animation blending.

The simplest form is a two-clip linear blend. Given clip A (e.g., walk) and clip B (e.g., run), and a blend factor $\alpha \in [0, 1]$ driven by the character's speed:

$$pose = (1 - \alpha) \cdot pose_A + \alpha \cdot pose_B$$

Translations and scales blend with LERP; rotations blend with SLERP. The result is a seamless crossfade that can be driven continuously — as the character accelerates, $\alpha$ rises smoothly from 0 to 1, blending from a full walk to a full run with all poses in between.
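In code, a full-pose blend just applies the appropriate interpolator per channel. A Python sketch (poses as (position, rotation, scale) tuples with (w, x, y, z) quaternions; a compact slerp helper is inlined so the example is self-contained):

```python
import math

def slerp(q0, q1, t):
    """Compact shortest-arc slerp on unit quaternions in (w, x, y, z) order."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:
        q1, dot = tuple(-c for c in q1), -dot
    if dot > 0.9995:  # nearly parallel: normalized LERP is fine
        q = tuple((1 - t) * a + t * b for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    omega = math.acos(dot)
    return tuple((math.sin((1 - t) * omega) * a + math.sin(t * omega) * b)
                 / math.sin(omega) for a, b in zip(q0, q1))

def blend_poses(pose_a, pose_b, alpha):
    """Blend two (position, rotation, scale) poses: LERP vectors, SLERP rotations."""
    def lerp(a, b):
        return tuple((1 - alpha) * x + alpha * y for x, y in zip(a, b))
    (pos_a, rot_a, scale_a), (pos_b, rot_b, scale_b) = pose_a, pose_b
    return (lerp(pos_a, pos_b), slerp(rot_a, rot_b, alpha), lerp(scale_a, scale_b))
```

Driving alpha from the character's speed each frame gives the continuous walk-to-run crossfade described above.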

More sophisticated systems use blend trees — directed acyclic graphs where multiple clips are blended based on input parameters. A 2D directional blend tree might take four clips (strafe left, strafe right, walk forward, walk backward) and blend them using barycentric coordinates based on the character's current velocity vector. The animator authors the clips; the blend tree handles all the in-between states automatically.

Additive animation is another powerful layer. Rather than blending two full-body poses, an additive clip stores the difference from a reference pose — a delta. This delta is then added on top of any other animation. Character breathing, subtle head-bob, a weapon recoil, or a facial expression can all be implemented as additive layers that stack cleanly on top of base locomotion without the animator needing to re-author every combination.
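A minimal sketch of both halves of that idea in Python, under one common convention (poses as (position, rotation, scale) tuples with (w, x, y, z) quaternions; a per-layer weight, which would scale the delta toward identity, is omitted for brevity):

```python
def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw*bw - ax*bx - ay*by - az*bz,
            aw*bx + ax*bw + ay*bz - az*by,
            aw*by - ax*bz + ay*bw + az*bx,
            aw*bz + ax*by - ay*bx + az*bw)

def quat_conjugate(q):
    """Conjugate; for unit quaternions this is the inverse."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def make_additive_delta(pose, reference):
    """Authoring time: store how far 'pose' deviates from 'reference'."""
    (p, q, s), (rp, rq, rs) = pose, reference
    return (tuple(a - b for a, b in zip(p, rp)),   # translation offset
            quat_mul(quat_conjugate(rq), q),       # rq^-1 * q
            tuple(a / b for a, b in zip(s, rs)))   # scale ratio

def apply_additive(base, delta):
    """Runtime: stack the delta on top of any base pose."""
    (bp, bq, bs), (dp, dq, ds) = base, delta
    return (tuple(a + b for a, b in zip(bp, dp)),
            quat_mul(bq, dq),
            tuple(a * b for a, b in zip(bs, ds)))
```

By construction, applying a delta back onto its own reference pose reproduces the original clip pose exactly.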

Masked blending (also called layered blending) lets you apply different animations to different parts of the skeleton simultaneously. A character can run with their legs (lower body mask) while reloading a weapon with their arms (upper body mask). Each bone in the skeleton has a per-layer weight, making it possible to blend in a recoil animation only on the spine, arms, and head while the legs continue their locomotion cycle undisturbed.

Dual Quaternion Skinning

Dual Quaternion Skinning (DQS), introduced by Ladislav Kavan et al. in 2007, directly addresses LBS's volume-collapse problem. Instead of blending 4×4 matrices, DQS represents each bone transform as a dual quaternion:

$$\hat{q} = q_0 + \epsilon\, q_e$$

Here $q_0$ is a unit quaternion encoding rotation, $q_e$ is a quaternion encoding translation, and $\epsilon$ is the dual unit ($\epsilon^2 = 0$). Dual quaternions can represent any rigid-body motion (rotation + translation) in a single compact object with provably better interpolation properties than matrices.

The skinning blend becomes a weighted average of dual quaternions followed by normalization and application to the vertex. Because the interpolation stays on the space of rigid motions, the "candy wrapper" collapse cannot occur — volume is preserved across twist rotations.

uniform vec4 u_DQReal[64]; // real part (rotation quaternion)
uniform vec4 u_DQDual[64]; // dual part (encodes translation)

vec3 dqSkinPosition(vec3 pos, ivec4 ids, vec4 weights) {
    // Blend dual quaternions linearly. (Production code first flips the
    // sign of any pair whose real part opposes the first bone's, keeping
    // all four quaternions on the same hemisphere; omitted for brevity.)
    vec4 blendReal = u_DQReal[ids.x] * weights.x
                   + u_DQReal[ids.y] * weights.y
                   + u_DQReal[ids.z] * weights.z
                   + u_DQReal[ids.w] * weights.w;
    vec4 blendDual = u_DQDual[ids.x] * weights.x
                   + u_DQDual[ids.y] * weights.y
                   + u_DQDual[ids.z] * weights.z
                   + u_DQDual[ids.w] * weights.w;

    // Normalize the real part to keep it on the unit hypersphere
    float len  = length(blendReal);
    blendReal /= len;
    blendDual /= len;

    // Apply the dual quaternion rigid transform to the position
    vec3 r = blendReal.xyz;
    float r0 = blendReal.w;
    vec3 d = blendDual.xyz;
    float d0 = blendDual.w;

    vec3 translation = 2.0 * (r0 * d - d0 * r + cross(r, d));
    vec3 rotated = pos + 2.0 * cross(r, cross(r, pos) + r0 * pos);
    return rotated + translation;
}
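The CPU side has to produce those two uniform arrays by converting each bone's animated rigid transform into a dual quaternion before upload. Using the standard convention $q_e = \tfrac{1}{2}\,(0, t)\, q_0$, where $(0, t)$ is the pure quaternion built from the translation $t$, a Python sketch (function name illustrative):

```python
def rigid_to_dual_quaternion(rotation, translation):
    """Convert a rigid transform to the (real, dual) pair the shader blends.

    rotation is a unit quaternion (w, x, y, z); real = q0, dual = 0.5*(0,t)*q0.
    """
    w, x, y, z = rotation
    tx, ty, tz = translation
    real = (w, x, y, z)
    # Half the quaternion product (0, tx, ty, tz) * (w, x, y, z)
    dual = (0.5 * (-tx*x - ty*y - tz*z),
            0.5 * ( tx*w + ty*z - tz*y),
            0.5 * (-tx*z + ty*w + tz*x),
            0.5 * ( tx*y - ty*x + tz*w))
    return real, dual
```

The shader's `translation = 2.0 * (r0 * d - d0 * r + cross(r, d))` line is exactly the inverse of this encoding, recovering $t$ as the vector part of $2\,q_e\,q_0^{*}$.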

DQS does have trade-offs: it doesn't handle non-uniform scales, and can produce a "bulging" artifact near 180° flips between opposing rotations. Most production engines expose both LBS and DQS as per-mesh options, letting technical artists choose based on visual results.

Performance Considerations

Skeletal animation runs every frame for every character, so performance matters enormously in games with dense crowds or high character counts.

  • GPU skinning — by doing all the LBS math in the vertex shader, the CPU only needs to upload a compact array of bone matrices. The GPU processes all vertices in parallel, with no per-vertex work on the CPU at all.
  • Bone matrix palettes — the array of skinning matrices $S_i$ uploaded to the GPU is called the matrix palette. Keeping this array small (fewer bones) reduces upload cost and fits better in GPU constant buffers.
  • Animation LOD — distant characters can use reduced-bone skeletons, lower keyframe sample rates, or even completely frozen poses. Unity and Unreal both support animation LOD that seamlessly downgrades update frequency with distance.
  • Bone culling — for off-screen characters, skip the forward kinematics traversal entirely. For partially visible characters, skip fine detail bones (individual fingers, facial joints) that won't be visible at their current screen size.
  • Deformation via compute shaders — some modern engines move skinning from the vertex shader into a dedicated compute pass that writes deformed vertex positions into a GPU buffer, allowing reuse across multiple render passes (shadows, reflections, etc.) without re-skinning.

The Full Per-Frame Pipeline

Putting it all together, here is the complete pipeline from animation data to deformed mesh executed every frame:

def update_character(character, delta_time):
    # Step 1: Advance the animation time
    character.anim_time += delta_time

    # Step 2: Sample each bone's local transform from the active clip(s)
    local_poses = {}
    for bone in character.skeleton.bones:
        pose_a = sample_clip(character.clip_a, bone.name, character.anim_time)
        pose_b = sample_clip(character.clip_b, bone.name, character.anim_time)
        local_poses[bone.name] = blend_poses(pose_a, pose_b, character.blend_alpha)

    # Step 3: Apply any additive layers (breathing, recoil, IK overrides, etc.)
    local_poses = apply_additive_layer(local_poses, character.additive_clip)

    # Step 4: Forward kinematics — propagate world transforms down the hierarchy
    #         (the same traversal as compute_world_transforms earlier, but
    #         reading the freshly sampled local poses instead of
    #         bone.local_transform)
    world_transforms = compute_world_transforms(character.skeleton, local_poses)

    # Step 5: Compute skinning matrices: S_i = M_i * InvBind_i
    skinning_matrices = []
    for bone in character.skeleton.bones:
        S = world_transforms[bone.name] @ bone.inverse_bind_matrix
        skinning_matrices.append(S)

    # Step 6: Upload the matrix palette to the GPU
    upload_bone_matrices_to_gpu(skinning_matrices)

    # Step 7: Vertex shader runs LBS per-vertex (see GLSL above)
    #         This step happens automatically during the render pass

Real-World Examples

Skeletal animation is used in virtually every 3D game, but some titles push the technique to its limits in interesting ways:

  • The Last of Us Part II uses a dense bone hierarchy with corrective blend shapes and a proprietary muscle simulation layer to achieve hyper-realistic skin deformation during extreme movements. Joel and Ellie's faces alone use hundreds of bones and blend shapes.
  • Sekiro: Shadows Die Twice relies on extremely precise samurai combat timing, requiring fine-tuned skeleton rigs with extensive corrective shapes on the arms and torso to sell sword swings convincingly at fast playback speeds — where LBS artifacts would be particularly visible.
  • Horizon Zero Dawn applies skeletal animation to its machine enemies, blending mechanical joint rotations with procedural IK for precise ground contact on uneven terrain. The result is characters that feel simultaneously robotic and organically responsive.
  • Fortnite renders hundreds of simultaneously animated characters with aggressive animation LOD, GPU skinning, and a custom animation update scheduler that spreads high-quality updates across multiple frames, maintaining performance on a huge variety of hardware.
  • Half-Life: Alyx leverages VR's first-person perspective to demand extremely high-quality hand and arm skinning with dual quaternion skinning enabled — wrist twists and grip poses are constantly in the player's close-up view, making any LBS artifacts immediately apparent.

Conclusion

Skeletal animation is one of the most elegant systems in game development. Rather than simulating the full physical complexity of organic motion, it uses a lightweight hierarchical proxy — the skeleton — to drive mesh deformation through weighted matrix blending. The math is rich but tractable, and the result scales from a simple 20-bone cartoon character all the way to a 300-bone photorealistic human face running at 60 frames per second.

Understanding LBS, inverse bind matrices, SLERP interpolation, and animation blending gives you a solid foundation not just for writing your own animation systems, but for making better use of the tools you already have. The next time you watch a character smoothly transition from walking to sprinting, or see a hand close convincingly around a door handle, you'll know exactly what's happening under the hood: a carefully weighted sum of matrix transforms, computed thousands of times per second, turning a static mesh into living motion.
