There is a category of visual effects in game development that you barely notice when they are present, but the moment they are disabled, everything looks flat, plastic, and unconvincing. Screen Space Ambient Occlusion — SSAO — is the most prominent member of that category. First introduced to games by Crytek in Crysis (2007), SSAO darkens the small crevices, corners, and contact points between surfaces where ambient light has difficulty reaching. It is cheap enough to run in real time, subtle enough to look natural, and impactful enough that virtually every 3D game released in the last fifteen years ships with some variant of it enabled.
This article walks through the complete algorithm: the ambient occlusion integral, the G-buffer setup, hemisphere kernel generation, the per-pixel occlusion pass, the blur step, and the major variants (HBAO, GTAO) that have evolved since the original technique.
What Is Ambient Occlusion?
To understand SSAO, you need to understand what ambient occlusion is physically describing. In the real world, light does not come only from discrete light sources — it bounces off every surface and fills a scene with diffuse illumination from all directions. This omnidirectional ambient light is what prevents shadowed areas from being completely black. But even this ambient light is not uniform across a surface. A flat tabletop in the open receives light from the entire hemisphere above it. The underside of that same table, or the corner where two walls meet, is partially occluded — some incoming directions are blocked by nearby geometry, so less ambient light arrives there.
Ambient Occlusion (AO) quantifies this effect mathematically. For a surface point $p$ with outward normal $\hat{n}$, the occlusion factor $O$ is defined as:
$$O(p, \hat{n}) = \frac{1}{\pi} \int_{\Omega} V(p, \omega)\,(\hat{n} \cdot \omega)\, d\omega$$
Where $\Omega$ is the hemisphere of directions above the surface, $\omega$ is a direction within that hemisphere, $V(p, \omega)$ is a binary occlusion function ($1$ if the ray from $p$ in direction $\omega$ hits geometry within some radius, $0$ if it escapes unblocked), and $(\hat{n} \cdot \omega)$ is a cosine weighting — directions far from the normal contribute less because they graze the surface at a shallow angle.
The final ambient term applied to a pixel is:
$$L_{ambient}(p) = L_a \cdot (1 - O(p, \hat{n}))$$
A point in the open has $O \approx 0$, receiving full ambient light. A point deep in a crack might have $O \approx 0.8$, making it appear much darker. The tricky part is computing this integral in real time.
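The integral rarely has a closed form for real scenes, but it is easy to estimate numerically. The Python sketch below (illustrative only, not engine code) Monte Carlo samples the hemisphere for a point whose ambient light is blocked over exactly half of its hemisphere — say, a point at the base of an infinitely tall wall occupying the $x > 0$ directions — where the analytical answer is $O = 0.5$:

```python
import math
import random

def occlusion_mc(blocked, n=200_000, seed=1):
    """Monte Carlo estimate of O = (1/pi) * integral of V(w) * cos(theta) dw
    over the upper hemisphere. Directions are sampled uniformly on the
    hemisphere (pdf = 1 / (2*pi)), so each blocked sample is weighted by
    2 * cos(theta)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Uniform direction on the upper hemisphere: z uniform in [0, 1]
        # gives uniform area on the sphere (Archimedes' hat-box theorem)
        z = rng.random()                      # z = cos(theta)
        phi = rng.random() * 2.0 * math.pi
        r = math.sqrt(max(0.0, 1.0 - z * z))
        w = (r * math.cos(phi), r * math.sin(phi), z)
        if blocked(w):
            total += 2.0 * z                  # V = 1, importance weight 2*cos
    return total / n

# A wall blocking every direction with w_x > 0: exactly half of the
# cosine-weighted hemisphere, so the analytical occlusion is 0.5.
print(occlusion_mc(lambda w: w[0] > 0.0))     # ≈ 0.5
```

The estimator converges to the analytical value; the shader version below does the same thing with far fewer samples plus a blur to hide the variance.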
Baked AO Versus Screen Space AO
The classical solution is to bake ambient occlusion offline. An artist runs a ray-casting process that fires hundreds of rays from every texel on every surface in the scene and stores the result in a texture. The quality is superb — it is effectively a Monte Carlo approximation of the full integral above. The problem is that it is static. A door swings open and the baked AO no longer matches the geometry. A chest opens, a bridge collapses, a character walks into a room — none of it updates the AO.
Screen Space Ambient Occlusion solves this by computing occlusion at runtime using only the data already present on the GPU after a geometry pass: the depth buffer and the surface normals. The trade-off is that it can only see what the camera can see. Geometry that is off-screen or behind another surface is invisible to the algorithm, which creates occasional artifacts. In practice, for the kinds of contact points and crevices that matter most visually, screen-space information is almost always sufficient.
The G-Buffer Prerequisites
SSAO requires at minimum two pieces of information per pixel, typically obtained from a geometry pre-pass (or from a deferred rendering G-buffer):
- View-space positions: The 3D position of each pixel's surface in camera space. Often reconstructed from the depth buffer using the inverse projection matrix rather than stored explicitly, saving memory bandwidth.
- View-space normals: The surface normal at each pixel, transformed into camera space.
Working in view space (camera space) is convenient because the hemisphere sampling is always relative to the camera, making range checks and depth comparisons simpler. The position at a pixel with normalized device coordinate $\text{ndc}$ and depth $d$ can be reconstructed as:
$$p_{view} = P^{-1} \cdot \begin{pmatrix} \text{ndc}_x \\ \text{ndc}_y \\ d \\ 1 \end{pmatrix}$$
where $P^{-1}$ is the inverse of the projection matrix.
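The reconstruction is just a matrix multiply followed by a divide by $w$. As a sanity check, the following Python sketch (NumPy, with an OpenGL-style perspective matrix; illustrative only) projects a known view-space point and then recovers it from its NDC coordinates:

```python
import numpy as np

def perspective(fovy_deg, aspect, near, far):
    """Standard OpenGL-style perspective projection matrix."""
    f = 1.0 / np.tan(np.radians(fovy_deg) / 2.0)
    return np.array([
        [f / aspect, 0.0,  0.0,                          0.0],
        [0.0,        f,    0.0,                          0.0],
        [0.0,        0.0,  (far + near) / (near - far),  2 * far * near / (near - far)],
        [0.0,        0.0, -1.0,                          0.0],
    ])

P = perspective(60.0, 16 / 9, 0.1, 100.0)

# A view-space point in front of the camera (OpenGL looks down -Z)
p_view = np.array([0.3, -0.2, -5.0, 1.0])

# Forward: project, then perspective-divide to reach NDC
clip = P @ p_view
ndc = clip / clip[3]

# Backward: multiply NDC by the inverse projection, then divide by w again
q = np.linalg.inv(P) @ np.append(ndc[:3], 1.0)
p_reconstructed = q / q[3]

print(np.allclose(p_reconstructed, p_view))  # True
```

The second divide by $w$ is the step most often forgotten — the matrix product alone returns the view-space position only up to a scale factor.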
Generating the Hemisphere Sample Kernel
The core idea of SSAO is to test $N$ sample points distributed in a hemisphere oriented to the surface normal. If a sample point is located inside geometry (its position is behind the depth buffer at that screen location), it contributes to occlusion. To do this well, you need a set of sample directions that:
- Are distributed across the hemisphere (not the full sphere — you do not want to sample below the surface)
- Are biased toward the center of the hemisphere (more samples close to the surface point, fewer at the far edge)
- Are randomized enough to avoid banding artifacts
The kernel is generated once on the CPU and uploaded as a uniform array:
// C# / Unity-style kernel generation
Vector3[] GenerateSSAOKernel(int kernelSize) {
    var kernel = new Vector3[kernelSize];
    var rng = new System.Random();
    for (int i = 0; i < kernelSize; i++) {
        var sample = new Vector3(
            (float)rng.NextDouble() * 2f - 1f,  // X: [-1, 1]
            (float)rng.NextDouble() * 2f - 1f,  // Y: [-1, 1]
            (float)rng.NextDouble()             // Z: [0, 1] — hemisphere only
        );
        sample = sample.normalized;
        sample *= (float)rng.NextDouble();      // Random magnitude in [0, 1]
        // Scale quadratically so more samples cluster near the surface point
        float scale = (float)i / kernelSize;
        scale = Mathf.Lerp(0.1f, 1.0f, scale * scale);
        sample *= scale;
        kernel[i] = sample;
    }
    return kernel;
}
The quadratic scaling is the critical detail. Because nearby occluders have a stronger visual influence than distant ones, you want most of your samples concentrated close to the surface point and fewer distributed at the full radius. Interpolating the magnitude with $t^2$ rather than $t$ produces exactly this distribution.
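A quick numerical check confirms the clustering. This Python sketch (mirroring the C# generator above, illustrative only) builds a kernel and measures the fraction of samples that fall within half the unit radius — with the quadratic scaling, the large majority do:

```python
import math
import random

def lerp(a, b, t):
    return a + (b - a) * t

def generate_kernel(n, seed=7):
    """Hemisphere kernel with quadratic magnitude clustering (Python port
    of the C# generator above)."""
    rng = random.Random(seed)
    kernel = []
    for i in range(n):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        z = rng.random()                       # hemisphere: z >= 0
        length = math.sqrt(x * x + y * y + z * z) or 1.0
        s = rng.random()                       # random magnitude
        t = i / n
        s *= lerp(0.1, 1.0, t * t)             # quadratic clustering
        kernel.append((x / length * s, y / length * s, z / length * s))
    return kernel

kernel = generate_kernel(1024)
near = sum(1 for (x, y, z) in kernel
           if math.sqrt(x * x + y * y + z * z) < 0.5)
print(near / len(kernel))   # ≈ 0.9 — most samples sit close to the origin
```

With linear interpolation instead, the near fraction drops substantially, which shows up in-game as weaker contact darkening and more noise at the hemisphere edge.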
The Noise Texture
Even with 64 samples, the hemisphere is too sparsely sampled to be noise-free. The solution is to use a small random rotation texture — typically 4×4 pixels — that tiles across the screen. Each pixel gets a slightly different rotation applied to its kernel, breaking up any regular patterns. Banding becomes noise, and noise is trivially removed with a blur pass.
// Generate a 4x4 noise texture on CPU
Vector3[] GenerateSSAONoise(int size) {
    var noise = new Vector3[size * size];
    var rng = new System.Random();
    for (int i = 0; i < size * size; i++) {
        noise[i] = new Vector3(
            (float)rng.NextDouble() * 2f - 1f,
            (float)rng.NextDouble() * 2f - 1f,
            0f  // Rotation around Z axis only
        );
    }
    return noise;
}
The noise vector is a random direction in the XY plane. It is used in the shader to construct a tangent frame via the Gram-Schmidt process, which rotates the kernel into a random orientation at each pixel while still keeping it aligned to the surface normal.
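The Gram-Schmidt step is a one-liner: subtract from the random vector its component along the normal, then normalize. A small Python check (NumPy, illustrative only) confirms the resulting frame is orthonormal and maps the tangent-space up axis onto the surface normal:

```python
import numpy as np

def build_tbn(normal, random_vec):
    """Construct a tangent frame aligned to `normal`, randomly rotated
    about it by `random_vec` (Gram-Schmidt orthogonalization)."""
    n = normal / np.linalg.norm(normal)
    t = random_vec - n * np.dot(random_vec, n)   # remove normal component
    t = t / np.linalg.norm(t)
    b = np.cross(n, t)
    return np.column_stack([t, b, n])            # columns: T, B, N

rng = np.random.default_rng(3)
normal = np.array([0.0, 0.0, 1.0])
random_vec = np.array([*(rng.random(2) * 2 - 1), 0.0])  # XY-plane noise

tbn = build_tbn(normal, random_vec)
print(np.allclose(tbn.T @ tbn, np.eye(3)))  # True: orthonormal frame
```

Because only the rotation about the normal varies between pixels, every kernel sample still lands in the correct hemisphere — the noise changes where samples fall, never whether they point above the surface.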
The SSAO Fragment Shader
This is where the algorithm happens. For each pixel on screen, the shader samples the G-buffer to get position and normal, builds a TBN (Tangent-Bitangent-Normal) matrix using the noise vector, then iterates over all kernel samples, projecting each into screen space to look up the actual scene depth at that location.
// SSAO Fragment Shader (GLSL)
uniform sampler2D gPosition;   // View-space position buffer
uniform sampler2D gNormal;     // View-space normal buffer
uniform sampler2D texNoise;    // 4x4 random rotation texture
uniform vec3 samples[64];      // Hemisphere kernel
uniform mat4 projection;
uniform vec2 screenSize;

out vec4 fragColor;

const int kernelSize = 64;
const float radius = 0.5;   // View-space hemisphere radius
const float bias = 0.025;   // Depth bias to avoid self-occlusion acne

void main() {
    vec2 texCoords = gl_FragCoord.xy / screenSize;
    vec2 noiseScale = screenSize / 4.0;   // Tile the 4x4 noise texture

    // Sample G-buffer
    vec3 fragPos = texture(gPosition, texCoords).xyz;
    vec3 normal = normalize(texture(gNormal, texCoords).xyz);
    vec3 randomVec = normalize(texture(texNoise, texCoords * noiseScale).xyz);

    // Build TBN matrix — orients the kernel to the surface normal.
    // Gram-Schmidt makes the tangent perpendicular to the normal.
    vec3 tangent = normalize(randomVec - normal * dot(randomVec, normal));
    vec3 bitangent = cross(normal, tangent);
    mat3 TBN = mat3(tangent, bitangent, normal);

    float occlusion = 0.0;
    for (int i = 0; i < kernelSize; ++i) {
        // Transform sample from tangent space to view space
        vec3 samplePos = fragPos + (TBN * samples[i]) * radius;

        // Project sample position to get its screen-space UV
        vec4 offset = projection * vec4(samplePos, 1.0); // View -> clip space
        offset.xyz /= offset.w;                          // Perspective divide
        offset.xyz = offset.xyz * 0.5 + 0.5;             // NDC [-1,1] to UV [0,1]

        // Get actual scene depth (view-space Z) at the sample's UV
        float sampleDepth = texture(gPosition, offset.xy).z;

        // Range check: prevent samples far from the fragment from contributing.
        // This avoids a halo effect at geometry silhouettes.
        float rangeCheck = smoothstep(0.0, 1.0, radius / abs(fragPos.z - sampleDepth));

        // Occlusion check: view-space Z is negative in front of the camera,
        // so sampleDepth >= samplePos.z + bias means real geometry lies
        // closer to the camera than the sample — the sample is inside geometry
        occlusion += (sampleDepth >= samplePos.z + bias ? 1.0 : 0.0) * rangeCheck;
    }

    // Normalize and invert (1 = fully lit, 0 = fully occluded)
    occlusion = 1.0 - (occlusion / float(kernelSize));
    fragColor = vec4(vec3(occlusion), 1.0);
}
The bias parameter deserves attention. Without it, a flat surface would occlude itself — the kernel samples points along the hemisphere, and floating point imprecision causes them to register as slightly below the surface, generating false occlusion. The bias pushes the comparison threshold slightly away from the surface, eliminating this self-occlusion acne at the cost of a very slight loss of fine detail in tight crevices.
The rangeCheck prevents a classic SSAO artifact: haloing. When a foreground object is close to a background surface, some kernel samples reach beyond the foreground object into empty space, but the background depth buffer shows deep geometry there. Without the range check, the algorithm would incorrectly occlude the foreground object's edge. The smooth falloff suppresses contributions from samples whose depth discrepancy is large relative to the sampling radius.
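The behavior of the range check is easy to tabulate. The Python sketch below (illustrative only) reimplements GLSL's smoothstep and shows how a sample's weight falls from 1 toward 0 as the view-space depth gap grows past the sampling radius:

```python
def smoothstep(edge0, edge1, x):
    """GLSL-style smoothstep: clamp to [0, 1], then cubic Hermite curve."""
    t = max(0.0, min(1.0, (x - edge0) / (edge1 - edge0)))
    return t * t * (3.0 - 2.0 * t)

radius = 0.5
for depth_gap in (0.1, 0.5, 1.0, 2.0, 5.0):
    weight = smoothstep(0.0, 1.0, radius / depth_gap)
    print(f"gap={depth_gap:4.1f}  rangeCheck={weight:.3f}")
```

Gaps at or below the radius produce full weight; a gap of twice the radius drops the weight to half, and a gap ten times the radius contributes almost nothing — exactly the suppression needed at silhouettes.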
The Blur Pass
The raw SSAO output is noisy due to the sparse kernel and random rotation. A simple box blur over a 4×4 region is applied as a final pass to reconstruct a clean occlusion map:
// SSAO Blur Fragment Shader
uniform sampler2D ssaoInput;
in vec2 texCoords;
out vec4 fragColor;

void main() {
    vec2 texelSize = 1.0 / vec2(textureSize(ssaoInput, 0));
    float result = 0.0;
    // Average the 4x4 neighborhood (offsets -2..1 match the 4x4 noise tile)
    for (int x = -2; x < 2; ++x) {
        for (int y = -2; y < 2; ++y) {
            vec2 offset = vec2(float(x), float(y)) * texelSize;
            result += texture(ssaoInput, texCoords + offset).r;
        }
    }
    fragColor = vec4(vec3(result / 16.0), 1.0);
}
Higher quality implementations use a bilateral blur (also called a cross-bilateral or joint bilateral filter), which is a blur that respects depth discontinuities. At geometry edges, a standard blur would smear occlusion across the boundary, creating a dark halo. A bilateral blur compares the depth of the blur sample against the center pixel and reduces its contribution when the depths differ significantly — preserving sharp edges in the occlusion map.
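A minimal 1D version of that idea, in Python (illustrative; real implementations run in a shader over 2D neighborhoods), weights each neighbor by a Gaussian on the depth difference so occlusion does not leak across a depth edge:

```python
import math

def bilateral_blur_1d(ao, depth, radius=2, sigma_d=0.1):
    """Depth-aware blur: neighbors only contribute when their depth is
    close to the center pixel's depth (edge-preserving)."""
    out = []
    for i in range(len(ao)):
        total, wsum = 0.0, 0.0
        for j in range(max(0, i - radius), min(len(ao), i + radius + 1)):
            w = math.exp(-((depth[j] - depth[i]) ** 2) / (2 * sigma_d ** 2))
            total += w * ao[j]
            wsum += w
        out.append(total / wsum)
    return out

# Noisy AO across a sharp depth edge: foreground (depth 1.0) vs background (5.0)
ao    = [0.2, 0.3, 0.2, 0.9, 1.0, 0.9]
depth = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(bilateral_blur_1d(ao, depth))
# Foreground stays near 0.23, background near 0.93 — no smearing across the edge
```

A plain box blur over the same data would pull the edge pixels toward the average of both sides, which is precisely the dark halo the bilateral weights prevent.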
Compositing and Final Output
The blurred SSAO texture is applied as a multiplicative factor to the ambient lighting term during the final lighting pass. Only the ambient component is modulated — direct lighting from point lights, spotlights, and the sun is unaffected, since AO only describes the availability of ambient (omnidirectional) light:
// Final lighting pass (deferred or forward+)
float ao = texture(ssaoBlurred, texCoords).r;
vec3 ambient = ambientColor * albedo * ao; // AO modulates only ambient
vec3 diffuse = calculateDiffuse(normal, lightDir, albedo);
vec3 specular = calculateSpecular(normal, lightDir, viewDir, roughness);
fragColor = vec4(ambient + diffuse + specular, 1.0);
Tuning SSAO: Key Parameters
Getting SSAO to look right in a specific game requires tuning several variables:
- Kernel size: 8–16 samples for performance-sensitive targets (mobile, low settings), 32–64 samples for high quality. More samples reduce noise before blurring but increase fragment shader cost linearly.
- Radius: The world-space radius of the sampling hemisphere. Too small and only micro-detail is captured; too large and large geometric features like walls cast incorrect ambient shadows across each other. Typically 0.3–1.0 meters for human-scale environments.
- Bias: 0.01–0.05 in view space. Increase to fix acne on flat or low-poly surfaces; decrease to recover fine detail in tight crevices.
- Power / strength: A post-process exponent on the final AO value, $\text{AO}_{out} = \text{AO}^{k}$. Values above 1 darken crevices more aggressively for a stylized look; values below 1 produce a subtler result.
Variants and Modern Successors
The original SSAO has well-known weaknesses: it treats all hemisphere samples equally regardless of direction, it does not respect the cosine-weighted integral exactly, and its sampling pattern can produce obvious patterns at low sample counts. Several successors address these issues:
HBAO (Horizon-Based Ambient Occlusion)
Introduced by NVIDIA in 2008, HBAO marches through the depth buffer in several screen-space directions from each pixel and computes the maximum elevation angle (the horizon angle) in each direction. Occlusion is derived from how much of the hemisphere lies below the horizon, which more closely approximates the cosine-weighted integral and produces smoother, more physically plausible results. HBAO — and its refined HBAO+ implementation — became standard in many AAA PC titles through the 2010s.
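The horizon search can be sketched in one dimension. The Python below is an illustrative toy, not NVIDIA's implementation: it marches along a heightfield from a surface point, takes the maximum elevation angle, and uses $\sin(h)$ as a simple per-direction occlusion estimate (the fraction of the slice blocked below the horizon, for a flat tangent plane):

```python
import math

def horizon_occlusion(heights, start, step=1, max_steps=8):
    """March along a 1D heightfield and find the horizon angle seen
    from heights[start]; return a sin(horizon) occlusion estimate."""
    h0 = heights[start]
    horizon = 0.0  # flat ground: horizon angle 0 above the tangent plane
    for k in range(1, max_steps + 1):
        i = start + k * step
        if i >= len(heights):
            break
        elevation = math.atan2(heights[i] - h0, k * step)
        horizon = max(horizon, elevation)
    return math.sin(horizon)

# A point on flat ground with a tall wall two texels away
heights = [0.0, 0.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print(horizon_occlusion(heights, start=0))  # high: the wall blocks most of the slice
```

The real algorithm repeats this march along several rotated slice directions per pixel, applies a distance falloff, and accounts for the surface's own tangent angle, but the core max-elevation search is exactly this loop.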
GTAO (Ground Truth Ambient Occlusion)
GTAO integrates the visibility function analytically over cones in the hemisphere, rather than sampling it at points. By computing the horizon angle along multiple slices and integrating over each slice analytically, it approaches ground-truth accuracy with a fraction of the samples SSAO would need. This is the variant shipped with Unreal Engine 5 and used in titles like Cyberpunk 2077 and Fortnite.
VXAO (Voxel Ambient Occlusion)
Rather than relying on screen-space data, VXAO voxelizes the visible scene into a 3D grid and traces cones through the voxel structure. This eliminates screen-space artifacts (off-screen occluders, silhouette halos) at the cost of higher memory and compute requirements. NVIDIA developed the technique as part of GameWorks, and it debuted in Rise of the Tomb Raider.
RTAO (Ray-Traced Ambient Occlusion)
With DirectX 12 Ultimate and NVIDIA RTX hardware, it is now viable to compute AO by actually tracing short rays from each surface point using the BVH (Bounding Volume Hierarchy) accelerated by dedicated RT cores. The result is geometrically correct — no screen-space artifacts, no banding, exact visibility — but costs GPU budget that might be better spent on reflections or shadows. RTAO is increasingly used in console generation titles and high-end PC games as a replacement for SSAO/HBAO.
Performance Optimization Strategies
SSAO runs at full screen resolution by default, which is expensive. Several techniques reduce cost without significant quality loss:
- Half-resolution rendering: Run the SSAO pass at half width and height, then upsample. The 4x reduction in pixel count is barely perceptible in the final result, especially after blurring. Most games use this by default.
- Temporal accumulation (TSSAO): Spread sampling across multiple frames using different kernel rotations each frame and reproject previous results. Allows very high sample counts amortized over time. The challenge is handling disocclusion — pixels that were offscreen or behind geometry last frame need special treatment to avoid ghosting.
- Interleaved sampling: Divide the screen into tiles and use a different kernel offset for each tile, then reconstruct using a spatial filter. Related to temporal accumulation but operates within a single frame.
- Separate horizontal/vertical blur passes: A full 2D box blur of radius $r$ costs $O(r^2)$ per pixel. Separating it into one horizontal and one vertical pass reduces cost to $O(r)$ per pixel, a significant saving for large blur radii.
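The separability claim is easy to verify numerically. This Python sketch (NumPy, illustrative only) blurs a small image both ways — one full 2D box pass versus a horizontal pass followed by a vertical pass — and confirms the results match:

```python
import numpy as np

def box_blur_2d(img, r):
    """Full 2D box blur: (2r+1)^2 samples per pixel, i.e. O(r^2)."""
    h, w = img.shape
    pad = np.pad(img, r, mode="edge")
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            out[y, x] = pad[y:y + 2 * r + 1, x:x + 2 * r + 1].mean()
    return out

def box_blur_separable(img, r):
    """Horizontal then vertical 1D passes: 2*(2r+1) samples per pixel, O(r)."""
    def blur_rows(a):
        pad = np.pad(a, ((0, 0), (r, r)), mode="edge")
        return np.stack([pad[:, x:x + 2 * r + 1].mean(axis=1)
                         for x in range(a.shape[1])], axis=1)
    return blur_rows(blur_rows(img).T).T

rng = np.random.default_rng(0)
img = rng.random((16, 16))
print(np.allclose(box_blur_2d(img, 2), box_blur_separable(img, 2)))  # True
```

The same trick applies to Gaussian blurs; bilateral blurs are not strictly separable, but many engines split them anyway and accept the small approximation error.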
Where You Have Already Seen This
Crysis (2007) is the origin of real-time SSAO in games — the technique was developed at Crytek by Vladimir Kajalin and first described publicly in Martin Mittring's 2007 SIGGRAPH course "Finding Next Gen: CryEngine 2". The dark contact shadows in grassy terrain and building interiors were immediately recognizable as something new. Batman: Arkham Asylum and its sequels used SSAO prominently for the moody, high-contrast look of Gotham's architecture. The Witcher 3: Wild Hunt shipped with HBAO+ on PC, giving its dense forests and ruined battlefields natural-looking ground contact. Red Dead Redemption 2 layers multiple AO techniques — baked AO on static props, SSAO for dynamic interaction, and additional contact shadow passes. Cyberpunk 2077 uses GTAO as its baseline and RTAO on supported hardware.
Disable AO in any of these titles and the visual regression is immediate: surfaces appear to float, interiors look flat, and the sense of physical weight that objects press into each other vanishes. It is not the most glamorous effect — it produces a subtle darkening in crevices rather than dramatic shafts of light — but it is one of the most perceptually important contributions to a convincing 3D world.
Putting It All Together
SSAO is an elegant compromise between physical accuracy and real-time performance. It approximates a continuous integral over the visible hemisphere by sampling a finite set of points, uses screen-space data to avoid expensive scene traversal, and recovers noise with a cheap blur pass. The core loop — orient kernel to normal, project sample to screen, compare depth — is only a few dozen lines of GLSL, yet the visual payoff is immediate and significant.
For new implementations, starting with the original point-sample approach at 32 samples and a 4×4 noise tile is the right entry point. Once that baseline is working and composited correctly, upgrading to a bilateral blur and then to temporal accumulation will handle the remaining quality gaps without a proportional increase in runtime cost. From there, the jump to HBAO or GTAO is a matter of replacing the sampling loop with an analytical horizon integration — the infrastructure of G-buffer, noise, blur, and compositing remains identical.