Every frame your game engine renders, the GPU must process every triangle submitted to it, whether the player can see it or not. A dungeon room sealed behind a closed door, a city block hidden behind a massive skyscraper, the far side of a mountain range: if you submit this geometry for rendering, the GPU processes it and then discards it during depth testing. This wasted work accumulates, and in dense scenes it can consume most of your frame budget on geometry the player will never see.
Occlusion culling solves this by determining, before the GPU gets involved, which objects are completely hidden behind other solid objects. Skipping that invisible geometry can cut draw calls and triangle counts by 70–90% in interior or dense urban scenes.
How It Differs from Frustum Culling
Before diving in, it helps to distinguish occlusion culling from the simpler technique it builds upon: frustum culling. Frustum culling discards objects entirely outside the camera's view frustum, the pyramidal volume of visible space. If an object is behind the camera, or far off to the side beyond the field of view, it's simply not submitted to the GPU at all.
Occlusion culling is the next step: of the objects inside the frustum, which ones are completely hidden behind other solid objects? The two techniques are complementary and stack multiplicatively. Frustum culling might reduce your scene from 10,000 objects to 2,000 (those within the camera's view). Occlusion culling then reduces that 2,000 down to perhaps 300–400 (those actually visible with no occluders in the way).
The key concepts are occluders and occludees:
- Occluders: solid, opaque objects large enough to fully hide other geometry (walls, buildings, terrain, large boulders).
- Occludees: objects that might be hidden (enemies, props, smaller details, distant geometry).
In practice, any solid object can be both. A large building occludes the alley behind it, but is itself an occludee relative to an even taller skyscraper in front of it.
Types of Occlusion Culling
Hardware Occlusion Queries
Modern GPUs expose an API that lets you ask: “How many pixels would this bounding box cover if rendered right now?” You draw simplified proxy geometry, usually axis-aligned bounding boxes (AABBs), with color and depth writes disabled. The GPU counts the fragments that pass the depth test. If the count is zero, the real mesh is fully occluded and can be safely skipped.
The critical catch is GPU query latency. Results from the GPU don't return to the CPU immediately, typically arriving 1–2 frames later. Games handle this by using the previous frame's query results to cull the current frame, accepting the occasional one-frame pop-in when a previously hidden object suddenly becomes visible. Unreal Engine's hardware occlusion query path (its default method before hierarchical depth-buffer culling) works exactly this way.
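The latency-handling pattern is worth seeing in isolation. The sketch below uses a hypothetical `gpu` interface (`beginQuery`/`endQuery`/`resultOf`) standing in for a real API such as WebGL2's `ANY_SAMPLES_PASSED` queries; the structure, culling with last frame's answers while issuing this frame's queries, is the part that carries over:

```javascript
// Cull using LAST frame's occlusion-query results while issuing
// new queries for the next frame. `gpu` is a stand-in interface,
// not a real API: beginQuery(obj)/endQuery() issue a query and
// return a handle, resultOf(handle) returns samples passed.
function createQueryCuller(gpu) {
  let lastFrameResults = new Map(); // objectId -> samples passed

  return function cullAndQuery(objects) {
    const pendingQueries = new Map();
    const visible = [];
    for (const obj of objects) {
      // Use the PREVIOUS frame's result; objects with no result yet
      // are conservatively treated as visible to avoid pop-in.
      const samples = lastFrameResults.get(obj.id);
      if (samples === undefined || samples > 0) visible.push(obj);
      // Issue this frame's query on the object's proxy geometry
      // (color/depth writes would be disabled in a real renderer).
      const handle = gpu.beginQuery(obj);
      gpu.endQuery();
      pendingQueries.set(obj.id, handle);
    }
    // In a real engine these results arrive 1-2 frames later; the
    // stand-in interface resolves them synchronously for clarity.
    lastFrameResults = new Map(
      [...pendingQueries].map(([id, h]) => [id, gpu.resultOf(h)])
    );
    return visible;
  };
}
```

Note that a newly revealed object is drawn one frame late under this scheme, which is the pop-in mentioned above.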
Portal/Cell-Based Culling
Used extensively in classic engines like Descent, Doom 3, and many early console titles, this approach divides the world into cells (rooms, corridors, open areas) connected by portals (doorways, windows, archways). The renderer starts from the cell containing the camera and recursively determines which cells are visible through the portal chain.
If the camera is in Room A looking through a doorway into Room B, only the portion of Room B visible through the doorway's portal rectangle is rendered. Room C, connected to Room B through a door the camera can't see through, is culled entirely. In interior environments, this is highly efficient: the level structure itself enforces visibility, and the layout the designer created becomes the visibility system.
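A minimal version of the recursive traversal, assuming portal rectangles have already been projected to screen space (a real engine projects and clips the actual portal polygons each frame; the cell and portal shapes here are illustrative):

```javascript
// Intersect two screen-space rects; null if they don't overlap.
function intersectRects(a, b) {
  const r = {
    minX: Math.max(a.minX, b.minX), minY: Math.max(a.minY, b.minY),
    maxX: Math.min(a.maxX, b.maxX), maxY: Math.min(a.maxY, b.maxY),
  };
  return (r.minX < r.maxX && r.minY < r.maxY) ? r : null;
}

// Recursive portal traversal: a cell is visible if a chain of
// portals connects it to the camera's cell with a non-empty
// intersection of portal rectangles along the way. (A production
// traversal may revisit cells through different portals; the
// visited set here keeps the sketch simple.)
function collectVisibleCells(cell, clipRect, visible = new Set()) {
  visible.add(cell.name);
  for (const portal of cell.portals) {
    const narrowed = intersectRects(clipRect, portal.rect);
    if (narrowed && !visible.has(portal.to.name)) {
      collectVisibleCells(portal.to, narrowed, visible);
    }
  }
  return visible;
}
```

Each recursion narrows the clip rectangle, so a room three doorways deep is only reached if all three portals overlap on screen.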
Precomputed Visibility Sets (PVS)
Rather than computing visibility at runtime, PVS algorithms bake visibility information offline. The world is divided into cells, and for each cell, an offline compiler precomputes which other cells could possibly be visible from anywhere within it. At runtime, the camera looks up its current cell's PVS, typically a compact bitset, and skips everything not listed.
Quake's PVS system, introduced in 1996, showed how much work could be shifted offline. The BSP compiler would spend minutes computing visibility for complex maps, but the runtime cost was trivial: a bitset lookup and a few pointer dereferences. The tradeoff is bake time and memory footprint, but for static environments with predictable topology, it produces near-perfect culling with essentially zero runtime overhead.
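The runtime side really is tiny. A sketch, with a stand-in `bakePVS` that takes the visibility pairs as given (the expensive part, computing those pairs, is what the offline compiler does):

```javascript
// Each cell stores a baked bitset (Uint8Array) with one bit per
// cell in the map; bit i set means cell i is potentially visible.
// Runtime culling is a single bit test per candidate cell.
function isPotentiallyVisible(pvsBits, cellIndex) {
  return (pvsBits[cellIndex >> 3] & (1 << (cellIndex & 7))) !== 0;
}

// Illustrative bake step: a real PVS compiler derives the visible
// pairs with expensive offline visibility sampling; here they are
// simply handed in.
function bakePVS(cellCount, visiblePairs) {
  const table = Array.from({ length: cellCount },
    () => new Uint8Array(Math.ceil(cellCount / 8)));
  for (const [from, to] of visiblePairs) {
    table[from][to >> 3] |= 1 << (to & 7);
    table[to][from >> 3] |= 1 << (from & 7); // visibility is symmetric
  }
  return table;
}
```

At 1 bit per cell pair, even a map with thousands of cells stays within a few hundred kilobytes, which is the memory-footprint tradeoff mentioned above.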
Software Occlusion Culling
Instead of using the GPU for occlusion tests, software occlusion culling rasterizes a low-resolution depth buffer entirely on the CPU using SIMD instructions. Large occluder objects are drawn into this CPU-side depth buffer first, then candidate occludees are projected and tested against it. Intel's open-source Masked Software Occlusion Culling library, used in Star Citizen and other titles, can test millions of triangles per second on modern CPUs, with zero GPU round-trip latency.
The Algorithm in Detail: Hierarchical Z-Buffer (Hi-Z)
The most general and widely-adopted technique in modern real-time engines is the hierarchical Z-buffer, also called Hi-Z occlusion culling. Here's how it works step by step.
Step 1: Render Occluders to a Depth Buffer
First, render your occluder objects to a dedicated depth buffer, typically at reduced resolution (256×144 or 512×288 is common). Only depth values are written; no color output is needed. This is the occlusion prepass. Large, simple geometry makes the best occluders: a low-poly proxy building covers the same screen area as a 10,000-triangle detailed version, at a fraction of the rasterization cost. Many engines maintain a separate simplified “occluder mesh” alongside the high-detail render mesh for exactly this reason.
Step 2: Build the Hi-Z Mipmap
Generate a mipmap chain from the depth buffer, but instead of averaging neighboring pixels (as in standard mipmapping), store the maximum depth from the four corresponding pixels at each level:
$$\text{HiZ}[i][x,y] = \max\!\left(\text{HiZ}[i-1][2x,\,2y],\;\text{HiZ}[i-1][2x+1,\,2y],\;\text{HiZ}[i-1][2x,\,2y+1],\;\text{HiZ}[i-1][2x+1,\,2y+1]\right)$$
This creates a conservative “what is the farthest depth value in this screen region?” hierarchy. At the coarsest mip level, a single texel represents the maximum depth across the entire screen. This is sometimes called a depth pyramid or max-depth mipmap.
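On the CPU, the same reduction is a few nested loops. A sketch assuming a square, power-of-two depth buffer stored as a flat `Float32Array` (engines run this as a chain of GPU downsample passes, but the arithmetic is identical):

```javascript
// Build a max-depth mip chain from a square depth buffer whose side
// is a power of two. Each level stores, per texel, the maximum depth
// of the four corresponding texels one level below.
function buildHiZ(depth, size) {
  const levels = [{ data: depth, size }];
  while (size > 1) {
    const prev = levels[levels.length - 1];
    size >>= 1;
    const data = new Float32Array(size * size);
    for (let y = 0; y < size; y++) {
      for (let x = 0; x < size; x++) {
        const px = 2 * x, py = 2 * y, w = prev.size;
        data[y * size + x] = Math.max(
          prev.data[py * w + px],       prev.data[py * w + px + 1],
          prev.data[(py + 1) * w + px], prev.data[(py + 1) * w + px + 1]
        );
      }
    }
    levels.push({ data, size });
  }
  return levels; // levels[0] is full resolution, last level is 1x1
}
```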
Step 3: Project Occludee Bounding Boxes
For each candidate occludee, project its AABB into screen space. This yields a 2D screen-space rectangle and the box's minimum depth, the depth of the nearest corner to the camera. Call this $z_{near}$: the closest this object can possibly be. If even that nearest point is occluded, the entire object must be occluded.
Step 4: Sample the Depth Pyramid
Select the mipmap level whose texel size approximately matches the projected rectangle's screen-space dimensions. This is the key insight: a single texture sample at the right mip level covers the entire projected footprint of the bounding box. Then test:
$$\text{if}\quad z_{near} > \text{HiZ}[\text{level}][x_{center}, y_{center}] \quad \Rightarrow \quad \text{object is fully occluded, skip it}$$
If the nearest point of the bounding box is farther from the camera than the maximum depth in that screen region (meaning every pixel in that region belongs to geometry closer than our object), the occludee cannot possibly be visible. The entire object is culled in a single texture lookup.
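Steps 3 and 4 reduce to a handful of arithmetic operations per object. A CPU sketch, assuming the depth pyramid is stored as an array of `{data, size}` levels from fine to coarse and the AABB has already been projected to normalized screen coordinates; a GPU implementation would be a compute shader sampling a depth mip instead:

```javascript
// Test an occludee against a max-depth pyramid. `levels` comes from
// a Hi-Z build (levels[0] finest); rect is the projected AABB in
// [0,1] screen coords; zNear is the depth of its closest corner.
// The storage layout here is an assumption of this sketch.
function isOccluded(levels, rect, zNear) {
  // Pick the mip whose texel roughly covers the projected rect.
  const w = (rect.maxX - rect.minX) * levels[0].size;
  const h = (rect.maxY - rect.minY) * levels[0].size;
  let level = Math.ceil(Math.log2(Math.max(w, h, 1)));
  level = Math.min(level, levels.length - 1);
  const { data, size } = levels[level];
  const cx = Math.min(size - 1, Math.floor((rect.minX + rect.maxX) / 2 * size));
  const cy = Math.min(size - 1, Math.floor((rect.minY + rect.maxY) / 2 * size));
  // Occluded if even the nearest corner lies behind the farthest
  // depth recorded anywhere in this screen region.
  return zNear > data[cy * size + cx];
}
```

A single sample per object is what makes the test cheap enough to run over tens of thousands of candidates per frame.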
The test is conservative: it may classify some occluded objects as visible (false positives, which cause harmless extra rendering), but it will never incorrectly cull a truly visible object (no false negatives, which would cause objects to pop out of existence). This conservatism is by design.
A Simple Raycast Occlusion Test
For smaller scenes or prototypes where a full Hi-Z pipeline is overkill, a center-point raycast test provides a reasonable approximation with minimal code. For each object, cast a ray from the camera to the object's center. If another solid object intercepts the ray first, the object is considered occluded.
function cullOccludedObjects(camera, objects) {
  const raycaster = new THREE.Raycaster();
  raycaster.near = 0.5; // skip hits right at the camera
  const visible = [];
  for (const obj of objects) {
    const toObj = new THREE.Vector3()
      .subVectors(obj.position, camera.position);
    const dist = toObj.length();
    const dir = toObj.normalize();
    raycaster.set(camera.position, dir);
    raycaster.far = dist + 0.01; // ignore hits beyond the target
    const hits = raycaster.intersectObjects(objects);
    // If the closest hit is not our object itself,
    // something else is blocking the line of sight
    const occluded = hits.length > 0 && hits[0].object !== obj;
    if (!occluded) visible.push(obj);
  }
  return visible;
}

This runs in $O(n^2)$ time, testing each of the $n$ objects against all $n$ objects. For production use, pair it with a Bounding Volume Hierarchy (BVH) to reduce the inner loop to $O(\log n)$, or prefilter the occluder set to only large, solid objects and test against those instead of the full scene.
Implementation Considerations
Occluder Quality and Selection
Not every object makes a good occluder. Effective occluders are large, roughly convex, and opaque. A 200-triangle low-poly building proxy is a better occluder than the 12,000-triangle final asset: it's faster to rasterize into the depth prepass and provides essentially the same occlusion coverage. Detail geometry like railings, cables, and foliage makes poor occluders and should be excluded from the occluder set entirely.
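A crude version of this selection can be expressed as a screen-coverage test. The threshold and the solid-angle approximation below are illustrative only; production engines typically rely on artist flags or dedicated occluder meshes rather than a fully automatic filter:

```javascript
// Keep only objects whose bounding sphere would cover a meaningful
// fraction of the screen. The 2% threshold and the ~90-degree-FOV
// projection approximation are illustrative assumptions.
function selectOccluders(objects, cameraPos, minScreenFraction = 0.02) {
  return objects.filter((obj) => {
    if (!obj.opaque) return false; // transparent objects never occlude
    const dx = obj.position.x - cameraPos.x;
    const dy = obj.position.y - cameraPos.y;
    const dz = obj.position.z - cameraPos.z;
    const dist = Math.sqrt(dx * dx + dy * dy + dz * dz);
    if (dist <= obj.radius) return true; // camera is inside the bounds
    // Rough projected size: angular radius relative to a ~90-degree FOV.
    const screenFraction = obj.radius / (dist * Math.tan(Math.PI / 4));
    return screenFraction >= minScreenFraction;
  });
}
```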
Temporal Reprojection
Computing a fresh depth prepass every frame has a cost. A common optimization is temporal reprojection: reuse the previous frame's depth buffer, reprojected into the current frame's view using the previous and current view-projection matrices. Objects visible last frame are likely still visible this frame, and the reprojection introduces at most one frame of latency. Unreal Engine's hierarchical Z-buffer occlusion and Unity's GPU occlusion culling both take this approach.
Two-Phase Rendering Architecture
Most production engines split rendering into explicit phases:
- Depth prepass: Render occluder geometry to a low-res depth buffer only (fast, no color writes, no fragment shading)
- Occlusion test: Test all candidate occludees against the depth pyramid; build the draw list of surviving visible objects
- Main render pass: Render only surviving geometry with full shading, lighting, and materials
The depth prepass typically adds only 5–10% overhead, while the main pass in occluded scenes runs with 60–80% fewer draw calls and triangles.
Conservative vs. Exact Rasterization
Standard GPU rasterization is not conservative: pixels on triangle edges may be missed due to floating-point discretization. For an occluder depth prepass, missing a pixel means potentially showing geometry that should be hidden (a false positive, which is harmless). For the occludee bounding box test, however, you need the projected rectangle to cover every pixel it theoretically touches. Many engines slightly expand the projected rectangle in screen space before testing to guarantee this, preventing false negatives at the cost of a small increase in false positives.
Real-World Examples
Quake (1996) pioneered PVS occlusion with its BSP tree and leaf visibility algorithm. The world was divided into convex BSP leaves, and the offline VIS compiler spent minutes computing which leaves could see each other. At runtime, rendering was limited to the current leaf's PVS, typically 10–20% of the total map geometry, making complex indoor levels feasible on Pentium hardware.
Unreal Engine 5 builds its occlusion culling around a hierarchical depth buffer and its Nanite virtualized geometry system. For non-Nanite meshes, UE5 tests against a hierarchical depth buffer reprojected from the previous frame. For Nanite geometry, occlusion operates at the cluster level: groups of up to 128 triangles are individually tested, enabling fine-grained culling within even a single large mesh.
Minecraft uses a chunk-based visibility flood-fill. Starting from the chunk containing the player, the engine recursively determines which “sides” of each 16×16×16 chunk can see each other (whether you can traverse from one face to the opposite face without hitting opaque blocks). This efficiently skips entire sections of the voxel world without ray testing.
Star Citizen relies heavily on Intel's Masked Software Occlusion Culling library for its massive space station interiors. By rasterizing occluder triangles on the CPU with AVX2 SIMD, it tests thousands of potential occludees per millisecond with no GPU round-trip latency, which matters when the CPU needs to make culling decisions before issuing draw calls.
The Source Engine (Half-Life 2, Portal) uses manually-placed area portal entities in level geometry. When a portal is flagged as closed (a door shuts, a window is blocked), rendering stops propagating through it. Level designers use these to explicitly guide the engine's visibility. It's less automatic than Hi-Z but gives artists precise control over performance.
Summary
Occlusion culling is among the most effective rendering optimizations for interior environments, dense cities, caves, and scenes with high depth complexity. The core idea is always the same: identify geometry the camera cannot see, and don't draw it. Whether implemented through hardware queries, precomputed PVS, portal graphs, or software depth buffers, the technique can reduce GPU workload by an order of magnitude, making scenes with millions of triangles viable in real time.
In the interactive demo at the top of this article, you can see occlusion culling in action. Drag to orbit the camera around the city, and watch which buildings are marked visible (blue) versus occluded by other buildings (red). Toggle culling on to actually hide the occluded objects and watch the visible count drop, exactly what the GPU would benefit from in a real game.