branching in shaders: when does it actually hurt performance?
I keep seeing conflicting advice about if statements in GLSL and I don't think most of the discussions are nuanced enough to be useful. The old conventional wisdom was "never branch in a shader, GPUs hate it" and I've met devs who are still coding like it's 2012 because of that.
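The actual story is messier. Whether a branch is expensive depends heavily on whether it's uniform (same outcome for every fragment in a warp/wavefront) or divergent (different fragments in the same warp take different paths). Uniform branches are basically free on modern hardware: every lane follows the same path, so no work is executed and thrown away, and if the condition depends only on uniform variables the driver can sometimes resolve it at compile time or at draw call setup. The performance cliff only really kicks in when you have real divergence inside a warp, because then the hardware runs both sides with the inactive lanes masked off.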
What I'm still fuzzy on is where exactly the thresholds are in practice. Like, I have a fog shader that branches on whether a fragment is underwater:
float fogDensity;
if (worldPos.y < uWaterLevel) {
    fogDensity = uUnderwaterFogDensity;
    fogColor = uUnderwaterFogColor;
} else {
    fogDensity = uAirFogDensity;
    fogColor = uAirFogColor;
}
float fogFactor = exp(-fogDensity * dist);
In practice the water surface bisects the screen, so in any given frame roughly half the fragments go one way and half go the other. That feels like a classic divergence scenario, but when I profile it the branch barely shows up. Is the driver somehow vectorizing the select anyway? Is my profiling methodology wrong? Or is the actual per-fragment cost just lower than I've been led to believe?
And then there's the mix(a, b, step(...)) trick — replacing branches with a blend using step or smoothstep so there's no conditional at all. Feels clever but I've also heard that on some hardware the compiler generates essentially identical code to the branch version anyway.
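For concreteness, here is roughly what the branchless version of my fog code would look like. It's a minimal sketch, not a benchmarked implementation: vWorldPos, uCameraPos, uSceneColor and fragColor are placeholder names I made up to get a complete shader, and treating the fog colors as vec3 is also an assumption.

```glsl
#version 330 core

// Placeholder interface: vWorldPos, uCameraPos, uSceneColor and fragColor
// are hypothetical names added only so the shader is complete.
in vec3 vWorldPos;
out vec4 fragColor;

uniform vec3  uCameraPos;
uniform vec3  uSceneColor;
uniform float uWaterLevel;
uniform float uUnderwaterFogDensity;
uniform float uAirFogDensity;
uniform vec3  uUnderwaterFogColor;  // vec3 is an assumption
uniform vec3  uAirFogColor;         // vec3 is an assumption

void main() {
    float dist = length(vWorldPos - uCameraPos);

    // step(edge, x) returns 0.0 when x < edge and 1.0 otherwise, so
    // `above` is 0.0 underwater and 1.0 at or above the water level.
    float above = step(uWaterLevel, vWorldPos.y);

    // mix(a, b, t) returns a when t == 0.0 and b when t == 1.0, which
    // reproduces the if/else selection without a conditional.
    float fogDensity = mix(uUnderwaterFogDensity, uAirFogDensity, above);
    vec3  fogColor   = mix(uUnderwaterFogColor, uAirFogColor, above);

    // fogFactor is 1.0 at zero distance (no fog) and falls toward 0.0.
    float fogFactor = exp(-fogDensity * dist);
    fragColor = vec4(mix(fogColor, uSceneColor, fogFactor), 1.0);
}
```

Both arms of the original branch only pick between uniforms, so the mix version does the same selection with a few extra multiply-adds per value. Whether that ends up faster, slower, or identical after compilation is exactly the part I haven't been able to verify.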
Would love to hear from people who've actually profiled this rigorously, not just repeated the "avoid branching" rule. Which GPU families still penalize divergence most harshly? And is there a rule of thumb for when you should actually bother replacing a branch with arithmetic?