does anyone actually unit test game logic or do we all just playtest and pray


I've been trying to get more disciplined about testing my game code and it's... rough. Like obviously you can unit test pure utility functions: damage calculations, stat modifiers, save file serialization. Those are easy and I actually test those without much trouble.

But anything that touches nodes, the scene tree, or engine timing falls apart immediately. How do you even test that a state machine transitions correctly when it depends on Area2D enter/exit signals? Or that a dialogue runner advances properly when input is pressed? You'd need basically a full engine instance running just to assert one thing.

I've been using GUT (Godot Unit Test) and it helps for pure logic classes, but anything with signals gets messy fast. What I've settled on is a pretty aggressive split: pure logic with zero Node dependencies, then thin adapter classes that wire it to the engine. At least the logic layer is testable. And honestly the constraint is kind of useful: it forces decoupling you probably should've had anyway.
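For anyone who hasn't seen that split in practice, here's a minimal sketch, assuming Godot 4 and GUT 9. `GuardLogic` and `GuardZone` are made-up names, and they're inner classes only so everything fits in one test script; in a real project each would be its own file with a `class_name`. The test never touches `GuardZone`, which is the point: the adapter is too thin to be worth unit testing.

```gdscript
extends GutTest

# Hypothetical pure logic: no Node, no scene tree, no signals.
class GuardLogic extends RefCounted:
    enum State { IDLE, ALERT }
    var state: State = State.IDLE
    var _intruders := 0

    func intruder_entered() -> void:
        _intruders += 1
        state = State.ALERT

    func intruder_exited() -> void:
        _intruders = maxi(_intruders - 1, 0)
        if _intruders == 0:
            state = State.IDLE

# Hypothetical thin adapter: the only part that knows about Area2D.
class GuardZone extends Area2D:
    var logic := GuardLogic.new()

    func _ready() -> void:
        body_entered.connect(_on_body_entered)
        body_exited.connect(_on_body_exited)

    func _on_body_entered(_body: Node2D) -> void:
        logic.intruder_entered()

    func _on_body_exited(_body: Node2D) -> void:
        logic.intruder_exited()

func test_stays_alert_until_last_intruder_leaves() -> void:
    var guard := GuardLogic.new()
    guard.intruder_entered()
    guard.intruder_entered()
    guard.intruder_exited()
    assert_eq(guard.state, GuardLogic.State.ALERT)
    guard.intruder_exited()
    assert_eq(guard.state, GuardLogic.State.IDLE)
```

Since the adapter only forwards `body_entered`/`body_exited`, the enter/exit bookkeeping you actually care about (two bodies in the zone, one leaves) gets covered without an Area2D or a physics frame.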

Unity people seem to have it slightly better with the Test Runner and the EditMode/PlayMode separation. But from what I've seen it still breaks down the moment you touch physics or coroutine timing in any real way.

Genuinely curious what people are actually doing here. Are you writing tests at all? Where do you draw the line between what's worth testing and what you just playtest? Or is game code fundamentally resistant to unit testing and we're all just coping when we call our architecture "testable"?

Replying to ByteLark: Been using GUT (Godot Unit Test) for about a year. Honest answer: yes, but only ...

same here with GUT. one thing that bit me early: assert_almost_eq for floats uses a default epsilon that's way too loose if you're testing anything with fractional damage multipliers. had a rounding bug in a status effect that was silently passing every test because the error fell within the default tolerance. worth setting a tighter epsilon explicitly any time your calculations involve decimals at all.
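A minimal sketch of the explicit-interval version, assuming GUT 9 on Godot 4, where `assert_almost_eq` takes the allowed error as its third argument. `StatusMath` and the 1.15x / 1.5x multipliers are hypothetical stand-ins for a status effect stacked with a crit.

```gdscript
extends GutTest

# Hypothetical status-effect math: multiplies base damage by each active modifier.
class StatusMath extends RefCounted:
    static func scaled_damage(base: float, multipliers: Array) -> float:
        var result := base
        for m in multipliers:
            result *= m
        return result

func test_vulnerability_plus_crit_uses_tight_interval() -> void:
    # 7 * 1.15 * 1.5 = 12.075 in exact math.
    var got := StatusMath.scaled_damage(7.0, [1.15, 1.5])
    # An interval of 0.1 would also accept a buggy 12.0 (e.g. from truncating
    # the intermediate 8.05 to 8.0), so keep the interval tight.
    assert_almost_eq(got, 12.075, 0.0001)
```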


Been using GUT (Godot Unit Test) for about a year. Honest answer: yes, but only for the things that actually deserve it. Damage formulas, buff stacking, inventory math, save data serialization. Pure functions with well-defined inputs and outputs. Easy to test, has caught real regressions more than once.
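Buff stacking is a good example of the pure-function kind. A minimal sketch, assuming GUT 9 and a made-up rule where percentage bonuses add together and the total is capped at +100%; `BuffMath` and `MAX_BONUS` are hypothetical.

```gdscript
extends GutTest

# Hypothetical stacking rule: percentage bonuses add, total capped at +100%.
class BuffMath extends RefCounted:
    const MAX_BONUS := 1.0

    static func effective_stat(base: float, bonuses: Array) -> float:
        var total := 0.0
        for b in bonuses:
            total += b
        return base * (1.0 + minf(total, MAX_BONUS))

func test_bonuses_stack_additively() -> void:
    assert_almost_eq(BuffMath.effective_stat(100.0, [0.25, 0.25]), 150.0, 0.0001)

func test_stacking_is_capped() -> void:
    assert_almost_eq(BuffMath.effective_stat(100.0, [0.75, 0.75]), 200.0, 0.0001)

func test_no_bonuses_returns_base() -> void:
    assert_eq(BuffMath.effective_stat(100.0, []), 100.0)
```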

Where it falls apart is anything that needs a live scene tree. Testing a state machine coupled to an AnimationPlayer is annoying enough that I usually just don't. The real fix is pushing that logic out of nodes into plain RefCounted classes that don't extend Node, then testing those directly, but that requires the discipline to design it that way from the start, which I don't always have when I'm moving fast.
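A minimal sketch of that push-out, assuming Godot 4 and GUT 9: the hypothetical `MoveStateMachine` is a plain RefCounted that only decides which animation should play, and the node that owns the `AnimationPlayer` just calls `play()` with the result. State names and animation names are made up.

```gdscript
extends GutTest

# Hypothetical movement state machine: decides WHICH animation to play,
# never touches an AnimationPlayer itself.
class MoveStateMachine extends RefCounted:
    enum State { IDLE, RUN, JUMP }
    var state: State = State.IDLE

    # Returns the animation name the owning node should play this frame.
    func update(on_floor: bool, speed: float) -> StringName:
        if not on_floor:
            state = State.JUMP
        elif absf(speed) > 0.1:
            state = State.RUN
        else:
            state = State.IDLE
        return animation_for(state)

    static func animation_for(s: State) -> StringName:
        match s:
            State.RUN:
                return &"run"
            State.JUMP:
                return &"jump"
            _:
                return &"idle"

# Adapter side (in the CharacterBody2D script), roughly:
#   anim_player.play(machine.update(is_on_floor(), velocity.x))

func test_airborne_overrides_running() -> void:
    var sm := MoveStateMachine.new()
    assert_eq(sm.update(false, 200.0), &"jump")

func test_landing_with_no_speed_goes_idle() -> void:
    var sm := MoveStateMachine.new()
    sm.update(false, 0.0)
    assert_eq(sm.update(true, 0.0), &"idle")
    assert_eq(sm.state, MoveStateMachine.State.IDLE)
```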

The rule I've landed on: write tests for anything with a formula or a data contract. Skip tests for anything fundamentally about node lifecycle or rendered output. And write at least one smoke test that boots a stripped-down scene and runs a critical player flow, not for coverage, just to catch "crashes on startup" regressions before they reach anyone else.
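A minimal sketch of that kind of smoke test in GUT 9, assuming a stripped-down scene at a made-up path (`res://tests/smoke/smoke_level.tscn`) with a child node named `Player`. It only checks that the scene loads, enters the tree, and survives a few frames; driving an actual player flow would go after the `await`.

```gdscript
extends GutTest

# Hypothetical path to a stripped-down scene (player, HUD, one room).
const SMOKE_SCENE := "res://tests/smoke/smoke_level.tscn"

func test_smoke_scene_boots() -> void:
    var packed := load(SMOKE_SCENE) as PackedScene
    assert_not_null(packed, "smoke scene failed to load")
    if packed == null:
        return

    var level := packed.instantiate()
    add_child_autofree(level)  # GUT frees it automatically after the test

    # Let _ready() and several _process/_physics_process ticks run.
    await wait_frames(10)

    assert_true(is_instance_valid(level), "level was freed during startup")
    # "Player" is a hypothetical node name in the smoke scene.
    assert_not_null(level.get_node_or_null("Player"), "Player missing after boot")
```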

the thing that finally made testing actually stick for me was separating game logic from game context. once damage math, inventory queries, and save serialization are plain functions with zero node dependencies, GUT tests basically write themselves. the stuff that's genuinely hard to test (scene trees, physics, signals) usually doesn't need unit tests anyway, those are integration concerns. honestly if a "logic" class requires a Node to exist just to run, that's a design problem before it's a testing problem.
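Save serialization is probably the cleanest case of this. A minimal round-trip sketch, assuming Godot 4's `JSON.stringify`/`JSON.parse_string` and a hypothetical `SaveData` class. One real gotcha it guards against: `JSON.parse_string` hands numbers back as floats, which is why `from_dict` converts them with `int()`.

```gdscript
extends GutTest

# Hypothetical save data: plain fields in, plain Dictionary out, no nodes involved.
class SaveData extends RefCounted:
    var level := 1
    var gold := 0
    var inventory: Array[String] = []

    func to_dict() -> Dictionary:
        return {"level": level, "gold": gold, "inventory": inventory.duplicate()}

    static func from_dict(d: Dictionary) -> SaveData:
        var s := SaveData.new()
        # JSON numbers come back as floats, so convert explicitly.
        s.level = int(d.get("level", 1))
        s.gold = int(d.get("gold", 0))
        for item in d.get("inventory", []):
            s.inventory.append(str(item))
        return s

func test_round_trip_through_json() -> void:
    var original := SaveData.new()
    original.level = 7
    original.gold = 1250
    original.inventory.append("potion")

    var json_text := JSON.stringify(original.to_dict())
    var restored := SaveData.from_dict(JSON.parse_string(json_text))

    assert_eq(restored.level, 7)
    assert_eq(restored.gold, 1250)
    assert_eq(restored.inventory, original.inventory)
```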

Replying to PulseSage: the thing that finally made testing actually stick for me was separating game lo...

Wish someone had told me this plainly when I started. Once business logic genuinely has no dependency on SceneTree or Node lifecycle, testing really is just calling functions. I started treating anything that touches get_node() or awaits signals as "integration territory": not untestable, just tested differently, usually through a minimal headless scene rather than a pure unit test suite.
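A minimal sketch of the "tested differently" side, assuming GUT 9: the hypothetical `DialogueRunner` is a real Node, so the test adds it to the tree, watches its signals, and feeds it a synthetic `ui_accept` action. It calls `_unhandled_input()` directly rather than going through `Input`, which keeps the test synchronous at the cost of skipping the real input pipeline.

```gdscript
extends GutTest

# Hypothetical dialogue runner: a real Node, so it's tested in the tree.
class DialogueRunner extends Node:
    signal line_shown(index: int, text: String)
    signal finished

    var lines: Array = []
    var _index := -1

    func _unhandled_input(event: InputEvent) -> void:
        if event.is_action_pressed("ui_accept"):
            advance()

    func advance() -> void:
        _index += 1
        if _index < lines.size():
            line_shown.emit(_index, lines[_index])
        else:
            finished.emit()

func test_accept_advances_then_finishes() -> void:
    var runner := DialogueRunner.new()
    runner.lines = ["Hello.", "Goodbye."]
    add_child_autofree(runner)
    watch_signals(runner)

    var press := InputEventAction.new()
    press.action = &"ui_accept"
    press.pressed = true
    for _i in 3:
        runner._unhandled_input(press)

    assert_signal_emit_count(runner, "line_shown", 2)
    assert_signal_emitted_with_parameters(runner, "line_shown", [1, "Goodbye."])
    assert_signal_emitted(runner, "finished")
```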

The hard habit is not reaching for the tree from inside logic classes. Feels overly restrictive at first, but once it sticks, it pays off in more than just testability. The logic is easier to reason about on its own, which helps every time you're debugging something at 2am with no idea why a system is misbehaving.

The framing that finally made testing click for me: stop trying to test systems, start testing decisions. I don't test "does the enemy AI state machine work". That's an integration concern with too much setup overhead. I test "given these inputs, does choose_action() return the right output."
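A minimal sketch of testing the decision rather than the system, assuming GUT 9. `EnemyBrain`, its actions, and the thresholds are invented for illustration; the point is that `choose_action()` is a static function of its inputs, so each test is one line.

```gdscript
extends GutTest

# Hypothetical enemy "brain": a pure decision function, no nodes.
class EnemyBrain extends RefCounted:
    enum Action { ATTACK, FLEE, CHASE, HEAL }

    static func choose_action(hp_ratio: float, distance: float, has_potion: bool) -> Action:
        if hp_ratio < 0.25:
            return Action.HEAL if has_potion else Action.FLEE
        if distance <= 1.5:
            return Action.ATTACK
        return Action.CHASE

func test_low_hp_with_potion_heals() -> void:
    assert_eq(EnemyBrain.choose_action(0.1, 1.0, true), EnemyBrain.Action.HEAL)

func test_low_hp_without_potion_flees() -> void:
    assert_eq(EnemyBrain.choose_action(0.1, 1.0, false), EnemyBrain.Action.FLEE)

func test_healthy_and_in_range_attacks() -> void:
    assert_eq(EnemyBrain.choose_action(0.9, 1.0, false), EnemyBrain.Action.ATTACK)

func test_healthy_and_far_chases() -> void:
    assert_eq(EnemyBrain.choose_action(0.9, 10.0, false), EnemyBrain.Action.CHASE)
```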

Anything that takes game state as parameters and returns a deterministic result is a clean unit test candidate: damage formula, loot roll, pathfinding cost function, serialization round-trip. Anything that is game state (nodes, signals, scene trees) belongs in integration or playtest territory. Once you draw that line, what's actually testable becomes obvious and setup stops feeling like more work than it's worth.
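For the loot-roll case, the trick is injecting the randomness. A minimal sketch, assuming GUT 9 and a hypothetical weighted `Loot.roll()` that takes a `RandomNumberGenerator` so tests can pin the seed; the table contents and weights are made up.

```gdscript
extends GutTest

# Hypothetical weighted loot roll. Randomness is passed in, not global.
class Loot extends RefCounted:
    static func roll(table: Dictionary, rng: RandomNumberGenerator) -> String:
        var total := 0.0
        for item in table:
            total += maxf(table[item], 0.0)
        var pick := rng.randf() * total
        var last_valid := ""
        for item in table:
            var weight: float = table[item]
            if weight <= 0.0:
                continue  # zero/negative weights can never drop
            last_valid = item
            pick -= weight
            if pick < 0.0:
                return item
        # Only reached if randf() returned exactly 1.0 (or the table is empty).
        return last_valid

func test_same_seed_gives_same_drops() -> void:
    var table := {"common": 80.0, "rare": 19.0, "legendary": 1.0}
    var a := RandomNumberGenerator.new()
    var b := RandomNumberGenerator.new()
    a.seed = 1234
    b.seed = 1234
    for _i in 20:
        assert_eq(Loot.roll(table, a), Loot.roll(table, b))

func test_zero_weight_never_drops() -> void:
    var table := {"common": 1.0, "never": 0.0}
    var rng := RandomNumberGenerator.new()
    rng.seed = 42
    for _i in 200:
        assert_ne(Loot.roll(table, rng), "never")
```

Two generators with the same seed produce the same sequence, so the first test is deterministic by construction; the second checks a property (zero weight never drops) rather than a specific roll, so it doesn't depend on which numbers seed 42 happens to produce.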

Replying to NebulaLattice: What finally made this work for me was thinking in three buckets. Pure logic (da...

The three-bucket framing is solid and basically where I've landed too. One thing I'd add that made the suite actually sustainable: explicitly mark which tests are blocking (must be green before a build ships) versus informational (useful signal, not a hard gate). If everything blocks, a single flaky scene-instantiation test becomes a morale drain and people start skipping the whole suite. If nothing blocks, it drifts toward being ignored entirely.

I run pure logic tests as CI blockers and keep integration/scene tests as a separate manual review pass. The flaky tests don't stall the build, but they're still there when a weird bug shows up and you need them.

Replying to BlazeFlare: same here with GUT. one thing that bit me early: assert_almost_eq for floats use...
lmao yes, lost twenty minutes to this testing a multi-hit crit multiplier chain. ended up writing a tiny wrapper:
```gdscript
# Lives in a test script that extends GutTest (or a shared base class that does).
func assert_near(a: float, b: float, msg := "") -> void:
    assert_true(absf(a - b) < 0.0001, msg if msg else "%.6f != %.6f" % [a, b])
```
the default epsilon is embarrassingly loose for anything involving stacked multipliers.

What finally made this work for me was thinking in three buckets. Pure logic (damage formulas, inventory math, save serialization, anything with zero Node dependency) gets unit tests with GUT. It's actually easy because there's nothing to mock. Multi-system integration behaviors get a dedicated headless test scene where you can step through real game flow. Gameplay feel and timing (whether something actually feels right) is playtesting.

The failure mode is trying to run everything through the same tool. Unit testing AI state transitions with mocked signals produces tests that are more brittle than the code they cover. Trying to playtest save file migration is its own nightmare. Match the approach to what you're validating and the whole thing becomes a lot less miserable.

the thing that surprised me going down the GUT rabbit hole: scene-instantiated tests (where you actually spawn a node, wire signals, and fire real events) catch a completely different class of bugs than pure logic tests. you can have all green unit tests and still have a system fall apart in a scene context because a signal fires in the wrong order or a node isn't in the tree yet. i try to keep at least one integration-ish test per major system even if it's painful to set up. pure tests are fast and stable, scene tests are slow and gross but they catch the real ones.
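A minimal sketch of one of those scene-context tests, assuming GUT 9: a hypothetical `Health` node emits `damaged` and `died`, and the test records the order listeners actually see them in. Anything downstream that assumes `damaged` arrives first (say, a hit flash that must run before a death screen) breaks if that order ever flips, and no pure test of the HP math would notice.

```gdscript
extends GutTest

# Hypothetical Health node whose listeners depend on signal order.
class Health extends Node:
    signal damaged(amount: int)
    signal died

    var hp := 10

    func take_damage(amount: int) -> void:
        hp = maxi(hp - amount, 0)
        damaged.emit(amount)
        if hp == 0:
            died.emit()

func test_damaged_fires_before_died() -> void:
    var health := Health.new()
    add_child_autofree(health)
    var order := []  # arrays are shared by reference, so the lambdas append here
    health.damaged.connect(func(_amount: int) -> void: order.append("damaged"))
    health.died.connect(func() -> void: order.append("died"))

    health.take_damage(99)

    assert_eq(order, ["damaged", "died"])
    assert_eq(health.hp, 0)
```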
Replying to FrostPulse: The framing that finally made testing click for me: stop trying to test systems,...
The "test decisions, not systems" framing is really useful. I've started applying a similar heuristic: if I can't express the expected output as a simple assertion (given this input, return this value), then what I'm testing is probably too entangled to unit-test and needs to be broken apart first. The pressure toward testability ends up shaping the design in a good direction. Cleaner boundaries as a side effect, not the goal.
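That heuristic maps naturally onto table-driven tests: if each case really is "given this input, return this value", the whole test becomes a list of rows. A minimal sketch, assuming GUT 9 and a hypothetical flat-armor `damage_after_armor()` that never deals less than 1.

```gdscript
extends GutTest

# Hypothetical damage formula: flat armor reduction, minimum 1 damage.
class Combat extends RefCounted:
    static func damage_after_armor(raw: int, armor: int) -> int:
        return maxi(raw - armor, 1)

# Each row is [raw, armor, expected].
const CASES := [
    [10, 3, 7],
    [10, 10, 1],
    [10, 50, 1],
    [0, 0, 1],
]

func test_damage_after_armor_cases() -> void:
    for c in CASES:
        var got := Combat.damage_after_armor(c[0], c[1])
        assert_eq(got, c[2], "raw=%d armor=%d" % [c[0], c[1]])
```

If a new rule can't be written as another row, that's usually the signal the heuristic describes: the function is doing too much and wants splitting first.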
Moonjump