behavior trees feel like overkill for most game enemy AI — anyone else just using utility scoring?


okay so I've been building out enemy AI for a wave-based action game in Godot 4 and went through the whole "should I implement a proper behavior tree" phase. watched the videos, started writing node base classes and composite nodes, and after a week realized I was writing a framework instead of a game.

ended up pivoting to a simple utility scorer and honestly it's been way more enjoyable to tune. each action has an evaluate() function returning a 0–1 weight based on game state, and the AI just picks the highest scorer each tick:

class_name UtilityAI extends Node

var actions: Array[AIAction] = []

func tick(context: AIContext) -> void:
    var best_action: AIAction = null
    var best_score := 0.0

    for action in actions:
        # hard filter first, then score the survivors
        if not action.can_execute(context):
            continue
        var score := action.evaluate(context)
        if score > best_score:
            best_score = score
            best_action = action

    # best_action is only ever set when something scored above 0.0
    if best_action:
        best_action.execute(context)

each AIAction is a Resource subclass. the scoring functions are where enemy personality comes from. an aggressive enemy weights attack actions higher when in range, a skittish one cranks retreat when health drops below a threshold. tuning feels like adjusting sliders instead of debugging composite node traversal logic.
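for anyone skimming, the shape of an action looks roughly like this. sketched in Python since the idea is engine-agnostic (in Godot each of these would be the Resource subclass I mentioned); the class names, thresholds, and context keys are all just illustrative:

```python
class AIAction:
    """Base action: subclasses score themselves against game state."""
    def can_execute(self, context) -> bool:
        return True
    def evaluate(self, context) -> float:
        return 0.0  # 0..1 desirability given current game state
    def execute(self, context) -> None:
        pass

class AttackAction(AIAction):
    # "personality" lives in these weights: an aggressive enemy
    # just gets a higher aggression multiplier
    def __init__(self, attack_range=5.0, aggression=1.0):
        self.attack_range = attack_range
        self.aggression = aggression
    def can_execute(self, context):
        return context["target_distance"] <= self.attack_range
    def evaluate(self, context):
        # closer target -> higher score, scaled by aggression, clamped to 0..1
        closeness = 1.0 - context["target_distance"] / self.attack_range
        return min(1.0, closeness * self.aggression)

class RetreatAction(AIAction):
    # a skittish enemy cranks this as health drops below its panic threshold
    def __init__(self, panic_threshold=0.3):
        self.panic_threshold = panic_threshold
    def evaluate(self, context):
        if context["health"] >= self.panic_threshold:
            return 0.0
        return 1.0 - context["health"] / self.panic_threshold
    def execute(self, context):
        context["fleeing"] = True
```

tuning an archetype is then just the constructor arguments, which is where the "adjusting sliders" feel comes from.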

where it breaks down: sequenced behavior. "patrol → investigate sound → engage" has ordering constraints that utility scoring doesn't model naturally. I ended up adding per-action cooldown tracking to prevent thrashing, and at that point I started wondering if I was just reinventing BTs from the bottom up anyway.
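for reference, the anti-thrashing part ended up looking roughly like this (Python sketch, names mine): a per-action cooldown after an action gets dropped, plus a small stickiness bonus so two near-tied actions don't flip-flop every tick:

```python
class CooldownScorer:
    """Utility scorer with per-action cooldowns plus a stickiness bonus
    for the current action, to stop near-tied actions from thrashing."""
    def __init__(self, actions, cooldown=1.0, stickiness=0.1):
        self.actions = actions
        self.cooldown = cooldown      # seconds an action is benched after being left
        self.stickiness = stickiness  # tie-breaker favoring the running action
        self.last_left = {}           # action -> time we switched away from it
        self.current = None

    def pick(self, context, now):
        best, best_score = None, 0.0
        for action in self.actions:
            if now - self.last_left.get(action, -1e9) < self.cooldown:
                continue  # still cooling down, skip entirely
            score = action.evaluate(context)
            if action is self.current:
                score += self.stickiness  # hysteresis against flip-flopping
            if score > best_score:
                best, best_score = action, score
        if self.current is not None and best is not self.current:
            self.last_left[self.current] = now  # bench the action we just left
        self.current = best
        return best
```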

I know Limbo AI is the go-to Godot BT plugin and people seem to love it, but for simpler enemy archetypes does anyone else find utility AI just... gets there faster? or is there a complexity threshold where BTs become clearly the right call and I'm just not there yet?

utility AI has been my default for a few years and the tuning argument is the real killer feature. when a designer says "enemies feel too passive in the second half", with a BT you're probably restructuring subtrees. with utility scoring you adjust a weight or tweak a response curve. the feedback loop difference is night and day.


the one place i'll genuinely give BTs credit: sequential scripted behavior. if a boss needs to do A, then B, then C with conditional fallbacks, utility scoring doesn't model that naturally. my current setup is a hybrid: utility scoring for general behavior, a small state machine just for scripted attack sequences. clean separation, works well in practice.

honestly utility scoring wins for basically everything i've shipped. the only time i've seriously considered a full BT was when i needed a designer-readable visual representation, at which point i was in UE5 and using their built-in editor anyway, not writing one in code.

writing BT nodes in code gives you the structural overhead without the visibility benefit. a weighted scorer is just a list of functions. adding a new behavior is one new function. no restructuring the whole tree, no figuring out where a new node slots in.

the one real edge case is strictly sequenced multi-step behaviors: patrol, then investigate, then attack, then retreat in that order. but i usually handle that with a state machine and utility scoring within each state. gets you 90% of what a BT offers with way less ceremony.

Replying to ObsidianWren: The designer-readable argument is one I've made myself, but it tends to fall apa...

The 15–20 node threshold tracks with my experience. UE5's BT editor at least has subtree collapse which buys you a bit more headroom before it becomes unnavigable. But past a certain point the tooling can only do so much. You're still managing a sprawling graph, and the "designers can read this" argument quietly dies.

I showed a 60-node behavior tree to a designer once and they looked at me like I'd handed them a circuit schematic. Just silently pushed it back across the desk. We switched to utility scoring the following sprint.

Replying to NovaRunner: GOAP is genuinely useful when you need it, but worth flagging the planning cost ...

hit this exact wall. 22 actions, ~15 boolean world state flags, 8 enemies replanning every frame. 3ms spike on its own just from the planners. switched to event-triggered replanning (only replan when world state actually changes) and that bought a lot back, but then you get stale plans when something shifts subtly without firing an event. ended up being more total complexity than just a well-tuned behavior tree would have been for that project. GOAP genuinely earns its keep when the planning space is open-ended and emergent behavior matters. for "enemy with a fixed combat moveset" it's probably not pulling its weight.
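the event-triggered version is basically a dirty flag on world state. a sketch of what i mean (Python, names mine), including the mitigation i wish i'd added sooner: a slow periodic fallback replan so subtle drift can't leave a plan stale forever:

```python
class Replanner:
    """Replan only when world state actually changes (dirty flag),
    with a periodic fallback so a plan can't go stale indefinitely."""
    def __init__(self, plan_fn, fallback_interval=2.0):
        self.plan_fn = plan_fn                  # expensive planner call
        self.fallback_interval = fallback_interval
        self.dirty = True                       # start dirty so we plan on first tick
        self.last_plan_time = -1e9
        self.plan = None
        self.replan_count = 0                   # handy for profiling spikes

    def mark_dirty(self):
        # call from event handlers: sound heard, damage taken, door opened...
        self.dirty = True

    def tick(self, world_state, now):
        if self.dirty or now - self.last_plan_time >= self.fallback_interval:
            self.plan = self.plan_fn(world_state)
            self.replan_count += 1
            self.dirty = False
            self.last_plan_time = now
        return self.plan
```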

Replying to VelvetFlare: utility AI has been my default for a few years and the tuning argument is the re...

the scoring function itself is where it gets hairy for me too. linear weighted sums are easy to reason about but terrible at modeling "only do X if A and B and C are all true." you end up multiplying terms together, and then one factor near zero silently collapses the whole score: the action just never fires and you have no idea why.

what actually helped: hard precondition filters that gate actions out of consideration entirely, then weighted scoring only on what's left. debugging is so much cleaner because you can log "this action was filtered out" vs "this action scored 0.03 and lost." completely different problems that need completely different fixes.
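the two-phase version is tiny in practice. a Python sketch (the stub action shape and log strings are my own, just to show the separation):

```python
def choose_action(actions, context, log):
    """Two-phase selection: hard precondition filters gate actions out of
    consideration entirely; weighted scoring runs only on the survivors.
    The log separates 'filtered out' from 'scored low and lost'."""
    candidates = []
    for action in actions:
        if not action.preconditions_met(context):
            log.append(f"{action.name}: filtered (precondition failed)")
            continue
        score = action.evaluate(context)
        log.append(f"{action.name}: scored {score:.2f}")
        candidates.append((score, action))
    if not candidates:
        return None  # nothing survived the filters this tick
    return max(candidates, key=lambda pair: pair[0])[1]
```

when an action mysteriously never runs, one glance at the log tells you which of the two completely different problems you have.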

Replying to PixelPulse: honestly utility scoring wins for basically everything i've shipped. the only ti...

The designer-readable argument is one I've made myself, but it tends to fall apart in practice. Behavior trees are readable at around 15–20 nodes. Past that you're navigating a sprawling graph that even the person who built it needs 10 minutes to re-orient in. I've handed BT visualizations to non-programmers and watched them bounce off immediately once the tree was large enough to represent anything real.

What's actually worked better for "designer-readable" on my projects: a simple data config where each action has named weights and labeled curve presets. A designer can tune values without knowing what's underneath. Less visual, but way more maintainable, and honestly easier to explain and hand off.
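To make that concrete, a hypothetical config of the kind I mean (Python sketch; the archetype names, weights, and curve labels are invented, and in practice this would live in a data file the designer edits):

```python
# Designer-facing data: named weights and labeled curve presets.
# Tuning means editing numbers and labels, never touching logic.
ENEMY_CONFIG = {
    "grunt": {
        "attack":  {"weight": 1.0, "curve": "linear"},
        "retreat": {"weight": 0.4, "curve": "quadratic"},
    },
    "coward": {
        "attack":  {"weight": 0.5, "curve": "linear"},
        "retreat": {"weight": 1.2, "curve": "inverse"},
    },
}

# Each preset maps a raw 0..1 input to a shaped 0..1 response.
CURVES = {
    "linear":    lambda x: x,
    "quadratic": lambda x: x * x,    # slow start, sharp finish
    "inverse":   lambda x: 1.0 - x,  # high when the input is low
}

def consideration(archetype, action, raw_input):
    entry = ENEMY_CONFIG[archetype][action]
    return entry["weight"] * CURVES[entry["curve"]](raw_input)
```

"Make cowards flee earlier" becomes a one-line data change instead of a code review.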

Surprised nobody's mentioned GOAP in this thread. For AI that needs to plan sequences (enemy decides to take cover, then reload, then flank), utility scoring alone gets awkward because you end up encoding temporal dependencies into your scores, adding penalties for actions that don't chain well with whatever came before. GOAP handles that naturally through preconditions and effects without baking the chaining logic into the scores themselves.

That said, GOAP has its own tuning surface. Getting the action graph modeled correctly takes real iteration, and if you're not careful about search depth the planner can spike at the wrong moment. My current setup: utility scoring for frame-to-frame decisions, a small GOAP planner with about 8 actions for mid-level goal selection. More code than pure utility, but the enemies stopped feeling purely reactive and started feeling like they were actually working toward something.
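For anyone who hasn't seen GOAP up close, the core is smaller than it sounds. A deliberately minimal sketch (Python, boolean world state only, breadth-first search instead of the A* a real planner would use):

```python
from collections import deque

def goap_plan(state, goal, actions, max_depth=6):
    """Minimal GOAP: BFS over boolean world states. Each action is a
    (name, preconditions, effects) tuple, with dicts of flag -> bool.
    Returns the shortest action-name sequence reaching the goal, or None."""
    def satisfied(conds, st):
        return all(st.get(k, False) == v for k, v in conds.items())

    start = frozenset(k for k, v in state.items() if v)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        st, plan = queue.popleft()
        st_dict = {k: True for k in st}
        if satisfied(goal, st_dict):
            return plan
        if len(plan) >= max_depth:
            continue  # cap search depth so the planner can't blow up a frame
        for name, pre, eff in actions:
            if not satisfied(pre, st_dict):
                continue
            nxt = dict(st_dict)
            nxt.update(eff)  # apply the action's effects to get the next state
            key = frozenset(k for k, v in nxt.items() if v)
            if key not in seen:
                seen.add(key)
                queue.append((key, plan + [name]))
    return None
```

The take-cover/reload/flank chain falls out of the preconditions and effects with no chaining logic anywhere in the scores, which is the whole appeal.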

Replying to CosmicVale: Surprised nobody's mentioned GOAP in this thread. For AI that needs to plan sequ...

GOAP is genuinely useful when you need it, but worth flagging the planning cost before committing. Once your action set grows past ~15 entries and your world state representation gets complex, replanning frequency becomes a real concern, especially for enemies that need to react quickly to changing situations. F.E.A.R.'s original implementation worked partly because they kept the action set deliberately small and the world state representation lean. That constraint is easy to violate as a project grows.

For a wave-based action game like OP's, I'd still lean toward utility scoring. GOAP makes sense when enemies need unified multi-step plans — take cover, reload, then flank as a single reasoned sequence. If your enemies don't need that level of sequential reasoning, you're adding infrastructure you'll probably never fully use.

Replying to NeonThorn: the scoring function itself is where it gets hairy for me too. linear weighted s...

yeah this is exactly what pushed me toward response curves. once you're modeling "only attack if health > 50% AND target in range AND cooldown ready," a plain weighted sum breaks down: a single high-weight factor can dominate even when the others are zero. response curves let each input map to a 0–1 score nonlinearly, so you tune the shape of each consideration independently. then you multiply them for the hard-AND case rather than summing, which naturally zeros out if any single factor fails.

Dave Mark's GDC talk "Architecture Tricks: Managing Behaviors in Time, Space, and Depth" covers this really well. old talk but the curve evaluation stuff is still the clearest explanation i've found for why you'd do it this way rather than fighting weights.
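a quick sketch of the multiplicative-consideration idea (Python, my own curve choices and thresholds, purely illustrative):

```python
import math

def logistic(x, steepness=10.0, midpoint=0.5):
    """S-curve: a soft threshold around the midpoint instead of a hard cutoff."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

def attack_score(health, closeness, cooldown_ready):
    """Multiply considerations for the hard-AND case: if any single factor
    is ~0 the whole score collapses, which here is exactly the intent."""
    health_c = logistic(health, steepness=12.0, midpoint=0.5)  # want health > ~50%
    range_c = closeness ** 2                                   # strongly prefer close range
    cd_c = 1.0 if cooldown_ready else 0.0                      # binary gate
    return health_c * range_c * cd_c
```

each consideration gets tuned in isolation by reshaping its curve, and the product only survives when every condition is actually met.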

One thing worth adding here: the choice between behavior trees and utility scoring tends to get framed as all-or-nothing when in practice you're probably going to layer them anyway, especially once you have more than one enemy archetype.

My current setup: each enemy has a small state machine with maybe four states (patrol, alerted, combat, retreat) that handles mode transitions. Utility scoring runs entirely inside the combat state to pick moment-to-moment actions. The state machine is trivial to read and debug. The utility scorer handles the nuance that flat state machines are bad at.
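The layering is easier to see in code than to describe. A compressed sketch (Python, with made-up thresholds and context keys; retreat shown as a terminal state for brevity):

```python
class Enemy:
    """Outer FSM handles mode transitions; utility scoring runs only
    inside the combat state to pick moment-to-moment actions."""
    def __init__(self, combat_actions):
        self.state = "patrol"
        self.combat_actions = combat_actions

    def tick(self, ctx):
        # outer layer: a handful of readable, debuggable mode transitions
        if self.state != "retreat" and ctx["health"] < 0.2:
            self.state = "retreat"
        elif self.state == "patrol" and ctx["enemy_visible"]:
            self.state = "combat"
        elif self.state == "combat" and not ctx["enemy_visible"]:
            self.state = "patrol"

        # inner layer: utility scoring, but only while actually fighting
        if self.state == "combat":
            return max(self.combat_actions, key=lambda a: a.evaluate(ctx))
        return None  # patrol/alerted/retreat movement handled elsewhere
```

The FSM stays small enough to hold in your head, and the scorer never has to reason about modes it isn't in.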

Behavior trees are actually decent for the outer layer specifically. The "abort if condition fails" decorator pattern maps well to "stop fighting and flee when health drops below 20%." The problem is when people reach for them for the inner decision loop too, which is where they get unwieldy. Utility scoring is great at "what's the best action right now" but awkward at "when should I fundamentally change what I'm doing." Flip those responsibilities and both tools stay manageable.

I ended up settling on a hybrid that works well for me: utility scoring handles all the moment-to-moment decisions (who to target, when to dodge, whether to reposition) and a small behavior tree sits on top purely for high-level phase transitions. Combat start, aggro state, desperation phase at low health. The BT never has more than 8–10 nodes. It's basically a selector over major states, each of which hands off entirely to the utility scorer.

You get the readability of "here are the phases this enemy can be in" without the BT needing to know anything about how decisions happen inside each phase. At that scale it's almost more documentation than logic. Feels like the right split of responsibilities for most cases.
