enemy perception systems always end up a tangled mess — how are you structuring yours?


Started building enemy AI for my current project and I'm already three refactors deep on the perception layer. It always starts simple: check distance, check angle, check line of sight. And then it just... accumulates.

By now I've got vision cones with inner/outer radius thresholds, a hearing system that factors in player movement speed and surface type, and this half-finished "last known position" memory state that enemies use when they lose sight. Each of those felt manageable in isolation. Together they're a nightmare of interacting flags and states.

My current structure has a PerceptionComponent that collects raw stimuli each physics tick, then a ThreatEvaluator that takes those stimuli and writes to a shared Blackboard the behavior tree reads from. Felt clean on paper. In practice the blackboard has become this dumping ground where last_heard_at and is_alert and investigation_target all sit together with no clear ownership, and I'm constantly debugging why an alert state isn't clearing properly after an enemy returns to patrol.

The thing I keep running into is that perception data has temporal weight. A sound heard two seconds ago should matter less than one heard 50ms ago, and there's no clean place to put that decay logic without it leaking into everything. Right now it lives in the evaluator, but it kind of belongs in the stimulus itself? Like a stimulus that carries its own half-life and fades out naturally rather than something external deciding when to discard it.
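To make that concrete, here's roughly what I'm picturing, in Python since I haven't actually written it yet (all names and numbers are made up):

```python
class Stimulus:
    """A stimulus that carries its own half-life and fades out naturally,
    instead of an external evaluator deciding when to discard it."""
    def __init__(self, position, intensity, half_life):
        self.position = position
        self.intensity = intensity   # strength at the moment it was sensed
        self.half_life = half_life   # seconds for the weight to halve
        self.age = 0.0

    def tick(self, dt):
        self.age += dt

    @property
    def weight(self):
        # exponential decay: a 2s-old sound with a 0.5s half-life
        # weighs 1/16th of a fresh one
        return self.intensity * 0.5 ** (self.age / self.half_life)

    @property
    def expired(self):
        return self.weight < 0.01    # safe to prune from the buffer
```

Then the evaluator just sums weights and prunes expired entries, and the decay logic never leaks anywhere else.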

Curious how others have structured this. Do you keep perception as a pure data layer and let the behavior tree handle all interpretation? Or does your perception component do real processing before anything else sees it? And has anyone actually built this in a way that doesn't fall apart the moment you add a third sense?

Replying to QuantumFrame: One thing I'd add to the three-layer separation: make sure the stimulus collecti...

The typed stimulus point is underrated. Another thing it enables: per-type decay rates. A sound stimulus from 5 seconds ago should weigh very differently than a confirmed visual from 5 seconds ago. "Heard something nearby" vs. "directly watched you move" are completely different confidence levels. If you merge everything into a single awareness float early in the pipeline, you lose that nuance and end up fudging it with magic numbers later when behaviors feel wrong. Typed stimuli let you define decay curves per-category, and the system effectively self-documents why each threshold exists rather than just being a pile of tuned constants.
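Rough sketch of per-category curves (the curve shapes and numbers here are invented for illustration, not tuned values):

```python
# Each stimulus type gets its own decay curve, not just its own rate.
# The curve shape documents the intent: audio fades smoothly and fast,
# visual holds near full confidence before dropping off.
DECAY_CURVES = {
    "sound": lambda age: 0.5 ** (age / 1.5),               # exponential, 1.5s half-life
    "visual": lambda age: 1.0 / (1.0 + (age / 6.0) ** 4),  # plateau, then sharp falloff
}

def stimulus_weight(kind, intensity, age):
    return intensity * DECAY_CURVES[kind](age)
```

At 5 seconds old, a visual still carries most of its weight while a sound is nearly gone, which is exactly the "watched you move" vs "heard something" split.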

The pattern that finally untangled my perception system: make the data flow strictly unidirectional. Stimuli get emitted into the environment (sound events, visual alerts, footstep markers, whatever makes sense for your game) and the perception layer subscribes and accumulates. No enemy ever actively queries whether it can see or hear the player. The environment pushes relevant data inward.

This inverts the coupling in a useful way. Your AI just consumes a perception feed instead of doing its own spatial lookups. Adding a new stimulus type is just adding a new emitter, not auditing 12 different enemy perception components to see what needs updating.

The friction point is stimulus lifetime. A noise event from frame 40 shouldn't still be influencing decisions at frame 400, so you need a decay mechanism. I use exponential decay with per-stimulus half-lives, which works well, but it does mean maintaining a priority queue of decaying signals per agent, which adds overhead at scale. Worth it for the structural cleanliness, but worth knowing about upfront.
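A minimal version of the push model (class names are illustrative; real stimuli would carry position, intensity, decay data, etc.):

```python
class StimulusBus:
    """Environment side: emitters push events in, agents never query out."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def emit(self, stimulus):
        # adding a new stimulus type is just a new call site for emit(),
        # not a change to any enemy's perception component
        for callback in self._subscribers:
            callback(stimulus)


class PerceptionFeed:
    """Agent side: passively accumulates whatever the bus pushes."""
    def __init__(self, bus):
        self.stimuli = []
        bus.subscribe(self.stimuli.append)
```

Range filtering and the decaying priority queue sit between emit and accumulate in the real thing, but the direction of the arrow is the point.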

Replying to GlitchForge: The typed stimulus point is underrated. Another thing it enables: per-type decay...

The decay rate split is something I only got right after it shipped wrong and the problem was immediately obvious in playtesting. Visual and audio stimuli were using the same decay curve, so enemies would forget they'd literally just seen the player in about three seconds flat. Players consistently described the AI as having goldfish memory even though each individual behavior was functioning correctly.

Separating rates by type made a huge difference in perceived intelligence without touching any of the actual decision code. Audio decays fast, visual slow, confirmed-position slowest, with an explicit lost-contact event to clear it rather than letting it decay naturally. The specific values ended up being game-feel tuning more than anything technical.
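The confirmed-position case, sketched (names invented, not from a real codebase):

```python
class ConfirmedContact:
    """Slowest-decaying memory: a confirmed visual position is cleared by an
    explicit lost-contact event (e.g. a failed search at the last known
    position), never by passive decay."""
    def __init__(self):
        self.position = None

    def on_visual_confirm(self, position):
        self.position = position

    def on_lost_contact(self):
        # fired by the search behavior, not by a timer
        self.position = None

    @property
    def active(self):
        return self.position is not None
```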

The pattern that's worked best for me is separating stimulus collection from awareness evaluation from alert state. Three distinct layers, no direct coupling between them.

Stimuli (sound, sight, physics events) write into a per-enemy buffer. The awareness system reads that buffer on its own tick and updates a float confidence value, not a binary detected/not-detected flag. State machines only look at confidence, not stimuli directly. That indirection is what prevents the tangling: you can swap the awareness logic completely without touching the state machine, and vice versa.

Confidence-based detection also handles the "almost saw you" case more naturally than binary flags. Enemies that slowly build suspicion just feel better than ones that snap from idle to full alert the instant a threshold trips.
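A stripped-down version of the middle and top layers (gain, decay, and thresholds are placeholders, tune to taste):

```python
class Awareness:
    """Layer 2: folds the stimulus buffer into one confidence float.
    Nothing downstream ever sees raw stimuli."""
    def __init__(self, gain=0.5, decay_per_sec=0.2):
        self.confidence = 0.0
        self.gain = gain                    # how fast stimuli build suspicion
        self.decay_per_sec = decay_per_sec  # how fast suspicion fades

    def tick(self, stimulus_buffer, dt):
        for intensity in stimulus_buffer:
            self.confidence = min(1.0, self.confidence + intensity * self.gain)
        stimulus_buffer.clear()
        self.confidence = max(0.0, self.confidence - self.decay_per_sec * dt)


def state_for(confidence):
    # Layer 3: the state machine reads only the float
    if confidence > 0.8:
        return "alert"
    if confidence > 0.3:
        return "suspicious"
    return "idle"
```

Swapping the awareness math never touches `state_for`, and retuning the thresholds never touches the stimulus handling.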

One thing I'd add to the three-layer separation: make sure the stimulus collection layer is typed, not a unified float or merged awareness score. An enemy who heard something but never got visual should investigate toward the last known audio position. One who had visual contact and lost it should hold the last known visual position. If you collapse sound and sight into the same value at collection time, that distinction disappears, and enemies start behaving weirdly in mixed-stimulus scenarios, which is basically every interesting situation in stealth or action AI.

The abstraction is right. The temptation to simplify stimulus types early is real, but worth resisting.
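To illustrate the mixed-stimulus case (hypothetical names):

```python
class LastKnown:
    """Per-sense last-known positions. Collapsing senses into one value
    loses the investigate-toward-audio vs hold-visual-position split."""
    def __init__(self):
        self.by_sense = {}

    def record(self, sense, position):
        self.by_sense[sense] = position

    def search_target(self):
        # prefer stronger evidence: a visual fix beats an audio guess
        return self.by_sense.get("visual") or self.by_sense.get("audio")
```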

One thing that doesn't come up enough in these perception architecture discussions: tick rate. The three-layer separation is the right design, but evaluating the full stack every frame for 20+ enemies adds up fast. The usual fix is staggering enemy updates across frames, but that interacts badly with the frame-order problem CipherMesh raised. If stimulus collection and awareness evaluation are no longer guaranteed to run the same frame, you can get a one-frame gap where an enemy misses something it should have caught.

The pattern that worked for me: stimulus collection runs every frame (cheap, just appending events into a shared per-enemy buffer), while awareness evaluation only fires on each enemy's designated tick frame. The buffer accumulates between ticks so nothing gets dropped, and evaluation cost spreads across frames without losing stimulus data. The tricky part is flushing the buffer at the right point in the tick so you're not evaluating half-populated data, but once that's solid it's a pretty clean setup.
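The split, in sketch form (interval and ids are illustrative):

```python
class StaggeredPerception:
    """Collect every frame (cheap append); evaluate only on this
    enemy's designated tick frame (expensive)."""
    def __init__(self, agent_id, tick_interval):
        self.buffer = []
        self.tick_interval = tick_interval
        self.offset = agent_id % tick_interval  # spread agents across frames

    def collect(self, stimulus):
        self.buffer.append(stimulus)            # runs every frame

    def maybe_evaluate(self, frame, evaluate):
        if frame % self.tick_interval == self.offset:
            # evaluation sees everything accumulated since the last
            # tick, so staggering drops no stimuli
            evaluate(list(self.buffer))
            self.buffer.clear()
```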

Replying to StormThorn: One thing that doesn't come up enough in these perception architecture discussio...

staggered ticks are the right call, but one detail that matters more than it looks: randomize the interval once at spawn and keep it fixed per enemy. if you re-randomize every tick you can still accidentally get clustering. fixed offsets spread across your target range give you predictable even distribution from the start and stay stable over time.

also worth separating the raycast budget from the tick itself. on each enemy tick i collect stimuli and evaluate awareness state, but the LOS raycast only fires if the awareness score actually crossed a threshold that frame. neutral enemies in the far zone almost never reach the raycast phase at all. you end up spending the expensive part of the budget exactly where it matters, which is alert or recently-stimulated enemies near the player.
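rough shape of both pieces (interval range and threshold are whatever fits your game):

```python
import random

def spawn_tick_interval(rng, lo=4, hi=8):
    # randomized once at spawn, then fixed for this enemy's lifetime
    return rng.randint(lo, hi)

def should_raycast(prev_score, new_score, threshold=0.5):
    # the expensive LOS check only fires when awareness actually
    # crossed the threshold upward this tick
    return prev_score < threshold <= new_score
```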

Replying to ShadowCaster: staggered ticks are the right call, but one detail that matters more than it loo...

fixed interval at spawn is right, but there's one more piece: randomize the initial phase offset too, not just the interval length. two enemies with different intervals but the same start frame can still sync up every N*M frames. stagger the phase on spawn alongside the interval and you're actually safe from that pattern.
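concretely (ranges illustrative):

```python
import random

def spawn_schedule(rng, lo=4, hi=8):
    interval = rng.randint(lo, hi)   # fixed interval, chosen once at spawn
    phase = rng.randrange(interval)  # independent start-frame offset
    return interval, phase

def ticks_on(frame, interval, phase):
    return frame % interval == phase
```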


One thing worth being careful about: the frame order of stimulus collection vs. perception evaluation. If your emitters - footstep events, visual triggers, whatever - run after the perception step in the same frame, enemies are always reacting to one-frame-old data. Usually invisible, but it bit me with a "guaranteed reaction" trigger where a point-blank gunshot occasionally felt like the enemy just shrugged it off. The one-frame lag was exactly the wrong amount of delay to be noticeable without being obvious.

Fix was trivial once I saw it: stimulus collection runs early in _physics_process, flushes to the world blackboard, perception evaluation reads from the blackboard in a later step. It's the kind of thing that stays invisible until it bites you at a playtest demo.
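In outline, collapsed to plain Python (the real version lives in `_physics_process`; names are illustrative):

```python
class Agent:
    def __init__(self):
        self.seen = []

    def perceive(self, stimuli):
        self.seen.extend(stimuli)


def physics_frame(emitters, world_blackboard, agents, dt):
    # phase 1: all emitters flush this frame's stimuli first
    for emit in emitters:
        world_blackboard.extend(emit(dt))
    # phase 2: perception reads same-frame data, never one-frame-old data
    for agent in agents:
        agent.perceive(list(world_blackboard))
    world_blackboard.clear()  # frame-scoped events don't leak forward
```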

Replying to VoidReef: The decay rate split is something I only got right after it shipped wrong and th...

The decay mismatch compounds when you add more stimulus types. Visual and audio are the obvious split, but the really unintuitive case in my experience is footstep or positional evidence. An enemy who detected your footsteps 8 seconds ago should weight that very differently than one who heard an explosion 8 seconds ago, but both come in as "audio" if the typing isn't granular enough. Breaking stimuli into SoundImpact, Visual, Footstep, and PropDisturbance gives you per-type decay knobs that are actually tunable. Once you're doing that, the differences between types become obvious almost immediately in playtesting.
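The per-type knobs end up looking like a tuning table (numbers invented, not tuned values):

```python
# half-lives in seconds; each knob is independently tunable
HALF_LIVES = {
    "SoundImpact": 3.0,       # an explosion is old news quickly
    "Footstep": 8.0,          # positional evidence stays actionable longer
    "Visual": 12.0,
    "PropDisturbance": 20.0,  # a knocked-over crate is evidence until searched
}

def evidence_weight(kind, age):
    return 0.5 ** (age / HALF_LIVES[kind])
```

At 8 seconds, footstep evidence still carries half its weight while the explosion is mostly forgotten, which matches how players expect guards to reason.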
