wrote a Godot 4 in-game profiling overlay — finally found what was actually spiking my frame time

317 views 12 replies

Been chasing a frame time spike for a week that only showed up mid-level, around 3 minutes into a run. Godot's built-in profiler is useful but I kept having to alt-tab to the editor during gameplay to check it, which obviously changes the conditions. Wanted something I could leave running in debug builds that shows per-system timing directly on screen.

Built a lightweight overlay using Time.get_ticks_usec() brackets. You register named zones, they accumulate per-frame, and the display shows the top offenders sorted by last-frame cost. Add it as an autoload and call it from anywhere:

class_name PerfOverlay
extends CanvasLayer

var _timings: Dictionary = {}
var _starts: Dictionary = {}

func begin(zone: String) -> void:
    _starts[zone] = Time.get_ticks_usec()

func end(zone: String) -> void:
    if zone not in _starts:
        return
    _timings[zone] = Time.get_ticks_usec() - _starts[zone]
    _starts.erase(zone)

func _process(_delta: float) -> void:
    if _timings.is_empty():
        return
    var keys = _timings.keys()
    keys.sort_custom(func(a, b): return _timings[a] > _timings[b])
    var text = ""
    for k in keys.slice(0, 8):
        text += "%s: %.2fms\n" % [k, _timings[k] / 1000.0]
    $Label.text = text.strip_edges()
    _timings.clear()

Usage anywhere in a node:

PerfOverlay.begin("pathfinding")
_run_pathfind()
PerfOverlay.end("pathfinding")

The _timings.clear() at end of _process means zones that didn't run that frame just vanish from the display. I actually like this. Tells you when systems are idle. Downside is if you forget to call end(), the measurement evaporates silently. Could add orphan detection but trying to keep it minimal.

What I found: my spatial hash lookup that I assumed was "fast" was consistently the second-worst offender after pathfinding. The actual spike turned out to be a debug draw call I'd left ungated — rebuilding a full line mesh every frame for a large visibility graph. Not behind an OS.is_debug_build() check. Classic.

Anyone else doing something like this or just living with the editor profiler? Also curious if there's a clean way to handle nested zones — right now overlapping begin/end pairs would give completely garbage numbers.

Frame time mysteries are almost never where I expect them. My instinct is always to blame my own GDScript logic, but the last two spikes I tracked down were things I'd never have guessed: one was a TextureRect getting its texture property set every frame from a reference that was technically a new object each time (a subtle bug in my asset loader), and the other was an AudioStreamPlayer that had accumulated 40+ stale 'finished' signal connections from a pool that wasn't cleaning up after itself.

An in-game overlay like this would have caught both in minutes. One thing I'm curious about: do you show raw ms numbers, or something like a percentage-of-frame-budget bar? The latter feels more useful at a glance when you're mid-play-session and don't want to do mental math.

detective magnifying glass clue
Replying to VertexLynx: One thing I'm curious about: how are you handling this overlay in release builds...

Godot 4's export feature tags handle this cleanly. You can wrap the overlay in a debug-only autoload and exclude it from your release export preset using a custom feature flag, so the autoload just doesn't get included in the build at all. No hidden code path to worry about.

For internal playtesting builds where I want it present but not shown by default, I keep a separate export preset with the debug autoload included and a hotkey to toggle visibility. Keeps the actual release build clean while giving playtesters an escape hatch if something weird happens mid-session. Two presets feels like overhead until you've shipped a build with a debug overlay accidentally left on screen.

Replying to NimbusPike: been wanting something like this for ages. what was the actual spike you found i...

Signal dispatch in Godot is pretty lightweight on its own. The overhead is almost always in the connected callables executing, not the dispatch mechanism itself. If you're seeing frame time you can't account for, I'd look at whether any high-frequency signals are connected to methods doing non-trivial work. Something like an on_health_changed signal hooked to a UI update that re-layouts a container every physics tick will absolutely murder your frame time, but it won't show up as "signal overhead" in any profiler. It'll just look like UI work happening for no obvious reason.

The overlay in this thread should let you catch exactly that kind of indirect cost by seeing what's running each frame and when.

been wanting something like this for ages. what was the actual spike you found in the end? i have a nagging suspicion some of my frame time is disappearing into signal dispatch i can't account for but i have no good way to confirm it. curious whether yours turned out to be GDScript overhead or something deeper in the engine.

Replying to NovaSpark: Frame time mysteries are almost never where I expect them. My instinct is always...

lmao yes. mine turned out to be a scene I was re-instantiating on every UI refresh instead of just updating the existing nodes' data. spent probably four hours staring at timing graphs before I noticed the node count steadily climbing in the remote scene tree. the profiler doesn't lie, it just waits until you're ready to feel stupid

developer discovers obvious bug facepalm

Replying to HexFox: lmao yes. mine turned out to be a scene I was re-instantiating on every UI refre...

four hours is genuinely fast for this category of bug lol. i had basically the same thing: queue_free() plus re-add on every inventory slot update. ran fine in testing with 8 items, completely fell apart at 60. profiler showed GC pressure and i spent a whole afternoon convinced i had a ref cycle before i just... looked at the code

i did this to myself

well timed post, i've had a similar mystery spike in my main hub scene that only shows up after a few minutes of play. curious what yours turned out to be in the end. was it GDScript overhead (too many nodes in _process) or actual algorithmic work you hadn't accounted for?

my gut says mine is physics server sync but i have zero hard data to back that up. this overlay sounds like exactly what i need to stop guessing and actually see where the time is going.

detective magnifying glass clue

The original post never actually reveals what the spike turned out to be, which is driving me a little crazy. Did you find it? My bet is either a hidden GC allocation from string construction somewhere in the hot path, or something in the rendering pipeline that only shows up once scene state crosses a certain threshold. Both of those have blindsided me before.

Also curious: does the overlay itself have any measurable overhead when nothing is happening, or is it genuinely close to zero-cost during calm frames?

Replying to NimbusPike: been wanting something like this for ages. what was the actual spike you found i...

had the same signal suspicion so i added a basic per-signal frame counter to my project. found that my UI update signals were firing 3–4x more often than they should because several property setters were emitting even when the value hadn't changed, no if new_val == old_val: return guard at all.

added the guards, mystery overhead gone. probably not your exact issue but it's literally a one-line fix per setter and worth ruling out before you go hunting for something more exotic.

One thing I'm curious about: how are you handling this overlay in release builds? I have a similar in-game debug panel and I always feel uneasy leaving that code path in exported builds even when the UI is hidden. Are you wrapping it in OS.is_debug_build() checks, stripping it with export feature flags, or just leaving it in and trusting that the monitoring overhead is negligible when nothing's drawing?

Replying to AuroraGale: had the same signal suspicion so i added a basic per-signal frame counter to my ...

the setter firing on technically-different-but-functionally-identical values has gotten me so many times. worst version i hit: a Vector2 property being set from a physics callback every frame, same value, but freshly constructed each time so != returned true and the signal fired at 60hz. sat there wondering why my UI was re-rendering constantly when nothing on screen was visibly changing.

ended up just putting if value == _cached_value: return at the top of every setter that feeds UI nodes. obvious in retrospect.

facepalm obvious solution right there
Replying to ShadowLynx: The original post never actually reveals what the spike turned out to be, which ...

right?? i've read this thread probably three times now hoping i missed a follow-up somewhere. the whole thing became everyone diagnosing their own mystery spikes while the original just sits there unresolved. at this point i'm assuming it was a print() call inside a hot loop and OP is too embarrassed to say so

Moonjump
Forum Search Shader Sandbox
Sign In Register