Rust via GDExtension for CPU-heavy systems, tried it for pathfinding, sharing what I found

ByteReef

2/10/2026, 1:05:27 AM (edited)

Been wrestling with a pathfinding bottleneck in my Godot 4 project. Started in GDScript (obviously a mistake at scale), then ported to C#, which helped but wasn't enough. I'm running A* over a dense nav graph with ~600 nodes and dynamic edge weights that recalculate every few frames during combat. C# was still eating 2–3 ms per query under load, which was blowing my frame budget on lower-end targets.

So I went down the Rust + GDExtension rabbit hole. The pitch: compile a Rust library with pure data logic, expose it via a C ABI, bind it through GDExtension. No GC pressure, no runtime overhead, deterministic performance. In theory.

The binding surface I ended up with:

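/// Writes the path into `out_path` and its length into `out_len`.
/// Returns 0 on success, -1 if no path exists, -2 if any pointer is null.
///
/// Caller contract: `graph` must point to a valid `GraphSnapshot`, and
/// `out_path` must have room for at least `MAX_PATH_LEN` entries.
#[no_mangle]
pub extern "C" fn find_path(
    graph: *const GraphSnapshot,
    start: u32,
    goal: u32,
    out_path: *mut u32,
    out_len: *mut usize,
) -> i32 {
    // Dereferencing a null pointer is undefined behavior, so reject it up front.
    if graph.is_null() || out_path.is_null() || out_len.is_null() {
        return -2;
    }
    // SAFETY: non-null was checked above; validity is the caller's responsibility.
    let graph = unsafe { &*graph };
    match astar(graph, start, goal) {
        Some(path) => {
            // Paths longer than MAX_PATH_LEN are truncated to fit the buffer.
            let len = path.len().min(MAX_PATH_LEN);
            // SAFETY: `path` holds at least `len` ids and `out_path` has room for them.
            unsafe {
                std::ptr::copy_nonoverlapping(path.as_ptr(), out_path, len);
                *out_len = len;
            }
            0
        }
        None => -1,
    }
}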
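
The function above leans on GraphSnapshot, astar, and MAX_PATH_LEN, none of which are shown. For anyone who wants something that compiles, here is a minimal sketch of what those pieces could look like. Everything in it is an assumption for illustration, not the project's actual code: the CSR-style layout, the field names, the integer weights, the value of MAX_PATH_LEN, and the zero heuristic (which makes this A* behave like plain Dijkstra). It also assumes the snapshot is owned by Rust and only handed across the boundary as an opaque pointer.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical cap on path length; the caller's buffer must hold this many ids.
pub const MAX_PATH_LEN: usize = 256;

/// Hypothetical flat graph in CSR form: the outgoing edges of node `i` live at
/// indices `offsets[i]..offsets[i + 1]` of `targets` and `weights`.
pub struct GraphSnapshot {
    pub offsets: Vec<u32>, // node_count + 1 entries
    pub targets: Vec<u32>, // destination node of each edge
    pub weights: Vec<u32>, // cost of each edge
}

/// A* with a zero heuristic (equivalent to Dijkstra). Returns the node ids
/// from `start` to `goal` inclusive, or `None` if the goal is unreachable.
pub fn astar(graph: &GraphSnapshot, start: u32, goal: u32) -> Option<Vec<u32>> {
    let n = graph.offsets.len().checked_sub(1)?;
    let (s, g) = (start as usize, goal as usize);
    if s >= n || g >= n {
        return None;
    }
    let mut dist = vec![u32::MAX; n];
    let mut prev = vec![u32::MAX; n];
    let mut open = BinaryHeap::new();
    dist[s] = 0;
    // `Reverse` turns the max-heap into a min-heap keyed on cost.
    open.push(Reverse((0u32, start)));
    while let Some(Reverse((d, node))) = open.pop() {
        if node == goal {
            break;
        }
        let u = node as usize;
        if d > dist[u] {
            continue; // stale heap entry, a cheaper route was already found
        }
        for e in graph.offsets[u] as usize..graph.offsets[u + 1] as usize {
            let v = graph.targets[e] as usize;
            let nd = d.saturating_add(graph.weights[e]);
            if nd < dist[v] {
                dist[v] = nd;
                prev[v] = node;
                open.push(Reverse((nd, graph.targets[e])));
            }
        }
    }
    if dist[g] == u32::MAX {
        return None;
    }
    // Walk predecessors back from the goal, then reverse into start-to-goal order.
    let mut path = vec![goal];
    let mut cur = goal;
    while cur != start {
        cur = prev[cur as usize];
        path.push(cur);
    }
    path.reverse();
    Some(path)
}
```

A real nav graph would normally use a positional heuristic (straight-line distance to the goal, for example) and probably float weights. Both are left out here to keep the sketch short.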

The GDExtension wrapper serializes the graph into a flat struct that the queries read from. I was worried that step would eat the gains, but I only rebuild the snapshot when the graph actually changes, not on every frame or every query, so the overhead ends up negligible in practice.
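
The rebuild-only-on-change idea can be sketched with a dirty flag. This is a rough illustration under assumptions, not the wrapper's real code: Snapshot, SnapshotCache, set_weight, and the rebuilds counter are all hypothetical names, and the "graph" is reduced to a list of edge weights to keep it short.

```rust
/// Hypothetical flattened form of the graph that queries read from.
pub struct Snapshot {
    pub weights: Vec<u32>,
}

/// Holds the editable edge weights plus a cached snapshot of them.
pub struct SnapshotCache {
    live_weights: Vec<u32>,
    snapshot: Snapshot,
    dirty: bool,
    /// Counts rebuilds; only here to make the effect visible.
    pub rebuilds: usize,
}

impl SnapshotCache {
    pub fn new(weights: Vec<u32>) -> Self {
        let snapshot = Snapshot { weights: weights.clone() };
        Self { live_weights: weights, snapshot, dirty: false, rebuilds: 0 }
    }

    /// Gameplay code calls this when an edge cost changes.
    /// Panics if `edge` is out of range.
    pub fn set_weight(&mut self, edge: usize, weight: u32) {
        if self.live_weights[edge] != weight {
            self.live_weights[edge] = weight;
            self.dirty = true; // defer the rebuild until a query needs it
        }
    }

    /// Returns the snapshot, rebuilding it only if something changed.
    pub fn snapshot(&mut self) -> &Snapshot {
        if self.dirty {
            self.snapshot = Snapshot { weights: self.live_weights.clone() };
            self.dirty = false;
            self.rebuilds += 1;
        }
        &self.snapshot
    }
}
```

With this shape, several weight changes in the same frame cost one rebuild at the next query, and frames with no changes cost none.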

Performance on my test scene: C# A* averaged ~2.4 ms per query. Rust via GDExtension: ~0.3 ms. That's roughly an 8x win. But I'll be honest: cross-compilation for export targets requires careful setup, and debugging across the FFI boundary is genuinely unpleasant when something goes sideways. No nice stack traces, just a crash or a silent wrong answer.
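
One thing that may soften the "just a crash" part is catching panics before they reach the boundary. A panic that escapes an extern "C" function aborts the process on recent Rust versions (and was undefined behavior on older ones). Below is a minimal sketch, assuming the default panic = "unwind" setting; with panic = "abort" nothing gets caught. The status codes, guard_ffi, and node_cost are hypothetical names made up for the example, not part of GDExtension or the code above.

```rust
use std::panic::{self, AssertUnwindSafe};

// Hypothetical status codes for illustration.
pub const OK: i32 = 0;
pub const ERR_NULL_PTR: i32 = -2;
pub const ERR_PANIC: i32 = -3;

/// Runs `body` and converts a panic into an error code, logging the message
/// to stderr so there is at least some trace of what went wrong.
pub fn guard_ffi<F: FnOnce() -> i32>(body: F) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(code) => code,
        Err(payload) => {
            // Panic payloads are usually a &str or a String.
            let msg = if let Some(s) = payload.downcast_ref::<&str>() {
                *s
            } else if let Some(s) = payload.downcast_ref::<String>() {
                s.as_str()
            } else {
                "non-string panic payload"
            };
            eprintln!("panic caught at FFI boundary: {}", msg);
            ERR_PANIC
        }
    }
}

/// Hypothetical exported function: reads `costs[index]` into `out`.
pub extern "C" fn node_cost(
    costs: *const u32,
    len: usize,
    index: usize,
    out: *mut u32,
) -> i32 {
    if costs.is_null() || out.is_null() {
        return ERR_NULL_PTR;
    }
    guard_ffi(|| {
        // SAFETY: the caller promises `costs` points to `len` readable u32 values.
        let costs = unsafe { std::slice::from_raw_parts(costs, len) };
        let value = costs[index]; // panics if `index` is out of bounds
        // SAFETY: `out` was checked for null; the caller promises it is writable.
        unsafe { *out = value };
        OK
    })
}
```

This only helps with panics (bad indexing, unwrap on None, failed assertions). It does nothing for a pointer that is wrong on the caller's side, and it won't flag a silently wrong answer.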

Curious if anyone else has gone this route for something other than pathfinding: physics substeps, procedural gen, anything CPU-bound. Was the setup friction worth it? And has anyone found a cleaner workflow for cross-compilation when targeting multiple export platforms?