fix(webgpu): fixes per-frame memory growth #140

Open
thejustinwalsh wants to merge 6 commits into NativeScript:master from thejustinwalsh:fix/webgpu-frame-transient-native-leak

@thejustinwalsh thejustinwalsh commented Jun 3, 2026

Fixes a slow, steady memory leak discovered through an accidental soak test: I left the device running for several hours and came back to a low framerate. I took heap snapshots in incremental waves, picking off the offenders one by one until the heap was stable while rendering at 60-120 fps.

Fixes #141.

I did not run an exhaustive search across all rendering implementations to check for memory leaks outside of WebGPU. This fix targets iOS and Android via the WebGPU rendering backend.

BEFORE: leak_before_baseline.mp4
AFTER: leak_after_fixed.mp4

Copilot Summary

This pull request introduces significant improvements to resource management and memory safety in the WebGPU implementation, especially regarding swapchain textures, command buffers, and the C++/Rust interface on iOS. The main themes are: (1) explicit and timely resource cleanup in the JavaScript/TypeScript bindings, (2) improved handling of swapchain textures and views, and (3) a safer, RAII-based approach to managing Rust Arc handles in the iOS C++ layer.

Resource Management Improvements (JS/TS):

  • Explicitly release command encoders, command buffers, passes, and queue-submitted commands immediately after use, instead of relying on garbage collection. This reduces memory usage and prevents resource leaks.
  • Swapchain textures and their views are now tracked per frame and explicitly released at the end of each present cycle, ensuring they are not leaked and are returned to the swapchain promptly.

Native Resource Cleanup (Rust/C++/iOS):

  • Introduced an ArcHandle RAII wrapper in C++ for Rust Arc handles exposed via the C ABI, replacing manual release calls in destructors and reducing the risk of double-free or leaks. Updated all relevant iOS WebGPU wrapper classes to use this pattern.

Swapchain Texture Lifecycle (Rust):

  • Ensured that command buffers and encoders are dropped promptly after presentation, and that swapchain textures are properly returned to the swapchain if not presented, further reducing the risk of resource leaks or undefined behavior.

Summary of Key Changes:

1. Resource Management and Cleanup (JS/TS):

  • Explicitly destroy or release command encoders, command buffers, compute/render passes, and queue-submitted commands immediately after use, clearing their native references.
  • Track swapchain textures and views per frame in GPUCanvasContext, and release them at presentSurface() to prevent leaks and ensure timely return to the swapchain.

2. Rust/C++/iOS Interop Safety:

3. Swapchain Texture Handling (Rust):

  • Ensure command buffers and encoders are dropped after use, and swapchain textures are returned to the swapchain if never presented, logging warnings if discard fails.

These changes collectively improve memory safety, performance, and reliability of the WebGPU implementation across platforms.

Continuous WebGPU rendering leaked native heap in proportion to the number of frames
rendered: each frame mints single-use wrappers (swapchain texture view, command encoder,
render/compute pass, command buffer) whose native Arc handle is dropped only by the V8
GC finalizer, which a tight requestAnimationFrame loop starves. global.gc() does not
flush the finalizer tasks; backgrounding the loop drains them and the heap collapses.

Release each transient deterministically at the WebGPU-spec operation that consumes
it, with the GC finalizer kept as a backstop:

  render/compute pass -> pass.end()
  command encoder      -> encoder.finish()
  command buffer       -> queue.submit()
  swapchain view       -> the owning GPUCanvasContext.presentSurface()
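
The mapping above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Native`, `g_live`, `soak` and the three wrapper structs are hypothetical stand-ins, and the two vectors that keep wrappers alive model a GC whose finalizers never run.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical native object; g_live stands in for native heap usage.
static int g_live = 0;
struct Native {
    Native() { ++g_live; }
    ~Native() { --g_live; }
};
using Handle = std::unique_ptr<Native>;

struct CommandBuffer {
    Handle handle{new Native()};
};

struct CommandEncoder {
    Handle handle{new Native()};
    // finish() consumes the encoder, so its native handle is released here.
    CommandBuffer finish() {
        CommandBuffer buffer;
        handle.reset();
        return buffer;
    }
};

struct Queue {
    // submit() consumes the command buffers, so their handles are released here.
    void submit(std::vector<CommandBuffer>& buffers) {
        for (auto& b : buffers) b.handle.reset();
    }
};

// Renders `frames` frames while keeping every wrapper object alive, which
// models finalizers that never get to run. Returns the largest number of live
// native objects seen at the end of any frame.
int soak(int frames) {
    std::vector<CommandEncoder> uncollected_encoders;
    std::vector<CommandBuffer> uncollected_buffers;
    Queue queue;
    int peak = 0;
    for (int i = 0; i < frames; ++i) {
        CommandEncoder encoder;
        std::vector<CommandBuffer> buffers;
        buffers.push_back(encoder.finish());
        queue.submit(buffers);
        uncollected_encoders.push_back(std::move(encoder));
        uncollected_buffers.push_back(std::move(buffers[0]));
        if (g_live > peak) peak = g_live;
    }
    return peak;
}
```

In this sketch the wrappers pile up but the native objects do not, which is the property the deterministic release is after; the `unique_ptr` destructor still covers any wrapper that is never consumed.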

The native handle lives in ArcHandle (unique_ptr + stateless custom deleter, new
ArcHandle.h, with a MutArcHandle variant for non-const C ABI releases). reset()
releases the Arc once and nulls the pointer; ~unique_ptr (run by the GC finalizer via
ObjectWrapperImpl's virtual destructor) is a no-op when already null, so exactly-once
release is a type invariant. Hand-written destructors and manual null-guards are gone;
C ABI call sites pass the raw pointer via .get(). The Rust crate is unchanged.
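
One way such a handle type could look is sketched below. The names `ArcHandle` and `MutArcHandle` come from the commit message, but the template shape, the `Demo*` types, the counters and the `demo_*_release` functions are assumptions made for illustration; the actual ArcHandle.h may differ.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for Rust Arc handles exposed over the C ABI.
struct DemoEncoder {};
struct DemoTexture {};
int g_encoder_releases = 0;
int g_texture_releases = 0;

// Hypothetical release functions; the real ones are canvas_native_webgpu_*_release.
void demo_encoder_release(const DemoEncoder* p) { ++g_encoder_releases; delete p; }
void demo_texture_release(DemoTexture* p) { ++g_texture_releases; delete p; }

// Stateless deleter: the release function is a template argument, not a member.
template <typename T, void (*Release)(const T*)>
struct ArcDeleter {
    void operator()(const T* p) const noexcept { Release(p); }
};
template <typename T, void (*Release)(const T*)>
using ArcHandle = std::unique_ptr<const T, ArcDeleter<T, Release>>;

// Variant for C ABI release functions that take a non-const pointer.
template <typename T, void (*Release)(T*)>
struct MutArcDeleter {
    void operator()(T* p) const noexcept { Release(p); }
};
template <typename T, void (*Release)(T*)>
using MutArcHandle = std::unique_ptr<T, MutArcDeleter<T, Release>>;

void release_demo() {
    {
        ArcHandle<DemoEncoder, demo_encoder_release> encoder(new DemoEncoder());
        encoder.reset();  // deterministic release, e.g. from finish()
        encoder.reset();  // already null: no second release
        assert(encoder.get() == nullptr);
    }  // destructor (the GC backstop) finds a null pointer and does nothing
    {
        MutArcHandle<DemoTexture, demo_texture_release> texture(new DemoTexture());
    }  // never reset: the destructor performs the single release
}
```

After `release_demo()` each counter is exactly 1, whichever path released the handle.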

The swapchain view is the only transient tied to a context, so each GPUCanvasContext
tracks its own views (swapchainContext_ stamped in getCurrentTexture, registered in
GPUTexture.createView) and releases them in presentSurface(): per-context, so multiple
canvases in one isolate never drain each other's in-flight views.
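
A rough C++ sketch of the per-context bookkeeping described above follows. It is an assumption-laden illustration: `NativeView`, `track` and the use of `std::weak_ptr` for the registry are hypothetical choices, and only the names `swapchainContext`, `getCurrentTexture`, `createView` and `presentSurface` echo the commit message.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct NativeView {};  // hypothetical native view object

struct TextureView {
    std::unique_ptr<NativeView> handle{new NativeView()};
};

class CanvasContext;

struct Texture {
    // Stamped by getCurrentTexture(); stays null for app-created textures.
    CanvasContext* swapchainContext = nullptr;
    std::shared_ptr<TextureView> createView();
};

class CanvasContext {
public:
    Texture getCurrentTexture() {
        Texture t;
        t.swapchainContext = this;
        return t;
    }
    // Weak references (an assumption): tracking must not keep wrappers alive.
    void track(const std::shared_ptr<TextureView>& view) { views_.push_back(view); }
    // Releases only the views registered with this context, then forgets them.
    void presentSurface() {
        for (auto& weak : views_) {
            if (auto view = weak.lock()) view->handle.reset();
        }
        views_.clear();
    }
private:
    std::vector<std::weak_ptr<TextureView>> views_;
};

inline std::shared_ptr<TextureView> Texture::createView() {
    auto view = std::make_shared<TextureView>();
    if (swapchainContext) swapchainContext->track(view);
    return view;
}
```

With two contexts, presenting one leaves the other's in-flight view untouched, and views of app-created textures are never registered at all.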

JS destroy() calls are optional-chained, safe ahead of the native rebuild. Verified:
the C++ compiles and links via the Android NDK toolchain (Gradle assembleRelease).

…ten dtors)

Convert the remaining 21 persistent WebGPU wrappers to hold their Rust Arc handle in
ArcHandle/MutArcHandle, the same primitive as the five transients. Each hand-written
destructor that called a raw canvas_native_webgpu_*_release is gone; the unique_ptr
member releases the handle once on destruction, so the GC path is unchanged in
behavior but the lifecycle now lives in the type. Accessors and C ABI call sites use
handle.get(); GPUImpl's null-guarded release and GPUCanvasContextImpl's release (which
keeps its raf_.reset() ordering) fold into the same model.

Behavior-preserving cleanup: these objects are not minted per frame and do not soak;
this removes duplicated release bookkeeping and makes all 26 WebGPU wrappers
consistent. Verified compiling via the Android NDK toolchain.

presentSurface copies the swapchain texture into the read-back texture every frame
(toDataURL support) using a command encoder + command buffer, but never dropped
them, so they accumulated in wgpu-core's registry render-proportionally — a
native-heap leak. Drop the command buffer after submit and the encoder after the
copy, mirroring gpu_queue.rs::queue_submit.

Confirmed on a Moto G 2025: symbolized heapprofd named present_surface as the top
net-retained allocator; the drops remove it (~0.31 -> ~0.18 MB/s under continuous
render).

getCurrentTexture() returns a fresh per-frame GPUTexture wrapper holding an Arc
clone of the surface texture; its only non-deterministic free path was the GC
finalizer, which a tight render loop starves. Track the per-context swapchain
textures and drop their native handle at presentSurface() (the texture's point of
death) via a JS-exposed __releaseHandle on GPUTextureImpl that decrements the Arc.

__releaseHandle is distinct from destroy(): destroy() frees the GPU texture and must
never run on the swapchain texture; __releaseHandle only drops our wrapper's handle.
Completes deterministic per-frame release for the swapchain path.
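
The difference between the two operations can be illustrated with a small C++ sketch in which `std::shared_ptr` stands in for the Rust Arc. `GpuTexture`, `TextureWrapper` and `releaseHandle` are hypothetical names for this illustration, not the actual GPUTextureImpl API.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical GPU resource; `destroyed` models freed GPU memory.
struct GpuTexture {
    bool destroyed = false;
};

class TextureWrapper {
public:
    explicit TextureWrapper(std::shared_ptr<GpuTexture> texture)
        : handle_(std::move(texture)) {}

    // Frees the GPU texture itself; every other holder sees the effect.
    void destroy() {
        if (handle_) handle_->destroyed = true;
    }

    // Drops only this wrapper's reference; the texture stays valid for
    // whoever else holds it (here, the surface).
    void releaseHandle() { handle_.reset(); }

    bool hasHandle() const { return handle_ != nullptr; }

private:
    std::shared_ptr<GpuTexture> handle_;
};
```

Releasing the handle lowers the reference count by one and leaves the texture intact, which is why it is safe on a swapchain texture while `destroy()` is not.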

getCurrentTexture() registers a wgpu Texture (carrying an auto-created surface
clear_view) in wgpu-core's hub each frame. surface present()/discard() only release
the acquired-texture ref; the hub registry holds a second ref that must be dropped
explicitly. CanvasGPUTexture::drop only did this for the None (app-created) branch —
the Some (surface) branch was a no-op with its discard commented out, so the
per-frame swapchain texture and its clear image view leaked in the hub every frame.
Symbolized heapprofd named surface_get_current_texture (present.rs:220, ash
create_image_view) as the top remaining grower.

Drop the surface texture in the Some branch (and discard first if it was acquired
but never presented). The discard's error is logged rather than fatally aborted —
the original code used handle_error_fatal, which crashes the process from within
Drop and is almost certainly why this cleanup was commented out, trading a crash for
a leak.
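
The actual fix lives in the Rust Drop implementation; the C++ destructor below is only an analogy for its shape, under stated assumptions. `Surface`, `SurfaceTexture`, `g_log` and the two counters are hypothetical: `outstanding` models the acquired-texture ref and `hub_entries` models the registry ref.

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> g_log;  // hypothetical log sink

// Hypothetical surface with two kinds of references to track.
struct Surface {
    bool fail_discard = false;  // lets the demo force a discard error
    int outstanding = 0;        // acquired textures not yet presented or discarded
    int hub_entries = 0;        // registry refs that must be dropped explicitly
    bool discard() {
        if (fail_discard) return false;
        --outstanding;
        return true;
    }
};

class SurfaceTexture {
public:
    explicit SurfaceTexture(Surface& surface) : surface_(surface) {
        ++surface_.outstanding;
        ++surface_.hub_entries;
    }
    SurfaceTexture(const SurfaceTexture&) = delete;
    SurfaceTexture& operator=(const SurfaceTexture&) = delete;

    void present() {
        --surface_.outstanding;
        presented_ = true;
    }

    ~SurfaceTexture() {
        // Acquired but never presented: hand the texture back first.
        if (!presented_ && !surface_.discard()) {
            g_log.push_back("surface texture discard failed");  // log, do not abort
        }
        --surface_.hub_entries;  // the registry ref is dropped in every case
    }

private:
    Surface& surface_;
    bool presented_ = false;
};
```

A failed discard leaves a log entry and an outstanding acquire, but the registry entry is still dropped and the process keeps running.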

Verified on a Moto G 2025: continuous-render foreground soak goes from ~0.18 MB/s to
+0.002 MB/s over 5 minutes (flat; matches the blank-app baseline). This was the last
of three native leaks (per-frame wrapper Arcs; read-back encoder/buffer; this).

Trim the explanatory comment blocks added with the leak fixes down to terse
one-liners matching the surrounding code. No behavior change.

coderabbitai Bot commented Jun 3, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@thejustinwalsh thejustinwalsh marked this pull request as ready for review June 3, 2026 07:22

Development

Successfully merging this pull request may close these issues.

WebGPU Unbounded Heap Growth (Soak Test)

1 participant