Skip to content

Optimize camera preview with native macOS GPU path#1909

Merged
richiemcilroy merged 24 commits into
mainfrom
camera-optimisations
Jun 10, 2026
Merged

Optimize camera preview with native macOS GPU path#1909
richiemcilroy merged 24 commits into
mainfrom
camera-optimisations

Conversation

@richiemcilroy

@richiemcilroy richiemcilroy commented Jun 10, 2026

Copy link
Copy Markdown
Member

Adds a zero-copy IOSurface→GPU native preview on macOS (now the default) and speeds up the WebSocket fallback with leaner frame pipelining, smarter camera feed routing, and improved format selection.

Greptile Summary

This PR introduces a zero-copy IOSurface→Metal→wgpu native camera preview for macOS (now the default) and overhauls the WebSocket fallback path with leaner frame pipelining, per-connection stats, subscriber-aware idle skipping, and correct full-range NV12 colour handling across the GPU, worker, and Canvas2D code paths.

  • Native GPU path (camera_native.rs, iosurface_texture.rs): new NativeFrameConverter imports IOSurface planes as Metal textures and converts YUV→RGB in a single GPU pass with correct BT.601 coefficients for both video-range (420v) and full-range (420f) NV12 as well as BGRA cameras.
  • WebSocket path (camera_legacy.rs, frame_ws.rs): switches from broadcast to watch channel, adds a frame pool to avoid per-frame allocations, skips all conversion work when subscriber_count == 0, and propagates CameraPreviewState for dimension-aware scaler sizing.
  • Frontend (socket.ts, frame-worker.ts, webgpu-renderer.ts, camera.tsx, target-select-overlay.tsx): adds NV12_FULL_FORMAT_MAGIC throughout, fixes full-range coefficients in all three rendering fallbacks (WebGPU, worker, Canvas2D), lazily creates the OffscreenCanvas worker, and delegates frame rendering to CanvasControls in both camera windows.

Confidence Score: 5/5

Safe to merge — all three issues from the previous review have been addressed and no new functional defects were found in this re-review.

The three previously flagged issues are fully resolved: the Linux overload now carries _ffmpeg_sender_count; per-connection WsFrameStats replaces the module-level atomics; and full-range NV12 coefficients are correctly applied in every rendering path (GPU shader, worker, and Canvas2D fallback). The new native GPU path, frame pooling, subscriber-count skip optimisation, format selection, and frontend refactor all look technically sound.

No files require special attention.

Important Files Changed

Filename Overview
crates/recording/src/feeds/camera.rs Adds ffmpeg_sender_count tracking to skip FFmpeg frame conversion when no consumers exist; fixes Linux build by adding _ffmpeg_sender_count to the Linux overload; adds disconnected-sender pruning in send_frame_to_camera_senders; spawns a reaper thread when release_camera_thread times out
apps/desktop/src-tauri/src/camera_native.rs New file: implements zero-copy IOSurface to Metal to wgpu GPU path for camera preview on macOS; handles NV12 (video-range + full-range) and BGRA formats with correct per-format WGSL shaders; NV12 UV plane correctly halved in create_uv_texture
apps/desktop/src-tauri/src/camera_legacy.rs Significant refactor: switches frame_tx from broadcast to watch channel, adds subscriber_count to skip processing when no WS clients, adds frame pooling and pack_rows for zero-copy buffer reuse, receives full CameraPreviewState for dimension-aware scaling
apps/desktop/src-tauri/src/frame_ws.rs Moves stats from module-level atomics to per-connection WsFrameStats struct; adds NV12_FULL_FORMAT_MAGIC for full-range NV12; adds subscriber count tracking via SubscriberCountGuard RAII; switches to watch::Receiver-based frame distribution
apps/desktop/src/utils/socket.ts Major refactor: adds readNv12Metadata helper, full-range NV12 support in all fallback paths, lazy worker creation, RGBA Canvas2D pending frame scheduling, and low-power GPU preference for direct-canvas consumers
apps/desktop/src/utils/frame-worker.ts Adds NV12_FULL_MAGIC, fullRange field throughout NV12 metadata parsing and conversion; fixes uvSize to use Math.floor; validates yStride > 0 before accepting NV12 frames
apps/desktop/src-tauri/src/general_settings.rs Makes native camera preview the macOS default; adds native_camera_preview_enabled() helper that ignores the setting on non-macOS; one-time rollout migration using a raw store marker key
apps/desktop/src-tauri/src/recording.rs get_best_format rewritten: prioritises 16:9 aspect ratio, then resolution, then closeness to 60fps; cascading filter buckets avoid returning null when no ideal format exists
crates/rendering/src/iosurface_texture.rs Adds create_bgra_texture for importing plane-0 BGRA IOSurfaces into Metal; create_uv_texture already correctly halves dimensions internally; consistent with BGRA path in camera_native.rs

Comments Outside Diff (4)

  1. crates/recording/src/feeds/camera.rs, line 969-977 (link)

    P1 Linux build broken by parameter count mismatch

    The spawn_camera_setup call to setup_camera now passes 6 arguments (…, ffmpeg_sender_count, native_sender_count), but the Linux #[cfg(target_os = "linux")] overload still declares only 5 parameters (no ffmpeg_sender_count). This will produce a compile error on Linux. The Linux function needs an additional _ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize> parameter to stay in sync with the call site at line 403-411.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/recording/src/feeds/camera.rs
    Line: 969-977
    
    Comment:
    **Linux build broken by parameter count mismatch**
    
    The `spawn_camera_setup` call to `setup_camera` now passes 6 arguments (`…, ffmpeg_sender_count, native_sender_count`), but the Linux `#[cfg(target_os = "linux")]` overload still declares only 5 parameters (no `ffmpeg_sender_count`). This will produce a compile error on Linux. The Linux function needs an additional `_ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize>` parameter to stay in sync with the call site at line 403-411.
    
    How can I resolve this? If you propose a fix, please make it concise.
  2. apps/desktop/src/utils/socket.ts, line 65-133 (link)

    P2 Full-range NV12 decoded with wrong coefficients in JS fallback

    convertNv12ToRgbaMainThread always applies the video-range BT.601 formula (Y offset of 16, as in the WGSL yuv_video_range shader entry). When the camera delivers 420f (full-range NV12), the GPU shader path correctly routes to yuv_full_range (no Y offset), but this JS Canvas2D fallback will produce visually slightly washed-out colours. For a user on a machine where WebGPU is unavailable, the preview colours will differ from what the recording captures.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: apps/desktop/src/utils/socket.ts
    Line: 65-133
    
    Comment:
    **Full-range NV12 decoded with wrong coefficients in JS fallback**
    
    `convertNv12ToRgbaMainThread` always applies the video-range BT.601 formula (Y offset of 16, as in the WGSL `yuv_video_range` shader entry). When the camera delivers `420f` (full-range NV12), the GPU shader path correctly routes to `yuv_full_range` (no Y offset), but this JS Canvas2D fallback will produce visually slightly washed-out colours. For a user on a machine where WebGPU is unavailable, the preview colours will differ from what the recording captures.
    
    How can I resolve this? If you propose a fix, please make it concise.
  3. apps/desktop/src-tauri/src/frame_ws.rs, line 8-17 (link)

    P2 Module-level stats statics are shared across all concurrent socket connections

    TOTAL_BYTES_SENT, TOTAL_FRAMES_SENT, and the other diagnostic atomics are module-level singletons. If two clients connect to create_watch_frame_ws simultaneously (e.g., two browser tabs), their stats interleave and whichever connection happens to trigger the 2-second window first swaps all counters to 0, discarding the other connection's data. Each handler instance's log line will show a misleading mix of both connections' work, and real throughput will appear halved. Moving these stats into a per-handle_socket struct would keep them isolated.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: apps/desktop/src-tauri/src/frame_ws.rs
    Line: 8-17
    
    Comment:
    **Module-level stats statics are shared across all concurrent socket connections**
    
    `TOTAL_BYTES_SENT`, `TOTAL_FRAMES_SENT`, and the other diagnostic atomics are module-level singletons. If two clients connect to `create_watch_frame_ws` simultaneously (e.g., two browser tabs), their stats interleave and whichever connection happens to trigger the 2-second window first swaps all counters to 0, discarding the other connection's data. Each handler instance's log line will show a misleading mix of both connections' work, and real throughput will appear halved. Moving these stats into a per-`handle_socket` struct would keep them isolated.
    
    How can I resolve this? If you propose a fix, please make it concise.
  4. crates/recording/src/feeds/camera.rs, line 972-979 (link)

    P1 Linux build broken by parameter count mismatch

    The #[cfg(target_os = "linux")] overload of setup_camera still declares only 5 parameters, but spawn_camera_setup (line ~403) now calls it with 6 arguments — adding ffmpeg_sender_count between native_frame_recipient and native_sender_count. Linux will fail to compile with a type mismatch: the 5th positional arg (ffmpeg_sender_count: Arc<AtomicUsize>) will be matched against _native_sender_count, and native_sender_count becomes a 6th argument with no corresponding parameter. The fix is to add _ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize> before _native_sender_count in the Linux overload.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/recording/src/feeds/camera.rs
    Line: 972-979
    
    Comment:
    **Linux build broken by parameter count mismatch**
    
    The `#[cfg(target_os = "linux")]` overload of `setup_camera` still declares only 5 parameters, but `spawn_camera_setup` (line ~403) now calls it with 6 arguments — adding `ffmpeg_sender_count` between `native_frame_recipient` and `native_sender_count`. Linux will fail to compile with a type mismatch: the 5th positional arg (`ffmpeg_sender_count: Arc<AtomicUsize>`) will be matched against `_native_sender_count`, and `native_sender_count` becomes a 6th argument with no corresponding parameter. The fix is to add `_ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize>` before `_native_sender_count` in the Linux overload.
    
    How can I resolve this? If you propose a fix, please make it concise.

Reviews (4): Last reviewed commit: "fix: validate nv12 frame metadata" | Re-trigger Greptile

@polarityinc

polarityinc Bot commented Jun 10, 2026

Copy link
Copy Markdown

Paragon Review Skipped

Hi @richiemcilroy! Your Polarity credit balance is insufficient to complete this review.

Please visit https://app.paragon.run to finish your review.

@superagent-security

Copy link
Copy Markdown

Superagent didn't find any vulnerabilities or security issues in this PR.

Comment thread apps/desktop/src-tauri/src/camera.rs Outdated
);

let (drop_tx, drop_rx) = oneshot::channel();
let renderer = ManuallyDrop::new(renderer);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ManuallyDrop here means if run_on_main_thread fails, renderer never gets dropped (and the shutdown wait will always time out). Might be safer to avoid ManuallyDrop and let the closure being dropped clean up if scheduling fails.

Suggested change
let renderer = ManuallyDrop::new(renderer);
let (drop_tx, drop_rx) = oneshot::channel();
let renderer = Box::new(renderer);
let _ = window.run_on_main_thread(move || {
drop(renderer);
let _ = drop_tx.send(());
});

Comment thread apps/desktop/src-tauri/src/camera.rs Outdated
.blur_processor
.as_mut()
.and_then(|p| p.process_returning_output())
.expect("blurred flag guarantees output");

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This expect will take down the preview thread if the blur processor ever returns None unexpectedly. Since the non-blurred path is already available, consider a graceful fallback instead of panicking.

Suggested change
.expect("blurred flag guarantees output");
let Some(blur_output) = self
.blur_processor
.as_mut()
.and_then(|p| p.process_returning_output())
else {
prepared.render(surface, frame_data, frame_stride);
return true;
};

Comment thread apps/desktop/src-tauri/src/camera.rs Outdated
self.aspect_ratio = Cached::default();

let surface = self.surface.take();
let surface = ManuallyDrop::new(self.surface.take());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same pattern as above: ManuallyDrop + ignoring the result means a failure to schedule onto the main thread turns into a silent leak. Dropping an Option<Surface> via a moved closure is enough here.

Suggested change
let surface = ManuallyDrop::new(self.surface.take());
let surface = self.surface.take();
let (drop_tx, drop_rx) = oneshot::channel();
let _ = window.run_on_main_thread(move || {
drop(surface);
let _ = drop_tx.send(());
});

Comment thread crates/recording/src/feeds/camera.rs Outdated
// converted to derive VideoInfo; afterwards skip the full-frame
// copy entirely when nothing consumes ffmpeg frames.
if ready_signal.is_none()
&& ffmpeg_sender_count.load(std::sync::atomic::Ordering::Relaxed) == 0

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ffmpeg_sender_count gates whether we skip the conversion, it’d be good to match the Release stores with an Acquire load here (and on the other occurrence) so this isn’t relying on relaxed visibility.

Suggested change
&& ffmpeg_sender_count.load(std::sync::atomic::Ordering::Relaxed) == 0
&& ffmpeg_sender_count.load(std::sync::atomic::Ordering::Acquire) == 0

Comment thread crates/recording/src/feeds/camera.rs Outdated
let _ = handle.join();
} else {
warn!("Camera setup thread is still running after cancellation");
let _ = std::thread::Builder::new()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If spawning the reaper thread fails, we currently drop the JoinHandle and lose the ability to ever join() it (and we also lose that signal in logs). Worth logging the error.

Suggested change
let _ = std::thread::Builder::new()
if let Err(err) = std::thread::Builder::new()
.name("camera-setup-reaper".to_string())
.spawn(move || {
let _ = handle.join();
})
{
warn!(?err, "Failed to spawn camera-setup-reaper thread");
}

.sample_buf
.format_desc()
.ok_or(NativeFrameError::NoFormatDesc)?;
match cidre::four_cc_to_str(&mut format_desc.media_sub_type().to_be_bytes()) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor readability thing: borrowing a temporary to_be_bytes() as &mut is a little surprising. A local let mut fourcc = …; makes this easier to follow.

Suggested change
match cidre::four_cc_to_str(&mut format_desc.media_sub_type().to_be_bytes()) {
let mut fourcc = format_desc.media_sub_type().to_be_bytes();
match cidre::four_cc_to_str(&mut fourcc) {

Comment thread apps/desktop/src/utils/socket.ts Outdated
initWebGPU(directCanvas as unknown as OffscreenCanvas)
.then((renderer) => {
mainThreadWebGPU = renderer;
// initWebGPU's catch covers the no-adapter case, so a separate

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but this repo tends to be strict about avoiding comments unless they capture a non-obvious invariant. This one feels like it can just be removed.

Suggested change
// initWebGPU's catch covers the no-adapter case, so a separate
const maybeSupported =
typeof navigator !== "undefined" && !!navigator.gpu;

@richiemcilroy

Copy link
Copy Markdown
Member Author

please re-review the pr @greptileai

settings,
new_frame_recipient,
native_frame_recipient,
ffmpeg_sender_count,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra arg means the setup_camera signatures need to stay in sync across cfgs. The #[cfg(target_os = "linux")] setup_camera overload currently doesn’t accept ffmpeg_sender_count, so this will fail to compile on Linux.

I’d add an _ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize> parameter to the Linux overload (even if unused for now), or otherwise keep the per-OS signatures aligned.

Comment thread apps/desktop/src/utils/socket.ts Outdated

const frameData = new Uint8ClampedArray(buffer, 0, totalSize);
renderNv12FrameCanvas2D(frameData, width, height, yStride);
renderNv12FrameCanvas2D(frameData, width, height, yStride, receivedAt);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Canvas2D NV12 path ultimately goes through the same conversion that assumes video-range (Y - 16). If the WS ever carries full-range NV12 (e.g. 420f cameras), preview colors will be off.

If that’s a real possibility, it may be worth encoding full-range vs video-range into the WS metadata (or using a distinct magic) and selecting the right coefficients in the main-thread NV12 converter.

tracing::info!("Socket connection established");
let now = std::time::Instant::now();

subscribers.fetch_add(1, Ordering::AcqRel);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that multiple subscribers are tracked, the module-level TOTAL_* / MAX_* atomics used for the 2s stats logs will mix across concurrent connections, and each connection’s swap(0) can discard the other connection’s counters.

If you expect multiple tabs/windows, pushing these counters into handle_socket (per-connection) would make the logs much easier to trust.

@richiemcilroy

Copy link
Copy Markdown
Member Author

hey @greptileai please re-review the pr

Comment on lines +44 to +59
function readNv12Metadata(buffer: ArrayBuffer): Nv12Metadata | null {
if (buffer.byteLength < NV12_METADATA_SIZE) return null;

const formatCheck = new DataView(buffer, buffer.byteLength - 4, 4);
const magic = formatCheck.getUint32(0, true);
if (magic !== NV12_VIDEO_MAGIC && magic !== NV12_FULL_MAGIC) return null;

const metadataOffset = buffer.byteLength - NV12_METADATA_SIZE;
const meta = new DataView(buffer, metadataOffset, NV12_METADATA_SIZE);
return {
yStride: meta.getUint32(0, true),
height: meta.getUint32(4, true),
width: meta.getUint32(8, true),
fullRange: magic === NV12_FULL_MAGIC,
};
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since readNv12Metadata is now the gate for all NV12 handling, it might be worth validating the buffer actually contains yStride * height + yStride * floor(height/2) bytes. Otherwise the later new Uint8ClampedArray(buffer, 0, totalSize) can throw and kill the WS message handler if a truncated frame ever shows up.

Suggested change
function readNv12Metadata(buffer: ArrayBuffer): Nv12Metadata | null {
if (buffer.byteLength < NV12_METADATA_SIZE) return null;
const formatCheck = new DataView(buffer, buffer.byteLength - 4, 4);
const magic = formatCheck.getUint32(0, true);
if (magic !== NV12_VIDEO_MAGIC && magic !== NV12_FULL_MAGIC) return null;
const metadataOffset = buffer.byteLength - NV12_METADATA_SIZE;
const meta = new DataView(buffer, metadataOffset, NV12_METADATA_SIZE);
return {
yStride: meta.getUint32(0, true),
height: meta.getUint32(4, true),
width: meta.getUint32(8, true),
fullRange: magic === NV12_FULL_MAGIC,
};
}
function readNv12Metadata(buffer: ArrayBuffer): Nv12Metadata | null {
if (buffer.byteLength < NV12_METADATA_SIZE) return null;
const formatCheck = new DataView(buffer, buffer.byteLength - 4, 4);
const magic = formatCheck.getUint32(0, true);
if (magic !== NV12_VIDEO_MAGIC && magic !== NV12_FULL_MAGIC) return null;
const metadataOffset = buffer.byteLength - NV12_METADATA_SIZE;
const meta = new DataView(buffer, metadataOffset, NV12_METADATA_SIZE);
const yStride = meta.getUint32(0, true);
const height = meta.getUint32(4, true);
const width = meta.getUint32(8, true);
const ySize = yStride * height;
const uvSize = yStride * Math.floor(height / 2);
const totalSize = ySize + uvSize;
if (yStride === 0 || height === 0 || buffer.byteLength - NV12_METADATA_SIZE < totalSize) {
return null;
}
return {
yStride,
height,
width,
fullRange: magic === NV12_FULL_MAGIC,
};
}

Comment on lines 1203 to 1205
const ySize = yStride * height;
const uvSize = yStride * (height / 2);
const totalSize = ySize + uvSize;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor but potentially sharp edge: readNv12Metadata validates sizes using Math.floor(height / 2), but this block uses height / 2. If height is ever odd, this can drift (and can even end up passing a non-integer length into the typed array in other call sites).

Suggested change
const ySize = yStride * height;
const uvSize = yStride * (height / 2);
const totalSize = ySize + uvSize;
const ySize = yStride * height;
const uvSize = yStride * Math.floor(height / 2);
const totalSize = ySize + uvSize;

@richiemcilroy

Copy link
Copy Markdown
Member Author

hey @greptileai please re-review the PR fully

@richiemcilroy richiemcilroy merged commit 3d8dae0 into main Jun 10, 2026
20 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant