Optimize camera preview with native macOS GPU path#1909
Conversation
|
Paragon Review Skipped Hi @richiemcilroy! Your Polarity credit balance is insufficient to complete this review. Please visit https://app.paragon.run to finish your review. |
|
Superagent didn't find any vulnerabilities or security issues in this PR. |
| ); | ||
|
|
||
| let (drop_tx, drop_rx) = oneshot::channel(); | ||
| let renderer = ManuallyDrop::new(renderer); |
There was a problem hiding this comment.
ManuallyDrop here means if run_on_main_thread fails, renderer never gets dropped (and the shutdown wait will always time out). Might be safer to avoid ManuallyDrop and let the closure being dropped clean up if scheduling fails.
| let renderer = ManuallyDrop::new(renderer); | |
| let (drop_tx, drop_rx) = oneshot::channel(); | |
| let renderer = Box::new(renderer); | |
| let _ = window.run_on_main_thread(move || { | |
| drop(renderer); | |
| let _ = drop_tx.send(()); | |
| }); |
| .blur_processor | ||
| .as_mut() | ||
| .and_then(|p| p.process_returning_output()) | ||
| .expect("blurred flag guarantees output"); |
There was a problem hiding this comment.
This expect will take down the preview thread if the blur processor ever returns None unexpectedly. Since the non-blurred path is already available, consider a graceful fallback instead of panicking.
| .expect("blurred flag guarantees output"); | |
| let Some(blur_output) = self | |
| .blur_processor | |
| .as_mut() | |
| .and_then(|p| p.process_returning_output()) | |
| else { | |
| prepared.render(surface, frame_data, frame_stride); | |
| return true; | |
| }; |
| self.aspect_ratio = Cached::default(); | ||
|
|
||
| let surface = self.surface.take(); | ||
| let surface = ManuallyDrop::new(self.surface.take()); |
There was a problem hiding this comment.
Same pattern as above: ManuallyDrop + ignoring the result means a failure to schedule onto the main thread turns into a silent leak. Dropping an Option<Surface> via a moved closure is enough here.
| let surface = ManuallyDrop::new(self.surface.take()); | |
| let surface = self.surface.take(); | |
| let (drop_tx, drop_rx) = oneshot::channel(); | |
| let _ = window.run_on_main_thread(move || { | |
| drop(surface); | |
| let _ = drop_tx.send(()); | |
| }); |
| // converted to derive VideoInfo; afterwards skip the full-frame | ||
| // copy entirely when nothing consumes ffmpeg frames. | ||
| if ready_signal.is_none() | ||
| && ffmpeg_sender_count.load(std::sync::atomic::Ordering::Relaxed) == 0 |
There was a problem hiding this comment.
Since ffmpeg_sender_count gates whether we skip the conversion, it’d be good to match the Release stores with an Acquire load here (and on the other occurrence) so this isn’t relying on relaxed visibility.
| && ffmpeg_sender_count.load(std::sync::atomic::Ordering::Relaxed) == 0 | |
| && ffmpeg_sender_count.load(std::sync::atomic::Ordering::Acquire) == 0 |
| let _ = handle.join(); | ||
| } else { | ||
| warn!("Camera setup thread is still running after cancellation"); | ||
| let _ = std::thread::Builder::new() |
There was a problem hiding this comment.
If spawning the reaper thread fails, we currently drop the JoinHandle and lose the ability to ever join() it (and we also lose that signal in logs). Worth logging the error.
| let _ = std::thread::Builder::new() | |
| if let Err(err) = std::thread::Builder::new() | |
| .name("camera-setup-reaper".to_string()) | |
| .spawn(move || { | |
| let _ = handle.join(); | |
| }) | |
| { | |
| warn!(?err, "Failed to spawn camera-setup-reaper thread"); | |
| } |
| .sample_buf | ||
| .format_desc() | ||
| .ok_or(NativeFrameError::NoFormatDesc)?; | ||
| match cidre::four_cc_to_str(&mut format_desc.media_sub_type().to_be_bytes()) { |
There was a problem hiding this comment.
Minor readability thing: borrowing a temporary to_be_bytes() as &mut is a little surprising. A local let mut fourcc = …; makes this easier to follow.
| match cidre::four_cc_to_str(&mut format_desc.media_sub_type().to_be_bytes()) { | |
| let mut fourcc = format_desc.media_sub_type().to_be_bytes(); | |
| match cidre::four_cc_to_str(&mut fourcc) { |
| initWebGPU(directCanvas as unknown as OffscreenCanvas) | ||
| .then((renderer) => { | ||
| mainThreadWebGPU = renderer; | ||
| // initWebGPU's catch covers the no-adapter case, so a separate |
There was a problem hiding this comment.
Nit, but this repo tends to be strict about avoiding comments unless they capture a non-obvious invariant. This one feels like it can just be removed.
| // initWebGPU's catch covers the no-adapter case, so a separate | |
| const maybeSupported = | |
| typeof navigator !== "undefined" && !!navigator.gpu; |
|
please re-review the pr @greptileai |
| settings, | ||
| new_frame_recipient, | ||
| native_frame_recipient, | ||
| ffmpeg_sender_count, |
There was a problem hiding this comment.
This extra arg means the setup_camera signatures need to stay in sync across cfgs. The #[cfg(target_os = "linux")] setup_camera overload currently doesn’t accept ffmpeg_sender_count, so this will fail to compile on Linux.
I’d add an _ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize> parameter to the Linux overload (even if unused for now), or otherwise keep the per-OS signatures aligned.
|
|
||
| const frameData = new Uint8ClampedArray(buffer, 0, totalSize); | ||
| renderNv12FrameCanvas2D(frameData, width, height, yStride); | ||
| renderNv12FrameCanvas2D(frameData, width, height, yStride, receivedAt); |
There was a problem hiding this comment.
This Canvas2D NV12 path ultimately goes through the same conversion that assumes video-range (Y - 16). If the WS ever carries full-range NV12 (e.g. 420f cameras), preview colors will be off.
If that’s a real possibility, it may be worth encoding full-range vs video-range into the WS metadata (or using a distinct magic) and selecting the right coefficients in the main-thread NV12 converter.
| tracing::info!("Socket connection established"); | ||
| let now = std::time::Instant::now(); | ||
|
|
||
| subscribers.fetch_add(1, Ordering::AcqRel); |
There was a problem hiding this comment.
Now that multiple subscribers are tracked, the module-level TOTAL_* / MAX_* atomics used for the 2s stats logs will mix across concurrent connections, and each connection’s swap(0) can discard the other connection’s counters.
If you expect multiple tabs/windows, pushing these counters into handle_socket (per-connection) would make the logs much easier to trust.
|
hey @greptileai please re-review the pr |
| function readNv12Metadata(buffer: ArrayBuffer): Nv12Metadata | null { | ||
| if (buffer.byteLength < NV12_METADATA_SIZE) return null; | ||
|
|
||
| const formatCheck = new DataView(buffer, buffer.byteLength - 4, 4); | ||
| const magic = formatCheck.getUint32(0, true); | ||
| if (magic !== NV12_VIDEO_MAGIC && magic !== NV12_FULL_MAGIC) return null; | ||
|
|
||
| const metadataOffset = buffer.byteLength - NV12_METADATA_SIZE; | ||
| const meta = new DataView(buffer, metadataOffset, NV12_METADATA_SIZE); | ||
| return { | ||
| yStride: meta.getUint32(0, true), | ||
| height: meta.getUint32(4, true), | ||
| width: meta.getUint32(8, true), | ||
| fullRange: magic === NV12_FULL_MAGIC, | ||
| }; | ||
| } |
There was a problem hiding this comment.
Since readNv12Metadata is now the gate for all NV12 handling, it might be worth validating the buffer actually contains yStride * height + yStride * floor(height/2) bytes. Otherwise the later new Uint8ClampedArray(buffer, 0, totalSize) can throw and kill the WS message handler if a truncated frame ever shows up.
| function readNv12Metadata(buffer: ArrayBuffer): Nv12Metadata | null { | |
| if (buffer.byteLength < NV12_METADATA_SIZE) return null; | |
| const formatCheck = new DataView(buffer, buffer.byteLength - 4, 4); | |
| const magic = formatCheck.getUint32(0, true); | |
| if (magic !== NV12_VIDEO_MAGIC && magic !== NV12_FULL_MAGIC) return null; | |
| const metadataOffset = buffer.byteLength - NV12_METADATA_SIZE; | |
| const meta = new DataView(buffer, metadataOffset, NV12_METADATA_SIZE); | |
| return { | |
| yStride: meta.getUint32(0, true), | |
| height: meta.getUint32(4, true), | |
| width: meta.getUint32(8, true), | |
| fullRange: magic === NV12_FULL_MAGIC, | |
| }; | |
| } | |
| function readNv12Metadata(buffer: ArrayBuffer): Nv12Metadata | null { | |
| if (buffer.byteLength < NV12_METADATA_SIZE) return null; | |
| const formatCheck = new DataView(buffer, buffer.byteLength - 4, 4); | |
| const magic = formatCheck.getUint32(0, true); | |
| if (magic !== NV12_VIDEO_MAGIC && magic !== NV12_FULL_MAGIC) return null; | |
| const metadataOffset = buffer.byteLength - NV12_METADATA_SIZE; | |
| const meta = new DataView(buffer, metadataOffset, NV12_METADATA_SIZE); | |
| const yStride = meta.getUint32(0, true); | |
| const height = meta.getUint32(4, true); | |
| const width = meta.getUint32(8, true); | |
| const ySize = yStride * height; | |
| const uvSize = yStride * Math.floor(height / 2); | |
| const totalSize = ySize + uvSize; | |
| if (yStride === 0 || height === 0 || buffer.byteLength - NV12_METADATA_SIZE < totalSize) { | |
| return null; | |
| } | |
| return { | |
| yStride, | |
| height, | |
| width, | |
| fullRange: magic === NV12_FULL_MAGIC, | |
| }; | |
| } |
| const ySize = yStride * height; | ||
| const uvSize = yStride * (height / 2); | ||
| const totalSize = ySize + uvSize; |
There was a problem hiding this comment.
Minor but potentially sharp edge: readNv12Metadata validates sizes using Math.floor(height / 2), but this block uses height / 2. If height is ever odd, this can drift (and can even end up passing a non-integer length into the typed array in other call sites).
| const ySize = yStride * height; | |
| const uvSize = yStride * (height / 2); | |
| const totalSize = ySize + uvSize; | |
| const ySize = yStride * height; | |
| const uvSize = yStride * Math.floor(height / 2); | |
| const totalSize = ySize + uvSize; |
|
hey @greptileai please re-review the PR fully |
Adds a zero-copy IOSurface→GPU native preview on macOS (now the default) and speeds up the WebSocket fallback with leaner frame pipelining, smarter camera feed routing, and improved format selection.
Greptile Summary
This PR introduces a zero-copy IOSurface→Metal→wgpu native camera preview for macOS (now the default) and overhauls the WebSocket fallback path with leaner frame pipelining, per-connection stats, subscriber-aware idle skipping, and correct full-range NV12 colour handling across the GPU, worker, and Canvas2D code paths.
camera_native.rs,iosurface_texture.rs): newNativeFrameConverterimports IOSurface planes as Metal textures and converts YUV→RGB in a single GPU pass with correct BT.601 coefficients for both video-range (420v) and full-range (420f) NV12 as well as BGRA cameras.camera_legacy.rs,frame_ws.rs): switches from broadcast towatchchannel, adds a frame pool to avoid per-frame allocations, skips all conversion work whensubscriber_count == 0, and propagatesCameraPreviewStatefor dimension-aware scaler sizing.socket.ts,frame-worker.ts,webgpu-renderer.ts,camera.tsx,target-select-overlay.tsx): addsNV12_FULL_FORMAT_MAGICthroughout, fixes full-range coefficients in all three rendering fallbacks (WebGPU, worker, Canvas2D), lazily creates the OffscreenCanvas worker, and delegates frame rendering toCanvasControlsin both camera windows.Confidence Score: 5/5
Safe to merge — all three issues from the previous review have been addressed and no new functional defects were found in this re-review.
The three previously flagged issues are fully resolved: the Linux overload now carries _ffmpeg_sender_count; per-connection WsFrameStats replaces the module-level atomics; and full-range NV12 coefficients are correctly applied in every rendering path (GPU shader, worker, and Canvas2D fallback). The new native GPU path, frame pooling, subscriber-count skip optimisation, format selection, and frontend refactor all look technically sound.
No files require special attention.
Important Files Changed
Comments Outside Diff (4)
crates/recording/src/feeds/camera.rs, line 969-977 (link)The
spawn_camera_setupcall tosetup_cameranow passes 6 arguments (…, ffmpeg_sender_count, native_sender_count), but the Linux#[cfg(target_os = "linux")]overload still declares only 5 parameters (noffmpeg_sender_count). This will produce a compile error on Linux. The Linux function needs an additional_ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize>parameter to stay in sync with the call site at line 403-411.Prompt To Fix With AI
apps/desktop/src/utils/socket.ts, line 65-133 (link)convertNv12ToRgbaMainThreadalways applies the video-range BT.601 formula (Y offset of 16, as in the WGSLyuv_video_rangeshader entry). When the camera delivers420f(full-range NV12), the GPU shader path correctly routes toyuv_full_range(no Y offset), but this JS Canvas2D fallback will produce visually slightly washed-out colours. For a user on a machine where WebGPU is unavailable, the preview colours will differ from what the recording captures.Prompt To Fix With AI
apps/desktop/src-tauri/src/frame_ws.rs, line 8-17 (link)TOTAL_BYTES_SENT,TOTAL_FRAMES_SENT, and the other diagnostic atomics are module-level singletons. If two clients connect tocreate_watch_frame_wssimultaneously (e.g., two browser tabs), their stats interleave and whichever connection happens to trigger the 2-second window first swaps all counters to 0, discarding the other connection's data. Each handler instance's log line will show a misleading mix of both connections' work, and real throughput will appear halved. Moving these stats into a per-handle_socketstruct would keep them isolated.Prompt To Fix With AI
crates/recording/src/feeds/camera.rs, line 972-979 (link)The
#[cfg(target_os = "linux")]overload ofsetup_camerastill declares only 5 parameters, butspawn_camera_setup(line ~403) now calls it with 6 arguments — addingffmpeg_sender_countbetweennative_frame_recipientandnative_sender_count. Linux will fail to compile with a type mismatch: the 5th positional arg (ffmpeg_sender_count: Arc<AtomicUsize>) will be matched against_native_sender_count, andnative_sender_countbecomes a 6th argument with no corresponding parameter. The fix is to add_ffmpeg_sender_count: Arc<std::sync::atomic::AtomicUsize>before_native_sender_countin the Linux overload.Prompt To Fix With AI
Reviews (4): Last reviewed commit: "fix: validate nv12 frame metadata" | Re-trigger Greptile