fix: add iOS private AX snapshot fallback #758

Open
thymikee wants to merge 7 commits into main from codex/ios-ax-snapshot-fallback

Conversation

@thymikee thymikee commented Jun 11, 2026

Member

Summary

Adds a simulator-only private AX snapshot fallback for iOS when XCTest returns a sparse application/window tree or fails while serializing AX snapshots.

The fallback uses XCTest private accessibility interfaces dynamically, maps the recovered tree back into existing SnapshotNode output, and keeps normal XCTest snapshots authoritative whenever they return real content. It also adds a conservative public XCTest-query recovery tier for regular sparse snapshots before falling back to private AX; compact interactive snapshots still use the private-AX/find recovery path because Bluesky showed public queries collapse there too.

Also fixes SpringBoard permission alerts in compact interactive snapshots: modal detection now runs before the compact app-tree shortcut, and permission sheets return alert text plus actionable button refs without broad SpringBoard subtree walks.

Hardens mutating find actions on iOS: when compact interactive snapshots collapse to the application root, find retries with a full snapshot, then with a query-scoped full snapshot if unscoped AX serialization fails on unrelated feed content.
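
As a rough illustration of the find retry order described above, here is a minimal TypeScript sketch. The `Capture` callback, the `CaptureMode` names, and `collapsedToRoot` are hypothetical stand-ins rather than the actual daemon API, and the real handler is presumably asynchronous; this version is synchronous for brevity.

```typescript
// Hypothetical node shape; the real SnapshotNode in the PR may differ.
interface SnapshotNode {
  type: string;
  label?: string;
}

// Hypothetical capture modes mirroring the three attempts in the description.
type CaptureMode = "compact-interactive" | "full" | "full-scoped";

// A capture either returns nodes or throws (e.g. on AX serialization failure).
type Capture = (mode: CaptureMode, query?: string) => SnapshotNode[];

// Assumed sparse check: the compact snapshot collapsed to the application root.
function collapsedToRoot(nodes: SnapshotNode[]): boolean {
  return nodes.length === 1 && nodes[0]?.type === "Application";
}

function fetchNodesForFind(capture: Capture, query: string): SnapshotNode[] {
  // 1. Normal path: the compact interactive snapshot.
  const compact = capture("compact-interactive");
  if (!collapsedToRoot(compact)) return compact;
  try {
    // 2. Retry with a full, unscoped snapshot.
    return capture("full");
  } catch {
    // 3. Unscoped serialization failed: retry scoped to the query.
    return capture("full-scoped", query);
  }
}
```

In this sketch a healthy compact snapshot returns immediately, so the extra captures are only paid when the compact tree has collapsed.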

Closes #701
Closes #761

Validation

  • pnpm build
  • pnpm check:quick
  • pnpm check:unit outside sandbox after sandbox listener failures
  • pnpm build:xcuitest
  • pnpm exec vitest run src/daemon/handlers/tests/find.test.ts
  • Compared #761 (fix(ios): recover sparse snapshot trees via element queries (React Native)) locally on Bluesky Home: its public-query recovery did not fix snapshot -i -c, did not expose the permission alert, and find Search click failed there.
  • Prototyped #761-style public query recovery on this branch. Kept it only for regular sparse snapshots; compact public-query recovery and a direct KVC AX-object bridge were tested on Bluesky and dropped because they still returned only the application root.
  • Manual iOS simulator smoke: opened Settings on iPhone 17 iOS 26.2 and verified snapshot -i -c returned usable refs with the rebuilt runner.
  • Manual Bluesky iOS 27 simulator smoke with Xcode 26.2: opened ~/Developer/Bluesky, logged in, reached Home/Discover, verified full Home snapshot exposed 133 nodes including homeScreen, composeFAB, feed tabs, bottom-bar IDs, and post content.
  • Manual Bluesky iOS 27 find smoke after simulator reboot: compact Home snapshot remained sparse by design, but find Search click succeeded through the fallback and navigated to Explore/Search; find Search exists also passed.
  • Manual Bluesky iOS 27 permission smoke: reset app privacy, triggered the native Photos permission sheet from Home, verified snapshot -i -c returned an Alert node with title/message and Select Photos, Allow Full Access, and Don’t Allow button refs, then clicked the Don’t Allow ref successfully.

Touched files: 8. Scope covers the iOS XCTest runner snapshot path plus daemon find fallback handling for sparse compact iOS snapshots.

github-actions Bot commented Jun 11, 2026


Size Report

| Metric | Base | Current | Diff |
| --- | --- | --- | --- |
| JS raw | 1.2 MB | 1.2 MB | -21.1 kB |
| JS gzip | 396.6 kB | 390.8 kB | -5.7 kB |
| npm tarball | 515.3 kB | 510.1 kB | -5.3 kB |
| npm unpacked | 1.7 MB | 1.7 MB | -16.3 kB |

Startup median (7 runs, lower is better):

| Scenario | Base | Current | Diff |
| --- | --- | --- | --- |
| CLI --version | 28.8 ms | 28.8 ms | +0.1 ms |
| CLI --help | 45.3 ms | 44.4 ms | -0.9 ms |

Top changed chunks:

| Chunk | Raw diff | Gzip diff |
| --- | --- | --- |
| dist/src/session.js | -13.1 kB | -3.4 kB |
| dist/src/2415.js | -5.4 kB | -1.7 kB |
| dist/src/args.js | -1.7 kB | -625 B |
| dist/src/123.js | +687 B | +282 B |
| dist/src/8173.js | -706 B | -201 B |

@thymikee thymikee force-pushed the codex/ios-ax-snapshot-fallback branch from f854218 to 2558791 Compare June 11, 2026 06:30
@thymikee thymikee force-pushed the codex/ios-ax-snapshot-fallback branch from 2558791 to 17345dd Compare June 11, 2026 06:55

Member Author

Code review

Verdict: minor issues — the design is sound and defensively coded, but there are several robustness gaps in the dynamic private-API plumbing and one cross-layer inconsistency.

Findings

  1. Minor: src/daemon/handlers/find.ts:209-212 vs RunnerTests+Snapshot.swift:282-287: the daemon's sparse check requires exactly 1 Application node while the runner treats ≤2 Application/Window nodes as sparse; on real devices (where the private fallback compiles out and the daemon retry is the only safety net) a 2-node Application+Window compact snapshot never triggers the full-snapshot retry.

  2. Minor: RunnerTests+Snapshot.swift:446-454: on AX snapshot failure the experimental private fallback now runs before the existing public depth-limited fallback, so non-sparse apps hitting "AX too large" with a depth flag silently switch recovery source from public to private API; the private result is accepted on a bare nodes.count > 1 check with no quality validation against the public alternative.

  3. Minor: RunnerTests+AXSnapshotFallback.swift:134,141: UInt(rawElementType) traps (crashes the runner process) if the private API ever returns a negative elementType; given the file treats this data as untrusted, this should be UInt(exactly:) with a fallback.

  4. Minor: RunnerAXSnapshotBridge.m:115-123,127,134: objc_msgSend is cast to an NSInteger-returning function for processID/processIdentifier, which actually return pid_t (int32); the arm64 ABI does not guarantee upper-32-bit extension on return, so the PID match in accessibilityApplicationForApplication: can spuriously fail (silently disabling the fallback) depending on compiler/Xcode version, which is exactly the cross-version fragility this code defends against elsewhere.

  5. Minor: RunnerAXSnapshotBridge.m:238: [(NSValue *)value getValue:&frame] copies the value's own size into a CGRect-sized stack buffer; if a future XCTest boxes the frame as a different type this is a stack buffer overflow. Use getValue:size: or verify objCType first.

  6. Minor: src/daemon/handlers/find.ts:209 (isSparseIosInteractiveSnapshot): gating on backend === 'xctest' also matches the macOS runner, so the "iOS" retry fires for macOS apps too; likely benign but broader than the name and PR intent suggest.

  7. Minor — legitimately minimal screens (splash/loading) now pay one private AX snapshot per runner snapshot plus an extra full uncompacted daemon capture per mutating find; bounded (at most 2 extra captures, 750 ms cache reuse) but a recurring latency tax with no "known sparse" memoization.

  8. Minor (tests) — the most intricate new logic, appendPrivateAXNode (scope/compact/interactiveOnly filtering, parent re-linking), is pure enough to unit-test with dictionaries but has none; daemon tests cover both retry paths well but lack negative cases (non-xctest backend, role locator rethrowing).
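
One way to close the gap in finding 1 would be for the daemon to apply the same structural rule the review attributes to the runner (at most 2 Application/Window nodes). The sketch below only illustrates that idea: `isSparseStructuralSnapshot`, the constant names, and the node shape are hypothetical, and the runner's exact rule may include conditions not quoted here.

```typescript
// Hypothetical node shape for illustration only.
interface SnapshotNode {
  type: string;
}

// Node types treated as purely structural (assumption based on finding 1).
const STRUCTURAL_TYPES: ReadonlySet<string> = new Set(["Application", "Window"]);

// Upper bound the review attributes to the runner's sparse check.
const MAX_SPARSE_NODES = 2;

// Sparse when the tree is tiny and contains nothing but structural nodes.
// An empty tree is treated as sparse too, which is an assumption of this sketch.
function isSparseStructuralSnapshot(nodes: SnapshotNode[]): boolean {
  return (
    nodes.length <= MAX_SPARSE_NODES &&
    nodes.every((node) => STRUCTURAL_TYPES.has(node.type))
  );
}
```

With this rule a 2-node Application+Window compact snapshot is classified as sparse, so the daemon retry would also fire on real devices where the private fallback is compiled out.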

Verified clean

Simulator-only gating is genuinely compile-time enforced (#if os(iOS) && targetEnvironment(simulator)); every private selector is respondsToSelector:-guarded and @try/@catch-wrapped, so cross-Xcode unrecognized-selector risk degrades to a logged failure; normal XCTest snapshots stay authoritative; the Xcode project uses fileSystemSynchronizedGroups so no pbxproj edit needed; the Swift import name for the bridge selector is correct.

Overall

Careful, well-layered (runner private fallback + daemon retry as the real-device net) and safe to land for simulators, but it carries real maintenance weight: ~450 lines of dynamic ObjC against undocumented XCTest internals whose selectors, return shapes, and ABI details can shift with any Xcode release. Draft #761 solves the same sparse-tree symptom with public XCUIElementQuery traversal — more maintainable and works on real devices, though it currently has a likely-broken detector and only hooks the full-tree path (not compact-interactive or AX-failure, which this PR covers). A pragmatic direction: public-API query recovery as the primary fallback (extended to the compact path) plus this PR's daemon-side find retry, keeping the private bridge only if query recovery proves insufficient.


Generated by Claude Code

… trees

Four fixes that turn the #758 private AX fallback from
works-on-one-tree-shape into reliable on Bluesky Home:

- Depth ladder: the AX server rejects bulk snapshot requests outright
  (kAXErrorIllegalArgument) once requested depth crosses a
  tree-size-dependent limit that moves with live content. Retry at
  56/40/24/12 instead of giving up after one attempt at 64.
- Real attribute identifiers: the server silently ignored the raw
  keypath strings the bridge passed, so every node came back with a
  zero frame (breaking ref taps and the interactive/compact filters,
  which is why 'snapshot -i -c' stayed sparse). Map keypaths through
  XCElementSnapshot.axAttributesForElementSnapshotKeyPaths (it returns
  an NSSet) and drop the mapper's expensive extras (automation type,
  window display id, base type) that pushed deep requests past the 30s
  main-thread watchdog.
- Viewport from the private root frame when the public windows query
  degrades to an infinite viewport, so off-screen drawer content stops
  passing the visibility filter.
- Runner source fingerprint now includes .m/.h, so bridge edits stop
  reusing stale cached runner builds.

Also hardens the bridge per review: UInt(exactly:) for untrusted
element types, pid_t-sized objc_msgSend for process id matching, and
objCType-checked NSValue frame decoding.
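
The viewport item in this commit message can be sketched roughly as follows. This is a TypeScript illustration of the idea, not the Swift runner code: `Rect`, `isDegenerate`, `effectiveViewport`, and `visibleNodes` are hypothetical names, and "degenerate" is assumed to mean a non-finite or empty rectangle.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Assumption: a viewport is unusable when it is infinite or has no area.
function isDegenerate(rect: Rect): boolean {
  return (
    !Number.isFinite(rect.width) ||
    !Number.isFinite(rect.height) ||
    rect.width <= 0 ||
    rect.height <= 0
  );
}

// Prefer the public viewport; fall back to the private root's own frame.
function effectiveViewport(publicViewport: Rect, privateRootFrame: Rect): Rect {
  return isDegenerate(publicViewport) ? privateRootFrame : publicViewport;
}

// Strict overlap test: rectangles that only touch at an edge do not intersect.
function intersects(a: Rect, b: Rect): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

// Keep only nodes whose frame overlaps the viewport.
function visibleNodes<T extends { frame: Rect }>(nodes: T[], viewport: Rect): T[] {
  return nodes.filter((node) => intersects(node.frame, viewport));
}
```

Under these assumptions, a closed drawer positioned entirely at negative x fails the overlap test against the root frame and is filtered out.
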
@thymikee

Member Author

On-device validation on Bluesky Home — the fallback does not fire as shipped, fixed in a follow-up branch

Validated this PR on a live Bluesky Home feed (iPhone 17 Pro simulator, iOS 27, Xcode 26.2, dev-client build from ~/Developer/Bluesky, logged in). As-is, this branch reproduces the exact same failures as main on that screen: full snapshot → IOS_AX_SNAPSHOT_FAILED (kAXErrorIllegalArgument), snapshot -i -c → 1 node. The runner log shows why: AGENT_DEVICE_RUNNER_PRIVATE_AX_SNAPSHOT_FAILED=Error kAXErrorIllegalArgument…, so the private bridge fails with the same error as the public API.

Root causes (all verified empirically with an instrumented bridge)

  1. The AX server rejects bulk snapshot requests above a tree-size-dependent depth. maxDepth: 64 fails outright on today's Bluesky Home; the same request succeeds at depth ≤56 (the threshold moves with live feed content — which is why this PR's original validation saw 133 nodes one day and I see the failure today). Fix: a depth ladder (64→56→40→24→12) instead of a single attempt. This is the same class of mitigation Appium uses (snapshotMaxDepth reduction).
  2. Every recovered node had a zero frame. The bridge passes raw keypath strings ("frame", "label"…) as the attributes array, but the AX server expects real attribute identifiers — it silently dropped them and returned frame-less snapshots. This is why snapshot -i -c "remained sparse by design": with zero rects, nothing passes the visibility filter, and ref taps would target (0,0) anyway. Fix: map keypaths via XCElementSnapshot.axAttributesForElementSnapshotKeyPaths:isMacOS: (note: it returns an NSSet, not NSArray) — and filter out the mapper's expansions (AutomationType, WindowDisplayId, ElementBaseType), which are so expensive on this tree that even depth-24 requests blow the 30s main-thread watchdog with them included.
  3. Off-screen drawer content leaked into compact snapshots: safeSnapshotViewport degrades to an infinite viewport on exactly the apps that need this fallback (the public windows query fails too), so Bluesky's closed drawer menu (at negative x) passed the visibility filter and produced refs that tap the wrong UI. Fix: use the private root's own frame as the viewport when the public one is degenerate.
  4. .m/.h files are not in the runner source fingerprint, so every bridge edit silently reuses a stale cached runner build — worth knowing for anyone iterating on this PR.
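
For readers skimming root cause 1, the depth ladder amounts to retrying the same request at decreasing depths. The TypeScript sketch below illustrates the pattern only; the actual fix lives in the runner, `snapshotWithDepthLadder` and `SnapshotAttempt` are hypothetical names, and this version retries on any thrown error rather than specifically on kAXErrorIllegalArgument.

```typescript
// Depths taken from the comment above: 64 first, then 56/40/24/12.
const DEPTH_LADDER = [64, 56, 40, 24, 12] as const;

// A snapshot attempt at a given depth; throws when the AX server rejects it.
type SnapshotAttempt<T> = (maxDepth: number) => T;

function snapshotWithDepthLadder<T>(
  attempt: SnapshotAttempt<T>,
  ladder: readonly number[] = DEPTH_LADDER,
): { depth: number; result: T } {
  let lastError: unknown = new Error("empty depth ladder");
  for (const depth of ladder) {
    try {
      // The first depth that succeeds wins, so the deepest usable tree is kept.
      return { depth, result: attempt(depth) };
    } catch (error) {
      lastError = error;
    }
  }
  // Every rung failed: surface the last failure to the caller.
  throw lastError;
}
```

Starting at the deepest rung keeps the common case cheap: a tree that still serializes at 64 never pays for a retry.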

Results with the fixes (branch fix/ios-ax-snapshot-depth-ladder, stacked on this PR)

| command | this PR (today, Bluesky Home) | with fixes |
| --- | --- | --- |
| snapshot | IOS_AX_SNAPSHOT_FAILED | 184 nodes, real frames, ~16s (ladder 64→56) |
| snapshot -i -c | 1 node | 43 interactive nodes with precise rects (feed tabs at x=6/141/283, drawer excluded) |
| ref tap from -i -c | n/a | lands on element centers, navigation verified |
| Settings (healthy app) | unchanged | unchanged (detector sees content, private path never fires) |

Also applied the review's bridge-hardening items (UInt(exactly:), pid_t-sized msgSend, objCType-checked NSValue decode).

Cross-check vs #761

#761's recovery primitive (public typed-query sweep) is demonstrably dead on this screen: the -i -c path is that sweep, and it deadlines with zero elements while coordinate taps work. Public query recovery may still help the milder failure class from #761's repro app (sparse snapshot() but working queries), and this PR already includes a public tier for the regular path — but for Bluesky-class trees the private bridge is the only thing that produces nodes, and with the fixes above it does so reliably. Suggest merging the follow-up branch into this PR before landing.

The all-structural sparse detector misses the common large-RN-tree case
where the typed-query sweep resolves one or two stray controls before
its 1s deadline: the payload has 'content', so recovery never fires,
yet 2 nodes is useless in practice. Treat deadline-truncated payloads
with <= 8 nodes as needing recovery, and only replace the original
payload when the recovered tree actually carries more nodes. Completed
sweeps on legitimately minimal screens stay untouched (not truncated).
@thymikee

Member Author

Update: raw snapshot -i -c on Bluesky Home is now solved — fixes pushed to this branch

Re the latest status ("the raw compact snapshot itself still does not work"): that was true of this branch's previous head, and there was one more failure mode beyond the depth ladder. Both fixes are now on this PR branch (fast-forwarded 9f42418..f0ede09, two commits from fix/ios-ax-snapshot-depth-ladder).

The second failure mode explains the flaky validations: the all-structural sparse detector misses the common case where the typed-query sweep resolves one or two stray controls before its 1s deadline (e.g. root + a single "Home" button). The payload then has "content", so recovery never fires — yet 2 nodes is useless. The compact result oscillated between 1 node (recovers) and 2–5 nodes (detector blocks recovery) depending on simulator load. Fixed by treating deadline-truncated payloads with ≤8 nodes as needing recovery, with a quality guard that only swaps in the recovered tree when it actually carries more nodes. Completed sweeps on legitimately minimal screens are not truncated, so they never pay for recovery.
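
The deadline-aware rule and its quality guard can be sketched as below. This is a TypeScript illustration of logic that lives in the Swift runner: `SnapshotPayload`, `needsRecovery`, and `choosePayload` are hypothetical names, and the all-structural check is a simplified stand-in for the existing detector.

```typescript
interface SnapshotNode {
  type: string;
  label?: string;
}

interface SnapshotPayload {
  nodes: SnapshotNode[];
  // True when the typed-query sweep hit its deadline before completing.
  truncated: boolean;
}

// Threshold from the comment above: truncated payloads with <= 8 nodes.
const TRUNCATED_RECOVERY_MAX_NODES = 8;

function needsRecovery(payload: SnapshotPayload): boolean {
  // Simplified stand-in for the existing all-structural detector.
  const allStructural = payload.nodes.every(
    (node) => node.type === "Application" || node.type === "Window",
  );
  if (allStructural) return true;
  // New rule: a deadline-truncated payload with only a few stray nodes.
  return payload.truncated && payload.nodes.length <= TRUNCATED_RECOVERY_MAX_NODES;
}

function choosePayload(
  original: SnapshotPayload,
  recover: () => SnapshotPayload,
): SnapshotPayload {
  if (!needsRecovery(original)) return original;
  const recovered = recover();
  // Quality guard: only swap when recovery actually carries more nodes.
  return recovered.nodes.length > original.nodes.length ? recovered : original;
}
```

Because `recover` is only invoked when `needsRecovery` is true, a completed sweep with real content never triggers the more expensive path in this sketch.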

Current behavior on Bluesky Home (iPhone 17 Pro sim, iOS 27, dev client, live feed):

  • snapshot -i -c: 41 interactive refs, three consecutive runs, no oscillation — drawer menu correctly excluded, feed tabs at precise rects, ref taps land on element centers.
  • snapshot (full): 184 nodes with real frames via the depth ladder.
  • Settings and other healthy apps: completed sweeps, recovery never fires.

Commits on the branch: depth ladder + mapped/filtered AX attributes (zero-frame fix) + root-frame viewport + .m/.h fingerprint + review hardening (UInt(exactly:), pid ABI, NSValue objCType), and the deadline-aware detector with an XCTest case. If you have unpushed local work in your worktree, rebase on the updated remote head before pushing.

thymikee added 2 commits June 11, 2026 12:44
- Sync the setup metadata script's fingerprint extension list with the
  runtime (.m/.h were added for the ObjC bridge), fixing the cache
  metadata parity test.
- Reduce find.ts complexity flagged by fallow: hoist the node fetcher
  into createFindNodeFetcher with a recoverSparseInteractiveSnapshot
  helper, split match disambiguation and resolution scoring into
  narrowMultipleMatches/resolvedTouchScore, extract rectsMatch.
…ble in snapshot output

Two transparency gaps from #701's 'no silent fallback' requirement:

- Runner-attached snapshot messages now surface as snapshot warnings
  (readAppleSnapshotResult previously dropped them), so every recovery
  through the fallback accessibility backend or query tier is announced,
  states what it usually means (the app publishes an unhealthy
  accessibility tree - fixing the app is the real cure), and points to
  screenshot as visual truth.

- A leaf whose label merges many comma-joined segments is flagged as a
  collapsed accessible container: the app marks a container accessible,
  hiding every descendant from assistive tech and automation alike.
  Nothing can be recovered below it (VoiceOver sees the same merged
  element), so the warning names the node, estimates the merged label
  count, and gives the app-side fix plus the screenshot/coordinate-tap
  workaround.

Validated live on the lab stress fixture (adlab://stress?accessible=1):
the 6-node tree now carries '@E5 [Other] merges ~126 labels...'.
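
A rough sketch of the collapsed-container heuristic from this commit message, in TypeScript. The segment threshold, `LeafCandidate`, and both function names are hypothetical; only the "@E5 [Other] merges ~126 labels" prefix of the warning comes from the text above, so the rest of the real wording is not reproduced.

```typescript
// Hypothetical view of a snapshot node, reduced to what the heuristic needs.
interface LeafCandidate {
  ref: string;
  type: string;
  label?: string;
  childCount: number;
}

// Assumed threshold for "many" comma-joined segments; the real value is unknown.
const MIN_MERGED_SEGMENTS = 8;

// Estimate how many labels were merged by counting non-empty comma segments.
function estimateMergedLabels(label: string): number {
  return label
    .split(",")
    .map((segment) => segment.trim())
    .filter((segment) => segment.length > 0).length;
}

// Returns a warning for a leaf that looks like a collapsed accessible container.
function collapsedContainerWarning(node: LeafCandidate): string | undefined {
  if (node.childCount > 0 || !node.label) return undefined;
  const merged = estimateMergedLabels(node.label);
  if (merged < MIN_MERGED_SEGMENTS) return undefined;
  return `@${node.ref} [${node.type}] merges ~${merged} labels`;
}
```

The count is only an estimate, since a single label may legitimately contain commas.
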
@thymikee

thymikee commented Jun 11, 2026

Member Author

Real-world validation: production RN app login (reporter-provided build) — recovered after two more detector/plumbing fixes

Tested a reporter-provided simulator build of a production React Native app (full-screen accessibilityViewIsModal overlay shape) on iPhone 17 / iOS 26.2. Stock behavior reproduced exactly: snapshot -i → 2 nodes (a labeled application + window) while the login screen is fully rendered.

This branch initially did not recover it, for two reasons now fixed (pushed as 476979bf2):

  1. The sparse detector counted the app name as content. isSparseApplicationWindowTree required every node to be label-free and non-hittable — but the Application root carries the app's display name as its label and computes hittable (full-screen). The app name says nothing about tree health; Application/Window labels and root hittability no longer count. (Identifiers/values still do.)
  2. Snapshot warnings were silently dropped by the daemon capture chain: CaptureSnapshotResult didn't carry them, so the recovery transparency added earlier only worked in the runtime/commands layer. Now threaded through to CLI output.
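
The detector change in item 1 can be illustrated with a small TypeScript sketch. The real check is the Swift isSparseApplicationWindowTree; `AXNode`, `countsAsContent`, and `isSparseTreeSketch` are hypothetical names, the first node is assumed to be the root, and counting a hittable non-root Window as content is a guess where the comment is not explicit.

```typescript
// Hypothetical node shape for illustration only.
interface AXNode {
  type: string;
  label?: string;
  identifier?: string;
  value?: string;
  hittable?: boolean;
}

function countsAsContent(node: AXNode, isRoot: boolean): boolean {
  const structural = node.type === "Application" || node.type === "Window";
  // Any non-structural node (button, text field, ...) is real content.
  if (!structural) return true;
  // Identifiers and values still count, even on Application/Window nodes.
  if (node.identifier || node.value) return true;
  // Labels on Application/Window are ignored (the app display name says
  // nothing about tree health), and so is hittability of the root.
  return node.hittable === true && !isRoot;
}

// Sparse when no node in the tree carries content under the rules above.
function isSparseTreeSketch(nodes: AXNode[]): boolean {
  return !nodes.some((node, index) => countsAsContent(node, index === 0));
}
```

With these rules the reported shape, a labeled and hittable Application plus a Window, is classified as sparse, so recovery is allowed to run.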

Result on that login screen:

Snapshot: 16 nodes (truncated)
Recovered iOS snapshot through XCTest accessibility element queries after the public snapshot tree was sparse. This usually means the app publishes an unhealthy accessibility tree - fixing the app accessibility is the real cure. ...
@e5 [text-field] "Email or username input field" [editable]
@e10 [button] "Sign in"

fill @e5 + get text @e5 round-trips correctly — the full agent workflow operates on a screen that previously exposed nothing.

Notably this shape recovered through the public query tier (XCUIElementQuery sees through the modal overlay; no private API involved) — evidence the three-tier design is right: queries first, private AX only for Bluesky-class trees where queries also fail. Healthy apps and Bluesky behavior unchanged; Swift detector tests extended with the labeled-root case; all gates green (750 tests, fallow, lint).

… through the daemon

Validated against a real-world repro (a production React Native app's
login screen, simulator build provided privately by the reporter): a
full-screen accessibilityViewIsModal overlay leaves the public snapshot
with just Application+Window. Two gaps kept recovery off:

- The sparse detector counted the Application label (the app's display
  name) as content and the full-screen root as hittable, so the app
  name alone defeated recovery. Application/Window labels and root
  hittability say nothing about tree health and no longer count.
- Interactor-level snapshot warnings were dropped by the daemon capture
  chain (only the runtime/commands layer kept them); they now thread
  through CaptureSnapshotResult into BackendSnapshotResult.

With both fixes that login screen recovers through the public query
tier: 16 nodes with every control addressable (fill @ref + read-back
verified), and the output carries the recovery warning. Bluesky-class
trees still ladder into the private fallback unchanged.
@thymikee thymikee force-pushed the codex/ios-ax-snapshot-fallback branch from 6ddfce4 to 476979b Compare June 11, 2026 13:01
Development

Successfully merging this pull request may close these issues.

Add iOS Simulator AX snapshot fallback for XCTest snapshot failures
