Skip to content

feat(ui): smooth + cheap agent streaming (adaptive reveal + block-split markdown)#2716

Open
KarloAldrete wants to merge 5 commits into
PostHog:mainfrom
KarloAldrete:perf/streaming-markdown-blocks
Open

feat(ui): smooth + cheap agent streaming (adaptive reveal + block-split markdown)#2716
KarloAldrete wants to merge 5 commits into
PostHog:mainfrom
KarloAldrete:perf/streaming-markdown-blocks

Conversation

@KarloAldrete

@KarloAldrete KarloAldrete commented Jun 16, 2026

Copy link
Copy Markdown

Problem

Streamed agent replies have two issues:

  1. Choppy — tokens arrive in irregular bursts and are painted exactly as they land, so the text jumps forward unevenly instead of reading as smooth typing.
  2. Slow on long messagesAgentMessageMarkdownRenderer re-parses the entire accumulated markdown (react-markdown + remark-gfm) on every token: O(n) per token, O(n²) per message. Past ~8k chars a single token costs ~19 ms of parsing, over the 16.6 ms frame budget, so long answers stutter and saturate the main thread.

Addresses #2517 (smooth token streaming).

Changes

Two pieces that reinforce each other.

1. Block-split rendering (StreamingMarkdown + splitMarkdownBlocks) — the active message is split into top-level blocks (fenced code kept intact). Completed blocks keep a stable string, so the memoized MarkdownRenderer skips them and only the growing tail re-parses. An open code fence renders as plain text until it closes, so syntax highlighting runs once, not per token.

Measured on a 3,000-char answer streamed in ~250 tokens (jsdom + react-dom/server, default components — a lower bound):

total markdown CPU per token last token
full re-parse 878 ms 3.5 ms 6.8 ms
block-split 88 ms 0.35 ms 0.16 ms

~10× less, and the per-token cost stays flat instead of growing with length.

2. Steady reveal (useSmoothedText) — reveals the accumulated text at a constant ~120 chars/sec, character by character, honoring prefers-reduced-motion. Decouples paint from the bursty token arrival so the text reads as even typing.

The two are complementary: the reveal makes streaming look smooth, and the block-split keeps each frame cheap so it stays ~60 fps even on long messages. Completed messages render via a single full MarkdownRenderer parse, so the final output is byte-identical.

Relationship to #2685

#2685 also implements smooth token streaming. Per @charlesvien's review, this PR's reveal now matches the steadier cadence from #2685 (constant ~120 chars/sec, char by char). The distinct value here is the block-split renderer: it keeps the per-token markdown cost flat so the smooth reveal holds up on long messages — the part #2685 doesn't address. Happy to coordinate or fold these together however the team prefers.

How did you test this?

  • Unit tests: splitMarkdownBlocks.test.ts (round-trip / no text dropped, paragraph + fenced-block splitting, parseOpenFence incl. the completed-then-open-fence edge case) and useSmoothedText.test.ts (the pure nextRevealLength easing + the hook: immediate on mount, gradual steady reveal, snap-on-replace, reduced-motion).
  • pnpm --filter @posthog/ui test → 779 pass.
  • pnpm --filter @posthog/ui typecheck passes; pre-commit pnpm typecheck passes.
  • biome check clean on changed files; node scripts/check-host-boundaries.mjs reports no new violations.
  • Block-split numbers from a local micro-benchmark of the markdown render path (not committed).

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Created with PostHog Code

Each streamed token re-parsed the entire accumulated markdown of the active
agent message (react-markdown + remark-gfm) — O(n) per token, O(n^2) per
message. Past ~8k chars a single token costs ~19ms of parsing, over the 16.6ms
frame budget, so long answers stutter and saturate the main thread.

StreamingMarkdown splits the active message into top-level blocks (fenced code
kept intact). Completed blocks keep a stable string so the memoized
MarkdownRenderer skips them and only the growing tail re-parses; an open code
fence renders as plain text until it closes, so syntax highlighting runs once
rather than per token. On a 3k-char answer this is ~10x less markdown CPU
(878ms -> 88ms total; 3.5ms -> 0.35ms per token) and the per-token cost stays
flat instead of growing with message length. Completed messages still render
via a single full MarkdownRenderer parse, so output is byte-identical — no
visual change, only less work per token.

Complements PostHog#2685 (smooth token reveal): that PR eases when text appears, this
one keeps rendering it cheap so the smooth reveal stays at 60fps even on long
messages.

Generated-By: PostHog Code
Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
packages/ui/src/features/editor/components/StreamingMarkdown.tsx:16-22
**`parseOpenFence` targets the first fence, not the unterminated one**

`/.exec(block)` finds the first fence marker in the block. If the tail block contains a completed fence followed by an open fence with no blank line between them — e.g. `` ```ts\ncode\n```\ntext\n```ts\npartial `` (no blank lines, so `splitMarkdownBlocks` keeps it as one block) — `before` becomes `""` and `code` absorbs everything from the first fence opener onward, including the intermediate closing fence and second opening marker. The result is garbled plain-text output during streaming (it self-corrects once the fence closes and `MarkdownRenderer` takes over, but the in-progress display is wrong).

A fix is to scan all lines with the same fence-tracking logic as `hasOpenCodeFence`, record the last fence-open line index, then slice `before` and `code` relative to that line.

### Issue 2 of 3
packages/ui/src/features/editor/components/splitMarkdownBlocks.ts:59-74
**Fence-tracking logic duplicated between the two exported functions**

`hasOpenCodeFence` re-implements the identical per-line regex match + `inFence`/`fenceChar` state machine that lives inside `splitMarkdownBlocks`. Extracting a shared helper (e.g. `parseFenceState(lines)` that returns `{ inFence, fenceChar }`) would satisfy the codebase's OnceAndOnlyOnce rule and make future spec-compliance fixes (e.g. requiring the closing fence to be at least as long as the opening fence) apply to both functions automatically.

### Issue 3 of 3
packages/ui/src/features/editor/components/splitMarkdownBlocks.test.ts:5-44
**Tests prefer parameterised style over manual loops and multi-assertion cases**

The "never drops text" case iterates over samples with a `for` loop, and the `hasOpenCodeFence` suite packs four independent assertions into one `it`. Vitest's `it.each` / `test.each` makes each sample a separate, named test entry — failures name the exact input and expected output, which saves time when something breaks. Converting both suites to parameterised form would align with the project's stated preference.

Reviews (1): Last reviewed commit: "perf(markdown): re-parse only the tail b..." | Re-trigger Greptile

Comment thread packages/ui/src/features/editor/components/StreamingMarkdown.tsx Outdated
Comment thread packages/ui/src/features/editor/components/splitMarkdownBlocks.ts
Comment thread packages/ui/src/features/editor/components/splitMarkdownBlocks.test.ts Outdated
…fence, it.each

- Extract a single stepFence state machine shared by splitMarkdownBlocks,
  hasOpenCodeFence and parseOpenFence (OnceAndOnlyOnce — no duplicated
  fence-tracking logic).
- parseOpenFence now targets the LAST unterminated fence, so a completed fence
  earlier in the same block stays in `before` and renders normally instead of
  being swallowed as plain text mid-stream.
- Parameterize the splitter tests with it.each and add parseOpenFence cases,
  including the completed-then-open fence edge case.

Generated-By: PostHog Code
Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
@KarloAldrete

Copy link
Copy Markdown
Author

Thanks for the review! Addressed all three in 3f6bc14:

  1. parseOpenFence targeting the wrong fence — it now walks the block tracking fence state and targets the last unterminated fence, so a completed fence earlier in the same block stays in before and renders normally instead of being swallowed as plain text mid-stream. Added a test for the completed-then-open-fence case.
  2. Duplicated fence logic — extracted a single stepFence state machine shared by splitMarkdownBlocks, hasOpenCodeFence, and parseOpenFence.
  3. Tests — parameterized the splitter suite with it.each and split the hasOpenCodeFence assertions into individual cases.

pnpm --filter @posthog/ui test → 762 pass; typecheck + biome clean.

Smooth the reveal of streamed tokens on top of the block-split renderer.
useSmoothedText eases the displayed prefix toward the accumulated text via
requestAnimationFrame with a step proportional to the backlog, so it tracks a
fast token stream (catching up in ~6 frames) instead of a fixed chars/sec that
lags behind and then snaps. It reveals on word boundaries so partial markdown
tokens never flash, and shows completed (non-streaming) messages in full
immediately.

Pairs with the block-split renderer: only the growing tail re-parses each
frame, so the reveal holds ~60fps even on long messages — which a fixed-rate
reveal over a full markdown re-parse can't.

Generated-By: PostHog Code
Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
@KarloAldrete KarloAldrete changed the title perf(markdown): re-parse only the tail block while streaming feat(ui): smooth + cheap agent streaming (adaptive reveal + block-split markdown) Jun 17, 2026
Resolve AgentMessage.tsx: keep main's Box className (drops py-1) together with
the streaming changes (StreamingMarkdown + adaptive reveal).

Generated-By: PostHog Code
Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
@charlesvien

charlesvien commented Jun 18, 2026

Copy link
Copy Markdown
Member

Hey @KarloAldrete, the streaming in #2685 is much smoother and aligned with the expectation I had for how this feels. Anyway to adjust that here?

Thanks for your contribution!

…review)

@charlesvien preferred the steadier reveal in PostHog#2685. Switch useSmoothedText from
an adaptive (backlog-proportional) rate to a constant ~120 chars/sec, character
by character (dropping the word-boundary stepping), and honor
prefers-reduced-motion — so the cadence reads as even typing. The block-split
renderer is unchanged and still keeps the per-token markdown cost flat on long
messages.

Generated-By: PostHog Code
Task-Id: 8e9f327d-84b1-4608-9f48-c3038dbf87ca
@KarloAldrete

Copy link
Copy Markdown
Author

Thanks @charlesvien! Switched the reveal in c51416c to match #2685's cadence — a constant ~120 chars/sec, character by character (dropped the adaptive, backlog-proportional stepping and the word-boundary snapping that made ours feel jumpier), and it now honors prefers-reduced-motion too. Should feel like #2685's now.

The block-split renderer is unchanged, so the steady reveal stays ~60fps even on long messages (only the growing tail re-parses each frame). Let me know how it feels!

@charlesvien

Copy link
Copy Markdown
Member

Thanks @charlesvien! Switched the reveal in c51416c to match #2685's cadence — a constant ~120 chars/sec, character by character (dropped the adaptive, backlog-proportional stepping and the word-boundary snapping that made ours feel jumpier), and it now honors prefers-reduced-motion too. Should feel like #2685's now.

The block-split renderer is unchanged, so the steady reveal stays ~60fps even on long messages (only the growing tail re-parses each frame). Let me know how it feels!

Is the performance still decent from your testing?

@KarloAldrete

Copy link
Copy Markdown
Author

Yep — perf is the whole point of this PR, and the cadence change didn't touch it (the reveal rate and the block-split renderer are independent). Two angles:

1. Measured in real Chromium (Playwright, client-rendered into a real DOM, on a Ryzen 9 5900X) — total main-thread CPU to stream a full message token by token:

message length full re-parse (main) block-split (this PR) speedup
1,000 chars 46 ms 18 ms 2.6×
2,000 chars 130 ms 25 ms 5.2×
4,000 chars 417 ms 43 ms 9.7×
8,000 chars 1,361 ms 81 ms 17×

Full re-parse is ~O(n²) (re-renders the whole message every token); block-split is ~O(n) (only the growing tail re-parses), so the longer the answer the bigger the win — an 8k-char answer drops from ~1.4 s of main-thread CPU to ~80 ms. Default components here, so it's a lower bound (the app also re-runs syntax highlighting per token, which block-split avoids too).

2. I've been dogfooding it as my daily driver. I'm in a single conversation that's ~370K tokens deep right now and it still hasn't stuttered or felt janky — the perf issues I used to hit are gone. That's honestly why I kept iterating on it.

And if the cadence still doesn't feel right to you, I'm happy to keep tuning and digging into it — just say the word. Thanks for the thoughtful review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants