
feat: support richer token cost tracking for ask#1353

Open
jsourcebot wants to merge 6 commits into main from jminnetian/tool-token-cost-tracking


jsourcebot commented Jun 19, 2026
[Screenshot: 2026-06-18 at 5:43:17 PM]

Adds per-step token cost tracking and estimates of tool call token usage, both embedded in the chat history. Tool call token usage figures are estimates because a single step can run multiple tool calls, and there is no mechanism to discern which part of the input token cost came from which tool call.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added per-step token cost tracking in chat conversations, including an “answer” step when applicable
    • Added tool token badges showing estimated token counts for tool outputs across chat thread details
    • Introduced offline token estimation utilities to support these estimates
  • Bug Fixes

    • Improved thinking-step and “answer” step indexing logic for more accurate chat details rendering
  • Tests

    • Updated and expanded tests for token estimation and chat UI rendering (including streaming/mocked message shapes)

Estimate the input-token footprint of each tool call's output (the cost
the result imposes when fed back to the model on subsequent steps) using
a local length-based estimator, persist it per tool call in the chat
message metadata, and surface it inline in each tool call row next to
the Details toggle. Estimates are ~-prefixed to keep them distinct from
the authoritative billed token totals.
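A minimal sketch of what a local, length-based estimator with a ~-prefixed display could look like. It assumes the uniform ~2 chars/token ratio mentioned in the commit notes; the function bodies and the `formatEstimatedTokens` helper are illustrative guesses, not the PR's code.

```typescript
// Hypothetical sketch of a length-based estimator; not the PR's code.
// Assumption: ~2 characters per token, the ratio the commit notes describe.
const CHARS_PER_TOKEN = 2;

function estimateTokenCount(text: string): number {
  // Round up so any non-empty text counts as at least one token.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Prefix with "~" so estimates stay visually distinct from billed totals.
function formatEstimatedTokens(tokens: number): string {
  return `~${tokens.toLocaleString("en-US")}`;
}

console.log(formatEstimatedTokens(estimateTokenCount("const x = 1;"))); // ~6
```

Keeping the "~" in the formatter (rather than in each badge) means every surface that shows an estimate marks it the same way.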

Record the provider-reported input/output token usage of each agent step
in the chat message metadata and display it per step group in the
thinking steps view (joined to UI step groups via the step index now
tagged on each tool token usage entry).
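Joining per-step usage to UI step groups by step index can be sketched as a single map lookup. The `StepTokenUsageEntry` fields, `UiStepGroup`, and `joinUsageToSteps` below are hypothetical stand-ins; the PR's real types may carry different fields.

```typescript
// Hypothetical shapes: the real StepTokenUsageEntry fields may differ.
interface StepTokenUsageEntry {
  stepIndex: number;
  inputTokens: number;
  outputTokens: number;
}

interface UiStepGroup {
  stepIndex: number;
  label: string;
}

// Index usage entries by stepIndex once, then look each UI group up in O(1).
function joinUsageToSteps(
  groups: UiStepGroup[],
  usage: StepTokenUsageEntry[],
): Array<UiStepGroup & { usage?: StepTokenUsageEntry }> {
  const byStep = new Map<number, StepTokenUsageEntry>();
  for (const entry of usage) {
    byStep.set(entry.stepIndex, entry);
  }
  // Groups without a matching entry (e.g. older messages) get no usage.
  return groups.map((group) => ({ ...group, usage: byStep.get(group.stepIndex) }));
}

const joined = joinUsageToSteps(
  [{ stepIndex: 0, label: "search" }, { stepIndex: 1, label: "read file" }],
  [{ stepIndex: 1, inputTokens: 5400, outputTokens: 120 }],
);
console.log(joined[1].usage?.inputTokens); // 5400
```

Joining on an explicit index rather than array position is what lets a missing or extra entry degrade to "no badge" instead of shifting every row.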

Also fix the tool output estimator to measure the model-visible payload:
tools with a toModelOutput mapping (all builtins) send only their output
text to the model, so estimating the raw ToolResult object was counting
UI-only metadata the model never sees. The bytes-per-token ratio is now
a uniform ~2 chars/token, calibrated against provider-reported per-step
usage of code-heavy tool results.
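Measuring only the model-visible payload means estimating the output after the toModelOutput mapping, not the raw ToolResult. A minimal sketch under stated assumptions: `ModelToolOutput` is a simplified local stand-in, not the exact `ToolResultOutput` type from `@ai-sdk/provider-utils`, and `estimateModelOutputTokens` is illustrative rather than the PR's `estimateModelToolOutputTokens`.

```typescript
// Assumption: a simplified stand-in for the model-facing tool output shape,
// not the exact ToolResultOutput type from @ai-sdk/provider-utils.
type ModelToolOutput =
  | { type: "text"; value: string }
  | { type: "json"; value: unknown }
  | {
      type: "content";
      value: Array<{ type: "text"; text: string } | { type: "media"; data: string; mediaType: string }>;
    };

const CHARS_PER_TOKEN = 2;

function estimateFromText(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Estimate only what the model sees, so UI-only metadata is never counted.
function estimateModelOutputTokens(output: ModelToolOutput): number {
  switch (output.type) {
    case "text":
      return estimateFromText(output.value);
    case "json": {
      // JSON reaches the model serialized, so measure the serialized string.
      const json = JSON.stringify(output.value);
      return estimateFromText(typeof json === "string" ? json : "");
    }
    case "content":
      // Sketch only: media parts are skipped because their cost is
      // provider-specific and not proportional to the encoded data length.
      return output.value.reduce(
        (sum, part) => sum + (part.type === "text" ? estimateFromText(part.text) : 0),
        0,
      );
  }
}

console.log(estimateModelOutputTokens({ type: "json", value: { a: 1 } })); // 4
```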

Collect usage from researchStream.steps and response.messages after the
stream completes (covers approval-gated and failed tool calls, off the
hot path), nest tool estimates under their step in a single
stepTokenUsage array, and join UI steps to entries by stepIndex.
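Nesting tool estimates under their step in one stepTokenUsage array can be sketched as below. The walkthrough for agent.ts also says tool outputs that no step claims are attached to the first step, and the sketch mirrors that. `StepSummary`, the field names (`toolCallIds`, `tools`, `estimatedOutputTokens`), and `buildStepTokenUsage` are hypothetical; the real entry shapes may differ.

```typescript
// Hypothetical input/output shapes for illustration only.
interface StepSummary {
  inputTokens: number;
  outputTokens: number;
  toolCallIds: string[];
}

interface ToolTokenUsageEntry {
  toolCallId: string;
  estimatedOutputTokens: number;
}

interface StepTokenUsageEntry {
  stepIndex: number;
  inputTokens: number;
  outputTokens: number;
  tools: ToolTokenUsageEntry[];
}

// estimates: toolCallId -> estimated tokens, gathered after the stream completes.
function buildStepTokenUsage(steps: StepSummary[], estimates: Map<string, number>): StepTokenUsageEntry[] {
  const claimed = new Set<string>();
  const result: StepTokenUsageEntry[] = steps.map((step, stepIndex) => {
    const tools: ToolTokenUsageEntry[] = [];
    for (const id of step.toolCallIds) {
      const tokens = estimates.get(id);
      if (tokens === undefined) continue;
      claimed.add(id);
      tools.push({ toolCallId: id, estimatedOutputTokens: tokens });
    }
    return { stepIndex, inputTokens: step.inputTokens, outputTokens: step.outputTokens, tools };
  });

  // Outputs that no step claims go to the first step, as the walkthrough describes.
  if (result.length > 0) {
    for (const [id, tokens] of estimates) {
      if (!claimed.has(id)) result[0].tools.push({ toolCallId: id, estimatedOutputTokens: tokens });
    }
  }
  return result;
}
```

Doing this once after the stream completes keeps the bookkeeping off the per-chunk hot path, which matches the motivation in the commit note.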

coderabbitai Bot commented Jun 19, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bbd3618b-827a-46b7-a136-fcebc6b22504

📥 Commits

Reviewing files that changed from the base of the PR and between da8cd83 and 94936b3.

📒 Files selected for processing (2)
  • CHANGELOG.md
  • packages/web/src/ee/features/chat/agent.ts
✅ Files skipped from review due to trivial changes (1)
  • CHANGELOG.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/web/src/ee/features/chat/agent.ts

Walkthrough

Adds an offline token estimation module (tokenEstimation.ts) for tool outputs, extends sbChatMessageMetadataSchema with a stepTokenUsage field, computes per-step token attribution post-stream in createMessageStream, tracks answerStepIndex in chatThreadListItem, and surfaces estimated token counts in thinking-step UI via a new ToolTokenBadge component wired through DetailsCard, ToolOutputGuard, McpToolComponent, and ToolSearchToolComponent.

Changes

Per-step tool token usage estimation and display

Layer | File(s) | Summary
Token estimation utilities and schema types | packages/web/src/features/chat/tokenEstimation.ts, packages/web/src/features/chat/tokenEstimation.test.ts, packages/web/src/features/chat/types.ts | New estimateTokenCount, estimateToolOutputTokens, and estimateModelToolOutputTokens functions handle per-type tool output sizing. sbChatMessageMetadataSchema gains optional stepTokenUsage; StepTokenUsageEntry and ToolTokenUsageEntry types are exported. Tests cover all variant shapes.
Post-stream step token usage computation in agent | packages/web/src/ee/features/chat/agent.ts, packages/web/src/ee/features/chat/agent.test.ts | createMessageStream imports token helpers, builds a toolCallId → token usage map from response.messages, constructs ordered stepTokenUsage aligned to streamed steps, attaches unclaimed tool outputs to the first step, and concatenates prior stepTokenUsage on approval-phase continuation. Test mock updated to return {messages: []}.
Step index tracking in chatThreadListItem | packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx | useMemo now derives both uiVisibleThinkingSteps (as ThinkingStep[]) and answerStepIndex by incrementing a step counter on step-start parts. answerStepIndex is forwarded to DetailsCard.
DetailsCard and ThinkingSteps step-index and token-usage rendering | packages/web/src/ee/features/chat/components/chatThread/detailsCard.tsx, packages/web/src/ee/features/chat/components/chatThread/detailsCard.test.tsx | Exports ThinkingStep interface. DetailsCardProps gains answerStepIndex. DetailsCardComponent builds toolTokenUsageMap from metadata.stepTokenUsage. ThinkingSteps iterates {stepIndex, parts}, inlines or right-aligns StepTokenUsage, and appends a dedicated answer usage row. StepPartRenderer uses TOOL_GUARD_CONFIG and passes estimatedOutputTokens into tool components.
ToolTokenBadge component and tool component integration | packages/web/src/ee/features/chat/components/chatThread/tools/toolTokenBadge.tsx, packages/web/src/ee/features/chat/components/chatThread/tools/toolOutputGuard.tsx, packages/web/src/ee/features/chat/components/chatThread/tools/mcpToolComponent.tsx, packages/web/src/ee/features/chat/components/chatThread/tools/toolSearchToolComponent.tsx | New ToolTokenBadge renders a token count with a tooltip. ToolOutputGuard, McpToolComponent, and ToolSearchToolComponent each accept optional estimatedOutputTokens and conditionally render ToolTokenBadge with a Separator.
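The walkthrough says chatThreadListItem derives the visible thinking steps and answerStepIndex by incrementing a step counter on step-start parts. A minimal sketch of that idea, assuming the answer is whichever step emits a `text` part; `Part`, `StepGroup`, `deriveSteps`, and the `tool-*` part names are hypothetical.

```typescript
// Simplified stand-in for UI message parts; only `type` matters here.
type Part = { type: string; text?: string };

interface StepGroup {
  stepIndex: number;
  parts: Part[];
}

// Walk parts in order, bumping a counter on each "step-start" marker.
// Assumption: the answer is the step that produced a "text" part.
function deriveSteps(parts: Part[]): { thinkingSteps: StepGroup[]; answerStepIndex: number | undefined } {
  const groups = new Map<number, Part[]>();
  let stepIndex = -1;
  let answerStepIndex: number | undefined;
  for (const part of parts) {
    if (part.type === "step-start") {
      stepIndex += 1;
      continue;
    }
    // Parts seen before any marker are assigned to step 0.
    const current = Math.max(stepIndex, 0);
    if (part.type === "text") {
      // Answer text renders separately; remember which step produced it.
      answerStepIndex = current;
      continue;
    }
    const bucket = groups.get(current) ?? [];
    bucket.push(part);
    groups.set(current, bucket);
  }
  const thinkingSteps = [...groups].map(([index, stepParts]) => ({ stepIndex: index, parts: stepParts }));
  return { thinkingSteps, answerStepIndex };
}
```

Because each group keeps its original step index, a step that only produced answer text leaves a gap in the thinking list but still has an index that usage rows can join on.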

Sequence Diagram(s)

sequenceDiagram
  participant chatThreadListItem
  participant createMessageStream
  participant estimateModelToolOutputTokens
  participant DetailsCard
  participant ToolTokenBadge

  createMessageStream->>estimateModelToolOutputTokens: estimate tokens per tool-result part
  estimateModelToolOutputTokens-->>createMessageStream: estimated token counts
  createMessageStream->>createMessageStream: build stepTokenUsage aligned to steps
  createMessageStream-->>chatThreadListItem: message-metadata (stepTokenUsage)
  chatThreadListItem->>chatThreadListItem: derive uiVisibleThinkingSteps + answerStepIndex
  chatThreadListItem->>DetailsCard: thinkingSteps, answerStepIndex, metadata
  DetailsCard->>DetailsCard: build toolTokenUsageMap from stepTokenUsage
  DetailsCard->>ToolTokenBadge: estimatedOutputTokens per tool call
  ToolTokenBadge-->>DetailsCard: rendered token badge with tooltip

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • brendan-kellam
  • msukkari
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat: support richer token cost tracking for ask' accurately describes the main objective: adding per-step token usage tracking and tool call token estimates to the chat history metadata.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/web/src/ee/features/chat/agent.ts (1)

257-263: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Prevent computed token metadata from being overwritten by caller metadata

On Line 262, spreading ...metadata last lets external metadata replace derived fields (stepTokenUsage, totals, modelName, traceId), which can break the step-index join contract used by the UI.

Suggested fix
             writer.write({
                 type: 'message-metadata',
                 messageMetadata: {
+                    ...metadata,
                     totalTokens: (priorMetadata?.totalTokens ?? 0) + (totalUsage.totalTokens ?? 0),
                     totalInputTokens: (priorMetadata?.totalInputTokens ?? 0) + (totalUsage.inputTokens ?? 0),
                     totalOutputTokens: (priorMetadata?.totalOutputTokens ?? 0) + (totalUsage.outputTokens ?? 0),
                     totalCacheReadTokens: (priorMetadata?.totalCacheReadTokens ?? 0) + (totalUsage.inputTokenDetails?.cacheReadTokens ?? 0),
                     totalCacheWriteTokens: (priorMetadata?.totalCacheWriteTokens ?? 0) + (totalUsage.inputTokenDetails?.cacheWriteTokens ?? 0),
                     totalResponseTimeMs: (priorMetadata?.totalResponseTimeMs ?? 0) + (new Date().getTime() - startTime.getTime()),
                     // Concatenated (not summed) across approval-continuation
                     // phases so earlier phases' steps are preserved in order.
                     stepTokenUsage: [...(priorMetadata?.stepTokenUsage ?? []), ...stepTokenUsage],
                     modelName,
                     traceId,
-                    ...metadata,
                 }
             });
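The fix relies on a standard JavaScript object-literal rule: when keys collide, the property written later wins. A tiny illustration with made-up values (`callerMetadata` and `derived` are hypothetical, not names from agent.ts):

```typescript
// When two spreads share a key, the one written later in the literal wins.
const callerMetadata = { traceId: "from-caller", chatId: "abc" };
const derived = { traceId: "derived-trace", stepTokenUsage: [] as number[] };

// Caller spread last: the caller's traceId clobbers the derived one.
const unsafe = { ...derived, ...callerMetadata };
// Caller spread first: derived fields win, and extra caller keys survive.
const safe = { ...callerMetadata, ...derived };

console.log(unsafe.traceId); // from-caller
console.log(safe.traceId); // derived-trace
```

Spreading the caller's metadata first keeps pass-through fields while guaranteeing the computed token fields are authoritative.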
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/web/src/ee/features/chat/agent.ts` around lines 257 - 263, The
spread operator `...metadata` is placed last in the object literal, which allows
caller-provided metadata to overwrite the computed derived fields
(stepTokenUsage, modelName, traceId, and totals). Reorder the object properties
by moving `...metadata` to the beginning of the object literal before the
computed fields, so that the carefully derived values (stepTokenUsage,
modelName, traceId, and any totals fields) are spread after the external
metadata and cannot be accidentally overwritten by caller data.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 181d5be9-92d5-4715-9a94-791f21b19000

📥 Commits

Reviewing files that changed from the base of the PR and between 26435a4 and da8cd83.

📒 Files selected for processing (12)
  • packages/web/src/ee/features/chat/agent.test.ts
  • packages/web/src/ee/features/chat/agent.ts
  • packages/web/src/ee/features/chat/components/chatThread/chatThreadListItem.tsx
  • packages/web/src/ee/features/chat/components/chatThread/detailsCard.test.tsx
  • packages/web/src/ee/features/chat/components/chatThread/detailsCard.tsx
  • packages/web/src/ee/features/chat/components/chatThread/tools/mcpToolComponent.tsx
  • packages/web/src/ee/features/chat/components/chatThread/tools/toolOutputGuard.tsx
  • packages/web/src/ee/features/chat/components/chatThread/tools/toolSearchToolComponent.tsx
  • packages/web/src/ee/features/chat/components/chatThread/tools/toolTokenBadge.tsx
  • packages/web/src/features/chat/tokenEstimation.test.ts
  • packages/web/src/features/chat/tokenEstimation.ts
  • packages/web/src/features/chat/types.ts

jsourcebot changed the title from "Jminnetian/tool token cost tracking" to "feat: support richer token cost tracking for ask" on Jun 19, 2026
Spread caller-supplied metadata before the derived token fields so
stepTokenUsage and the totals can't be clobbered, which would desync
the UI's index-based step join.
@@ -0,0 +1,63 @@
import { ToolResultOutput } from "@ai-sdk/provider-utils";


should this be in /ee/features?


2 participants