Is your feature request related to a problem? Please describe.
Right now PromptBuilder inlines the entire A2UI schema into the system prompt on every turn. Every catalog item's full schema goes in, combined into one big oneOf. As catalogs grow, and especially for custom catalogs, that creates two problems.
First, token cost. The whole schema is re-sent on every request, so even a modest 16-item custom catalog already runs well over 10k prompt tokens, and it scales linearly with the catalog. We pay that on every turn.
Second, and more important, the model still uses components that aren't in the catalog. Even with everything inlined it will invent something like Column for a single-item custom catalog, and the SDK throws CatalogItemNotFoundException (#771). That issue was closed by removing the hardcoded standard-catalog examples from the prompt strings, but there's still nothing structural keeping the model to items that actually exist. The reporter put it well: using the SDK with a custom catalog from scratch is "hard / not possible."
Underneath both is the same thing: the model gets every schema at once, with no index in between and no real contract about which components it's actually allowed to use.
Describe the solution you'd like
An opt-in catalog mode built on progressive disclosure, with two tiers:
A manifest that's always in the prompt: just each item's name and a short description. No full schemas, so it stays cheap at any catalog size.
An on-demand body: a loadCatalogItems tool the model calls to pull the exact schema and examples for the components it's about to use, before it emits any A2UI. Load before use.
The host registers loadCatalogItems and resolves it against the in-process catalog. If the model asks for a name that doesn't exist, it gets a structured error back and can self-correct on the next turn instead of emitting something unrenderable. The current full-schema behavior stays the default; this is opt-in.
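The two tiers and the error path can be sketched as follows. This is a minimal illustration in Python, not the SDK's API: `CatalogItem`, `build_manifest`, `load_catalog_items`, the `WeatherCard` item, the catalog id, and the shape of the error payload are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogItem:
    """Hypothetical in-process catalog entry (not an SDK type)."""
    name: str
    description: str
    schema: dict
    examples: list = field(default_factory=list)


# Hypothetical single-item custom catalog, used only for illustration.
CATALOG = {
    "WeatherCard": CatalogItem(
        name="WeatherCard",
        description="Shows current conditions for one city.",
        schema={"type": "object", "properties": {"city": {"type": "string"}}},
    ),
}


def build_manifest(catalog_id, items):
    """Tier 1: names and short descriptions only, always in the prompt."""
    lines = [
        f"Catalog id: {catalog_id}",
        "Available components (call loadCatalogItems before using any):",
    ]
    lines += [f"- {item.name}: {item.description}" for item in items.values()]
    return "\n".join(lines)


def load_catalog_items(items, names):
    """Tier 2: resolve requested names against the in-process catalog.

    Unknown names produce a structured error instead of an exception, so the
    model can correct itself on the next turn.
    """
    missing = [name for name in names if name not in items]
    if missing:
        return {
            "ok": False,
            "error": "unknown_catalog_items",
            "unknown": missing,
            "available": sorted(items),
        }
    return {
        "ok": True,
        "items": [
            {
                "name": items[name].name,
                "schema": items[name].schema,
                "examples": items[name].examples,
            }
            for name in names
        ],
    }
```

In this sketch, a request for `Column` against the single-item catalog returns the error payload with the list of valid names, rather than surfacing later as a `CatalogItemNotFoundException` at render time.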
Why I think this helps:
Fewer tokens: the per-turn prompt only carries the manifest, so input drops sharply. Full schemas are paid once, on demand, and only for what's actually used.
More purposeful context, not just less of it. With "load before use" the prompt only ever holds the components actually in play, instead of every schema at once. That's cheaper, but the bigger win is signal: a focused, high-signal context is easier for the model to reason over than a large one padded with schemas it will never use. There's good research behind this (see Additional context). On top of that, it makes the #771 failure (Gemini trying to render items that are not in the catalog) structurally harder to hit, since the model has to name a real component and receive its real schema before it can use it, rather than being kept in line by prompt wording alone.
It scales to large and custom catalogs, which is exactly where the current approach hurts most.
This does depend on the catalog id being available in the prompt, since createSurface needs it.
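The token argument above can be made concrete with a rough cost model. The numbers below are illustrative placeholders, not measurements: 16 items at 700 schema tokens each and 20 manifest tokens per item are assumptions picked only to match the "well over 10k" order of magnitude. The model also assumes, as the proposal describes, that a loaded schema is paid for once rather than re-sent on every later turn.

```python
def inline_cost(schema_tokens, turns):
    """Full-schema mode: every item's schema is re-sent on every turn."""
    return sum(schema_tokens.values()) * turns


def progressive_cost(manifest_tokens_per_item, schema_tokens, used, turns):
    """Manifest mode: manifest on every turn, schemas once for items used.

    Assumes each loaded schema is counted a single time; a host that replays
    tool results in the history on later turns would pay more than this.
    """
    manifest = manifest_tokens_per_item * len(schema_tokens) * turns
    loaded = sum(schema_tokens[name] for name in used)
    return manifest + loaded


# Hypothetical 16-item catalog; all figures are made up for illustration.
SCHEMA_TOKENS = {f"Item{i}": 700 for i in range(16)}

inline_total = inline_cost(SCHEMA_TOKENS, turns=5)
progressive_total = progressive_cost(
    manifest_tokens_per_item=20,
    schema_tokens=SCHEMA_TOKENS,
    used=["Item0", "Item1"],
    turns=5,
)
```

Under these assumptions the inline cost grows with catalog size times turns, while the manifest mode grows with the small per-item manifest cost plus the schemas actually loaded.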
Describe alternatives you've considered
Keep inlining the full schema and lean on prompt wording to keep the model in bounds (today's approach, and the fix applied for #771). Doesn't scale on tokens, and gives no guarantee against invented components.
Trim the inlined schema heuristically, e.g. only the "likely" items. Fragile, guesses intent, and still paid on every turn.
Retrieval / RAG over catalog items. Heavier infrastructure than a deterministic tool call for an in-process catalog.
A static, name-only allow-list in the prompt. Tells the model the names but not how to use them, so the schema still has to live somewhere.
Additional context
This mirrors Anthropic's Agent Skills progressive disclosure: name and description are always loaded, and the full body is loaded on demand.
Why "purposeful loading" helps beyond saving tokens: there's evidence that excess or irrelevant context measurably degrades LLM reasoning (Shi et al., 2023), and that models use information worse as it gets buried in a longer context, the "lost in the middle" effect (Liu et al., 2023). The practitioner takeaway, from both Anthropic and Google, is to keep prompts high-signal and pull detail in just-in-time via tools rather than front-loading everything (Anthropic, Gemini function calling).
Initial validation (small, directional): on the simple_chat custom catalog with gemini-flash-latest, the incremental (load-on-demand) mode used ~60% fewer tokens at the same expectation pass-rate. The extra loadCatalogItems round-trip adds latency, so the mode trades latency for tokens. A fuller eval across more prompts and models would firm this up.