feat(workflow-executor): use stored OAuth credentials for MCP steps [PRD-367]#1665
Conversation
| }); | ||
|
|
||
| if (result.refreshToken) { | ||
| await this.persistRotatedRefreshToken(credential, result.refreshToken); |
There was a problem hiding this comment.
Claude Opus 4.8 — [Should fix] On the invalid_grant retry path, runGrantWithRotationRetry re-reads latest and grants with it, but this write-back passes the pre-retry credential for the non-token fields (clientId / clientSecret / tokenEndpoint / scopes). refreshTokenEnc and encKeyVersion are correct (from the fresh encrypt), so this is not token/key corruption — but if a peer concurrently re-deposited new config, those fields get reverted: a narrow lost-update on the exact multi-instance path this logic guards. Thread the credential actually used for the successful grant through to the write-back.
There was a problem hiding this comment.
Claude Opus 4.8 — resolved (2b5e5e5). Agreed. runGrantWithRotationRetry now returns the credential whose token produced the grant, and refreshAndCache writes the rotated token back onto that (current) row — so a concurrent re-deposit is no longer partially reverted. Added a regression test asserting the write-back uses the re-read credential fields.
| // Classifies errors surfaced while connecting to or calling an MCP server. The MCP SDK / HTTP | ||
| // transport reports failures in several shapes (a numeric status field, or only a message string), | ||
| // so the checks walk the cause chain and inspect both structured status and the message text. | ||
| const AUTH_STATUSES = new Set([401, 403]); |
There was a problem hiding this comment.
Claude Opus 4.8 — [Should fix] 403 is treated like 401, so a 403 on the tool call forces a refresh and (on a second 403) pauses with needs-oauth-reauth. The spec specifies retry on 401; a 403 is usually insufficient-scope, which re-consent often cannot fix, so the user re-auths back into the same 403. Confirm 403 should drive the reauth flow, or narrow the refresh/reauth trigger to 401 and surface 403 as a normal failure. (Classifying 403 as kind: auth is fine — this is only about the reauth trigger.)
There was a problem hiding this comment.
Claude Opus 4.8 — resolved (8d441d2). Narrowed to 401 per the spec: 403 is no longer treated as a refreshable auth error, so it surfaces as an ordinary failure instead of forcing a refresh + reauth loop. Treating a scope-403 as a re-consent opportunity is a separate, out-of-scope enhancement. isMcpAuthError / classifyMcpLoadError now match 401/unauthorized only; tests updated.
| // transport reports failures in several shapes (a numeric status field, or only a message string), | ||
| // so the checks walk the cause chain and inspect both structured status and the message text. | ||
| const AUTH_STATUSES = new Set([401, 403]); | ||
| const AUTH_PATTERN = /\b40[13]\b|unauthorized|forbidden/i; |
There was a problem hiding this comment.
Claude Opus 4.8 — [Preferential] This message-text fallback can match a bare 401/403 appearing for non-auth reasons in an error body (IDs, counts, timestamps). It only runs when no structured status field is present (those are checked first), so the blast radius is small — flagging as a known fuzzy edge, not a blocker.
There was a problem hiding this comment.
Claude Opus 4.8 — acknowledged, keeping as-is. Deliberate last-resort fallback, used only when no structured status field is present (code/status/statusCode are checked first), so the misclassification window is narrow and low-probability. Tightening the message regex risks false negatives on real 401s that carry the status only in text. Leaving it as the documented fuzzy edge.
On the invalid_grant concurrent-rotation retry path the write-back used the pre-retry credential for the non-token fields; thread the credential whose token produced the grant through so a concurrent re-deposit is not partially reverted. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A 403 is a permission/scope failure that a token refresh or re-consent cannot resolve, so it no longer triggers the refresh + re-auth flow (which looped) and instead surfaces as an ordinary failure. The spec specifies retry on 401. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On the invalid_grant concurrent-rotation retry path the write-back used the pre-retry credential for the non-token fields; thread the credential whose token produced the grant through so a concurrent re-deposit is not partially reverted. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A 403 is a permission/scope failure that a token refresh or re-consent cannot resolve, so it no longer triggers the refresh + re-auth flow (which looped) and instead surfaces as an ordinary failure. The spec specifies retry on 401. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
8d441d2 to
0e05efe
Compare
10 new issues
|
|
Coverage Impact ⬆️ Merging this pull request will increase total coverage on Modified Files with Diff Coverage (20) 🤖 Increase coverage with AI coding...🚦 See full report on Qlty Cloud » 🛟 Help
|
On the invalid_grant concurrent-rotation retry path the write-back used the pre-retry credential for the non-token fields; thread the credential whose token produced the grant through so a concurrent re-deposit is not partially reverted. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A 403 is a permission/scope failure that a token refresh or re-consent cannot resolve, so it no longer triggers the refresh + re-auth flow (which looped) and instead surfaces as an ordinary failure. The spec specifies retry on 401. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
d4555ee to
cf71d69
Compare
On the invalid_grant concurrent-rotation retry path the write-back used the pre-retry credential for the non-token fields; thread the credential whose token produced the grant through so a concurrent re-deposit is not partially reverted. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A 403 is a permission/scope failure that a token refresh or re-consent cannot resolve, so it no longer triggers the refresh + re-auth flow (which looped) and instead surfaces as an ordinary failure. The spec specifies retry on 401. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cf71d69 to
55309c7
Compare
d42828e to
30b063c
Compare
At an oauth2 MCP step the executor looks up the stored credential by (user, server), runs the refresh-token grant against the stored token endpoint behind an expiry-skew cache, injects the bearer token before connecting, retries once on a 401 across list-tools and the tool call, and pauses for re-authentication when no usable credential exists or the refresh is rejected. Bearer and none steps are unchanged. Adds additive auth-error classification to the shared ai-proxy McpClient consumed by this path. Behaviour stays dormant until the orchestrator serves authType and the frontend ships (deploy orchestrator first), so it is safe to deploy alone. Depends on the PR1 credential store. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On the invalid_grant concurrent-rotation retry path the write-back used the pre-retry credential for the non-token fields; thread the credential whose token produced the grant through so a concurrent re-deposit is not partially reverted. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A 403 is a permission/scope failure that a token refresh or re-consent cannot resolve, so it no longer triggers the refresh + re-auth flow (which looped) and instead surfaces as an ordinary failure. The spec specifies retry on 401. Addresses review on #1665. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cutor [PRD-367] PR1 wired an in-memory credential store + deposit endpoint into buildInMemoryExecutor, so the previous "in-memory raises ConfigurationError for oauth2 steps" behavior was inconsistent: a credential could be deposited but never used. Wire an OAuthTokenService into the in-memory runner (sharing the same store instance the deposit endpoint writes to) so oauth2 steps work end-to-end in dev, matching the database executor. The token service is now a required RunnerConfig/RemoteToolFetcher collaborator (both executors provide it), so the unreachable ConfigurationError guard and its fetcher test are removed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tials port [PRD-367] The PR1 rebase moved the credentials store interface + types from stores/mcp-oauth-credentials-store to ports/mcp-oauth-credentials-store (the store file now holds only the Database/InMemory implementations). Import McpOAuthCredentialsStore and StoredMcpOAuthCredential from the new port path so the package compiles against the rebased PR1 base. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion [PRD-367] PR1 dropped enc_key_version from the credential store (no version-aware decrypt path), so the rotation write-back no longer carries encKeyVersion. Per PRD-367 key-rotation handling, a decrypt failure with the encryption key PRESENT (auth-tag mismatch from a since-rotated/hard-swapped key) is recoverable: toGrantParams now classifies it as OAuthReauthRequiredError (needs-oauth-reauth) so re-consent re-deposits under the new key. A missing key (ExecutorEncryptionKeyMissingError) stays terminal — re-consent cannot help and a re-deposit would 503. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ting [PRD-367] New GET /list-mcp-tools?mcpServerId= route on the executor HTTP server for the orchestrator-engine MCP-server details page: resolves the caller's vault credential (user_id from the validated JWT, never the request), refreshes, injects the Bearer, and returns the server's tool definitions — reusing RemoteToolFetcher, no new fetch/refresh logic. A missing/unrefreshable credential returns a typed needs-oauth-reauth (409), not a generic error or empty list. Wired into both the database and in-memory executors so oauth2 tool listing works in dev too. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
55309c7 to
842d468
Compare
|
Agent code review — Claude Opus 4.8 (2026-06-18) Reviewed the folded-in commits Verified:
|
…nses [PRD-367]
A literal JSON null (or other non-object) body from the token endpoint overwrote the {} parse default, so the subsequent payload.error / payload.access_token reads threw a TypeError instead of the typed OAuthRefreshError. Keep the {} default for non-object bodies so the status checks still surface a typed OAuthRefreshError.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

What & why
Executor-side runtime use of stored OAuth credentials for OAuth2-protected MCP servers (PRD-367, PR2). At an
oauth2MCP step the executor looks up the stored credential by(user, server), runs a refresh-token grant against the storedtoken_endpointbehind an expiry-skew cache, injects the Bearer token before connecting, retries once on an upstream 401 (covering both list-tools and the tool call), and pauses the stepawaiting-inputwithawaitingInputReason: needs-oauth-reauthwhen there is no usable credential or the refresh is rejected. Bearer/none steps are unchanged.Stacked PR / deploy ordering
feat/prd-367-pr1-executor-oauth-credentials(PR feat(workflow-executor): add OAuth credential store + deposit endpoint (PRD-367 PR1) #1619), notmain: PR2 depends on PR1's credential store / encryption / deposit endpoint, which is not yet merged. Review the diff against that branch.authType+ accepts the typedawaitingInputReason(PR1.5) and the frontend ships (PR3) — deploy the orchestrator first. Safe to deploy alone: the oauth2 path only activates onceauthType=oauth2is served; bearer/none is unchanged.Notable choices (large-PR annotation)
invalid_grant, not a Postgres row lock — the executor and token endpoints can sit behind a client VPN, so no DB lock is held across the refresh HTTP call. Row lock is the documented hardening path if a strict reuse-detection provider plus real cross-process contention appears.reloadWithFreshAuth) plus a typedOAuthReauthRequiredError -> awaiting-inputmapping; all token/credential/HTTP logic stays behindRemoteToolFetcherand the new OAuth token service.@forestadmin/ai-proxychange (consumed by the agent too) is additive-only (newloadToolsWithFailures/loadRemoteToolsWithFailures+mcp-auth-errorexport) plus behaviour-preserving delegation;authTypeis stripped in theMcpClientconstructor likeid.ConfigurationErrorfor oauth2 steps (no credential store wired).Tests
294 tests across the touched suites (ai-proxy 59 + workflow-executor 235); build, lint, tsc clean; >=90% line/stmt coverage on changed files.
Known limitation
A re-auth at tool-call time leaves the step at
idempotencyPhase=executing, so it needs a manual retry after re-auth; the common list-tools-time re-auth (pre-executor) resumes cleanly.Part of PRD-367.
🤖 Generated with Claude Code
Note
Add stored OAuth credential support for MCP steps in workflow executor
OAuthTokenServiceto retrieve, cache, and refresh OAuth access tokens per (user, MCP server), with mutex-serialized refresh, rotation persistence, andOAuthReauthRequiredErroron unrecoverable invalid_grant.RemoteToolFetcherto inject Bearer tokens for oauth2-typed MCP servers and retry tool loading with a forced token refresh on auth failure.McpStepExecutorto catchOAuthReauthRequiredErrorduring both tool fetching and tool invocation, pausing the step withawaiting-input/needs-oauth-reauthrather than failing it; a single reauth retry is attempted before pausing.loadRemoteToolsWithFailurestoMcpClient,AiClient, and all adapter layers so per-server auth/connection failures are classified and surfaced without aborting the entire tool load.awaitingInputReasonthroughMcpStepOutcomeSchemaandstep-outcome-to-update-step-mapperso the reason reaches the server update request.📊 Macroscope summarized 0e05efe. 21 files reviewed, 0 issues evaluated, 0 issues filtered, 0 comments posted
🗂️ Filtered Issues
No issues evaluated.