ci(e2e): classify staging e2e results into a structured digest #8760

Open
jacekradko wants to merge 1 commit into jacek/staging-e2e-smoke-leg from jacek/staging-e2e-reporting

@jacekradko
Member

The report job posted a bare red circle on any gating failure with no detail about what actually broke. This adds a classifier that turns each run into a structured digest.

It parses every leg's Playwright JSON report (uploaded since #8756) and buckets each failed test by error signature: FAPI 429 and handshake/JWT clock-skew errors are infra-flake, everything else is a candidate regression, and tests that passed on retry are reported as flaky. Unknown signatures default to candidate regression, never the reverse, so a real break can't hide as infra. The report job writes the digest to the job summary on every run (so the informational generic leg's breakdown is always visible) and posts that same digest to Slack on a gating-leg failure. The digest looks like:

:red_circle: Staging E2E: gating failure
_ref main · sdk latest · clerk_go abc1234_
• ❌ smoke: 1 candidate regression
• ❌ generic (informational): 2 candidate regressions, 14 infra-flake, 5 flaky
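
For readers who want a feel for the bucketing, here is a minimal sketch of the idea, not the script in this PR. The regexes in `INFRA_SIGNATURES`, the function names, and the rule that a test counts as infra-flake only when every failed attempt matches an infra signature are all assumptions made for illustration. The report shape (`suites` containing `specs` containing `tests`, with a per-test `status` of `expected`, `unexpected`, `flaky`, or `skipped`) follows Playwright's JSON reporter as I understand it.

```javascript
// Hypothetical signatures: the PR names the categories, not the exact patterns.
const INFRA_SIGNATURES = [
  /\b429\b/, // FAPI rate limiting
  /handshake/i, // handshake failures
  /clock skew/i, // JWT clock-skew
];

// Returns "flaky", "infra-flake", "candidate regression", or null for tests
// that passed or were skipped.
function classifyTest(test) {
  if (test.status === "flaky") return "flaky"; // failed, then passed on retry
  if (test.status !== "unexpected") return null;

  const failed = (test.results || []).filter(
    (r) => r.status !== "passed" && r.status !== "skipped"
  );
  const isInfra = (result) => {
    const text = [result.error, ...(result.errors || [])]
      .filter(Boolean)
      .map((e) => e.message || "")
      .join("\n");
    return INFRA_SIGNATURES.some((re) => re.test(text));
  };

  // Unknown or mixed signatures fall through to candidate regression,
  // so a real break is never reported as infra.
  return failed.length > 0 && failed.every(isInfra)
    ? "infra-flake"
    : "candidate regression";
}

// Suites can nest, so walk them recursively.
function* walkTests(suite) {
  for (const spec of suite.specs || []) yield* (spec.tests || []);
  for (const child of suite.suites || []) yield* walkTests(child);
}

function summarizeLeg(report) {
  const counts = { "candidate regression": 0, "infra-flake": 0, flaky: 0 };
  for (const suite of report.suites || []) {
    for (const test of walkTests(suite)) {
      const bucket = classifyTest(test);
      if (bucket) counts[bucket] += 1;
    }
  }
  return counts;
}

// Assumption: a leg shows ❌ whenever any bucket is non-empty.
function formatLeg(name, counts) {
  const parts = [];
  const regressions = counts["candidate regression"];
  if (regressions) {
    parts.push(`${regressions} candidate regression${regressions === 1 ? "" : "s"}`);
  }
  if (counts["infra-flake"]) parts.push(`${counts["infra-flake"]} infra-flake`);
  if (counts.flaky) parts.push(`${counts.flaky} flaky`);
  return parts.length
    ? `• ❌ ${name}: ${parts.join(", ")}`
    : `• ✅ ${name}: all green`;
}
```

Requiring every failed attempt to match keeps the default conservative: a test that hit a 429 on one attempt and a genuine assertion failure on another still lands in candidate regression.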

The classifier is a standalone Node script with unit tests, and it never fails the build (an empty or unreadable reports directory just yields an "all green" digest). The Slack trigger is unchanged (needs.integration-tests.result == 'failure', i.e. a gating-leg failure), so this is a strictly richer message, not a noisier one.
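
The "never fails the build" behavior can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the `loadReports` name and the one-`<leg>.json`-file-per-leg layout are hypothetical. The point is only that every filesystem and parse error is swallowed, so a missing directory or a corrupt report produces fewer inputs rather than a thrown exception.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Returns [{ leg, report }] for every readable JSON report in `dir`.
// Never throws: an unreadable directory yields an empty list.
function loadReports(dir) {
  let entries;
  try {
    entries = fs.readdirSync(dir);
  } catch {
    return []; // missing or unreadable directory: treat as "no reports"
  }

  const reports = [];
  for (const name of entries.filter((n) => n.endsWith(".json")).sort()) {
    try {
      const raw = fs.readFileSync(path.join(dir, name), "utf8");
      reports.push({ leg: path.basename(name, ".json"), report: JSON.parse(raw) });
    } catch {
      // Corrupt or unreadable report: skip it rather than fail the job.
    }
  }
  return reports;
}
```

With no reports loaded, a digest builder has nothing to count, which is how an empty directory ends up rendering as all green.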

Two pieces are deliberately deferred: persisting per-test history to distinguish new from sustained failures (which would let the informational generic leg's regressions page a human rather than just appear in the summary), and wiring the clerk_go commit status to the smoke gate, which the plan says to hold until the smoke leg is demonstrably green. Stacked on #8759.

@changeset-bot

changeset-bot Bot commented Jun 5, 2026

🦋 Changeset detected

Latest commit: 1510b55

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 0 packages

@vercel

vercel Bot commented Jun 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| clerk-js-sandbox | Ready | Preview, Comment | Jun 5, 2026 11:44am |

@coderabbitai
Contributor

coderabbitai Bot commented Jun 5, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Repository UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 668abfe9-f9b9-44ee-b057-bb050ffbdfc1

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

The report job posted a bare red circle on any gating failure with no detail. Add
a classifier that parses every leg's Playwright JSON report and buckets each failed
test by error signature: FAPI 429 and handshake/JWT clock-skew are infra-flake,
everything else (unknown signatures included, never the reverse) is a candidate
regression, and passed-on-retry tests are reported as flaky.

The report job now downloads the per-leg reports, writes the classified digest to
the job summary on every run (so the informational generic leg's breakdown is always
visible), and posts that same digest to Slack on a gating-leg failure instead of the
old opaque message. Reporting never fails the build.

Deferred: persisting per-test history to alert on new-vs-sustained failures (which
would let the informational generic leg's regressions page a human), and wiring the
clerk_go commit status to the smoke gate.