ci(e2e): classify staging e2e results into a structured digest #8760
jacekradko wants to merge 1 commit
The report job posted a bare red circle on any gating failure with no detail. Add a classifier that parses every leg's Playwright JSON report and buckets each failed test by error signature: FAPI 429 and handshake/JWT clock-skew are infra-flake, everything else (unknown signatures included, never the reverse) is a candidate regression, and passed-on-retry tests are reported as flaky. The report job now downloads the per-leg reports, writes the classified digest to the job summary on every run (so the informational generic leg's breakdown is always visible), and posts that same digest to Slack on a gating-leg failure instead of the old opaque message. Reporting never fails the build. Deferred: persisting per-test history to alert on new-vs-sustained failures (which would let the informational generic leg's regressions page a human), and wiring the clerk_go commit status to the smoke gate.
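As a rough illustration of "reporting never fails the build", the report-loading step could be written so that any missing directory or malformed file is skipped rather than thrown. This is only a sketch: the function name `loadReports`, the one-JSON-file-per-leg layout, and the use of the file name as the leg name are assumptions, not taken from the PR.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: read every *.json file in `dir` and return the ones
// that parse. Nothing here throws, so the report job cannot fail on bad input.
function loadReports(dir) {
  let names;
  try {
    names = fs.readdirSync(dir);
  } catch {
    // Missing or unreadable directory: treat as "no reports".
    return [];
  }

  const reports = [];
  for (const name of names) {
    if (!name.endsWith('.json')) continue;
    try {
      const raw = fs.readFileSync(path.join(dir, name), 'utf8');
      // Assumption: the file name (minus extension) identifies the leg.
      reports.push({ leg: path.basename(name, '.json'), report: JSON.parse(raw) });
    } catch {
      // Unreadable or malformed report: skip it instead of failing the job.
    }
  }
  return reports;
}
```

With this shape, an empty result naturally maps to the "all green" digest, since there is nothing to classify.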
The report job posted a bare red circle on any gating failure with no detail about what actually broke. This adds a classifier that turns each run into a structured digest.

It parses every leg's Playwright JSON report (uploaded since #8756) and buckets each failed test by error signature: FAPI 429 and handshake/JWT clock-skew are infra-flake; everything else is a candidate regression; and passed-on-retry tests are reported as flaky. Unknown signatures default to candidate regression, never the reverse, so a real break can't hide as infra. The report job writes the digest to the job summary on every run (so the informational `generic` leg's breakdown is always visible) and posts that same digest to Slack on a gating-leg failure.

The classifier is a standalone Node script with unit tests, and it never fails the build (an empty or unreadable reports directory just yields an "all green" digest). The Slack trigger is unchanged (`needs.integration-tests.result == 'failure'`, i.e. a gating-leg failure), so this is a strictly richer message, not a noisier one.

Two pieces are deliberately deferred: persisting per-test history to distinguish new from sustained failures (which would let the informational `generic` leg's regressions page a human rather than just appear in the summary), and wiring the clerk_go commit status to the smoke gate, which the plan says to hold until the smoke leg is demonstrably green. Stacked on #8759.
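
The bucketing rule described above can be sketched roughly as follows. This is a minimal illustration, not the PR's script: the regex patterns, the bucket names, and the helpers `classifyTest`, `walkTests`, and `digest` are assumptions, and the report shape (nested `suites` containing `specs`, each with `tests` and per-attempt `results`) follows the Playwright JSON reporter as I understand it.

```javascript
// Assumed signatures; the actual patterns in the PR may differ.
const INFRA_PATTERNS = [
  /\b429\b/,               // FAPI rate limiting
  /clock skew|handshake/i, // handshake / JWT clock-skew
];

// Classify one test from its attempts (one entry in `results` per retry).
function classifyTest(test) {
  const results = test.results || [];
  const failures = results.filter(r => r.status !== 'passed' && r.status !== 'skipped');
  if (failures.length === 0) return 'passed';

  // Failed at least once, but the final attempt passed: flaky.
  if (results[results.length - 1].status === 'passed') return 'flaky';

  // Infra-flake only if EVERY failed attempt matches a known signature.
  // A missing or unknown message falls through to candidate regression.
  const messages = failures.map(r => (r.error && r.error.message) || '');
  const allInfra = messages.every(m => INFRA_PATTERNS.some(p => p.test(m)));
  return allInfra ? 'infra-flake' : 'candidate-regression';
}

// Depth-first walk over nested suites, yielding each spec title with its tests.
function* walkTests(suite) {
  for (const spec of suite.specs || []) {
    for (const test of spec.tests || []) yield { title: spec.title, test };
  }
  for (const child of suite.suites || []) yield* walkTests(child);
}

// Build the three buckets for one leg's report; passing tests are omitted.
function digest(report) {
  const buckets = { 'infra-flake': [], 'candidate-regression': [], flaky: [] };
  for (const suite of report.suites || []) {
    for (const { title, test } of walkTests(suite)) {
      const bucket = classifyTest(test);
      if (bucket in buckets) buckets[bucket].push(title);
    }
  }
  return buckets;
}
```

Requiring every failed attempt to match an infra pattern is one way to keep the default conservative: a test that hits a 429 on one attempt and an unrecognised error on another still lands in the candidate-regression bucket.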