feat(pr-metrics): Attribute PRs from the seer.pr_created event #116759
) -> None:
    """Attribute PRs reported by Seer's ``seer.pr_created`` event to the Seer app.

    For each reported PR: resolve the org-scoped ``Repository`` by name and
Why on earth is it more than one?
same name on gitlab & github for example
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 0d38d14.
When Seer reports a PR it created (own or delegated coding agent) via the existing seer.pr_created event, attribute it in Sentry: resolve the org-scoped Repository by name + provider, find-or-create the canonical PullRequest row (keyed on PR number), and idempotently record a seer_app PullRequestAttribution signal (signal_details = run_id, group_id, pr_url). Hooks into the existing process_autofix_updates flow next to the activity handler; no new transport.

Repository resolution takes provider into account because an org can host same-named repos across providers; it resolves only on a single match and refuses to guess when Seer sends an unknown provider. Unresolvable repos and unrecognized provider values are logged so they can be fixed upstream.

Gated behind the organizations:pr-metrics-attribution flag for rollout. Scoped to the seer_app source only; delegated-agent classification and the cached PullRequest.attribution projection land separately.

Refs CORE-204
Co-Authored-By: Claude <noreply@anthropic.com>
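A minimal sketch of the provider-aware resolution described in this commit message, for readers who want to see the "single match or refuse" rule concretely. It is not the code in this PR: `Repo`, `KNOWN_PROVIDERS`, `normalize_provider`, and `resolve_repo` are hypothetical names, the repository list stands in for an ORM query, and treating the literal value `unknown` differently from other unmapped providers is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Repo:
    """Hypothetical stand-in for Sentry's Repository model."""
    organization_id: int
    name: str
    provider: Optional[str]  # may be stored prefixed, e.g. "integrations:github"


# Assumption: the set of provider values this sketch knows how to map.
KNOWN_PROVIDERS = {"github", "gitlab"}
PREFIX = "integrations:"


def normalize_provider(raw: Optional[str]) -> Optional[str]:
    """Lowercase and strip the 'integrations:' prefix so both shapes compare equal."""
    if raw is None:
        return None
    value = raw.lower()
    return value[len(PREFIX):] if value.startswith(PREFIX) else value


def resolve_repo(
    repos: Sequence[Repo], organization_id: int, name: str, provider: Optional[str]
) -> Tuple[Optional[Repo], List[str]]:
    """Return (repo, warnings); repo is set only when exactly one candidate matches."""
    warnings: List[str] = []
    candidates = [
        r for r in repos if r.organization_id == organization_id and r.name == name
    ]
    wanted = normalize_provider(provider)
    if wanted in KNOWN_PROVIDERS:
        # Match the prefixed and unprefixed forms alike.
        candidates = [r for r in candidates if normalize_provider(r.provider) == wanted]
    elif wanted not in (None, "unknown"):
        # A value we do not map: flag it, then fall back to name-only matching.
        warnings.append("unrecognized_provider")
    if not candidates:
        warnings.append("repo_not_found")
        return None, warnings
    if len(candidates) > 1:
        # Refuse to guess between same-named repos.
        warnings.append("repo_ambiguous")
        return None, warnings
    return candidates[0], warnings
```

With same-named repos on GitHub and GitLab, passing `github` resolves to one repo, while passing `unknown` yields no repo and a `repo_ambiguous` warning.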
detected signal is preserved as its own ``PullRequestAttribution`` row rather
than collapsed into a single field.

See the architecture doc ("PR Metrics — Architecture Overview", §Attribution
[nit] Docs get stale fast... I'd remove this paragraph I think
)
record_attribution_signal(
    pull_request=pull_request,
    signal_type=PullRequestAttributionSignalType.SEER_APP,
Are we confident that this is indeed the app used, and not the SENTRY_APP?
Or does it not matter?
I'm even thinking that we should drop the distinction and just always put Sentry in there, instead of trying to figure out who is using Sentry and who is on Seer app
I think that's fair. For our purposes it doesn't matter much, to be honest.
Truth be told I think these days it might be just the SENTRY_APP anyway
Address PR review feedback on the attribution resolver docs:

- Use "attributions" instead of "signals" for naming consistency.
- Drop the stale architecture-doc reference paragraph.
- Document why the seer.pr_created path always records SEER_APP: Seer picks between the Sentry and Seer GitHub apps at push time, but the payload doesn't carry which one it used, so a faithful SEER_APP vs SENTRY_APP split is deferred until that app kind is threaded through.

Refs CORE-204
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Seer picks between the Sentry and Seer GitHub apps at push time, but the seer.pr_created payload doesn't say which one it used. Until Seer threads that app kind through, default to SENTRY_APP rather than SEER_APP so the attribution reflects the more common case.

Refs CORE-204
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…webhook (#116834)

Part of the [PR Merge Live Metrics](https://linear.app/getsentry/project/pr-merge-live-metrics-6a9efd473801/overview) project. Builds on the schema from #116586 (CORE-200) and the attribution write helper from #116759 (CORE-204), both now merged.

## What

Hooks a new processor into `PullRequestEventWebhook.WEBHOOK_EVENT_PROCESSORS` that runs on every GitHub `pull_request` webhook. On `action=opened` it detects two attribution signals and writes `PullRequestAttribution` rows via `record_attribution_signal()`:

1. **App ID match**: compares `pull_request.user.id` against `SEER_AUTOFIX_GITHUB_APP_USER_ID` and `SENTRY_GITHUB_APP_USER_ID` to produce `SEER_APP` / `SENTRY_APP` attributions respectively.
2. **Referenced issue**: scans the PR title and body for Sentry issue short IDs (`Fixes PROJ-123`) and sentry.io URLs (`Fixes https://....sentry.io/issues/456`) via the existing `find_referenced_groups` utility, and writes a `REFERENCED_ISSUE` attribution with matched group IDs in `signal_details`.

Both signals are independent: a Seer-opened PR that also references an issue produces two rows. All writes are idempotent and race-safe via `record_attribution_signal()` (keyed on `(pull_request, signal_type, source)`, matching the unique constraint).

The processor is gated behind the `organizations:pr-metrics-attribution` FlagPole flag (registered by #116759).

Refs CORE-216

---------

Co-authored-by: Claude <noreply@anthropic.com>
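To make the idempotency claim above concrete, here is a rough sketch of a write keyed on `(pull_request, signal_type, source)`. It is an illustration only, not the real `record_attribution_signal()`: `AttributionRow`, `AttributionStore`, and the `"webhook"` source value are hypothetical, and an in-memory dict is not race-safe the way a database unique constraint is.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

Key = Tuple[int, str, str]  # (pull_request_id, signal_type, source)


@dataclass
class AttributionRow:
    """Hypothetical stand-in for a PullRequestAttribution row."""
    pull_request_id: int
    signal_type: str
    source: str
    signal_details: Dict[str, Any]
    is_valid: bool = True


class AttributionStore:
    """In-memory stand-in for a table with a unique constraint on Key."""

    def __init__(self) -> None:
        self._rows: Dict[Key, AttributionRow] = {}

    def record(
        self,
        pull_request_id: int,
        signal_type: str,
        source: str,
        signal_details: Dict[str, Any],
    ) -> Tuple[AttributionRow, bool]:
        """Insert the row, or refresh the existing one; returns (row, created)."""
        key = (pull_request_id, signal_type, source)
        row = self._rows.get(key)
        if row is None:
            row = AttributionRow(
                pull_request_id, signal_type, source, dict(signal_details)
            )
            self._rows[key] = row
            return row, True
        # Redelivery: refresh the details and revive the row instead of duplicating it.
        row.signal_details = dict(signal_details)
        row.is_valid = True
        return row, False

    def __len__(self) -> int:
        return len(self._rows)
```

Recording the same key twice leaves one row holding the latest details, while a different `signal_type` for the same PR produces a second, independent row.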
…h) (#116842)

Part of the [PR Merge Live Metrics](https://linear.app/getsentry/project/pr-merge-live-metrics-6a9efd473801/overview) project. Builds on the schema (#116586, CORE-200), the `seer.pr_created` attribution path (#116759, CORE-204), and the webhook attribution processor (#116834, CORE-216).

## What

On a tracked PR's **close/merge**, Sentry now emits a provisional `pr_metrics.row` analytics event directly, with no Seer judge and no round-trip. This is the "easy path" of the judge-gated emission design (the judge path is CORE-217).

A PR is **tracked** once it has ≥1 valid `PullRequestAttribution` row (`is_valid=true`); untracked PRs emit nothing. The row carries only what Sentry already holds (no SCM fetch, no PR text):

- lifecycle: `close_action`, `head_commit_sha`, `merge_commit_sha`, `opened_at` / `closed_at` / `merged_at`, `draft`
- payload-derived counters: `additions`, `deletions`, `files_changed`, `commits_count`, `comments_count`, `review_comments_count`, `is_assigned`
- `attributions`: the point-in-time snapshot of valid attribution signals (`{signal_type, source, signal_details}`), ordered by attribution priority (highest-confidence first)
- `verdict`: `None` for now (verdicts arrive with judges)

Emission is gated by the new `organizations:pr-metrics-emit` FlagPole flag, and is **stateless**: it does not dedupe webhook redeliveries (a DB-side, pre-fork guard keyed on the terminal event is CORE-227; the judge round-trip is CORE-217). Transport is `sentry.analytics` → the existing analytics → BigQuery pipeline. The schema is intentionally provisional/additive; the consumable schema is finalized in M5 (CORE-223).

## Architecture

Consolidates all pr_metrics GitHub webhook handling into `src/sentry/pr_metrics/webhooks.py`, following the code_review pattern. Two **independent** processors are registered on `PullRequestEventWebhook.WEBHOOK_EVENT_PROCESSORS`:

- `handle_attribution`: GH-App-author + referenced-issue signals (relocated from `integrations/github/pr_metrics_webhook_processors.py`, no behavior change)
- `handle_emission`: the close/merge metrics row

They're separate rather than one routing function so the webhook loop isolates each in its own `try/except` (a failure in one can't suppress the other), and each carries its own feature flag and action gate. Domain logic lives in `pr_metrics/attribution.py` and `pr_metrics/emit.py`; `pr_metrics/webhooks.py` is the GitHub entry point.

> [!NOTE]
> This relocates the CORE-216 attribution webhook handler (a pure move, logic unchanged) from `integrations/github/` into `pr_metrics/`.

Refs CORE-221

---------

Co-authored-by: Claude <noreply@anthropic.com>
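The isolation argument above (each processor in its own `try/except`, so one failure cannot suppress the other) can be sketched roughly as follows. This is not Sentry's webhook loop: `run_processors` and the handler bodies are hypothetical, and only the two handler names come from the description.

```python
import logging
from typing import Any, Callable, Dict, List, Sequence

logger = logging.getLogger(__name__)

Processor = Callable[[Dict[str, Any]], None]


def run_processors(processors: Sequence[Processor], event: Dict[str, Any]) -> List[str]:
    """Run every processor; log and collect failures instead of aborting the loop."""
    failed: List[str] = []
    for processor in processors:
        try:
            processor(event)
        except Exception:
            # One failing processor must not prevent the remaining ones from running.
            logger.exception(
                "webhook processor failed",
                extra={"processor_name": processor.__name__},
            )
            failed.append(processor.__name__)
    return failed


# Hypothetical stand-ins for the two registered handlers.
emitted: List[str] = []


def handle_attribution(event: Dict[str, Any]) -> None:
    raise RuntimeError("simulated attribution failure")


def handle_emission(event: Dict[str, Any]) -> None:
    if event.get("action") == "closed":
        emitted.append(event["action"])


failed = run_processors([handle_attribution, handle_emission], {"action": "closed"})
```

Here `handle_emission` still runs after `handle_attribution` raises. A single routing function that called both in sequence would have skipped emission unless it reimplemented the same guard.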

Part of the PR Merge Live Metrics project. Builds on the storage layer from #116586 (CORE-200, now merged).

## What

Seer already reports each PR it creates by emitting the `seer.pr_created` event (`SentryAppEventType.SEER_PR_CREATED`). This hooks attribution into that existing inbound flow (`process_autofix_updates`, next to the activity handler; no new transport):

1. Resolve the org-scoped `Repository` by name and provider.
2. Find-or-create the canonical `PullRequest` row (keyed on PR number). This may run before the SCM `opened` webhook, so the row can be a shell; we never overwrite title/body the webhook fills in later.
3. Idempotently record a `PullRequestAttribution` row with signal type `seer_app` and source `seer_data` (`signal_details = {run_id, group_id, pr_url}`), keyed on `(pull_request, signal_type, source)` to match the model's unique constraint, so event redelivery refreshes rather than duplicates.

## Why provider-aware resolution

`Repository` has no `(organization_id, name)` unique constraint (only `(organization_id, provider, external_id)`), so an org can legitimately host same-named repos across providers, and resolving by name+org alone could attribute to the wrong repo. Seer normalizes its provider (`process_repo_provider`: strips `integrations:`, lowercases) while Sentry stores the prefixed form, so we match both shapes (the idiom `filter_repo_by_provider` already uses) and resolve only on a single match, refusing to guess when Seer sends `unknown`.

## Observability

Structured warnings so upstream issues surface: `repo_not_found`, `repo_ambiguous`, and `unrecognized_provider` (a provider value we don't map, flagged so it can be fixed in Seer).

## Rollout / scope

- Gated behind the `organizations:pr-metrics-attribution` flag (FlagPole, backend-only). Matures out after rollout.
- Scoped to `seer.pr_created` reports, recorded as the `seer_app` signal type. Delegated-agent PRs (Cursor/Copilot/Claude) are out of scope: they flow through Seer's coding-agent state-update path, which does not emit `seer.pr_created`, so they never reach this handler. Attributing those is separate/later work.
- The cached `PullRequest.attribution` (MAX-confidence) projection is deferred: `recompute_pull_request_attribution` computes it as a read helper, but the cached column isn't persisted yet (by design).

## Tests

New `tests/sentry/pr_metrics/test_attribution.py` (resolution, provider disambiguation, unknown-provider single-match vs ambiguous, idempotency, signal revival, confidence ranking, all three warnings) plus operator-flow integration tests behind the feature flag.

Refs CORE-204