
feat(pr-metrics): Attribute PRs from the seer.pr_created event#116759

Merged
vaind merged 3 commits into master from pr-merge-metrics/seer-pr-created-attribution
Jun 4, 2026

Conversation

@vaind

@vaind vaind commented Jun 3, 2026

Contributor

Part of the PR Merge Live Metrics project. Builds on the storage layer from #116586 (CORE-200, now merged).

What

Seer already emits the seer.pr_created event (SentryAppEventType.SEER_PR_CREATED) when it reports a PR it created. This PR hooks attribution into that existing inbound flow (process_autofix_updates, next to the activity handler; no new transport):

  1. Resolve the org-scoped Repository by name and provider.
  2. Find-or-create the canonical PullRequest row (keyed on PR number). This may run before the SCM opened webhook, so the row can be a shell; we never overwrite title/body the webhook fills in later.
  3. Idempotently record a PullRequestAttribution row with signal type seer_app and source seer_data (signal_details = {run_id, group_id, pr_url}), keyed on (pull_request, signal_type, source) — matching the model's unique constraint — so event redelivery refreshes rather than duplicates.
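The idempotent write in step 3 can be sketched with an in-memory stand-in for the table. This is a minimal illustration, not the code in attribution.py: `AttributionRow` and `AttributionStore` are hypothetical names, and the real helper writes to the database under the model's unique constraint.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class AttributionRow:
    """Hypothetical stand-in for a PullRequestAttribution row."""

    pull_request_id: int
    signal_type: str
    source: str
    signal_details: dict[str, Any] = field(default_factory=dict)


class AttributionStore:
    """In-memory table keyed on (pull_request, signal_type, source)."""

    def __init__(self) -> None:
        self._rows: dict[tuple[int, str, str], AttributionRow] = {}

    def record(
        self,
        pull_request_id: int,
        signal_type: str,
        source: str,
        signal_details: dict[str, Any],
    ) -> tuple[AttributionRow, bool]:
        """Create the row, or refresh its details if the key already exists."""
        key = (pull_request_id, signal_type, source)
        row = self._rows.get(key)
        if row is None:
            row = AttributionRow(pull_request_id, signal_type, source, dict(signal_details))
            self._rows[key] = row
            return row, True
        # Redelivery of the same event: refresh instead of duplicating.
        row.signal_details = dict(signal_details)
        return row, False

    def __len__(self) -> int:
        return len(self._rows)
```

Calling `record` twice with the same key leaves one row whose `signal_details` reflect the latest delivery.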

Why provider-aware resolution

Repository has no (organization_id, name) unique constraint (only (organization_id, provider, external_id)), so an org can legitimately host same-named repos across providers — resolving by name+org alone could attribute to the wrong repo. Seer normalizes its provider (process_repo_provider: strips integrations:, lowercases) while Sentry stores the prefixed form, so we match both shapes (the idiom filter_repo_by_provider already uses) and resolve only on a single match — refusing to guess when Seer sends unknown.
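A rough sketch of the single-match rule described above, under stated assumptions: `Repo`, `KNOWN_PROVIDERS`, `normalize_provider`, and `resolve_repo` are illustrative names, the provider set is invented, and the unrecognized_provider warning is left out for brevity. Normalizing both sides is one way to match the prefixed and unprefixed shapes.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: an illustrative set of providers, not Sentry's real mapping.
KNOWN_PROVIDERS = frozenset({"github", "gitlab", "bitbucket"})


@dataclass(frozen=True)
class Repo:
    """Hypothetical stand-in for an org-scoped Repository row."""

    id: int
    name: str
    provider: str  # Sentry may store "integrations:github"; Seer sends "github"


def normalize_provider(provider: str) -> str:
    """Lowercase and strip the "integrations:" prefix so both shapes compare equal."""
    return provider.strip().lower().removeprefix("integrations:")


def resolve_repo(
    repos: list[Repo], name: str, seer_provider: str
) -> tuple[Repo | None, str | None]:
    """Return (repo, None) on a single match, else (None, warning_reason)."""
    matches = [r for r in repos if r.name == name]
    wanted = normalize_provider(seer_provider)
    if wanted in KNOWN_PROVIDERS:
        matches = [r for r in matches if normalize_provider(r.provider) == wanted]
    # Otherwise the provider is unmapped (e.g. "unknown"): fall back to name only.
    if not matches:
        return None, "repo_not_found"
    if len(matches) > 1:
        # Refuse to guess between same-named repos.
        return None, "repo_ambiguous"
    return matches[0], None
```

With same-named repos on GitHub and GitLab, a known provider picks one of them, while an unmapped provider yields repo_ambiguous rather than an arbitrary choice.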

Observability

Structured warnings so upstream issues surface: repo_not_found, repo_ambiguous, and unrecognized_provider (a provider value we don't map — flagged so it can be fixed in Seer).

Rollout / scope

  • Gated behind the organizations:pr-metrics-attribution flag (FlagPole, backend-only). The flag is retired once rollout completes.
  • Covers PRs from Seer's own coding pipeline — that's exactly what seer.pr_created reports — recorded as the seer_app signal type. Delegated-agent PRs (Cursor/Copilot/Claude) are out of scope: they flow through Seer's coding-agent state-update path, which does not emit seer.pr_created, so they never reach this handler. Attributing those is separate/later work.
  • The cached PullRequest.attribution (MAX-confidence) projection is deferred: recompute_pull_request_attribution computes it as a read helper, but the cached column isn't persisted yet (by design).

Tests

New tests/sentry/pr_metrics/test_attribution.py (resolution, provider disambiguation, unknown-provider single-match vs ambiguous, idempotency, signal revival, confidence ranking, all three warnings) + operator-flow integration tests behind the feature flag.

Refs CORE-204

@linear-code

linear-code Bot commented Jun 3, 2026


CORE-204

CW-1460

@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 3, 2026
@vaind vaind force-pushed the pr-merge-metrics/seer-pr-created-attribution branch from 7e174e1 to 988fbef Compare June 3, 2026 11:33
Base automatically changed from gio/pr-merge-metrics/extend-pr-data to master June 3, 2026 12:47
@vaind vaind force-pushed the pr-merge-metrics/seer-pr-created-attribution branch from 988fbef to 0856ca4 Compare June 3, 2026 12:53
@vaind vaind marked this pull request as ready for review June 3, 2026 12:53
@vaind vaind requested a review from a team as a code owner June 3, 2026 12:53
Comment thread src/sentry/pr_metrics/attribution.py Outdated
@getsentry getsentry deleted a comment from github-actions Bot Jun 3, 2026

This comment was marked as resolved.

@vaind vaind force-pushed the pr-merge-metrics/seer-pr-created-attribution branch from 0856ca4 to 407fb82 Compare June 3, 2026 13:02
@vaind vaind requested a review from giovanni-guidini June 3, 2026 13:04
Comment thread src/sentry/pr_metrics/attribution.py
@vaind

This comment was marked as outdated.

@vaind vaind force-pushed the pr-merge-metrics/seer-pr-created-attribution branch from 407fb82 to 0d38d14 Compare June 3, 2026 13:43
) -> None:
"""Attribute PRs reported by Seer's ``seer.pr_created`` event to the Seer app.

For each reported PR: resolve the org-scoped ``Repository`` by name and


Why on earth is it more than one?

@vaind vaind Jun 3, 2026


same name on gitlab & github for example

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 0d38d14.

Comment thread src/sentry/pr_metrics/attribution.py
Comment thread src/sentry/pr_metrics/attribution.py
Comment thread src/sentry/seer/entrypoints/operator.py
When Seer reports a PR it created (own or delegated coding agent) via the
existing seer.pr_created event, attribute it in Sentry: resolve the
org-scoped Repository by name + provider, find-or-create the canonical
PullRequest row (keyed on PR number), and idempotently record a seer_app
PullRequestAttribution signal (signal_details = run_id, group_id, pr_url).
Hooks into the existing process_autofix_updates flow next to the activity
handler; no new transport.

Repository resolution takes provider into account because an org can host
same-named repos across providers; it resolves only on a single match and
refuses to guess when Seer sends an unknown provider. Unresolvable repos
and unrecognized provider values are logged so they can be fixed upstream.

Gated behind the organizations:pr-metrics-attribution flag for rollout.
Scoped to the seer_app source only; delegated-agent classification and the
cached PullRequest.attribution projection land separately.

Refs CORE-204

Co-Authored-By: Claude <noreply@anthropic.com>
@vaind vaind force-pushed the pr-merge-metrics/seer-pr-created-attribution branch from 0d38d14 to f84a7d8 Compare June 3, 2026 14:03
Comment thread src/sentry/pr_metrics/attribution.py Outdated
Comment thread src/sentry/pr_metrics/attribution.py Outdated
detected signal is preserved as its own ``PullRequestAttribution`` row rather
than collapsed into a single field.

See the architecture doc ("PR Metrics — Architecture Overview", §Attribution


[nit] Docs get stale fast... I'd remove this paragraph I think

Comment thread src/sentry/pr_metrics/attribution.py Outdated
)
record_attribution_signal(
pull_request=pull_request,
signal_type=PullRequestAttributionSignalType.SEER_APP,


Are we confident that this is indeed the app used, and not the SENTRY_APP?
Or does it not matter?

@vaind vaind Jun 4, 2026


I'm even thinking that we should drop the distinction and just always put Sentry in there, instead of trying to figure out who is using Sentry and who is on Seer app


I think that's fair. For our purposes it doesn't matter much, to be honest.

Truth be told I think these days it might be just the SENTRY_APP anyway

vaind and others added 2 commits June 4, 2026 10:26
Address PR review feedback on the attribution resolver docs:

- Use "attributions" instead of "signals" for naming consistency.
- Drop the stale architecture-doc reference paragraph.
- Document why the seer.pr_created path always records SEER_APP: Seer
  picks between the Sentry and Seer GitHub apps at push time, but the
  payload doesn't carry which one it used, so a faithful SEER_APP vs
  SENTRY_APP split is deferred until that app kind is threaded through.

Refs CORE-204
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Seer picks between the Sentry and Seer GitHub apps at push time, but the
seer.pr_created payload doesn't say which one it used. Until Seer threads
that app kind through, default to SENTRY_APP rather than SEER_APP so the
attribution reflects the more common case.

Refs CORE-204
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vaind vaind enabled auto-merge (squash) June 4, 2026 08:38
@vaind vaind merged commit 552fb5c into master Jun 4, 2026
85 checks passed
@vaind vaind deleted the pr-merge-metrics/seer-pr-created-attribution branch June 4, 2026 08:46
giovanni-guidini added a commit that referenced this pull request Jun 4, 2026
…webhook (#116834)

Part of the [PR Merge Live
Metrics](https://linear.app/getsentry/project/pr-merge-live-metrics-6a9efd473801/overview)
project. Builds on the schema from #116586 (CORE-200) and the
attribution write helper from #116759 (CORE-204), both now merged.

## What

Hooks a new processor into
`PullRequestEventWebhook.WEBHOOK_EVENT_PROCESSORS` that runs on every
GitHub `pull_request` webhook. On `action=opened` it detects two
attribution signals and writes `PullRequestAttribution` rows via
`record_attribution_signal()`:

1. **App ID match** — compares `pull_request.user.id` against
`SEER_AUTOFIX_GITHUB_APP_USER_ID` and `SENTRY_GITHUB_APP_USER_ID` to
produce `SEER_APP` / `SENTRY_APP` attributions respectively.
2. **Referenced issue** — scans the PR title and body for Sentry issue
short IDs (`Fixes PROJ-123`) and sentry.io URLs (`Fixes
https://....sentry.io/issues/456`) via the existing
`find_referenced_groups` utility, and writes a `REFERENCED_ISSUE`
attribution with matched group IDs in `signal_details`.

Both signals are independent — a Seer-opened PR that also references an
issue produces two rows. All writes are idempotent and race-safe via
`record_attribution_signal()` (keyed on `(pull_request, signal_type,
source)`, matching the unique constraint). The processor is gated behind
the `organizations:pr-metrics-attribution` FlagPole flag (registered by
#116759).
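As a rough illustration of the referenced-issue signal, the sketch below pulls `Fixes PROJ-123`-style short IDs out of PR text. It is not `find_referenced_groups`: the function name `extract_short_ids`, the regex, and the restriction to the single keyword "fixes" are all assumptions made for the example, and URL references are ignored.

```python
import re

# Assumption: a short ID looks like an uppercase slug plus dash-separated
# uppercase/digit segments, preceded by the keyword "fixes" (any case).
SHORT_ID_RE = re.compile(r"\b(?i:fixes)\s+([A-Z][A-Z0-9]*(?:-[A-Z0-9]+)+)\b")


def extract_short_ids(text: str) -> list[str]:
    """Return candidate short IDs in order of appearance, without duplicates."""
    seen: list[str] = []
    for match in SHORT_ID_RE.finditer(text):
        short_id = match.group(1)
        if short_id not in seen:
            seen.append(short_id)
    return seen
```

A real implementation would then resolve each candidate to a group within the organization and discard IDs that do not exist.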

Refs CORE-216

---------

Co-authored-by: Claude <noreply@anthropic.com>
vaind added a commit that referenced this pull request Jun 5, 2026
…h) (#116842)

Part of the [PR Merge Live
Metrics](https://linear.app/getsentry/project/pr-merge-live-metrics-6a9efd473801/overview)
project. Builds on the schema (#116586, CORE-200), the `seer.pr_created`
attribution path (#116759, CORE-204), and the webhook attribution
processor (#116834, CORE-216).

## What

On a tracked PR's **close/merge**, Sentry now emits a provisional
`pr_metrics.row` analytics event directly — no Seer judge, no
round-trip. This is the "easy path" of the judge-gated emission design
(the judge path is CORE-217). A PR is **tracked** once it has ≥1 valid
`PullRequestAttribution` row (`is_valid=true`); untracked PRs emit
nothing.

The row carries only what Sentry already holds — no SCM fetch, no PR
text:

- lifecycle: `close_action`, `head_commit_sha`, `merge_commit_sha`,
`opened_at` / `closed_at` / `merged_at`, `draft`
- payload-derived counters: `additions`, `deletions`, `files_changed`,
`commits_count`, `comments_count`, `review_comments_count`,
`is_assigned`
- `attributions`: the point-in-time snapshot of valid attribution
signals (`{signal_type, source, signal_details}`), ordered by
attribution priority (highest-confidence first)
- `verdict`: `None` for now (verdicts arrive with judges)
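The `attributions` snapshot and the "tracked" check might look roughly like this. The priority numbers, the `"webhook"` source value, and the function names are placeholders chosen for the sketch; only the field names and the highest-confidence-first ordering come from the description above.

```python
from __future__ import annotations

from typing import Any

# Assumption: illustrative priorities only; the real ranking lives in pr_metrics.
SIGNAL_PRIORITY = {"seer_app": 30, "sentry_app": 20, "referenced_issue": 10}


def is_tracked(rows: list[dict[str, Any]]) -> bool:
    """A PR is tracked once it has at least one valid attribution row."""
    return any(row.get("is_valid") for row in rows)


def snapshot_attributions(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return valid attributions, highest priority first, trimmed to three fields."""
    valid = [row for row in rows if row.get("is_valid")]
    valid.sort(key=lambda row: SIGNAL_PRIORITY.get(row["signal_type"], 0), reverse=True)
    return [
        {
            "signal_type": row["signal_type"],
            "source": row["source"],
            "signal_details": row["signal_details"],
        }
        for row in valid
    ]
```

An untracked PR produces an empty snapshot, which matches the rule that untracked PRs emit nothing.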

Emission is gated by the new `organizations:pr-metrics-emit` FlagPole
flag, and is **stateless** — it does not dedupe webhook redeliveries (a
DB-side, pre-fork guard keyed on the terminal event is CORE-227; the
judge round-trip is CORE-217).

Transport is `sentry.analytics` → the existing analytics → BigQuery
pipeline. The schema is intentionally provisional/additive; the
consumable schema is finalized in M5 (CORE-223).

## Architecture

Consolidates all pr_metrics GitHub webhook handling into
`src/sentry/pr_metrics/webhooks.py`, following the code_review pattern.
Two **independent** processors are registered on
`PullRequestEventWebhook.WEBHOOK_EVENT_PROCESSORS`:

- `handle_attribution` — GH-App-author + referenced-issue signals
(relocated from `integrations/github/pr_metrics_webhook_processors.py`,
no behavior change)
- `handle_emission` — the close/merge metrics row

They're separate rather than one routing function so the webhook loop
isolates each in its own `try/except` (a failure in one can't suppress
the other), and each carries its own feature flag and action gate.
Domain logic lives in `pr_metrics/attribution.py` and
`pr_metrics/emit.py`; `pr_metrics/webhooks.py` is the GitHub entry
point.
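The isolation argument can be sketched as a loop that wraps each processor in its own `try/except`. This is a simplified model, not the actual webhook loop: `run_processors` and the toy handlers below are hypothetical, and the real processors take the webhook's own arguments.

```python
from __future__ import annotations

import logging
from typing import Any, Callable, Mapping

logger = logging.getLogger(__name__)

Processor = Callable[[Mapping[str, Any]], None]


def run_processors(event: Mapping[str, Any], processors: list[Processor]) -> list[str]:
    """Run every processor; a failure in one is logged and does not stop the rest."""
    failed: list[str] = []
    for processor in processors:
        try:
            processor(event)
        except Exception:
            logger.exception("webhook processor failed: %s", processor.__name__)
            failed.append(processor.__name__)
    return failed


emitted: list[str] = []


def handle_attribution(event: Mapping[str, Any]) -> None:
    # Toy handler that always fails, to show the isolation.
    raise RuntimeError("attribution backend unavailable")


def handle_emission(event: Mapping[str, Any]) -> None:
    # Toy handler that records the action it saw.
    emitted.append(event["action"])
```

Running both handlers on a `closed` event reports `handle_attribution` as failed while `handle_emission` still records the action, which is the property a single routing function would not give for free.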

> [!NOTE]
> This relocates the CORE-216 attribution webhook handler (a pure move,
logic unchanged) from `integrations/github/` into `pr_metrics/`.

Refs CORE-221

---------

Co-authored-by: Claude <noreply@anthropic.com>

Labels

Scope: Backend Automatically applied to PRs that change backend components

2 participants