fix(backend): prevent duplicate index jobs from indexedAt race #1298
onJobCompleted marked the index job COMPLETED in one write, then ran several git reads (isRepoEmpty, getCommitHashForRefName, getLatestCommitTimestamp, getLocalDefaultBranch), then updated repo.indexedAt in a separate write. During the git-read window the job was already COMPLETED but indexedAt was still stale, so the scheduler (scheduleIndexJobs) saw no active job and an out-of-date indexedAt and scheduled a duplicate index job. On large repos the window is long enough to be hit routinely by the 1s scheduler poll.

Run the git reads first, then write status=COMPLETED and indexedAt together in a single repoIndexingJob.update (nested repo update). Now the job stays IN_PROGRESS until the moment indexedAt becomes fresh, so the scheduler's two guards can never both pass at once.

Fixes SOU-1150

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fixes SOU-1150
Problem
Large repositories could be indexed twice within a single reindex interval due to a race in `RepoIndexManager.onJobCompleted` (`packages/backend/src/repoIndexManager.ts`).

The handler did three things in order:

1. Marked the job `COMPLETED` (one DB write, which also flips `repo.latestIndexingJobStatus`).
2. Ran several git reads: `isRepoEmpty`, `getCommitHashForRefName`, `getLatestCommitTimestamp`, `getLocalDefaultBranch`.
3. Updated `repo.indexedAt` (a separate DB write).

Between steps 1 and 3 there is a window, the duration of the git reads in step 2, where the job is already `COMPLETED` but `indexedAt` is still stale. The scheduler (`scheduleIndexJobs`) polls every `reindexRepoPollingIntervalMs` (default 1s) and creates a new index job when both:

- `indexedAt IS NULL OR indexedAt < now - reindexIntervalMs` (still true, because step 3 hasn't run), and
- there is no `INDEX` job in `PENDING`/`IN_PROGRESS` (now true, because step 1 already flipped it to `COMPLETED`).

So a poll landing in that window schedules a duplicate index job. On large repos the git reads take long enough that the 1s poll hits the window routinely. The Redlock doesn't help because `onJobCompleted` runs as a BullMQ `completed` event handler, after the lock from `processJob` has already been released (and the duplicate job row is created regardless of execution serialization).

Fix
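For reference, the scheduler's two guards from the Problem section can be modelled as a single predicate. This is a simplified in-memory sketch, not the actual `scheduleIndexJobs` query: the `RepoState` type and the `shouldScheduleIndexJob` name are hypothetical, and the one-hour interval is chosen only for illustration.

```typescript
// Simplified in-memory model of the scheduler's two guards.
// RepoState and shouldScheduleIndexJob are hypothetical names, not the real code.
type JobStatus = "PENDING" | "IN_PROGRESS" | "COMPLETED" | "FAILED";

interface RepoState {
  indexedAt: Date | null;
  indexJobStatuses: JobStatus[]; // statuses of this repo's INDEX jobs
}

function shouldScheduleIndexJob(
  repo: RepoState,
  now: Date,
  reindexIntervalMs: number,
): boolean {
  // Guard 1: indexedAt IS NULL OR indexedAt < now - reindexIntervalMs
  const isStale =
    repo.indexedAt === null ||
    repo.indexedAt.getTime() < now.getTime() - reindexIntervalMs;

  // Guard 2: no INDEX job in PENDING / IN_PROGRESS
  const hasActiveJob = repo.indexJobStatuses.some(
    (s) => s === "PENDING" || s === "IN_PROGRESS",
  );

  return isStale && !hasActiveJob;
}

const now = new Date("2024-01-01T12:00:00Z");
const hourMs = 60 * 60 * 1000; // illustrative reindex interval
const staleIndexedAt = new Date(now.getTime() - 2 * hourMs);

// While the job is running: blocked by the active-job guard.
const running: RepoState = { indexedAt: staleIndexedAt, indexJobStatuses: ["IN_PROGRESS"] };
// Old ordering, inside the git-read window: COMPLETED but indexedAt still stale.
const inWindow: RepoState = { indexedAt: staleIndexedAt, indexJobStatuses: ["COMPLETED"] };
// After indexedAt is refreshed: blocked by the freshness guard.
const done: RepoState = { indexedAt: now, indexJobStatuses: ["COMPLETED"] };

console.log(shouldScheduleIndexJob(running, now, hourMs));  // false
console.log(shouldScheduleIndexJob(inWindow, now, hourMs)); // true (duplicate job)
console.log(shouldScheduleIndexJob(done, now, hourMs));     // false
```

Only the middle state lets both guards pass, and that state exists only when the status write and the `indexedAt` write are separate.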
Run the git reads first, then write `status = COMPLETED` and `indexedAt` (plus `indexedCommitHash`, `pushedAt`, metadata, `defaultBranch`) together in a single `repoIndexingJob.update` with a nested `repo` update.

Now the job remains `IN_PROGRESS` for the entire git-read window and only flips to `COMPLETED` at the same instant `indexedAt` becomes fresh. The scheduler's two guards can never both pass simultaneously: before the write, the active-job guard blocks; after it, the freshness guard on `indexedAt` blocks.

Verification
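The race and the fix can also be illustrated with a small deterministic model. This is a sketch under simplifying assumptions, not the real handler or scheduler: time is counted in abstract ticks, the scheduler polls once per tick, the git reads take `GIT_READ_TICKS` ticks, and all names (`simulate`, `schedulerPoll`, `Repo`, `Job`) are hypothetical.

```typescript
// Tick-based model of the completion handler racing the scheduler poll.
// All names and constants here are illustrative assumptions.
type Status = "IN_PROGRESS" | "COMPLETED";

interface Job {
  status: Status;
}

interface Repo {
  indexedAt: number | null; // tick at which the repo was last indexed
  jobs: Job[];
}

const REINDEX_INTERVAL_TICKS = 1000;
const GIT_READ_TICKS = 5;

// One scheduler poll: create a job only if both guards pass.
function schedulerPoll(repo: Repo, nowTick: number): void {
  const stale =
    repo.indexedAt === null ||
    repo.indexedAt < nowTick - REINDEX_INTERVAL_TICKS;
  const active = repo.jobs.some((j) => j.status === "IN_PROGRESS");
  if (stale && !active) {
    repo.jobs.push({ status: "IN_PROGRESS" });
  }
}

// Runs the completion handler for one job and returns how many
// extra jobs the scheduler created while the handler was running.
function simulate(ordering: "old" | "new"): number {
  const job: Job = { status: "IN_PROGRESS" };
  const repo: Repo = { indexedAt: null, jobs: [job] };

  for (let tick = 0; tick <= GIT_READ_TICKS; tick++) {
    if (ordering === "old") {
      // Old: status first, git reads, then indexedAt in a second write.
      if (tick === 0) job.status = "COMPLETED";
      if (tick === GIT_READ_TICKS) repo.indexedAt = tick;
    } else if (tick === GIT_READ_TICKS) {
      // New: git reads first, then both fields in one write.
      job.status = "COMPLETED";
      repo.indexedAt = tick;
    }
    schedulerPoll(repo, tick);
  }

  return repo.jobs.length - 1;
}

console.log(simulate("old")); // 1
console.log(simulate("new")); // 0
```

The model stops after one handler run, so it shows a single duplicate rather than a cascade. In the real system each duplicate job goes through the same handler on completion, which is how repeated duplicates can accumulate.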
Reproduced the original bug locally by injecting a 5s sleep between the `COMPLETED` write and the `indexedAt` update: a single repo cascaded into 4 index jobs within ~11s. After the fix, with the sleep moved to before the combined write (so the job stays `IN_PROGRESS`), no duplicate is scheduled.

- `yarn workspace @sourcebot/backend build` passes.
- `yarn workspace @sourcebot/backend test --run repoIndexManager`: 15/15 pass. The "updates repo.indexedAt" test now asserts `status: COMPLETED` and `indexedAt` are written in the same `repoIndexingJob.update` call, which serves as the regression guard.

🤖 Generated with Claude Code
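The shape of the combined write that the regression test guards can be sketched as follows. This is an assumption-laden illustration rather than the code in `repoIndexManager.ts`: it assumes Prisma's nested `update` syntax on the job's `repo` relation, takes the git-read results as an already-resolved `GitFacts` value, omits the metadata field, and uses a hand-written recording fake in place of the real client. `buildCompletionUpdate`, `GitFacts`, and `fakeUpdate` are hypothetical names.

```typescript
// Results of the git reads, gathered BEFORE any status write.
// GitFacts is a hypothetical container for illustration.
interface GitFacts {
  commitHash: string | undefined;
  pushedAt: Date | undefined;
  defaultBranch: string | undefined;
}

// Builds the argument for a single repoIndexingJob.update call that
// flips the job status and refreshes the repo row together.
function buildCompletionUpdate(jobId: string, facts: GitFacts, now: Date) {
  return {
    where: { id: jobId },
    data: {
      status: "COMPLETED" as const,
      // Nested write on the related repo (Prisma-style, assumed relation name).
      repo: {
        update: {
          indexedAt: now,
          indexedCommitHash: facts.commitHash,
          pushedAt: facts.pushedAt,
          defaultBranch: facts.defaultBranch,
        },
      },
    },
  };
}

type CompletionUpdate = ReturnType<typeof buildCompletionUpdate>;

// Recording fake standing in for prisma.repoIndexingJob.update.
const calls: CompletionUpdate[] = [];
const fakeUpdate = (args: CompletionUpdate): void => {
  calls.push(args);
};

const facts: GitFacts = {
  commitHash: "abc123",
  pushedAt: new Date("2024-01-01T11:00:00Z"),
  defaultBranch: "main",
};
fakeUpdate(buildCompletionUpdate("job-1", facts, new Date("2024-01-01T12:00:00Z")));

// Regression guard: status and indexedAt travel in the same call.
console.log(calls.length); // 1
console.log(calls[0].data.status); // COMPLETED
console.log(calls[0].data.repo.update.indexedAt.toISOString()); // 2024-01-01T12:00:00.000Z
```

Because both fields sit in one argument object, a test that inspects a single recorded call can fail if a later change splits the write in two again.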