
Add Crr Cascade capabilities to backbeat crr replication #2747

Open
SylvainSenechal wants to merge 1 commit into development/9.5 from improvement/BB-767


@SylvainSenechal

@SylvainSenechal SylvainSenechal commented Jun 3, 2026

Contributor

Issue: BB-767

Related PRs:
Arsenal : scality/Arsenal#2628
Cloudserver : scality/cloudserver#6179
CloudserverClient : scality/cloudserverclient#24
S3utils : scality/s3utils#395

@bert-e

bert-e commented Jun 3, 2026

Contributor

Hello sylvainsenechal,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Available options
name description privileged authored
/after_pull_request Wait for the given pull request id to be merged before continuing with the current one.
/bypass_author_approval Bypass the pull request author's approval
/bypass_build_status Bypass the build and test status
/bypass_commit_size Bypass the check on the size of the changeset TBA
/bypass_incompatible_branch Bypass the check on the source branch prefix
/bypass_jira_check Bypass the Jira issue check
/bypass_peer_approval Bypass the pull request peers' approval
/bypass_leader_approval Bypass the pull request leaders' approval
/approve Instruct Bert-E that the author has approved the pull request. ✍️
/create_pull_requests Allow the creation of integration pull requests.
/create_integration_branches Allow the creation of integration branches.
/no_octopus Prevent Wall-E from doing any octopus merge and use multiple consecutive merge instead
/unanimity Change review acceptance criteria from one reviewer at least to all reviewers
/wait Instruct Bert-E not to run until further notice.
Available commands
name description privileged
/help Print Bert-E's manual in the pull request.
/status Print Bert-E's current status in the pull request TBA
/clear Remove all comments from Bert-E from the history TBA
/retry Re-start a fresh build TBA
/build Re-start a fresh build TBA
/force_reset Delete integration branches & pull requests, and restart merge process from the beginning.
/reset Try to remove integration branches unless there are commits on them which do not appear on the source branch.

Status report is not available.

Comment thread package.json Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented Jun 3, 2026

- package.json:54 — @scality/cloudserverclient uses a local file path (file:../cloudserverclient/...). Must be changed to a proper registry or git-pinned reference before merge.
- ReplicateObject.js:6 — checkCrrCascadeEvent and getMicroVersionId() do not appear to exist in arsenal 8.3.9. An Arsenal version bump is likely needed.
- ReplicateObject.js:743 — Any 409 from destination putMetadata is assumed to be cascade-stale and marked COMPLETED. Consider using a more specific signal to avoid silently skipping replication when a 409 is returned for other reasons.

    Review by Claude Code
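The 409 concern in the review above can be made concrete with a small sketch. It assumes the destination tags a cascade conflict with a dedicated error name; `CascadeStaleConflict` and `classifyPutMetadataError` are hypothetical names, not Arsenal, Cloudserver, or Backbeat APIs. The status lookup reads either a plain `statusCode` field or the AWS SDK v3 `$metadata.httpStatusCode` field.

```javascript
// Hypothetical error name the destination would attach to a cascade-specific
// conflict. This is an assumption for illustration, not an existing API.
const CASCADE_CONFLICT_NAME = 'CascadeStaleConflict';

// Classify an error returned by a destination putMetadata call.
// Only a 409 that also carries the cascade-specific name is treated as a
// benign "already replicated" outcome; every other 409 stays a failure.
function classifyPutMetadataError(err) {
    if (!err) {
        return 'success';
    }
    const status = err.statusCode ?? err.$metadata?.httpStatusCode;
    if (status === 409 && err.name === CASCADE_CONFLICT_NAME) {
        return 'cascade-stale'; // safe to mark the entry COMPLETED
    }
    return 'failure'; // generic 409s are not skipped silently
}
```

With this shape, a 409 raised for an unrelated reason falls through to the failure path instead of being marked COMPLETED.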

@SylvainSenechal SylvainSenechal marked this pull request as ready for review June 3, 2026 16:15
Comment thread extensions/replication/tasks/ReplicateObject.js

Contributor Author


I think we can add functional tests instead of just these, but I'm waiting for the Arsenal and Cloudserver PRs to be merged, since that will make these tests easier to write (functional tests in Backbeat rely on a Cloudserver image).

Contributor


Keeping the unit tests is good; shouldn't the functional tests just be an addition?

Comment thread package.json Outdated
@codecov

codecov Bot commented Jun 3, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.05%. Comparing base (fa0f64c) to head (d1bf122).

Additional details and impacted files


Files with missing lines Coverage Δ
extensions/replication/tasks/ReplicateObject.js 93.20% <100.00%> (+0.92%) ⬆️

... and 3 files with indirect coverage changes

Components Coverage Δ
Bucket Notification 80.22% <ø> (ø)
Core Library 80.93% <ø> (-0.66%) ⬇️
Ingestion 70.13% <ø> (ø)
Lifecycle 79.06% <ø> (ø)
Oplog Populator 85.83% <ø> (ø)
Replication 62.28% <100.00%> (+0.71%) ⬆️
Bucket Scanner 85.76% <ø> (ø)
@@                 Coverage Diff                 @@
##           development/9.5    #2747      +/-   ##
===================================================
- Coverage            75.23%   75.05%   -0.18%     
===================================================
  Files                  200      200              
  Lines                13764    13815      +51     
===================================================
+ Hits                 10355    10369      +14     
- Misses                3399     3436      +37     
  Partials                10       10              
Flag Coverage Δ
api:retry 9.09% <0.00%> (-0.04%) ⬇️
api:routes 8.87% <0.00%> (-0.04%) ⬇️
bucket-scanner 85.76% <ø> (ø)
ft_test:queuepopulator 9.09% <0.00%> (-1.88%) ⬇️
ingestion 12.32% <0.00%> (-0.05%) ⬇️
lib 7.77% <0.00%> (-0.05%) ⬇️
lifecycle 18.82% <0.00%> (-0.08%) ⬇️
notification 1.01% <0.00%> (-0.01%) ⬇️
oplogPopulator 0.14% <0.00%> (-0.01%) ⬇️
replication 18.79% <39.34%> (+0.03%) ⬆️
unit 53.62% <85.24%> (+0.42%) ⬆️
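As a sanity check on the report above, the headline figures can be recomputed from the raw counts in the coverage diff. This sketch assumes Codecov's default rounding (two decimals, rounded down) and that partial lines do not count as hits; both assumptions are consistent with the reported 75.05% and -0.18%.

```javascript
// Recompute a Codecov percentage from raw line counts.
// Assumptions: coverage = hits / (hits + misses + partials), reported with
// two decimals and rounded down.
function coveragePercent(hits, misses, partials) {
    const lines = hits + misses + partials;
    return Math.floor((hits / lines) * 10000) / 100;
}

const head = coveragePercent(10369, 3436, 10); // 13815 lines on the PR head
const base = coveragePercent(10355, 3399, 10); // 13764 lines on the base
console.log(head, base, (head - base).toFixed(2)); // prints: 75.05 75.23 -0.18
```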


Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented Jun 3, 2026

- package.json:54 — @scality/cloudserverclient uses a local file: path that will break CI and other developers. Must be changed to a proper npm or git-tag reference before merge.
- extensions/replication/tasks/ReplicateObject.js:33-35 — Cascade sentinel objects (cascadeLoopDetected, cascadeDataComplete, partAlreadyAtDest) are plain {} singletons that pass through BackbeatTask.retry(), which mutates errors. Consider Object.freeze() to prevent accidental mutation.

    The cascade logic itself (putData/putMetadata detection, allPartsAlreadyAtDest propagation through the waterfall, _handleReplicationOutcome treating cascade signals as COMPLETED) is sound. The retry wrapper correctly passes cascade signals through without retrying (no retryable property). Tests cover the key scenarios well.

    Review by Claude Code
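The `Object.freeze()` suggestion above, combined with identity comparison, can be sketched as follows. The sentinel names come from the review; the `cascade` tag on each object, the `isCascadeSignal` helper, and `tryMarkRetryable` (a stand-in for a retry helper that annotates errors) are assumptions for illustration.

```javascript
// Sentinel names taken from the review; the `cascade` tag is an assumption
// added only to make the values readable in logs.
const cascadeLoopDetected = Object.freeze({ cascade: 'loopDetected' });
const cascadeDataComplete = Object.freeze({ cascade: 'dataComplete' });
const partAlreadyAtDest = Object.freeze({ cascade: 'partAlreadyAtDest' });

const CASCADE_SIGNALS = new Set([
    cascadeLoopDetected,
    cascadeDataComplete,
    partAlreadyAtDest,
]);

// Identity check: only the exact singletons count, never a look-alike object.
function isCascadeSignal(value) {
    return CASCADE_SIGNALS.has(value);
}

// Hypothetical stand-in for a helper that annotates errors. Under strict mode
// a write to a frozen object throws, so shared sentinels cannot be mutated.
function tryMarkRetryable(err) {
    'use strict';
    try {
        err.retryable = true;
        return true;
    } catch (e) {
        return false; // TypeError: object is not extensible
    }
}
```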

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
Comment thread package.json Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
@claude

claude Bot commented Jun 5, 2026

- ReplicateObject.js:602 — Cascade sentinel objects (cascadeLoopDetected, cascadeDataComplete) are passed as err to callbacks. Verify that BackbeatTask._retry won't retry on these truthy-but-non-error sentinels, which would defeat the loop/stale detection.
- ReplicateObject.js:433 — On the error path in _getAndPutData, destLocations may contain partial results. The sentinel objects ({}) pass the filter and could reach _deleteOrphans with undefined keys. Likely harmless due to downstream filtering, but worth a defensive check.
- ReplicateObject.js:1014 — _processQueueEntryRetryFull now passes allPartsAlreadyAtDest as mdOnly to _putMetadata, changing retry-full semantics from always-full to conditionally-metadata-only. Confirm this is intentional.
- package.json:57 — Arsenal pinned to a raw commit hash instead of a tag. Should be updated to a tag once the Arsenal PR merges.
- package.json:54 — scality-cloudserverclient-v1.0.9.tgz vendored as a binary blob in git. Prefer a registry or git-tag reference.
- ReplicateObject.js:573 — Trailing whitespace.

    Review by Claude Code
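The first point above (sentinels must not be retried) can be sketched with a retry helper that only retries errors explicitly flagged `retryable: true`. `retryIfRetryable` is a hypothetical stand-in, not `BackbeatTask`'s actual retry implementation, and it retries on the next event-loop turn where a real helper would back off.

```javascript
// Hypothetical retry helper; a real implementation would add backoff delays.
// Only errors carrying `retryable: true` are retried. Any other truthy value,
// including frozen cascade sentinels, is handed to `done` on the first attempt.
function retryIfRetryable(actionFunc, maxAttempts, done) {
    let attempt = 0;
    const run = () => {
        attempt += 1;
        actionFunc((err, result) => {
            if (err && err.retryable === true && attempt < maxAttempts) {
                setImmediate(run); // try again on the next event-loop turn
                return;
            }
            done(err, result);
        });
    };
    run();
}
```

Because the check is an explicit `=== true`, a sentinel only becomes retryable if something sets that flag on it, which freezing the sentinels prevents.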

Comment thread package.json Outdated
"@smithy/node-http-handler": "^3.3.3",
"JSONStream": "^1.3.5",
"arsenal": "git+https://github.com/scality/arsenal#8.3.9",
"arsenal": "git+https://github.com/scality/Arsenal#2c429ab35a5ac82c3dafa5a0296a49a23a9c8a4a",


Arsenal is pinned to a raw commit hash (2c429ab...) instead of a semantic version tag. Per project conventions, git-based deps (arsenal, vaultclient, etc.) should pin to tags (e.g. #8.x.y). Commit hashes are opaque — it's unclear which features/fixes are included, and there's no semver contract. This also makes it harder for reviewers and operators to reason about what changed.

— Claude Code
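A minimal sketch of how a CI check could flag the commit-hash pin discussed above. `classifyGitRef` is hypothetical and its rules are heuristics: the `#` fragment of a git dependency URL is a generic commit-ish that may name a branch, tag, or commit, so the sketch simply treats a semver-looking ref as a tag and a 7 to 40 character hex ref as a commit.

```javascript
// Hypothetical checker for git-based dependency specs of the form
// "git+<url>#<ref>". The tag/commit split is a heuristic, not an npm rule.
function classifyGitRef(spec) {
    const hashIndex = spec.lastIndexOf('#');
    if (!spec.startsWith('git+') || hashIndex === -1) {
        return 'not-git';
    }
    const ref = spec.slice(hashIndex + 1);
    if (/^v?\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?$/.test(ref)) {
        return 'tag'; // semver-looking ref such as 8.3.9
    }
    if (/^[0-9a-f]{7,40}$/i.test(ref)) {
        return 'commit'; // abbreviated or full SHA-1
    }
    return 'other-ref'; // branch name or non-semver tag
}
```

Applied to the two lines in the diff above, the old spec classifies as `tag` and the new one as `commit`.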

Comment thread package.json Outdated
@SylvainSenechal SylvainSenechal requested review from a team, benzekrimaha and maeldonn June 8, 2026 15:47
@scality scality deleted a comment from bert-e Jun 9, 2026
@claude

claude Bot commented Jun 17, 2026

- package.json:54 — @scality/cloudserverclient v1.0.9 is installed from a binary tarball committed to the repo. Binary blobs are not auditable in code review. Should be published to a registry or referenced by git tag/commit.
- package.json:57 — Arsenal is pinned to a bare commit hash instead of a tag. The yarn.lock also resolves to a different commit (83682d1...) than what package.json specifies (f6b6e2a...), creating an inconsistency that may cause different installs depending on whether the lockfile is used.

    The cascade replication logic in ReplicateObject.js (version ID collision handling, microVersionId comparison, loop/stale detection, and the _handleReplicationOutcome refactoring) looks correct. The new tests provide good coverage of the cascade paths.

    Review by Claude Code
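The lockfile mismatch noted above can be detected by comparing the ref pinned in `package.json` with the one the lockfile resolved to. This sketch assumes both strings end in `#<commit>`; `extractRef` and `pinsAgree` are hypothetical helpers, and they do not parse the `yarn.lock` format itself.

```javascript
// Return the lowercase ref after the last '#', or null if there is none.
function extractRef(urlWithRef) {
    const i = urlWithRef.lastIndexOf('#');
    return i === -1 ? null : urlWithRef.slice(i + 1).toLowerCase();
}

// True when the package.json pin and the lockfile resolution name the same
// commit. Abbreviated hashes on either side are compared by prefix.
function pinsAgree(packageSpec, lockResolved) {
    const wanted = extractRef(packageSpec);
    const resolved = extractRef(lockResolved);
    if (!wanted || !resolved) {
        return false;
    }
    return resolved.startsWith(wanted) || wanted.startsWith(resolved);
}
```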

Comment thread package.json Outdated
Comment thread package.json Outdated
@claude

claude Bot commented Jun 17, 2026

- package.json:57 — Arsenal dependency pinned to a commit hash instead of a tag. Git-based deps should pin to tags per project conventions.
- package.json:54 — cloudserverclient changed from a registry package to a local .tgz tarball committed to the repo. Binary files permanently inflate git history and cannot be audited. Publish to the registry before merging.

The cascade replication logic (VersionIdCollisionException handling, microVersionId comparison, loop/stale detection, and outcome routing) looks correct. Sentinel objects with identity comparison, non-retryable error propagation through the waterfall, and the allPartsAlreadyAtDest flag all work as expected. Tests cover the key paths well.

    Review by Claude Code

Comment thread package.json Outdated
Comment thread package.json Outdated
@claude

claude Bot commented Jun 18, 2026


PR Review Summary — Add CRR Cascade capabilities

The cascade replication logic (VersionIdCollision, microVersionId comparison, loop/stale detection) is well-structured. The sentinel-based flow through _handleReplicationOutcome correctly marks cascade loop and stale cases as COMPLETED without retry. Stream cleanup on VersionIdCollisionException is proper (abort + destroy before callback). Tests cover the key scenarios thoroughly.

Issues found:
- package.json:54 — @scality/cloudserverclient vendored as a binary tarball committed to the repo. Binary blobs can't be audited, inflate git history permanently, and bypass registry supply chain checks. Publish to a registry or reference a git tag/commit instead.
- package.json:57 — arsenal pinned to a bare commit hash instead of a release tag. Project convention requires git-based deps to pin to tags.

Review by Claude Code
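The cleanup order praised in this summary (abort and destroy before calling back) can be sketched as follows. `finishPutData` is hypothetical; it assumes the outgoing PUT can be cancelled through an `AbortController` and that the source body is a Node.js readable stream.

```javascript
// Hypothetical completion handler for a data PUT. On any error, including a
// VersionIdCollisionException, release resources first and only then call
// back, so collision resolution or a retry never races with a live stream.
function finishPutData(err, sourceStream, abortController, done) {
    if (err) {
        abortController.abort(); // cancel the in-flight PUT request
        sourceStream.destroy();  // release the source GET stream
    }
    return done(err || null);
}
```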

Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread package.json Outdated
Comment thread package.json Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js
@scality scality deleted a comment from claude Bot Jun 22, 2026
@scality scality deleted a comment from claude Bot Jun 22, 2026
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
Comment thread extensions/replication/tasks/ReplicateObject.js Outdated
VersioningRequired: true,
RequestUids: log.getSerializedUids(),
});
const putCommand = attachExpectContinueMiddleware(

Contributor


Can you create the follow-up ticket, @SylvainSenechal?

err, sourceEntry, destEntry, kafkaEntry, log, done));
}

_handleReplicationOutcome(err, sourceEntry, destEntry, kafkaEntry,

Contributor


This section has quite a bit of nested conditional logic and duplicate checks that make it hard to read and maintain.

Could we flatten this using guard clauses (early returns) and abstract the err.XYZ || err.name === 'XYZ' checks into a helper function? It would drastically reduce the cognitive load of this function. Let me know if you want to pair on it!

Contributor Author


I agree, it's trash code, and the diff is hard to read with the last else.

I just changed it and tried something I didn't want to do at first, but I think it's fine: for each condition, publish/return directly instead of doing it at the end of the function. I believe it's quite readable this way.

Contributor Author


It could maybe still use some refactoring, although not all the errors have the same form. I think the diff is reasonable here.
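The refactor discussed in this thread (a helper for the `err.XYZ || err.name === 'XYZ'` checks plus early returns) might look like the sketch below. `isErrorType`, `routeReplicationOutcome`, the status strings, and the `NoSuchKey` branch are illustrative assumptions; only the treatment of cascade signals as COMPLETED comes from the reviews.

```javascript
// Helper from the review: accept both the flag form (err.NoSuchKey) and the
// named form (err.name === 'NoSuchKey') of the same error.
function isErrorType(err, type) {
    return Boolean(err) && (Boolean(err[type]) || err.name === type);
}

// Hypothetical sentinels, compared by identity.
const cascadeLoopDetected = Object.freeze({});
const cascadeDataComplete = Object.freeze({});

// Guard-clause version: each branch decides and returns, no trailing else.
function routeReplicationOutcome(err) {
    if (!err) {
        return 'COMPLETED';
    }
    if (err === cascadeLoopDetected || err === cascadeDataComplete) {
        return 'COMPLETED'; // cascade signal: nothing left to replicate
    }
    if (isErrorType(err, 'NoSuchKey')) {
        return 'SKIPPED'; // placeholder policy, not taken from the PR
    }
    return 'FAILED';
}
```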

this._getAndPutPart(sourceEntry, destEntry, part, log, done);
}, (err, destLocations) => {
}, (err, partResults) => {
const destLocations = (partResults || [])

Contributor


Could you extract this change into its own commit so you can explain the 'why' in the commit description?

Contributor Author


For now I added some comments to clarify this a bit
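The `partResults` to `destLocations` change can be illustrated with a sketch that separates real part locations from a hypothetical `partAlreadyAtDest` sentinel and derives an all-parts flag. It also applies the defensive check suggested in an earlier review by dropping entries without a `key`. The function name and the location shape `{ key, ... }` are assumptions.

```javascript
// Hypothetical sentinel returned by a per-part step when the part already
// exists at the destination.
const partAlreadyAtDest = Object.freeze({});

// Split per-part results into real locations and an "all parts already
// there" flag. `partResults` may be undefined, sparse, or partially filled
// when the iteration stopped on an error.
function summarizePartResults(partResults) {
    // Array.from turns holes into undefined so every() cannot skip them.
    const results = Array.from(partResults || []);
    const destLocations = results.filter(
        r => r && r !== partAlreadyAtDest && r.key !== undefined);
    const allPartsAlreadyAtDest = results.length > 0 &&
        results.every(r => r === partAlreadyAtDest);
    return { destLocations, allPartsAlreadyAtDest };
}
```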

});
}

_resolveVersionIdCollision(collisionErr, sourceEntry, destEntry, log) {

Contributor


The logical flow here using early returns is great, but the function feels a bit hard to read due to the repetitive log metadata block being copied into every if statement.

We can dramatically clean this up by extracting method and destEntry.getLogInfo() into a single const logMeta object at the top of the function, then passing it directly to log.info/log.error. What do you think?

Contributor Author


True, the whole function is just logs.
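The reviewer's `logMeta` suggestion can be sketched like this. `destLogInfo` stands in for `destEntry.getLogInfo()`, and `log` is any object with `info`/`error` methods; the function name, branch conditions, and return values are placeholders rather than the PR's actual collision-resolution rules.

```javascript
// Placeholder decision logic; the point is the single shared logMeta object.
function resolveCollision(sourceMicroVersionId, destMicroVersionId, destLogInfo, log) {
    // Built once and reused by every branch instead of repeating the fields.
    const logMeta = { method: 'resolveCollision', ...destLogInfo };

    if (destMicroVersionId === undefined) {
        log.error('version id collision without destination microVersionId', logMeta);
        return 'error';
    }
    if (destMicroVersionId === sourceMicroVersionId) {
        log.info('destination already holds this microVersionId', logMeta);
        return 'loop';
    }
    // Branch-specific fields are layered on top of the shared metadata.
    log.info('destination holds a different microVersionId',
        { ...logMeta, sourceMicroVersionId, destMicroVersionId });
    return 'other';
}
```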

@SylvainSenechal SylvainSenechal force-pushed the improvement/BB-767 branch 2 times, most recently from 42ef0e4 to fd87ff8 on June 25, 2026 19:39

Labels

None yet

Projects

None yet


4 participants