feat(commit): align functionality with java paimon commit#397
feat(commit): align functionality with java paimon commit#397lucasfang wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot couldn't run its full agentic review because no GitHub Actions runner was available. Make sure your repository has a runner available to run Copilot's review, or add a copilot-setup-steps.yml file specifying one with the runs-on attribute. See the docs for more details.
This PR expands commit conflict detection and index-file handling in FileStoreCommitImpl, adds row-id/write-column conflict checking for data-evolution scenarios, and updates/extends unit tests accordingly.
Changes:
- Add
ConflictDetection,ManifestEntryChanges, andRowIdColumnConflictCheckerto centralize commit change collection and conflict checks (including DV/global index considerations). - Support committing index files during overwrite / filter-and-overwrite flows, and adjust commit-kind selection (e.g., DV index append => overwrite kind).
- Update commit retry behavior and add/adjust tests for index files, retries, and PK table commit behavior.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/paimon/core/operation/file_store_commit_test.cpp | Adds test validating DV index append triggers overwrite commit kind |
| src/paimon/core/operation/file_store_commit_impl_test.cpp | Extends commit tests for index files, retry metrics, and conflict checking signature changes |
| src/paimon/core/operation/file_store_commit_impl.h | Introduces new conflict/index APIs and wires in ConflictDetection |
| src/paimon/core/operation/file_store_commit_impl.cpp | Implements index-aware overwrite, retry commit checking, and centralized change collection |
| src/paimon/core/operation/file_store_commit.cpp | Removes prior PK-commit restriction logic in Create |
| src/paimon/core/operation/commit/row_id_column_conflict_checker.h | Adds row-id/write-column overlap conflict checker interface |
| src/paimon/core/operation/commit/row_id_column_conflict_checker.cpp | Implements row-id/write-column overlap conflict checker |
| src/paimon/core/operation/commit/manifest_entry_changes.h | Adds helper to collect manifest + index changes from commit messages |
| src/paimon/core/operation/commit/manifest_entry_changes.cpp | Implements change collection and “changed partitions” logic |
| src/paimon/core/operation/commit/manifest_entry_changes_test.cpp | Adds unit tests for ManifestEntryChanges |
| src/paimon/core/operation/commit/conflict_detection.h | Adds conflict detection API covering bucket/key-range/row-id/global-index checks |
| src/paimon/core/operation/commit/conflict_detection.cpp | Implements conflict detection logic and row-id history scan |
| src/paimon/core/operation/commit/conflict_detection_test.cpp | Adds unit test coverage for conflict detection behaviors |
| src/paimon/common/data/binary_row.h | Adds std::hash for tuple key used in conflict detection |
| src/paimon/CMakeLists.txt | Wires new sources + tests into the build |
Comments suppressed due to low confidence (1)
src/paimon/core/operation/file_store_commit.cpp:1
- The PR description is still the unfilled template (e.g., linked issue
#xxx, tests section empty), but this change alters user-visible behavior by removing the previous PK-table commit restriction inFileStoreCommit::Create. Please update the PR title/description to clearly state the behavioral change, link the real issue, and list the tests validating the new PK commit behavior.
/*
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| EXPECT_CALL(*mock_fs, ReadFile(testing::_, testing::_)) | ||
| .Times(testing::AnyNumber()) | ||
| .WillRepeatedly(testing::Invoke([&](const std::string& path, std::string* content) { | ||
| return mock_fs->FileSystem::ReadFile(path, content); | ||
| })); | ||
| EXPECT_CALL(*mock_fs, ReadFile(testing::StrEq(latest_hint), testing::_)) | ||
| .WillRepeatedly(testing::Invoke([](const std::string& path, std::string* content) { | ||
| *content = "-1"; | ||
| return Status::OK(); | ||
| })); |
| ::ArrowSchema arrow_schema; | ||
| ASSERT_TRUE(arrow::ExportSchema(*schema, &arrow_schema).ok()); |
| for (const auto& write_col : file.write_cols.value()) { | ||
| auto field_id_result = ResolveFieldId(file, write_col); | ||
| if (!field_id_result.ok()) { | ||
| return false; | ||
| } | ||
| const auto& field_id = field_id_result.value(); | ||
| if (field_id.has_value() && write_range.field_ids.count(field_id.value()) > 0) { | ||
| return true; | ||
| } | ||
| } |
| Result<bool> commit_result = CommitSnapshotImpl(new_snapshot, delta_statistics); | ||
| if (!commit_result.ok()) { | ||
| // commit exception, not sure about the situation and should not clean up the files. | ||
| PAIMON_LOG_WARN(logger_, "You need call FilterAndCommit to retry commit for exception. %s", | ||
| // commit exception is uncertain; retry after checking whether this commit already exists. | ||
| PAIMON_LOG_WARN(logger_, "Retry commit for exception. %s", | ||
| commit_result.status().ToString().c_str()); | ||
|
|
||
| // To prevent the case where an atomic write times out but actually succeeds, | ||
| // retrying the commit could lead to the snapshot file being committed multiple times. | ||
| // Therefore, retries should be handled by the upper layer, | ||
| // which should call FilterAndCommit to avoid duplicate commits. | ||
| // Therefore, we should not trigger cleanup here, | ||
| // as it may delete meta files from a snapshot that was just written by ourselves, | ||
| // leading to an incomplete or corrupted snapshot. | ||
| guard.Release(); | ||
| return Status::Invalid("You need call FilterAndCommit to retry commit for exception. ", | ||
| commit_result.status().ToString()); | ||
| return false; | ||
| } |
Purpose
Linked issue: close #xxx
Tests
API and Format
Documentation
Generative AI tooling