[WIP][ray] Ray merge into #8028
Draft
XiaoHongbo-Hope wants to merge 10 commits into
Pythonic MERGE INTO on Ray Datasets, mirroring Spark/Flink merge-into.
UPSERT-flavored clauses (matched-update, not-matched-insert,
not-matched-by-source-update) supported; DELETE raises NotImplementedError
pending KeyValueDataWriter row-kind work.
API:
    from pypaimon.ray import merge_paimon
    merge_paimon(target, source, catalog_options,
                 on=[...],
                 when_matched_update={...},
                 when_not_matched_insert="*")
Algorithm: read target -> tag _side -> union -> groupby(on).map_groups
to classify matched/not-matched and apply SET; write back via write_paimon
(PK upsert through _SEQUENCE_NUMBER).
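
The tag/union/group/classify flow above can be illustrated with a minimal sketch in plain Python. This is not the PR's implementation: lists of dicts stand in for Ray Datasets, `merge_rows` is a hypothetical name, and it assumes that `when_matched_update` maps a target column to the source column supplying its new value and that each join key appears at most once per side. The returned rows are what would be handed to the write step; the PK upsert itself is not modeled.

```python
from collections import defaultdict


def merge_rows(target, source, on, when_matched_update=None,
               when_not_matched_insert=None):
    """Toy MERGE INTO over lists of dicts; returns the rows to upsert."""
    # Tag each row with the side it came from, then union both inputs.
    tagged = [dict(r, _side="target") for r in target]
    tagged += [dict(r, _side="source") for r in source]

    # Group by the join key (stand-in for groupby(on).map_groups).
    groups = defaultdict(list)
    for row in tagged:
        groups[tuple(row[k] for k in on)].append(row)

    out = []
    for rows in groups.values():
        t = [r for r in rows if r["_side"] == "target"]
        s = [r for r in rows if r["_side"] == "source"]
        if t and s and when_matched_update:
            # Matched: start from the target row and apply SET.
            # Assumption: each value names the source column to copy from.
            new = {k: v for k, v in t[0].items() if k != "_side"}
            for col, src_col in when_matched_update.items():
                new[col] = s[0][src_col]
            out.append(new)
        elif s and not t and when_not_matched_insert == "*":
            # Not matched: insert the source row unchanged.
            out.append({k: v for k, v in s[0].items() if k != "_side"})
    return out
```

Only changed or new rows are returned, which fits an upsert-style write-back: untouched target rows never need to be rewritten.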
Known bugs to fix in follow-up:
- _schema_type_map referenced but never defined (NameError on call)
- for f in batch.schema iterates pa.Schema (TypeError on pyarrow >= 18)
- type-mismatch fallback to pa.null() destroys join keys
- test helper _make_pk_table_with_flag returns 1 value, test unpacks 2
- _schema_type_map called but undefined: NameError on any cross-schema merge.
- for f in batch.schema raises TypeError on pyarrow >= 18.
- type-mismatch fallback to pa.null() drops join key values.
- _make_pk_table_with_flag returned 1 value but caller unpacks 2.
…rop API
- pa.Table.drop deprecated in newer pyarrow; switch to drop_columns.
- matched branch silently produced cartesian product on multiple source rows.
- _required_target_cols_for_passthrough widened projection to all columns when its spec was None, defeating the projection optimization.
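
The cartesian-product item concerns several source rows sharing one join key. A minimal sketch of a pre-check follows, assuming the desired behavior is to reject such input up front; the commit message does not say which behavior the fix adopted, and both function names are hypothetical. Lists of dicts stand in for the source dataset.

```python
from collections import Counter


def find_duplicate_source_keys(source, on):
    """Return join keys that appear in more than one source row."""
    counts = Counter(tuple(row[k] for k in on) for row in source)
    # Counter keeps first-seen order, so the result is deterministic.
    return [key for key, n in counts.items() if n > 1]


def check_unique_source_keys(source, on):
    """Raise if any join key is ambiguous on the source side."""
    dupes = find_duplicate_source_keys(source, on)
    if dupes:
        raise ValueError(f"multiple source rows share join key(s): {dupes}")
```

Failing fast makes the ambiguity visible to the caller instead of letting one target row fan out into several output rows.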
Purpose
Tests