[python][daft] Make Daft Paimon write sink serializable#8022

Open
kerwin-zk wants to merge 1 commit into apache:master from kerwin-zk:fix-daft-write-sink-serializable

@kerwin-zk kerwin-zk commented May 28, 2026

Purpose

PaimonDataSink currently keeps the FileStoreTable and WriteBuilder directly in the sink object. When Daft runs with the Ray runner, the sink needs to be serialized and sent to workers. For OSS/Jindo tables, the table can indirectly hold PyArrow/Jindo filesystem objects that are not picklable, causing Ray serialization failures.

  Checking PaimonDataSink daft.pickle roundtrip ...
  Traceback (most recent call last):
    File ".../test_paimon_daft_filesystem.py", line 72, in <module>
      restored_sink = loads(dumps(PaimonDataSink(table, mode="overwrite")))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...
    File "<stringsource>", line 2, in _fs.FileSystem.__reduce_cython__
  TypeError: no default __reduce__ due to non-trivial __cinit__

This PR makes the Daft Paimon write sink serializable by only storing reconstructable table state during pickling.
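As a rough illustration of the general technique (not the actual diff in this PR), the sketch below shows one way a sink can drop a non-picklable table during pickling and rebuild it lazily afterwards. Every name here (`UnpicklableFileSystem`, `Table`, `load_table`, `Sink`, the `oss://bucket/wh` warehouse path) is a hypothetical stand-in and not part of the pypaimon or Daft APIs; it assumes the table can be reloaded from plain catalog options plus an identifier.

```python
import pickle


class UnpicklableFileSystem:
    """Hypothetical stand-in for a native filesystem handle that refuses to pickle."""

    def __reduce__(self):
        raise TypeError("no default __reduce__ due to non-trivial __cinit__")


class Table:
    """Hypothetical table that holds a non-picklable filesystem object."""

    def __init__(self, catalog_options, identifier):
        self.catalog_options = dict(catalog_options)
        self.identifier = identifier
        self.file_io = UnpicklableFileSystem()


def load_table(catalog_options, identifier):
    """Hypothetical loader: rebuilds a table from plain, picklable values."""
    return Table(catalog_options, identifier)


class Sink:
    """Sketch of a sink that pickles only reconstructable table state."""

    def __init__(self, table, mode="append"):
        # Plain values that are enough to reload the table on a worker.
        self._catalog_options = dict(table.catalog_options)
        self._identifier = table.identifier
        self._mode = mode
        self._table = table  # live object, never pickled

    @property
    def table(self):
        # Rebuild lazily after unpickling.
        if self._table is None:
            self._table = load_table(self._catalog_options, self._identifier)
        return self._table

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_table"] = None  # drop the non-picklable object
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)


# Roundtrip with an assumed warehouse path and table name.
table = load_table({"warehouse": "oss://bucket/wh"}, "db.tbl")
restored_sink = pickle.loads(pickle.dumps(Sink(table, mode="overwrite")))
assert restored_sink.table.identifier == "db.tbl"
```

In this sketch the table is rebuilt on first access rather than inside `__setstate__`, so unpickling stays cheap and any filesystem handle is created in the process that actually uses it.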

Tests

CI

@YannByron YannByron self-assigned this May 28, 2026
2 participants