feat(ffi): plumb simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705
feat(ffi): plumb simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705Brijesh-Thakkar wants to merge 5 commits into
simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705Conversation
…hrough FFI_WindowUDF Closes apache#22332 Add documentation, expressions, simplify, and reverse_expr fn pointers to FFI_WindowUDF struct Implement producer-side wrappers for each new fn pointer Wire ForeignWindowUDF trait methods to call the fn pointers Add FFI_ExpressionArgs in a new expression_args.rs sibling module to carry Vec<Arc<dyn PhysicalExpr>> across the boundary Add FFI_ReversedUDWF enum (#[repr(C, u8)]) to represent the three ReversedUDWF variants stably simplify executes on the producer side and returns a serialized Expr; the consumer closure deserializes and returns the result Unit tests (local-bypass + forced-foreign) for all four methods Integration test for documentation in ffi_udwf.rs Note: ForeignWindowUDF::simplify() uses DefaultLogicalExtensionCodec internally; UDWFs requiring a custom codec are a known follow-up. API change: FFI_WindowUDF struct layout modified — target main only, no backport.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR extends the UDWF FFI surface to expose additional WindowUDFImpl capabilities (documentation, expressions, simplify, and reverse), and adds tests to validate cross-FFI behavior.
Changes:
- Add new FFI entry points for
documentation,expressions,simplify, andreverse_expronFFI_WindowUDF - Introduce an
FFI_ExpressionArgswrapper for passingExpressionArgsacross FFI - Add/extend tests to cover the new UDWF FFI behaviors
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| datafusion/ffi/src/udwf/mod.rs | Adds new UDWF FFI function pointers + wrappers, plus ForeignWindowUDF implementations for the new trait methods |
| datafusion/ffi/src/udwf/expression_args.rs | New FFI-stable container to pass ExpressionArgs across the FFI boundary |
| datafusion/ffi/tests/ffi_udwf.rs | Adds a test ensuring UDWF documentation survives the FFI wrapping/unwrapping path |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| impl TryFrom<ExpressionArgs<'_>> for FFI_ExpressionArgs { | ||
| type Error = DataFusionError; | ||
|
|
||
| fn try_from(args: ExpressionArgs) -> Result<Self, DataFusionError> { |
| unsafe extern "C" fn expressions_fn_wrapper( | ||
| udwf: &FFI_WindowUDF, | ||
| args: FFI_ExpressionArgs, | ||
| ) -> SVec<FFI_PhysicalExpr> { | ||
| unsafe { | ||
| let inner = udwf.inner(); | ||
| let args = ForeignExpressionArgs::try_from(args).unwrap(); | ||
| let expressions = inner.expressions((&args).into()); | ||
| expressions.into_iter().map(FFI_PhysicalExpr::from).collect() | ||
| } | ||
| } |
| fn expressions( | ||
| &self, | ||
| expr_args: datafusion_expr::function::ExpressionArgs, | ||
| ) -> Vec<Arc<dyn PhysicalExpr>> { | ||
| unsafe { | ||
| let args = FFI_ExpressionArgs::try_from(expr_args).unwrap(); | ||
| (self.udf.expressions)(&self.udf, args) | ||
| .into_iter() | ||
| .map(|e| Arc::<dyn PhysicalExpr>::from(&e)) | ||
| .collect() | ||
| } | ||
| } |
| /// user defined signatures, which will in turn call coerce_types to be called. This | ||
| /// call should be transparent to most users as the internal function performs the | ||
| /// appropriate calls on the underlying [`WindowUDF`] | ||
| pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation, |
| fn simplify(&self) -> Option<WindowFunctionSimplification> { | ||
| let udf = self.udf.clone(); | ||
| Some(Box::new(move |wf, info| { |
| pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation, | ||
|
|
||
| pub expressions: unsafe extern "C" fn( | ||
| udwf: &Self, | ||
| args: FFI_ExpressionArgs, | ||
| ) -> SVec<FFI_PhysicalExpr>, | ||
|
|
||
| pub simplify: unsafe extern "C" fn( | ||
| udwf: &Self, | ||
| window_function: SVec<u8>, | ||
| schema: WrappedSchema, | ||
| ) -> FFI_Result<FFI_Option<SVec<u8>>>, | ||
|
|
||
| pub reverse_expr: unsafe extern "C" fn(udwf: &Self) -> FFI_ReversedUDWF, |
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
e0cacfd to
027ea4b
Compare
|
@timsaucer Please review it and suggest if any changes are required |
Which issue does this PR close?
Closes #22332
Rationale for this change
FFI_WindowUDFwas silently dropping producer overrides of four defaultedWindowUDFImplmethods. Any plugin that overrodeexpressions(),simplify(),reverse_expr(), ordocumentation()would have thoseoverrides ignored on the consumer side, with the trait defaults running
instead. This could change window-function semantics (e.g.
expressions()selects which physical expressions reach the
PartitionEvaluator) orsilently suppress planner-level optimizations (
simplify,reverse_expr).What changes are included in this PR?
documentation— newunsafe extern "C"fn pointer returning*const Documentation(null =None); consumer reconstructs viaptr.as_ref()expressions— new fn pointer takingFFI_ExpressionArgsandreturning
SVec<FFI_PhysicalExpr>; a new sibling moduleudwf/expression_args.rsprovidesFFI_ExpressionArgs/ForeignExpressionArgsto carryVec<Arc<dyn PhysicalExpr>>andVec<FieldRef>across the boundarysimplify— closures cannot cross FFI, so simplification executeson the producer side: the consumer serializes the
WindowFunctiontoprotobuf bytes, calls the fn pointer, and deserializes the returned
Expr; the producer wrapper deserializes, calls the innersimplifyclosure, and returns the serialized result
reverse_expr— newFFI_ReversedUDWFenum (#[repr(C, u8)],matching the existing ABI-stable enum pattern in this crate) with
variants
Identical,NotSupported, andReversed(FFI_WindowUDF);the
Reversedvariant uses the round-trip downcast optimization toavoid double-wrapping when the inner UDF is already a
ForeignWindowUDFKnown limitation:
ForeignWindowUDF::simplify()currently usesDefaultLogicalExtensionCodecinternally. UDWFs whose expressions requirea custom codec to round-trip are not yet supported through the simplify
path and are tracked as a follow-up.
Are these changes tested?
Yes. For each of the four methods:
datafusion/ffi/src/udwf/mod.rscover both thelocal-bypass path (marker IDs match, downcasts to concrete type) and
the forced-foreign path (
mock_foreign_marker_idset, full FFI pathexercised)
datafusion/ffi/tests/ffi_udwf.rscoversdocumentationacross the crate boundaryAre there any user-facing changes?
FFI_WindowUDFstruct layout has changed — this is an ABI-breaking changefor anyone linking against the struct directly. The PR targets
mainonlywith no backport, per the
datafusion-fficrate convention for layoutchanges.