Skip to content

feat(ffi): plumb simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705

Open
Brijesh-Thakkar wants to merge 5 commits into
apache:mainfrom
Brijesh-Thakkar:fix/22332-windowudf-overrides
Open

feat(ffi): plumb simplify, expressions, reverse_expr, and documentation through FFI_WindowUDF#22705
Brijesh-Thakkar wants to merge 5 commits into
apache:mainfrom
Brijesh-Thakkar:fix/22332-windowudf-overrides

Conversation

@Brijesh-Thakkar
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #22332

Rationale for this change

FFI_WindowUDF was silently dropping producer overrides of four defaulted
WindowUDFImpl methods. Any plugin that overrode expressions(),
simplify(), reverse_expr(), or documentation() would have those
overrides ignored on the consumer side, with the trait defaults running
instead. This could change window-function semantics (e.g. expressions()
selects which physical expressions reach the PartitionEvaluator) or
silently suppress planner-level optimizations (simplify, reverse_expr).

What changes are included in this PR?

  • documentation — new unsafe extern "C" fn pointer returning
    *const Documentation (null = None); consumer reconstructs via
    ptr.as_ref()

  • expressions — new fn pointer taking FFI_ExpressionArgs and
    returning SVec<FFI_PhysicalExpr>; a new sibling module
    udwf/expression_args.rs provides FFI_ExpressionArgs /
    ForeignExpressionArgs to carry Vec<Arc<dyn PhysicalExpr>> and
    Vec<FieldRef> across the boundary

  • simplify — closures cannot cross FFI, so simplification executes
    on the producer side: the consumer serializes the WindowFunction to
    protobuf bytes, calls the fn pointer, and deserializes the returned
    Expr; the producer wrapper deserializes, calls the inner simplify
    closure, and returns the serialized result

  • reverse_expr — new FFI_ReversedUDWF enum (#[repr(C, u8)],
    matching the existing ABI-stable enum pattern in this crate) with
    variants Identical, NotSupported, and Reversed(FFI_WindowUDF);
    the Reversed variant uses the round-trip downcast optimization to
    avoid double-wrapping when the inner UDF is already a ForeignWindowUDF

Known limitation: ForeignWindowUDF::simplify() currently uses
DefaultLogicalExtensionCodec internally. UDWFs whose expressions require
a custom codec to round-trip are not yet supported through the simplify
path and are tracked as a follow-up.

Are these changes tested?

Yes. For each of the four methods:

  • Unit tests in datafusion/ffi/src/udwf/mod.rs cover both the
    local-bypass path (marker IDs match, downcasts to concrete type) and
    the forced-foreign path (mock_foreign_marker_id set, full FFI path
    exercised)
  • Integration test in datafusion/ffi/tests/ffi_udwf.rs covers
    documentation across the crate boundary
  • All 104 existing unit tests and all integration tests continue to pass

Are there any user-facing changes?

FFI_WindowUDF struct layout has changed — this is an ABI-breaking change
for anyone linking against the struct directly. The PR targets main only
with no backport, per the datafusion-ffi crate convention for layout
changes.

…hrough FFI_WindowUDF

Closes apache#22332

Add documentation, expressions, simplify, and reverse_expr
fn pointers to FFI_WindowUDF struct
Implement producer-side wrappers for each new fn pointer
Wire ForeignWindowUDF trait methods to call the fn pointers
Add FFI_ExpressionArgs in a new expression_args.rs sibling module
to carry Vec<Arc<dyn PhysicalExpr>> across the boundary
Add FFI_ReversedUDWF enum (#[repr(C, u8)]) to represent the three
ReversedUDWF variants stably
simplify executes on the producer side and returns a serialized Expr;
the consumer closure deserializes and returns the result
Unit tests (local-bypass + forced-foreign) for all four methods
Integration test for documentation in ffi_udwf.rs

Note: ForeignWindowUDF::simplify() uses DefaultLogicalExtensionCodec
internally; UDWFs requiring a custom codec are a known follow-up.
API change: FFI_WindowUDF struct layout modified — target main only,
no backport.
Copilot AI review requested due to automatic review settings June 2, 2026 05:34
@github-actions github-actions Bot added the ffi Changes to the ffi crate label Jun 2, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR extends the UDWF FFI surface to expose additional WindowUDFImpl capabilities (documentation, expressions, simplify, and reverse), and adds tests to validate cross-FFI behavior.

Changes:

  • Add new FFI entry points for documentation, expressions, simplify, and reverse_expr on FFI_WindowUDF
  • Introduce an FFI_ExpressionArgs wrapper for passing ExpressionArgs across FFI
  • Add/extend tests to cover the new UDWF FFI behaviors

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
datafusion/ffi/src/udwf/mod.rs Adds new UDWF FFI function pointers + wrappers, plus ForeignWindowUDF implementations for the new trait methods
datafusion/ffi/src/udwf/expression_args.rs New FFI-stable container to pass ExpressionArgs across the FFI boundary
datafusion/ffi/tests/ffi_udwf.rs Adds a test ensuring UDWF documentation survives the FFI wrapping/unwrapping path

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +42
impl TryFrom<ExpressionArgs<'_>> for FFI_ExpressionArgs {
type Error = DataFusionError;

fn try_from(args: ExpressionArgs) -> Result<Self, DataFusionError> {
Comment on lines +188 to +198
unsafe extern "C" fn expressions_fn_wrapper(
udwf: &FFI_WindowUDF,
args: FFI_ExpressionArgs,
) -> SVec<FFI_PhysicalExpr> {
unsafe {
let inner = udwf.inner();
let args = ForeignExpressionArgs::try_from(args).unwrap();
let expressions = inner.expressions((&args).into());
expressions.into_iter().map(FFI_PhysicalExpr::from).collect()
}
}
Comment on lines +483 to +494
fn expressions(
&self,
expr_args: datafusion_expr::function::ExpressionArgs,
) -> Vec<Arc<dyn PhysicalExpr>> {
unsafe {
let args = FFI_ExpressionArgs::try_from(expr_args).unwrap();
(self.udf.expressions)(&self.udf, args)
.into_iter()
.map(|e| Arc::<dyn PhysicalExpr>::from(&e))
.collect()
}
}
/// user defined signatures, which will in turn call coerce_types to be called. This
/// call should be transparent to most users as the internal function performs the
/// appropriate calls on the underlying [`WindowUDF`]
pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation,
Comment on lines +496 to +498
fn simplify(&self) -> Option<WindowFunctionSimplification> {
let udf = self.udf.clone();
Some(Box::new(move |wf, info| {
Comment on lines +85 to +98
pub documentation: unsafe extern "C" fn(udwf: &Self) -> *const Documentation,

pub expressions: unsafe extern "C" fn(
udwf: &Self,
args: FFI_ExpressionArgs,
) -> SVec<FFI_PhysicalExpr>,

pub simplify: unsafe extern "C" fn(
udwf: &Self,
window_function: SVec<u8>,
schema: WrappedSchema,
) -> FFI_Result<FFI_Option<SVec<u8>>>,

pub reverse_expr: unsafe extern "C" fn(udwf: &Self) -> FFI_ReversedUDWF,
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

Details
     Cloning apache/main
    Building datafusion-ffi v53.1.0 (current)
       Built [  81.292s] (current)
     Parsing datafusion-ffi v53.1.0 (current)
      Parsed [   0.060s] (current)
    Building datafusion-ffi v53.1.0 (baseline)
       Built [  58.740s] (baseline)
     Parsing datafusion-ffi v53.1.0 (baseline)
      Parsed [   0.058s] (baseline)
    Checking datafusion-ffi v53.1.0 -> v53.1.0 (no change; assume patch)
     Checked [   0.259s] 222 checks: 220 pass, 1 fail, 1 warn, 30 skip

--- failure constructible_struct_adds_field: externally-constructible struct adds field ---

Description:
A pub struct constructible with a struct literal has a new pub field. Existing struct literals must be updated to include the new field.
        ref: https://doc.rust-lang.org/reference/expressions/struct-expr.html
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/constructible_struct_adds_field.ron

Failed in:
  field FFI_WindowUDF.documentation in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:85
  field FFI_WindowUDF.expressions in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:88
  field FFI_WindowUDF.simplify in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:95
  field FFI_WindowUDF.reverse_expr in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:102
  field FFI_WindowUDF.has_simplify in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:111

--- warning repr_c_plain_struct_fields_reordered: struct fields reordered in repr(C) struct ---

Description:
A public repr(C) struct had its fields reordered. This can change the struct's memory layout, possibly breaking FFI use cases that depend on field position and order.
        ref: https://doc.rust-lang.org/reference/type-layout.html#reprc-structs
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.47.0/src/lints/repr_c_plain_struct_fields_reordered.ron

Failed in:
  FFI_WindowUDF.coerce_types moved from position 6 to 10, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:104
  FFI_WindowUDF.sort_options moved from position 7 to 11, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:109
  FFI_WindowUDF.clone moved from position 8 to 13, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:115
  FFI_WindowUDF.release moved from position 9 to 14, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:118
  FFI_WindowUDF.private_data moved from position 10 to 15, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:122
  FFI_WindowUDF.library_marker_id moved from position 11 to 16, in /home/runner/work/datafusion/datafusion/datafusion/ffi/src/udwf/mod.rs:127

     Summary semver requires new major version: 1 major and 0 minor checks failed
     Warning produced 1 major and 0 minor level warnings
    Finished [ 141.751s] datafusion-ffi

@github-actions github-actions Bot added the auto detected api change Auto detected API change label Jun 2, 2026
@Brijesh-Thakkar Brijesh-Thakkar force-pushed the fix/22332-windowudf-overrides branch from e0cacfd to 027ea4b Compare June 2, 2026 12:54
@Brijesh-Thakkar
Copy link
Copy Markdown
Contributor Author

@timsaucer Please review it and suggest if any changes are required
Thank youu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto detected api change Auto detected API change ffi Changes to the ffi crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FFI: FFI_WindowUDF silently drops producer overrides of defaulted trait methods

2 participants