Skip to content

feat: Additional Canonical Extension Types#21291

Open
tobixdev wants to merge 11 commits intoapache:mainfrom
tobixdev:additional-canonical-extension-types
Open

feat: Additional Canonical Extension Types#21291
tobixdev wants to merge 11 commits intoapache:mainfrom
tobixdev:additional-canonical-extension-types

Conversation

@tobixdev
Copy link
Copy Markdown
Contributor

@tobixdev tobixdev commented Apr 1, 2026

Which issue does this PR close?

Rationale for this change

Implements DFExtensionType for the other canonical extension types (except Parquet types).

What changes are included in this PR?

  • Implement DFExtensionType for:
    • arrow.bool8 (custom formatting with true/false)
    • arrow.fixed_shape_tensor (nu custom formatting for now)
    • arrow.json (no custom formatting for now)
    • arrow.opaque (no custom formatting for now)
    • arrow.timestamp_with_offset (custom formatting as timestamp)
    • arrow.variable_shape_tensor (no custom formatting)
  • Introduce a wrapper struct DFUuid for Uuid so that it is consistent with the other extension types. I don't know whether we truly need the wrapper structs for extension types that only have a single valid storage type based on their metadata (e.g., Bool8, Uuid) . Open for any opinions.

@paleolimbot Is this the kind of wrapper structs you imagined?

Should we also add end-to-end tests for the other extension types? Currently, we only have the UUID one from the last PR.

I think for variant supports it's best to wait for general Variant support in DataFusion as this is currently done in https://github.com/datafusion-contrib/datafusion-variant .

Are these changes tested?

Custom formatters are tested. The rest is mainly boiler plate code.

Are there any user-facing changes?

Yes more extension types and renames Uuid to DFUuid. If we plan to keep our own wrapper structs, we should merge that before releasing a version with the old Uuid so we have no breaking changes.

@tobixdev tobixdev changed the title Additional Canonical Extension Types feat: Additional Canonical Extension Types Apr 1, 2026
@github-actions github-actions bot added logical-expr Logical plan and expressions common Related to common crate labels Apr 1, 2026
Copy link
Copy Markdown
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

I may be missing something about the arrow-rs ExtensionType, but I'm not sure we need to impl it for these (which can be constrained to datafusion-specific behaviour).

I think datafusion-common is a great place for these now...there might be type-specific crates in the future that include the dependencies required to actually do things (e.g., the datafusion JSON contribution being discussed and perhaps a future variant), and when that's the case the type definition could maybe live there too.

Comment on lines +33 to +35
impl ExtensionType for DFBool8 {
const NAME: &'static str = Bool8::NAME;
type Metadata = <Bool8 as ExtensionType>::Metadata;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is needed (only the impl DFExtensionType bit). Does the fact that these objects implement ExtensionType allow something to be done with them that isn't otherwise possible?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, this was also a thorn in my eye. I think the only thing is the default registration that requires ExtensionType + DFExtensionType.

I've removed this requirement from the default registration. I think it's better now, especially because now there is only a single ExtensionType implementation that can be used for reading/writing metadata. The DFBool8 etc. types can then be ignored by most users, except those that want to customize the registered extension types.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, I think the example is a bit weird as it implements both traits in the same struct.

For example, ExtensionType::supports_data_type now returns true for all supported data types (as it should for the arrow-rs API), but it may seem unexpected as it just stores a single sotrage type. So some users could assume that it should only return true for this specific type.

Do you think it warrants splitting the example into TemperatureExtensionType and DFTemperatureExtensionType too to avoid this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate logical-expr Logical plan and expressions

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants