
fix: reject incompatible decimal precision/scale in native_datafusion scan#4090

Open
andygrove wants to merge 1 commit into apache:main from andygrove:fix-issue-4089-decimal-precision

Conversation

@andygrove
Member

@andygrove andygrove commented Apr 25, 2026

Which issue does this PR close?

Closes #4089.

Rationale for this change

When the native_datafusion scan reads a Parquet column whose physical type is Decimal(p1, s1) under a requested read schema of Decimal(p2, s2) with s2 < s1, the existing schema adapter falls through to Spark's Cast expression. Cast happily truncates fractional digits, producing wrong values silently. Spark's vectorized reader rejects this with SchemaColumnConvertNotSupportedException, and native_iceberg_compat already does the same via TypeUtil.checkParquetType. The native scan should match.
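The failure mode comes down to integer arithmetic on the unscaled decimal value. A minimal Rust sketch (illustrative only, not Comet or Spark code) of the rescaling a fall-through Cast performs when the scale shrinks:

```rust
// Illustrative only: shows the silent truncation a decimal rescale performs
// when the target scale is smaller than the source scale.
fn rescale(unscaled: i128, from_scale: u32, to_scale: u32) -> i128 {
    if to_scale < from_scale {
        // Narrowing the scale divides away fractional digits with no error.
        unscaled / 10i128.pow(from_scale - to_scale)
    } else {
        unscaled * 10i128.pow(to_scale - from_scale)
    }
}

fn main() {
    // Decimal(10, 2) value 123.45 is stored as the unscaled integer 12345.
    let v = 12345i128;
    // Read as Decimal(5, 0), it rescales to 123: the .45 vanishes silently.
    assert_eq!(rescale(v, 2, 0), 123);
    println!("123.45 read at scale 0 -> {}", rescale(v, 2, 0));
}
```

This is the "wrong values silently" outcome the PR rejects; widening the scale (the `else` branch) is lossless and stays allowed.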

What changes are included in this PR?

native/core/src/parquet/schema_adapter.rs: in replace_with_spark_cast, add a guard before the existing branches that returns DataFusionError::Plan when both physical_type and target_type are Decimal128 and the target scale is smaller than the source scale.

The check is intentionally narrow: it fires only when both the physical type and the target type are Decimal128 and the target scale is smaller than the source scale; all other type changes still fall through to the existing Cast path.
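The guard described above can be sketched as follows. The enum and function names here are illustrative stand-ins, not the actual Arrow/DataFusion types or the Comet implementation:

```rust
// Hypothetical sketch of the guard in replace_with_spark_cast; names are
// illustrative, not the real arrow-rs DataType or Comet code.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Decimal128 { precision: u8, scale: i8 },
    Int64,
}

/// Rejects a Decimal128 -> Decimal128 read that would narrow the scale,
/// mirroring the rejection described in the PR; everything else is allowed
/// to fall through to the existing Cast branches.
fn check_decimal_compat(physical: DataType, target: DataType) -> Result<(), String> {
    if let (
        DataType::Decimal128 { precision: p1, scale: s1 },
        DataType::Decimal128 { precision: p2, scale: s2 },
    ) = (physical, target)
    {
        if s2 < s1 {
            return Err(format!(
                "Cannot read Parquet decimal({p1},{s1}) as decimal({p2},{s2}): \
                 narrowing the scale would silently truncate fractional digits"
            ));
        }
    }
    Ok(())
}

fn main() {
    // decimal(10,2) read as decimal(5,0): rejected.
    let bad = check_decimal_compat(
        DataType::Decimal128 { precision: 10, scale: 2 },
        DataType::Decimal128 { precision: 5, scale: 0 },
    );
    assert!(bad.is_err());

    // Same scale, wider precision: falls through (Ok).
    let ok = check_decimal_compat(
        DataType::Decimal128 { precision: 10, scale: 2 },
        DataType::Decimal128 { precision: 12, scale: 2 },
    );
    assert!(ok.is_ok());
    println!("guard behaves as described");
}
```

In the real adapter the error would be surfaced as a `DataFusionError::Plan`, which Spark then wraps in a `SparkException` on collect.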

How are these changes tested?

Added a focused test to ParquetReadSuite: native_datafusion rejects incompatible decimal precision/scale. It writes Decimal(10, 2) data, reads it under Decimal(5, 0) (scale narrowed from 2 to 0), forces spark.comet.scan.impl=native_datafusion and spark.sql.sources.useV1SourceList=parquet, and asserts collect() raises SparkException. Verified against ParquetReadV1Suite (44 tests, all pass; 1 pre-existing test ignored).

The behavior is also covered by the per-impl matrix added in #4087 (decimal(10,2) read as decimal(5,0): native_datafusion), whose assertion will need flipping from "succeeds" to "throws" once that PR merges.

@andygrove added the correctness and bug labels on Apr 25, 2026
… scan

The native_datafusion Spark physical expression adapter previously fell
through to a Spark Cast for decimal-to-decimal type changes, which
silently rescales or truncates values that should have raised an error.
Mirror Spark's TypeUtil.isDecimalTypeMatched (Spark 3.x rule) by
rejecting reads where the target precision is smaller than the source
precision or the scales differ.

Closes apache#4089.
@andygrove force-pushed the fix-issue-4089-decimal-precision branch from 1194d82 to 99e1235 on April 26, 2026
@mbutrovich
Contributor

In a case where we expect an exception to be generated anyway, can we catch this at CometScanRule rather than going all the way to serialization and native operators?

@andygrove
Member Author

> In a case where we expect an exception to be generated anyway, can we catch this at CometScanRule rather than going all the way to serialization and native operators?

I think the issue is that we do not know the types of all the parquet files until runtime?

@mbutrovich
Contributor

mbutrovich commented Apr 26, 2026

> I think the issue is that we do not know the types of all the parquet files until runtime?

IIRC from looking at this a while back, Spark has read the physical schema already, but thrown it away by the time our Comet rules run with no good way to get it again.

I'm not opposed to handling it this way, just wanted to think through if we could catch it earlier. It's also a fairly uncommon scenario.

@andygrove
Member Author

> I'm not opposed to handling it this way, just wanted to think through if we could catch it earlier. It's also a fairly uncommon scenario.

I do think this is an edge case that is fairly unlikely IRL, because it only happens when the user provides a schema that is incompatible with the file schema.


Labels

bug, correctness


Development

Successfully merging this pull request may close these issues.

native_datafusion: incompatible decimal precision/scale read silently succeeds without value validation

2 participants