Summary
DATE_BIN currently performs timestamp unit scaling in multiple call sites before invoking the binning function. This logic is duplicated across scalar and array paths and primarily uses direct multiplication. The existing approach increases maintenance cost and risks inconsistent overflow/error behavior across code paths.
Problem Statement
The DATE_BIN implementation contains repeated patterns that convert input values to nanoseconds using expressions like value * scale in multiple closures and match arms.
Examples include:
Because scaling is spread across many locations:
- Overflow handling is not centrally enforced by one helper.
- Error mapping behavior can diverge between scalar and array code paths.
- Future fixes must be replicated in several places.
Why This Matters
- Correctness: unit scaling is arithmetic on i64 values and can overflow for edge inputs.
- Consistency: DATE_BIN should produce predictable behavior regardless of scalar vs array path.
- Maintainability: duplicated conversion logic increases risk of regressions.
Proposed Change
Introduce a small helper local to DATE_BIN implementation to centralize checked scaling:
- Helper shape: value_to_nanos(value: i64, scale: i64) -> Result
- Internals: use checked_mul and return a structured execution/compute error on overflow.
- Adoption: replace direct value * scale call sites in DATE_BIN scalar and array paths.
- Error behavior: keep existing external behavior, but standardize internal overflow handling and message shape.
Optional follow-on helper for reverse conversion can be considered if it reduces duplication further, but initial scope should focus on input scaling.
Scope
In scope:
Out of scope:
- Broad cross-function refactors outside DATE_BIN
- Behavioral changes unrelated to scaling overflow handling
Acceptance Criteria
- All DATE_BIN input scaling to nanoseconds uses the new checked helper.
- No direct value * scale remains in DATE_BIN timestamp/time scalar and array scaling paths where overflow is possible.
- Scalar and array paths produce consistent overflow handling semantics.
- Existing DATE_BIN tests continue to pass.
- New or updated tests cover at least one overflow edge for scaled input conversion.
Test Plan
Related PR
#22408
Summary
DATE_BIN currently performs timestamp unit scaling in multiple call sites before invoking the binning function. This logic is duplicated across scalar and array paths and primarily uses direct multiplication. The existing approach increases maintenance cost and risks inconsistent overflow/error behavior across code paths.
Problem Statement
The DATE_BIN implementation contains repeated patterns that convert input values to nanoseconds using expressions like value * scale in multiple closures and match arms.
Examples include:
Because scaling is spread across many locations:
Why This Matters
Proposed Change
Introduce a small helper local to DATE_BIN implementation to centralize checked scaling:
Optional follow-on helper for reverse conversion can be considered if it reduces duplication further, but initial scope should focus on input scaling.
Scope
In scope:
Out of scope:
Acceptance Criteria
Test Plan
Related PR
#22408