feat: implement retract_batch for array_agg(DISTINCT) sliding window#22719

Open
SubhamSinghal wants to merge 1 commit into
apache:mainfrom
SubhamSinghal:array-agg-distinct-retract
Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  • DistinctArrayAggAccumulator state: HashSet → HashMap<ScalarValue, u64>, so each distinct value
    carries a reference count.
  • update_batch: increments the per-value count instead of inserting into a set.
  • New retract_batch: decrements the count and removes the key when it reaches zero. It mirrors the
    update_batch null-handling rules (with ignore_nulls, NULLs are skipped; otherwise NULL is a
    tracked key).
  • supports_retract_batch() now returns true.
  • merge_batch is structurally unchanged: the wire state (List) carries presence, not
    multiplicities. After a merge, a count means "number of partitions that emitted this value," which
    is fine because evaluate only reads keys. Refcount semantics are relied on only within a single
    accumulator instance (window execution, which doesn't merge).
  • New helper ScalarValue::size_of_hashmap<V, S> in datafusion-common, mirroring size_of_hashset.
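
For reviewers who want the refcounting idea in isolation, here is a minimal sketch using only the standard library. It is not the code in this PR: `DistinctRefCount` is a hypothetical name, `Option<i64>` stands in for `ScalarValue` (with `None` playing the role of SQL NULL), and slices stand in for Arrow arrays. It only illustrates the update / retract / evaluate contract described above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the accumulator state described above.
/// `Option<i64>` replaces `ScalarValue`; `None` models SQL NULL.
struct DistinctRefCount {
    counts: HashMap<Option<i64>, u64>,
    ignore_nulls: bool,
}

impl DistinctRefCount {
    fn new(ignore_nulls: bool) -> Self {
        Self { counts: HashMap::new(), ignore_nulls }
    }

    /// Rows entering the frame: bump the per-value count.
    fn update_batch(&mut self, values: &[Option<i64>]) {
        for v in values {
            if v.is_none() && self.ignore_nulls {
                continue; // same NULL rule as retract_batch
            }
            *self.counts.entry(*v).or_insert(0) += 1;
        }
    }

    /// Rows leaving the frame: decrement, and drop the key at zero.
    fn retract_batch(&mut self, values: &[Option<i64>]) {
        for v in values {
            if v.is_none() && self.ignore_nulls {
                continue;
            }
            let now_zero = match self.counts.get_mut(v) {
                Some(count) => {
                    *count -= 1;
                    *count == 0
                }
                // Retracting a value that was never added is ignored in this sketch.
                None => false,
            };
            if now_zero {
                self.counts.remove(v);
            }
        }
    }

    /// Only the keys are read; they are sorted here so the result is deterministic.
    fn evaluate(&self) -> Vec<Option<i64>> {
        let mut keys: Vec<Option<i64>> = self.counts.keys().copied().collect();
        keys.sort();
        keys
    }
}
```

In this sketch a value stays visible in `evaluate` for as long as at least one row carrying it is still inside the frame. A plain set cannot express that, because a single removal would drop a value that other in-frame rows still hold.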

Are these changes tested?

Yes

Are there any user-facing changes?

Yes. array_agg(DISTINCT x) now works in bounded/sliding window frames. Queries that previously
errored now succeed:

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) common Related to common crate functions Changes to functions implementation labels Jun 2, 2026


Development

Successfully merging this pull request may close these issues.

Add support for array_arg(DISTINCT x) in sliding window execution
