fix: [branch-0.14] backport #3924 - share unified memory pools across native execution contexts #3938

Merged
andygrove merged 2 commits into apache:branch-0.14 from andygrove:backport-3924-to-0.14
Apr 14, 2026

@andygrove
Member

Which issue does this PR close?

Backport of #3924 to branch-0.14. Closes #3921.

Rationale for this change

When Comet executes a shuffle, it creates two native execution contexts that run concurrently within the same Spark task. Previously, each context created its own memory pool with the full per-task memory limit, so the two contexts together could consume up to twice the intended memory. This caused significantly higher memory usage than expected, leading to out-of-memory (OOM) errors.
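The sharing idea described above can be sketched in a few lines of Rust. This is an illustration only, not the code from #3924: `TaskPool`, `PoolRegistry`, and `get_or_create` are hypothetical names, the pool is a toy byte counter rather than a real DataFusion memory pool, and `std::sync::Mutex` stands in for `parking_lot::Mutex` so the snippet has no external dependencies.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Toy stand-in for a per-task memory pool with a hard byte limit.
/// Hypothetical type, not Comet's or DataFusion's actual pool.
pub struct TaskPool {
    limit: usize,
    used: AtomicUsize,
}

impl TaskPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`; returns false if the limit would be exceeded.
    pub fn try_grow(&self, bytes: usize) -> bool {
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |cur| {
                let new = cur.checked_add(bytes)?;
                (new <= self.limit).then_some(new)
            })
            .is_ok()
    }

    /// Bytes currently reserved from this pool.
    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}

/// Hypothetical registry mapping a Spark task attempt id to one shared pool.
pub struct PoolRegistry {
    pools: Mutex<HashMap<i64, Arc<TaskPool>>>,
}

impl PoolRegistry {
    pub fn new() -> Self {
        Self { pools: Mutex::new(HashMap::new()) }
    }

    /// Return the pool for this task, creating it only on first request.
    /// Every execution context in the same task gets the same Arc.
    pub fn get_or_create(&self, task_attempt_id: i64, limit: usize) -> Arc<TaskPool> {
        let mut pools = self.pools.lock().unwrap();
        pools
            .entry(task_attempt_id)
            .or_insert_with(|| Arc::new(TaskPool::new(limit)))
            .clone()
    }
}
```

With this shape, two contexts that call `get_or_create` with the same task attempt id draw from one limit instead of two. A real implementation would also need to drop the registry entry when the task finishes; that step is omitted here.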

What changes are included in this PR?

Cherry-pick of #3924 with minor conflict resolution (added missing parking_lot::Mutex import that was not present on the 0.14 branch).

Changes from the original PR:

  • Make fair_unified and greedy_unified memory pools task-shared, so a single pool instance is reused across all native execution contexts within the same Spark task
  • Fix a tracing bug where total_reserved_for_thread() and unregister_and_total() double-counted memory when multiple execution contexts shared the same pool Arc
  • Update tuning guide to document that both pool types are shared across execution contexts
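The double-counting problem in the second bullet can be illustrated with a small sketch. It assumes the tracing code keeps one pool handle per execution context and sums their reservations; once two contexts hold the same `Arc`, a plain sum counts that pool twice. The `Pool` struct and both function names below are hypothetical and are not the helpers named in this PR.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Toy pool that only records how many bytes are reserved.
/// Hypothetical type used for illustration.
pub struct Pool {
    pub reserved: usize,
}

/// Sums every registered handle, so a pool shared by two
/// execution contexts is counted twice.
pub fn total_reserved_naive(pools: &[Arc<Pool>]) -> usize {
    pools.iter().map(|p| p.reserved).sum()
}

/// Sums each distinct pool once, using the Arc's pointer as its identity.
pub fn total_reserved_deduped(pools: &[Arc<Pool>]) -> usize {
    let mut seen: HashSet<*const Pool> = HashSet::new();
    let mut total = 0;
    for pool in pools {
        // `insert` returns false if this pointer was already seen.
        if seen.insert(Arc::as_ptr(pool)) {
            total += pool.reserved;
        }
    }
    total
}
```

Deduplicating by pointer identity rather than by value means two separate pools that happen to report the same number of reserved bytes are still both counted.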

How are these changes tested?

Same tests as #3924. Verified native code compiles on the 0.14 branch after cherry-pick.

@andygrove andygrove changed the title fix: backport #3924 - share unified memory pools across native execution contexts fix: [branch-0.14] backport #3924 - share unified memory pools across native execution contexts Apr 13, 2026
Remove tracing helper functions (register_memory_pool,
unregister_and_total, total_reserved_for_thread, log_jemalloc_usage)
whose call sites do not exist on branch-0.14, along with their
now-unused imports.
@andygrove andygrove force-pushed the backport-3924-to-0.14 branch from 59bb1d2 to a8529ed April 13, 2026 22:28
Contributor

@mbutrovich mbutrovich left a comment

Thanks @andygrove!

@andygrove andygrove merged commit 7a4f42d into apache:branch-0.14 Apr 14, 2026
159 of 160 checks passed
@andygrove andygrove deleted the backport-3924-to-0.14 branch April 14, 2026 14:19

2 participants