fix(cuda_std): use correct PTX scope suffix in block acqrel fence #377

Merged
LegNeato merged 1 commit into Rust-GPU:main from Snehal-Reddy:fix-fence-acqrel-block on Apr 20, 2026

@Snehal-Reddy
Contributor

Summary

This updates the inline assembly for the block-level acquire-release fence to use the correct `.cta` (cooperative thread array) scope suffix instead of `.sys`, so that block-level fences no longer incur the unnecessary overhead of system-wide synchronization.
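
To illustrate the scope suffixes involved, here is a minimal host-side sketch of how a fence scope maps to PTX text. The names `Scope`, `ptx_suffix`, and `fence_acq_rel_ptx` are hypothetical and are not part of `cuda_std`; the real intrinsic embeds the instruction directly in inline assembly, and this sketch only builds the instruction string so the mapping can be checked on any machine.

```rust
/// Synchronization scopes that a PTX `fence` instruction can target.
/// Hypothetical illustration type, not a `cuda_std` API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Scope {
    /// A single thread block (PTX calls this a CTA).
    Block,
    /// All threads on the current device.
    Device,
    /// The whole system, including the host and peer devices.
    System,
}

impl Scope {
    /// Returns the PTX scope suffix for this scope.
    pub const fn ptx_suffix(self) -> &'static str {
        match self {
            Scope::Block => ".cta",
            Scope::Device => ".gpu",
            Scope::System => ".sys",
        }
    }
}

/// Builds the PTX text of an acquire-release fence at the given scope.
pub fn fence_acq_rel_ptx(scope: Scope) -> String {
    format!("fence.acq_rel{};", scope.ptx_suffix())
}
```

With this mapping, `fence_acq_rel_ptx(Scope::Block)` yields `fence.acq_rel.cta;`, the form this PR switches to, while `Scope::System` yields the `.sys` form that the block-level fence previously emitted.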

Closes #376

Changes

  • crates/cuda_std/src/atomic/intrinsics.rs

Testing

  • cargo build passes
  • cargo clippy --workspace passes
  • Tested on: Ubuntu 24.04.1 LTS, NVIDIA GeForce RTX 5060 Ti, CUDA 12.8.93

@LegNeato
Contributor

Thanks!

@LegNeato added this pull request to the merge queue on Apr 20, 2026
Merged via the queue into Rust-GPU:main with commit 946c91f Apr 20, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

fence_acqrel_block incorrectly uses system-level PTX scope (.sys) instead of block-level (.cta)

2 participants