cuda.bindings latency benchmarks#1736

Merged: danielfrg merged 20 commits into main from cuda-bindings-bench on Apr 2, 2026

@danielfrg
Contributor

Description

closes #1580

@leofang @mdboom I migrated one benchmark from the pytest suite to use pyperf and added a C++ equivalent.

  • Added a small benchmark discovery step that finds bench_*.py files containing bench_*() functions
  • Uses pyperf's bench_time_func
  • C++ benchmarks output pyperf-compatible JSON, so both sides can be analyzed with the same pyperf stats / pyperf hist commands
  • The README explains how to run it in the different environments using pixi
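
As a rough illustration of the discovery step described in the list above, here is a minimal sketch using only the standard library. It is not the code from this PR: `discover_benchmarks` and `make_time_func` are hypothetical names, and the sketch assumes each `bench_*()` function takes no arguments. The wrapper follows the contract pyperf documents for `bench_time_func`, where the time function receives a loop count and returns the total elapsed seconds.

```python
import importlib.util
import time
from pathlib import Path


def discover_benchmarks(bench_dir):
    """Return {"<module>.<function>": callable} for bench_*() in bench_*.py files.

    Hypothetical helper, not the PR's implementation.
    """
    found = {}
    for path in sorted(Path(bench_dir).glob("bench_*.py")):
        # Load the file as a module without requiring it to be on sys.path.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name, obj in sorted(vars(module).items()):
            if name.startswith("bench_") and callable(obj):
                found[f"{path.stem}.{name}"] = obj
    return found


def make_time_func(fn):
    """Wrap a zero-argument callable as time_func(loops) -> elapsed seconds."""

    def time_func(loops):
        start = time.perf_counter()
        for _ in range(loops):
            fn()
        return time.perf_counter() - start

    return time_func
```

With pyperf installed, each discovered entry could then be registered with something like `pyperf.Runner().bench_time_func(name, make_time_func(fn))`. Timing the whole loop inside one function keeps per-iteration harness overhead out of the measurement, which matters for calls in the sub-microsecond range.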

The benchmark is cuPointerGetAttribute; both Python and C++ call the same driver API with error checking.

This is one set of results for Python and C++ on my system, so we are OK, under the 1 µs threshold. The two sides don't yet use the same warmup and number of runs; I still need to finish that, but this should give you an idea.

# Python (pyperf bench_time_func)
bindings.pointer_attributes.pointer_get_attribute: Mean +- std dev: 603 ns +- 25 ns

# C++ (driver API baseline)
cpp.pointer_attributes.pointer_get_attribute: Mean +- std dev: 29 ns +- 1 ns
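
To make the comparison concrete, the short sketch below works out what the two reported means imply. It uses only the numbers quoted above and the <1us figure from the description; since the warmup and run counts are not yet matched between the two sides, the result should be read as a rough estimate.

```python
# Means reported above, in nanoseconds.
python_mean_ns = 603.0
cpp_mean_ns = 29.0
# The "<1us" figure mentioned in the description.
budget_ns = 1000.0

overhead_ns = python_mean_ns - cpp_mean_ns   # cost added on top of the C++ baseline
ratio = python_mean_ns / cpp_mean_ns         # how many times slower the Python call is
headroom_ns = budget_ns - python_mean_ns     # margin left under 1 microsecond

print(f"overhead per call: {overhead_ns:.0f} ns")   # 574 ns
print(f"Python / C++ ratio: {ratio:.1f}x")          # 20.8x
print(f"headroom under 1 us: {headroom_ns:.0f} ns") # 397 ns
```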

I still need to work on matching parameters across all the benchmarks, among other things, but I wanted to get feedback first on whether this looks fine before I keep going.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@copy-pr-bot
Contributor

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

@mdboom mdboom left a comment

I'm marking this as "approve" even though I have some questions inline, since I think it's totally fine to merge this and iterate if that's the easiest way forward.

(I am not a regular pixi user...) I tried to follow the instructions but I get:

 pixi run -e source bench
Error:   × failed to solve requirements of environment 'source' for platform 'linux-64'
  ├─▶   × failed to solve the environment
  │
  ╰─▶ Cannot solve the request because of: cuda-bindings * cannot be installed because there are no viable options:
      └─ cuda-bindings 13.1.0 would require
         └─ cuda-nvrtc >=13.2.51,<14.0a0, which cannot be installed because there are no viable options:
            └─ cuda-nvrtc 13.2.51 would require
               └─ cuda-version >=13.2,<13.3.0a0, for which no candidates were found.

 pixi run -e wheel bench
Error:   × failed to solve requirements of environment 'source' for platform 'linux-64'
  ├─▶   × failed to solve the environment
  │
  ╰─▶ Cannot solve the request because of: cuda-bindings * cannot be installed because there are no viable options:
      └─ cuda-bindings 13.1.0 would require
         └─ cuda-nvrtc >=13.2.51,<14.0a0, which cannot be installed because there are no viable options:
            └─ cuda-nvrtc 13.2.51 would require
               └─ cuda-version >=13.2,<13.3.0a0, for which no candidates were found.


- `bench`: Runs the Python benchmarks
- `bench-cpp`: Runs the C++ benchmarks

Contributor

Maybe mention pyperf system tune here?

@danielfrg
Contributor Author

Thanks for the comments! I don't think we need to merge now. I'll address the comments, and once we are happy with the template we have here we can commit; then in another PR I can just add more benchmarks.

@danielfrg
Contributor Author

Addressed the comments, and I relaxed one of the deps in pixi, so I think you should be able to try again.

@cpcloud
Contributor

cpcloud commented Mar 17, 2026

Are these going to run in CI or in any sort of regular way? I'm not sure we should have this much additional code that is going to go stale immediately.

Can we run one iteration of the benchmarks in CI so they don't go stale?

@danielfrg
Contributor Author

Yes, the idea is for those to run in CI, but I don't think we have decided on the infra where that's going to happen. I remember having a discussion a while back, but I don't remember what the problem with the infra is right now.

It sounds like a good idea to at least run one iteration for now. If that's what we want, I will add it here.

@danielfrg
Contributor Author

danielfrg commented Mar 31, 2026

@cpcloud I just added a small smoke test that runs one loop in CI.
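
For readers wondering what a one-loop smoke test could look like, here is a minimal sketch. It is an assumption about the general shape, not the test added in this PR: `smoke_check` and `example_time_func` are hypothetical names, and the stand-in benchmark sums a small list instead of calling the CUDA driver so the sketch runs without a GPU.

```python
import math
import time


def example_time_func(loops):
    """Hypothetical stand-in benchmark: time `loops` calls of a cheap operation."""
    payload = list(range(8))
    start = time.perf_counter()
    for _ in range(loops):
        sum(payload)
    return time.perf_counter() - start


def smoke_check(benchmarks, loops=1):
    """Run each time function once and return the names that failed.

    A benchmark fails if it raises or returns something other than a
    finite, non-negative float. No timing thresholds are checked.
    """
    failures = []
    for name, func in benchmarks.items():
        try:
            elapsed = func(loops)
        except Exception:
            failures.append(name)
            continue
        if not isinstance(elapsed, float) or not math.isfinite(elapsed) or elapsed < 0:
            failures.append(name)
    return failures
```

A check like this only guards against the benchmark code going stale (import errors, renamed APIs); it says nothing about latency, which still needs a tuned machine and a full run.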

@danielfrg
Contributor Author

I'll merge in a couple of hours and then add other benchmarks in a follow-up PR, unless there are any objections.

@danielfrg danielfrg enabled auto-merge (squash) April 1, 2026 18:07
@danielfrg danielfrg added the cuda.bindings (Everything related to the cuda.bindings module) and performance labels Apr 1, 2026
@danielfrg
Contributor Author

danielfrg commented Apr 1, 2026

What's the "Check job status" CI?

@rwgk
Collaborator

rwgk commented Apr 1, 2026

Whats the "Check job status" CI?

It'll kick in after all other GitHub Actions jobs are finished, to determine if the PR can be merged.

Give me a moment to take a look.

@rwgk
Collaborator

rwgk commented Apr 1, 2026

/ok to test 43e2602

@rwgk
Collaborator

rwgk commented Apr 1, 2026

I just triggered the CI. Assuming it passes, this should auto-merge.

@rwgk
Collaborator

rwgk commented Apr 1, 2026

@danielfrg are you around to resolve the merge conflict? — We'll have to rerun the CI.

@danielfrg
Contributor Author

Resolved the conflict.

@danielfrg
Contributor Author

/ok to test f7d98c0

@rwgk
Collaborator

rwgk commented Apr 2, 2026

/ok to test e953174

@rwgk
Collaborator

rwgk commented Apr 2, 2026

@danielfrg I added 4f8eef0 and merged main again — hopefully it'll go through now.

@danielfrg danielfrg merged commit 64b8c07 into main Apr 2, 2026
87 checks passed
@danielfrg danielfrg deleted the cuda-bindings-bench branch April 2, 2026 06:10
@github-actions

github-actions bot commented Apr 2, 2026

Doc Preview CI
Preview removed because the pull request was closed or merged.

Labels

cuda.bindings Everything related to the cuda.bindings module performance

Development

Successfully merging this pull request may close these issues.

Python latency testing & benchmarking

6 participants