Fix #1804: Fix use-after-free in graph-related APIs#2083
Conversation
This comment has been minimized.
This comment has been minimized.
kkraus14
left a comment
There was a problem hiding this comment.
Suggested using typed assignment for the deep copy while preserving the use-after-free fix.
| pydependencies_out = [cudaGraphNode_t(init_value=<void_ptr>cydependencies_out[idx]) for idx in range(numDependencies_out)] | ||
| pydependencies_out = [cudaGraphNode_t() for _ in range(numDependencies_out)] | ||
| for idx in range(numDependencies_out): | ||
| string.memcpy((<cudaGraphNode_t>pydependencies_out[idx])._pvt_ptr, &cydependencies_out[idx], sizeof(cyruntime.cudaGraphNode_t)) |
There was a problem hiding this comment.
Can we use typed assignment for these wrapper-owned copies instead of raw memcpy? It keeps the use-after-free fix, while letting the generated Cython/C types define the copy size. Same pattern applies to the other generated deep-copy sites in this PR.
| string.memcpy((<cudaGraphNode_t>pydependencies_out[idx])._pvt_ptr, &cydependencies_out[idx], sizeof(cyruntime.cudaGraphNode_t)) | |
| (<cudaGraphNode_t>pydependencies_out[idx])._pvt_ptr[0] = cydependencies_out[idx] |
There was a problem hiding this comment.
That does seem to work. What is this magical thing called C++ 😆
| pydependencies_out = [CUgraphNode(init_value=<void_ptr>cydependencies_out[idx]) for idx in range(numDependencies_out)] | ||
| pydependencies_out = [CUgraphNode() for _ in range(numDependencies_out)] | ||
| for idx in range(numDependencies_out): | ||
| string.memcpy((<CUgraphNode>pydependencies_out[idx])._pvt_ptr, &cydependencies_out[idx], sizeof(cydriver.CUgraphNode)) |
There was a problem hiding this comment.
This and the remaining memcpys should also use typed copy assignment I think? Does this need to be rerun with the latest codegen changes?
There was a problem hiding this comment.
Was working from my laptop which has an older version of doxygen and I asked my agent to filter out those changes. Looks like it messed up.
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin 12.9.x
git worktree add -d .worktree/backport-2083-to-12.9.x origin/12.9.x
cd .worktree/backport-2083-to-12.9.x
git switch --create backport-2083-to-12.9.x
git cherry-pick -x 87ee176ba58547c69cc72d97aee3f68b8f47b409 |
|
|
For posterity: Mike backported the PR in #2110. |
See #1804 for more information. Close #1804.
There are actually two different array semantics at play here, both of which were handled incorrectly.
cuStreamGetCaptureInforeturns "driver owned" arrays that are not allocated by the caller, but are only valid until the next call.The remainder of the functions use a caller-provided buffer to write into.
These have both been solved by copying the array used to communicate with the API into the Python object itself.
But we can leave that last thing as an optimization for the future. This fixes the immediate and serious bug.
Cc: @Andy-Jost for viz since this may have implications for
cuda.core.graph. It passed the tests locally for me, but I did not do a thorough review of any implications it might have.