
Add wheel support for Newton-Schulz method via cuSolverMp #3004

Open
ksivaman wants to merge 1 commit into NVIDIA:main from ksivaman:expand_wheel_builds


@ksivaman
Member

@ksivaman ksivaman commented May 17, 2026

Description

#2706 added a distributed Newton-Schulz matrix orthogonalization API via cuSolverMp. This PR makes that API available in the published wheels.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Enable the NVTE_WITH_CUSOLVERMP option in the TE build that produces the PyPI wheels.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@ksivaman ksivaman requested review from cyanguwa, denera and mk-61 May 17, 2026 01:37
@ksivaman ksivaman marked this pull request as draft May 17, 2026 01:38
@greptile-apps
Contributor

greptile-apps Bot commented May 17, 2026

Greptile Summary

This PR updates the wheel build infrastructure to include cuSolverMP as an optional feature by installing the system packages in both Dockerfiles, creating a canonical /opt/nvidia/cusolvermp layout with symlinked include/lib directories, and exporting NVTE_WITH_CUSOLVERMP=1 in the build script.

  • Dockerfiles (x86 + aarch64): libcusolvermp0-cuda-${CUDA_MAJOR} and its devel package are installed via dnf, symlinks create a unified /opt/nvidia/cusolvermp/{include,lib} tree, ldconfig is invoked in the same layer, and CUSOLVERMP_HOME is exported — all addressing prior review feedback.
  • build_wheels.sh: Only NVTE_WITH_CUSOLVERMP=1 is exported; the three other flags cited in the PR description (NVTE_WITH_CUBLASMP, NVTE_ENABLE_NVSHMEM, NVTE_UB_WITH_MPI) have no corresponding exports or dependency installs anywhere in the changed files.
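
The Dockerfile layout step summarized above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's actual Dockerfile content: the helper name `link_cusolvermp_tree` is hypothetical, and the directories where the system packages place their headers and libraries are passed in as arguments because they are not stated in this thread.

```shell
#!/usr/bin/env bash
# Sketch only: link_cusolvermp_tree is a hypothetical helper, and the source
# directories are arguments because the real package locations are not
# given in this conversation.

# Create <dest>/include and <dest>/lib as symlinks that point at the
# directories where the system packages installed headers and libraries.
link_cusolvermp_tree() {
    local src_include="$1" src_lib="$2" dest="$3"
    mkdir -p "$dest"
    # -n replaces an existing symlink instead of descending into it.
    ln -sfn "$src_include" "$dest/include"
    ln -sfn "$src_lib" "$dest/lib"
}

# In the image, this would come after the dnf install of
# libcusolvermp0-cuda-${CUDA_MAJOR} and its devel package, and before
# registering <dest>/lib under /etc/ld.so.conf.d/ and running ldconfig.
```

Keeping the install, the symlinks, and `ldconfig` in one image layer, as the summary describes, means the linker cache never sees a state where the tree exists but is not registered.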

Confidence Score: 4/5

Safe to merge for cuSolverMP support, but the three other features promised in the PR description are not implemented.

The cuSolverMP wiring — package install, symlink tree, ldconfig, ENV exports — is self-consistent and correct in both Dockerfiles. The gap is between what the PR claims to deliver (four optional build flags) and what is actually wired up (only one). Users relying on the PR description to know the wheel now includes cuBLASMP, NVSHMEM, or MPI support will be wrong.

build_tools/wheel_utils/build_wheels.sh — the three missing flag exports (and the Dockerfiles that would need matching dependency installs) are the unfinished portion of this change.

Important Files Changed

| Filename | Overview |
| --- | --- |
| build_tools/wheel_utils/build_wheels.sh | Adds NVTE_WITH_CUSOLVERMP=1 export before the build loop, but NVTE_WITH_CUBLASMP, NVTE_ENABLE_NVSHMEM, and NVTE_UB_WITH_MPI (all claimed in the PR description) are absent. |
| build_tools/wheel_utils/Dockerfile.x86 | Installs libcusolvermp from the CUDA repo, creates /opt/nvidia/cusolvermp with include/lib symlinks, calls ldconfig in the same layer, and sets CUSOLVERMP_HOME. Only cuSolverMP is set up; no cuBLASMP/NVSHMEM/OpenMPI packages are installed. |
| build_tools/wheel_utils/Dockerfile.aarch | Mirror of Dockerfile.x86 for aarch64: same cuSolverMP install/symlink/ldconfig/ENV additions, same missing cuBLASMP/NVSHMEM/MPI setup. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Docker build\nDockerfile.x86 / .aarch] --> B[dnf install libcusolvermp0-cuda-CUDA_MAJOR\n+ devel package]
    B --> C[Create /opt/nvidia/cusolvermp\nwith include + lib symlinks]
    C --> D[echo lib path to ld.so.conf.d\n+ ldconfig]
    D --> E[ENV CUSOLVERMP_HOME=/opt/nvidia/cusolvermp\nENV LD_LIBRARY_PATH += cusolvermp/lib]
    E --> F[Container starts\nbuild_wheels.sh]
    F --> G[export NVTE_WITH_CUSOLVERMP=1]
    G --> H[python setup.py bdist_wheel\ncommon lib]
    H --> I[Wheel includes cuSolverMP]
    F -.->|NOT exported| J[NVTE_WITH_CUBLASMP\nNVTE_ENABLE_NVSHMEM\nNVTE_UB_WITH_MPI]
    J -.->|NOT installed| K[cuBLASMP / NVSHMEM / OpenMPI\ndependencies]
    style J fill:#ffcccc,stroke:#cc0000
    style K fill:#ffcccc,stroke:#cc0000

Reviews (2): Last reviewed commit: "Add NS via cusolvermp to wheel build"

Comment thread build_tools/wheel_utils/build_wheels.sh Outdated

SITE_PACKAGES=$(/opt/python/cp310-cp310/bin/python -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
export CUBLASMP_HOME="${SITE_PACKAGES}/nvidia/cublasmp/cu${CUDA_MAJOR}"
export CUSOLVERMP_HOME="${SITE_PACKAGES}/nvidia/cu${CUDA_MAJOR}"
Contributor


P1 Likely incorrect CUSOLVERMP_HOME path

The path ${SITE_PACKAGES}/nvidia/cu${CUDA_MAJOR} is missing the package-name segment. Every other NVIDIA Python package follows the layout site-packages/nvidia/<package-name>/cu<ver>/ — for example, nvidia-cublasmp-cu12 installs under nvidia/cublasmp/cu12/, so nvidia-cusolvermp-cu12 should install under nvidia/cusolvermp/cu12/. With the current path the .so symlink loop silently skips cuSolverMP's lib/ directory ([ -d "$lib_dir" ] || continue), no unversioned .so stubs are created, and the linker will not find cuSolverMP at build time even though NVTE_WITH_CUSOLVERMP=1 is exported.
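
To make the failure mode concrete, here is a hypothetical reconstruction of the kind of stub-creation loop this comment refers to. The function name `make_unversioned_stubs` and the exact loop body are assumptions; only the `[ -d "$lib_dir" ] || continue` guard is quoted from the review.

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction; the real loop in build_wheels.sh is not
# shown in this thread.

# For each package home, create libfoo.so next to each libfoo.so.N so the
# linker can resolve -lfoo at build time.
make_unversioned_stubs() {
    local home lib_dir so name stub
    for home in "$@"; do
        lib_dir="$home/lib"
        # A wrong home (for example one missing the package-name segment)
        # is skipped here without any message.
        [ -d "$lib_dir" ] || continue
        for so in "$lib_dir"/lib*.so.*; do
            [ -e "$so" ] || continue      # glob matched nothing
            name=$(basename "$so")
            stub="${name%%.so.*}.so"      # libfoo.so.0.1 -> libfoo.so
            [ -e "$lib_dir/$stub" ] || ln -s "$name" "$lib_dir/$stub"
        done
    done
}
```

Because the guard uses `continue`, a mistyped home produces no stubs and no error, so the first visible symptom would be a link failure later in the build. Replacing `continue` with an error message and a non-zero return is one way to surface the problem earlier.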

Comment thread build_tools/wheel_utils/Dockerfile.x86 Outdated
Comment thread build_tools/wheel_utils/Dockerfile.aarch Outdated
Signed-off-by: Kirthi Shankar Sivamani <ksivamani@nvidia.com>
@ksivaman ksivaman force-pushed the expand_wheel_builds branch from 522c631 to df140b3 Compare May 19, 2026 23:33
@ksivaman ksivaman marked this pull request as ready for review May 19, 2026 23:34
Comment on lines +28 to +30
# Enable optional build features. cuSolverMp is provided by the build image
# (see Dockerfile.x86 / Dockerfile.aarch), which also sets CUSOLVERMP_HOME.
export NVTE_WITH_CUSOLVERMP=1
Contributor


P1 Three of the four advertised flags never get exported

The PR description and title claim to enable NVTE_WITH_CUSOLVERMP, NVTE_WITH_CUBLASMP, NVTE_ENABLE_NVSHMEM, and NVTE_UB_WITH_MPI in the wheel build. Only NVTE_WITH_CUSOLVERMP is exported here. NVTE_WITH_CUBLASMP, NVTE_ENABLE_NVSHMEM, and NVTE_UB_WITH_MPI are not exported in build_wheels.sh, and no corresponding packages (cuBLASMP, NVSHMEM, OpenMPI) are installed in either Dockerfile. Wheels built from this script will silently omit those three features.
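
One way to keep such an omission from being silent is a guard that runs before the build loop and fails if an expected flag is not in the environment. The sketch below is illustrative only: `require_exported_flags` is a hypothetical helper, not something present in build_wheels.sh, and the list of flags passed to it would be whatever the PR actually intends to enable.

```shell
#!/usr/bin/env bash
# Sketch only: require_exported_flags is a hypothetical helper.

# Return non-zero if any named variable is not exported with the value 1.
# printenv only sees exported variables, so a plain assignment without
# `export` is reported as missing too.
require_exported_flags() {
    local flag missing=0
    for flag in "$@"; do
        if [ "$(printenv "$flag" || true)" != "1" ]; then
            echo "build flag $flag is not exported as 1" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the retitled scope of this PR, the call would be `require_exported_flags NVTE_WITH_CUSOLVERMP`; listing further flags would make the build stop instead of producing a wheel without them.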

@ksivaman ksivaman changed the title Add optional core lib features to wheel build Add wheel support for Newton-Schulz method via cuSolverMp May 19, 2026
