[None][fix] Generalize FP8 checkpoint loading for Qwen3.5 by amukkara · Pull Request #15067 · NVIDIA/TensorRT-LLM

amukkara · 2026-06-08T00:06:05Z

Summary by CodeRabbit

New Features
- Enhanced Qwen3.5 model checkpoint handling with improved tensor normalization and packing support.
Improvements
- Refined FP8 quantization weight-scale loading for improved consistency across different loading modes.
- Updated quantization exclusion rules for linear attention modules in model configuration.
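
As a rough illustration of how a glob-style exclusion rule behaves, the sketch below applies the pattern `*linear_attn.conv1d` (the pattern reported in the review walkthrough) to a few module names. The module names and the `is_excluded` helper are hypothetical, and the standard-library `fnmatchcase` is only assumed to approximate the matching that TensorRT-LLM performs.

```python
from fnmatch import fnmatchcase

# Pattern taken from the review walkthrough; the list itself is illustrative.
EXCLUDE_PATTERNS = ["*linear_attn.conv1d"]


def is_excluded(module_name, patterns=EXCLUDE_PATTERNS):
    """Return True if the module name matches any exclusion pattern."""
    return any(fnmatchcase(module_name, pattern) for pattern in patterns)


# Hypothetical module names, used only to show the matching behavior.
conv_excluded = is_excluded("model.layers.0.linear_attn.conv1d")
proj_excluded = is_excluded("model.layers.0.linear_attn.in_proj_qkvz")
print(conv_excluded, proj_excluded)
```

Under these assumptions only the `conv1d` module is skipped by quantization; other linear-attention projections are still eligible.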

Description

Fixes a few instances of ad-hoc handling of weight shapes for Qwen3.5 FP8 checkpoints.
After this PR, both FP8 blockwise and FP8 rowwise (per-token per-channel) checkpoints will be loaded correctly.
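
To make the difference between the two checkpoint flavors concrete, here is a minimal sketch of the weight-scale shapes each one typically implies for a 2-D weight. The 128x128 tile size and the helper name `fp8_scale_shape` are assumptions for illustration; the PR itself does not state them.

```python
import math


def fp8_scale_shape(weight_shape, scheme, block=(128, 128)):
    """Return the expected weight-scale shape for a 2-D weight.

    "rowwise" keeps one scale per output channel; "blockwise" keeps one
    scale per tile. The default 128x128 tile size is an assumption.
    """
    out_features, in_features = weight_shape
    if scheme == "rowwise":
        return (out_features,)
    if scheme == "blockwise":
        return (
            math.ceil(out_features / block[0]),
            math.ceil(in_features / block[1]),
        )
    raise ValueError(f"unknown scheme: {scheme!r}")


rowwise_shape = fp8_scale_shape((4096, 2048), "rowwise")
blockwise_shape = fp8_scale_shape((4096, 2048), "blockwise")
print(rowwise_shape)    # (4096,)
print(blockwise_shape)  # (32, 16)
```

A loader that hard-codes one of these shapes will mis-handle the other, which is the kind of ad-hoc handling the description refers to.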

Test Coverage

Existing FP8 test cases for Qwen3.5

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-06-08T00:12:34Z

📝 Walkthrough

This PR refactors Qwen3.5 checkpoint weight normalization and projection packing to handle FP8 scale tensor shapes consistently. It introduces scale-name normalization during checkpoint preprocessing and updates weight-loading to reshape scale tensors to 1-D format across vanilla and fused linear modes.
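
The 1-D reshaping mentioned above can be sketched without any tensor library. The example assumes a per-channel scale stored as a column of shape `(out, 1)`; `flatten_scale` is a hypothetical helper, not a function from the PR.

```python
def flatten_scale(scale):
    """Flatten a nested-list scale of any depth into a flat 1-D list."""
    if not isinstance(scale, (list, tuple)):
        return [scale]
    flat = []
    for item in scale:
        flat.extend(flatten_scale(item))
    return flat


column_scale = [[0.5], [0.25], [0.125]]   # shape (3, 1)
flat_scale = flatten_scale(column_scale)  # shape (3,)
print(flat_scale)  # [0.5, 0.25, 0.125]
```

With PyTorch tensors the same effect is usually obtained with `scale.reshape(-1)`; whether the PR uses exactly that call is not shown on this page.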

Changes

Qwen3.5 FP8 Scale Normalization and Projection Packing

| Layer / File(s) | Summary |
| --- | --- |
| FP8 Block-Scale Normalization `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` | Adds `QuantAlgo` import and new `_normalize_scale_names` method that conditionally remaps FP8 block-scale tensors from 4D `weight_scale` to 2D `weight_scale_inv`, returning a flag for modelopt-native FP8 path preservation. |
| Projection Packing and Split-Kind Refactor `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` | Introduces `_SPLIT_KIND_BY_SUFFIX` mapping to determine row vs. block-granularity splitting for packed qkv tensors, refactors `_pack_split_projections` to compute split-kind and expected leading dimensions dynamically, and updates assertions for q/k/v/z and b/a components. |
| Checkpoint Preprocessing Orchestration `tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` | `preprocess_weights` now calls `_normalize_scale_names` and uses the returned flag to conditionally invoke FP8 qkvz dequantization before delegating to superclass weight remapping. |
| Weight-Loading Scale Tensor Shape Normalization `tensorrt_llm/_torch/modules/linear.py` | Updates `FP8RowwiseLinearMethod` weight-scale loading across vanilla and fused modes to reshape per-channel `weight_scale` to flat 1-D tensors before copying, ensuring consistent scale tensor layout. |
| Quantization Module Exclusion `tensorrt_llm/_torch/models/modeling_qwen3_5.py` | Adds `*linear_attn.conv1d` to normalized exclude-modules pattern to exclude conv1d from NVFP4/FP8 quantization. |
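
The row vs. block split granularity described in the table can be sketched as follows. The suffix-to-kind values come from the mapping quoted later in the review; the helper names, the example tensor names, and the block size of 128 are assumptions for illustration, not the PR's actual implementation.

```python
import math

# Values as quoted in the review; the helpers below are hypothetical.
SPLIT_KIND_BY_SUFFIX = {
    "weight": "row",
    "bias": "row",
    "weight_scale": "row",
    "weight_scale_inv": "block",
}


def leading_dim_splits(name, out_dims, block_size=128):
    """Expected leading-dimension size of each packed component."""
    suffix = name.rsplit(".", 1)[-1]
    kind = SPLIT_KIND_BY_SUFFIX[suffix]
    if kind == "row":
        # One row per output channel.
        return list(out_dims)
    # One row per block of output channels (block size is an assumption).
    return [math.ceil(dim / block_size) for dim in out_dims]


def split_leading(rows, sizes):
    """Split a packed list of rows into consecutive chunks of given sizes."""
    assert sum(sizes) == len(rows), "packed leading dim does not match split"
    parts, start = [], 0
    for size in sizes:
        parts.append(rows[start:start + size])
        start += size
    return parts


qkv_dims = (256, 128, 128)  # hypothetical q, k, v output sizes
row_sizes = leading_dim_splits("attn.qkv_proj.weight", qkv_dims)
block_sizes = leading_dim_splits("attn.qkv_proj.weight_scale_inv", qkv_dims)
print(row_sizes)    # [256, 128, 128]
print(block_sizes)  # [2, 1, 1]
print(split_leading(list(range(4)), block_sizes))  # [[0, 1], [2], [3]]
```

Deriving the expected leading dimensions from the split kind lets a single code path validate both the weights and their scales, rather than special-casing each tensor suffix.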

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: generalizing FP8 checkpoint loading for Qwen3.5 model, which is confirmed by the raw summary showing FP8 normalization and checkpoint loading modifications. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description provides a clear summary of the issue and solution, explaining that the changes fix ad-hoc weight shape handling for Qwen3.5 FP8 checkpoints to support both blockwise and rowwise loading. |


coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py (1)

52-58: ⚡ Quick win

Make _SPLIT_KIND_BY_SUFFIX immutable.

Ruff is already flagging this as RUF012. This mapping is treated like a constant, so leaving it as a mutable class attribute makes it shared and accidentally mutable across instances.

Proposed fix

+from types import MappingProxyType
+
 ...
-    _SPLIT_KIND_BY_SUFFIX = {
+    _SPLIT_KIND_BY_SUFFIX = MappingProxyType({
         "weight": "row",
         "bias": "row",
         "weight_scale": "row",
         "weight_scale_inv": "block",
-    }
+    })
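
A small standalone check of what the proposed change buys: a `MappingProxyType` wrapper still supports lookups but rejects item assignment. The module-level constant name here is illustrative; in the PR the mapping is a class attribute.

```python
from types import MappingProxyType

# Read-only view over the mapping quoted in the proposed fix.
SPLIT_KIND_BY_SUFFIX = MappingProxyType({
    "weight": "row",
    "bias": "row",
    "weight_scale": "row",
    "weight_scale_inv": "block",
})

# Lookups work exactly as with a plain dict.
block_kind = SPLIT_KIND_BY_SUFFIX["weight_scale_inv"]
print(block_kind)  # block

# Mutation attempts raise TypeError instead of silently changing shared state.
mutation_rejected = False
try:
    SPLIT_KIND_BY_SUFFIX["weight"] = "block"
except TypeError:
    mutation_rejected = True
print(mutation_rejected)  # True
```

Note that the proxy is a read-only view: anyone holding a reference to the underlying dict could still change it, which is why the fix wraps the literal directly.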

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` around
lines 52 - 58, The _SPLIT_KIND_BY_SUFFIX mapping is currently a mutable dict
used as a constant; make it immutable to satisfy RUF012 by replacing the mutable
dict with an immutable mapping (e.g., wrap the literal in types.MappingProxyType
or express it as a tuple of pairs) and update the assignment of
_SPLIT_KIND_BY_SUFFIX accordingly; import types.MappingProxyType if you choose
the mapping proxy approach and ensure all code that references
_SPLIT_KIND_BY_SUFFIX continues to use it read-only (refer to the
_SPLIT_KIND_BY_SUFFIX symbol in qwen3_5_weight_mapper.py and any places that
import or access it).

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 315-320: preprocess_weights dereferences
self.config.quant_config.quant_algo unconditionally causing AttributeError for
models without quant_config; change preprocess_weights to guard access by
checking if self.config.quant_config is not None (or use getattr) before reading
quant_algo, and pass a sensible default (e.g., None or a non-quant value) into
_normalize_scale_names; update references in preprocess_weights to use this
guarded quant_algo and ensure _normalize_scale_names handles a None/default
quant_algo similarly so non-quantized/BF16 Qwen3.5 checkpoints continue to load.
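
One possible shape for the guard requested in this inline comment is sketched below. It uses `SimpleNamespace` stand-ins for the model config, and the helper name `get_quant_algo` and the string value `"FP8_BLOCK_SCALES"` are illustrative assumptions rather than code from the PR.

```python
from types import SimpleNamespace


def get_quant_algo(config):
    """Return config.quant_config.quant_algo, or None when either is absent."""
    quant_config = getattr(config, "quant_config", None)
    # getattr on None falls through to the default, so this is safe.
    return getattr(quant_config, "quant_algo", None)


# Stand-ins for a non-quantized (BF16) config and an FP8 config.
bf16_config = SimpleNamespace(quant_config=None)
fp8_config = SimpleNamespace(
    quant_config=SimpleNamespace(quant_algo="FP8_BLOCK_SCALES")
)

bf16_algo = get_quant_algo(bf16_config)
fp8_algo = get_quant_algo(fp8_config)
print(bf16_algo)  # None
print(fp8_algo)   # FP8_BLOCK_SCALES
```

Any downstream consumer, such as `_normalize_scale_names`, would then need to treat `None` as "no scale remapping required".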



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3fc486d0-fa60-491c-b93f-c67590ed3b85

📥 Commits

Reviewing files that changed from the base of the PR and between dcd4e90 and e8ea727.

📒 Files selected for processing (3)

tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
tensorrt_llm/_torch/models/modeling_qwen3_5.py
tensorrt_llm/_torch/modules/linear.py

Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>

amukkara requested review from a team as code owners June 8, 2026 00:06

amukkara requested review from Wanli-Jiang, mikeiovine and yechank-nvidia June 8, 2026 00:06

github-actions Bot assigned amukkara Jun 8, 2026

coderabbitai Bot reviewed Jun 8, 2026

Comment thread tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py

Streamline FP8 checkpoint loading

6a8b6ae

Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>

amukkara force-pushed the qwen-comp-tensor branch from e8ea727 to 6a8b6ae Compare June 8, 2026 17:33

amukkara wants to merge 1 commit into NVIDIA:main from amukkara:qwen-comp-tensor
