[None][fix] Generalize FP8 checkpoint loading for Qwen3.5 #15067

Open
amukkara wants to merge 1 commit into NVIDIA:main from amukkara:qwen-comp-tensor
Conversation

@amukkara

@amukkara amukkara commented Jun 8, 2026

Collaborator

Summary by CodeRabbit

  • New Features

    • Enhanced Qwen3.5 model checkpoint handling with improved tensor normalization and packing support.
  • Improvements

    • Refined FP8 quantization weight-scale loading for improved consistency across different loading modes.
    • Updated quantization exclusion rules for linear attention modules in model configuration.

Description

Fixes several instances of ad-hoc weight-shape handling for Qwen3.5 FP8 checkpoints.
After this PR, both FP8 blockwise and FP8 rowwise (per-token per-channel) checkpoints will be loaded correctly.
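
To illustrate how the two checkpoint flavors differ, here is a minimal sketch of the weight-scale shape one would expect for a 2-D weight under each scheme. The function name and the 128x128 block size are assumptions for illustration; the PR does not state the block size, and some checkpoints may store rowwise scales with an extra trailing dimension.

```python
import math


def expected_weight_scale_shape(weight_shape, granularity, block=(128, 128)):
    """Return the scale-tensor shape expected for a 2-D weight.

    `block` is an assumed tile size, not a value taken from the PR.
    """
    out_features, in_features = weight_shape
    if granularity == "rowwise":
        # One scale per output channel (one per row of the weight matrix).
        return (out_features,)
    if granularity == "blockwise":
        # One scale per (block[0] x block[1]) tile of the weight matrix.
        return (
            math.ceil(out_features / block[0]),
            math.ceil(in_features / block[1]),
        )
    raise ValueError(f"unknown granularity: {granularity!r}")
```

Because the two layouts differ in rank as well as size, loading code that hard-codes one of them will tend to fail on the other, which is the class of problem this PR addresses.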

Test Coverage

Existing FP8 test cases for Qwen3.5

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Contributor

📝 Walkthrough

This PR refactors Qwen3.5 checkpoint weight normalization and projection packing to handle FP8 scale tensor shapes consistently. It introduces scale-name normalization during checkpoint preprocessing and updates weight-loading to reshape scale tensors to 1-D format across vanilla and fused linear modes.
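
As a rough illustration of the 1-D reshape mentioned above, the sketch below flattens a per-channel scale stored either as `[N]` or `[N, 1]` into a flat sequence of `N` values. It uses plain Python lists as a stand-in for tensors; on a real PyTorch tensor the equivalent step would be something like `scale.reshape(-1)`. The helper name is hypothetical and the PR's actual code is not shown here.

```python
def flatten_scale(scale):
    """Flatten a nested-list scale (e.g. shape [N] or [N, 1]) to a flat list.

    Stand-in for a tensor reshape to 1-D; not the PR's implementation.
    """
    flat = []
    for item in scale:
        if isinstance(item, (list, tuple)):
            # Recurse into inner dimensions, preserving row-major order.
            flat.extend(flatten_scale(item))
        else:
            flat.append(float(item))
    return flat
```

With both storage layouts reduced to the same flat form, a single copy path can serve the vanilla and fused loading modes.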

Changes

Qwen3.5 FP8 Scale Normalization and Projection Packing

| Layer / File(s) | Summary |
|---|---|
| FP8 Block-Scale Normalization (`tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`) | Adds QuantAlgo import and new _normalize_scale_names method that conditionally remaps FP8 block-scale tensors from 4D weight_scale to 2D weight_scale_inv, returning a flag for modelopt-native FP8 path preservation. |
| Projection Packing and Split-Kind Refactor (`tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`) | Introduces _SPLIT_KIND_BY_SUFFIX mapping to determine row vs. block-granularity splitting for packed qkv tensors, refactors _pack_split_projections to compute split-kind and expected leading dimensions dynamically, and updates assertions for q/k/v/z and b/a components. |
| Checkpoint Preprocessing Orchestration (`tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`) | preprocess_weights now calls _normalize_scale_names and uses the returned flag to conditionally invoke FP8 qkvz dequantization before delegating to superclass weight remapping. |
| Weight-Loading Scale Tensor Shape Normalization (`tensorrt_llm/_torch/modules/linear.py`) | Updates FP8RowwiseLinearMethod weight-scale loading across vanilla and fused modes to reshape per-channel weight_scale to flat 1-D tensors before copying, ensuring consistent scale tensor layout. |
| Quantization Module Exclusion (`tensorrt_llm/_torch/models/modeling_qwen3_5.py`) | Adds *linear_attn.conv1d to normalized exclude-modules pattern to exclude conv1d from NVFP4/FP8 quantization. |
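
The `*linear_attn.conv1d` exclusion pattern listed above reads as a shell-style glob. A minimal sketch of how such a pattern could be evaluated against module names follows, assuming glob semantics via the standard library's `fnmatch`; whether the project matches patterns this way is an assumption, and the module names are hypothetical.

```python
from fnmatch import fnmatchcase

# Pattern taken from the change summary; the list itself is illustrative.
EXCLUDE_PATTERNS = ["*linear_attn.conv1d"]


def is_excluded(module_name, patterns=EXCLUDE_PATTERNS):
    """Return True if the module name matches any exclusion glob."""
    return any(fnmatchcase(module_name, pattern) for pattern in patterns)
```

Under these assumptions a name such as `model.layers.0.linear_attn.conv1d` would be skipped by quantization, while other submodules of `linear_attn` would not.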

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: generalizing FP8 checkpoint loading for Qwen3.5 model, which is confirmed by the raw summary showing FP8 normalization and checkpoint loading modifications. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description provides a clear summary of the issue and solution, explaining that the changes fix ad-hoc weight shape handling for Qwen3.5 FP8 checkpoints to support both blockwise and rowwise loading. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py (1)

52-58: ⚡ Quick win

Make _SPLIT_KIND_BY_SUFFIX immutable.

Ruff is already flagging this as RUF012. This mapping is treated like a constant, so leaving it as a mutable class attribute makes it shared and accidentally mutable across instances.

Proposed fix
+from types import MappingProxyType
+
 ...
-    _SPLIT_KIND_BY_SUFFIX = {
+    _SPLIT_KIND_BY_SUFFIX = MappingProxyType({
         "weight": "row",
         "bias": "row",
         "weight_scale": "row",
         "weight_scale_inv": "block",
-    }
+    })
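
A self-contained sketch of the proposed change shows the effect: lookups still work, but item assignment on the class-level mapping raises `TypeError`. The class name, the `split_kind` helper, and the tensor name in the usage note are hypothetical; only the mapping contents come from the diff above.

```python
from types import MappingProxyType


class WeightMapperSketch:
    # Read-only view over the dict; contents copied from the proposed fix.
    _SPLIT_KIND_BY_SUFFIX = MappingProxyType({
        "weight": "row",
        "bias": "row",
        "weight_scale": "row",
        "weight_scale_inv": "block",
    })

    def split_kind(self, tensor_name: str) -> str:
        """Look up the split kind from the last dotted component of the name."""
        suffix = tensor_name.rsplit(".", 1)[-1]
        return self._SPLIT_KIND_BY_SUFFIX[suffix]


def try_mutate(mapping) -> bool:
    """Return True if item assignment succeeds, False if it is rejected."""
    try:
        mapping["weight"] = "block"
    except TypeError:
        return False
    return True
```

For example, `WeightMapperSketch().split_kind("layers.0.linear_attn.in_proj_qkv.weight_scale_inv")` resolves to the `"block"` entry, and `try_mutate(WeightMapperSketch._SPLIT_KIND_BY_SUFFIX)` reports that the write was rejected.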
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` around
lines 52 - 58, The _SPLIT_KIND_BY_SUFFIX mapping is currently a mutable dict
used as a constant; make it immutable to satisfy RUF012 by replacing the mutable
dict with an immutable mapping (e.g., wrap the literal in types.MappingProxyType
or express it as a tuple of pairs) and update the assignment of
_SPLIT_KIND_BY_SUFFIX accordingly; import types.MappingProxyType if you choose
the mapping proxy approach and ensure all code that references
_SPLIT_KIND_BY_SUFFIX continues to use it read-only (refer to the
_SPLIT_KIND_BY_SUFFIX symbol in qwen3_5_weight_mapper.py and any places that
import or access it).

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 315-320: preprocess_weights dereferences
self.config.quant_config.quant_algo unconditionally causing AttributeError for
models without quant_config; change preprocess_weights to guard access by
checking if self.config.quant_config is not None (or use getattr) before reading
quant_algo, and pass a sensible default (e.g., None or a non-quant value) into
_normalize_scale_names; update references in preprocess_weights to use this
guarded quant_algo and ensure _normalize_scale_names handles a None/default
quant_algo similarly so non-quantized/BF16 Qwen3.5 checkpoints continue to load.

---

Nitpick comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 52-58: The _SPLIT_KIND_BY_SUFFIX mapping is currently a mutable
dict used as a constant; make it immutable to satisfy RUF012 by replacing the
mutable dict with an immutable mapping (e.g., wrap the literal in
types.MappingProxyType or express it as a tuple of pairs) and update the
assignment of _SPLIT_KIND_BY_SUFFIX accordingly; import types.MappingProxyType
if you choose the mapping proxy approach and ensure all code that references
_SPLIT_KIND_BY_SUFFIX continues to use it read-only (refer to the
_SPLIT_KIND_BY_SUFFIX symbol in qwen3_5_weight_mapper.py and any places that
import or access it).
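
The guard requested in the inline comment on `preprocess_weights` could be sketched as below. The dataclasses are hypothetical stand-ins for the real config objects, and `quant_algo` is modeled as a plain string rather than the project's `QuantAlgo` enum; this shows one possible shape of the fix, not the code in the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantConfigSketch:
    # Stand-in for the real quantization config; None means unquantized.
    quant_algo: Optional[str] = None


@dataclass
class ModelConfigSketch:
    # Non-quantized (e.g. BF16) checkpoints may carry no quant_config at all.
    quant_config: Optional[QuantConfigSketch] = None


def resolve_quant_algo(config) -> Optional[str]:
    """Read quant_algo without raising when quant_config is missing or None."""
    quant_config = getattr(config, "quant_config", None)
    return getattr(quant_config, "quant_algo", None)
```

A caller would pass the resolved value into the scale-name normalization step and treat `None` as "no quantization", so unquantized checkpoints skip the FP8-specific paths.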
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3fc486d0-fa60-491c-b93f-c67590ed3b85

📥 Commits

Reviewing files that changed from the base of the PR and between dcd4e90 and e8ea727.

📒 Files selected for processing (3)
  • tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
  • tensorrt_llm/_torch/models/modeling_qwen3_5.py
  • tensorrt_llm/_torch/modules/linear.py

Comment thread tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
@amukkara amukkara force-pushed the qwen-comp-tensor branch from e8ea727 to 6a8b6ae Compare June 8, 2026 17:33