[None][fix] Generalize FP8 checkpoint loading for Qwen3.5 #15067
amukkara wants to merge 1 commit into
📝 Walkthrough
This PR refactors Qwen3.5 checkpoint weight normalization and projection packing to handle FP8 scale tensor shapes consistently. It introduces scale-name normalization during checkpoint preprocessing and updates weight loading to reshape scale tensors to 1-D across both vanilla and fused linear modes.
Changes
Qwen3.5 FP8 Scale Normalization and Projection Packing
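One reason a single 1-D scale layout helps the fused path: per-projection scales are concatenated along dim 0, and concatenating a 1-D tensor with a 2-D one fails. The sketch below works on shape tuples only, so it runs without PyTorch. The function names are hypothetical, and it assumes per-channel FP8 scales may arrive as either `[N]` or `[N, 1]`; it does not reproduce the PR's actual code in `modeling_qwen3_5.py` or `linear.py`.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def to_1d_channel_scale(shape: Shape) -> Shape:
    """Flatten a per-output-channel scale shape, [N] or [N, 1], to [N]."""
    if len(shape) == 1:
        return shape
    if len(shape) == 2 and shape[1] == 1:
        return (shape[0],)
    # Anything else (e.g. a 2-D blockwise scale grid) is not a per-channel scale.
    raise ValueError(f"not a per-channel scale shape: {shape}")


def fused_channel_scale_shape(shapes: List[Shape]) -> Shape:
    """Shape after concatenating per-projection scales along dim 0.

    Normalizing every input to 1-D first lets [N, 1] and [N] inputs be mixed.
    """
    return (sum(to_1d_channel_scale(s)[0] for s in shapes),)
```

For example, hypothetical Q, K and V scales of shapes `(4096, 1)`, `(512,)` and `(512, 1)` fuse to `(5120,)`.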
🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🧹 Nitpick comments (1)
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py (1)
52-58: ⚡ Quick win
Make `_SPLIT_KIND_BY_SUFFIX` immutable.
Ruff is already flagging this as RUF012. This mapping is treated like a constant, so leaving it as a mutable class attribute makes it shared and accidentally mutable across instances.
Proposed fix
```diff
+from types import MappingProxyType
 ...
-    _SPLIT_KIND_BY_SUFFIX = {
+    _SPLIT_KIND_BY_SUFFIX = MappingProxyType({
         "weight": "row",
         "bias": "row",
         "weight_scale": "row",
         "weight_scale_inv": "block",
-    }
+    })
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py` around lines 52 - 58, The _SPLIT_KIND_BY_SUFFIX mapping is currently a mutable dict used as a constant; make it immutable to satisfy RUF012 by replacing the mutable dict with an immutable mapping (e.g., wrap the literal in types.MappingProxyType or express it as a tuple of pairs) and update the assignment of _SPLIT_KIND_BY_SUFFIX accordingly; import types.MappingProxyType if you choose the mapping proxy approach and ensure all code that references _SPLIT_KIND_BY_SUFFIX continues to use it read-only (refer to the _SPLIT_KIND_BY_SUFFIX symbol in qwen3_5_weight_mapper.py and any places that import or access it).
Source: Linters/SAST tools
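A small, self-contained check of what the `MappingProxyType` fix buys. `SplitKindTable` is a hypothetical stand-in for the weight-mapper class; only the table contents come from the proposed fix.

```python
from types import MappingProxyType


class SplitKindTable:
    """Hypothetical holder mirroring the proposed class constant."""

    # Read-only view over the dict literal. Nothing else references the
    # underlying dict, so the table cannot be changed through any alias.
    _SPLIT_KIND_BY_SUFFIX = MappingProxyType({
        "weight": "row",
        "bias": "row",
        "weight_scale": "row",
        "weight_scale_inv": "block",
    })


def mutation_is_rejected() -> bool:
    """Return True if item assignment on the proxy raises TypeError."""
    try:
        SplitKindTable._SPLIT_KIND_BY_SUFFIX["input_scale"] = "row"
    except TypeError:
        return True
    return False
```

Reads (`[]`, `.get`, `in`, iteration) behave like a normal dict, so read-only call sites need no changes. The proxy only blocks mutation through the mapping; it does not stop the class attribute itself from being rebound.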
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around lines 315-320: preprocess_weights dereferences
self.config.quant_config.quant_algo unconditionally, causing AttributeError for
models without quant_config; change preprocess_weights to guard access by
checking if self.config.quant_config is not None (or use getattr) before reading
quant_algo, and pass a sensible default (e.g., None or a non-quant value) into
_normalize_scale_names; update references in preprocess_weights to use this
guarded quant_algo and ensure _normalize_scale_names handles a None/default
quant_algo similarly so non-quantized/BF16 Qwen3.5 checkpoints continue to load.
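A minimal sketch of the suggested guard, assuming the config exposes `quant_config` as an attribute that may be missing or `None`. `get_quant_algo`, `preprocess_weights_sketch`, and the injected `normalize` callable are hypothetical stand-ins for `preprocess_weights` and `_normalize_scale_names`, whose real signatures are not shown here.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional


def get_quant_algo(config: Any) -> Optional[Any]:
    """Return config.quant_config.quant_algo, or None if either level is absent."""
    quant_config = getattr(config, "quant_config", None)
    # getattr with a default also works when quant_config is None.
    return getattr(quant_config, "quant_algo", None)


def preprocess_weights_sketch(
    weights: Dict[str, Any],
    config: Any,
    normalize: Callable[[Dict[str, Any], Any], Dict[str, Any]],
) -> Dict[str, Any]:
    """Skip scale-name normalization for unquantized (e.g. BF16) checkpoints."""
    quant_algo = get_quant_algo(config)
    if quant_algo is None:
        return weights
    return normalize(weights, quant_algo)


if __name__ == "__main__":
    bf16_config = SimpleNamespace(quant_config=None)
    print(get_quant_algo(bf16_config))
```

Returning early keeps the normalization helper free of `None` checks; the review's alternative of passing `None` through and handling it inside `_normalize_scale_names` is equally valid.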
---
Nitpick comments:
In `@tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py`:
- Around line 52-58: The _SPLIT_KIND_BY_SUFFIX mapping is currently a mutable
dict used as a constant; make it immutable to satisfy RUF012 by replacing the
mutable dict with an immutable mapping (e.g., wrap the literal in
types.MappingProxyType or express it as a tuple of pairs) and update the
assignment of _SPLIT_KIND_BY_SUFFIX accordingly; import types.MappingProxyType
if you choose the mapping proxy approach and ensure all code that references
_SPLIT_KIND_BY_SUFFIX continues to use it read-only (refer to the
_SPLIT_KIND_BY_SUFFIX symbol in qwen3_5_weight_mapper.py and any places that
import or access it).
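To make the role of this table concrete, here is a hypothetical sketch of how a suffix-to-kind lookup could drive splitting a packed projection along dim 0. It assumes "row" means one row per output channel and "block" means one row per 128-channel block; neither `split_ranges` nor that interpretation is taken from the PR.

```python
from types import MappingProxyType
from typing import List, Sequence, Tuple

# Same table as in the review comment; the meaning of "row"/"block" is assumed.
SPLIT_KIND_BY_SUFFIX = MappingProxyType({
    "weight": "row",
    "bias": "row",
    "weight_scale": "row",
    "weight_scale_inv": "block",
})


def split_ranges(suffix: str, out_sizes: Sequence[int],
                 block_size: int = 128) -> List[Tuple[int, int]]:
    """Dim-0 [start, end) ranges for each sub-projection of a packed tensor."""
    if SPLIT_KIND_BY_SUFFIX[suffix] == "block":
        # Block scales hold one row per block_size output channels.
        if any(n % block_size for n in out_sizes):
            raise ValueError("sub-projection not aligned to block_size")
        sizes = [n // block_size for n in out_sizes]
    else:
        # Weights, biases and per-channel scales hold one row per channel.
        sizes = list(out_sizes)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for n in sizes:
        ranges.append((start, start + n))
        start += n
    return ranges
```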
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3fc486d0-fa60-491c-b93f-c67590ed3b85
📒 Files selected for processing (3)
tensorrt_llm/_torch/models/checkpoints/hf/qwen3_5_weight_mapper.py
tensorrt_llm/_torch/models/modeling_qwen3_5.py
tensorrt_llm/_torch/modules/linear.py
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
e8ea727 to 6a8b6ae
Summary by CodeRabbit
New Features
Improvements
Description
Fixes several instances of ad-hoc weight-shape handling for Qwen3.5 FP8 checkpoints.
After this PR, both FP8 blockwise and FP8 rowwise (per-token, per-channel) checkpoints load correctly.
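The two formats store weight scales of very different shapes, which is why shape handling has to be general rather than ad hoc. A rough shape calculator for an `[N, K]` FP8 weight, assuming the common 128 x 128 block size for blockwise checkpoints and one stored scale per output channel for rowwise ones (per-token activation scales are assumed to be computed at runtime, not stored). The function names are illustrative only.

```python
from typing import Tuple


def blockwise_scale_shape(n: int, k: int,
                          block: Tuple[int, int] = (128, 128)) -> Tuple[int, int]:
    """One scale per (block[0] x block[1]) tile; partial tiles round up."""
    return ((n + block[0] - 1) // block[0], (k + block[1] - 1) // block[1])


def rowwise_scale_shape(n: int) -> Tuple[int]:
    """One scale per output channel, normalized to 1-D."""
    return (n,)
```

For a 4096 x 2048 weight, the blockwise scale grid is `(32, 16)`, while the rowwise scale vector is `(4096,)`.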
Test Coverage
Existing FP8 test cases for Qwen3.5
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added, either `api-compatible` or `api-breaking`. For `api-breaking`, include `BREAKING` in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.