[PyTorch][torch.compile] Make quantizers opaque value objects by pggPL · Pull Request #7 · pggPL/TransformerEngine

pggPL · 2026-06-06T12:14:03Z

Description

Tensorless quantizers in TE (MXFP8, FP8 blockwise, FP8 current-scaling, NVFP4)
are fully described by a handful of plain, reproducible scalars — they hold no
live tensors and no process groups. This PR turns them into opaque value
objects so torch.compile can treat them as baked-in constants: two
quantizers with the same configuration become interchangeable, hashable, and
reconstructible inside an FX graph.

Quantizers that hold live state (delayed-scaling Float8Quantizer, which keeps
scale/amax tensors) and any user-defined quantizer keep the default
identity semantics, so the change is opt-in and backward compatible. On older
PyTorch builds without the opaque-object API the registration is a graceful
no-op.
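The opt-in mechanism described above can be sketched in plain Python. This is a minimal illustration, not TE's code. `QuantizerBase`, `ToyBlockQuantizer` and `ToyDelayedScalingQuantizer` are hypothetical stand-ins. Only the method names `_value_fields`, `_value_key`, `__eq__` and `__hash__` come from this PR, and the exact key layout is an assumption.

```python
from typing import Any, Hashable, Optional, Tuple


class QuantizerBase:
    """Hypothetical stand-in for TE's base Quantizer; not the real class."""

    def _value_fields(self) -> Optional[Tuple[str, ...]]:
        # Default: None opts out of value semantics and keeps identity semantics.
        return None

    def _value_key(self) -> Optional[Tuple[Hashable, ...]]:
        fields = self._value_fields()
        if fields is None:
            return None
        # Include the concrete class so different quantizer types never compare equal.
        return (type(self),) + tuple(getattr(self, name) for name in fields)

    def __eq__(self, other: Any) -> Any:
        if not isinstance(other, QuantizerBase):
            return NotImplemented
        key = self._value_key()
        if key is None:
            return self is other  # identity semantics
        return key == other._value_key()

    def __hash__(self) -> int:
        key = self._value_key()
        # Identity hash when opted out; otherwise hash the scalar configuration.
        return object.__hash__(self) if key is None else hash(key)


class ToyBlockQuantizer(QuantizerBase):
    """Tensorless: fully described by plain scalars, so it opts in."""

    def __init__(self, fp8_dtype: str, block_len: int, rowwise: bool = True):
        self.fp8_dtype = fp8_dtype
        self.block_len = block_len
        self.rowwise = rowwise
        self.amax_reduction_group = None  # deprecated; deliberately left out of the value

    def _value_fields(self) -> Optional[Tuple[str, ...]]:
        return ("fp8_dtype", "block_len", "rowwise")


class ToyDelayedScalingQuantizer(QuantizerBase):
    """Holds live state (lists stand in for scale/amax tensors), so it keeps the default."""

    def __init__(self) -> None:
        self.scale = [1.0]
        self.amax = [0.0]
```

Two `ToyBlockQuantizer("e4m3", 32)` instances compare equal and collapse to one set element, while two `ToyDelayedScalingQuantizer()` instances stay distinct, which is the backward-compatible default.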

Along the way this also un-breaks the existing test_torch_compile.py suite:
that file lived on main but was never wired into CI, and its
test_autocast_nested_custom case (nested te.autocast with multiple
CustomRecipe instances) was failing because of the CustomRecipe state-caching
bug fixed here. The file is now run in CI and passes.

Type of change

Documentation change (change only to the documentation, either a fix or new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

- Add opt-in value-object identity to the base Quantizer (_value_fields / _value_key / __eq__ / __hash__). Returning None from _value_fields() (the default) keeps identity semantics.
- New module transformer_engine/pytorch/dynamo.py holding the torch.compile glue: __fx_repr__, value-key reconstruction and register_value_opaque_quantizer (gracefully a no-op without PyTorch's opaque-object API).
- Register MXFP8Quantizer, Float8BlockQuantizer, Float8CurrentScalingQuantizer and NVFP4Quantizer as value opaque types (the deprecated amax_reduction_group is never part of the value).
- Fix CustomRecipe state caching in TransformerEngineBaseModule.set_meta_tensor: rebuild quantizers when the CustomRecipe instance changes (e.g. nested te.autocast regions) instead of reusing the first recipe's state, since every CustomRecipe shares the CustomRecipeState type but carries its own qfactory. This fixes the previously failing test_autocast_nested_custom.
- Enable tests/pytorch/test_torch_compile.py in the L0_pytorch_unittest QA suite (it existed on main but was never run in CI), and add the quantizer value-object tests to it. Bringing it into CI required fixing the existing CustomRecipe torch.compile path: the qfactory now dispatches on QuantizerRole.tensor_type supplied by ToyLinear.get_quantizer_roles.
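A minimal sketch of the set_meta_tensor caching fix, assuming the cache check compares recipe instances by identity (the PR only says quantizers are rebuilt "when the CustomRecipe instance changes"). `ToyCustomRecipe`, `ToyCustomRecipeState` and `ToyModule` are hypothetical stand-ins, and the string-returning qfactory replaces real quantizer construction.

```python
from typing import Callable, List, Optional


class ToyCustomRecipe:
    """Hypothetical stand-in for CustomRecipe: each instance carries its own qfactory."""

    def __init__(self, qfactory: Callable[[str], str]):
        self.qfactory = qfactory


class ToyCustomRecipeState:
    """Every custom recipe maps to this one state type, as in the PR description."""

    def __init__(self, recipe: ToyCustomRecipe, roles: List[str]):
        self.recipe = recipe
        self.quantizers = [recipe.qfactory(role) for role in roles]


class ToyModule:
    def __init__(self, roles: List[str]):
        self.roles = roles
        self.state: Optional[ToyCustomRecipeState] = None

    def set_meta_tensor(self, recipe: ToyCustomRecipe) -> None:
        # Checking only the state *type* is true for every custom recipe, so the
        # first recipe's quantizers would stick. Also compare the recipe instance.
        if self.state is None or self.state.recipe is not recipe:
            self.state = ToyCustomRecipeState(recipe, self.roles)
```

With nested autocast regions, the inner recipe now gets its own quantizers, while re-entering with the same recipe reuses the cached state.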

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

…ompile

Give tensorless quantizers (MXFP8, FP8 blockwise, FP8 current-scaling, NVFP4) value-object semantics so torch.compile can treat them as baked-in constants:

- Add opt-in value identity to the base Quantizer (_value_fields / _value_key / __eq__ / __hash__). Quantizers holding live tensors (delayed-scaling Float8Quantizer) and custom quantizers keep identity semantics.
- New transformer_engine/pytorch/dynamo.py houses the torch.compile glue: __fx_repr__, value-key reconstruction and register_value_opaque_quantizer (gracefully a no-op on PyTorch builds without the opaque-object API).
- Register the four tensorless quantizers as value opaque types.

Also fix CustomRecipe state caching in TransformerEngineBaseModule: set_meta_tensor now rebuilds quantizers when the CustomRecipe instance changes (e.g. nested te.autocast regions) instead of reusing the first recipe's state, since every CustomRecipe shares the CustomRecipeState type but carries its own qfactory.

Move the quantizer value-object tests into tests/pytorch/test_torch_compile.py and add that file to the L0 pytorch unittest QA suite.

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

…globals

Follow-up to the value-opaque quantizer support:

- Remove the module-level _QUANTIZER_VALUE_REGISTRY (qualname -> class) and _quantizer_from_value_key. __fx_repr__ now captures the quantizer class directly in the FX globals and reconstructs via _rebuild_quantizer(cls, items), matching how PyTorch's own value opaque types (e.g. DTensor placements) reconstruct themselves. This removes global mutable state and the qualname collision risk.
- Consolidate the quantizer value-object tests in test_torch_compile.py down to two functions and exercise reconstruction through the public __fx_repr__ path instead of internal helpers.

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
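The registry-free reconstruction can be sketched as follows. Only the names `__fx_repr__` and `_rebuild_quantizer(cls, items)` come from the commit message; `ToyMXFP8Quantizer`, the globals naming, the repr-string format, and the assumption that the value fields double as constructor keyword arguments are all illustrative. How torch.compile itself consumes `__fx_repr__` is not shown here.

```python
from typing import Any, Dict, Tuple


def _rebuild_quantizer(cls: type, items: Tuple[Tuple[str, Any], ...]) -> Any:
    # Sketch assumption: the value fields are exactly the constructor's keyword arguments.
    return cls(**dict(items))


class ToyMXFP8Quantizer:
    """Hypothetical tensorless quantizer; not TE's MXFP8Quantizer."""

    _VALUE_FIELDS = ("fp8_dtype", "rowwise", "columnwise")

    def __init__(self, fp8_dtype: str, rowwise: bool = True, columnwise: bool = True):
        self.fp8_dtype = fp8_dtype
        self.rowwise = rowwise
        self.columnwise = columnwise

    def _value_items(self) -> Tuple[Tuple[str, Any], ...]:
        return tuple((name, getattr(self, name)) for name in self._VALUE_FIELDS)

    def __eq__(self, other: Any) -> Any:
        if type(other) is not type(self):
            return NotImplemented
        return self._value_items() == other._value_items()

    def __hash__(self) -> int:
        return hash((type(self), self._value_items()))

    def __fx_repr__(self) -> Tuple[str, Dict[str, Any]]:
        # The class object itself goes into the globals mapping, so no global
        # qualname -> class registry is needed to find it again.
        cls_name = f"_cls_{type(self).__name__}"
        # repr() of the items must round-trip through eval, which holds for plain scalars.
        repr_str = f"_rebuild_quantizer({cls_name}, {self._value_items()!r})"
        return repr_str, {cls_name: type(self), "_rebuild_quantizer": _rebuild_quantizer}
```

The round trip is the same one the PR's test exercises: `eval(repr_str, dict(globals_))` yields an object that is equal to, hashes like, but is not identical to the original.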

Replace the single dynamo.py module with a dynamo/ package so the torch.compile glue can grow with a clear responsibility split across the stacked branches. This branch owns the value-opaque quantizer layer.

* dynamo/quantizer_opaque.py -- register_value_opaque_quantizer and helpers
* dynamo/__init__.py -- re-exports the public API so callers keep importing from transformer_engine.pytorch.dynamo unchanged

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

A value-opaque quantizer must not carry live distributed state. Scan the quantizer attributes in __fx_repr__ and raise TypeError if any holds a torch.distributed.ProcessGroup (e.g. a non-None deprecated amax_reduction_group), so it cannot be silently baked into a torch.compile FX graph. Clarify the related comments accordingly.

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>
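The attribute scan can be sketched without torch. In TE the forbidden type would be `torch.distributed.ProcessGroup`, per the commit message; here `FakeProcessGroup`, `ToyQuantizer` and `check_no_live_distributed_state` are hypothetical stand-ins so the example runs anywhere, and the trivial `__fx_repr__` only shows where the check sits.

```python
from typing import Any, Dict, Tuple, Type


class FakeProcessGroup:
    """Stand-in for torch.distributed.ProcessGroup so the sketch runs without torch."""


def check_no_live_distributed_state(
    obj: Any, forbidden: Tuple[Type, ...] = (FakeProcessGroup,)
) -> None:
    # Scan instance attributes; classes using __slots__ would need a different walk.
    for name, value in vars(obj).items():
        if isinstance(value, forbidden):
            raise TypeError(
                f"{type(obj).__name__}.{name} holds a {type(value).__name__}; "
                "a value-opaque quantizer must not carry live distributed state"
            )


class ToyQuantizer:
    """Hypothetical quantizer that still exposes the deprecated group attribute."""

    def __init__(self, amax_reduction_group: Any = None):
        self.block_len = 32
        self.amax_reduction_group = amax_reduction_group

    def __fx_repr__(self) -> Tuple[str, Dict[str, Any]]:
        check_no_live_distributed_state(self)  # refuse to bake a group into the graph
        return "ToyQuantizer()", {"ToyQuantizer": ToyQuantizer}
```

A quantizer whose `amax_reduction_group` is None passes through unchanged; one that still holds a group fails loudly at trace time instead of producing a graph that silently drops the group.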

NVFP4Quantizer is registered as a value-opaque quantizer but was missing from the value-semantics / __fx_repr__ round-trip test. Add it to _VALUE_QUANTIZERS (skipped without CUDA, which it needs to construct).

Signed-off-by: Pawel Gadzinski <pgadzinski@nvidia.com>

kshitij12345

LGTM

kshitij12345 · 2026-06-09T10:29:32Z

+    repr_str, globals_ = a.__fx_repr__()
+    rebuilt = eval(repr_str, dict(globals_))  # pylint: disable=eval-used
+    assert rebuilt == a and rebuilt is not a
+    assert hash(rebuilt) == hash(a)


It would be good to also test torch.compile(fullgraph=True) with a quantizer argument, to verify that the registration actually works and to catch it breaking later:

def fn(quantizer):
    return quantizer

torch.compile(fn, fullgraph=True)(some_quantizer)

kshitij12345 · 2026-06-09T10:30:40Z

+
+    try:
+        register_opaque_type(cls, typ="value")
+    except (ImportError, AttributeError, RuntimeError, TypeError):


I don't think we should catch ImportError and AttributeError here.
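One possible reading of this suggestion is to separate feature detection from the registration call, so that a missing API still degrades to a no-op but errors raised by the call itself are never swallowed. Where `register_opaque_type` lives inside PyTorch is not shown in the diff, so this sketch takes the module name and the registration function as parameters instead of hard-coding them; both helper names are hypothetical.

```python
import importlib
from typing import Callable, Optional


def find_register_opaque_type(module_name: str) -> Optional[Callable[..., None]]:
    """Feature-detect the registration API once, outside the registration call."""
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Narrower than ImportError: a module that exists but fails to import
        # for another reason still raises instead of being treated as "unsupported".
        return None
    return getattr(module, "register_opaque_type", None)


def register_value_opaque_quantizer(
    cls: type, register_fn: Optional[Callable[..., None]]
) -> bool:
    if register_fn is None:
        return False  # graceful no-op on builds without the opaque-object API
    # No blanket try/except here: an error from the call itself is a real bug
    # and should surface instead of silently disabling the feature.
    register_fn(cls, typ="value")
    return True
```

This sketch also lets RuntimeError and TypeError from the call propagate; whether TE should keep tolerating those is a separate decision from the one raised in this comment.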

pggPL added 5 commits June 6, 2026 14:11

kshitij12345 approved these changes Jun 9, 2026

pggPL wants to merge 5 commits into remove_process_group_from_quantizers from make_qunatizers_opaque
