Skip to content

JIT: Fix disasm of reg to reg SIMD narrowing/widening instructions#126371

Open
saucecontrol wants to merge 2 commits intodotnet:mainfrom
saucecontrol:movzxbw-disasm
Open

JIT: Fix disasm of reg to reg SIMD narrowing/widening instructions#126371
saucecontrol wants to merge 2 commits intodotnet:mainfrom
saucecontrol:movzxbw-disasm

Conversation

@saucecontrol
Copy link
Copy Markdown
Member

Fixes #126363

Diff for the current vector byte multiply implementation:

        62F17C481002         vmovups  zmm0, zmmword ptr [rdx]
        62F17C4828C8         vmovaps  zmm1, zmm0
-       62F27D4820C9         vpmovsxbw zmm1, zmm1
+       62F27D4820C9         vpmovsxbw zmm1, ymm1
        62D17C481010         vmovups  zmm2, zmmword ptr [r8]
        62F17C4828DA         vmovaps  zmm3, zmm2
-       62F27D4820DB         vpmovsxbw zmm3, zmm3
+       62F27D4820DB         vpmovsxbw zmm3, ymm3
        62F16548D5C9         vpmullw  zmm1, zmm3, zmm1
-       62F27E4830C9         vpmovwb  zmm1, zmm1
+       62F27E4830C9         vpmovwb  ymm1, zmm1
        62F37D483BC001       vextracti32x8 ymm0, zmm0, 1
-       62F27D4820C0         vpmovsxbw zmm0, zmm0
+       62F27D4820C0         vpmovsxbw zmm0, ymm0
        62F37D483BD201       vextracti32x8 ymm2, zmm2, 1
-       62F27D4820D2         vpmovsxbw zmm2, zmm2
+       62F27D4820D2         vpmovsxbw zmm2, ymm2
        62F16D48D5C0         vpmullw  zmm0, zmm2, zmm0
-       62F27E4830C0         vpmovwb  zmm0, zmm0
+       62F27E4830C0         vpmovwb  ymm0, zmm0
        62F375483AC001       vinserti32x8 zmm0, zmm1, ymm0, 1
        62F17C481101         vmovups  zmmword ptr [rcx], zmm0
        488BC1               mov      rax, rcx
        C5F877               vzeroupper 
        C3                   ret      

Copilot AI review requested due to automatic review settings March 31, 2026 19:21
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 31, 2026
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 31, 2026
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the xarch JIT instruction disassembly formatting to correctly display operand register widths for SIMD widening/narrowing instructions (e.g., showing ymm sources for zmm widening, and ymm targets for zmm narrowing), addressing incorrect/invalid-looking disasm output reported in #126363.

Changes:

  • Adjust disasm operand-size selection for reg-to-reg SIMD widening and narrowing/conversion instructions by deriving source/target emitAttr from the instruction size.
  • Refactor IF_RRW_* reg/reg display logic to compute tgtAttr/srcAttr and print via a single shared path (preserving special-case handling for bt and APX NDD shift forms).
  • Add vcvtps2ph handling in the reg/reg/imm display path to account for differing source/target widths.

@saucecontrol
Copy link
Copy Markdown
Member Author

cc @dotnet/jit-contrib

No code diffs, just disasm text fixes.

printf("%s", emitRegName(id->idReg1(), attr));
printf(", %s", emitRegName(id->idReg2(), attr));
emitDispShift(ins, (BYTE)0);
srcAttr = static_cast<emitAttr>(std::max(16U, static_cast<uint32_t>(EA_SIZE(attr)) / 2U));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits:

  1. This touches a whole lot more than just what the title says it does. That is fine, but the description should likely be more accurate since its doing a broader cleanup/simplification of this

  2. The widening ones look like something that can rather be handled via the Tuple Type information from the instruction, as TT_HALF_MEM, TT_QUARTER_MEM, etc cover this already and avoid the need to handle a ton of additional instructions in the switch. The narrowing could likely be handled between the Input_*Bit and KMask_Base* information in turn.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started down that path, but I couldn't come up with a rule that worked for everything. e.g. older instructions like cvtpd2ps, which is TT_FULL, Input_64bit, KMask_Base2, giving no indication that it's narrowing.

I'm not sure it's any cleaner to generalize it once you account for the exceptions, though the newer instructions do tend to be consistent, so the exception count may be low.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the biggest thing is that it reduces risk that we build up new cases in the future or at least limits it to the narrowing scenarios. It also streamlines the switch table and so should be better for throughput when disasm is used (although that's likely not super impactful).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll see what I can come up with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect JIT disasm for ConvertToVector512*

3 participants