Add logic to load webcil as r2r#126388
Conversation
davidwrighton
commented
Apr 1, 2026
- Update webcil spec to have a spot to store the tableBase in the header during webcil loading
- Tweak corerun build to make the stack pointer available as an export and make the table size growable
- Compute payloadSize by reading the wasm bytes directly instead of calling the helper function
- import "stackPointer", "tableBase", "table" and "imageBase" from "webcil" instead of "env". [TODO! Update the webcil.md to specify that these must be specified when importing a v1 webcil image.]
- Adjust cDAC handling of webcil images. Notably, the webcil file format is a specification of its own, and its data structures are specified, and shouldn't be drawn from the target, but instead hard-coded.
There was a problem hiding this comment.
Pull request overview
Updates Webcil (WASM-hosted IL) loading to support ReadyToRun by introducing a v1 Webcil header that can carry TableBase, shifting cDAC parsing to use the Webcil spec directly (rather than target type info), and updating the WASM toolchain/host plumbing to provide the required imports and table behavior.
Changes:
- Add Webcil v1 header handling (extra
TableBasefield) and adjust Webcil decoding/validation for versioned headers. - Update WASM object writer and corerun host to import
stackPointer,imageBase,tableBase, andtablefrom"webcil", and enable growable tables. - Update cDAC Loader contract/data parsing and tests to use spec-defined offsets/sizes (hard-coded layouts).
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| src/native/managed/cdac/tests/LoaderTests.cs | Updates mock Webcil header layout used in Loader contract tests. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/WebcilSectionHeader.cs | Switches Webcil section header reads to spec-defined fixed offsets. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/WebcilHeader.cs | Reads version/section count at fixed offsets and introduces version-aware header sizing. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/Loader_1.cs | Uses spec-based header/section sizes to resolve Webcil RVA→file offset. |
| src/coreclr/vm/datadescriptor/datadescriptor.inc | Removes cDAC type descriptors for Webcil header/section structs (no longer drawn from target). |
| src/coreclr/utilcode/webcildecoder.cpp | Adds v1 header support and version-aware section-table base handling for Webcil decoding/DAC enumeration. |
| src/coreclr/inc/webcildecoder.h | Introduces Webcil v1 header struct and tracks section header base pointer in decoder. |
| src/coreclr/tools/Common/Wasm/Webcil.cs | Bumps Webcil major version constant to 1. |
| src/coreclr/tools/Common/Compiler/ObjectWriter/WebcilEncoder.cs | Makes header encoding size version-aware and introduces TableBase offset constant. |
| src/coreclr/tools/Common/Compiler/ObjectWriter/WasmObjectWriter.cs | Imports globals/table from "webcil", emits passive element segment, and populates TableBase during payload copy. |
| src/coreclr/tools/Common/Compiler/ObjectWriter/WasmNative.cs | Adds a table import type encoding. |
| src/coreclr/tools/Common/Compiler/ObjectWriter/WasmInstructions.cs | Adds WASM instruction encoders for ref.null, table.grow, and table.init. |
| src/coreclr/hosts/corerun/wasm/libCorerun.js | Parses payload size from wasm bytes, instantiates Webcil module with "webcil" imports, and extracts payload. |
| src/coreclr/hosts/corerun/CMakeLists.txt | Enables table growth and exports the stack pointer global for the host. |
| docs/design/mono/webcil.md | Documents v1 header’s TableBase field semantics. |
| docs/design/datacontracts/Loader.md | Updates Loader contract documentation to reference spec-defined Webcil header/section sizes. |
Comments suppressed due to low confidence (2)
src/coreclr/tools/Common/Compiler/ObjectWriter/WebcilEncoder.cs:54
HeaderEncodeSizecan return 32 for v1+, butEmitHeaderonly writes fields up to offset 24..27. The final 4 bytes (TableBase) are left uninitialized in thestackallocbuffer, so emitted Webcil headers may contain non-deterministic garbage.
Ensure the buffer is initialized (e.g., headerBuffer.Clear()) and explicitly write a value for TableBase (likely 0 in the serialized image, since it is patched during getWebcilPayload).
public static int HeaderEncodeSize(WebcilHeader header)
{
int size = sizeof(uint) + // Id: 4
sizeof(ushort) + // VersionMajor: 6
sizeof(ushort) + // VersionMinor: 8
sizeof(ushort) + // CoffSections: 10
sizeof(ushort) + // Reserved0: 12
sizeof(uint) + // PeCliHeaderRva: 16
sizeof(uint) + // PeCliHeaderSize: 20
sizeof(uint) + // PeDebugRva: 24
sizeof(uint); // PeDebugSize: 28
Debug.Assert(size == 28);
if (header.VersionMajor >= 1)
{
size += sizeof(uint); // TableBase: 32
}
return size;
}
public static int TableBaseOffset => 28;
public static int EmitHeader(in WebcilHeader header, Stream outputStream)
{
int encodeSize = HeaderEncodeSize(header);
Span<byte> headerBuffer = stackalloc byte[encodeSize];
BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(0, 4), header.Id);
BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(4, 2), header.VersionMajor);
BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(6, 2), header.VersionMinor);
BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(8, 2), header.CoffSections);
BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(10, 2), header.Reserved0);
BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(12, 4), header.PeCliHeaderRva);
BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(16, 4), header.PeCliHeaderSize);
BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(20, 4), header.PeDebugRva);
BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(24, 4), header.PeDebugSize);
outputStream.Write(headerBuffer);
return encodeSize;
}
src/coreclr/tools/Common/Wasm/Webcil.cs:16
WC_VERSION_MAJORis now 1, butWebcilHeader(in this file) is still a 28-byte struct and does not model the v1TableBasefield described in docs/design/mono/webcil.md. This makes the managed representation inconsistent with the spec and with the encoder logic that can emit a 32-byte header.
Consider introducing an explicit v1 header type (or extending WebcilHeader/adding a trailing field) so the versioned layout is representable and can be encoded/decoded deterministically.
internal static class WebcilConstants
{
public const int WC_VERSION_MAJOR = 1;
public const int WC_VERSION_MINOR = 0;
/// <summary>
/// 'WbIL' magic bytes interpreted as a little-endian uint32.
/// </summary>
public const uint WEBCIL_MAGIC = 0x4c496257;
| using System.ComponentModel.DataAnnotations; | ||
| using System.IO; | ||
| using System.Runtime.InteropServices; | ||
| using System.Text; |
There was a problem hiding this comment.
System.ComponentModel.DataAnnotations is imported but never used in this test file. This will produce an unused-using warning (often treated as an error in this repo). Remove the using directive.
| using System.ComponentModel.DataAnnotations; | |
| using System.IO; | |
| using System.Runtime.InteropServices; | |
| using System.Text; | |
| using System.IO; | |
| using System.Runtime.InteropServices; |
| m_size = size; | ||
| m_hasContents = (size > 0); | ||
| m_pHeader = (m_hasContents && size >= sizeof(WebcilHeader)) ? (const WebcilHeader *)flatBase : NULL; | ||
| if (m_pHeader->VersionMajor >= 1) | ||
| { | ||
| // For version 1 and above, the section headers start after the larger header | ||
| if (size < sizeof(WebcilHeader_1)) | ||
| { | ||
| m_pHeader = NULL; // Not enough data for even the header | ||
| } | ||
| } |
There was a problem hiding this comment.
Init dereferences m_pHeader unconditionally (m_pHeader->VersionMajor) even when size < sizeof(WebcilHeader) (in which case m_pHeader is set to NULL). This is a null-dereference when Init is called on small/empty buffers.
Guard the access (e.g., if (m_pHeader != NULL && m_pHeader->VersionMajor >= 1)), and ensure m_sections is set consistently for all paths (also consider initializing/clearing m_sections in the ctor/Reset() since it is a new member).
| RETURN FALSE; | ||
|
|
||
| COUNT_T headerEnd = sizeof(WebcilHeader) + (COUNT_T)pHeader->CoffSections * sizeof(WebcilSectionHeader); | ||
| COUNT_T headerEnd = static_cast<COUNT_T>(((uint8_t*)m_sections - (uint8_t*)m_base)) + (COUNT_T)pHeader->CoffSections * sizeof(WebcilSectionHeader); |
There was a problem hiding this comment.
HasWebcilHeaders computes headerEnd using m_sections, but m_sections can be NULL (e.g., when VersionMajor==1 but size < sizeof(WebcilHeader_1) and Init nulled m_pHeader). This leads to invalid pointer arithmetic/underflow and can make the bounds check unreliable.
Compute the section table start from pHeader->VersionMajor after validating it (e.g., use sizeof(WebcilHeader_1) vs sizeof(WebcilHeader) directly) and return FALSE early if the buffer is too small for the chosen header size.
| COUNT_T headerEnd = static_cast<COUNT_T>(((uint8_t*)m_sections - (uint8_t*)m_base)) + (COUNT_T)pHeader->CoffSections * sizeof(WebcilSectionHeader); | |
| COUNT_T headerSize; | |
| if (pHeader->VersionMajor == WEBCIL_VERSION_MAJOR_0) | |
| { | |
| headerSize = sizeof(WebcilHeader); | |
| } | |
| else | |
| { | |
| // WEBCIL_VERSION_MAJOR_1 | |
| headerSize = sizeof(WebcilHeader_1); | |
| } | |
| if (m_size < headerSize) | |
| RETURN FALSE; | |
| const uint8_t *sectionTable = reinterpret_cast<const uint8_t *>(m_base) + headerSize; | |
| COUNT_T headerEnd = static_cast<COUNT_T>(sectionTable - reinterpret_cast<const uint8_t *>(m_base)) + | |
| static_cast<COUNT_T>(pHeader->CoffSections) * sizeof(WebcilSectionHeader); |
| if (enumThis) | ||
| { | ||
| DacEnumMemoryRegion(m_base, sizeof(WebcilHeader)); | ||
| const WebcilHeader *pHeader = (const WebcilHeader *)m_base; | ||
| DacEnumMemoryRegion(m_base, dac_cast<TADDR>(m_sections) - dac_cast<TADDR>(m_base)); | ||
| } | ||
|
|
||
| if (HasWebcilHeaders()) | ||
| { | ||
| // Enumerate section headers | ||
| const WebcilHeader *pHeader = (const WebcilHeader *)m_base; | ||
| DacEnumMemoryRegion(m_base + sizeof(WebcilHeader), | ||
| sizeof(WebcilSectionHeader) * pHeader->CoffSections); | ||
| DacEnumMemoryRegion(m_sections, sizeof(WebcilSectionHeader) * pHeader->CoffSections); | ||
|
|
There was a problem hiding this comment.
EnumMemoryRegions enumerates dac_cast<TADDR>(m_sections) - dac_cast<TADDR>(m_base) even when m_sections is NULL (or not initialized). That can underflow and cause DAC to try to enumerate an enormous memory region.
Only use m_sections when it is non-null; otherwise enumerate a safe minimal range (e.g., min(m_size, sizeof(WebcilHeader))) or compute the header size from validated version data before enumerating.
| { | ||
| int pos = 0; | ||
| buffer[pos++] = (byte)0x70; // element type: funcref | ||
| buffer[pos++] = (byte)0; // table limits: flags (1 = has maximum) |
There was a problem hiding this comment.
WasmTableImportType.Encode writes table limits flags = 0 (min-only), but the comment says “(1 = has maximum)”. This is misleading and makes it harder to reason about the actual import constraints.
Update the comment to match the encoding (0 = min-only, 1 = min+max).
| buffer[pos++] = (byte)0; // table limits: flags (1 = has maximum) | |
| buffer[pos++] = (byte)0; // table limits: flags (0 = min-only, 1 = min+max) |
| private int _numElements; | ||
| private void WriteFunctionElement(WasmInstructionGroup e0, ReadOnlySpan<int> functionIndices) | ||
| private void WriteRefFuncFunctionElement(ReadOnlySpan<int> functionIndices) | ||
| { | ||
| SectionWriter writer = GetOrCreateSection(WasmObjectNodeSection.ElementSection); | ||
| // e0:expr y*:list(funcidx) | ||
| // elem (ref func) (ref.func y)* (active 0 e0) | ||
| writer.WriteULEB128(0); | ||
| // elem (ref func) (ref.func y)* (passive 0 e0) | ||
| writer.WriteByte(1); // Passive element segment | ||
| writer.WriteByte(0); // element type: ref func | ||
|
|
There was a problem hiding this comment.
WriteRefFuncFunctionElement writes a passive element segment with flags=1 and then a single byte 0 (the legacy elemkind for funcref). The comment “element type: ref func” is inaccurate for this encoding (and can be confused with the newer typed element segment forms).
Update the comment (and/or rename the method) to match the actual encoding being emitted (passive funcref element segment with a vec of funcidx).
| var payloadSize = 0; | ||
| try { | ||
| payloadSize = readLittleEndianU32FromDataSegment0(wasmBytes); | ||
| } catch { | ||
| payloadSize = 0; | ||
| } | ||
|
|
||
| var payloadPtr = 0; | ||
| var wasmInstance; | ||
| const sp = stackSave(); | ||
| try { | ||
| const sizePtr = stackAlloc(4); | ||
| wasmInstance.exports.getWebcilSize(sizePtr); | ||
| const payloadSize = HEAPU32[sizePtr >>> 2]; | ||
| if (payloadSize === 0) { | ||
| throw new Error("Webcil payload size is 0"); | ||
| } | ||
|
|
||
| const ptrPtr = stackAlloc(4); | ||
| if (_posix_memalign(ptrPtr, 16, payloadSize)) { | ||
| throw new Error("posix_memalign failed for Webcil payload"); | ||
| } | ||
| const payloadPtr = HEAPU32[ptrPtr >>> 2]; | ||
|
|
||
| wasmInstance.exports.getWebcilPayload(payloadPtr, payloadSize); | ||
|
|
||
| // Write out parameters: void** data_start, int64_t* size | ||
| HEAPU32[outDataStartPtr >>> 2] = payloadPtr; | ||
| HEAPU32[outSize >>> 2] = payloadSize; | ||
| HEAPU32[(outSize + 4) >>> 2] = 0; | ||
|
|
||
| return true; | ||
| payloadPtr = HEAPU32[ptrPtr >>> 2 >>> 0]; | ||
| wasmInstance = new WebAssembly.Instance(wasmModule, { | ||
| webcil: { | ||
| memory: wasmMemory, | ||
| stackPointer: wasmExports.__stack_pointer, | ||
| table: wasmTable, | ||
| tableBase: new WebAssembly.Global({ value: "i32", mutable: false }, tableStartIndex), | ||
| imageBase: new WebAssembly.Global({ value: "i32", mutable: false }, payloadPtr) | ||
| }}); | ||
| } finally { | ||
| stackRestore(sp); | ||
| } | ||
|
|
||
| const webcilVersion = wasmInstance.exports.webcilVersion.value; | ||
| if ((webcilVersion > 1) || (webcilVersion < 0)) { | ||
| throw new Error(`Unsupported Webcil version: ${webcilVersion}`); | ||
| } | ||
|
|
||
| wasmInstance.exports.getWebcilPayload(payloadPtr, payloadSize); | ||
| HEAPU32[outDataStartPtr >>> 2 >>> 0] = payloadPtr; | ||
| HEAPU32[outSize >>> 2 >>> 0] = payloadSize; | ||
| HEAPU32[(outSize + 4) >>> 2 >>> 0] = 0; | ||
| return true; |
There was a problem hiding this comment.
If reading the payload size from data segment 0 fails, payloadSize is left as 0 but the code still calls _posix_memalign(..., payloadSize) and then getWebcilPayload(payloadPtr, payloadSize). getWebcilPayload subsequently writes at least 4 bytes at offset 28 (TableBase), which will overflow a 0-byte allocation and can corrupt memory.
Fail early when payloadSize == 0 (return false or throw) rather than proceeding, and consider surfacing the parsing error to aid diagnosis.
| public WebcilHeader(Target target, TargetPointer address) | ||
| { | ||
| Target.TypeInfo type = target.GetTypeInfo(DataType.WebcilHeader); | ||
|
|
||
| CoffSections = target.Read<ushort>(address + (ulong)type.Fields[nameof(CoffSections)].Offset); | ||
| CoffSections = target.Read<ushort>(address + 8); // See docs/design/mono/webcil.md | ||
| VersionMajor = target.Read<ushort>(address + 4); // See docs/design/mono/webcil.md | ||
| } | ||
|
|
||
| public ushort CoffSections { get; init; } | ||
| public ushort VersionMajor { get; init; } | ||
| public uint Size => VersionMajor >= 1 ? (uint)32 : (uint)28; // See docs/design/mono/webcil.md; | ||
| } |
There was a problem hiding this comment.
The new version-aware Size logic (returning 32 when VersionMajor >= 1) isn’t exercised by the existing Loader tests: the mock Webcil header built in CreateWebcilTarget never sets VersionMajor to 1, so only the v0 (28-byte) path is covered.
Please add/extend a test that constructs a v1 Webcil header (with the extra 4-byte TableBase field) and verifies WebcilRvaToOffset still resolves section offsets correctly when the section table starts at 32 bytes.
| new("PeCliHeaderRva", DataType.uint32), | ||
| new("PeCliHeaderSize", DataType.uint32), | ||
| new("PeDebugRva", DataType.uint32), | ||
| new("PeDebugSize", DataType.uint32), |
There was a problem hiding this comment.
The mock webcilHeaderLayout matches the v0 28-byte header (ending at PeDebugSize), but the PR introduces a v1 header with an extra 4-byte TableBase field (32 bytes). If these tests are meant to model v1 images, the layout (and any size calculations that use webcilHeaderLayout.Stride) should include the TableBase field so the section table base matches the new spec.
| new("PeDebugSize", DataType.uint32), | |
| new("PeDebugSize", DataType.uint32), | |
| new(nameof(Data.WebcilHeader.TableBase), DataType.uint32), |
| const WebcilHeader *m_pHeader; | ||
| const WebcilSectionHeader *m_sections; | ||
|
|
||
| mutable IMAGE_COR20_HEADER *m_pCorHeader; |
There was a problem hiding this comment.
m_sections is a new pointer member but isn’t given a default initializer here. Since WebcilDecoder methods (e.g., DAC enumeration) use m_sections, leaving it uninitialized until Init() runs risks undefined behavior if any code path calls into the decoder before initialization or after Reset().
Consider default-initializing it to nullptr in the class definition (and ensure Reset() also clears it).
| const WebcilHeader *m_pHeader; | |
| const WebcilSectionHeader *m_sections; | |
| mutable IMAGE_COR20_HEADER *m_pCorHeader; | |
| const WebcilHeader *m_pHeader = nullptr; | |
| const WebcilSectionHeader *m_sections = nullptr; | |
| mutable IMAGE_COR20_HEADER *m_pCorHeader = nullptr; |
|
Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara |
| throw new Error(`Unsupported Webcil version: ${webcilVersion}`); | ||
| } | ||
|
|
||
| wasmInstance.exports.getWebcilPayload(payloadPtr, payloadSize); |
There was a problem hiding this comment.
could we unify R2R "image" and IL payload of the webCIL v0 into the same memory block ?
|
|
||
| // [cDAC] [Loader]: Contract depends on WebcilHeader layout (size and CoffSections field). | ||
| struct WebcilHeader | ||
| struct WebcilHeader // For version 0 |
There was a problem hiding this comment.
At some point we could drop support for v0 from coreCLR codebase. We would probably keep it as is for Mono.
| // WASM_TODO, should we move the table grow/fill logic to another helper, since this isn't actually just getting the payload anymore. Also we need to detect errors somehow, since table.grow can fail! | ||
| Ref.NullFuncRef, | ||
| I32.Const(_uniqueSymbols.Count), | ||
| Table.Grow(0), |
There was a problem hiding this comment.
I would prefer to do growth from JS, based on application manifest created at compile time (by parsing R2R image if necessary).
- For error propagation/re-try.
- Also in multi-threading (multiple web workers) table growth need to be propagated to all workers first. So doing it ahead of time during application startup, when there is only one worker seems like easy win.