Skip to content

Add logic to load webcil as r2r#126388

Draft
davidwrighton wants to merge 2 commits intodotnet:mainfrom
davidwrighton:add_logic_to_load_webcil_as_r2r
Draft

Add logic to load webcil as r2r#126388
davidwrighton wants to merge 2 commits intodotnet:mainfrom
davidwrighton:add_logic_to_load_webcil_as_r2r

Conversation

@davidwrighton
Copy link
Copy Markdown
Member

  • Update webcil spec to have a spot to store the tableBase in the header during webcil loading
  • Tweak corerun build to make the stack pointer available as an export and make the table size growable
  • Compute payloadSize by reading the wasm bytes directly instead of calling the helper function
  • import "stackPointer", "tableBase", "table" and "imageBase" from "webcil" instead of "env". [TODO! Update the webcil.md to specify that these must be specified when importing a v1 webcil image.]
  • Adjust cDAC handling of webcil images. Notably, the webcil file format is a specification of its own, and its data structures are specified, and shouldn't be drawn from the target, but instead hard-coded.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates Webcil (WASM-hosted IL) loading to support ReadyToRun by introducing a v1 Webcil header that can carry TableBase, shifting cDAC parsing to use the Webcil spec directly (rather than target type info), and updating the WASM toolchain/host plumbing to provide the required imports and table behavior.

Changes:

  • Add Webcil v1 header handling (extra TableBase field) and adjust Webcil decoding/validation for versioned headers.
  • Update WASM object writer and corerun host to import stackPointer, imageBase, tableBase, and table from "webcil", and enable growable tables.
  • Update cDAC Loader contract/data parsing and tests to use spec-defined offsets/sizes (hard-coded layouts).

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
src/native/managed/cdac/tests/LoaderTests.cs Updates mock Webcil header layout used in Loader contract tests.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/WebcilSectionHeader.cs Switches Webcil section header reads to spec-defined fixed offsets.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Data/WebcilHeader.cs Reads version/section count at fixed offsets and introduces version-aware header sizing.
src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/Loader_1.cs Uses spec-based header/section sizes to resolve Webcil RVA→file offset.
src/coreclr/vm/datadescriptor/datadescriptor.inc Removes cDAC type descriptors for Webcil header/section structs (no longer drawn from target).
src/coreclr/utilcode/webcildecoder.cpp Adds v1 header support and version-aware section-table base handling for Webcil decoding/DAC enumeration.
src/coreclr/inc/webcildecoder.h Introduces Webcil v1 header struct and tracks section header base pointer in decoder.
src/coreclr/tools/Common/Wasm/Webcil.cs Bumps Webcil major version constant to 1.
src/coreclr/tools/Common/Compiler/ObjectWriter/WebcilEncoder.cs Makes header encoding size version-aware and introduces TableBase offset constant.
src/coreclr/tools/Common/Compiler/ObjectWriter/WasmObjectWriter.cs Imports globals/table from "webcil", emits passive element segment, and populates TableBase during payload copy.
src/coreclr/tools/Common/Compiler/ObjectWriter/WasmNative.cs Adds a table import type encoding.
src/coreclr/tools/Common/Compiler/ObjectWriter/WasmInstructions.cs Adds WASM instruction encoders for ref.null, table.grow, and table.init.
src/coreclr/hosts/corerun/wasm/libCorerun.js Parses payload size from wasm bytes, instantiates Webcil module with "webcil" imports, and extracts payload.
src/coreclr/hosts/corerun/CMakeLists.txt Enables table growth and exports the stack pointer global for the host.
docs/design/mono/webcil.md Documents v1 header’s TableBase field semantics.
docs/design/datacontracts/Loader.md Updates Loader contract documentation to reference spec-defined Webcil header/section sizes.
Comments suppressed due to low confidence (2)

src/coreclr/tools/Common/Compiler/ObjectWriter/WebcilEncoder.cs:54

  • HeaderEncodeSize can return 32 for v1+, but EmitHeader only writes fields up to offset 24..27. The final 4 bytes (TableBase) are left uninitialized in the stackalloc buffer, so emitted Webcil headers may contain non-deterministic garbage.

Ensure the buffer is initialized (e.g., headerBuffer.Clear()) and explicitly write a value for TableBase (likely 0 in the serialized image, since it is patched during getWebcilPayload).

        public static int HeaderEncodeSize(WebcilHeader header)
        {
            int size = sizeof(uint) +    // Id: 4
                    sizeof(ushort) +     // VersionMajor: 6
                    sizeof(ushort) +     // VersionMinor: 8
                    sizeof(ushort) +     // CoffSections: 10
                    sizeof(ushort) +     // Reserved0: 12
                    sizeof(uint) +       // PeCliHeaderRva: 16
                    sizeof(uint) +       // PeCliHeaderSize: 20
                    sizeof(uint) +       // PeDebugRva: 24
                    sizeof(uint);        // PeDebugSize: 28
            Debug.Assert(size == 28);
            if (header.VersionMajor >= 1)
            {
                size += sizeof(uint); // TableBase: 32
            }
            return size;
        }

        public static int TableBaseOffset => 28;

        public static int EmitHeader(in WebcilHeader header, Stream outputStream)
        {
            int encodeSize = HeaderEncodeSize(header);
            Span<byte> headerBuffer = stackalloc byte[encodeSize];
            BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(0, 4), header.Id);
            BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(4, 2), header.VersionMajor);
            BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(6, 2), header.VersionMinor);
            BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(8, 2), header.CoffSections);
            BinaryPrimitives.WriteUInt16LittleEndian(headerBuffer.Slice(10, 2), header.Reserved0);
            BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(12, 4), header.PeCliHeaderRva);
            BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(16, 4), header.PeCliHeaderSize);
            BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(20, 4), header.PeDebugRva);
            BinaryPrimitives.WriteUInt32LittleEndian(headerBuffer.Slice(24, 4), header.PeDebugSize);

            outputStream.Write(headerBuffer);
            return encodeSize;
        }

src/coreclr/tools/Common/Wasm/Webcil.cs:16

  • WC_VERSION_MAJOR is now 1, but WebcilHeader (in this file) is still a 28-byte struct and does not model the v1 TableBase field described in docs/design/mono/webcil.md. This makes the managed representation inconsistent with the spec and with the encoder logic that can emit a 32-byte header.

Consider introducing an explicit v1 header type (or extending WebcilHeader/adding a trailing field) so the versioned layout is representable and can be encoded/decoded deterministically.

internal static class WebcilConstants
{
    public const int WC_VERSION_MAJOR = 1;
    public const int WC_VERSION_MINOR = 0;

    /// <summary>
    /// 'WbIL' magic bytes interpreted as a little-endian uint32.
    /// </summary>
    public const uint WEBCIL_MAGIC = 0x4c496257;

Comment on lines +6 to 9
using System.ComponentModel.DataAnnotations;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

System.ComponentModel.DataAnnotations is imported but never used in this test file. This will produce an unused-using warning (often treated as an error in this repo). Remove the using directive.

Suggested change
using System.ComponentModel.DataAnnotations;
using System.IO;
using System.Runtime.InteropServices;
using System.Text;
using System.IO;
using System.Runtime.InteropServices;

Copilot uses AI. Check for mistakes.
Comment on lines 56 to +66
m_size = size;
m_hasContents = (size > 0);
m_pHeader = (m_hasContents && size >= sizeof(WebcilHeader)) ? (const WebcilHeader *)flatBase : NULL;
if (m_pHeader->VersionMajor >= 1)
{
// For version 1 and above, the section headers start after the larger header
if (size < sizeof(WebcilHeader_1))
{
m_pHeader = NULL; // Not enough data for even the header
}
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Init dereferences m_pHeader unconditionally (m_pHeader->VersionMajor) even when size < sizeof(WebcilHeader) (in which case m_pHeader is set to NULL). This is a null-dereference when Init is called on small/empty buffers.

Guard the access (e.g., if (m_pHeader != NULL && m_pHeader->VersionMajor >= 1)), and ensure m_sections is set consistently for all paths (also consider initializing/clearing m_sections in the ctor/Reset() since it is a new member).

Copilot uses AI. Check for mistakes.
RETURN FALSE;

COUNT_T headerEnd = sizeof(WebcilHeader) + (COUNT_T)pHeader->CoffSections * sizeof(WebcilSectionHeader);
COUNT_T headerEnd = static_cast<COUNT_T>(((uint8_t*)m_sections - (uint8_t*)m_base)) + (COUNT_T)pHeader->CoffSections * sizeof(WebcilSectionHeader);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasWebcilHeaders computes headerEnd using m_sections, but m_sections can be NULL (e.g., when VersionMajor==1 but size < sizeof(WebcilHeader_1) and Init nulled m_pHeader). This leads to invalid pointer arithmetic/underflow and can make the bounds check unreliable.

Compute the section table start from pHeader->VersionMajor after validating it (e.g., use sizeof(WebcilHeader_1) vs sizeof(WebcilHeader) directly) and return FALSE early if the buffer is too small for the chosen header size.

Suggested change
COUNT_T headerEnd = static_cast<COUNT_T>(((uint8_t*)m_sections - (uint8_t*)m_base)) + (COUNT_T)pHeader->CoffSections * sizeof(WebcilSectionHeader);
COUNT_T headerSize;
if (pHeader->VersionMajor == WEBCIL_VERSION_MAJOR_0)
{
headerSize = sizeof(WebcilHeader);
}
else
{
// WEBCIL_VERSION_MAJOR_1
headerSize = sizeof(WebcilHeader_1);
}
if (m_size < headerSize)
RETURN FALSE;
const uint8_t *sectionTable = reinterpret_cast<const uint8_t *>(m_base) + headerSize;
COUNT_T headerEnd = static_cast<COUNT_T>(sectionTable - reinterpret_cast<const uint8_t *>(m_base)) +
static_cast<COUNT_T>(pHeader->CoffSections) * sizeof(WebcilSectionHeader);

Copilot uses AI. Check for mistakes.
Comment on lines 855 to 866
if (enumThis)
{
DacEnumMemoryRegion(m_base, sizeof(WebcilHeader));
const WebcilHeader *pHeader = (const WebcilHeader *)m_base;
DacEnumMemoryRegion(m_base, dac_cast<TADDR>(m_sections) - dac_cast<TADDR>(m_base));
}

if (HasWebcilHeaders())
{
// Enumerate section headers
const WebcilHeader *pHeader = (const WebcilHeader *)m_base;
DacEnumMemoryRegion(m_base + sizeof(WebcilHeader),
sizeof(WebcilSectionHeader) * pHeader->CoffSections);
DacEnumMemoryRegion(m_sections, sizeof(WebcilSectionHeader) * pHeader->CoffSections);

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EnumMemoryRegions enumerates dac_cast<TADDR>(m_sections) - dac_cast<TADDR>(m_base) even when m_sections is NULL (or not initialized). That can underflow and cause DAC to try to enumerate an enormous memory region.

Only use m_sections when it is non-null; otherwise enumerate a safe minimal range (e.g., min(m_size, sizeof(WebcilHeader))) or compute the header size from validated version data before enumerating.

Copilot uses AI. Check for mistakes.
{
int pos = 0;
buffer[pos++] = (byte)0x70; // element type: funcref
buffer[pos++] = (byte)0; // table limits: flags (1 = has maximum)
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WasmTableImportType.Encode writes table limits flags = 0 (min-only), but the comment says “(1 = has maximum)”. This is misleading and makes it harder to reason about the actual import constraints.

Update the comment to match the encoding (0 = min-only, 1 = min+max).

Suggested change
buffer[pos++] = (byte)0; // table limits: flags (1 = has maximum)
buffer[pos++] = (byte)0; // table limits: flags (0 = min-only, 1 = min+max)

Copilot uses AI. Check for mistakes.
Comment on lines 198 to 206
private int _numElements;
private void WriteFunctionElement(WasmInstructionGroup e0, ReadOnlySpan<int> functionIndices)
private void WriteRefFuncFunctionElement(ReadOnlySpan<int> functionIndices)
{
SectionWriter writer = GetOrCreateSection(WasmObjectNodeSection.ElementSection);
// e0:expr y*:list(funcidx)
// elem (ref func) (ref.func y)* (active 0 e0)
writer.WriteULEB128(0);
// elem (ref func) (ref.func y)* (passive 0 e0)
writer.WriteByte(1); // Passive element segment
writer.WriteByte(0); // element type: ref func

Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteRefFuncFunctionElement writes a passive element segment with flags=1 and then a single byte 0 (the legacy elemkind for funcref). The comment “element type: ref func” is inaccurate for this encoding (and can be confused with the newer typed element segment forms).

Update the comment (and/or rename the method) to match the actual encoding being emitted (passive funcref element segment with a vec of funcidx).

Copilot uses AI. Check for mistakes.
Comment on lines +178 to +215
var payloadSize = 0;
try {
payloadSize = readLittleEndianU32FromDataSegment0(wasmBytes);
} catch {
payloadSize = 0;
}

var payloadPtr = 0;
var wasmInstance;
const sp = stackSave();
try {
const sizePtr = stackAlloc(4);
wasmInstance.exports.getWebcilSize(sizePtr);
const payloadSize = HEAPU32[sizePtr >>> 2];
if (payloadSize === 0) {
throw new Error("Webcil payload size is 0");
}

const ptrPtr = stackAlloc(4);
if (_posix_memalign(ptrPtr, 16, payloadSize)) {
throw new Error("posix_memalign failed for Webcil payload");
}
const payloadPtr = HEAPU32[ptrPtr >>> 2];

wasmInstance.exports.getWebcilPayload(payloadPtr, payloadSize);

// Write out parameters: void** data_start, int64_t* size
HEAPU32[outDataStartPtr >>> 2] = payloadPtr;
HEAPU32[outSize >>> 2] = payloadSize;
HEAPU32[(outSize + 4) >>> 2] = 0;

return true;
payloadPtr = HEAPU32[ptrPtr >>> 2 >>> 0];
wasmInstance = new WebAssembly.Instance(wasmModule, {
webcil: {
memory: wasmMemory,
stackPointer: wasmExports.__stack_pointer,
table: wasmTable,
tableBase: new WebAssembly.Global({ value: "i32", mutable: false }, tableStartIndex),
imageBase: new WebAssembly.Global({ value: "i32", mutable: false }, payloadPtr)
}});
} finally {
stackRestore(sp);
}

const webcilVersion = wasmInstance.exports.webcilVersion.value;
if ((webcilVersion > 1) || (webcilVersion < 0)) {
throw new Error(`Unsupported Webcil version: ${webcilVersion}`);
}

wasmInstance.exports.getWebcilPayload(payloadPtr, payloadSize);
HEAPU32[outDataStartPtr >>> 2 >>> 0] = payloadPtr;
HEAPU32[outSize >>> 2 >>> 0] = payloadSize;
HEAPU32[(outSize + 4) >>> 2 >>> 0] = 0;
return true;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If reading the payload size from data segment 0 fails, payloadSize is left as 0 but the code still calls _posix_memalign(..., payloadSize) and then getWebcilPayload(payloadPtr, payloadSize). getWebcilPayload subsequently writes at least 4 bytes at offset 28 (TableBase), which will overflow a 0-byte allocation and can corrupt memory.

Fail early when payloadSize == 0 (return false or throw) rather than proceeding, and consider surfacing the parsing error to aid diagnosis.

Copilot uses AI. Check for mistakes.
Comment on lines 9 to 18
public WebcilHeader(Target target, TargetPointer address)
{
Target.TypeInfo type = target.GetTypeInfo(DataType.WebcilHeader);

CoffSections = target.Read<ushort>(address + (ulong)type.Fields[nameof(CoffSections)].Offset);
CoffSections = target.Read<ushort>(address + 8); // See docs/design/mono/webcil.md
VersionMajor = target.Read<ushort>(address + 4); // See docs/design/mono/webcil.md
}

public ushort CoffSections { get; init; }
public ushort VersionMajor { get; init; }
public uint Size => VersionMajor >= 1 ? (uint)32 : (uint)28; // See docs/design/mono/webcil.md;
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version-aware Size logic (returning 32 when VersionMajor >= 1) isn’t exercised by the existing Loader tests: the mock Webcil header built in CreateWebcilTarget never sets VersionMajor to 1, so only the v0 (28-byte) path is covered.

Please add/extend a test that constructs a v1 Webcil header (with the extra 4-byte TableBase field) and verifies WebcilRvaToOffset still resolves section offsets correctly when the section table starts at 32 bytes.

Copilot uses AI. Check for mistakes.
new("PeCliHeaderRva", DataType.uint32),
new("PeCliHeaderSize", DataType.uint32),
new("PeDebugRva", DataType.uint32),
new("PeDebugSize", DataType.uint32),
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mock webcilHeaderLayout matches the v0 28-byte header (ending at PeDebugSize), but the PR introduces a v1 header with an extra 4-byte TableBase field (32 bytes). If these tests are meant to model v1 images, the layout (and any size calculations that use webcilHeaderLayout.Stride) should include the TableBase field so the section table base matches the new spec.

Suggested change
new("PeDebugSize", DataType.uint32),
new("PeDebugSize", DataType.uint32),
new(nameof(Data.WebcilHeader.TableBase), DataType.uint32),

Copilot uses AI. Check for mistakes.
Comment on lines 277 to 280
const WebcilHeader *m_pHeader;
const WebcilSectionHeader *m_sections;

mutable IMAGE_COR20_HEADER *m_pCorHeader;
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_sections is a new pointer member but isn’t given a default initializer here. Since WebcilDecoder methods (e.g., DAC enumeration) use m_sections, leaving it uninitialized until Init() runs risks undefined behavior if any code path calls into the decoder before initialization or after Reset().

Consider default-initializing it to nullptr in the class definition (and ensure Reset() also clears it).

Suggested change
const WebcilHeader *m_pHeader;
const WebcilSectionHeader *m_sections;
mutable IMAGE_COR20_HEADER *m_pCorHeader;
const WebcilHeader *m_pHeader = nullptr;
const WebcilSectionHeader *m_sections = nullptr;
mutable IMAGE_COR20_HEADER *m_pCorHeader = nullptr;

Copilot uses AI. Check for mistakes.
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara
See info in area-owners.md if you want to be subscribed.

throw new Error(`Unsupported Webcil version: ${webcilVersion}`);
}

wasmInstance.exports.getWebcilPayload(payloadPtr, payloadSize);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we unify R2R "image" and IL payload of the webCIL v0 into the same memory block ?


// [cDAC] [Loader]: Contract depends on WebcilHeader layout (size and CoffSections field).
struct WebcilHeader
struct WebcilHeader // For version 0
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point we could drop support for v0 from coreCLR codebase. We would probably keep it as is for Mono.

// WASM_TODO, should we move the table grow/fill logic to another helper, since this isn't actually just getting the payload anymore. Also we need to detect errors somehow, since table.grow can fail!
Ref.NullFuncRef,
I32.Const(_uniqueSymbols.Count),
Table.Grow(0),
Copy link
Copy Markdown
Member

@pavelsavara pavelsavara Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to do growth from JS, based on application manifest created at compile time (by parsing R2R image if necessary).

  • For error propagation/re-try.
  • Also in multi-threading (multiple web workers) table growth need to be propagated to all workers first. So doing it ahead of time during application startup, when there is only one worker seems like easy win.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-wasm WebAssembly architecture area-crossgen2-coreclr

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants