Skip to content

cDAC: Add metadata locator callback for heap dump support#126369

Draft
max-charlamb wants to merge 3 commits intomainfrom
cdac/heap-dump-metadata-locator
Draft

cDAC: Add metadata locator callback for heap dump support#126369
max-charlamb wants to merge 3 commits intomainfrom
cdac/heap-dump-metadata-locator

Conversation

@max-charlamb
Copy link
Copy Markdown
Member

Note

This PR was generated with the assistance of GitHub Copilot.

Summary

Add a metadata locator callback to the cDAC Target, enabling EcmaMetadata_1 to resolve PE metadata from disk when it cannot be read from target memory. This unblocks heap dump support for most dump tests, reducing dump sizes by 4-6x.

Problem

EcmaMetadata_1.GetReadOnlyMetadataAddress reads the entire loaded PE image (~3.6 MB for System.Private.CoreLib.dll) via TargetStream to parse PE headers and locate metadata. Heap dumps (MiniDumpWithPrivateReadWriteMemory) don't include PE read-only sections, causing VirtualReadException. This forced all tests that need metadata resolution (stackwalking, method name lookup, etc.) to use full dumps (135-149 MB each).

The cDAC stackwalk itself works perfectly on heap dumps — the only blocker was metadata resolution.

Solution

Add a GetReadOnlyMetadataDelegate callback to ContractDescriptorTarget, mirroring the legacy DAC's ICLRMetadataLocator pattern (implemented by SOS via ISymbolService.GetMetadataLocator in the diagnostics repo).

Architecture

EcmaMetadata_1.GetMetadataProvider (ReadOnly case)
  │
  ├─ Try: read PE image from target memory (existing path)
  │
  └─ Catch VirtualReadException:
       └─ Target.TryLocateReadOnlyMetadata(modulePath)
            └─ GetReadOnlyMetadataDelegate (host callback)
                 └─ ClrMdDumpHost: try path on disk → IFileLocator.FindPEImage
                      └─ PEReader.GetMetadata() → return raw bytes

Changes

cDAC core (commit 1):

  • Target.cs — new TryLocateReadOnlyMetadata virtual method (default returns false)
  • ContractDescriptorTarget.cs — new GetReadOnlyMetadataDelegate type stored in DataTargetDelegates, new TryCreate overload
  • EcmaMetadata_1.cs — try/catch VirtualReadException fallback to metadata locator
  • ClrMdDumpHost.cs — implement callback: try module path on disk, then IFileLocator.FindPEImage
  • DumpTestBase.cs — wire callback into target creation

Test changes (commit 2):

  • Remove DumpType => "full" override from 11 test classes (inherit default "heap")
  • Switch 9 debuggee .csproj files from DumpTypes=Full to DumpTypes=Heap
  • 3 test classes remain on full dumps:
    • ExceptionHandlingInfoDumpTests — reads R2R exception clause tables from PE sections
    • EcmaMetadataDumpTests — explicitly tests the PE-memory metadata path
    • AsyncContinuationDumpTests — reads R2R section headers from PE image

Documentation (commit 3):

  • docs/design/datacontracts/EcmaMetadata.md — document the fallback behavior
  • tests/DumpTests/README.md — update debuggee table, add Metadata Locator section

Test results

Total tests: 508
     Passed: 165
     Failed: 0
    Skipped: 343

Dump size comparison

Dump Type StackWalk R2R StackWalk JIT
Full 134.6 MB 148.9 MB
Heap 23.5 MB 34.7 MB
Reduction 5.7x 4.3x

max-charlamb and others added 2 commits March 31, 2026 14:53
Add a GetReadOnlyMetadataDelegate callback to ContractDescriptorTarget that
allows hosts to provide PE metadata from disk when it cannot be read from
target memory (e.g., heap dumps where PE read-only sections are absent).

- Target.cs: Add TryLocateReadOnlyMetadata virtual method (default: false)
- ContractDescriptorTarget.cs: Add delegate type, TryCreate overload, store
  in DataTargetDelegates alongside existing callbacks
- EcmaMetadata_1.cs: Catch VirtualReadException on the ReadOnly path and
  fall back to Target.TryLocateReadOnlyMetadata, mirroring the legacy DAC's
  ICLRMetadataLocator pattern
- ClrMdDumpHost.cs: Implement callback by trying the module's on-disk path
  first, then ClrMD's IFileLocator.FindPEImage for symbol server resolution
- DumpTestBase.cs: Wire callback into ContractDescriptorTarget.TryCreate

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
With the metadata locator callback in place, most dump tests no longer
need full dumps. Switch 11 of 14 test classes and 9 debuggee projects
from full to heap dumps, reducing dump sizes by 4-6x.

Three test classes remain on full dumps because they read R2R PE section
data that is inherently absent from heap dumps (and the legacy DAC also
cannot access from heap dumps):
- ExceptionHandlingInfoDumpTests: R2R exception clause tables
- EcmaMetadataDumpTests: tests PE-memory metadata address path directly
- AsyncContinuationDumpTests: R2R section headers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

@max-charlamb max-charlamb requested a review from hoyosjs March 31, 2026 19:01
- EcmaMetadata.md: Document the VirtualReadException fallback to
  Target.TryLocateReadOnlyMetadata in GetMetadata, and note that
  GetReadOnlyMetadataAddress does not use this fallback.
- DumpTests/README.md: Update debuggee dump type table to reflect
  the switch to heap dumps. Add Metadata Locator Callback section
  explaining the architecture and ClrMdDumpHost implementation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@max-charlamb max-charlamb force-pushed the cdac/heap-dump-metadata-locator branch from 20ee599 to 9da2258 Compare March 31, 2026 19:03
@github-actions
Copy link
Copy Markdown
Contributor

Note

This review was generated by Copilot (Claude Opus 4.6) with multi-model cross-validation (GPT-5.2, Claude Sonnet 4.5).

🤖 Copilot Code Review — PR #126369

Holistic Assessment

Motivation: The PR addresses a real gap — cDAC cannot resolve ECMA-335 metadata from heap dumps because PE read-only sections are absent. This mirrors the legacy DAC's ICLRMetadataLocator pattern and is well-motivated. Switching tests from full to heap dumps (4-6x smaller) is a clear win.

Approach: The dual mechanism (virtual method on Target + delegate callback through ContractDescriptorTarget) is consistent with how other callbacks (read, write, thread context) are wired through the cDAC. The backward-compatible TryCreate overload preserves all existing callers. The fallback in EcmaMetadata_1 catches VirtualReadException and attempts host-provided metadata before returning null — a reasonable degradation strategy.

Summary: ⚠️ Needs Changes. One clear bug (LoaderDumpTests will silently stop running) and one design concern about exception handling scope. The core architecture is sound, but these issues should be addressed before merge.


Detailed Findings

❌ LoaderDumpTests DumpType mismatch — tests will silently be skipped

LoaderDumpTests (line 17) uses DebuggeeName => "MultiModule", and MultiModule.csproj still has <DumpTypes>Full</DumpTypes>. However, the PR removed the DumpType => "full" override from LoaderDumpTests (visible in the diff), causing it to fall back to the base class default of "heap".

This means InitializeDumpTest will look for the dump at …/heap/<r2rMode>/MultiModule/MultiModule.dmp, but the dump only exists under …/full/…. The test will throw SkipTestException and be silently skipped rather than failing loudly. All 8 LoaderDumpTests methods are affected.

The README.md table correctly still shows MultiModule as "Full", confirming this was unintentional.

Fix: Restore protected override string DumpType => "full"; in LoaderDumpTests.cs.

⚠️ Exception handling — VirtualReadException catch may be too narrow

In EcmaMetadata_1.cs (lines 67-84), only VirtualReadException is caught. However, GetReadOnlyMetadataAddress constructs a PEReader over a TargetStream that reads from target memory. If the PE is partially present in a heap dump (some pages captured, others not), PEReader could throw BadImageFormatException when parsing truncated PE headers — this exception would propagate uncaught, bypassing the metadata locator fallback entirely.

The TargetStream.Read method (line 35) calls target.ReadBuffer which throws VirtualReadException on failure, and PEReader propagates this without wrapping — so the common case IS caught. But the partial-data / corrupt-header edge case exists.

Suggestion: Consider widening the catch to include BadImageFormatException, or alternatively catch both:

catch (Exception ex) when (ex is VirtualReadException or BadImageFormatException)

This matches the pattern used in ClrMdDumpHost.TryReadMetadataFromPE (line 128) which catches both.

⚠️ Behavioral change — ReadOnly path now silently returns null

Previously, if ReadBuffer failed on the ReadOnly metadata path, the VirtualReadException would propagate to the caller. Now it's caught and null is returned if the locator also fails. This is arguably the right behavior for heap dump scenarios (graceful degradation), but callers that previously relied on an exception to detect metadata read failures will now get null silently.

This seems intentional given the design goal, but worth noting for reviewers to confirm callers handle null from GetMetadata correctly. (flagged by GPT-5.2)

💡 System.Exception qualification in test code

ClrMdDumpHost.cs line 128 uses System.Exception fully qualified:

catch (System.Exception ex) when (ex is BadImageFormatException or IOException or UnauthorizedAccessException)

The file has using System; so the System. prefix is unnecessary. The rest of the codebase uses unqualified Exception. (flagged by Claude Sonnet 4.5)

💡 Module enumeration caching opportunity

ClrMdDumpHost.TryGetReadOnlyMetadata calls _dataTarget.DataReader.EnumerateModules() on every invocation (lines 83-105). Since EcmaMetadata_1 caches results in its _metadata dictionary, each module is only resolved once, making this acceptable. However, caching the module list in ClrMdDumpHost would avoid redundant enumeration. (flagged by both models)

✅ Backward compatibility — verified correct

The existing 5-parameter TryCreate overload is preserved and correctly delegates to the new 6-parameter version with getReadOnlyMetadata: null. All existing callers (Entrypoints.cs × 2, ContractDescriptorBuilder.cs × 1) continue to use the old signature unchanged.

✅ Virtual method default — verified correct

Target.TryLocateReadOnlyMetadata returns false with an empty array by default, which is the correct no-op behavior. ContractDescriptorTarget overrides this to delegate to the callback if one was provided.

✅ Test coverage — adequate

Switching 11 test classes from full to heap dumps means those tests now exercise the metadata locator fallback path whenever they need type information, stack frames, or other metadata-dependent data. The 3 test classes remaining on full dumps (EcmaMetadataDumpTests, ExceptionHandlingInfoDumpTests, AsyncContinuationDumpTests) correctly stay on full because they test PE-section-dependent features.

GetPortableFileName extraction — good refactoring

The portable filename extraction logic was previously inline in FindContractDescriptorAddress. Extracting it into a reusable GetPortableFileName helper and using it in the new TryGetReadOnlyMetadata is clean and eliminates duplication.

Generated by Code Review for issue #126369 ·

_target.ReadBuffer(address.Address, data);
return MetadataReaderProvider.FromMetadataImage(ImmutableCollectionsMarshal.AsImmutableArray(data)).GetMetadataReader();
}
catch (VirtualReadException)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClrMD does this automatically, as does SOS and as far as Noah knows, VS does this too. Why do we need this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It automatically redirects reads to the known PE metadata? We are currently using ClrMD to load and read the dumps.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It automatically redirects reads to the known PE metadata?

Its not specific to metadata btw, it applies to any data in the module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants