Add CI skill evals #7139

Draft
dmerand wants to merge 1 commit into dlm-cli-ci-skills from dlm-cli-ci-skills-eval

@dmerand dmerand commented Apr 1, 2026

What

Add a small repo-local test harness and README on top of #7138:

  • .agents/skills/README.md
  • scripts/test-cli-ci-skills.sh

This is a draft support PR, mainly to make the harness easy to inspect and access while iterating on the CI skill.

Why

While developing the repo-local CI guidance, I wanted a lightweight way to exercise real pi -p prompts against the repo and confirm:

  • the pre-submit skill is actually discovered
  • narrow docs/config/wiring PRs stay lightweight
  • example PR prompts still produce the expected interaction shape

I do not expect this PR to ship as-is. It exists so the harness and its README are visible and reviewable while the underlying skill work is being iterated.

How

  • add .agents/skills/README.md with quick operator guidance for running the harness
  • add scripts/test-cli-ci-skills.sh to run prompt-based smoke tests against:
    • the current branch
    • example Shopify CLI PRs
  • have the harness report:
    • whether the expected skill file was read
    • whether key repo-inspection signals appeared
    • whether lightweight cases stayed free of heavyweight command execution
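The three reported signals could be checked against a captured prompt transcript along these lines. This is only a sketch, not the contents of scripts/test-cli-ci-skills.sh: the function names, the plain-text transcript file, and the example patterns for "repo inspection" and "heavyweight" commands are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: none of these names come from the actual harness.

# Print PASS/FAIL for one signal and return the given status.
report() {
  local label="$1" status="$2"
  if [ "$status" -eq 0 ]; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
  fi
  return "$status"
}

# check_transcript TRANSCRIPT SKILL_PATH
# TRANSCRIPT is assumed to be a plain-text file of captured prompt output.
check_transcript() {
  local transcript="$1" skill_path="$2" failures=0

  # Signal 1: the expected skill file path appears in the output.
  grep -qF -- "$skill_path" "$transcript"
  report "skill file read" "$?" || failures=$((failures + 1))

  # Signal 2: a repo-inspection command appeared (assumed examples).
  grep -qE 'git (diff|status|log)' "$transcript"
  report "repo inspection seen" "$?" || failures=$((failures + 1))

  # Signal 3: no heavyweight command was executed (assumed examples).
  if grep -qE 'pnpm (test|build)' "$transcript"; then
    report "stayed lightweight" 1 || failures=$((failures + 1))
  else
    report "stayed lightweight" 0
  fi

  # Succeed only if every signal passed.
  [ "$failures" -eq 0 ]
}
```

Calling `check_transcript out.txt <path-to-skill-file>` prints one PASS/FAIL line per signal and exits non-zero if any signal failed, so a lightweight case fails as soon as a heavyweight command shows up in the transcript.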

How to test your changes?

From the repo root:

scripts/test-cli-ci-skills.sh pre-submit-current-branch
scripts/test-cli-ci-skills.sh example-prs
scripts/test-cli-ci-skills.sh all

A passing run means the prompt shape still looks reasonable for the current skill setup. It does not prove the repo change itself is correct.
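For reviewers who want a mental model of the three modes above, dispatch could look roughly like this. The mode names come from the commands listed here; the function names and their placeholder bodies are hypothetical and are not taken from the script.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of mode dispatch; the run_* bodies are placeholders.

run_current_branch() { echo "running: pre-submit-current-branch"; }
run_example_prs()    { echo "running: example-prs"; }

main() {
  case "${1:-}" in
    pre-submit-current-branch) run_current_branch ;;
    example-prs)               run_example_prs ;;
    # "all" runs both suites, stopping at the first failure.
    all)                       run_current_branch && run_example_prs ;;
    *)
      echo "usage: $0 {pre-submit-current-branch|example-prs|all}" >&2
      return 2
      ;;
  esac
}
```

A standalone script built this way would end with `main "$@"`; an unknown or missing mode prints the usage line to stderr and returns 2.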

Measuring impact

How do we know this change was effective? Please choose one:

  • n/a - this doesn't need measurement, e.g. a linting rule or a bug-fix
  • Existing analytics will cater for this addition
  • PR includes analytics changes to measure impact

Checklist

  • I've considered possible cross-platform impacts (Mac, Linux, Windows)
  • I've considered possible documentation changes

dmerand commented Apr 1, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking.

@dmerand dmerand mentioned this pull request Apr 1, 2026
@dmerand dmerand force-pushed the dlm-cli-ci-skills-eval branch from 5a9fcad to 2132bee Compare April 1, 2026 01:44