
feat(eval): add input_id passthrough field to preserve caller-supplied correlation ID in results output #2857

Merged
hamza-jeddad merged 2 commits into main from 2853-eval-preserve-input-eval-json-id-as-session-id-in-results-output on May 21, 2026


@hamza-jeddad
Contributor

Plan: Preserve caller-supplied "input_id" from input eval JSON in results output

1. Goal Restatement

Each eval input is one .json file in a directory. When a file contains a top-level "input_id" field, that value must be carried through untouched to the corresponding session entry in the results output (the *.json file and the SQLite .db file). The session's own "id" (a random UUID) is never touched — it is always freshly generated as today. "input_id" is simply a new passthrough field. If the input file has no "input_id", the field is absent from the output — no change to existing behaviour.


2. User-side: Before vs After

| | Before (broken) | After (fixed) |
|---|---|---|
| User writes eval file with "input_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" | ✅ file is accepted | ✅ file is accepted |
| User runs docker agent eval | ✅ runs fine | ✅ runs fine |
| Output session "id" | 91907fe1-cd72-4e88-b1a5-1b439675f7c5 (random UUID) | 91907fe1-cd72-4e88-b1a5-1b439675f7c5 (same random UUID) |
| Output session "input_id" | ❌ missing: the field is dropped entirely | "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee", carried through from the input file |
| User matches output back to their database record | ❌ impossible: input_id is gone | ✅ IDs match, correlation works |
| User writes eval file without "input_id" | No input_id in output | Same: no input_id in output ✅ |
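
To illustrate the "matches output back to their database record" row, here is a minimal sketch of how a caller might index result sessions by input_id. The results layout (a JSON array of session objects), the resultSession type, and the indexByInputID helper are all assumptions made for illustration; the real results file may be shaped differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resultSession is a hypothetical, trimmed-down view of one session entry
// in the results output. Only the two fields discussed in this plan appear.
type resultSession struct {
	ID      string `json:"id"`
	InputID string `json:"input_id,omitempty"`
}

// indexByInputID groups session IDs by their input_id. Sessions without an
// input_id are skipped. A slice is used as the map value because input_id
// carries no uniqueness guarantee.
func indexByInputID(raw []byte) (map[string][]string, error) {
	var sessions []resultSession
	if err := json.Unmarshal(raw, &sessions); err != nil {
		return nil, err
	}
	idx := make(map[string][]string)
	for _, s := range sessions {
		if s.InputID == "" {
			continue
		}
		idx[s.InputID] = append(idx[s.InputID], s.ID)
	}
	return idx, nil
}

func main() {
	// Assumed results layout: a JSON array of sessions.
	raw := []byte(`[
		{"id": "91907fe1-cd72-4e88-b1a5-1b439675f7c5", "input_id": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"},
		{"id": "0c1d2e3f-0000-4000-8000-000000000000"}
	]`)
	idx, err := indexByInputID(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(idx["aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"])
}
```

Keeping a slice per input_id means two input files that share a label are both retained rather than one overwriting the other.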

3. Root Cause

loadEvalSessions deserializes each .json file into session.Session via json.Unmarshal. session.Session has no InputID field, so "input_id" from the file is silently discarded right at load time and never reaches the output.
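
The root cause can be reproduced in isolation. The sketch below uses two hypothetical stand-in structs (sessionBefore and sessionAfter, not the real session.Session) to show that json.Unmarshal ignores JSON keys that have no matching struct field, so the value is gone by the time the struct is re-encoded.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sessionBefore is a hypothetical stand-in for the struct before the fix:
// it has no field tagged "input_id".
type sessionBefore struct {
	ID    string `json:"id"`
	Title string `json:"title"`
}

// sessionAfter is the same stand-in with the passthrough field added.
type sessionAfter struct {
	ID      string `json:"id"`
	Title   string `json:"title"`
	InputID string `json:"input_id,omitempty"`
}

// roundTripBefore decodes into the struct without InputID and re-encodes it.
func roundTripBefore(input string) (string, error) {
	var s sessionBefore
	if err := json.Unmarshal([]byte(input), &s); err != nil {
		return "", err
	}
	out, err := json.Marshal(s)
	return string(out), err
}

// roundTripAfter does the same with the struct that has InputID.
func roundTripAfter(input string) (string, error) {
	var s sessionAfter
	if err := json.Unmarshal([]byte(input), &s); err != nil {
		return "", err
	}
	out, err := json.Marshal(s)
	return string(out), err
}

func main() {
	const input = `{"title":"demo","input_id":"aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}`

	before, _ := roundTripBefore(input)
	after, _ := roundTripAfter(input)

	fmt.Println(before) // input_id is silently dropped
	fmt.Println(after)  // input_id survives the round trip
}
```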


4. Affected Files

| File | What changes |
|---|---|
| pkg/session/session.go | Add InputID string field (`json:"input_id,omitempty"`) to the Session struct. |
| pkg/evaluation/eval.go | After SessionFromEvents, copy evalSess.Session.InputID to result.Session.InputID. |
| pkg/evaluation/save_test.go | Add a new test TestInputIDPassthrough that verifies input_id is carried through and id is still a fresh UUID. |

No changes needed to SessionFromEvents, session.New, or any other file.


5. Step-by-Step Approach

Step 1 — Add InputID field to session.Session in session.go

In the Session struct, add after the ID field:

```go
// ID is the unique identifier for the session
ID string `json:"id"`

// InputID is an optional caller-supplied correlation ID read from the eval
// input file's "input_id" field. It is carried through to the output as-is
// and never used internally. The session's own "id" is always a fresh UUID.
InputID string `json:"input_id,omitempty"`
```

omitempty ensures the field is absent from output JSON when empty — no change for existing eval files that don't include it.
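
The effect of omitempty can be checked with a small sketch. The session type and encode helper here are illustrative stand-ins, not the project's real types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// session is a hypothetical two-field stand-in for session.Session.
type session struct {
	ID      string `json:"id"`
	InputID string `json:"input_id,omitempty"`
}

// encode marshals a session and returns the JSON as a string.
// It returns an empty string if marshalling fails.
func encode(s session) string {
	out, err := json.Marshal(s)
	if err != nil {
		return ""
	}
	return string(out)
}

func main() {
	// No InputID set: the key is absent from the output entirely.
	fmt.Println(encode(session{ID: "abc"}))

	// InputID set: the key appears with the caller-supplied value.
	fmt.Println(encode(session{ID: "abc", InputID: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}))
}
```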

Verify: go build ./... compiles clean.


Step 2 — Copy InputID through in eval.go

In runSingleEval, after the existing line:

```go
result.Session = SessionFromEvents(events, title, userMessages)
```

add:

```go
result.Session.InputID = evalSess.Session.InputID
```

evalSess.Session.InputID is already populated by json.Unmarshal in loadEvalSessions (from Step 1). When the input file had no "input_id", it is an empty string and omitempty keeps it out of the output — correct behaviour.

Verify: go build ./pkg/evaluation/... compiles clean.


Step 3 — Add test TestInputIDPassthrough to save_test.go

Add a test that:

  1. Creates an InputSession whose Session.InputID is set to a known value.
  2. Runs SessionFromEvents and copies InputID as in Step 2.
  3. Asserts result.Session.InputID equals the known value.
  4. Asserts result.Session.ID is a non-empty UUID different from InputID (i.e. the random UUID was not disturbed).

```go
func TestInputIDPassthrough(t *testing.T) {
	t.Parallel()

	const knownInputID = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

	sess := SessionFromEvents(nil, "title", []string{"q"})
	sess.InputID = knownInputID

	assert.Equal(t, knownInputID, sess.InputID)
	assert.NotEmpty(t, sess.ID)
	assert.NotEqual(t, knownInputID, sess.ID)
}
```

Verify: go test ./pkg/evaluation/... -run TestInputIDPassthrough passes.


Step 4 — Run the full test suite

```shell
go test ./...
```

All pre-existing tests must continue to pass with no regressions.


6. Risks and Unknowns

| Risk | Likelihood | Notes |
|---|---|---|
| Two input eval files share the same "input_id" | Low | Fine: input_id is purely a passthrough label, not used as a key anywhere. No uniqueness constraint applies. |
| Adding InputID to session.Session affects unrelated session serialization | Low | omitempty means the field is invisible in all existing sessions that don't set it. |
| SQLite store serializes session.Session as a JSON blob, so input_id will be included automatically | Confirmed | No extra work needed; the store round-trips the full struct. |

7. Open Questions

None.

@hamza-jeddad hamza-jeddad requested a review from a team as a code owner May 21, 2026 12:35
@hamza-jeddad hamza-jeddad force-pushed the 2853-eval-preserve-input-eval-json-id-as-session-id-in-results-output branch from d426295 to 3487a36 Compare May 21, 2026 12:36

@docker-agent docker-agent left a comment


Assessment: 🟡 NEEDS ATTENTION

assert.Equal(t, "Auto-generated title", sess.Title)
}

func TestInputIDPassthrough(t *testing.T) {

[MEDIUM] TestInputIDPassthrough is tautological — it does not exercise the runSingleEval copy path

The test manually assigns sess.InputID = knownInputID and then immediately asserts sess.InputID == knownInputID. That assertion is trivially true regardless of what runSingleEval does — removing the copy line result.Session.InputID = evalSess.Session.InputID in eval.go would not cause this test to fail.

The actual behaviour being fixed — that InputID flows from the deserialized evalSess.Session through runSingleEval into result.Session — is left uncovered. A regression in eval.go would be silent.

Suggested fix: test the end-to-end path. At minimum, construct an InputSession with Session.InputID pre-populated, call runSingleEval (or a lightweight helper that mirrors its copy logic), and assert the output result.Session.InputID equals the input value. Alternatively, add a round-trip integration test that writes a .json eval file containing "input_id", runs the evaluator, and checks that the field appears in the result.
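
One way to read the "lightweight helper" suggestion is sketched below. Every name here (Session, InputSession, Result, sessionFromEventsStub, buildResult) is a simplified, hypothetical stand-in rather than the project's real API. The point is only that once the copy lives in a function the check calls, deleting the copy line makes the check fail.

```go
package main

import "fmt"

// Session, InputSession and Result are hypothetical, simplified stand-ins
// for the types referenced in the PR description.
type Session struct {
	ID      string
	InputID string
}

type InputSession struct{ Session Session }

type Result struct{ Session Session }

// sessionFromEventsStub stands in for SessionFromEvents: it returns a
// session with its own ID and no InputID. The fixed ID is for illustration.
func sessionFromEventsStub() Session {
	return Session{ID: "91907fe1-cd72-4e88-b1a5-1b439675f7c5"}
}

// buildResult is a hypothetical helper that owns the copy step, so a test
// can exercise it directly instead of assigning InputID by hand.
func buildResult(in InputSession) Result {
	res := Result{Session: sessionFromEventsStub()}
	res.Session.InputID = in.Session.InputID // the line under test
	return res
}

func main() {
	in := InputSession{Session: Session{InputID: "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"}}
	res := buildResult(in)

	// Both checks depend on buildResult, not on a manual assignment.
	fmt.Println("input_id carried through:", res.Session.InputID == in.Session.InputID)
	fmt.Println("session id untouched:", res.Session.ID != res.Session.InputID)
}
```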

@hamza-jeddad hamza-jeddad force-pushed the 2853-eval-preserve-input-eval-json-id-as-session-id-in-results-output branch from 3487a36 to 075149f Compare May 21, 2026 12:38
@hamza-jeddad hamza-jeddad merged commit 3350fc7 into main May 21, 2026
9 checks passed
@hamza-jeddad hamza-jeddad deleted the 2853-eval-preserve-input-eval-json-id-as-session-id-in-results-output branch May 21, 2026 13:58
@aheritier aheritier added the area/testing (Test infrastructure, CI/CD, test runners, evaluation) and kind/feat (PR adds a new feature; maps to feat: commit prefix) labels on May 22, 2026


Development

Successfully merging this pull request may close these issues.

eval: preserve input eval JSON "id" as session ID in results output

4 participants