feat: dx improvements for optimization package #139

Open
andrewklatzke wants to merge 1 commit into aklatze/AIC-2178/verify-runs-endpoint from aklatzke/AIC-2263/sdk-dx-improvements

andrewklatzke commented Apr 16, 2026

Requirements

  • I have added test coverage for new or changed functionality
  • I have followed the repository's pull request submission guidelines
  • I have validated my changes against all supported platform versions

Describe the solution you've provided

Improves the developer experience when using the SDK and fixes a bug where the global model was being ignored for judges.

Describe alternatives you've considered

This is a QoL change for folks consuming this SDK method. No alternatives were really considered.

Additional context

TL;DR: while implementing this against multiple frameworks, I found myself repeatedly specifying the same handler for both agents and judges. Given that, handle_judge_call is now optional and defaults to handle_agent_call when it is not specified. With this change, the optimization config for an LD-built config is reduced to just this:

OptimizationFromConfigOptions(
    project_key="default",
    handle_agent_call=handle_agent_call,
)
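
For illustration, the fallback could look roughly like the sketch below. This is not the SDK's actual implementation: OptionsSketch and resolve_judge_handler are hypothetical names, and the handler type is deliberately loose because the real callback signature is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical, simplified handler type; the real SDK callbacks take more arguments.
Handler = Callable[..., Any]


@dataclass
class OptionsSketch:
    """Stand-in for OptimizationFromConfigOptions (illustrative only)."""

    project_key: str
    handle_agent_call: Handler
    handle_judge_call: Optional[Handler] = None  # optional, as described in this PR


def resolve_judge_handler(options: OptionsSketch) -> Handler:
    """Return the judge handler, falling back to the agent handler if unset."""
    if options.handle_judge_call is not None:
        return options.handle_judge_call
    return options.handle_agent_call


def handle_agent_call(*args: Any, **kwargs: Any) -> str:
    # Placeholder body; a real handler would call the agent framework here.
    return "agent-handler"


options = OptionsSketch(project_key="default", handle_agent_call=handle_agent_call)
judge_handler = resolve_judge_handler(options)  # same object as handle_agent_call
```

Passing an explicit handle_judge_call would still take precedence, so existing integrations that supply both handlers keep their current routing.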

Additionally, this adds an is_evaluation flag as the final argument to handle_agent_call, so that a single shared handler can still tell agent calls from judge calls when necessary.
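
A shared handler might branch on the flag along these lines. This is a minimal sketch: only the trailing is_evaluation argument comes from this PR, while prompt and model are hypothetical placeholders for whatever the SDK actually passes.

```python
def handle_agent_call(prompt: str, model: str, is_evaluation: bool) -> str:
    """Shared handler for both agent (generation) and judge (evaluation) calls.

    `prompt` and `model` are hypothetical stand-ins for the SDK's real arguments;
    `is_evaluation` is the final argument, as described in this PR.
    """
    role = "judge" if is_evaluation else "agent"
    # A real handler would invoke the framework or model client here.
    return f"[{role}:{model}] {prompt}"


generation = handle_agent_call("Summarize the ticket.", "model-a", False)
evaluation = handle_agent_call("Score the summary.", "model-a", True)
```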


Note

Medium Risk
Public callback signatures change (extra is_evaluation arg) and judge-model selection behavior is corrected, which can break existing integrations or alter evaluation results if consumers relied on per-judge model overrides.

Overview
Improves the optimization SDK callback ergonomics by making handle_judge_call optional (defaults to handle_agent_call) and adding an is_evaluation boolean argument to both agent/judge call handlers so a shared implementation can differentiate evaluation vs generation.

Fixes judge execution to always use the globally configured judge_model (while still forwarding judge-flag model parameters like temperature/tools) and routes judge calls through a single internal _judge_call fallback. Tests are updated to reflect the new callback signature and defaulting behavior.
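
As a rough sketch of the corrected judge-model selection, assuming a judge flag's model settings are a dict with a name and a parameters mapping (a hypothetical shape; the SDK's real structures may differ), the global judge_model replaces the per-judge name while the parameters are forwarded unchanged. The helper name build_judge_model_config is also hypothetical.

```python
from typing import Any, Dict


def build_judge_model_config(
    judge_flag_model: Dict[str, Any], judge_model: str
) -> Dict[str, Any]:
    """Combine a judge flag's model parameters with the global judge model.

    Assumed input shape: {"name": ..., "parameters": {...}}.
    """
    # Copy so the flag's own settings are not mutated; keeps temperature, tools, etc.
    parameters = dict(judge_flag_model.get("parameters", {}))
    # The per-judge model name is ignored in favor of the global judge model.
    return {"name": judge_model, "parameters": parameters}


flag_model = {"name": "per-judge-model", "parameters": {"temperature": 0.2, "tools": []}}
resolved = build_judge_model_config(flag_model, judge_model="global-judge-model")
```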

Reviewed by Cursor Bugbot for commit 7074cfa.

@andrewklatzke andrewklatzke requested a review from a team as a code owner April 16, 2026 22:43