Summary
BigQueryAgentAnalyticsPlugin does not project the Vertex AI finish_reason field to the attributes column of LLM_RESPONSE events written to BigQuery. This makes it impossible to classify model failure modes (MAX_TOKENS, SAFETY, MALFORMED_FUNCTION_CALL, RECITATION, etc.) via SQL queries against the analytics table — operators must instead parse unstructured Cloud Logging output to recover this signal.
Adding the field would be a small change (~5 lines) with significant observability value for any project using the plugin.
Current behavior (verified empirically 2026-05-08)
SELECT
JSON_VALUE(attributes,'$.finishReason') AS camel,
JSON_VALUE(attributes,'$.finish_reason') AS snake,
JSON_VALUE(attributes,'$.usage_metadata.finish_reason') AS nested,
COUNT(*) AS row_count
FROM `<project>.<dataset>.events`
WHERE event_type = 'LLM_RESPONSE'
AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY camel, snake, nested
ORDER BY row_count DESC;
Result against a 7-day window in production:
| camel | snake | nested | row_count |
| --- | --- | --- | --- |
| NULL | NULL | NULL | 309 |
finish_reason is not present in any path (camelCase, snake_case, or nested under usage_metadata). Source inspection confirms why: in google/adk/plugins/bigquery_agent_analytics_plugin.py, _EVENT_VIEW_DEFS["LLM_RESPONSE"] (around lines 1813–1846 in 1.32.0) extracts only response, usage_*_tokens, cached_content_token_count, context_cache_hit_rate, total_ms, ttft_ms, model_version, usage_metadata, cache_metadata. There are zero references to finish_reason / finishReason anywhere in the 3500+ lines of the plugin file. The EventData dataclass and after_model_callback likewise do not capture it, even though the LlmResponse parameter the callback receives includes it from Vertex.
Use case / motivation
- Failure mode breakdown via SQL: today, distinguishing "this LLM call ended with MAX_TOKENS" from "this LLM call ended with MALFORMED_FUNCTION_CALL" via the analytics table is impossible. The information lives only in unstructured stderr logs, which are not joinable to the structured event stream.
- Cache hit ratio segmented by finish_reason: the existing cached_content_token_count projection enables computing cache hit ratio (we use this heavily). But understanding the quality of the cached responses (did the cached prefix lead to clean STOPs or to MALFORMED retries?) requires the finish_reason dimension.
- Model migration health checks: when migrating between model families (e.g., Gemini 2.5 → 3.x), SRE-grade observability needs to compare MALFORMED rate, SAFETY rate, etc., across revisions side by side. Without finish_reason in BQ, the comparison must be reconstructed from logs — slow, brittle, and not amenable to alert policies.
- Alerting on regression: a Cloud Monitoring alert like "MALFORMED_FUNCTION_CALL rate > 5% sustained over 1h" is straightforward to write against the BQ events table if finish_reason is projected. Without it, the same alert requires a log-based metric pipeline (more moving parts, more cost).
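To make the alerting case concrete, here is the query the alert policy would encode, as a sketch: it assumes the proposal below has landed and the field is projected at `$.finish_reason` (field name and table layout are assumptions, not current behavior):

```sql
-- MALFORMED_FUNCTION_CALL rate over the last hour (sketch; assumes the
-- proposed '$.finish_reason' projection exists on LLM_RESPONSE events).
SELECT
  COUNTIF(JSON_VALUE(attributes, '$.finish_reason') = 'MALFORMED_FUNCTION_CALL')
    / COUNT(*) AS malformed_rate
FROM `<project>.<dataset>.events`
WHERE event_type = 'LLM_RESPONSE'
  AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR);
```

A scheduled query or Cloud Monitoring SQL condition on malformed_rate > 0.05 would then replace the log-based metric pipeline entirely.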
Proposed solution
Add finish_reason (and optionally finish_message) to the LLM_RESPONSE view definition. The field is already present on LlmResponse from Vertex; the plugin only needs to read and project it. Roughly (against 1.32 source structure):
# In _EVENT_VIEW_DEFS["LLM_RESPONSE"]:
"finish_reason": lambda llm_response: (
    llm_response.finish_reason.name
    if llm_response.finish_reason is not None
    else None
),
"finish_message": lambda llm_response: llm_response.finish_message,
Backwards-compatible: existing consumers reading the documented fields are unaffected; new consumers can opt in via JSON_VALUE(attributes,'$.finish_reason').
If a different naming convention (snake_case vs. camelCase) is preferred to match other ADK projections, I'm happy to adjust — the empirical probe above checked both spellings to be safe.
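The null-safety of the proposed extraction is easy to check in isolation. The sketch below uses a stub in place of the real LlmResponse and FinishReason types (the enum values mirror those discussed above but are illustrative only):

```python
from enum import Enum
from types import SimpleNamespace

# Stub standing in for the Vertex finish-reason enum; illustrative only,
# not the real type from the SDK.
class FinishReason(Enum):
    STOP = 1
    MAX_TOKENS = 2
    MALFORMED_FUNCTION_CALL = 3

# The proposed lambda body, written as a plain function so it can be tested.
def extract_finish_reason(llm_response):
    return (
        llm_response.finish_reason.name
        if llm_response.finish_reason is not None
        else None
    )

ok = SimpleNamespace(finish_reason=FinishReason.MAX_TOKENS)
missing = SimpleNamespace(finish_reason=None)

print(extract_finish_reason(ok))       # MAX_TOKENS
print(extract_finish_reason(missing))  # None
```

The None branch matters: streaming chunks and partial responses can lack a finish reason, and the projection should emit SQL NULL rather than raise.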
Alternatives considered
- after_model_callback writing to state_delta: feasible as a downstream workaround, but every consumer of the plugin needs to re-implement it, and it bloats the event stream with one extra row per LLM call.
- Custom plugin subclass overriding LLM_RESPONSE EventData: surgical, but creates maintenance burden when the upstream plugin schema evolves.
- Cloud Logging sink → BigQuery via Logs Router: works but introduces a second pipeline (sink config, log-based filters, parsing) for data that is already flowing through the analytics plugin one level up.
All three alternatives are strictly more code and more complexity than projecting the field in the plugin where the data already lives.
Environment
- google-adk 1.32.0
- Python 3.12
- Vertex AI backend (`vertexai=True`)
- Plugin: BigQueryAgentAnalyticsPlugin (default config, single dataset, single events table)
- Models tested: gemini-2.5-flash
Happy to send a PR if the team agrees this is in scope. Thanks for the great library.