feat(appkit): default analytics format to ARROW_STREAM (BREAKING) #387
Open
jamesbroadhead wants to merge 1 commit into
useAnalyticsQuery now returns a TypedArrowTable by default instead of a
row array. Callers that need the JSON-row shape must pass
{ format: 'JSON_ARRAY' } explicitly. The default switch applies to the
hook, the chart-data hook, the SQL warehouse connector defaults, and
the analytics plugin request handler.
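The defaulting rule above can be modelled in a few lines. This is a sketch under stated assumptions: `AnalyticsFormat`, `QueryOptions`, `resolveFormat` and `ResultShape` are hypothetical names, not appkit exports, and `{ numRows: number }` only stands in for `TypedArrowTable`.

```typescript
// Hypothetical names throughout: this models the defaulting rule, not appkit's code.
type AnalyticsFormat = "ARROW_STREAM" | "JSON_ARRAY";

interface QueryOptions {
  format?: AnalyticsFormat;
}

// Before this change an omitted `format` meant JSON_ARRAY; now it means ARROW_STREAM.
const DEFAULT_FORMAT: AnalyticsFormat = "ARROW_STREAM";

function resolveFormat(options: QueryOptions = {}): AnalyticsFormat {
  return options.format ?? DEFAULT_FORMAT;
}

// Shape a caller receives per format (the Arrow side is a stand-in for TypedArrowTable).
type ResultShape<F extends AnalyticsFormat> = F extends "JSON_ARRAY"
  ? Array<Record<string, unknown>>
  : { numRows: number };

const implicit = resolveFormat();                       // ARROW_STREAM
const pinned = resolveFormat({ format: "JSON_ARRAY" }); // JSON_ARRAY
const rows: ResultShape<"JSON_ARRAY"> = [{ value: 1 }]; // still indexable as rows[0]
console.log(implicit, pinned, rows.length);
```

The point of the sketch is that nothing at the call site changes for callers who omit `format`; only the resolved default does, which is why this is flagged as breaking.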
Why:
- ARROW_STREAM preserves column types (number stays number, bigint stays
bigint) end-to-end. JSON_ARRAY stringifies everything on the wire.
- Arrow IPC is 3-5x more compact than JSON for numeric data and parses
faster on the client.
- This PR stacks on the disposition-fallback PR (#329), which makes both
defaults work across all warehouse variants. Even so, ARROW_STREAM is the
format that inline-arrow-only warehouses natively accept for INLINE, so
defaulting to it avoids the server-side decode the JSON_ARRAY fallback
has to do against those warehouses.
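The trade-off in the last bullet (and the classic-warehouse caveat in the PR description) can be sketched as a cost model. This is a hypothetical model of the call sequence as described in this PR, not #329's actual code: `costOf`, `Cost`, and the capability sets are illustrative assumptions, with each warehouse described by the `FORMAT+DISPOSITION` pairs it accepts.

```typescript
// Hypothetical model of the fallback sequence described for #329; not its real code.
type Format = "ARROW_STREAM" | "JSON_ARRAY";

interface Cost {
  statementCalls: number;     // statement-execution requests issued
  arrowResultFetches: number; // follow-up /arrow-result fetches
  serverSideDecode: boolean;  // server converts inline Arrow back into JSON rows
}

function costOf(format: Format, accepts: ReadonlySet<string>): Cost {
  // The first attempt is always the requested format with INLINE disposition.
  if (accepts.has(`${format}+INLINE`)) {
    return { statementCalls: 1, arrowResultFetches: 0, serverSideDecode: false };
  }
  if (format === "ARROW_STREAM" && accepts.has("ARROW_STREAM+EXTERNAL_LINKS")) {
    // INLINE rejected: retry with EXTERNAL_LINKS, then download the Arrow payload.
    return { statementCalls: 2, arrowResultFetches: 1, serverSideDecode: false };
  }
  if (format === "JSON_ARRAY" && accepts.has("ARROW_STREAM+INLINE")) {
    // JSON_ARRAY+INLINE rejected: retry as inline Arrow and decode to rows on the server.
    return { statementCalls: 2, arrowResultFetches: 0, serverSideDecode: true };
  }
  throw new Error(`no supported combination for ${format}`);
}

// Capability sets are illustrative assumptions, not a spec of any warehouse type.
const classic = new Set(["JSON_ARRAY+INLINE", "ARROW_STREAM+EXTERNAL_LINKS"]);
const inlineArrowOnly = new Set(["ARROW_STREAM+INLINE"]);

console.log(costOf("ARROW_STREAM", inlineArrowOnly)); // one call, no decode
console.log(costOf("JSON_ARRAY", inlineArrowOnly));   // two calls plus server-side decode
console.log(costOf("ARROW_STREAM", classic));         // two calls plus /arrow-result fetch
console.log(costOf("JSON_ARRAY", classic));           // one call
```

Under these assumptions neither default is free everywhere: ARROW_STREAM costs an extra round trip on classic warehouses, while JSON_ARRAY costs an extra round trip plus a server-side decode on inline-arrow-only ones.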
Migration:
- For tabular code that walks data.length / data[i], either:
(a) opt back into JSON_ARRAY:
useAnalyticsQuery('q', params, { format: 'JSON_ARRAY' });
(b) switch to the Arrow API: data.numRows / data.getChild('col')?.get(i)
/ data.toArray().
- DataTable.tsx, dev-playground analytics + dashboard routes, and the
SQL-helpers route are all pinned to JSON_ARRAY in this PR to preserve
their existing rendering.
- The template AnalyticsPage is updated to the Arrow API to demonstrate
the new default.
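Migration path (b) can be sketched as a before/after pair. `ColumnLike`, `ArrowTableLike`, `fakeTable`, `totalFromRows` and `totalFromTable` are hypothetical names: the interface mirrors only the calls named above (`numRows`, `getChild(name)` returning a column or `null`, `get(i)`), and `fakeTable` is an in-memory stand-in so the snippet runs without the Arrow library. It also shows the type-fidelity point, since the row version has to coerce strings.

```typescript
// Hypothetical structural subset of the Arrow table calls used in this PR.
interface ColumnLike {
  get(index: number): unknown;
}
interface ArrowTableLike {
  numRows: number;
  getChild(name: string): ColumnLike | null;
}

// Before: JSON_ARRAY rows, where numeric cells may arrive as strings.
function totalFromRows(rows: Array<Record<string, unknown>>, col: string): number {
  let total = 0;
  for (let i = 0; i < rows.length; i++) {
    total += Number(rows[i][col]); // explicit coercion needed
  }
  return total;
}

// After: Arrow API, where a numeric column keeps its type.
// Note: a BIGINT column would yield bigint values, which cannot be added to number.
function totalFromTable(table: ArrowTableLike, col: string): number {
  const column = table.getChild(col);
  if (column === null) throw new Error(`missing column ${col}`);
  let total = 0;
  for (let i = 0; i < table.numRows; i++) {
    total += column.get(i) as number;
  }
  return total;
}

// Test double only: builds an ArrowTableLike from plain column arrays.
function fakeTable(columns: Record<string, unknown[]>): ArrowTableLike {
  const names = Object.keys(columns);
  const numRows = names.length > 0 ? columns[names[0]].length : 0;
  return {
    numRows,
    getChild: (name) => (name in columns ? { get: (i: number) => columns[name][i] } : null),
  };
}

const rows = [{ amount: "1.5" }, { amount: "2.5" }];
const table = fakeTable({ amount: [1.5, 2.5] });
console.log(totalFromRows(rows, "amount"));   // 4
console.log(totalFromTable(table, "amount")); // 4
```

In application code the loop body would typically read `data.getChild('col')?.get(i)` directly, as in the commit message; the explicit `null` check here just makes a misspelled column name fail loudly instead of summing `undefined`.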
BREAKING CHANGE: Default format for useAnalyticsQuery and the analytics
plugin request handler is now ARROW_STREAM instead of JSON_ARRAY.
Depends on #329 (the disposition-fallback PR); merge that first.
Signed-off-by: James Broadhead <jamesbroadhead@gmail.com>
## Summary

Switches the default analytics format from `JSON_ARRAY` to `ARROW_STREAM` across the hook, the chart-data hook, the SQL warehouse connector defaults, and the analytics plugin request handler.

This is a breaking change for code that consumes `useAnalyticsQuery("foo")` as a row array (`data.length`, `data[0].col`, `data.map(...)`). Two migration paths are documented below.

## Depends on #329
Merge #329 first. This PR is stacked on its branch and relies on the bidirectional disposition fallback added there; without that fallback, the ARROW default doesn't work on warehouses that refuse `ARROW_STREAM` + `INLINE`.

## Why ARROW_STREAM is a better default
**Type fidelity.** ARROW preserves column types end-to-end: `INT` stays `number`, `BIGINT` stays `bigint`, `TIMESTAMP` stays `Date`. The warehouse's `JSON_ARRAY` serialization stringifies everything (`"1"`, `"2.5"`, ISO strings), so callers have to coerce on their side and any nuance is lost.

**Wire compactness.** Arrow IPC is ~3–5× smaller than JSON for typical numeric data, and the binary parse is materially faster than `JSON.parse` for large tables. Charts and dashboards over real result sets are the primary `useAnalyticsQuery` use case, and they benefit measurably.

**Warehouse alignment.** The warehouses that only support one INLINE format (the case #329 was originally written for) want `ARROW_STREAM` + `INLINE`. With the ARROW default they get a single clean statement call. With the JSON_ARRAY default they trigger #329's server-side retry-and-decode path, which is correct but wasteful.

**Caveat: round trips on classic warehouses.** Classic warehouses (and some serverless ones) reject `ARROW_STREAM` + `INLINE` and require `ARROW_STREAM` + `EXTERNAL_LINKS`. Under the ARROW default that means two statement calls (the initial INLINE request is rejected, then EXTERNAL_LINKS succeeds) plus the `/arrow-result` fetch, versus one call with the JSON_ARRAY default. The type-fidelity and wire-compactness wins dominate for the typical analytics workload, but if your app runs small-result-set lookups against a classic warehouse, you may want to opt into `JSON_ARRAY` explicitly.

## Migration
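For code that walks rows as an array (`data.length`, `data[i].col`, `data.map`), either opt back into the row shape with `useAnalyticsQuery('q', params, { format: 'JSON_ARRAY' })`, or switch to the Arrow API (`data.numRows`, `data.getChild('col')?.get(i)`, `data.toArray()`).

### In-repo migrations done in this PR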
- `template/client/src/pages/analytics/AnalyticsPage.tsx`: migrated to the Arrow API (this is the example new users get from `databricks apps init`, so it should demonstrate the new default).
- `apps/dev-playground/client/src/features/smart-dashboard/hooks/use-dashboard-data.ts`: pinned to `JSON_ARRAY` (7 call sites, all feeding chart components that consume the row-array shape).
- `apps/dev-playground/client/src/routes/analytics.route.tsx`: pinned to `JSON_ARRAY` (3 call sites).
- `apps/dev-playground/client/src/routes/sql-helpers.route.tsx`: pinned to `JSON_ARRAY`.
- `packages/appkit-ui/src/react/table/table-wrapper.tsx` (DataTable): pinned to `JSON_ARRAY`. The table renders rows as a JS array; an Arrow-native version is a separate optimization.

### Out-of-repo migrations needed (follow-up PRs)
These external repos all have call sites or examples that assume the JSON_ARRAY shape. None of them are blockers for landing this PR, but they should be updated when this lands or shortly after — flagged here so they don't get lost:
**Code in external repos:**
- `databricks/cli`: `experimental/aitools/templates/appkit/template/{{.project_name}}/client/src/App.tsx` uses `data.length` / `data[0].value`.
- `databricks/app-templates`: `appkit-all-in-one/client/src/pages/analytics/AnalyticsPage.tsx` and `appkit-analytics/client/src/pages/analytics/AnalyticsPage.tsx` use the same pattern.
- `databricks/devhub`: `examples/content-moderator/template/client/src/pages/AnalyticsPage.tsx` uses `useAnalyticsQuery`; its consumers need an audit.

**Docs in external repos (need text refresh, not code):**
- `databricks/devhub`: `static/raw-docs/appkit/v0/plugins/analytics.md`, `static/raw-docs/appkit/v0/development/type-generation.md`, `static/raw-docs/appkit/v0/api/appkit-ui/data/DataTable.mdx`, and the AI-skill references under `.agents/skills/databricks-apps/references/appkit/*.md`.
- `databricks/cli`: `experimental/aitools/templates/appkit/template/{{.project_name}}/docs/*.md` (AI agent docs that show the JSON shape).

## Tests
- `analytics.test.ts`: flipped the "default format" assertion from `JSON_ARRAY` to `ARROW_STREAM` (the test verifies the default; its meaning is unchanged).
- `analytics.integration.test.ts`: the cache test now explicitly requests `JSON_ARRAY` because the ARROW path bypasses the cache by design (inline-stash ids drain on first read, so a cache hit would replay a dead id).
- `use-chart-data.test.ts`: flipped two "auto-selects default" assertions.

Full suite: 2,674 tests, all green.
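Why the ARROW path bypasses the cache can be sketched with a drain-on-read store. This is a hypothetical model of the "inline-stash ids drain on first read" behaviour noted for `analytics.integration.test.ts`: `InlineStash`, `put`, `take` and `cachedResponse` are illustrative names, not appkit's implementation.

```typescript
// Hypothetical sketch of a stash whose ids are single-use.
class InlineStash {
  private entries = new Map<string, Uint8Array>();
  private nextId = 0;

  put(payload: Uint8Array): string {
    const id = `stash-${this.nextId++}`;
    this.entries.set(id, payload);
    return id;
  }

  // Reading an id removes it, so a second read of the same id finds nothing.
  take(id: string): Uint8Array | undefined {
    const payload = this.entries.get(id);
    this.entries.delete(id);
    return payload;
  }
}

const stash = new InlineStash();
const id = stash.put(new Uint8Array([1, 2, 3]));

// A response cache keyed by query would hand this same id to a later caller.
const cachedResponse = { stashId: id };

const first = stash.take(cachedResponse.stashId);  // the 3-byte payload
const replay = stash.take(cachedResponse.stashId); // undefined: the id is dead
```

Caching the JSON rows themselves has no such problem, which is consistent with the cache test now pinning `JSON_ARRAY`.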
This pull request was AI-assisted by Isaac.