Skip to content

feat(sea): wire rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params#408

Closed
msrathore-db wants to merge 2 commits into
msrathore/sea-tls-connectivityfrom
msrathore/sea-statement-options-params
Closed

feat(sea): wire rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params#408
msrathore-db wants to merge 2 commits into
msrathore/sea-tls-connectivityfrom
msrathore/sea-statement-options-params

Conversation

@msrathore-db
Copy link
Copy Markdown
Contributor

@msrathore-db msrathore-db commented May 31, 2026

What

Three statement-option / param-type additions where the kernel + napi binding were already complete but the node SEA layer never exposed them. (Audit "Group A"; the fourth Group-A item, queryTags, was a dropped pre-existing option and is fixed upstream in #403.)

Addition Before After
rowLimit no SEA way to cap rows server-side new ExecuteStatementOptions.rowLimit → napi rowLimit (SEA row_limit)
statementConf no per-statement conf overlay new ExecuteStatementOptions.statementConf → napi statementConf (Thrift confOverlay equivalent)
TIMESTAMP_NTZ / LTZ params every Date coerced to TIMESTAMP DBSQLParameterType.TIMESTAMP_NTZ / TIMESTAMP_LTZ selectable

Why these were "free"

The kernel StatementSpec (row_limit, statement_conf, NTZ/LTZ TypedValue arms) and napi ExecuteOptions already exposed everything; the only missing piece was the node ExecuteStatementOptions surface + threading. No kernel/napi change.

Notes

  • statementConf generalises the existing query_tags serialisation (wired upstream in feat(sea): Thrift-parity — params, intervals, getInfo, SQL-error class, input validation + queryTags #403): a caller-supplied statementConf and queryTags merge into one conf map.
  • rowLimit is SEA-only (the Thrift backend has no execute-time server cap); maxRows remains the client-side per-fetch chunk size on both backends.
  • TIMESTAMP_NTZ/LTZ: toSparkParameter already honours an explicit type and SeaPositionalParams passes the SQL type verbatim to the kernel codec — so only the enum values were needed.

Testing

240 SEA unit tests pass (rowLimit/statementConf forwarding, statementConf+queryTags merge, NTZ/LTZ param mapping). Verified live against pecotesting earlier (rowLimit:7 caps a range(0,100) result to 7 rows; NTZ param round-trips).

Stacking

Stacked on #407#406#403#404.

This pull request and its description were written by Isaac.

Downstream fixes / reviewer note

  • 2026-06-01 review-fix cascade: final tip carries the [SEA-NodeJS] (7/8) Operation lifecycle — cancel / close / finished + INTERVAL parity + napi relocation #384 operation fixes: declared flatbuffers, one-pass IPC duration rewrite, cancel/close local-state rollback on native RPC failure, closed fetch → OperationStateError(Closed), and Arrow duration rewriter cleanup.
  • 2026-06-01 review-fix cascade: final tip also rejects unsupported per-statement useCloudFetch rather than silently ignoring it, and uses native sessionId/statementId for session/operation ids where available.

@github-actions
Copy link
Copy Markdown

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

1 similar comment
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

Three statement-option / param-type additions where the kernel + napi were
already ready but the node SEA layer didn't expose them:

- rowLimit: new `ExecuteStatementOptions.rowLimit` → napi `rowLimit`
  (SEA `row_limit`). SEA-only server-side cap; Thrift has no execute-time cap.
- statementConf: new `ExecuteStatementOptions.statementConf` → napi
  `statementConf` (SEA `statement_conf`), the Thrift `confOverlay` equivalent.
  Generalises the existing query_tags serialisation so a caller-supplied
  statementConf and queryTags merge into one conf map (queryTags already
  forwarded upstream).
- TIMESTAMP_NTZ / TIMESTAMP_LTZ: added to `DBSQLParameterType` so callers can
  bind timezone-explicit timestamp params. `toSparkParameter` already honours
  an explicit type and `SeaPositionalParams` passes the SQL type verbatim to
  the kernel codec (which has the NTZ/LTZ arms).

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
The cascade commit that surfaced `statementId` on `SeaNativeStatement` and
`sessionId` on `SeaNativeConnection` (matching the kernel napi binding's
new getters) didn't update the blocking test fakes, breaking compilation.
Add the readonly fields to FakeNativeStatement / FakeNativeConnection
(execution.test.ts) and FakeNativeStatement / FakeMetadataConnection
(metadata.test.ts). The async fakes already carried statementId.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db msrathore-db force-pushed the msrathore/sea-statement-options-params branch from 68a65ce to edd07c8 Compare June 1, 2026 05:18
msrathore-db added a commit that referenced this pull request Jun 3, 2026
Consolidates the last net-new bit of the superseded #408: two SEA-path
DBSQLParameterType variants for binding timezone-explicit timestamps. The type
name flows through the existing param codec (toSparkParameter → sqlType), which
the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the
bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
@msrathore-db
Copy link
Copy Markdown
Contributor Author

Superseded by the consolidated SEA stack (#412#413#414) and closing to minimise open PRs. Coverage verified before closing: rowLimit + statementConf + TIMESTAMP_NTZ/LTZ params → consolidated into #413 (buildExecuteOptions rowLimit/statementConf + DBSQLParameterType.TIMESTAMP_NTZ/LTZ, both warehouse-validated). No unique code is lost.

msrathore-db added a commit that referenced this pull request Jun 3, 2026
Consolidates the last net-new bit of the superseded #408: two SEA-path
DBSQLParameterType variants for binding timezone-explicit timestamps. The type
name flows through the existing param codec (toSparkParameter → sqlType), which
the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the
bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
msrathore-db added a commit that referenced this pull request Jun 4, 2026
)

* [SEA-NodeJS] SEA connection & statement options

Wire the SEA connection-level and per-statement option surfaces onto the
merged-kernel napi binding (thin forwarding — the kernel owns the behaviour):

Connection options (SeaAuth.buildSeaConnectionOptions):
- `maxConnections` → kernel pool size, validated as a positive integer within
  the napi u32 range.
- TLS: `checkServerCertificate` (secure-by-default — omit to keep the kernel's
  verify-on default; `false` opts into insecure) and `customCaCert` (PEM string
  or Buffer; strings are PEM-sanity-checked and normalised to a Buffer before
  the FFI boundary), via the new `buildSeaTlsOptions`.
- `intervalsAsString: true` is always set so SEA interval/duration columns
  render as strings — a byte-compatible drop-in for the Thrift backend.
  `complexTypesAsJson` is intentionally left at the kernel default (native
  Arrow), which already decodes identically to Thrift via the shared converter.

Statement options (SeaSessionBackend.executeStatement, via buildExecuteOptions):
- `queryTimeout` → `queryTimeoutSecs`; `rowLimit` → `rowLimit` (SEA-only cap).
- `queryTags` serialised JS-side (reusing Thrift's `serializeQueryTags`) into
  the reserved `query_tags` conf key, merged with any explicit `statementConf` —
  the napi `queryTags` field can't carry null-valued tags, and the kernel
  rejects setting both. `queryTags` / `queryTimeout` are no longer rejected.
- Still rejected (genuinely unsupported on SEA): `useCloudFetch`,
  `useLZ4Compression`, `stagingAllowedLocalPath`.

`rowLimit` / `statementConf` added to the public `ExecuteStatementOptions`;
SEA-only knobs (`maxConnections` / `checkServerCertificate` / `customCaCert`)
added to the internal `InternalConnectionOptions`.

Validated against a live warehouse: secure-by-default connect, maxConnections,
checkServerCertificate, rowLimit (caps rows), queryTimeout, queryTags,
statementConf, and non-PEM customCaCert rejection.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* [SEA-NodeJS] SEA async execute (submit / poll / awaitResult)

Switch the SEA query path from the blocking `executeStatement` to the kernel's
async `submitStatement`, matching the Thrift backend's always-async
(`runAsync: true`) model. `submitStatement` returns immediately with a pending
`AsyncStatement` (kernel `wait_timeout=0s`) while the query runs server-side.

SeaOperationBackend becomes dual-mode (exactly one of):
- `asyncStatement` (query path): `waitUntilReady()` polls `status()` to a
  terminal state on a 100ms cadence (matching Thrift), firing the progress
  callback each tick. Polling `status()` rather than blocking on `awaitResult()`
  keeps `cancel()` responsive — a blocking awaitResult would hold the kernel
  statement mutex for the whole query and queue cancel behind it. On Succeeded
  it materialises the result handle (first fetch is free); on Failed it drives
  `awaitResult()` to surface the kernel's typed SQL-error envelope; on a
  server-side Cancelled/Closed/Unknown it throws a clear error. `status()`
  reports the real Pending/Running/Succeeded state.
- `statement` (metadata path): the kernel `list*`/`get*` statement is already
  terminal, so `waitUntilReady()` stays the one-shot completion tick.

The fetch pipeline is shared: `awaitResult()`'s `AsyncResultHandle` and the
metadata `Statement` expose the same `fetchNextBatch()` / `schema()` surface, so
`SeaResultsProvider` → `ArrowResultConverter` → `ResultSlicer` consume either
interchangeably via a single memoised fetch handle. cancel()/close() route
through a `lifecycleHandle` abstraction over whichever handle backs the op.

Re-exports the kernel `AsyncStatement` / `AsyncResultHandle` types from
`SeaNativeLoader`.

Validated against a live warehouse: async fetchAll correctness, multi-row drain
(5000 rows), long-running aggregate (count over 20M), kernel SQL-error
surfacing, and cancellation mid-execution. PR1's params/metadata/getInfo all
still pass through the new async path.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* [SEA-NodeJS] prettier-format SEA connection/options + async files (CI code-style)

The CI "Check code style" step runs `prettier . --check` (whole repo);
these files were committed without prettier formatting. Formatting-only.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* [SEA-NodeJS] Address code-review findings on async/TLS/options (#413)

Code-review #413 (81/100). Validated each against the code + a live warehouse:

- F1 (HIGH): the async poll loop threw plain HiveDriverError for server-driven
  Cancelled/Closed/Unknown. The DBSQLOperation facade only mirrors its
  cancelled/closed flags when `err instanceof OperationStateError` (and
  OperationStateError extends HiveDriverError, not the reverse), so a
  server-side cancel/close/admin-kill left the facade desynced. Now throws
  OperationStateError(Canceled/Closed/Unknown) — matching the Thrift backend.
  The Failed branch still surfaces the kernel SQL-error envelope via awaitResult.

- F2 (MED): the server-Cancelled test asserted only instanceOf(HiveDriverError),
  which passes for both the correct and incorrect type — it couldn't catch F1.
  Now asserts instanceOf(OperationStateError) + errorCode, plus a new Closed test.

- F3 (MED): queryTimeout was forwarded to submitStatement but the kernel ignores
  queryTimeoutSecs on submit (always wait_timeout=0s), so the documented public
  option was a silent no-op, and the poll loop had no client-side deadline (a
  stalled Running statement polled forever). Now enforced client-side: the poll
  loop tracks a deadline, best-effort cancels the statement on expiry, and
  throws OperationStateError(Timeout) — matching Thrift's server TIMEDOUT
  outcome. Stopped forwarding the ignored queryTimeoutSecs to the napi options.
  Validated live: a 2s timeout interrupts a slow cross-join with TIMEOUT.

- F4 (LOW): customCaCert PEM string check now requires the END marker too (a
  truncated/headerless cert no longer passes), consistent with the Buffer path.

- F5 (LOW): SeaAuth reads the SEA-only fields (checkServerCertificate /
  customCaCert / maxConnections) through `InternalConnectionOptions` instead of
  ad-hoc inline casts, so a typo'd key fails to compile.

- F6 (LOW): corrected the poll-loop comment — the prior text justified polling
  by an incorrect "blocking awaitResult holds the mutex and queues cancel"
  claim; cancel() is documented lock-free. The real rationale (real-time
  status to the progress callback + cancel observed between ticks) is now stated.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* [SEA-NodeJS] add TIMESTAMP_NTZ / TIMESTAMP_LTZ bound-param types

Consolidates the last net-new bit of the superseded #408: two SEA-path
DBSQLParameterType variants for binding timezone-explicit timestamps. The type
name flows through the existing param codec (toSparkParameter → sqlType), which
the kernel accepts — validated live (SELECT ? with TIMESTAMP_NTZ/LTZ returns the
bound values). On the Thrift backend they degrade to a plain TIMESTAMP bind.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

* [SEA-NodeJS] Address #413 review: TIMESTAMP_LTZ wire type + Int64 queryTimeout coercion

Code-review #413 (gopalldb). Two P1s:

- TIMESTAMP_LTZ was sent verbatim on the wire, but Spark has no distinct
  TIMESTAMP_LTZ type (TIMESTAMP already carries LTZ semantics) — so a Thrift
  caller got an opaque server bind error, and the enum comment falsely claimed
  NTZ/LTZ "degrade to a plain TIMESTAMP bind" (there was no such logic).
  `toSparkParameter` now maps TIMESTAMP_LTZ → `TIMESTAMP` (valid on both Thrift
  and kernel); TIMESTAMP_NTZ stays native (a real Spark type). Comment corrected.
  Added DBSQLParameter tests for both wire types (the Thrift behaviour the
  review flagged as untested) and updated the kernel positional-params test.

- queryTimeout (`number | bigint | Int64`) was coerced with `Number(...)`, which
  yields NaN for an Int64 (node-int64 has no valueOf) → the client-side deadline
  was silently disabled for Int64 inputs. Now uses
  `numberToInt64(...).toNumber()`, matching the Thrift backend. Added a
  regression test that an `Int64(1)` queryTimeout actually fires the deadline
  (OperationStateError(Timeout)) rather than polling forever.

(P1 "queryTimeout silently dropped on submit" and the unbounded-poll note were
already resolved earlier by the client-side deadline fix; doc comment updated to
match. P2 polarity/Date-NTZ items noted for the public-surface follow-up.)

Validated live: NTZ binds natively and LTZ binds as TIMESTAMP on the kernel path.

Co-authored-by: Isaac
Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>

---------

Signed-off-by: Madhavendra Rathore <madhavendra.rathore@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant