
[spark] Support JSON format in COPY INTO #7993

Open
JunRuiLee wants to merge 1 commit into apache:master from JunRuiLee:copy-into-json-support

JunRuiLee commented May 27, 2026

Summary

  • Add JSON format support for COPY INTO import and export, alongside existing CSV support
  • JSON uses column-name matching (not positional), with options for MULTI_LINE, NULL_IF, EMPTY_FIELD_AS_NULL, and COMPRESSION
  • CSV-only options (e.g. FIELD_DELIMITER, SKIP_HEADER) are rejected for JSON format with clear error messages
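The format-specific validation described above can be sketched as follows. This is a minimal, hypothetical model and not the PR's `CopyOptions.scala`: the `FileFormatType` cases, the `validate` helper, the error text, and the assumption that `FIELD_DELIMITER` and `SKIP_HEADER` form the complete CSV-only set are all illustrative.

```scala
import java.util.Locale

object CopyOptionValidation {
  // Hypothetical stand-in for the PR's FileFormatType; the real definition may differ.
  sealed trait FileFormatType
  case object Csv extends FileFormatType
  case object Json extends FileFormatType

  // Assumed CSV-only options, taken from the examples in the PR summary.
  val CsvOnlyOptions: Set[String] = Set("FIELD_DELIMITER", "SKIP_HEADER")

  /** Upper-cases option keys, then rejects CSV-only keys when the format is JSON. */
  def validate(
      format: FileFormatType,
      options: Map[String, String]): Either[String, Map[String, String]] = {
    // Locale.ROOT avoids locale-dependent case mapping (e.g. the Turkish dotless i).
    val normalized = options.map { case (k, v) => k.toUpperCase(Locale.ROOT) -> v }
    format match {
      case Json =>
        val invalid = normalized.keySet.intersect(CsvOnlyOptions).toList.sorted
        if (invalid.isEmpty) Right(normalized)
        else Left(s"Option(s) ${invalid.mkString(", ")} not supported for JSON format")
      case Csv => Right(normalized)
    }
  }

  def main(args: Array[String]): Unit = {
    println(validate(Json, Map("multi_line" -> "true")))   // accepted
    println(validate(Json, Map("FIELD_DELIMITER" -> "|"))) // rejected for JSON
    println(validate(Csv, Map("FIELD_DELIMITER" -> "|")))  // accepted for CSV
  }
}
```

Returning an `Either` keeps the error message next to the offending keys, so a caller can surface it directly to the user instead of failing later inside the Spark reader.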

Motivation

JSON is a common format for semi-structured data in data lake scenarios. Some users have requested JSON support for COPY INTO to complement the existing CSV capability.

Part of #8005

Changes

  • Grammar: Add JSON lexer token to PaimonSqlExtensions.g4
  • CopyOptions.scala: Add FileFormatType.JSON, format-specific option validation and Spark reader/writer option mapping
  • CopyIntoTableExec.scala: Read JSON with a column-name schema (instead of CSV's positional _c0/_c1 columns) and dispatch to .json() or .csv() by format type
  • CopyIntoLocationExec.scala: Dispatch export by format type
  • Documentation: Update sql-write.md with JSON syntax, options, and column mapping semantics
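The difference between JSON's column-name matching and CSV's positional `_c0`/`_c1` mapping can be illustrated with plain Scala collections. This sketch does not use Spark and is not the code in `CopyIntoTableExec.scala`; the helper names are hypothetical. It assumes that missing JSON fields become NULL (`None`) and extra fields are ignored, which mirrors how Spark's JSON reader behaves when given an explicit schema but is not confirmed by the PR.

```scala
object ColumnMappingSketch {
  /** JSON: each target column takes the record field with the same name.
    * Assumption: absent fields yield None (NULL); fields not in the table are dropped. */
  def mapJsonByName(targetColumns: Seq[String], record: Map[String, String]): Seq[Option[String]] =
    targetColumns.map(record.get)

  /** Names Spark assigns to the columns of a headerless CSV file: _c0, _c1, ... */
  def csvPositionalNames(fieldCount: Int): Seq[String] =
    (0 until fieldCount).map(i => s"_c$i")

  /** CSV: target column i takes field _c{i}; assumption: short rows yield None. */
  def mapCsvByPosition(targetColumns: Seq[String], fields: IndexedSeq[String]): Seq[Option[String]] =
    targetColumns.indices.map(fields.lift)

  def main(args: Array[String]): Unit = {
    val target = Seq("id", "name", "city")
    // Field order differs from the table, "age" is extra, and "city" is missing.
    val json = Map("name" -> "alice", "id" -> "1", "age" -> "30")
    println(mapJsonByName(target, json))
    println(csvPositionalNames(2))
    println(mapCsvByPosition(target, Vector("1", "alice")))
  }
}
```

On the sample record, `mapJsonByName` matches `id` and `name` despite the different field order, drops `age`, and leaves `city` as `None`, whereas the CSV fields are bound purely by position.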

Tests

Added 16 JSON test cases covering: basic import, column-name matching, multi-line input, explicit column list, NULL_IF, export, option validation, round-trip (export then import), handling of extra/missing fields, aborting on malformed data, aborting on bad casts, GZIP compression, and date/timestamp column casting.

Add JSON format support for COPY INTO import and export, alongside
existing CSV support. JSON uses column-name matching (not positional),
with options for MULTI_LINE, DATE_FORMAT, TIMESTAMP_FORMAT, NULL_IF,
EMPTY_FIELD_AS_NULL, and COMPRESSION. CSV-only options are rejected
for JSON format. Includes 9 new tests and documentation updates.
