
pgcopy,pgwire: fix CSV decoding of quoted NULL markers and end-of-copy marker #36732

Open
def- wants to merge 1 commit into MaterializeInc:main from def-:pr-copy-from

Conversation

Contributor

def- commented May 26, 2026

The csv crate strips quoting during parsing, so `decode_copy_format_csv` could not distinguish a quoted NULL marker from an unquoted one (silent data corruption: e.g. with default params, a literal `""` decoded to SQL NULL instead of the empty string), and the end-of-copy marker check treated a quoted `"\."` value as the bare `\.` terminator (silent data loss: the row and every subsequent row were dropped).

Switch the decoder to csv-core so we can inspect each field's leading input byte and recover per-field quote state; the NULL-marker and end-of-copy checks both now require the field to be unquoted. The pgwire COPY row scanner had the same quote-blind end-marker check; switch it to match against the raw input bytes for the in-progress record.
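
For readers who want the quote-aware rule in concrete form, here is a minimal, dependency-free sketch. It is not the code in this PR: `Field`, `decode_field`, and `is_end_of_copy` are hypothetical names, the input is assumed to be already split into raw fields (quotes still attached), and `"` is assumed to be the quote character with `""` as its escape. The actual change uses csv-core and inspects each field's leading input byte instead.

```rust
/// Hypothetical result type for one decoded CSV field (illustration only).
#[derive(Debug, PartialEq)]
enum Field {
    Null,
    Value(String),
}

/// Decode one raw field exactly as it appeared in the input, quotes included.
/// Assumption: the caller has already split the record on the delimiter.
fn decode_field(raw: &str, null_marker: &str) -> Field {
    let quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
    if quoted {
        // A quoted field is always a value, even if its content equals the NULL marker.
        let inner = &raw[1..raw.len() - 1];
        Field::Value(inner.replace("\"\"", "\""))
    } else if raw == null_marker {
        // Only an unquoted field may match the NULL marker.
        Field::Null
    } else {
        Field::Value(raw.to_string())
    }
}

/// The end-of-copy marker is the bare `\.` on a line by itself.
/// Checking the raw record means a quoted `"\."` is never mistaken for it.
fn is_end_of_copy(raw_record: &str) -> bool {
    raw_record.trim_end_matches(|c: char| c == '\r' || c == '\n') == "\\."
}
```

With the default empty NULL marker, `decode_field("\"\"", "")` yields an empty string value while `decode_field("", "")` yields NULL. Likewise, `is_end_of_copy` accepts a bare `\.` line and rejects a quoted `"\."` line, since it compares the raw bytes rather than the unquoted content.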

def- marked this pull request as ready for review May 27, 2026 16:09
def- requested a review from a team as a code owner May 27, 2026 16:09
def- requested a review from ggevay May 27, 2026 16:10
def- marked this pull request as draft May 28, 2026 01:29
…y marker

The csv crate strips quoting during parsing, so `decode_copy_format_csv`
could not distinguish a quoted NULL marker from an unquoted one (silent
data corruption: e.g. with default params, a literal `""` decoded to
SQL NULL instead of the empty string), and the end-of-copy marker check
treated a quoted `"\."` value as the bare `\.` terminator (silent data
loss: the row and every subsequent row were dropped).

Switch the decoder to csv-core so we can inspect each field's leading
input byte and recover per-field quote state; the NULL-marker and
end-of-copy checks both now require the field to be unquoted. The pgwire
COPY row scanner had the same quote-blind end-marker check; switch it
to match against the raw input bytes for the in-progress record.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
def- marked this pull request as ready for review May 28, 2026 02:04
