[SPARK-57072][PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference by zhengruifeng · Pull Request #56116 · apache/spark

zhengruifeng · 2026-05-26T09:30:23Z

What changes were proposed in this pull request?

Add public PySpark APIs that were added in Spark 4.2 but missing from the rendered Python API reference. This PR is documentation-only.

python/docs/source/reference/pyspark.sql/dataframe.rst:

DataFrame.zipWithIndex

python/docs/source/reference/pyspark.sql/datasource.rst:

DataSourceStreamReader.getDefaultReadLimit
DataSourceStreamReader.reportLatestOffset

python/docs/source/reference/pyspark.sql/io.rst:

DataFrameReader.changes

python/docs/source/reference/pyspark.ss/io.rst:

DataStreamReader.changes
DataStreamReader.name

Why are the changes needed?

All of the above are public, marked .. versionadded:: 4.2.0, and reachable through their respective public modules, but the autosummary entries were never added so they do not appear in the rendered API reference.

Original JIRAs:

DataFrame.zipWithIndex — SPARK-55229 / SPARK-55231
DataSourceStreamReader.getDefaultReadLimit / reportLatestOffset — SPARK-55304
DataFrameReader.changes / DataStreamReader.changes — SPARK-55950
DataStreamReader.name — SPARK-55121

Does this PR introduce any user-facing change?

Documentation-only change; the methods themselves are unchanged.

How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (DataFrame.zipWithIndex is appended after the existing trailing DataFrame.pandas_api since it is alphabetically last).
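A minimal sketch of the alphabetical-insertion step, assuming the entries of a block are already sorted by plain string comparison. Real blocks are not always strictly sorted (the trailing `DataFrame.pandas_api` above is one such case), so this is illustrative only. The helper name `insert_alphabetically` and the sample entries are hypothetical, not taken from the actual `.rst` files.

```python
import bisect


def insert_alphabetically(entries, new_entry):
    """Return a copy of `entries` with `new_entry` placed in sorted position.

    Assumption: `entries` is already sorted by plain (case-sensitive)
    string comparison. An entry that is already present is not duplicated.
    """
    result = list(entries)
    if new_entry in result:
        return result
    bisect.insort(result, new_entry)
    return result


# Hypothetical excerpt of an autosummary block, not the real file contents.
block = [
    "DataStreamReader.json",
    "DataStreamReader.load",
    "DataStreamReader.option",
    "DataStreamReader.options",
]
updated = insert_alphabetically(block, "DataStreamReader.name")
for entry in updated:
    print(entry)
```

With this sample input, the new entry lands between `load` and `option`.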

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

dongjoon-hyun

+1, LGTM. Thank you, @zhengruifeng. Are these all of the missing instances? How do we verify it?

zhengruifeng · 2026-05-27T03:30:37Z

I am asking AI to generate the candidates and then checking them manually.
I will send new PRs if I find more issues.
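One way part of this check could be automated is sketched below. This is not the process used in this PR. It compares the public members of a class whose docstrings carry a given `.. versionadded::` marker against the entries listed under `.. autosummary::` directives in an `.rst` file. The helper names (`documented_entries`, `members_added_in`, `missing_entries`) and the stand-in class `FakeReader` are hypothetical. A real run would pass the actual PySpark classes and the text of the reference files, and would still need manual review.

```python
import inspect
import re


def documented_entries(rst_text):
    """Collect the indented entries listed under `.. autosummary::` directives."""
    entries = set()
    in_block = False
    for line in rst_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(".. autosummary::"):
            in_block = True
            continue
        if not in_block or not stripped:
            continue  # outside a directive, or a blank line inside one
        if not line.startswith((" ", "\t")):
            in_block = False  # a dedented line ends the directive body
            continue
        if stripped.startswith(":"):
            continue  # directive option such as `:toctree: api/`
        entries.add(stripped)
    return entries


def members_added_in(cls, version):
    """List `Class.member` names whose docstring has `.. versionadded:: <version>`."""
    marker = re.compile(r"\.\.\s+versionadded::\s*" + re.escape(version))
    names = []
    for name, member in inspect.getmembers(cls):
        if name.startswith("_"):
            continue  # skip private and dunder members
        if marker.search(inspect.getdoc(member) or ""):
            names.append(f"{cls.__name__}.{name}")
    return names


def missing_entries(cls, version, rst_text):
    """Members added in `version` that have no autosummary entry in `rst_text`."""
    documented = documented_entries(rst_text)
    return [n for n in members_added_in(cls, version) if n not in documented]


class FakeReader:
    """Stand-in class for illustration; not a PySpark class."""

    def load(self):
        """Load data.

        .. versionadded:: 2.0.0
        """

    def name(self, source_name):
        """Name the source.

        .. versionadded:: 4.2.0
        """

    def changes(self, table):
        """Read changes.

        .. versionadded:: 4.2.0
        """


RST = """\
.. autosummary::
    :toctree: api/

    FakeReader.changes
    FakeReader.load
"""

print(missing_entries(FakeReader, "4.2.0", RST))  # → ['FakeReader.name']
```

The sketch only catches members that carry the marker in their own docstring, so it produces candidates rather than a definitive list.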

Add three methods added in Spark 4.2 that were missing from the Python API reference:

- DataFrame.zipWithIndex (SPARK-55229/SPARK-55231)
- DataSourceStreamReader.getDefaultReadLimit (SPARK-55304)
- DataSourceStreamReader.reportLatestOffset (SPARK-55304)

The methods themselves were shipped with .. versionadded:: 4.2.0 and are exported from their respective public modules; only the autosummary entries in reference/pyspark.sql/{dataframe,datasource}.rst were absent.

…ference

Add the changes() reader method (SPARK-55950, .. versionadded:: 4.2.0) to the Python API reference for both the batch and streaming sides:

- DataFrameReader.changes -> reference/pyspark.sql/io.rst
- DataStreamReader.changes -> reference/pyspark.ss/io.rst

DataStreamReader.name (SPARK-55121, .. versionadded:: 4.2.0) is public on pyspark.sql.streaming.DataStreamReader but was missing from reference/pyspark.ss/io.rst. Insert alphabetically between load and option.

…erence

### What changes were proposed in this pull request?

Add public PySpark APIs that were added in Spark 4.2 but missing from the rendered Python API reference. This PR is documentation-only.

`python/docs/source/reference/pyspark.sql/dataframe.rst`:

- `DataFrame.zipWithIndex`

`python/docs/source/reference/pyspark.sql/datasource.rst`:

- `DataSourceStreamReader.getDefaultReadLimit`
- `DataSourceStreamReader.reportLatestOffset`

`python/docs/source/reference/pyspark.sql/io.rst`:

- `DataFrameReader.changes`

`python/docs/source/reference/pyspark.ss/io.rst`:

- `DataStreamReader.changes`
- `DataStreamReader.name`

### Why are the changes needed?

All of the above are public, marked `.. versionadded:: 4.2.0`, and reachable through their respective public modules, but the autosummary entries were never added so they do not appear in the rendered API reference.

Original JIRAs:

- `DataFrame.zipWithIndex`: SPARK-55229 / SPARK-55231
- `DataSourceStreamReader.getDefaultReadLimit` / `reportLatestOffset`: SPARK-55304
- `DataFrameReader.changes` / `DataStreamReader.changes`: SPARK-55950
- `DataStreamReader.name`: SPARK-55121

### Does this PR introduce _any_ user-facing change?

Documentation-only change; the methods themselves are unchanged.

### How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (`DataFrame.zipWithIndex` is appended after the existing trailing `DataFrame.pandas_api` since it is alphabetically last).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

Closes #56116 from zhengruifeng/spark-doc-methods-dev2.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit 64a8b51)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>

zhengruifeng · 2026-05-27T05:34:48Z

thanks all, merged to master/4.x/4.2

zhengruifeng changed the title ~~[PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference~~ [PYTHON][DOCS] Add missing 4.2 entries to PySpark API reference May 26, 2026

zhengruifeng changed the title ~~[PYTHON][DOCS] Add missing 4.2 entries to PySpark API reference~~ [SPARK-57072[PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference May 26, 2026

zhengruifeng changed the title ~~[SPARK-57072[PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference~~ [SPARK-57072][PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference May 26, 2026

zhengruifeng requested review from HyukjinKwon, dongjoon-hyun and huaxingao May 26, 2026 09:40

zhengruifeng marked this pull request as ready for review May 26, 2026 10:52

dongjoon-hyun approved these changes May 26, 2026

huaxingao approved these changes May 27, 2026

HyukjinKwon approved these changes May 27, 2026

zhengruifeng added 3 commits May 27, 2026 03:32

[PYTHON][DOCS] Add DataStreamReader.name to PySpark API reference

5a1dedf

zhengruifeng force-pushed the spark-doc-methods-dev2 branch from c95af5b to 5a1dedf Compare May 27, 2026 03:33

zhengruifeng closed this in 64a8b51 May 27, 2026

zhengruifeng deleted the spark-doc-methods-dev2 branch May 27, 2026 05:34

zhengruifeng mentioned this pull request May 27, 2026

[SPARK-57116][SQL][PYTHON][DOC] Fix versionadded/@since for kll_merge_agg_* (4.1.0 -> 4.1.2) #56135

Closed

zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:spark-doc-methods-dev2
