[SPARK-57072][PYTHON][DOCS] Add missing 4.2 methods to PySpark API reference#56116
zhengruifeng wants to merge 3 commits into
dongjoon-hyun left a comment:
+1, LGTM. Thank you, @zhengruifeng . Are these all missing instances? How do we verify it?
How was this patch tested?
Docs-only change. New entries inserted alphabetically within each autosummary block (DataFrame.zipWithIndex is appended after the existing trailing DataFrame.pandas_api since it is alphabetically last).
I asked AI to generate the candidates and then checked them manually.
Add three methods added in Spark 4.2 that were missing from the Python
API reference:
- DataFrame.zipWithIndex (SPARK-55229/SPARK-55231)
- DataSourceStreamReader.getDefaultReadLimit (SPARK-55304)
- DataSourceStreamReader.reportLatestOffset (SPARK-55304)
The methods themselves were shipped with .. versionadded:: 4.2.0 and are
exported from their respective public modules; only the autosummary
entries in reference/pyspark.sql/{dataframe,datasource}.rst were absent.
…ference

Add the changes() reader method (SPARK-55950, .. versionadded:: 4.2.0) to the Python API reference for both the batch and streaming sides:
- DataFrameReader.changes -> reference/pyspark.sql/io.rst
- DataStreamReader.changes -> reference/pyspark.ss/io.rst
DataStreamReader.name (SPARK-55121, .. versionadded:: 4.2.0) is public on pyspark.sql.streaming.DataStreamReader but was missing from reference/pyspark.ss/io.rst. Insert alphabetically between load and option.
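The alphabetical placement described above can be sketched with the standard-library `bisect` module. This is an illustration only: `insert_entry` is a hypothetical helper, the sample list in the usage note is not the real contents of `io.rst`, and the sketch assumes the existing entries are already sorted under Python's default (case-sensitive) string ordering.

```python
import bisect


def insert_entry(entries, new_entry):
    """Return a copy of a sorted entry list with new_entry placed in order.

    Assumption: `entries` is already sorted with Python's default
    string comparison. Existing entries are never duplicated.
    """
    result = list(entries)
    if new_entry not in result:
        bisect.insort(result, new_entry)
    return result
```

For example, calling `insert_entry` with an illustrative list of `DataStreamReader.load` and `DataStreamReader.option` and the new entry `DataStreamReader.name` places the new entry between the two, matching the position described in the commit message.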
c95af5b to 5a1dedf
…erence

### What changes were proposed in this pull request?

Add public PySpark APIs that were added in Spark 4.2 but missing from the rendered Python API reference. This PR is documentation-only.

`python/docs/source/reference/pyspark.sql/dataframe.rst`:
- `DataFrame.zipWithIndex`

`python/docs/source/reference/pyspark.sql/datasource.rst`:
- `DataSourceStreamReader.getDefaultReadLimit`
- `DataSourceStreamReader.reportLatestOffset`

`python/docs/source/reference/pyspark.sql/io.rst`:
- `DataFrameReader.changes`

`python/docs/source/reference/pyspark.ss/io.rst`:
- `DataStreamReader.changes`
- `DataStreamReader.name`

### Why are the changes needed?

All of the above are public, marked `.. versionadded:: 4.2.0`, and reachable through their respective public modules, but the autosummary entries were never added, so they do not appear in the rendered API reference.

Original JIRAs:
- `DataFrame.zipWithIndex`: SPARK-55229 / SPARK-55231
- `DataSourceStreamReader.getDefaultReadLimit` / `reportLatestOffset`: SPARK-55304
- `DataFrameReader.changes` / `DataStreamReader.changes`: SPARK-55950
- `DataStreamReader.name`: SPARK-55121

### Does this PR introduce _any_ user-facing change?

Documentation-only change; the methods themselves are unchanged.

### How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (`DataFrame.zipWithIndex` is appended after the existing trailing `DataFrame.pandas_api` since it is alphabetically last).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)

Closes #56116 from zhengruifeng/spark-doc-methods-dev2.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit 64a8b51)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
Thanks all, merged to master/4.x/4.2.
### What changes were proposed in this pull request?

Add public PySpark APIs that were added in Spark 4.2 but missing from the rendered Python API reference. This PR is documentation-only.

`python/docs/source/reference/pyspark.sql/dataframe.rst`:
- `DataFrame.zipWithIndex`

`python/docs/source/reference/pyspark.sql/datasource.rst`:
- `DataSourceStreamReader.getDefaultReadLimit`
- `DataSourceStreamReader.reportLatestOffset`

`python/docs/source/reference/pyspark.sql/io.rst`:
- `DataFrameReader.changes`

`python/docs/source/reference/pyspark.ss/io.rst`:
- `DataStreamReader.changes`
- `DataStreamReader.name`

### Why are the changes needed?

All of the above are public, marked `.. versionadded:: 4.2.0`, and reachable through their respective public modules, but the autosummary entries were never added, so they do not appear in the rendered API reference.

Original JIRAs:
- `DataFrame.zipWithIndex`: SPARK-55229 / SPARK-55231
- `DataSourceStreamReader.getDefaultReadLimit` / `reportLatestOffset`: SPARK-55304
- `DataFrameReader.changes` / `DataStreamReader.changes`: SPARK-55950
- `DataStreamReader.name`: SPARK-55121

### Does this PR introduce _any_ user-facing change?

Documentation-only change; the methods themselves are unchanged.

### How was this patch tested?

Docs-only change. New entries inserted alphabetically within each autosummary block (`DataFrame.zipWithIndex` is appended after the existing trailing `DataFrame.pandas_api` since it is alphabetically last).

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (model: claude-opus-4-7)