[SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs by zhengruifeng · Pull Request #56110 · apache/spark

zhengruifeng · 2026-05-26T07:17:03Z

What changes were proposed in this pull request?

This PR extends the SBT precompile-sharing pattern (parent: SPARK-56830; prior sub-tasks: SPARK-56768 pyspark, SPARK-56831 sparkr, SPARK-56943 JVM build) to the two remaining SBT-compiling jobs in .github/workflows/build_and_test.yml that still run their own full Spark compile:

docker-integration-tests
k8s-integration-tests

Concretely:

The existing precompile job's if: gate is extended to also fire when docker-integration-tests == 'true' or k8s-integration-tests == 'true' in the precondition output, so the artifact is available whenever either job needs it.
The precompile SBT invocation adds -Pkubernetes-integration-tests, so the integration-tests submodule's target/ ends up in the shared artifact and the k8s job doesn't have to recompile it.
docker-integration-tests:
- needs: precondition -> needs: [precondition, precompile]
- if: extended with (!cancelled()) && so the job still runs if precompile is cancelled.
- Adds "Download precompiled artifact" + "Extract precompiled artifact" steps between Java setup and Run tests, with graceful fallback (continue-on-error: true).
- Run tests exports SKIP_SCALA_BUILD=true when extraction succeeded; dev/run-tests.py already honors this flag and skips build_apache_spark + build_spark_assembly_sbt.
k8s-integration-tests:
- Same needs: and if: change.
- Adds the same Download/Extract steps after Java setup.
- The actual test runs via a direct build/sbt ... "kubernetes-integration-tests/test" call rather than dev/run-tests.py, so no SKIP_SCALA_BUILD is set. SBT sees the extracted target/ and skips compilation of the pre-built modules (Spark Core, SQL, kubernetes, integration-tests, ...); only the small SparkR Scala bindings still compile (the precompile doesn't include -Psparkr because that profile activates core/buildRPackage, which shells out to R, and the precompile runner doesn't have R installed).

Optional: graceful fallback if precompile fails

Same pattern as the prior sub-tasks:

precompile keeps continue-on-error: true.
Both consumers' "Download precompiled artifact" step is gated on needs.precompile.result == 'success' and has continue-on-error: true.
"Extract precompiled artifact" is gated on the download succeeding and has continue-on-error: true.
For docker, SKIP_SCALA_BUILD=true is exported only when steps.extract-precompiled.outcome == 'success'; otherwise dev/run-tests.py runs the original local SBT build.
For k8s, if extraction fails, SBT compiles from scratch as before.

The worst case degrades to the pre-PR behavior; it is never a workflow failure.
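
The gating chain above can be sketched as a small shell function. This is only a model of the decision logic, not the workflow's actual YAML: the function name `resolve_build_mode` and its three positional arguments are hypothetical stand-ins for `needs.precompile.result`, the download step's outcome, and `steps.extract-precompiled.outcome`.

```sh
#!/bin/sh
# Hypothetical model of the fallback chain; not taken from build_and_test.yml.
# $1: result of the precompile job (stand-in for needs.precompile.result)
# $2: outcome of the "Download precompiled artifact" step
# $3: outcome of the "Extract precompiled artifact" step
resolve_build_mode() {
  precompile_result=$1
  download_outcome=$2
  extract_outcome=$3

  # The download is attempted only when precompile succeeded.
  if [ "$precompile_result" != "success" ]; then
    echo "local-build"
    return 0
  fi
  # The extract is attempted only when the download succeeded.
  if [ "$download_outcome" != "success" ]; then
    echo "local-build"
    return 0
  fi
  # The fast path is taken only when extraction succeeded.
  if [ "$extract_outcome" = "success" ]; then
    echo "reuse-artifact"
  else
    echo "local-build"
  fi
}

resolve_build_mode success success success   # prints reuse-artifact
resolve_build_mode failure skipped skipped   # prints local-build
```

Every path that does not end in a successful extraction resolves to `local-build`, which matches the claim that a failure anywhere in the chain falls back to the original behavior.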

Profile coverage

The precompile job runs:

./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
  -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
  -Pdocker-integration-tests -Pkubernetes-integration-tests -Pvolcano \
  Test/package streaming-kinesis-asl-assembly/assembly connect/assembly assembly/package

docker-integration-tests: profile is in the precompile invocation; the module's target/ is pre-built, so dev/run-tests --modules docker-integration-tests only runs the test phase.
k8s-integration-tests: -Pkubernetes and -Pkubernetes-integration-tests are both in the precompile, so the integration-tests submodule is pre-built. The job's direct SBT call adds -Psparkr, which triggers compile of the small SparkR Scala bindings on top of the reused target/. Net work in this job drops from "compile all of Spark + integration tests + sparkr" to "compile only the sparkr module".
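
The coverage argument can be checked mechanically by comparing profile lists. The sketch below is illustrative only: `uncovered_profiles` is a hypothetical helper, the precompile list is copied from the invocation above, and the k8s list uses the profiles named in this PR's follow-up commit message (the elided arguments of that call are not modelled).

```sh
#!/bin/sh
# Profiles in the precompile invocation quoted above.
PRECOMPILE_PROFILES="hadoop-3 yarn spark-ganglia-lgpl hadoop-cloud hive kubernetes jvm-profiler kinesis-asl hive-thriftserver docker-integration-tests kubernetes-integration-tests volcano"

# Hypothetical helper: prints each profile in "$@" that is absent from
# PRECOMPILE_PROFILES, i.e. one the consumer job still has to compile itself.
uncovered_profiles() {
  for profile in "$@"; do
    case " $PRECOMPILE_PROFILES " in
      *" $profile "*) ;;        # covered by the shared artifact
      *) echo "$profile" ;;     # not covered
    esac
  done
}

# Profiles named for the k8s job's SBT call in the PR's commit message.
uncovered_profiles hadoop-3 sparkr kubernetes volcano kubernetes-integration-tests
# prints sparkr
```

Only `sparkr` is reported, consistent with the statement that the SparkR Scala bindings are the one piece still compiled in the k8s job.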

Why are the changes needed?

Today every scheduled / dispatched run of build_and_test.yml that requires docker-integration-tests or k8s-integration-tests re-runs the same SBT compile that precompile already produced for pyspark / sparkr / build. Wiring these two consumers to the existing artifact removes that duplicate work for free (precompile is already running).

Does this PR introduce any user-facing change?

No. CI infrastructure change only.

How was this patch tested?

The change is exercised by the CI run of this PR itself. The Download/Extract steps log artifact size; the Run tests step prints Reusing precompiled artifact, skipping local SBT build. for the docker job when the fast path is taken. If the precompile job is forced to fail (or its artifact is missing), both consumers fall back to the original local SBT build.
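
One way to confirm which path a docker job took is to search its log for the fast-path message. The snippet below is a minimal sketch under assumptions: `took_fast_path` is a hypothetical helper, and the log file here is a throwaway stand-in for a job log obtained by some other means.

```sh
#!/bin/sh
# Message the docker job's Run tests step prints on the fast path
# (quoted from the PR description).
FAST_PATH_MSG="Reusing precompiled artifact, skipping local SBT build."

# Hypothetical helper: returns 0 if the given log file contains the
# fast-path message, non-zero otherwise.
took_fast_path() {
  grep -F -q -- "$FAST_PATH_MSG" "$1"
}

# Usage with a throwaway log; a real check would read an actual job log.
log_file=$(mktemp)
printf '%s\n' "Run tests" "$FAST_PATH_MSG" > "$log_file"
if took_fast_path "$log_file"; then
  echo "fast path"
else
  echo "local SBT build"
fi
rm -f "$log_file"
```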

Measured CI timings before vs after are posted as a comment on this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

zhengruifeng · 2026-05-26T10:34:46Z

CI performance: before vs after

Comparing per-job wall time on real CI runs (n=2 BEFORE, n=2 AFTER):

| Job | Before avg (n=2) | After (initial, n=1) | After (with `-Pkubernetes-integration-tests` in precompile, n=1) | Savings vs before |
| --- | --- | --- | --- | --- |
| Precompile Spark | 16m34s | 16m13s | 16m49s | ~0 (within noise) |
| Run Docker integration tests | 90m48s | 74m12s | 74m28s | ~16m (~18%) |
| Run Spark on Kubernetes Integration test | 66m56s | 65m48s | 75m24s* | within noise |

Samples:

BEFORE-1: zhengruifeng/spark run 26072669641 (2026-05-19, on SPARK-56943's PR branch -- precompile already produced an artifact, but docker/k8s didn't consume it).
BEFORE-2: zhengruifeng/spark run 25551778074 (2026-05-08, earlier push on the same PR).
AFTER-1: zhengruifeng/spark run 26438104273 (2026-05-26, initial PR commit -- consumers wired up, but precompile did NOT yet include -Pkubernetes-integration-tests).
AFTER-2: zhengruifeng/spark run 26505208761 (2026-05-27, after followup adding -Pkubernetes-integration-tests to precompile).

* The AFTER-2 k8s total looks ~10m worse, but step-level breakdown shows ~7m of it is unrelated Checkout Spark + Sync the current branch slowness on that runner (5m25s + 1m51s vs 0m37s + 0m01s in AFTER-1) -- pure GitHub-side noise. The actual Run Spark on K8S integration test step itself was 59m54s vs 57m50s, +2m04s, within typical CI variance on a 60-minute step.
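
The step-level arithmetic in the footnote can be reproduced with a small helper. This is a sketch: `to_seconds` is a hypothetical function, and the durations are the ones quoted in the footnote.

```sh
#!/bin/sh
# Hypothetical helper: converts a duration written as "<M>m<S>s"
# (e.g. 5m25s) to seconds.
to_seconds() {
  minutes=${1%%m*}     # text before the first "m"
  rest=${1#*m}         # text after the first "m", e.g. "25s"
  seconds=${rest%s}    # drop the trailing "s"
  # Strip one leading zero so values like "04" are not parsed as octal.
  minutes=${minutes#0}
  seconds=${seconds#0}
  echo $(( ${minutes:-0} * 60 + ${seconds:-0} ))
}

# Checkout Spark + Sync the current branch: AFTER-2 minus AFTER-1.
setup_delta=$(( $(to_seconds 5m25s) + $(to_seconds 1m51s) - $(to_seconds 0m37s) - $(to_seconds 0m01s) ))
# Run Spark on K8S integration test step: AFTER-2 minus AFTER-1.
test_delta=$(( $(to_seconds 59m54s) - $(to_seconds 57m50s) ))

echo "checkout + sync overhead: ${setup_delta}s"   # 398s, roughly the ~7m cited
echo "test step delta: ${test_delta}s"             # 124s, i.e. 2m04s
```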

Reading the result

Docker is a clean win -- ~16m saved per run, ~18% of job wall time, same payoff shape as the pyspark sharing in SPARK-56768. Docker tests are compile-heavy relative to their other work.
K8s savings are small / within noise. Adding -Pkubernetes-integration-tests to the precompile (the followup commit) means the integration-tests submodule is no longer compiled at test time. But its compile is small, and the test step is dominated by Minikube startup, Spark Docker image build, and the actual K8s integration test execution. SparkR Scala bindings still compile in the k8s job because -Psparkr can't be added to precompile without installing R (it activates core/buildRPackage, which shells out to R/install-dev.sh).
The precompile job itself is ~0.5m longer with the -Pkubernetes-integration-tests addition (16m13s -> 16m49s). Negligible.
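
The ~18% figure for the docker job follows directly from the table. A minimal sketch of the calculation, with `percent_saved` as a hypothetical helper and the durations converted to seconds by hand:

```sh
#!/bin/sh
# Hypothetical helper: prints the relative saving between two durations
# (in seconds) as a whole percentage, rounded to the nearest integer.
percent_saved() {
  before=$1
  after=$2
  echo $(( ( (before - after) * 100 + before / 2 ) / before ))
}

# Docker job: 90m48s = 5448s before, 74m12s = 4452s in AFTER-1.
percent_saved 5448 4452   # prints 18
# Same job against AFTER-2: 74m28s = 4468s.
percent_saved 5448 4468   # prints 18
```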

Net per scheduled run

Docker savings (~16m) are real and consistent. K8s savings exist but are not measurable above CI variance. Even where the wall-clock impact on k8s is small, the change is no-cost (precompile is already running for pyspark / sparkr / build), and the silent fallback means worst case is degraded to the pre-PR behavior.

…t CI jobs Generated-by: Claude Code (Opus 4.7)

…n precompile Extends the precompile invocation to also build the kubernetes-integration-tests submodule and the SparkR Scala bindings. With both included, the k8s integration test job's SBT call ('build/sbt -Phadoop-3 -Psparkr -Pkubernetes -Pvolcano -Pkubernetes-integration-tests ... kubernetes-integration-tests/test') sees compiled classes for every active profile in the extracted target/ and only runs the test phase rather than compiling those modules first. Generated-by: Claude Code (Opus 4.7)

The previous followup added -Psparkr to the precompile SBT invocation, but -Psparkr activates 'core/buildRPackage' which shells out to R's install-dev.sh to build the SparkR R package. The precompile runner does not have R installed, so the task fails with 'Nonzero exit value: 1' (see PR run 26493097995). Keeping the runner R-free is cheaper than installing R for every consumer of the precompile artifact, since the only saving is ~30-60s of Scala compile on the small SparkR module, and the consumers that activate -Psparkr (sparkr, k8s-integration-tests) install R themselves and rebuild that module incrementally on top of the extracted target/. -Pkubernetes-integration-tests stays in the precompile. Generated-by: Claude Code (Opus 4.7)

…tegration test CI jobs

Closes #56110 from zhengruifeng/share-precompile-integration-tests-dev5.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit b96b633)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>

zhengruifeng · 2026-05-28T08:15:55Z

thanks, merged to master/4.x/4.2

zhengruifeng changed the title ~~[INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs~~ [SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs May 26, 2026

zhengruifeng marked this pull request as ready for review May 26, 2026 10:24

zhengruifeng marked this pull request as draft May 27, 2026 03:49

zhengruifeng added 2 commits May 27, 2026 05:39

[INFRA] Share SBT precompile artifact with docker/k8s integration tes…

0d79a5b

…t CI jobs Generated-by: Claude Code (Opus 4.7)

zhengruifeng force-pushed the share-precompile-integration-tests-dev5 branch from d5d0bff to a5c3157 Compare May 27, 2026 05:39

zhengruifeng marked this pull request as ready for review May 28, 2026 06:55

zhengruifeng requested review from HyukjinKwon, LuciferYang and dongjoon-hyun May 28, 2026 07:02

HyukjinKwon approved these changes May 28, 2026


zhengruifeng closed this in b96b633 May 28, 2026

zhengruifeng deleted the share-precompile-integration-tests-dev5 branch May 28, 2026 08:15

zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:share-precompile-integration-tests-dev5
