[SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs #56110

Closed
zhengruifeng wants to merge 3 commits into apache:master from zhengruifeng:share-precompile-integration-tests-dev5

@zhengruifeng zhengruifeng commented May 26, 2026

What changes were proposed in this pull request?

This PR extends the SBT precompile-sharing pattern (parent: SPARK-56830; prior sub-tasks: SPARK-56768 pyspark, SPARK-56831 sparkr, SPARK-56943 JVM build) to the two remaining SBT-compiling jobs in .github/workflows/build_and_test.yml that still run their own full Spark compile:

  • docker-integration-tests
  • k8s-integration-tests

Concretely:

  • The existing precompile job's if: gate is extended to also fire when docker-integration-tests == 'true' or k8s-integration-tests == 'true' in the precondition output, so the artifact is available whenever either job needs it.
  • The precompile SBT invocation adds -Pkubernetes-integration-tests, so the integration-tests submodule's target/ ends up in the shared artifact and the k8s job doesn't have to recompile it.
  • docker-integration-tests:
    • needs: precondition -> needs: [precondition, precompile]
    • if: extended with (!cancelled()) && so the job still runs when precompile fails or is skipped, instead of being skipped by the implicit success() check on its needs.
    • Adds "Download precompiled artifact" + "Extract precompiled artifact" steps between Java setup and Run tests, with graceful fallback (continue-on-error: true).
    • Run tests exports SKIP_SCALA_BUILD=true when extraction succeeded; dev/run-tests.py already honors this flag and skips build_apache_spark + build_spark_assembly_sbt.
  • k8s-integration-tests:
    • Same needs: and if: change.
    • Adds the same Download/Extract steps after Java setup.
    • The actual test runs via a direct build/sbt ... "kubernetes-integration-tests/test" call rather than dev/run-tests.py, so no SKIP_SCALA_BUILD is set. SBT sees the extracted target/ and skips compilation of the pre-built modules (Spark Core, SQL, kubernetes, integration-tests, ...); only the small SparkR Scala bindings still compile (the precompile doesn't include -Psparkr because that profile activates core/buildRPackage, which shells out to R, and the precompile runner doesn't have R installed).
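
The extended gate on the precompile job amounts to an "any consumer requested" check. A minimal sketch of that logic in shell, assuming each consumer flag arrives as the string `true` or `false`; the real gate is a GitHub Actions `if:` expression, and the function name `precompile_needed` is hypothetical:

```shell
#!/bin/sh
# Hypothetical model of the precompile gate: succeed when at least one
# consumer flag from the precondition output equals "true".
precompile_needed() {
  for flag in "$@"; do
    if [ "$flag" = "true" ]; then
      return 0
    fi
  done
  return 1
}

# Example: docker-integration-tests is off, k8s-integration-tests is on.
if precompile_needed "false" "true"; then
  echo "run precompile"
fi
```

With no flag set to `true` the function returns non-zero, which corresponds to the precompile job being skipped.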

Optional: graceful fallback if precompile fails

Same pattern as the prior sub-tasks:

  • precompile keeps continue-on-error: true.
  • Both consumers' "Download precompiled artifact" step is gated on needs.precompile.result == 'success' and has continue-on-error: true.
  • "Extract precompiled artifact" is gated on the download succeeding and has continue-on-error: true.
  • For docker, SKIP_SCALA_BUILD=true is exported only when steps.extract-precompiled.outcome == 'success'; otherwise dev/run-tests.py runs the original local SBT build.
  • For k8s, if extraction fails, SBT compiles from scratch as before.

In the worst case both jobs degrade to the pre-PR behavior (a full local SBT build) rather than failing the workflow.
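
The docker-side decision can be sketched as a small shell helper. This is an illustration under stated assumptions, not the workflow's actual step: the function name `configure_run_tests` is hypothetical, and its argument stands in for `steps.extract-precompiled.outcome`.

```shell
#!/bin/sh
# Hypothetical helper mirroring the docker job's "Run tests" decision.
# $1 stands in for steps.extract-precompiled.outcome.
configure_run_tests() {
  if [ "$1" = "success" ]; then
    echo "Reusing precompiled artifact, skipping local SBT build."
    export SKIP_SCALA_BUILD=true   # honored by dev/run-tests.py per the PR description
  else
    unset SKIP_SCALA_BUILD         # fall back to the original local SBT build
  fi
}
```

Any outcome other than `success` (for example `failure`, `skipped`, or an empty value) leaves the flag unset, so the slow path is the default.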

Profile coverage

The precompile job runs:

./build/sbt -Phadoop-3 -Pyarn -Pspark-ganglia-lgpl -Phadoop-cloud -Phive \
  -Pkubernetes -Pjvm-profiler -Pkinesis-asl -Phive-thriftserver \
  -Pdocker-integration-tests -Pkubernetes-integration-tests -Pvolcano \
  Test/package streaming-kinesis-asl-assembly/assembly connect/assembly assembly/package
  • docker-integration-tests: profile is in the precompile invocation; the module's target/ is pre-built, so dev/run-tests --modules docker-integration-tests only runs the test phase.
  • k8s-integration-tests: -Pkubernetes and -Pkubernetes-integration-tests are both in the precompile, so the integration-tests submodule is pre-built. The job's direct SBT call adds -Psparkr, which triggers compile of the small SparkR Scala bindings on top of the reused target/. Net work in this job drops from "compile all of Spark + integration tests + sparkr" to "compile only the sparkr module".
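
The coverage argument can be checked mechanically as a set difference between profile lists. A rough sketch: the precompile list is copied from the invocation above, while the k8s list contains only the profiles visible in the SBT call quoted in a commit message on this PR (the elided part of that call is not considered). The function name `uncovered_profiles` is hypothetical.

```shell
#!/bin/sh
# Profiles in the precompile invocation quoted above.
PRECOMPILE_PROFILES="hadoop-3 yarn spark-ganglia-lgpl hadoop-cloud hive kubernetes jvm-profiler kinesis-asl hive-thriftserver docker-integration-tests kubernetes-integration-tests volcano"

# Print each profile from $1 that is absent from the list in $2.
uncovered_profiles() {
  for p in $1; do
    case " $2 " in
      *" $p "*) ;;        # already built by precompile
      *) echo "$p" ;;     # still has to compile in the consumer job
    esac
  done
}

# Profiles visible in the k8s job's SBT call (assumption: no others are hidden by the "...").
K8S_PROFILES="hadoop-3 sparkr kubernetes volcano kubernetes-integration-tests"
uncovered_profiles "$K8S_PROFILES" "$PRECOMPILE_PROFILES"   # prints: sparkr
```

Matching on space-delimited names keeps `kubernetes` from being confused with `kubernetes-integration-tests`.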

Why are the changes needed?

Today every scheduled / dispatched run of build_and_test.yml that requires docker-integration-tests or k8s-integration-tests re-runs the same SBT compile that precompile already produced for pyspark / sparkr / build. Wiring these two consumers to the existing artifact removes that duplicate work at no extra cost, since precompile is already running for the other consumers.

Does this PR introduce any user-facing change?

No. CI infrastructure change only.

How was this patch tested?

The change is exercised by the CI run of this PR itself. The Download/Extract steps log artifact size; the Run tests step prints Reusing precompiled artifact, skipping local SBT build. for the docker job when the fast path is taken. If the precompile job is forced to fail (or its artifact is missing), both consumers fall back to the original local SBT build.
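
The soft-failing extract step that this fallback relies on might look like the following. It is a minimal sketch assuming the artifact is a gzip tarball; the function name `extract_precompiled` and any paths passed to it are hypothetical and not taken from the workflow.

```shell
#!/bin/sh
# Hypothetical helper: unpack a precompiled tarball ($1) into a directory ($2).
# Returns non-zero when the tarball is missing or cannot be unpacked, so the
# caller can fall back to a local SBT build instead of aborting.
extract_precompiled() {
  tarball="$1"
  dest="$2"
  if [ ! -f "$tarball" ]; then
    echo "No precompiled artifact at $tarball" >&2
    return 1
  fi
  # Log the artifact size, as the Download/Extract steps are described as doing.
  echo "Artifact size: $(wc -c < "$tarball" | tr -d ' ') bytes"
  mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}
```

A caller would branch on the return status, taking the fast path only when extraction succeeded.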

Measured CI timings before vs after are posted as a comment on this PR.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Opus 4.7)

@zhengruifeng zhengruifeng changed the title from "[INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs" to "[SPARK-57069][INFRA] Share SBT precompile artifact with docker/k8s integration test CI jobs" May 26, 2026
@zhengruifeng zhengruifeng marked this pull request as ready for review May 26, 2026 10:24

zhengruifeng commented May 26, 2026

CI performance: before vs after

Comparing per-job wall time on real CI runs (n=2 BEFORE, n=2 AFTER):

| Job | Before avg (n=2) | After (initial, n=1) | After (with -Pkubernetes-integration-tests in precompile, n=1) | Savings vs before |
| --- | --- | --- | --- | --- |
| Precompile Spark | 16m34s | 16m13s | 16m49s | ~0 (within noise) |
| Run Docker integration tests | 90m48s | 74m12s | 74m28s | ~16m (~18%) |
| Run Spark on Kubernetes Integration test | 66m56s | 65m48s | 75m24s* | within noise |

Samples:

* The AFTER-2 k8s total looks ~10m worse, but step-level breakdown shows ~7m of it is unrelated Checkout Spark + Sync the current branch slowness on that runner (5m25s + 1m51s vs 0m37s + 0m01s in AFTER-1) -- pure GitHub-side noise. The actual Run Spark on K8S integration test step itself was 59m54s vs 57m50s, +2m04s, within typical CI variance on a 60-minute step.

Reading the result

  • Docker is a clean win -- ~16m saved per run, ~18% of job wall time, same payoff shape as the pyspark sharing in SPARK-56768. Docker tests are compile-heavy relative to their other work.
  • K8s savings are small / within noise. Adding -Pkubernetes-integration-tests to the precompile (the followup commit) means the integration-tests submodule is no longer compiled at test time. But its compile is small, and the test step is dominated by Minikube startup, Spark Docker image build, and the actual K8s integration test execution. SparkR Scala bindings still compile in the k8s job because -Psparkr can't be added to precompile without installing R (it activates core/buildRPackage, which shells out to R/install-dev.sh).
  • The precompile job itself is ~0.5m longer with the -Pkubernetes-integration-tests addition (16m13s -> 16m49s). Negligible.
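
The Docker figures can be reproduced from the reported wall times with a few lines of shell arithmetic. A small sketch; the helper name `to_seconds` is hypothetical, and it only handles the `XmYYs` format used in this comment:

```shell
#!/bin/sh
# Convert a duration such as "90m48s" to seconds.
to_seconds() {
  m=${1%%m*}            # text before the first "m"
  s=${1#*m}             # text after the first "m"
  s=${s%s}              # drop the trailing "s"
  m=${m#0}; s=${s#0}    # strip one leading zero so "08" is not read as octal
  echo $(( ${m:-0} * 60 + ${s:-0} ))
}

before=$(to_seconds 90m48s)   # Docker job, before avg
after=$(to_seconds 74m12s)    # Docker job, after (initial)
saved=$(( before - after ))
echo "saved: $(( saved / 60 ))m$(( saved % 60 ))s, $(( saved * 100 / before ))% of the job"
# prints: saved: 16m36s, 18% of the job
```

The same helper gives 124 seconds (2m04s) for the k8s test-step difference of 59m54s vs 57m50s.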

Net per scheduled run

Docker savings (~16m) are real and consistent. K8s savings exist but are not measurable above CI variance. Even where the wall-clock impact on k8s is small, the change is no-cost (precompile is already running for pyspark / sparkr / build), and the silent fallback means worst case is degraded to the pre-PR behavior.

@zhengruifeng zhengruifeng marked this pull request as draft May 27, 2026 03:49
…t CI jobs

Generated-by: Claude Code (Opus 4.7)
…n precompile

Extends the precompile invocation to also build the kubernetes-integration-tests
submodule and the SparkR Scala bindings. With both included, the k8s
integration test job's SBT call ('build/sbt -Phadoop-3 -Psparkr -Pkubernetes
-Pvolcano -Pkubernetes-integration-tests ... kubernetes-integration-tests/test')
sees compiled classes for every active profile in the extracted target/ and
only runs the test phase rather than compiling those modules first.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng force-pushed the share-precompile-integration-tests-dev5 branch from d5d0bff to a5c3157 Compare May 27, 2026 05:39
The previous followup added -Psparkr to the precompile SBT invocation, but
-Psparkr activates 'core/buildRPackage' which shells out to R's install-dev.sh
to build the SparkR R package. The precompile runner does not have R installed,
so the task fails with 'Nonzero exit value: 1' (see PR run 26493097995).

Keeping the runner R-free is cheaper than installing R for every consumer of
the precompile artifact, since the only saving is ~30-60s of Scala compile on
the small SparkR module, and the consumers that activate -Psparkr (sparkr,
k8s-integration-tests) install R themselves and rebuild that module
incrementally on top of the extracted target/. -Pkubernetes-integration-tests
stays in the precompile.

Generated-by: Claude Code (Opus 4.7)
@zhengruifeng zhengruifeng marked this pull request as ready for review May 28, 2026 06:55
zhengruifeng added a commit that referenced this pull request May 28, 2026
…tegration test CI jobs

Closes #56110 from zhengruifeng/share-precompile-integration-tests-dev5.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit b96b633)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
zhengruifeng added a commit that referenced this pull request May 28, 2026
…tegration test CI jobs

Closes #56110 from zhengruifeng/share-precompile-integration-tests-dev5.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
(cherry picked from commit b96b633)
Signed-off-by: Ruifeng Zheng <ruifengz@foxmail.com>
@zhengruifeng

thanks, merged to master/4.x/4.2

@zhengruifeng zhengruifeng deleted the share-precompile-integration-tests-dev5 branch May 28, 2026 08:15