Skip to content

[Feat] Add helm charts for nydus-snapshotter for convenient K8s deploytment #719

Open
Zephyrcf wants to merge 10 commits intocontainerd:mainfrom
Zephyrcf:feat/add-helm
Open

[Feat] Add helm charts for nydus-snapshotter for convenient K8s deploytment #719
Zephyrcf wants to merge 10 commits intocontainerd:mainfrom
Zephyrcf:feat/add-helm

Conversation

@Zephyrcf
Copy link
Copy Markdown
Contributor

@Zephyrcf Zephyrcf commented Mar 17, 2026

Overview

Please briefly describe the changes your pull request makes.

Related Issues

Please link to the relevant issue. For example: Fix #123 or Related #456.
#717

Change Details

Please describe your changes in detail:

Test Results

If you have any relevant screenshots or videos that can help illustrate your changes, please add them here.

Change Type

Please select the type of change your pull request relates to:

  • Bug Fix
  • Feature Addition
  • Documentation Update
  • Code Refactoring
  • Performance Improvement
  • Other (please describe)

Self-Checklist

Before submitting a pull request, please ensure you have completed the following:

  • I have run a code style check and addressed any warnings/errors.
  • I have added appropriate comments to my code (if applicable).
  • I have updated the documentation (if applicable).
  • I have written appropriate unit tests.

Changwei Ge and others added 8 commits March 17, 2026 17:12
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Changwei Ge <gechangwei@bytedance.com>
Signed-off-by: Zephyrcf <zinsist777@gmail.com>
Copilot AI review requested due to automatic review settings March 17, 2026 09:35
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 28.94737% with 108 lines in your changes missing coverage. Please review.
✅ Project coverage is 22.99%. Comparing base (fc330cc) to head (51e56a0).
⚠️ Report is 14 commits behind head on main.

Files with missing lines Patch % Lines
pkg/manager/daemon_adaptor.go 30.55% 95 Missing and 5 partials ⚠️
pkg/manager/daemon_event.go 0.00% 3 Missing ⚠️
pkg/system/system.go 0.00% 3 Missing ⚠️
pkg/manager/manager.go 0.00% 1 Missing ⚠️
snapshot/snapshot.go 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #719      +/-   ##
==========================================
+ Coverage   22.02%   22.99%   +0.96%     
==========================================
  Files         130      131       +1     
  Lines       11931    12194     +263     
==========================================
+ Hits         2628     2804     +176     
- Misses       8960     9040      +80     
- Partials      343      350       +7     
Files with missing lines Coverage Δ
config/config.go 38.33% <ø> (ø)
pkg/manager/manager.go 0.00% <0.00%> (ø)
snapshot/snapshot.go 5.49% <0.00%> (-0.01%) ⬇️
pkg/manager/daemon_event.go 0.00% <0.00%> (ø)
pkg/system/system.go 5.28% <0.00%> (+0.04%) ⬆️
pkg/manager/daemon_adaptor.go 17.81% <30.55%> (+17.81%) ⬆️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@imeoer imeoer requested review from BraveY and imeoer March 17, 2026 09:41
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a Helm-based Kubernetes deployment path for nydus-snapshotter, including a dedicated “Helm image” build and E2E workflow support. To make the DaemonSet-based deployment feasible, it also introduces a delegate_nydusd mode so the snapshotter can start nydusd via host systemd when running in a container with host namespaces.

Changes:

  • Add Helm chart templates/values to deploy nydus-snapshotter as a privileged DaemonSet with optional host containerd bootstrapping.
  • Introduce delegate_nydusd config wiring and implement delegated nydusd startup via systemd-run + host mount namespace entry.
  • Add CI support: Helm chart lint/template + kind-based E2E, and publish an additional “helm” container image flavor.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
snapshot/snapshot.go Wires DelegateNydusd into the manager construction (currently for fusedev manager).
pkg/system/system.go Switches daemon upgrade launch path to the new StartDaemonProcess helper.
pkg/manager/manager.go Adds DelegateNydusd option/field to Manager.
pkg/manager/daemon_event.go Updates upgrade flow to use StartDaemonProcess.
pkg/manager/daemon_adaptor.go Implements delegated daemon spawn (systemd-run), mount-namespace execution helper, and PID parsing.
misc/snapshotter/setup.sh Adds an entrypoint script that installs artifacts and optionally restarts host containerd after socket readiness.
misc/snapshotter/config.toml Fixes typos and aligns formatting for the sample config.
misc/snapshotter/Dockerfile Adjusts kubectl sourcing behavior (now uses the “stable” channel).
docker/Dockerfile Introduces a new “Helm image” Dockerfile that bundles containerd-nydus-grpc, ctr, and helper tooling.
config/config.go Adds delegate_nydusd to the snapshotter TOML config schema.
charts/values.yaml Adds Helm values for image/config and DaemonSet/host setup defaults.
charts/templates/snapshotter-config.yaml Adds ConfigMap rendering for config.toml.
charts/templates/service-account.yaml Adds (optional) ServiceAccount creation.
charts/templates/runc-nydus.yaml Adds RuntimeClass definitions for runc-nydus and runc-overlayfs.
charts/templates/daemonset.yaml Adds the main DaemonSet, including invasive host containerd config mutation/restart path.
charts/templates/cluster-role.yaml Adds a ClusterRole (currently empty rules).
charts/templates/cluster-role-binding.yaml Adds ClusterRoleBinding for the ServiceAccount (to the empty ClusterRole).
charts/templates/_helpers.tpl Provides naming helpers and config rendering/legacy mapping logic.
charts/Chart.yaml Adds Helm chart metadata.
README.md Updates Kubernetes installation guidance to reference Helm install.
.github/workflows/release.yml Adds build/push step for the Helm image variant.
.github/workflows/helm-e2e.yml Adds Helm lint/template and kind-based E2E + hot-upgrade CI coverage.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.


sleep 30

for i in $(seq 1 12); do
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've already used 'kubectl wait' later to wait for the snapshotter pod to be ready, so why do we need a loop check here?

Copy link
Copy Markdown
Contributor Author

@Zephyrcf Zephyrcf Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.
kubectl wait later only waits for the final Ready condition, but here we need a stronger check. In CI we've seen cases where that is not enough: the DaemonSet is scheduled, but the new snapshotter pod is still being recreated,stuck in init, or failing with FailedCreatePodSandBox. The loop in tests/helpers/helm-e2e.sh lets us track the actual candidate pod and fail early with useful diagnostics instead of just timing out.

apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
name: runc-nydus
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, we don't recommend using the RuntimeClass-level snapshotter approach, as containerd's implementation of it is still experimental (not sure about the current status in containerd v2.x), and it has caused some snapshot issues. We'd suggest sticking with the global snapshotter configuration in the helm chart for now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. the chart already sets nydus as the default snapshotter in containerd. the runc-overlayfs RuntimeClass is only for bootstrapping the snapshotter pod itself with runc+overlayfs.

helm upgrade "${HELM_RELEASE}" charts/ \
--namespace "${HELM_NAMESPACE}" \
--reuse-values \
--set snapshotter.config.daemon.nydusd_path=/opt/nydus/bin/new-nydusd \
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new nydusd is already included in the new snapshotter image, why do we still need to set nydusd_path here? This adds complexity to the hot upgrade process.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently creating two snapshotter images. The corresponding nydus versions inside the images are different for hot upgrade testing.

{{- end }}

{{/*
Default snapshotter config values used to render config.toml.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better not to declare these here, as it could complicate maintenance when configurations change in the future. Letting users modify snapshotter-config.yaml directly is sufficient.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I retain part of the configuration in the tpl so that DaemonSet rolling can be triggered through checksum/configuration, while providing basic configuration for users to modify.


echo "=== Configuring containerd ==="
cat > "$desiredProxyConfig" <<'EOF'
version = 2
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make configuration changes compatible with various versions based on the original containerd version: dragonflyoss/nydus#1810

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants