fix(sandbox): harden seccomp filter to block dangerous syscalls#740

Merged
johntmyers merged 1 commit into main from fix/security-harden-seccomp-filter
Apr 2, 2026
@johntmyers
Collaborator

Summary

Hardens the seccomp-BPF filter in the sandbox runtime to block syscalls that enable sandbox escape. The existing filter only restricted socket domains; this change adds unconditional and conditional blocks for 9 additional dangerous syscalls.

Changes

  • crates/openshell-sandbox/src/sandbox/linux/seccomp.rs:
    • Unconditional blocks for memfd_create, ptrace, bpf, process_vm_readv, io_uring_setup, and mount -- these syscalls have no legitimate use in the sandbox agent runtime
    • Conditional blocks for execveat (when AT_EMPTY_PATH flag is set), unshare (when CLONE_NEWUSER flag is set), and seccomp (when operation is SECCOMP_SET_MODE_FILTER) -- blocks dangerous flag combinations while preserving normal use
    • Added add_masked_arg_rule helper for MaskedEq condition construction
    • Added 5 unit tests validating filter compilation and rule structure
  • architecture/sandbox.md: Updated seccomp section with socket, unconditional, and conditional block tables
  • architecture/security-policy.md: Added blocked syscalls and conditionally blocked syscalls tables with NR references
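The rule structure described above can be sketched as a small, dependency-free model. This is not the code from seccomp.rs: the `MaskedEq` struct, the `Rules` alias, `build_rules`, `is_blocked`, and the signature given to `add_masked_arg_rule` are all assumptions made for illustration, and the `seccomp` rule is modelled with a full-width mask because the PR does not say how that comparison is expressed. The flag values are the ones from the Linux UAPI headers.

```rust
use std::collections::BTreeMap;

// Flag and operation values from the Linux UAPI headers.
const AT_EMPTY_PATH: u64 = 0x1000;
const AT_SYMLINK_NOFOLLOW: u64 = 0x100;
const CLONE_NEWUSER: u64 = 0x1000_0000;
const CLONE_NEWNS: u64 = 0x0002_0000;
const SECCOMP_SET_MODE_FILTER: u64 = 1;

/// Hypothetical stand-in for one masked argument condition.
#[derive(Debug, Clone, PartialEq)]
struct MaskedEq {
    arg_index: usize, // which of the six syscall arguments to inspect
    mask: u64,        // bits kept before comparing
    value: u64,       // expected value after masking
}

impl MaskedEq {
    fn matches(&self, args: &[u64; 6]) -> bool {
        (args[self.arg_index] & self.mask) == self.value
    }
}

/// Rule table keyed by syscall name. An empty Vec means "block unconditionally".
type Rules = BTreeMap<&'static str, Vec<MaskedEq>>;

/// Illustrative helper: block `syscall` whenever `flag` is set in the given
/// argument, regardless of which other bits are set alongside it.
fn add_masked_arg_rule(rules: &mut Rules, syscall: &'static str, arg_index: usize, flag: u64) {
    rules
        .entry(syscall)
        .or_default()
        .push(MaskedEq { arg_index, mask: flag, value: flag });
}

fn build_rules() -> Rules {
    let mut rules = Rules::new();
    // Unconditional blocks: no argument conditions attached.
    for name in ["memfd_create", "ptrace", "bpf", "process_vm_readv", "io_uring_setup", "mount"] {
        rules.insert(name, Vec::new());
    }
    // Conditional blocks: execveat takes its flags as the fifth argument,
    // unshare as the first.
    add_masked_arg_rule(&mut rules, "execveat", 4, AT_EMPTY_PATH);
    add_masked_arg_rule(&mut rules, "unshare", 0, CLONE_NEWUSER);
    // Assumption: exact match on the operation, modelled as a full-width mask.
    rules.entry("seccomp").or_default().push(MaskedEq {
        arg_index: 0,
        mask: u64::MAX,
        value: SECCOMP_SET_MODE_FILTER,
    });
    rules
}

/// Default-allow lookup: syscalls without an entry are never blocked.
fn is_blocked(rules: &Rules, syscall: &str, args: &[u64; 6]) -> bool {
    match rules.get(syscall) {
        None => false,
        Some(conds) if conds.is_empty() => true,
        Some(conds) => conds.iter().any(|c| c.matches(args)),
    }
}
```

In this model, `is_blocked(&rules, "execveat", &[0, 0, 0, 0, AT_EMPTY_PATH | AT_SYMLINK_NOFOLLOW, 0])` is true while the same call with only `AT_SYMLINK_NOFOLLOW` is not, which is the reason for a masked comparison rather than plain equality on the flags argument. Likewise, `unshare(CLONE_NEWNS)` passes but `unshare(CLONE_NEWUSER | CLONE_NEWNS)` does not.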

Motivation

The seccomp filter used a default-allow policy that only restricted specific socket domains. This left the sandbox vulnerable to:

  • Fileless execution: memfd_create + execveat(AT_EMPTY_PATH) creates and executes binaries entirely in memory, bypassing Landlock's path-based filesystem controls
  • Cross-process attacks: ptrace allows inspecting and modifying the memory of other sandbox processes, and process_vm_readv allows reading it
  • Kernel attack surface: bpf and io_uring_setup expose kernel subsystems with extensive CVE history
  • Privilege escalation: unshare(CLONE_NEWUSER) enables user namespace creation for UID remapping

These syscalls are all blocked by Docker's default seccomp profile. This change brings the sandbox in line with that baseline for these specific syscalls.
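To make the default-allow behaviour concrete, here is a minimal sketch of how the hardened policy would treat the attack patterns listed above. The `Call` and `Action` enums and the `decide` and `first_denied` functions are hypothetical names invented for this example. The real filter is compiled to BPF and evaluated by the kernel, and the action it returns for a blocked syscall is not stated in this description.

```rust
const AT_EMPTY_PATH: u64 = 0x1000; // from the Linux UAPI headers
const CLONE_NEWUSER: u64 = 0x1000_0000; // from the Linux UAPI headers

/// Outcome for one syscall (illustrative; the real action may differ).
#[derive(Debug, PartialEq)]
enum Action {
    Allow,
    Deny,
}

/// A small typed subset of syscalls relevant to the threat list.
enum Call {
    MemfdCreate,
    Execveat { flags: u64 },
    Ptrace,
    ProcessVmReadv,
    Bpf,
    IoUringSetup,
    Unshare { flags: u64 },
    Other, // anything the filter has no rule for
}

fn decide(call: &Call) -> Action {
    match call {
        // Unconditional blocks named in the threat list.
        Call::MemfdCreate
        | Call::Ptrace
        | Call::ProcessVmReadv
        | Call::Bpf
        | Call::IoUringSetup => Action::Deny,
        // Conditional blocks: only the dangerous flag triggers a denial.
        Call::Execveat { flags } if (*flags & AT_EMPTY_PATH) != 0 => Action::Deny,
        Call::Unshare { flags } if (*flags & CLONE_NEWUSER) != 0 => Action::Deny,
        // Default-allow: everything else passes through.
        _ => Action::Allow,
    }
}

/// Index of the first denied call in a sequence, if any.
fn first_denied(trace: &[Call]) -> Option<usize> {
    trace.iter().position(|c| decide(c) == Action::Deny)
}
```

Under these assumptions, the fileless-execution chain `[Call::MemfdCreate, Call::Execveat { flags: AT_EMPTY_PATH }]` stops at index 0, and the second step would be denied on its own even if an in-memory file descriptor were obtained some other way. An ordinary `execveat` with no flags is still allowed.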

Testing

  • 5 unit tests added: filter compilation for both proxy and block modes, helper function validation, unconditional block presence, conditional rule presence
  • Pre-commit checks: passed
  • Unit test suite: passed

Checklist

  • Code follows the project's coding standards
  • Tests added for new functionality
  • Documentation updated (architecture/sandbox.md, architecture/security-policy.md)
  • No secrets or credentials in the diff
  • Changes scoped to the security concern only

@johntmyers johntmyers added the topic:security Security issues label Apr 2, 2026
@johntmyers johntmyers requested a review from a team as a code owner April 2, 2026 20:21
@johntmyers johntmyers added the test:e2e Requires end-to-end coverage label Apr 2, 2026
@johntmyers johntmyers merged commit 8887d7c into main Apr 2, 2026
16 of 18 checks passed
@johntmyers johntmyers deleted the fix/security-harden-seccomp-filter branch April 2, 2026 21:30