How do seccomp and AppArmor/SELinux profiles restrict container behavior?

Detailed Answer

seccomp — restricting which system calls are even allowed

Every action a program takes that involves the kernel (opening a file, creating a socket, forking a process) goes through a system call (syscall). seccomp (secure computing mode) lets you attach a filter to a process that specifies exactly which syscalls it may make. A syscall the filter does not allow is blocked: depending on the action configured in the profile, the call fails with an error (such as EPERM, "operation not permitted") or the process is killed. This happens regardless of what file permissions or capabilities would otherwise allow.

docker run --security-opt seccomp=default.json myapp:1.0
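The file passed to --security-opt seccomp=... is a JSON document. As a minimal sketch of its shape, the Python below builds an allowlist-style profile: a default action that rejects every syscall, plus one rule that allows a named set. The function name build_allowlist_profile and the five-syscall list are illustrative assumptions, and a list this short is far too small for a real application to start.

```python
import json


def build_allowlist_profile(allowed_syscalls, architectures=("SCMP_ARCH_X86_64",)):
    """Return a seccomp profile dict that rejects everything except allowed_syscalls."""
    return {
        # Any syscall without a matching rule fails with an error instead of running.
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(architectures),
        "syscalls": [
            # One rule: every syscall named here is allowed.
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


# Illustrative only: a real workload needs a much longer list.
EXAMPLE_SYSCALLS = ["read", "write", "close", "exit_group", "futex"]

profile = build_allowlist_profile(EXAMPLE_SYSCALLS)
profile_json = json.dumps(profile, indent=2)
print(profile_json)
```

Saving the printed JSON to a file and passing that path to --security-opt seccomp= is how such a profile would be applied. In practice it is usually safer to start from Docker's published default profile than from an empty allowlist.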

Docker applies a default seccomp profile automatically, blocking around 44 of the more than 300 available Linux syscalls. The blocked syscalls are rarely needed by typical containerized applications but have historically been associated with container escapes or kernel-level exploits: kexec_load, several rarely needed namespace and mount-manipulation syscalls, and others. Most applications never notice the default restriction, since they never call the blocked syscalls in normal operation.
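One way to confirm that a filter is active is the Seccomp field the kernel reports in /proc/<pid>/status, where 0 means disabled, 1 means strict mode, and 2 means filter mode. The sketch below parses that field from status text; the function name seccomp_mode and the sample text are assumptions made for illustration.

```python
def seccomp_mode(status_text):
    """Parse the 'Seccomp:' field from the text of /proc/<pid>/status."""
    modes = {0: "disabled", 1: "strict", 2: "filter"}
    for line in status_text.splitlines():
        # "Seccomp_filters:" does not match because of the trailing colon here.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return modes.get(value, f"unknown ({value})")
    return "not reported"  # the field is absent from the status text


# Hypothetical excerpt of a status file, used so the sketch runs anywhere.
SAMPLE_STATUS = "Name:\tmyapp\nSeccomp:\t2\nSeccomp_filters:\t1\n"
print(seccomp_mode(SAMPLE_STATUS))
```

Inside a container, the same function could be given the contents of /proc/self/status. Under Docker's default profile the expected result is filter mode, while a container started with seccomp=unconfined would be expected to report disabled.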

docker run --security-opt seccomp=unconfined myapp:1.0    # disables seccomp filtering entirely -- generally a bad idea

Disabling seccomp entirely (unconfined) removes this layer of protection. This is occasionally necessary for specialized workloads that genuinely need a normally-blocked syscall, such as certain low-level debugging or tracing tools, or some specialized networking software. But this should be a deliberate, narrow exception, not a default reached for just to make an error message go away without understanding why it occurred.
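A narrower alternative to unconfined is a custom profile that keeps the existing allowlist and adds only the syscall the workload needs. The sketch below shows the idea on a tiny stand-in profile. The helper name allow_extra_syscalls is hypothetical, perf_event_open is only an illustrative choice, and whether a given syscall also requires a capability is a separate question this sketch does not address.

```python
import copy


def allow_extra_syscalls(base_profile, names):
    """Return a copy of base_profile with one extra allow rule appended."""
    profile = copy.deepcopy(base_profile)  # leave the caller's profile untouched
    profile.setdefault("syscalls", []).append(
        {"names": list(names), "action": "SCMP_ACT_ALLOW"}
    )
    return profile


# Stand-in for Docker's default profile, which is much larger in reality.
base = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}

custom = allow_extra_syscalls(base, ["perf_event_open"])
print(custom["syscalls"][-1])
```

The result would be written out as JSON and passed with --security-opt seccomp=, so every other default restriction stays in force.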

AppArmor / SELinux — mandatory access control beyond syscall filtering

Where seccomp restricts which syscalls can be made at all, AppArmor (common on Ubuntu/Debian-based systems) and SELinux (common on RHEL/Fedora-based systems) restrict what a process can actually do with the syscalls it's allowed to make. This includes which specific files it can read or write, what network operations it can perform, and which capabilities it can use, based on a named security profile applied to the process.

docker run --security-opt apparmor=docker-default myapp:1.0

Docker applies a default AppArmor profile (docker-default) automatically on systems where AppArmor is enabled. Without any explicit configuration from the person running the container, it restricts a range of higher-risk operations, such as mounting filesystems and writing to sensitive paths under /proc and /sys.
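The profile applied to a process is visible as a label in /proc/<pid>/attr/current. As a rough sketch, the function below interprets such a label, assuming AppArmor labels of the form "name (mode)" and SELinux contexts of the form user:role:type:level. The function name describe_label and both sample strings are illustrative assumptions, not output captured from a real system.

```python
def describe_label(raw):
    """Interpret the contents of /proc/<pid>/attr/current."""
    label = raw.strip("\x00 \n")  # drop trailing NUL, spaces, and newline
    if label.count(":") >= 2:
        # SELinux contexts look like user:role:type[:level]
        return {"lsm": "selinux", "type": label.split(":")[2]}
    if label.endswith(")") and " (" in label:
        # AppArmor labels look like "profile-name (mode)"
        name, mode = label[:-1].rsplit(" (", 1)
        return {"lsm": "apparmor", "profile": name, "mode": mode}
    if label == "unconfined":
        return {"lsm": "apparmor", "profile": "unconfined", "mode": None}
    return {"lsm": "unknown", "label": label}


print(describe_label("docker-default (enforce)\n"))
print(describe_label("system_u:system_r:container_t:s0:c123,c456\x00"))
```

Inside a container, passing the contents of /proc/self/attr/current to this function would indicate which profile, if any, is confining the process.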

How these layers relate to namespaces, cgroups, and capabilities

Namespaces:        control what a process can SEE (isolation)
cgroups:           control how much a process can USE (resource limits)
Capabilities:      control WHICH root-like privileges a process has, if any
seccomp:           controls WHICH SYSTEM CALLS a process can make at all
AppArmor/SELinux:  control WHAT a process can DO with specific files/resources/capabilities

These are genuinely complementary, layered defenses. A container can be resource-limited by cgroups, run as a non-root user, and have its capabilities dropped (see those questions), and still benefit from a seccomp/AppArmor layer that blocks syscalls or file access that should never be reachable, in case some other assumption in the chain turns out to be wrong. This layering is a textbook example of defense in depth: no single mechanism is assumed to be sufficient on its own.
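To make the layering concrete, the sketch below assembles a docker run command line that touches each layer explicitly. The function name hardened_run_args, the profile file name, and the specific limits (UID 1000, 256m, 100 PIDs) are arbitrary assumptions for illustration. Namespaces do not appear because Docker sets them up by default.

```python
import shlex


def hardened_run_args(image, seccomp_profile=None, apparmor_profile=None):
    """Build a docker run argument list that sets each layer explicitly."""
    args = [
        "docker", "run", "--rm",
        "--user", "1000:1000",   # non-root user inside the container
        "--cap-drop", "ALL",     # capabilities: drop every root-like privilege
        "--memory", "256m",      # cgroups: memory limit
        "--pids-limit", "100",   # cgroups: process-count limit
    ]
    if seccomp_profile:
        # seccomp: syscall filter loaded from a JSON profile
        args += ["--security-opt", f"seccomp={seccomp_profile}"]
    if apparmor_profile:
        # AppArmor: named mandatory-access-control profile
        args += ["--security-opt", f"apparmor={apparmor_profile}"]
    return args + [image]


args = hardened_run_args("myapp:1.0", "profile.json", "docker-default")
print(shlex.join(args))
```

Omitting the two optional arguments leaves Docker's own seccomp and AppArmor defaults in effect, which is the right choice for most workloads.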

Why most users never think about these layers explicitly

Docker applies sensible seccomp and AppArmor defaults automatically, without requiring explicit configuration for the common case, which is why many practitioners are unaware these protections are active at all. It also means that "just running in a container" already provides meaningfully more restriction than "just running as a regular host process." The defaults are worth leaving in place for the overwhelming majority of workloads. Disabling either layer (unconfined) should be treated as a deliberate, narrowly scoped exception requiring real justification, never a default troubleshooting step for a confusing error.
