The Simplified Tech


© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

Linux Namespaces: The Kernel Primitive Behind Every Container

Linux namespaces wrap a global kernel resource so that processes inside a namespace see their own isolated instance of it. Every container runtime — Docker, containerd, CRI-O — creates namespaces at container start. Understanding namespaces is the difference between cargo-culting "containers are secure" and knowing exactly what isolation you have and where it ends.

Relevant for: Junior · Mid-level · Senior · Staff
Why this matters at your level
Junior

Know that containers are not VMs. Know the 6 namespace types by name and what each isolates. Be able to explain why a container process cannot see sibling containers via ps.

Mid-level

Inspect namespaces via /proc/<pid>/ns/. Trace what clone() flags the container runtime passes. Understand user namespaces and why they matter for rootless containers.

Senior

Understand namespace escape attack vectors (CVE-2019-5736). Know the difference between namespace isolation and seccomp/AppArmor/SELinux mandatory access control. Design container security policies that layer all three.

Staff

Evaluate whether new workloads require user namespaces. Audit the blast radius of host-mounted volumes and hostPID:true pods. Own the threat model for the container runtime layer across the platform.


~5 min read
Incident timeline: Host Escape — Docker runc CVE-2019-5736 — February 2019
T+0

Attacker controls a malicious image accessible to the cluster

T+5m

Container runs as UID 0, reads /proc/self/exe — a symlink to host runc binary

T+8m

Race loop opens /proc/self/exe for writing while runc is executing the container

T+10m

Host runc binary overwritten with attacker payload via the open file descriptor

T+11m

Next container launch executes attacker payload as root on the node — full host compromise

  • Severity: Critical
  • Affected runtimes: Docker, LXC, Podman
  • Post-exploit privilege: root on the host node
  • Patch deployment window

The question this raises

If containers use namespace isolation, how can a process inside one escape to the host — and which of the 6 namespace types failed to prevent it?

Test your assumption first

Your security team reports a container can run ps aux and see processes from other containers and the host. Which Pod spec field is most likely causing this?

Lesson outline

What Problem Namespaces Solve

The core problem: global kernel resources

Before namespaces, all processes on a Linux host shared one global view of every resource — PID 1 was always init, /proc showed every process, every program could see every network interface. Running isolated workloads on one host required full VMs. Namespaces give each group of processes its own isolated view of a specific kernel resource without spawning a separate kernel.
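This global view is easy to observe directly: /proc enumerates every process visible in the caller's PID namespace. A minimal sketch (Linux only; no container required):

```python
# Minimal sketch (Linux only): /proc enumerates every process visible
# in the caller's PID namespace. On a bare host that is every process
# on the machine; inside a container it is typically just a handful.
import os

pids = sorted(int(d) for d in os.listdir("/proc") if d.isdigit())
print(f"visible processes: {len(pids)}, lowest PID: {pids[0]}")
```

Run the same two lines on the host and inside a container, and the count drops from hundreds to a few — that difference is the PID namespace at work.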

PID Namespace

Isolates process ID numbers. Container processes think PID 1 is their init process. Host processes are invisible. Signal delivery is confined within the namespace.

Network Namespace

Isolates network interfaces, IP routing tables, firewall rules, and socket state. Each container gets its own eth0 with its own IP. A host veth pair bridges into the container.

Mount Namespace

Isolates the filesystem mount table. The container sees its own rootfs. The host /etc, /var, and /proc are not visible unless explicitly bind-mounted into the container.

User Namespace

Maps container UIDs/GIDs to different host UIDs. UID 0 inside maps to an unprivileged host UID — enabling rootless container runtimes where container root does not equal host root.

UTS Namespace

Isolates hostname and domain name. The container can set its own hostname (e.g., the pod name) without affecting the host or other containers.

IPC Namespace

Isolates System V IPC objects (shared memory, semaphores, message queues) and POSIX message queues. Prevents cross-container shared memory attacks.
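Each of the six types above is visible as a symlink under /proc/&lt;pid&gt;/ns/. A short sketch (Linux only) that prints the current process's namespace identities:

```python
# Sketch (Linux only): each namespace a process belongs to is exposed
# as a symlink /proc/<pid>/ns/<type>, whose target names the type and
# the namespace's inode, e.g. "pid:[4026531836]".
import os

for ns_type in ["pid", "net", "mnt", "uts", "ipc", "user"]:
    print(os.readlink(f"/proc/self/ns/{ns_type}"))
```

The bracketed inode is the namespace's identity; it is the same value `ls -la /proc/<pid>/ns/` shows later in this lesson.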

The System View: What Isolation Actually Looks Like

HOST KERNEL (one shared kernel for all containers)
+------------------------------------------------------------------+
|  Global Namespace (init_ns)                                      |
|  PID: 1(systemd) 2(kthreadd) 847(containerd) 901(kubelet)...    |
|  NET: eth0(192.168.1.10) lo docker0(172.17.0.1)                 |
|                                                                  |
|  Container A Namespaces         Container B Namespaces           |
|  +-----------------------------+ +-----------------------------+ |
|  | PID_NS: 1(nginx) 7(sh)     | | PID_NS: 1(redis) 4(sh)     | |
|  | NET_NS: eth0(10.244.0.5)   | | NET_NS: eth0(10.244.0.6)   | |
|  | MNT_NS: /etc/nginx rootfs  | | MNT_NS: /etc/redis rootfs  | |
|  | UTS_NS: hostname=pod-a     | | UTS_NS: hostname=pod-b     | |
|  | USER_NS: uid0->uid65534    | | USER_NS: uid0->uid65535    | |
|  +-----------------------------+ +-----------------------------+ |
|       veth0 <---> cni0 bridge <---> veth1                       |
+------------------------------------------------------------------+
NOTE: hostPID:true removes the PID_NS boundary entirely.
      The container then sees every host process, including kubelet (PID 901).

Each container has its own namespace for each resource type. The host kernel is shared — namespaces only restrict what each container can see, not kernel execution itself.
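Namespace membership is inherited: a forked child starts in exactly its parent's namespaces, which is why every process inside one container shares a single set of namespace inodes. A sketch of that check (Linux only):

```python
# Sketch (Linux only): a forked child inherits all of its parent's
# namespaces, so the /proc/<pid>/ns/<type> inodes match for each type.
import os
import signal
import time

def ns_inode(pid, ns_type):
    # stat() follows the ns symlink; st_ino identifies the namespace
    return os.stat(f"/proc/{pid}/ns/{ns_type}").st_ino

child = os.fork()
if child == 0:
    time.sleep(30)  # keep the child alive while the parent inspects it
    os._exit(0)

for ns_type in ["pid", "net", "mnt", "uts", "ipc", "user"]:
    assert ns_inode(os.getpid(), ns_type) == ns_inode(child, ns_type)

os.kill(child, signal.SIGKILL)
os.waitpid(child, 0)
print("child shares all six namespaces with its parent")
```

Only a clone()/unshare() call with CLONE_NEW* flags breaks this inheritance — which is exactly what the container runtime does at container start.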

Container isolation: misconception vs reality

Situation: A container is started with docker run or as a Kubernetes pod

Misconception: "The container is isolated like a VM — it has its own kernel, its own process table, and is completely separated from the host and other containers."

Reality: "The container is a process group running on the HOST kernel with a restricted view of specific kernel resources. One kernel, many namespace views. Anything the kernel exposes that is not namespace-aware is shared by all containers."

Situation: A pod spec includes hostPID: true

Misconception: "The pod gets some extra visibility but is still isolated. It can see more processes but is still safely sandboxed."

Reality: "hostPID:true completely removes PID namespace isolation. The container can see, signal, and ptrace every process on the node, including kubelet and containerd. This is effectively host access for process operations."
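One consequence of "one kernel, many namespace views": uname reports the same kernel release inside every container on a node, because nothing virtualizes the kernel itself. A trivial sketch (Linux only):

```python
# Sketch (Linux only): os.uname() reports the single shared host kernel.
# Run this inside any container and it prints the node's kernel release --
# namespaces restrict visibility of resources, not the kernel itself.
import os

info = os.uname()
print(f"shared kernel: {info.sysname} {info.release}")
```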

How It Actually Works: Namespace Creation Internals

From container start to isolated process

1. Runtime calls clone() with namespace flags — containerd calls clone(CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC) to create a new process that is the first member of fresh namespaces for each requested type.

2. New namespaces start empty (or copied from the parent) — the new PID namespace starts empty, and the cloned process becomes PID 1 inside it. The new NET namespace starts with only loopback. The MNT namespace is a copy of the parent's mount table; the runtime then unmounts host mounts and mounts the container rootfs.

3. veth pair bridges the network namespace to the host — the runtime creates a virtual ethernet pair: one end (eth0) is placed inside the container's NET namespace, the other (vethXXX) stays on the host and bridges to cni0. This is how pod traffic flows to other pods and the outside world.

4. /proc/<pid>/ns/ files pin the namespace alive — while the process runs, its namespaces are pinned by files under /proc/<pid>/ns/<type>. Even if all processes in a namespace exit, the namespace stays alive as long as another process holds an open fd to its ns file. This is how Kubernetes pause containers work — they hold namespaces open while app containers restart.

5. setns() enters an existing namespace — any process can enter an existing namespace by opening /proc/<pid>/ns/<type> and calling setns(fd). This is how kubectl exec works — it enters the container's namespaces to provide a shell inside the container's view of the world.


inspect-namespaces.sh

# Get the PID of a running container on a node (requires node access)
$ crictl inspect <container-id> | grep '"pid"'
    "pid": 12345

# List namespace files -- the inode number in brackets uniquely identifies
# the namespace; two processes sharing an inode are in the same namespace
$ ls -la /proc/12345/ns/
lrwxrwxrwx /proc/12345/ns/ipc -> ipc:[4026532456]
lrwxrwxrwx /proc/12345/ns/mnt -> mnt:[4026532457]
lrwxrwxrwx /proc/12345/ns/net -> net:[4026532458]
lrwxrwxrwx /proc/12345/ns/pid -> pid:[4026532459]
lrwxrwxrwx /proc/12345/ns/uts -> uts:[4026532460]
lrwxrwxrwx /proc/12345/ns/user -> user:[4026531837]
# The user inode matches the host init process, so this container is NOT
# using user namespaces: UID 0 inside = UID 0 on host for file operations.

# Enter the container's network namespace from the host -- nsenter uses
# the same setns() mechanism as kubectl exec
$ nsenter --target 12345 --net -- ip addr
1: lo: <LOOPBACK,UP>
2: eth0@if45: <BROADCAST,UP> inet 10.244.0.5/24

# Check whether two containers share a PID namespace (dangerous!)
$ stat --format="%i" /proc/12345/ns/pid /proc/67890/ns/pid
# Same inode number = shared PID namespace -- they see each other's processes
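The same-inode check in the last command can also be scripted. A hedged Python sketch (the helper name is ours, Linux only):

```python
# Hypothetical helper (Linux only): two processes are in the same
# namespace of a given type iff their /proc/<pid>/ns/<type> files
# share an inode -- the same test `stat --format=%i` performs above.
import os

def same_namespace(pid_a, pid_b, ns_type="pid"):
    ino = lambda pid: os.stat(f"/proc/{pid}/ns/{ns_type}").st_ino
    return ino(pid_a) == ino(pid_b)

# A process trivially shares every namespace with itself:
me = os.getpid()
print(same_namespace(me, me, "net"))  # -> True
```

Pointing this at two container PIDs from crictl tells you whether they share a PID namespace — the "dangerous" case flagged above.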

What Breaks in Production: Blast Radius

Blast radius when namespace isolation is removed or bypassed

  • hostPID:true — Container can ptrace, kill, or read /proc/<pid>/mem of every process on the node including kubelet and containerd
  • hostNetwork:true — Container bypasses network namespace entirely — can sniff all unencrypted node traffic and bypass all NetworkPolicy rules
  • hostIPC:true — Container can read shared memory segments from other containers and host processes — IPC-based credential theft
  • privileged:true — Disables seccomp and AppArmor, grants 40+ Linux capabilities including CAP_SYS_ADMIN, mounts /dev — treat as full host access
  • Volume mount of /proc or /sys — Gives container write access to host kernel state — can modify kernel parameters, trigger crashes, or read sensitive host data
  • User namespace not enabled — UID 0 inside container equals UID 0 on host for most file operations and kernel calls — running as root in container is running as root on host
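The risky fields above are easy to scan for mechanically. A hypothetical audit sketch (field names match the Kubernetes Pod spec; the function and its rules are illustrative, not a real admission controller):

```python
# Hypothetical audit sketch: flag the risky Pod spec fields from the
# blast-radius list above. Illustrative only -- a real policy engine
# (OPA/Kyverno) enforces this at admission time.
DANGEROUS_SPEC_FLAGS = ("hostPID", "hostNetwork", "hostIPC")

def audit_pod_spec(spec: dict) -> list:
    findings = [flag for flag in DANGEROUS_SPEC_FLAGS if spec.get(flag)]
    for container in spec.get("containers", []):
        sc = container.get("securityContext", {})
        if sc.get("privileged"):
            findings.append(f"privileged container: {container.get('name')}")
    return findings

spec = {
    "hostPID": True,
    "containers": [{"name": "debug",
                    "securityContext": {"privileged": True}}],
}
print(audit_pod_spec(spec))  # -> ['hostPID', 'privileged container: debug']
```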

Debugging pod with full host namespace access

Bug
# DO NOT use in production -- grants host-level access
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  hostPID: true        # sees ALL host processes
  hostNetwork: true    # bypasses all NetworkPolicy
  hostIPC: true        # reads host shared memory
  containers:
  - name: debug
    image: busybox
    securityContext:
      privileged: true # disables seccomp, AppArmor, grants CAP_SYS_ADMIN
Fix
# Safe debugging pod -- isolated to its own namespaces
apiVersion: v1
kind: Pod
metadata:
  name: debug-pod
spec:
  # No hostPID, hostNetwork, or hostIPC
  containers:
  - name: debug
    image: busybox
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
      readOnlyRootFilesystem: true

The wrong version combines three host namespace settings (hostPID, hostNetwork, hostIPC) with privileged mode -- this gives the pod visibility into the host kernel equivalent to a root shell. The correct version drops all capabilities and refuses privilege escalation. Even if the container image is compromised, it cannot reach the host kernel in any meaningful way.

Decision Guide: Which Namespace Permissions Are Safe?

Does the pod need to see host processes (profiling, node-level monitoring)?
Yes → Use hostPID:true ONLY on dedicated monitoring DaemonSets, in namespaces explicitly exempted from the Restricted PodSecurity level (Restricted forbids host namespaces). Never on application workloads.
No → Do not set hostPID -- pods default to their own PID namespace.
Does the pod need to bind to ports below 1024 or access host network interfaces?
Yes → Use hostNetwork:true ONLY for network-level infrastructure (kube-proxy, Cilium, Calico). Consider the NET_BIND_SERVICE capability instead for port binding only.
No → Do not set hostNetwork -- pods get their own network namespace via CNI.
Is the container image from a trusted, continuously scanned registry?
Yes → Apply defense-in-depth: runAsNonRoot, drop ALL capabilities, readOnlyRootFilesystem:true, seccomp:RuntimeDefault.
No → Block the image via an OPA/Kyverno admission controller until it passes image scanning policy.

Cost and Complexity: Namespace Permission Trade-offs

| Permission | Legitimate Use Case | Attack Surface Added | Risk Level |
| --- | --- | --- | --- |
| hostPID:true | Node-level profiling (perf, strace) | ptrace/kill ANY process including kubelet | Critical -- treat as host access |
| hostNetwork:true | CNI plugins, ingress controllers needing host ports | Sniff all unencrypted traffic, bypass NetworkPolicy | High -- limit to infra DaemonSets |
| privileged:true | Kernel module loading, device drivers | Disables all MAC (AppArmor/SELinux/seccomp), grants /dev access | Critical -- avoid entirely |
| User namespace enabled | Rootless container runtimes | Minimal -- UID 0 maps to unprivileged host UID | Low -- this is the target default |
| runAsUser:0, no user NS | Legacy applications expecting root | UID 0 inside = UID 0 on host for file operations | High -- enforce runAsNonRoot in policy |

Exam Answer vs. Production Reality


What namespaces actually isolate

📖 What the exam expects

Linux provides 6 classic namespace types: PID (process trees), Network (interfaces, routing), Mount (filesystem view), UTS (hostname), IPC (shared memory), User (UID/GID mappings); newer kernels add cgroup and time namespaces. Created via clone() with CLONE_NEW* flags. Visible at /proc/<pid>/ns/.


How this might come up in interviews

Asked in Kubernetes security interviews, CKS exam, and senior platform engineering roles as "what is the difference between a container and a VM" or "what does privileged:true actually do".

Common questions:

  • What is the difference between a Linux namespace and a cgroup?
  • What does hostPID:true in a Pod spec actually enable?
  • How does a rootless container runtime use user namespaces?
  • Why is running a container as root dangerous even with namespace isolation?
  • What is a namespace escape, and how do you prevent one at the platform level?

Strong answer: Mentioning /proc/<pid>/ns/ and the ability to inspect or enter namespaces. Knowing the nsenter command. Understanding that user namespace UID 0 maps to unprivileged host UID. Discussing CVE-2019-5736 as a namespace escape vector.

Red flags: Saying "containers are isolated from the host" without qualification. Believing privileged:true only adds capabilities without understanding it disables seccomp and AppArmor. Not knowing what hostPID:true enables.

Related concepts

Explore topics that connect to this one.

  • Linux cgroups: Resource Governance for Every Container
  • Container Runtimes & OCI: The Layer That Actually Runs Your Containers
  • Containers: Linux Kernel Foundations

Suggested next

Often learned after this topic.

Linux cgroups: Resource Governance for Every Container



Continue learning

Linux cgroups: Resource Governance for Every Container
