Kubernetes Security Hardening: Locking Down the Cluster, Not Just the Image
Scanning your images is table stakes. Real Kubernetes security happens at runtime: Pod Security Standards, dropped capabilities, default-deny networking, least-privilege RBAC, and admission control. A practical, manifest-first guide.
You scanned the image. The cluster is still wide open.
Who this is for
You can deploy to Kubernetes, you run an image scanner in CI, and you assumed that was 'security'. This article is for engineers who want to harden what happens **after** the image lands (the pod, the network, the permissions, and the cluster) without a six-month security project.
Here is the uncomfortable truth: a perfectly scanned, zero-CVE image will happily run as root, mount the host filesystem, talk to every other pod in the cluster, and read every secret in the namespace, if you let it. Image scanning answers *'is the software I shipped known-vulnerable?'* It says nothing about *'what can this pod do once it's running?'* Those are different questions, and the second one is where real-world breaches live.
We covered the supply-chain half in container image security scanning. This article is the runtime half. We will work through Pod Security Standards, Linux capabilities, NetworkPolicies, RBAC, secrets, and admission control, each with manifests you can paste into a cluster today.
Kubernetes hardening is not one switch. It is a set of independent walls, each assuming the wall in front of it has already failed.
The mental model: the 4C's of cloud-native security
The official model is the 4C's: Cloud, Cluster, Container, and Code. These are four nested layers, each enclosing the next. The point of nesting is defense in depth: a weakness in an inner layer is contained by the layer outside it, and a strong inner layer can't save you if the outer one is open.
The walled city (perimeter, gates, guards): Cloud. Your VPC, IAM, security groups, the API server's network exposure.
The castle keep inside the city: Cluster. RBAC, admission control, etcd encryption, the kubelet.
A locked room inside the keep: Container. The pod's securityContext, dropped capabilities, read-only root.
The strongbox inside the room: Code. Your app: input validation, dependencies, secrets handling.
Each C is a wall around the one inside it. An attacker has to breach every wall, in order, to reach your data.
Most teams over-invest in one layer and forget the rest. A locked strongbox (great code) inside a room with the door propped open (a privileged pod) inside an unguarded keep (cluster-admin handed out freely) is not secure; it just *feels* secure. Harden every C, starting from the outside.
Defense in depth, drawn out
Here is the same idea as concrete layers an attacker has to traverse. A compromised container is the assumed starting point; everything to its left is what should still stand between the attacker and your data.
Defense in depth: Cloud → Cluster → Node → Pod → Container. Each layer is an independent control; dashed lines are the controls that contain a breach of the layer below.
1. Cloud: Lock down who can reach the API server and what the cluster's node IAM role can touch. A leaked node role with broad cloud permissions is a breach amplifier.
2. Cluster: RBAC decides who can do what via the API. Admission controllers reject non-compliant pods before they ever schedule.
3. Node: The kubelet, the container runtime, and seccomp profiles constrain what a process can ask the host kernel to do.
4. Pod: securityContext + NetworkPolicy decide the pod's privileges and who it can talk to. This is your highest-leverage layer.
5. Container: Even with everything above, your code still needs to validate input and not leak secrets. The innermost wall is still a wall.
Pod Security Standards: the three baselines
Kubernetes defines three named security profiles: privileged, baseline, and restricted, collectively the Pod Security Standards (PSS). They replaced PodSecurityPolicy, which was deprecated in v1.21 and removed in v1.25. You apply them per namespace with labels, using the built-in Pod Security Admission controller; no extra install is required.
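| Standard | Posture | Key things it blocks |
| --- | --- | --- |
| privileged | No restrictions | Nothing: host mounts, privileged containers, and hostNetwork are all allowed. Use only for system/CNI workloads. |
| baseline | Minimally restrictive | Known privilege escalations: privileged containers, host namespaces (hostNetwork, hostPID, hostIPC), hostPath volumes, and host ports |
| restricted | Heavily restricted | Everything baseline blocks PLUS: must runAsNonRoot, drop ALL capabilities, no privilege escalation, seccomp RuntimeDefault required |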
What each Pod Security Standard allows and blocks. 'restricted' is the target for any workload that handles real data.
yaml
# Enforce 'restricted' on a namespace. Pods that violate it are REJECTED.
# 'warn' and 'audit' let you roll out gradually before flipping 'enforce'.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
Roll out with warn before enforce
Set `warn: restricted` first and watch which existing pods trigger warnings. Fix those, then flip `enforce`. Flipping enforce blind on a live namespace will reject your next deploy and page you at the worst time.
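One way to stage that rollout is sketched below. It assumes a hypothetical existing namespace called `orders` (an illustrative name, not from this article) whose workloads do not yet meet `restricted`. The looser `baseline` profile is enforced immediately, while `restricted` violations only produce warnings and audit annotations.

```yaml
# Hypothetical namespace 'orders' (illustrative name, not from the article).
# Stage 1 of a gradual rollout: block only what 'baseline' forbids,
# but warn and audit against 'restricted' so violations become visible.
apiVersion: v1
kind: Namespace
metadata:
  name: orders
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
```

Once deploys into the namespace stop producing warnings, changing the `enforce` label to `restricted` completes the rollout.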
A hardened pod: securityContext done right
The securityContext is where you make a pod boring to attack. Four settings carry most of the weight: run as a non-root user, drop every Linux capability, make the root filesystem read-only, and pin a seccomp profile. Add `allowPrivilegeEscalation: false` and the pod passes the restricted standard; the read-only root filesystem goes beyond what restricted requires.
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: payments
spec:
  replicas: 3
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      automountServiceAccountToken: false  # don't hand the pod an API token it never uses
      securityContext:
        runAsNonRoot: true  # refuse to start as UID 0
        runAsUser: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault  # block dangerous syscalls
      containers:
        - name: api
          image: registry.example.com/api:1.8.2@sha256:abc...  # pin by digest
          ports:
            - containerPort: 8080
          securityContext:
            allowPrivilegeEscalation: false  # no setuid escalation
            readOnlyRootFilesystem: true  # filesystem is immutable
            capabilities:
              drop: ["ALL"]  # start from zero Linux capabilities
          volumeMounts:
            - name: tmp
              mountPath: /tmp  # writable scratch, since root FS is read-only
      volumes:
        - name: tmp
          emptyDir: {}
`runAsNonRoot` + `runAsUser`: a process that isn't root can't trivially escape to the host even if the runtime has a bug.
`capabilities.drop: [ALL]`: Linux capabilities are root's powers, split up (NET_ADMIN, SYS_ADMIN, etc.). Almost no app needs any. Drop all, add back only what breaks.
`readOnlyRootFilesystem`: an attacker can't drop a binary or modify your app on disk. Mount an emptyDir for the few paths that genuinely need writes.
`seccompProfile: RuntimeDefault`: applies the container runtime's default filter, which blocks a few dozen dangerous syscalls a normal app never makes. Free kernel-attack-surface reduction.
NetworkPolicies: default-deny, then allow-list
By default every pod can talk to every other pod, in every namespace. Flat. That means one compromised pod can scan and reach your database, your secrets manager, the metadata endpoint, everything. NetworkPolicies turn that flat network into a set of explicit allow-lists. The pattern is default-deny first, then open specific paths. It is the same posture we cover in zero-trust networking for beginners.
yaml
# Step 1: deny ALL ingress and egress in the namespace.
# An empty podSelector selects every pod; listing both policyTypes with no
# ingress or egress rules denies everything.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes: ["Ingress", "Egress"]
The single most common NetworkPolicy mistake: you default-deny egress, forget to allow UDP/TCP 53 to kube-dns, and suddenly nothing in the namespace can resolve a hostname. Every connection times out and it looks like a total outage. Allow DNS explicitly.
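A sketch of that explicit DNS allowance follows. It assumes the cluster DNS pods run in `kube-system` and carry the `k8s-app: kube-dns` label, which is the common convention for CoreDNS but should be checked against your own cluster (for example with `kubectl get pods -n kube-system --show-labels`). Clusters that use a node-local DNS cache may need a different rule.

```yaml
# Allow every pod in 'payments' to resolve names via cluster DNS.
# Assumption: DNS pods live in kube-system and are labelled k8s-app=kube-dns.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        # namespaceSelector and podSelector in the SAME list item are ANDed:
        # only kube-dns pods inside kube-system match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so this rule coexists with the default-deny policy: everything stays blocked except DNS lookups.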
One more gotcha: NetworkPolicies are only enforced if your CNI plugin supports them (Calico, Cilium, and most managed CNIs do). On a CNI that ignores them, the manifests apply cleanly and do absolutely nothing. Verify enforcement before you rely on it.
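With default-deny in place, each legitimate path gets its own allow rule. The sketch below shows what the allow-list step might look like for the `api` pods. The `app: gateway` and `app: postgres` labels and port 5432 are hypothetical stand-ins for whatever actually calls the API and whatever it depends on.

```yaml
# Step 2 (sketch): open only the paths the 'api' pods need.
# 'gateway' and 'postgres' are hypothetical workloads in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-list
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes: ["Ingress", "Egress"]
  ingress:
    # Only the gateway may reach the API, and only on its service port.
    - from:
        - podSelector:
            matchLabels:
              app: gateway
      ports:
        - protocol: TCP
          port: 8080
  egress:
    # The API may reach the database, and nothing else.
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - protocol: TCP
          port: 5432
```

This policy does not allow DNS, so egress to kube-dns on port 53 still has to be permitted separately.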
RBAC: least privilege, and the ServiceAccount trap
RBAC controls who can do what through the Kubernetes API. The rule is least privilege: grant the narrowest set of verbs on the narrowest set of resources in the narrowest scope. A Role is namespaced; a ClusterRole is cluster-wide. Bind them with a RoleBinding (namespace) or ClusterRoleBinding (cluster). This is the same least-privilege principle as cloud IAM from first principles, just inside the cluster.
yaml
# A ServiceAccount the 'api' pod runs as.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-sa
  namespace: payments
---
# Role: read-only on configmaps, in this namespace only. Nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-config-reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
# Bind the Role to the ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: api-config-reader-binding
  namespace: payments
subjects:
  - kind: ServiceAccount
    name: api-sa
    namespace: payments
roleRef:
  kind: Role
  name: api-config-reader
  apiGroup: rbac.authorization.k8s.io
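"Narrowest set of resources" can go one step further than a resource type. As a sketch, the Role below limits access to a single named ConfigMap; the name `api-config` is hypothetical and stands in for whatever the app actually reads.

```yaml
# Hypothetical tighter variant: read exactly one ConfigMap, by name.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: api-single-config-reader
  namespace: payments
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["api-config"]  # hypothetical ConfigMap name
    verbs: ["get"]                 # 'list' and 'watch' deliberately omitted
```

`list` and `watch` are left out because an unfiltered list of all ConfigMaps in the namespace does not match a rule restricted by `resourceNames`, so granting them here would mostly add confusion.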
Overprivileged ServiceAccount tokens are the breach multiplier
Every pod runs as a ServiceAccount (the namespace's `default` one unless you set another), and by default its token is auto-mounted at /var/run/secrets/kubernetes.io/serviceaccount. If that SA is bound to a broad role, or worse, **cluster-admin**, then any RCE in that pod hands the attacker the keys to the cluster. Two fixes: set `automountServiceAccountToken: false` on pods that don't call the API, and NEVER bind a workload SA to cluster-admin. Audit it with: `kubectl get clusterrolebindings -o wide | grep cluster-admin`.
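The auto-mount switch can also be set on the ServiceAccount itself, so that no token is mounted unless a pod explicitly asks for one. A minimal sketch, in which the `config-watcher` pod and its image are hypothetical examples of a workload that genuinely calls the API:

```yaml
# Disable token auto-mount at the ServiceAccount level: pods running as
# this account get no token unless they opt back in.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-sa
  namespace: payments
automountServiceAccountToken: false
---
# Hypothetical pod that really does call the API and opts back in.
apiVersion: v1
kind: Pod
metadata:
  name: config-watcher  # illustrative name
  namespace: payments
spec:
  serviceAccountName: api-sa
  automountServiceAccountToken: true  # pod-level setting overrides the ServiceAccount
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: watcher
      image: registry.example.com/config-watcher:1.0.0  # hypothetical image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

Defaulting to "off" at the account level means a forgotten pod setting fails safe: the pod simply has no credential to steal.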
Practice the verbs and resource scoping hands-on in the kubectl lab, and the manifest packaging side in the helm lab.
Secrets: encrypt at rest, and stop pasting them into Git
A Kubernetes Secret is not encrypted by default; it is base64-encoded, which is encoding, not encryption. Anyone with `get` on Secrets in the namespace, or read access to etcd, can decode it instantly. Two things make secrets actually safe.
Encryption at rest: configure the API server's EncryptionConfiguration so Secrets are encrypted in etcd (ideally with a KMS provider, so the key lives in your cloud KMS, not on disk). Most managed clusters offer this as a checkbox; turn it on.
External secret operators: keep the source of truth in a real secrets manager (Vault, AWS Secrets Manager, GCP Secret Manager) and let an operator like External Secrets Operator sync them in. Your Git repo references the secret by name; the actual value never lands in version control.
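On a self-managed control plane, the encryption-at-rest item might look like the sketch below: the file you point the API server at with `--encryption-provider-config`. The plugin name and socket path are placeholders, since they depend on which KMS plugin you run; managed clusters usually hide this file behind a setting.

```yaml
# Sketch of an API server encryption config using a KMS v2 provider.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider in the list is used to encrypt new writes.
      - kms:
          apiVersion: v2
          name: cloud-kms-plugin                     # hypothetical plugin name
          endpoint: unix:///var/run/kms/plugin.sock  # hypothetical socket path
          timeout: 3s
      # Fallback so Secrets written before encryption was enabled stay readable.
      - identity: {}
```

Secrets that already exist are only encrypted the next time they are written, so plan a rewrite of existing Secrets after enabling this.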
yaml
# External Secrets Operator: the cluster pulls the value from AWS Secrets Manager.
# The repo only ever sees this reference, never the secret itself.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
  namespace: payments
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: db-credentials  # the k8s Secret it will create
  data:
    - secretKey: password
      remoteRef:
        key: prod/payments/db
        property: password
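The ExternalSecret above points at a ClusterSecretStore named `aws-secrets-manager`, which has to exist separately. A sketch of what it could look like, assuming the operator authenticates to AWS through a ServiceAccount with an attached IAM role; the region and the ServiceAccount name and namespace are illustrative assumptions, not values from this article.

```yaml
# Sketch of the store the ExternalSecret references.
# Region and ServiceAccount details are hypothetical.
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1  # hypothetical region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets       # hypothetical ServiceAccount
            namespace: external-secrets  # required for a cluster-scoped store
```

Authenticating through a ServiceAccount keeps long-lived cloud access keys out of the cluster entirely, which fits the point of moving secrets out of Git in the first place.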
Never kubectl apply a Secret from a file in Git
If a Secret manifest with a real (base64) value is committed, treat that value as permanently compromised: it lives in Git history forever. Rotate it and move to an external store. base64 fools no one.
Admission control: make the rules unbypassable
Everything above is only as good as your team's discipline, unless you enforce it at admission time. Admission controllers intercept every object on its way into the cluster and can reject or mutate it *before* it persists. Pod Security Admission covers the PSS profiles; for anything custom, the two standard tools are OPA Gatekeeper (policy as Rego) and Kyverno (policy as YAML). Kyverno tends to win on readability.
yaml
# Kyverno: reject any pod that doesn't drop ALL capabilities.
# 'Enforce' = block. Switch to 'Audit' to report-only during rollout.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-drop-all-capabilities
spec:
  validationFailureAction: Enforce
  background: true
  rules:
    - name: drop-all-caps
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Containers must drop ALL Linux capabilities."
        foreach:
          - list: "request.object.spec.containers"
            deny:
              conditions:
                all:
                  - key: "ALL"
                    operator: AnyNotIn
                    value: "{{ element.securityContext.capabilities.drop || `[]` }}"
yaml
# OPA Gatekeeper: the same intent, expressed as a constraint against a
# ConstraintTemplate that bundles the Rego. (Template omitted for brevity.)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredDropCapabilities
metadata:
  name: must-drop-all
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    requiredDropCapabilities: ["ALL"]
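The constraint above only works once a matching ConstraintTemplate is installed. The template below is a hypothetical minimal version written to fit the `K8sRequiredDropCapabilities` kind used here; it is an illustration of the shape, not the template from the Gatekeeper policy library, and it checks regular containers only.

```yaml
# Hypothetical minimal ConstraintTemplate for the constraint above.
# It checks spec.containers only; init and ephemeral containers are not covered.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequireddropcapabilities  # must be the lowercased kind
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredDropCapabilities
      validation:
        openAPIV3Schema:
          type: object
          properties:
            requiredDropCapabilities:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequireddropcapabilities

        # One violation per container per required capability it fails to drop.
        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          required := input.parameters.requiredDropCapabilities[_]
          not dropped(container, required)
          msg := sprintf("container %v must drop capability %v", [container.name, required])
        }

        # True when the container's drop list contains the capability.
        # Undefined (so 'not dropped' holds) if securityContext is missing.
        dropped(container, cap) {
          container.securityContext.capabilities.drop[_] == cap
        }
```

Splitting the Rego (template) from the parameters (constraint) is what lets one template back several constraints with different capability lists or namespace scopes.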
The payoff: a developer who forgets drop: [ALL] gets a clear rejection at kubectl apply, not a quiet vulnerability in production. Policy becomes a property of the cluster, not a hope about reviewers.
Common mistakes that cost hours (or breaches)
Treating image scanning as 'done'. A clean scan says nothing about runtime privileges. Scanning and hardening are two different jobs.
Leaving namespaces at the default (privileged) posture. No Pod Security label means anything goes. Label every namespace restricted or baseline.
Default-deny egress without allowing DNS. Port 53 to kube-dns, or the whole namespace goes dark and looks like a total outage.
Binding workload ServiceAccounts to cluster-admin 'to make it work'. That single shortcut turns any pod RCE into full cluster compromise.
Auto-mounting SA tokens into pods that never call the API. Free credential for an attacker. Set automountServiceAccountToken: false.
Committing base64 Secrets to Git and thinking they're hidden. base64 is encoding, not encryption. Use an external secret store.
Writing NetworkPolicies on a CNI that doesn't enforce them. They apply silently and do nothing. Verify your CNI supports them.
Setting admission policies to enforce on day one across a live cluster. Start in audit/warn, fix violations, then enforce.
Takeaways
The whole article in nine lines
Image scanning is the supply chain; this is the runtime. You need both.
Think in 4C's (Cloud, Cluster, Container, Code): nested walls, each assuming the inner one failed.
Label every namespace with a Pod Security Standard; aim for `restricted`.
Harden pods: runAsNonRoot, drop ALL capabilities, readOnlyRootFilesystem, seccomp RuntimeDefault.
Default-deny NetworkPolicies, then allow-list specific paths, and remember DNS.
RBAC least privilege; never bind workloads to cluster-admin.
Disable token auto-mount for pods that don't call the API.
Encrypt Secrets at rest and keep their source of truth in an external store, not Git.
Enforce all of it with admission control (Kyverno / Gatekeeper) so policy can't be skipped.
Where to go next
Hardening compounds: each wall you add is cheap on its own and devastating to an attacker in aggregate. Pick the highest-leverage gap in your cluster today (usually a missing default-deny NetworkPolicy or a workload bound to cluster-admin) and close that one this week.
This article covers concepts taught hands-on in the Cloud Engineer and DevOps career paths, with real terminal labs, production scenarios, and structured lessons.