Hardening infrastructure and application config — using IaC, policy as code, and continuous compliance — so default and drifted settings never create exploitable risk.
The OWASP Top 10 has listed "Security Misconfiguration" in every recent edition; it ranked fifth in 2021, up from sixth in 2017. Verizon's Data Breach Investigations Report regularly lists misconfiguration among the most common error-driven causes of breaches. And yet most development teams invest heavily in code security (SAST, code review, secure coding) while their infrastructure configuration is managed by whoever has console access and a deadline.
Misconfiguration facts
Gartner predicted that through 2025, 99% of cloud security failures would be the customer's fault, which in practice mostly means misconfiguration rather than platform vulnerabilities. Capital One: $80M fine, 100M+ records. Twitch: 125GB leak. Facebook: 540M user records left in a publicly accessible Amazon S3 bucket by a third-party app developer. None of these required a sophisticated zero-day exploit; they required finding a misconfigured service.
Configuration security is not about one setting; it spans every layer of your stack: cloud IAM and network rules, container security context and network policies, application environment variables, and infrastructure-as-code definitions. Each layer has its own failure modes. The example below shows one common cloud-layer misconfiguration and how to prevent it.
⚠ Real-world example
Capital One breach (2019): an over-permissive IAM role attached to a misconfigured WAF let an attacker use SSRF to obtain the role's credentials and read S3 buckets containing 100M customer records.
✅ How to fix it
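Enable S3 Block Public Access at the account level (aws_s3_account_public_access_block in Terraform) and on each bucket (aws_s3_bucket_public_access_block), and enforce both with an OPA policy. For an SSRF path like Capital One's, also require IMDSv2 and scope the instance role to only the buckets it needs.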
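As a rough illustration of the kind of check a policy engine runs here, the Python sketch below scans the JSON form of a Terraform plan (the output of `terraform show -json`) and reports planned `aws_s3_bucket` resources that have no `aws_s3_bucket_public_access_block` with all four flags enabled. The function name, the sample plan, and the matching of block to bucket by the `bucket` attribute are assumptions made for illustration. In a real plan the bucket reference is often unknown until apply, so a production setup would more likely rely on OPA or Checkov than on hand-rolled code.

```python
# The four settings exposed by aws_s3_bucket_public_access_block.
REQUIRED_FLAGS = (
    "block_public_acls",
    "block_public_policy",
    "ignore_public_acls",
    "restrict_public_buckets",
)


def unprotected_buckets(plan: dict) -> list[str]:
    """Return addresses of planned buckets lacking a fully enabled public access block."""
    buckets = []       # (bucket name, resource address) pairs
    protected = set()  # bucket names whose block sets all four flags to true
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after")
        if after is None:  # resource is being destroyed; nothing to check
            continue
        if rc.get("type") == "aws_s3_bucket":
            buckets.append((after.get("bucket"), rc["address"]))
        elif rc.get("type") == "aws_s3_bucket_public_access_block":
            name = after.get("bucket")
            if name is not None and all(after.get(f) is True for f in REQUIRED_FLAGS):
                protected.add(name)
    # Buckets whose name is unknown (None) are reported rather than assumed safe.
    return sorted(addr for name, addr in buckets if name not in protected)


# Hypothetical, heavily trimmed plan used only for demonstration.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
         "change": {"actions": ["create"], "after": {"bucket": "example-logs"}}},
        {"address": "aws_s3_bucket_public_access_block.logs",
         "type": "aws_s3_bucket_public_access_block",
         "change": {"actions": ["create"], "after": {
             "bucket": "example-logs",
             "block_public_acls": True, "block_public_policy": True,
             "ignore_public_acls": True, "restrict_public_buckets": True}}},
        {"address": "aws_s3_bucket.reports", "type": "aws_s3_bucket",
         "change": {"actions": ["create"], "after": {"bucket": "example-reports"}}},
    ]
}

if __name__ == "__main__":
    for address in unprotected_buckets(SAMPLE_PLAN):
        print(f"FAIL {address}: no fully enabled public access block")
```

Run in CI before `terraform apply`, a non-empty result would fail the pipeline, which is the "prevent" half of the fix described above.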
The three defenses
Secure configuration at scale requires three mutually reinforcing controls:
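The three-pillar model
Infrastructure as code defines the intended configuration in version control. Policy as code scans that definition in CI before terraform apply, blocking non-compliant config before it exists. Continuous compliance evaluates live resources after deploy to catch drift.
The console access question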
Should engineers have console access to production? The answer should be "emergency only, with audit trail, with a process to revert changes to IaC within 24 hours." Console access for routine work is the primary cause of configuration drift. Session Manager + IaC-only changes + break-glass procedures for emergencies is the mature posture.
The Center for Internet Security (CIS) publishes hardening benchmarks for cloud platforms, operating systems, containers, and applications. These are consensus standards developed with hundreds of security experts and represent the minimum security baseline for each technology.
| CIS Benchmark | Coverage | Automated scanner |
|---|---|---|
| CIS AWS Foundations | IAM, S3, CloudTrail, VPC, KMS, monitoring | Prowler, AWS Security Hub |
| CIS Kubernetes | API server, etcd, kubelet, network policies, RBAC | kube-bench |
| CIS Docker | Host config, daemon config, container runtime, images | Docker Bench for Security |
| CIS Linux (Ubuntu/RHEL) | Filesystem, services, network, logging, access | CIS-CAT, Lynis |
| CIS GCP | IAM, cloud storage, logging, networking, SQL | Security Command Center (formerly Cloud Security Command Center), Forseti (archived) |
| CIS Azure | IAM, storage, database, networking, monitoring | Microsoft Defender for Cloud (formerly Azure Security Center), Prowler |
Run these benchmarks against your environments on initial setup and continuously thereafter. They provide a prioritized, evidence-based starting point for hardening — much better than starting from scratch.
Start with CIS Level 1 — it is the 80/20
CIS benchmarks define Level 1 (essential, low operational impact) and Level 2 (comprehensive, may affect functionality). As a rule of thumb, Level 1 covers the bulk (roughly 80%) of common misconfiguration risk with minimal operational disruption, so implement it first in all environments. Add Level 2 for environments handling sensitive data (PCI, HIPAA scope).
```bash
#!/bin/bash
# Run the Prowler CIS AWS Foundations benchmark against your AWS account.
# Requires: AWS credentials with SecurityAudit + ViewOnlyAccess permissions
# Install: pip install prowler
# Flag and framework names differ between Prowler major versions; confirm
# them with `prowler aws --help` and `prowler aws --list-compliance`.

# Run full CIS Level 1 check
prowler aws \
  --compliance cis_level1_aws \
  --output-formats html json csv \
  --output-directory ./security-reports/prowler \
  --severity critical high

# Run checks for a single service (e.g., IAM only).
# Prowler 3+ selects by service; --group existed only in Prowler 2.
prowler aws \
  --services iam \
  --output-formats json

# Run against multiple accounts (with assume-role).
# The 12-digit account IDs below are placeholders.
for ACCOUNT_ID in 111111111111 222222222222 333333333333; do
  prowler aws \
    --role "arn:aws:iam::${ACCOUNT_ID}:role/ProwlerAuditRole" \
    --compliance cis_level1_aws \
    --output-formats json \
    --output-directory "./security-reports/${ACCOUNT_ID}"
done

# Schedule with cron (weekly baseline report, Mondays at 00:00).
# In a crontab entry, % must be escaped as \%:
# 0 0 * * 1 /scripts/run-prowler.sh > /var/log/prowler-$(date +\%Y\%m\%d).log
```
In Kubernetes, configuration security is enforced at admission time — when a resource is created or updated. Admission controllers (OPA Gatekeeper, Kyverno) evaluate every manifest against policy before it is applied to the cluster.
Essential Kubernetes security configuration checks
```yaml
# Kyverno policy: enforce container security best practices
# Applied at admission time; pods violating policy are rejected
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-pod-security
  annotations:
    policies.kyverno.io/title: Restrict Pod Security
    policies.kyverno.io/severity: high
    policies.kyverno.io/description: >-
      Enforce security context requirements for all pods.
      Pods running as root or privileged are rejected.
spec:
  validationFailureAction: Enforce  # Reject non-compliant pods (use Audit to dry-run first)
  background: true
  rules:
    - name: restrict-privileged
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Privileged containers are not allowed."
        pattern:
          spec:
            containers:
              - =(securityContext):
                  =(privileged): false

    - name: require-non-root
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Containers must not run as root. Set runAsNonRoot: true."
        pattern:
          spec:
            securityContext:
              runAsNonRoot: true
            containers:
              - =(securityContext):
                  =(runAsUser): ">0"

    - name: require-read-only-root
      match:
        any:
          - resources:
              kinds: [Pod]
      validate:
        message: "Root filesystem must be read-only."
        pattern:
          spec:
            containers:
              - securityContext:
                  readOnlyRootFilesystem: true
```
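To make the three rules above concrete, here is a minimal Python sketch that applies roughly the same checks to a pod manifest loaded as a dictionary, for example in a unit test that runs before anything reaches the cluster. It only approximates the policy's intent and is not Kyverno's matching engine. The function name and messages are hypothetical, and like the policy above it looks only at `containers`, not `initContainers` or `ephemeralContainers`.

```python
def pod_violations(pod: dict) -> list[str]:
    """Return one message per failed rule; an empty list means the pod would pass."""
    spec = pod.get("spec") or {}
    pod_ctx = spec.get("securityContext") or {}
    problems = []

    # require-non-root: the pod-level securityContext must set runAsNonRoot: true.
    if pod_ctx.get("runAsNonRoot") is not True:
        problems.append("pod: securityContext.runAsNonRoot must be true")

    for container in spec.get("containers") or []:
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        # restrict-privileged: optional field, but if present it must be false.
        if ctx.get("privileged", False) is not False:
            problems.append(f"{name}: privileged containers are not allowed")

        # require-non-root: optional field, but if present the UID must be > 0.
        uid = ctx.get("runAsUser")
        if uid is not None and uid <= 0:
            problems.append(f"{name}: runAsUser must be greater than 0")

        # require-read-only-root: mandatory for every container.
        if ctx.get("readOnlyRootFilesystem") is not True:
            problems.append(f"{name}: readOnlyRootFilesystem must be true")

    return problems


if __name__ == "__main__":
    # Hypothetical manifest that breaks all three rules.
    bad_pod = {"spec": {"containers": [
        {"name": "app", "securityContext": {"privileged": True, "runAsUser": 0}},
    ]}}
    for message in pod_violations(bad_pod):
        print(message)
```

A check like this gives developers fast local feedback, while the admission controller remains the actual enforcement point.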
Even with perfect IaC and policy scanning, configuration drift happens: emergency console changes, third-party tools that modify config, or AWS service updates that change default behaviors. Continuous compliance is the detection layer.
Continuous compliance architecture
1. Define policy rules as code (AWS Config Rules, Cloud Custodian policies, or OPA policies).
2. Schedule evaluation: AWS Config evaluates resources when their configuration changes or periodically, depending on the rule; Cloud Custodian can run on a schedule or be event-driven.
3. Alert on violations: high-severity findings go to PagerDuty or a Slack security channel immediately; medium/low findings create Jira tickets.
4. Auto-remediate safe fixes: AWS Config remediation actions or Cloud Custodian actions can auto-fix deterministic violations (re-enable S3 Block Public Access, remove a public EC2 IP), but only for changes with no service impact.
5. Track as security debt: medium/low findings that cannot be auto-remediated go into the security backlog with an owner and an SLA.
6. Report for compliance: generate compliance reports from AWS Config or Cloud Custodian output for SOC 2, HIPAA, or internal audits.
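The alerting, auto-remediation, and backlog steps above amount to a small routing decision per finding. The Python sketch below shows one way to express it, assuming a simplified finding shape and a hand-maintained allowlist of fixes considered safe. The rule IDs, the `Finding` fields, and the action names are all hypothetical; real AWS Config or Cloud Custodian output looks different and would need mapping into this shape.

```python
from dataclasses import dataclass

# Hypothetical rule IDs whose fixes are deterministic and have no service impact.
SAFE_AUTO_FIXES = {"s3-public-access-block-disabled", "ec2-public-ip-attached"}
SEVERITIES = {"high", "medium", "low"}


@dataclass(frozen=True)
class Finding:
    rule_id: str
    severity: str  # "high", "medium", or "low"
    resource: str


def route(finding: Finding) -> list[str]:
    """Decide what happens to a finding: auto-remediate, alert, ticket, backlog."""
    if finding.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {finding.severity!r}")

    actions = []
    auto_fixable = finding.rule_id in SAFE_AUTO_FIXES
    if auto_fixable:
        actions.append("auto-remediate")

    if finding.severity == "high":
        # High severity always reaches a human immediately, even if auto-fixed.
        actions.append("alert")
    else:
        actions.append("ticket")
        if not auto_fixable:
            # Nothing fixed it automatically, so track it with an owner and SLA.
            actions.append("backlog")
    return actions


if __name__ == "__main__":
    example = Finding("iam-unused-access-key", "low", "user/build-bot")
    print(example.rule_id, "->", route(example))
```

Keeping the allowlist explicit is a deliberate choice in this sketch: a fix is only automated once someone has decided it cannot break a running service.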
AWS Config vs Cloud Custodian: when to use each
AWS Config excels at AWS-native resource evaluation with managed rules and native remediation. Cloud Custodian is more flexible — supports multi-cloud (AWS, GCP, Azure), has richer action capabilities (email, Slack, tag, stop, quarantine), and can be version-controlled alongside your IaC. Use AWS Config for AWS-native compliance reporting; Cloud Custodian for multi-cloud or complex remediation workflows.
This topic comes up in interviews for Cloud Security Engineer, DevSecOps Engineer, and Platform Engineering roles, often in "design a secure cloud account baseline" questions or "how would you enforce least privilege at scale?" system design prompts.
Strong answer: Mentions IMDSv2 when discussing AWS instance security. Knows that Checkov or tfsec run in CI before terraform apply. Can explain OPA Rego policy syntax conceptually. Talks about golden AMIs or base Terraform modules as a way to bake in security defaults. Mentions CIS Benchmarks without prompting.
Red flags: Thinks "secure configuration" means having a security team review configs manually. Cannot name any IaC policy scanning tool. Has no answer for how to handle configuration drift. Thinks least privilege is a code concern rather than a configuration concern.
Key takeaways
What is configuration drift and why is it a security risk?
Configuration drift is when the actual state of a system diverges from its intended/documented state — usually through manual changes (console edits, SSH and fix) that bypass the IaC-managed config. It's a security risk because drifted configs may introduce vulnerabilities (opening a security group, disabling encryption) that don't appear in the IaC code and may go undetected for months.
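Drift detection boils down to comparing intended state with actual state. The Python sketch below does that for two nested dictionaries. The attribute names in the example are hypothetical, and the comparison is deliberately naive: it treats lists as opaque values and assumes string keys.

```python
def find_drift(intended: dict, actual: dict, prefix: str = "") -> list[str]:
    """Recursively compare two nested dicts and describe every difference."""
    drift = []
    for key in sorted(set(intended) | set(actual)):
        path = f"{prefix}{key}"
        if key not in actual:
            drift.append(f"{path}: missing (expected {intended[key]!r})")
        elif key not in intended:
            drift.append(f"{path}: unmanaged value {actual[key]!r}")
        elif isinstance(intended[key], dict) and isinstance(actual[key], dict):
            drift.extend(find_drift(intended[key], actual[key], path + "."))
        elif intended[key] != actual[key]:
            drift.append(f"{path}: expected {intended[key]!r}, found {actual[key]!r}")
    return drift


if __name__ == "__main__":
    # Hypothetical state: what the IaC declares vs. what is deployed.
    intended = {"encryption": {"enabled": True}, "ingress_cidrs": ["10.0.0.0/8"]}
    actual = {"encryption": {"enabled": False},
              "ingress_cidrs": ["10.0.0.0/8", "0.0.0.0/0"]}
    for line in find_drift(intended, actual):
        print("DRIFT", line)
```

With Terraform, the same idea is available without custom code: `terraform plan -detailed-exitcode` exits with status 2 when live resources differ from the configuration, which a scheduled job can turn into an alert.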
Why is "least privilege" considered a configuration property, not a code property?
Least privilege is about what a system or service is allowed to do — defined in IAM roles, security groups, RBAC, network policies — not in application code. A perfectly written application can be compromised if it runs with an over-permissive IAM role that allows it (or an attacker exploiting it) to access all S3 buckets or assume admin. Configuration defines the blast radius; code determines the attack surface.
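Because least privilege lives in configuration, it can be checked like any other configuration. As a small sketch, the Python function below flags IAM policy statements that allow a wildcard action on every resource. It assumes the standard IAM policy JSON shape, where `Statement`, `Action`, and `Resource` may each be a single value or a list. It ignores `NotAction`, `NotResource`, and conditions, so treat it as a first-pass filter rather than a full analysis; the function names are hypothetical.

```python
def as_list(value):
    """IAM allows a single value or a list in several places; normalize to a list."""
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list[dict]:
    """Return Allow statements that pair a wildcard action with Resource "*"."""
    flagged = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        # "*" grants everything; "s3:*" grants every action in one service.
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        if wildcard_action and "*" in resources:
            flagged.append(stmt)
    return flagged


if __name__ == "__main__":
    # Hypothetical policy: the first statement is the over-permissive one.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": "arn:aws:s3:::example-bucket/*"},
        ],
    }
    for stmt in overly_broad_statements(policy):
        print("Too broad:", stmt)
```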
What is the difference between Checkov/tfsec (IaC scanner) and AWS Config (continuous compliance)?
Checkov/tfsec scan Terraform or CloudFormation code in CI before resources are created — they prevent misconfigurations from ever reaching production. AWS Config evaluates live cloud resources continuously against rules — it detects misconfigurations that slipped through (console changes, API calls, drift) and alerts or auto-remediates. You need both: prevent during development, detect after deploy.
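One way to picture the two stages is a single rule evaluated at two enforcement points. In the Python sketch below, the same predicate (a security group that opens SSH to the internet) runs once against a Terraform plan in CI and once against a snapshot of live resources. The plan structure follows Terraform's JSON plan output; the live inventory shape is a hypothetical normalized form, not the actual format of AWS Config or any other tool.

```python
def allows_public_ssh(sg: dict) -> bool:
    """True if any ingress rule opens TCP port 22 (or all traffic) to 0.0.0.0/0."""
    for rule in sg.get("ingress") or []:
        protocol = rule.get("protocol")
        all_traffic = protocol == "-1"
        tcp_ssh = (protocol == "tcp"
                   and (rule.get("from_port") or 0) <= 22 <= (rule.get("to_port") or 0))
        if (all_traffic or tcp_ssh) and "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
            return True
    return False


def check_plan(plan: dict) -> list[str]:
    """Prevent: run in CI against `terraform show -json` output, before apply."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc.get("type") == "aws_security_group"
        and allows_public_ssh((rc.get("change") or {}).get("after") or {})
    ]


def check_live(inventory: list[dict]) -> list[str]:
    """Detect: run on a schedule against a snapshot of deployed security groups."""
    return [sg["id"] for sg in inventory if allows_public_ssh(sg)]


if __name__ == "__main__":
    # Hypothetical inputs for each stage.
    plan = {"resource_changes": [{
        "address": "aws_security_group.bastion", "type": "aws_security_group",
        "change": {"actions": ["create"], "after": {"ingress": [
            {"protocol": "tcp", "from_port": 22, "to_port": 22,
             "cidr_blocks": ["0.0.0.0/0"]}]}},
    }]}
    live = [{"id": "sg-0abc", "ingress": [
        {"protocol": "-1", "from_port": 0, "to_port": 0,
         "cidr_blocks": ["0.0.0.0/0"]}]}]
    print("blocked in CI:", check_plan(plan))
    print("found in production:", check_live(live))
```

The rule is identical in both places; what differs is when it runs and what it can see. The plan check never sees a console edit, and the live check only fires after the risky resource already exists.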
From the books
DevSecOps: A Leader's Guide to Producing Secure Software
Chapter 7: Securing the Infrastructure Layer
The book emphasizes that configuration security is a "force multiplier" — getting it right protects everything running on the infrastructure, regardless of code-level vulnerabilities. It recommends treating infrastructure configuration with the same code review standards as application code: pull request, automated scan, peer review, automated test before merge.
💡 Analogy
Configuration is the locks, windows, and alarm system of your house — and misconfiguration is accidentally leaving the back door open. The technology (the house) can be excellent, but if the lock is set to "anyone can enter," the house is not secure. The key insight: most configuration security isn't about clever technical controls — it's about systematically ensuring defaults are safe, changes are reviewed, and drift is detected and corrected quickly.
⚡ Core Idea
Secure configuration has three layers: prevent (define correct config in IaC, enforce with policy-as-code before apply), detect (continuous compliance scans find what slipped through), and remediate (alert or auto-correct within a defined SLA). Misconfiguration is a leading cause of cloud breaches not because the tools are bad, but because manually managing configuration at scale is not feasible: defaults get left on, and quick fixes get made and forgotten.
🎯 Why It Matters
Gartner predicted that through 2025, 99% of cloud security failures would be the customer's fault, mostly through misconfiguration rather than sophisticated attacks. Capital One ($80M fine, 100M records), Twitch (125GB source code leak), Facebook (540M records in a publicly accessible S3 bucket run by a third-party app developer): these weren't zero-days. They were misconfigurations of the kind that automated policy checks are designed to catch.