How to design systems with security built in — covering defense-in-depth, zero-trust architecture, threat modelling (STRIDE), security design patterns, and the principle of least privilege applied at every layer.
Security added after a system is designed is expensive, fragile, and incomplete. Many major breaches exploit architectural weaknesses (over-privileged services, flat networks, missing encryption boundaries), not just code bugs.
Without security architecture: systems are built with implicit trust between services, overly broad permissions, single security layers that fail catastrophically when breached, and no systematic analysis of what could go wrong.
With security architecture: security is designed in from the start. Every service has minimal permissions. Network segments limit blast radius. Threat modelling identifies risks before code is written. Multiple independent security layers mean no single failure compromises the whole system.
Defense-in-depth is the principle that security should be implemented in multiple independent layers. If one layer fails, others remain. It is the architectural equivalent of a bank vault: locked building → locked room → locked vault → locked safe deposit box.
```text
[ Perimeter ]    WAF, DDoS protection, CDN edge
      |
[ Network ]      VPC segmentation, Security Groups, Network Policies
      |
[ Identity ]     IAM least privilege, MFA, service accounts, zero-trust
      |
[ Application ]  Input validation, output encoding, SAST/DAST
      |
[ Data ]         Encryption at rest (AES-256), in transit (TLS 1.3)
      |
[ Endpoint ]     Container hardening, read-only FS, seccomp profiles
```
Each layer is independent: compromising one layer does not compromise the others.
Design each layer to assume the layers above it have been breached
The network layer should not trust that the WAF has blocked all attacks. The application layer should not trust that the network prevents all unauthorised access. The data layer should not trust that the application prevents all SQL injection. Each layer has its own controls.
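A minimal Python sketch of this principle, assuming a request arrives as a simple object and each layer is a hypothetical check function. The subnet, token table, and quantity limit are illustrative values, not taken from any real system. Every layer runs its own check, and none skips validation on the assumption that an earlier layer already caught the problem.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Hypothetical configuration, for illustration only.
ALLOWED_SOURCE_NETS = [ip_network("10.0.1.0/24")]  # network layer: expected app subnet
VALID_TOKENS = {"token-abc": "orders-service"}      # identity layer: token -> identity
MAX_ORDER_QTY = 100                                 # application layer: input bound


@dataclass
class Request:
    source_ip: str
    token: str
    quantity: int


def network_layer(req: Request) -> bool:
    # Only traffic from the expected subnet may reach the service.
    addr = ip_address(req.source_ip)
    return any(addr in net for net in ALLOWED_SOURCE_NETS)


def identity_layer(req: Request) -> bool:
    # Does not assume the network layer kept attackers out: the caller must still authenticate.
    return req.token in VALID_TOKENS


def application_layer(req: Request) -> bool:
    # Does not assume authenticated callers send sane input: validate it anyway.
    return isinstance(req.quantity, int) and 0 < req.quantity <= MAX_ORDER_QTY


LAYERS = [network_layer, identity_layer, application_layer]


def handle(req: Request) -> list[str]:
    """Return the names of the layers that rejected the request (empty list = allowed)."""
    return [layer.__name__ for layer in LAYERS if not layer(req)]
```

Evaluating every layer instead of stopping at the first failure is only for illustration: it shows that a request from inside the trusted subnet (for example, from a compromised host) is still rejected by the identity layer when it lacks a valid token.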
Zero-trust is the architectural principle of "never trust, always verify." No user, device, or service is trusted by default — even inside the corporate network.
Zero-trust core tenets
| Traditional perimeter model | Zero-trust model |
|---|---|
| Trust the internal network | Verify every request regardless of origin |
| VPN grants broad internal access | Micro-segmented access per resource per identity |
| Flat internal network | East-west traffic encrypted + inspected (mTLS) |
| Long-lived credentials | Short-lived tokens, just-in-time access |
| Access control at network edge only | Access control at every service, every request |
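The tenets in the table can be sketched in Python as per-request authorisation with short-lived, resource-scoped tokens. This assumes a hypothetical HMAC-signed token format (`identity|resource|expiry|signature`) and a shared demo key; production systems would more typically rely on a standard format such as signed JWTs or on mTLS identities, with keys held in a secret manager. Note that the caller's source IP is accepted but never consulted.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical demo key; real systems keep signing keys in a secret manager and rotate them.
SIGNING_KEY = b"demo-key-not-for-production"
TOKEN_TTL_SECONDS = 300  # short-lived: 5 minutes, no long-lived credentials


def _sign(payload: str) -> str:
    return hmac.new(SIGNING_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(identity: str, resource: str, now: Optional[float] = None) -> str:
    """Issue a token bound to one identity, one resource, and a short expiry."""
    issued = time.time() if now is None else now
    payload = f"{identity}|{resource}|{int(issued) + TOKEN_TTL_SECONDS}"
    return f"{payload}|{_sign(payload)}"


def authorise(token: str, resource: str, source_ip: str,
              now: Optional[float] = None) -> bool:
    """Verify every request on its own. source_ip is deliberately NOT consulted:
    arriving from the 'internal' network grants nothing."""
    current = time.time() if now is None else now
    parts = token.split("|")
    if len(parts) != 4:
        return False                                   # malformed token
    identity, token_resource, expiry, signature = parts
    expected = _sign(f"{identity}|{token_resource}|{expiry}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False                                   # forged or tampered token
    if current >= int(expiry):
        return False                                   # expired
    return token_resource == resource                  # scoped to a single resource
```

Binding each token to a single resource is what gives "micro-segmented access per resource per identity": a token minted for one database is useless against another, even from the same network segment.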
Threat modelling is the systematic process of identifying security risks during design — before code is written. STRIDE is the most widely used threat modelling framework.
| Letter | Threat | Example | Mitigation |
|---|---|---|---|
| S | Spoofing — impersonating a user or service | Fake JWT token, ARP spoofing | Strong authentication, mTLS, certificate pinning |
| T | Tampering — modifying data in transit or at rest | SQL injection, MITM modification | HMAC, signing, parameterised queries, TLS |
| R | Repudiation — denying performing an action | "I never deleted that record" | Audit logs, digital signatures |
| I | Information Disclosure — exposing sensitive data | Verbose error messages, exposed S3 | Encryption, access controls, error handling |
| D | Denial of Service — making service unavailable | DDoS, resource exhaustion, ReDoS | Rate limiting, auto-scaling, input size limits |
| E | Elevation of Privilege — gaining higher access | SQL injection → admin, IDOR | Input validation, access control, least privilege |
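Two rows of the table (Tampering and Elevation of Privilege) cite SQL injection, and parameterised queries are the standard mitigation. A minimal sketch using Python's built-in `sqlite3` module and a hypothetical `users` table, contrasting string interpolation with a `?` placeholder:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory demo database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: user input is spliced into the SQL text, so it can change the query's logic.
    return conn.execute(f"SELECT name, role FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised: the driver passes `name` as a value, never as SQL.
    return conn.execute("SELECT name, role FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row while the safe version returns none. On a real database server, pairing this with a least-privilege application user (no DROP or CREATE) limits the damage if an injection slips through elsewhere.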
Threat modelling process (4 steps)
1. Decompose the system: draw a Data Flow Diagram showing external entities, processes, data stores, and trust boundaries.
2. Identify threats: apply STRIDE to each element. What can be spoofed? Tampered with? Made unavailable?
3. Rate threats: prioritise with DREAD or CVSS. DREAD rates Damage, Reproducibility, Exploitability, Affected users, and Discoverability (typically 1 to 10 each) and averages the five ratings.
4. Mitigate: for each high-priority threat, define a mitigation and a test case that verifies it.
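Steps 2 and 3 can be sketched in Python. The element-to-threat mapping follows the commonly used STRIDE-per-element chart (external entities: S and R; processes: all six; data flows and data stores: T, I, and D), and the scoring uses the classic DREAD average. The element names in the example are hypothetical; treat the output as a checklist for discussion, not a finished threat model.

```python
from dataclasses import dataclass

# STRIDE-per-element: threat categories that typically apply to each DFD element type.
# Data stores holding audit logs are usually also checked for Repudiation.
STRIDE_PER_ELEMENT = {
    "external_entity": "SR",
    "process": "STRIDE",
    "data_flow": "TID",
    "data_store": "TID",
}

THREAT_NAMES = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information disclosure",
    "D": "Denial of service",
    "E": "Elevation of privilege",
}


@dataclass
class Element:
    name: str
    kind: str  # one of the STRIDE_PER_ELEMENT keys


def enumerate_threats(elements: list[Element]) -> list[tuple[str, str]]:
    """Step 2: list (element, threat) pairs to walk through in the session."""
    return [
        (e.name, THREAT_NAMES[letter])
        for e in elements
        for letter in STRIDE_PER_ELEMENT[e.kind]
    ]


def dread_score(damage: int, reproducibility: int, exploitability: int,
                affected_users: int, discoverability: int) -> float:
    """Step 3: classic DREAD averages five ratings, each from 1 to 10."""
    ratings = [damage, reproducibility, exploitability, affected_users, discoverability]
    if any(not 1 <= r <= 10 for r in ratings):
        raise ValueError("each DREAD rating must be between 1 and 10")
    return sum(ratings) / len(ratings)
```

For a diagram with one browser (external entity), one checkout API (process), the request between them (data flow), and an orders database (data store), `enumerate_threats` yields 2 + 6 + 3 + 3 = 14 pairs to discuss, and `dread_score(8, 6, 7, 9, 5)` returns `7.0`.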
STRIDE works best before a sprint: an hour at design time can prevent weeks of remediation
Threat modelling is most effective when applied to a new feature's design before implementation. A one-hour whiteboard session before a sprint produces security requirements. Running the same exercise after delivery surfaces the same problems when fixing them means reworking shipped code, which can cost many times as much to remediate.
Least privilege means every entity has only the minimum access required to perform its function.
| Layer | Least Privilege Application | Anti-pattern |
|---|---|---|
| IAM / Cloud | Service roles with only specific API calls on specific resources | AdministratorAccess on every service |
| Database | App user has SELECT, INSERT, UPDATE — no DROP, CREATE, pg_dump | App connects as the database superuser |
| Kubernetes RBAC | Pod ServiceAccount with only needed verbs and resources | Default ServiceAccount bound to cluster-admin |
| Network | Security Groups allow only required ports from specific sources | SG rule: 0.0.0.0/0 on all ports |
| OS / Container | Run as non-root, read-only filesystem, dropped capabilities | Run as root, privileged container |
| Human access | Just-in-time access with approval workflow, time-limited | Permanent admin access for all engineers |
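Some of the anti-patterns in this table are mechanical enough to check automatically. A minimal Python linter sketch, assuming simplified dictionary shapes: IAM statements use the real `Effect`/`Action`/`Resource` keys, but the security group rule (`cidr`, `from_port`, `to_port`) and container spec (`run_as_user`, `privileged`, `read_only_root_fs`) shapes are hypothetical simplifications, not the actual AWS or Kubernetes schemas.

```python
from typing import Any


def _as_list(value: Any) -> list:
    # IAM allows Action/Resource to be a single string or a list.
    return value if isinstance(value, list) else [value]


def lint_iam_statement(stmt: dict) -> list[str]:
    findings: list[str] = []
    if stmt.get("Effect") != "Allow":
        return findings
    for action in _as_list(stmt.get("Action", [])):
        if action == "*" or action.endswith(":*"):
            findings.append(f"wildcard action: {action}")
    for resource in _as_list(stmt.get("Resource", [])):
        if resource == "*" or resource.endswith(":::*"):
            findings.append(f"wildcard resource: {resource}")
    return findings


def lint_security_group_rule(rule: dict) -> list[str]:
    # Hypothetical simplified rule shape: {"cidr": ..., "from_port": ..., "to_port": ...}
    open_to_world = rule.get("cidr") in ("0.0.0.0/0", "::/0")
    all_ports = rule.get("from_port") == 0 and rule.get("to_port") == 65535
    if open_to_world and all_ports:
        return ["ingress open to the internet on all ports"]
    return []


def lint_container(spec: dict) -> list[str]:
    # Hypothetical simplified spec. A missing run_as_user is treated as root,
    # since many images default to running as root.
    findings = []
    if spec.get("run_as_user", 0) == 0:
        findings.append("runs as root")
    if spec.get("privileged", False):
        findings.append("privileged container")
    if not spec.get("read_only_root_fs", False):
        findings.append("writable root filesystem")
    return findings
```

Running checks like these in CI turns the "Anti-pattern" column into a failing build rather than a review comment.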
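Over-permissioned service accounts turn a single compromise into lateral movement
In the 2019 Capital One breach, the attacker used a server-side request forgery (SSRF) flaw in a misconfigured web application firewall running on EC2 to obtain the instance's IAM role credentials from the instance metadata service. That role could list and read S3 buckets far beyond what the service needed. A properly scoped role would have limited the blast radius to just that service's data.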
```hcl
# Least-privilege IAM role policy for a read-only API service (Terraform)
resource "aws_iam_role_policy" "api_minimal" {
  name = "api-service-minimal"
  role = aws_iam_role.api_service.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = ["s3:GetObject", "s3:ListBucket"]
        # Scope to the specific bucket ARN; NOT "arn:aws:s3:::*"
        Resource = [
          "arn:aws:s3:::myapp-assets-prod",  # the bucket itself (s3:ListBucket)
          "arn:aws:s3:::myapp-assets-prod/*" # objects in the bucket (s3:GetObject)
        ]
      },
      {
        Effect = "Allow"
        Action = ["secretsmanager:GetSecretValue"]
        # Scope to this app's secret path prefix; NOT all secrets
        Resource = "arn:aws:secretsmanager:us-east-1:123456:secret:prod/myapp/*"
      }
    ]
  })
}
```

```yaml
# Kubernetes: minimal RBAC for a service that only reads one ConfigMap
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: production
rules:
  - apiGroups: [""]                  # "" = core API group, where ConfigMaps live
    resources: ["configmaps"]
    resourceNames: ["myapp-config"]  # only this specific ConfigMap
    # NOT create/update/delete. "list"/"watch" are omitted: with resourceNames set,
    # they only authorise requests that carry a matching metadata.name field selector.
    verbs: ["get"]
```
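To see why the resource scoping above matters for blast radius, here is a deliberately simplified allow-only evaluator in Python. It matches actions and ARNs with shell-style `*` wildcards via `fnmatch.fnmatchcase`; real IAM evaluation also handles explicit Deny, conditions, and organisation-level policies, so this is an illustration rather than a model of AWS's engine. The `customer-records` bucket is hypothetical.

```python
from fnmatch import fnmatchcase

# Simplified allow-only policies (no Deny, Conditions, or SCPs). Hypothetical data.
SCOPED_POLICY = [
    {"Action": ["s3:GetObject", "s3:ListBucket"],
     "Resource": ["arn:aws:s3:::myapp-assets-prod", "arn:aws:s3:::myapp-assets-prod/*"]},
    {"Action": ["secretsmanager:GetSecretValue"],
     "Resource": ["arn:aws:secretsmanager:us-east-1:123456:secret:prod/myapp/*"]},
]
OVERBROAD_POLICY = [
    {"Action": ["s3:GetObject", "s3:ListBucket"], "Resource": ["arn:aws:s3:::*"]},
]


def is_allowed(policy: list[dict], action: str, resource: str) -> bool:
    """True if some statement matches both the action and the resource pattern."""
    return any(
        any(fnmatchcase(action, a) for a in stmt["Action"])
        and any(fnmatchcase(resource, r) for r in stmt["Resource"])
        for stmt in policy
    )
```

If the role's credentials are stolen, the over-broad policy lets the attacker read objects in any bucket, while the scoped policy confines them to `myapp-assets-prod` and the `prod/myapp/*` secrets.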
Several well-established patterns address recurring security design challenges.
Security architecture appears in staff/principal engineer interviews, security architect roles, and system design rounds for security-sensitive products. Expect whiteboard exercises applying STRIDE to a given architecture.
Strong answer: Names all six STRIDE threats, explains zero-trust as identity-centric rather than network-centric, and describes blast-radius containment via least-privilege IAM and network segmentation.
Red flags: Thinking a firewall alone is sufficient security, not knowing what threat modelling is, or conflating authentication with authorisation at the architecture level.
Apply STRIDE to a payment processing service. Give one threat per letter.
S (Spoofing): an attacker forges a payment request using a stolen API key.
T (Tampering): a MITM attack modifies the payment amount in transit.
R (Repudiation): a merchant denies initiating a refund; mitigate with a signed audit log.
I (Information Disclosure): a response includes the full card number; return only the last 4 digits.
D (Denial of Service): an attacker floods the payment endpoint to block legitimate transactions.
E (Elevation of Privilege): a read-only API token triggers a refund because a function-level authorisation check is missing.
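The Repudiation answer calls for an audit log with signatures. A minimal tamper-evident log sketch in Python, assuming a hypothetical key held only by the logging service: each entry carries an HMAC over its contents plus the previous entry's MAC, so editing, reordering, or removing an earlier entry breaks verification. Two limits worth stating: an HMAC only proves integrity to holders of the key, so true non-repudiation needs asymmetric signatures made with a key the actor controls (for example Ed25519); and truncating the newest entries needs an external anchor, such as periodically publishing the latest MAC.

```python
import hashlib
import hmac
import json

# Hypothetical key held by the audit-logging service, not by merchants or application code.
AUDIT_KEY = b"audit-demo-key"


def _mac(record: dict) -> str:
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(AUDIT_KEY, canonical, hashlib.sha256).hexdigest()


def append(log: list[dict], actor: str, action: str, detail: dict) -> None:
    """Append an entry that commits to the previous entry's MAC (a hash chain)."""
    prev = log[-1]["mac"] if log else "genesis"
    record = {"seq": len(log), "actor": actor, "action": action, "detail": detail, "prev": prev}
    log.append({**record, "mac": _mac(record)})


def verify(log: list[dict]) -> bool:
    """Detect edited, reordered, or removed (non-final) entries."""
    prev = "genesis"
    for i, entry in enumerate(log):
        record = {k: v for k, v in entry.items() if k != "mac"}
        if record.get("seq") != i or record.get("prev") != prev:
            return False
        if not hmac.compare_digest(entry["mac"].encode(), _mac(record).encode()):
            return False
        prev = entry["mac"]
    return True
```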
What is the difference between a flat network and a micro-segmented network in terms of security blast radius?
In a flat network, every service can reach every other service — if one is compromised, the attacker has network access to all databases and APIs. In a micro-segmented network, each service can only reach the services it legitimately needs (defined by NetworkPolicies, Security Groups, or service mesh). A compromise is contained to that service's segment — the attacker cannot pivot to a database the compromised service has no business reason to access.
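The difference can be made concrete by treating allowed connections as a directed graph and computing everything an attacker could reach by pivoting from one compromised service. A Python sketch with a hypothetical six-service topology; the segmented edges stand in for what NetworkPolicies or Security Groups would encode, and the search assumes the worst case that every reachable service can also be compromised.

```python
from collections import deque

# Hypothetical services and the connections each is allowed to open.
SERVICES = ["web", "orders", "payments", "orders-db", "payments-db", "reporting"]

# Flat network: every service can reach every other service.
FLAT = {s: [t for t in SERVICES if t != s] for s in SERVICES}

# Micro-segmented: only the edges each service has a business reason to use.
SEGMENTED = {
    "web": ["orders"],
    "orders": ["orders-db", "payments"],
    "payments": ["payments-db"],
    "reporting": ["orders-db"],
    "orders-db": [],
    "payments-db": [],
}


def blast_radius(graph: dict[str, list[str]], compromised: str) -> set[str]:
    """Breadth-first search: every service reachable by pivoting hop by hop."""
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {compromised}
```

Compromising `reporting` exposes all five other services on the flat network but only `orders-db` on the segmented one; `payments-db` stays out of reach.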
From the books
Threat Modeling: Designing for Security (Adam Shostack, Wiley)
Chapter 3: STRIDE per Element; Chapter 11: Threat Modelling in Practice
The definitive book on threat modelling by the creator of the SDL threat modelling tool at Microsoft. Covers STRIDE in depth with practical worked examples.
Zero Trust Networks (Evan Gilman & Doug Barth, O'Reilly)
Practical guide to designing and implementing zero-trust architecture with real-world case studies.
💡 Analogy
Security architecture is like town planning, not just building locks
⚡ Core Idea
A lock on a door is a control. Security architecture is the decision to put the bank vault in the basement, behind a lobby, with a guard, cameras, silent alarm, and police monitoring. Each layer slows down, detects, and contains — not just stops. No single lock is trusted to stop everything.
🎯 Why It Matters
Individual security controls can all be bypassed. Architectural security means bypassing one control does not give an attacker everything. The Capital One breach happened because multiple layers of weakness (over-privileged IAM + reachable IMDS + no anomaly detection) aligned simultaneously.