Why declarative IaC exists, how Terraform's plan/apply/state workflow operates, what the state file is and why losing it is catastrophic, Terraform vs CloudFormation vs Pulumi trade-offs, and the module and remote state patterns used in production.
Infrastructure as Code is the practice of defining and managing cloud resources through version-controlled configuration files rather than manual console clicks or one-off CLI commands. The core problems it addresses are not about convenience: they are configuration drift, environments that cannot be reproduced, and changes that bypass review.
Problems IaC solves that manual operations cannot
Declarative vs imperative IaC
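Terraform, CloudFormation, and Pulumi follow a declarative model: you describe what you want (the desired state), and the tool computes the diff against current state and executes only the necessary changes. Shell scripts, and Ansible playbooks written as a sequence of steps, describe how to get there (imperative). Declarative IaC is idempotent: applying the same configuration a second time makes no changes, provided nothing has drifted in between. Imperative scripts are not idempotent by default: running "aws ec2 run-instances" twice launches two sets of instances, and running "aws ec2 create-security-group" twice with the same group name fails on the second call with a duplicate-group error instead of converging. Declarative is strongly preferred for infrastructure definitions.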
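To make the declarative style concrete, here is a minimal sketch of a security group expressed as desired state. It assumes the AWS provider is already configured; the variable `vpc_id`, the resource name `web`, and the group name `web-sg` are hypothetical and chosen only for illustration.

```hcl
# Hypothetical input: the ID of an existing VPC.
variable "vpc_id" {
  type        = string
  description = "ID of the VPC that will contain the security group"
}

# Desired state: one security group named "web-sg" in that VPC.
# Terraform decides whether to create, update, or leave it alone.
resource "aws_security_group" "web" {
  name        = "web-sg"
  description = "Allow inbound HTTPS"
  vpc_id      = var.vpc_id

  ingress {
    description = "HTTPS from anywhere"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "All outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Applying this once creates the group. Applying it again without editing the configuration should report no changes, because the recorded state already matches what is declared.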
Terraform's workflow has three phases: write configuration (HCL), run terraform plan to see what will change, run terraform apply to make it so. The state file is what enables the plan to know the difference between "this resource needs to be created" and "this resource already exists and needs to be updated."
The Terraform workflow in a team environment
1. Write HCL configuration defining resources. Commit to a feature branch.
2. Open a pull request. CI runs terraform fmt -check (formatting) and terraform validate (syntax and internal consistency check).
3. CI runs terraform plan against a remote state backend. The plan output is posted as a PR comment showing exact changes.
4. Team reviews the plan. Security team checks for overly permissive IAM policies or public S3 buckets.
5. After approval, CI runs terraform apply using the reviewed plan. State file is updated automatically.
6. Monitor the apply output and confirm resources are created correctly. Tag the release.
The state file is the source of truth — treat it like a production database
Terraform state (terraform.tfstate) records the mapping between your HCL resource definitions and the actual cloud resources they manage. If the state file is deleted, Terraform no longer knows which resources it manages. Running terraform apply after a state deletion can create duplicate resources, or worse — Terraform may try to create resources that already exist and fail midway, leaving the environment in an inconsistent state. The state file can also contain plaintext secrets (database passwords, API keys) as values of sensitive resource attributes. Never store state in a local file in a team environment. Never commit state to git.
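When state is lost but the cloud resources still exist, they can be adopted back into a fresh state instead of being recreated. The following is a minimal sketch, assuming Terraform 1.5 or later (the version that introduced `import` blocks) and a hypothetical existing bucket called `my-existing-bucket-name`; the resource address `aws_s3_bucket.my_bucket` is likewise illustrative.

```hcl
# Tell Terraform that an existing bucket should be tracked under
# the address aws_s3_bucket.my_bucket rather than created anew.
import {
  to = aws_s3_bucket.my_bucket
  id = "my-existing-bucket-name" # hypothetical bucket name
}

# The resource definition the imported bucket is bound to.
# Its arguments should describe the bucket as it really exists,
# otherwise the next plan will propose changes.
resource "aws_s3_bucket" "my_bucket" {
  bucket = "my-existing-bucket-name"
}
```

Running terraform plan shows the pending import alongside any differences between the declared and real configuration. Once the import has been applied, the `import` block can be deleted.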
Remote state backend requirements for teams
    # terraform/main.tf: example remote state backend configuration
    terraform {
      required_version = ">= 1.5.0"
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
      }

      # Always use a remote backend in team environments: local state
      # causes corruption and secrets leakage.
      backend "s3" {
        bucket  = "my-company-terraform-state"
        key     = "production/vpc/terraform.tfstate"
        region  = "us-east-1"
        encrypt = true # SSE-S3 encryption
        # The DynamoDB table prevents concurrent applies from corrupting state.
        dynamodb_table = "terraform-state-locks" # DynamoDB for state locking
        # Never hardcode credentials here; use an IAM role or env vars
      }
    }

    # Example VPC resource using a module
    module "vpc" {
      source  = "terraform-aws-modules/vpc/aws"
      version = "5.1.0"

      name = "production-vpc"
      cidr = "10.0.0.0/16"
      azs  = ["us-east-1a", "us-east-1b", "us-east-1c"]

      private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
      public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

      enable_nat_gateway = true
      # single_nat_gateway = false is more expensive but required for
      # AZ-resilient private subnet egress.
      single_nat_gateway   = false # One NAT per AZ for resilience
      enable_dns_hostnames = true

      tags = {
        Environment = "production"
        ManagedBy   = "terraform"
      }
    }

    # View current state without applying
    # terraform state list
    # terraform state show module.vpc.aws_vpc.this[0]

    # Import a resource created outside Terraform (emergency console fix)
    # terraform import aws_s3_bucket.my_bucket my-existing-bucket-name

    # Detect drift between state and actual infrastructure
    # terraform plan -refresh-only
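The backend configuration above assumes that the bucket and the lock table already exist. One possible way to provision them is sketched below, typically from a separate bootstrap configuration that is applied once. The resource names `tf_state` and `tf_locks` are hypothetical; the bucket and table names are taken from the backend block. The one hard requirement is that the lock table's partition key is a string attribute named `LockID`.

```hcl
# Bucket that stores the state objects.
resource "aws_s3_bucket" "tf_state" {
  bucket = "my-company-terraform-state"
}

# Versioning keeps earlier copies of the state so a bad write
# or accidental deletion can be rolled back.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Default SSE-S3 encryption, since state can contain secrets.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

# State must never be publicly readable.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table used by the S3 backend; the key must be "LockID" (string).
resource "aws_dynamodb_table" "tf_locks" {
  name         = "terraform-state-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own small configuration avoids a circular dependency in which the backend's storage is managed by the very state it holds.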
The IaC ecosystem has three primary tools, each with different design philosophies, ecosystem maturity, and operational trade-offs. Knowing when to use each — and what pain points each brings — is essential knowledge for cloud engineers.
| Dimension | Terraform (HCL) | CloudFormation (YAML/JSON) | Pulumi (Python/TypeScript/Go) |
|---|---|---|---|
| Language | HCL (domain-specific) | YAML / JSON | General-purpose (Python, TS, Go, C#) |
| Multi-cloud | Yes (900+ providers) | AWS only | Yes (major providers) |
| State management | External (S3 + DynamoDB or TF Cloud) | AWS-managed (S3 internally) | Pulumi Cloud or self-managed |
| Drift detection | terraform plan -refresh-only | CloudFormation Drift Detection | pulumi refresh |
| Destroy protection | prevent_destroy lifecycle rule | DeletionPolicy: Retain | protect option on resource |
| Nested / modular | Modules (local and registry) | Nested stacks | Stacks + component resources |
| Testing | Terratest, terraform test (1.6+) | cfn-lint, taskcat | Pulumi testing SDK |
| Secrets handling | Sensitive attribute, not encrypted in state | SSM/Secrets Manager references | Encrypted secrets in state |
| Learning curve | Low (HCL is simple) | Medium (YAML is verbose) | Low for existing programmers, high for ops |
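On the Terraform side, the destroy protection and secrets handling rows of the table can be illustrated with a short sketch. It assumes a configured AWS provider; the variable `db_password`, the identifier `production-db`, and the instance sizing are hypothetical values picked for the example.

```hcl
# Marking a variable as sensitive redacts it from plan and apply
# output. The value is still written to the state file unencrypted,
# which is why the state backend itself must be protected.
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "main" {
  identifier        = "production-db" # hypothetical name
  engine            = "postgres"
  instance_class    = "db.t3.micro"
  allocated_storage = 20
  username          = "app"
  password          = var.db_password

  skip_final_snapshot       = false
  final_snapshot_identifier = "production-db-final"

  lifecycle {
    # Any plan that would destroy this resource fails with an error
    # until this setting is removed from the configuration.
    prevent_destroy = true
  }
}
```

Note that `prevent_destroy` only guards against destruction through Terraform; it does not stop someone from deleting the database in the console.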
Choose Terraform for multi-cloud or when the team knows cloud infrastructure; CloudFormation when you need native AWS integrations
Terraform wins on multi-cloud and community module ecosystem. CloudFormation wins when you need native AWS integrations (StackSets for multi-account, Service Catalog, CloudFormation Hooks for policy enforcement) and AWS-managed state. Pulumi wins when your team is composed of developers who want to write real code with conditionals, loops, and unit tests rather than declarative configuration. For a team new to IaC, Terraform is the safest default due to the richest ecosystem and community.
Always use terraform plan -out to prevent plan drift between CI and apply
Running terraform plan and then terraform apply in separate CI steps can cause the apply to execute a different plan if infrastructure changed between the two steps. Use terraform plan -out=tfplan to save the exact plan, then terraform apply tfplan to execute exactly that plan. This is the correct production workflow.
As an IaC codebase grows, unstructured HCL becomes just as hard to maintain as unstructured code. Module patterns, workspace strategies, and the monorepo vs repo-per-service question significantly affect how productive your team is with Terraform over time.
IaC organisation patterns in production
Large state files are a reliability risk — split them early
A single Terraform state file managing 500 resources has two problems: (1) every plan or apply must lock the entire state, serialising all IaC operations across the team; (2) a corrupted or lost state file leaves all 500 resources unmanaged at once. A commonly cited rule of thumb is to keep each state file to roughly 100–150 resources. Split by logical domain: networking state, ECS cluster state, database state, IAM state.
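Once state is split by domain, one configuration usually needs values owned by another. A minimal sketch of reading them through the `terraform_remote_state` data source follows. It assumes the networking configuration stores its state at the bucket and key shown earlier and exports an output named `vpc_id`; that output name and the security group `app` are assumptions made for the example.

```hcl
# In the networking configuration (a separate root module), an
# output like this is assumed to exist:
#
#   output "vpc_id" {
#     value = module.vpc.vpc_id
#   }

# In the application configuration: read the networking state
# without managing any of its resources.
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-company-terraform-state"
    key    = "production/vpc/terraform.tfstate"
    region = "us-east-1"
  }
}

# Use the exported VPC ID from the other state file.
resource "aws_security_group" "app" {
  name        = "app-sg" # hypothetical name
  description = "Application security group"
  vpc_id      = data.terraform_remote_state.network.outputs.vpc_id
}
```

Only root-level outputs are exposed this way, so the networking configuration controls exactly what other teams can depend on. Anyone who can read the remote state can read everything in it, so access to the state bucket should be scoped accordingly.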
Cloud engineer, DevOps engineer, and platform engineer interviews universally ask about IaC. Both conceptual questions ("what is state?") and operational scenarios ("apply failed halfway — now what?") are common. Senior roles expect module design and team workflow patterns.
Common questions:
Try this question: Is there an existing IaC codebase or is this greenfield? How many engineers will be applying Terraform? Is the workload multi-cloud or AWS-only? Are there compliance requirements for audit logging of infrastructure changes?
Strong answer: Mentions S3 versioning and DynamoDB locking unprompted when describing remote state. Explains the -out flag for plan/apply separation. Talks about splitting state files by domain for blast radius reduction. Mentions terraform state list and terraform import for state recovery scenarios.
Red flags: Does not know what the state file is. Suggests storing state in git. Cannot explain what happens when apply fails partway through. Treats Terraform and Ansible as equivalent tools (they solve different problems).
Key takeaways
💡 Analogy
Infrastructure as Code is the architectural blueprint for your building. Terraform is the construction crew that reads the blueprint and builds what it specifies. The state file is the "as-built drawing" — a record of what was actually constructed, including which bolts were used, where every pipe runs, and what deviates from the original plan. Without the as-built drawing, even the original architect cannot safely renovate the building, because they do not know if what was built matches what was designed. Lose the as-built drawing and the crew treats the building as unmeasured — they cannot safely add or remove anything without risking structural damage.
⚡ Core Idea
Terraform computes the difference between desired state (HCL) and current state (state file + cloud API), then executes only the changes needed to reconcile them. The state file is the bridge between code and reality. Without it, Terraform cannot know which resources it manages.
🎯 Why It Matters
IaC is non-negotiable at any serious scale: it enables PR review for infrastructure, reproducible environments, disaster recovery, and drift detection. But the state file creates a new category of operational risk — it is a security-sensitive, business-critical file that must be stored securely, backed up, locked against concurrent access, and never deleted. Understanding the plan/apply/state workflow and its failure modes separates engineers who use Terraform as a productivity tool from those who create new categories of outage with it.