Ansible, Chef, Puppet, and SaltStack enforce desired state across fleets of servers — eliminating configuration drift and replacing error-prone manual SSH sessions with repeatable, auditable automation.
Before configuration management tools existed, engineers maintained servers by SSHing in and running commands manually. Every engineer had slightly different habits. Every server slowly diverged. This is called configuration drift — and it is the silent killer of production reliability.
Desired State vs Imperative Commands
Imperative: "run these commands in this order." Declarative/desired state: "this is how the machine should look." Configuration management tools take the declarative approach — you describe the end state, and the tool figures out how to get there. If the state is already correct, the tool does nothing.
The core concept is idempotency: running a playbook or recipe once or 100 times leaves the server in the same end state. This means you can safely re-run configuration on every server, on a schedule, without fear of breaking things that are already correct.
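The difference is easiest to see in two tasks that aim for the same result. In this sketch, the `webservers` group and the `/etc/app.conf` path are hypothetical. The two tasks appear side by side only for comparison; a real playbook would keep just the second one.

```yaml
# idempotency-demo.yml: two ways to set the same config line (hypothetical file)
- hosts: webservers
  become: true
  tasks:
    # NOT idempotent: each run appends another copy of the line,
    # and the task reports "changed" every time.
    - name: Append log level (imperative style)
      ansible.builtin.shell: echo "log_level=info" >> /etc/app.conf

    # Idempotent: lineinfile ensures a line matching the regexp exists with
    # this content. The first run may report "changed"; later runs report "ok".
    - name: Ensure log level is set (desired state)
      ansible.builtin.lineinfile:
        path: /etc/app.conf
        regexp: '^log_level='
        line: log_level=info
        create: true
```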
The Golden Rule of Config Management
No human should ever SSH into a production server and make a change without capturing that change in configuration management. "Just this once" exceptions compound into drift. If you made a change manually, your next step is to encode it in the playbook.
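For example, suppose an on-call engineer installed `jq` and lowered `vm.swappiness` by hand on one server. A minimal sketch of encoding both changes so every server gets them, assuming Debian/Ubuntu hosts; the file name `60-swappiness.conf` and the value `10` are illustrative, not recommendations.

```yaml
# Encodes two manual changes as desired state for the whole fleet.
- hosts: all
  become: true
  tasks:
    - name: Install jq (previously installed by hand)
      ansible.builtin.apt:
        name: jq
        state: present

    - name: Persist vm.swappiness (previously set with sysctl -w)
      ansible.builtin.copy:
        dest: /etc/sysctl.d/60-swappiness.conf
        content: "vm.swappiness = 10\n"
        owner: root
        group: root
        mode: "0644"
      notify: Reload sysctl

  handlers:
    # Runs only when the file above actually changed.
    - name: Reload sysctl
      ansible.builtin.command: sysctl --system
```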
Configuration management tools fall into two architectural camps: push (controller pushes changes to nodes) and pull (nodes pull their config from a central server). Each has trade-offs that affect how you operate at scale.
| Model | How It Works | Tools | Pros | Cons |
|---|---|---|---|---|
| Push | Central controller SSHs into nodes and applies config on demand | Ansible | Agentless (no software on nodes), simple to start, great for ad-hoc runs | Controller must reach all nodes; slower at scale; no continuous drift correction |
| Pull | Agent on each node polls central server every N minutes and applies its config | Chef, Puppet, SaltStack (pull mode) | Scales well, continuous drift correction, nodes self-heal | Agents must be installed and maintained; more infrastructure to run |
In practice, Ansible dominates thanks to its simplicity and agentless design, which makes it a good fit for teams starting out or managing heterogeneous environments. Chef and Puppet are preferred at large enterprises (thousands of nodes), where continuous pull-based convergence and enterprise support matter. SaltStack offers both push and pull; its minions keep a persistent ZeroMQ connection to the master, which lets it execute changes across large fleets quickly.
Ansible Playbook Example
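```yaml
---
# nginx.yml
- hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      apt:
        name: nginx
        state: present

    - name: Deploy config
      template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      notify: Restart nginx   # triggers the handler only if the file changed

  handlers:
    # Runs at the end of the play, and only when notified
    - name: Restart nginx
      service:
        name: nginx
        state: restarted
```

Running:

```
ansible-playbook nginx.yml -i inventory
```

This is idempotent: after the first run, later runs find nginx installed and the config unchanged, so nothing is modified and nginx is not restarted. Running it 10 times has the same effect as running it once.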
Configuration drift happens when the actual state of a server diverges from its intended state. It starts small — a developer adds a debug flag, an on-call engineer installs a package, a failed deploy leaves a half-configured file. Over time, every server becomes a unique snowflake.
Signs You Have a Drift Problem
"Works on my server but not theirs." Intermittent failures that only affect some instances. "Do not restart that server — something will break." Engineers afraid to scale because new instances behave differently. These are all symptoms of unmanaged configuration drift.
Configuration management solves drift through continuous convergence: run your tool on a schedule (Puppet and Chef agents typically run every 30 minutes; with Ansible you schedule runs via cron or a CI pipeline) and any drift gets corrected automatically. Some tools also have report-only modes that show divergence without changing anything, such as Ansible's --check, Puppet's --noop, and Chef's why-run mode. These are useful for compliance audits.
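A report-only drift check can be sketched with Ansible's per-task `check_mode` and `diff` keywords. This minimal sketch assumes the `nginx.conf.j2` template from the playbook example earlier and a `webservers` group; the file name `drift-report.yml` is hypothetical.

```yaml
# drift-report.yml: reports drift without changing anything on the hosts.
- hosts: webservers
  become: true
  tasks:
    - name: Compare live nginx.conf against the template (no changes made)
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
      check_mode: true   # simulate only
      diff: true         # print a diff when the file would change
      register: nginx_conf

    - name: Flag hosts whose config has drifted
      ansible.builtin.debug:
        msg: "Drift detected in /etc/nginx/nginx.conf on {{ inventory_hostname }}"
      when: nginx_conf.changed
```

For a whole playbook, `ansible-playbook nginx.yml -i inventory --check --diff` gives the same kind of report without writing a separate file.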
Beyond simple playbooks, production Ansible use involves a few key concepts: inventory (which servers exist), roles (reusable bundles of tasks), variables (per-environment settings), Vault (encrypted secrets), and CI integration.
1. Inventory: Define your servers in static files or dynamic inventory sources (for example, the AWS and GCP inventory plugins). Group them: [webservers], [databases], [workers]. Playbooks target groups.
2. Roles: Organize tasks into reusable roles such as roles/nginx/tasks/main.yml and roles/nginx/templates/nginx.conf.j2. Roles are composable: a playbook can apply multiple roles to a host.
3. Variables: Use group_vars/ and host_vars/ to customize behavior per environment. For example, dev sets debug: true and prod sets debug: false. Same role, different behavior.
4. Vault: Encrypt secrets with ansible-vault encrypt_string and store the encrypted values in vars files. Supply the password at run time with --ask-vault-pass or --vault-password-file. Secrets live in version control, safely encrypted.
5. CI integration: Run ansible-playbook --check (a dry run) in CI on every PR, adding --diff to show what would change. Gate merges on successful dry runs.
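Here is a minimal sketch of how inventory, group variables, Vault, and roles fit together. Several files are shown in one block, separated by `---`, with their paths in comments. The host names, the `common` and `postgresql` roles, and the `db_password` variable are hypothetical, and the Vault ciphertext is a placeholder for the real output of `ansible-vault encrypt_string`.

```yaml
# inventory.yml: groups of (hypothetical) hosts
all:
  children:
    webservers:
      hosts:
        web1.example.com:
        web2.example.com:
    databases:
      hosts:
        db1.example.com:
    prod:
      children:
        webservers:
        databases:
---
# group_vars/prod.yml: applies to every host in the prod group
debug: false
# Produced by: ansible-vault encrypt_string '<secret>' --name db_password
db_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  <placeholder: ciphertext printed by ansible-vault>
---
# site.yml: each play applies composable roles to a group
- hosts: webservers
  become: true
  roles:
    - common       # hypothetical baseline role
    - nginx        # roles/nginx/tasks/main.yml, roles/nginx/templates/nginx.conf.j2

- hosts: databases
  become: true
  roles:
    - common
    - postgresql   # hypothetical role
```

Run it with `ansible-playbook -i inventory.yml site.yml --ask-vault-pass`. Ansible picks up group_vars/ automatically when it sits next to the inventory file or the playbook.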
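A minimal sketch of the CI gate, assuming GitHub Actions as the CI system, a repository secret named `ANSIBLE_VAULT_PASSWORD` (hypothetical), and a runner that already has SSH access to the target hosts. That access matters because `--check` still connects to every host to compare state; key setup is not shown.

```yaml
# .github/workflows/ansible-dry-run.yml (assumes GitHub Actions)
name: ansible-dry-run
on: pull_request

jobs:
  dry-run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install Ansible
        run: pip install ansible

      # Vault password comes from a repository secret (hypothetical name).
      - name: Write vault password file
        run: echo "$VAULT_PASS" > .vault_pass
        env:
          VAULT_PASS: ${{ secrets.ANSIBLE_VAULT_PASSWORD }}

      # --check simulates the run and --diff shows what would change.
      # A non-zero exit code fails the job, which can be used to block the merge.
      - name: Dry run against the inventory
        run: ansible-playbook -i inventory.yml site.yml --check --diff --vault-password-file .vault_pass
```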
Test Your Playbooks with Molecule
Molecule is the standard testing framework for Ansible roles. It spins up Docker containers or VMs, runs your role, then verifies the result. Add Molecule tests to every role: they catch broken playbooks before they reach production.
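As a sketch of what Molecule's verification step can look like, here is a `verify.yml` for the nginx role, assuming Molecule's default scenario layout (`molecule/default/`) and its Ansible verifier. The specific checks are illustrative.

```yaml
# molecule/default/verify.yml: runs after Molecule applies the role to a
# test instance; the play fails if any assertion fails.
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect installed packages
      ansible.builtin.package_facts:
        manager: auto

    - name: Assert nginx is installed
      ansible.builtin.assert:
        that:
          - "'nginx' in ansible_facts.packages"
        fail_msg: nginx package is missing

    - name: Look up the rendered config file
      ansible.builtin.stat:
        path: /etc/nginx/nginx.conf
      register: nginx_conf

    - name: Assert the config file exists
      ansible.builtin.assert:
        that:
          - nginx_conf.stat.exists
```

`molecule test` runs the full sequence, including an idempotence step that applies the role a second time and fails if any task still reports a change.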
Most teams start with Ansible because it requires no agents — just Python on the managed nodes and SSH access from the controller.
1. Audit one server: Pick one production server. Document every package, file, and service that is not default. This is your baseline.
2. Write a playbook: Translate that baseline into an Ansible playbook. Run it against a fresh VM and verify the result matches the production server.
3. Run against all servers: Apply the playbook to your entire fleet. Fix any failures; these reveal existing drift.
4. Automate execution: Set up a cron job or CI pipeline that runs the playbook on a schedule. Now drift is continuously corrected.
5. Code review all changes: Mandate that all server changes go through the playbook via pull request. No more raw SSH changes.
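For the audit step, Ansible can gather part of the baseline for you. A minimal sketch, assuming a hypothetical inventory host named `baseline_host`; it records installed packages and service states only, so non-default files still need to be reviewed by hand.

```yaml
# baseline.yml: saves package and service inventories from one host
# to files next to this playbook on the controller.
- hosts: baseline_host
  become: true
  tasks:
    - name: Gather installed packages
      ansible.builtin.package_facts:
        manager: auto

    - name: Gather service states
      ansible.builtin.service_facts:

    - name: Save the sorted package list on the controller
      ansible.builtin.copy:
        content: "{{ ansible_facts.packages | list | sort | to_nice_yaml }}"
        dest: "{{ playbook_dir }}/baseline-{{ inventory_hostname }}-packages.yml"
      delegate_to: localhost
      become: false

    - name: Save the service states on the controller
      ansible.builtin.copy:
        content: "{{ ansible_facts.services | to_nice_yaml }}"
        dest: "{{ playbook_dir }}/baseline-{{ inventory_hostname }}-services.yml"
      delegate_to: localhost
      become: false
```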
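For the automation step, one option is a cron entry on the control node, itself managed by Ansible. A minimal sketch, assuming the playbooks live in a hypothetical `/opt/infra` Git checkout and that `ansible-playbook` is on cron's PATH; if site.yml uses Vault, the job would also need a `--vault-password-file`.

```yaml
# converge-cron.yml: installs a cron entry on the control node.
- hosts: localhost
  connection: local
  tasks:
    - name: Converge the fleet every 30 minutes
      ansible.builtin.cron:
        name: ansible-converge   # identifies the entry so reruns update it instead of duplicating it
        minute: "*/30"
        job: >-
          cd /opt/infra && git pull --ff-only &&
          ansible-playbook -i inventory.yml site.yml
          >> "$HOME/ansible-converge.log" 2>&1
```

Pulling the repository before each run means the schedule always applies whatever was last merged through code review.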
In DevOps and SRE interviews, config management comes up in "how do you manage server fleets" questions. System design rounds may ask how you ensure consistency across hundreds of instances. Behavioral questions may probe past incidents caused by configuration drift.
Strong answer: Explaining idempotency clearly. Knowing push vs pull trade-offs. Mentioning Ansible Vault for secrets. Understanding that config management, IaC, and containers are complementary, not competing. Describing how continuous execution prevents drift.
Red flags: Thinking config management is obsolete because of containers (containers solve the app layer; VMs, bare metal, and OS config still need it). Confusing Ansible playbooks with shell scripts (idempotency and desired state are the key differences). Not knowing what idempotency means.
Key takeaways
💡 Analogy
Configuration management is like a building inspector with a checklist. Every week, the inspector visits every apartment (server) and checks: Is the smoke detector installed? Is the fire extinguisher present? Is the door lock working? If anything is missing or broken, the inspector fixes it on the spot. The inspector does not care who broke what — they just make sure every apartment matches the spec. You (the landlord) never need to manually visit apartments — you trust the inspector to keep everything consistent.
⚡ Core Idea
Describe the desired state of your servers once, in code. Run the tool to converge every server to that state. Idempotency means runs are safe to repeat. Drift is automatically corrected. No more snowflake servers.
🎯 Why It Matters
Configuration drift is insidious — it builds slowly and only becomes visible during incidents. By the time "that server" has a problem, it may have hundreds of undocumented manual changes. Configuration management makes servers boring and predictable, which is exactly what you want in production.