What are common Terraform anti-patterns and best practices for large teams?
Quick Answer
Anti-patterns: one giant monolithic state file for the entire org (a mistake in one team's resource can block or corrupt everyone's plan, and applies get slow); hardcoded values instead of variables/data sources; unpinned provider/module versions causing surprise breakage; secrets committed to `.tfvars`; manual console changes alongside Terraform-managed resources, causing drift. Best practices: split state per environment/service to limit blast radius, pin all versions, enforce `fmt`/`validate`/`plan` in CI before merge, use remote state with locking, keep modules small and composable, and require PR review for every `apply`-triggering change — treat infrastructure changes with the same rigor as application code.
Detailed Answer
Interviewers often ask this to see whether you've actually operated Terraform at scale versus only used it on a solo project — the failure modes here are specific and recurring.
Common anti-patterns
1. One monolithic state file for the entire organization.
# Anti-pattern: every team's resources in one root module / one state file
resource "aws_vpc" "shared" { ... }
resource "aws_eks_cluster" "team_a" { ... }
resource "aws_rds_cluster" "team_b" { ... }
# ...hundreds more, all sharing one terraform.tfstate
Every team's resources live in a single apply, so a mistake anywhere blocks (or corrupts) everyone, applies get slower as the resource count grows, and the blast radius of any single change is enormous.
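One way out of a single shared state, sketched under assumptions: each team's root module keeps its own state and reads only the outputs another root module publishes. The bucket name, state key, region, and the `vpc_id` output below are hypothetical placeholders, not values from any real setup.

```hcl
# compute/main.tf (hypothetical root module with its own state file)

# Read only the outputs that the network root module chose to publish.
# Assumes the network configuration declares: output "vpc_id" { ... }
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-org-terraform-state"         # hypothetical bucket name
    key    = "prod/network/terraform.tfstate" # hypothetical state key
    region = "us-east-1"                      # hypothetical region
  }
}

# This resource lives in the compute state; a bad change here cannot
# touch the resources tracked in the network state.
resource "aws_security_group" "app" {
  name   = "app"
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

The trade-off is an explicit contract between states: the consuming team can see only what the producing team exposes as outputs.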
2. Hardcoded values instead of variables/data sources.
# Anti-pattern
resource "aws_instance" "web" {
  ami = "ami-0abcdef1234567890" # only valid in one region, one account
}
# Better
resource "aws_instance" "web" {
  ami = data.aws_ami.latest.id # resolved per-environment via a data source
}
Account IDs, AMI IDs, and CIDR ranges baked directly into resource blocks make the same configuration impossible to reuse across environments and force copy-paste-and-edit instead of parameterization.
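The "Better" snippet above refers to `data.aws_ami.latest` without defining it. A minimal sketch of what that lookup might look like, alongside parameterized replacements for a hardcoded account ID and CIDR range, follows. The AMI owner and name pattern and the variable name `vpc_cidr` are assumptions chosen for illustration.

```hcl
# Look up the newest matching image in whichever region and account the
# provider is configured for, instead of pasting an AMI ID.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed name pattern
  }
}

# Resolve the account ID at plan time rather than hardcoding it.
data "aws_caller_identity" "current" {}

locals {
  account_id = data.aws_caller_identity.current.account_id
}

# Supply the CIDR per environment (for example from prod.tfvars).
variable "vpc_cidr" {
  type        = string
  description = "CIDR range for this environment's VPC"
}
```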
3. Unpinned provider/module versions.
# Anti-pattern: no ref, no version — silently tracks whatever is newest
module "vpc" {
  source = "git::https://github.com/my-org/modules.git//vpc"
}
# Better
module "vpc" {
  source = "git::https://github.com/my-org/modules.git//vpc?ref=v2.3.0"
}
A bare source with no ?ref=, or no version constraint on a provider, means the next terraform init -upgrade can silently pull in breaking changes.
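For modules pulled from a registry rather than git, pinning uses the `version` argument instead of `?ref=`. The sketch below assumes the public `terraform-aws-modules/vpc/aws` module and an illustrative constraint; the `name` and `cidr` values are placeholders.

```hcl
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # assumed constraint: any 5.x release, never 6.0

  name = "example"     # placeholder
  cidr = "10.0.0.0/16" # placeholder
}
```

Provider selections are additionally recorded in `.terraform.lock.hcl`, so committing that file keeps every teammate and CI run on the same provider builds until someone deliberately runs `terraform init -upgrade`. The lock file covers providers only, which is why module sources still need their own pin.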
4. Secrets committed to .tfvars or hardcoded in .tf files. Permanently exposes credentials in git history — see the secrets-management question for the fix (pull from a secrets manager or inject via TF_VAR_* in CI).
5. Manual console changes alongside Terraform-managed resources. Causes drift that erodes trust in plan output over time (see the drift-detection question).
6. No plan review step — applying directly from a local machine without anyone else seeing the diff first.
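The two fixes named in item 4 could look roughly like this. The variable name `db_password` and the secret ID are hypothetical; the AWS Secrets Manager data source is one possible secrets manager, not the only choice.

```hcl
# Option A: declare the secret as an input and let CI inject it, for
# example by exporting TF_VAR_db_password in the pipeline environment.
variable "db_password" {
  type      = string
  sensitive = true # redacts the value in plan/apply output
}

# Option B: read the secret from a secrets manager at plan time.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/db/password" # hypothetical secret name
}

locals {
  db_password_from_manager = data.aws_secretsmanager_secret_version.db.secret_string
}
```

Either way the value can still end up in the state file, so this only works alongside an encrypted, access-restricted remote backend. `sensitive = true` hides the value from CLI output; it does not keep it out of state.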
Best practices for large teams
- Split state by environment and by service/domain, not one file per org — this limits blast radius and lets teams operate independently:
    environments/
      prod/
        network/   # own state
        compute/   # own state
        data/      # own state
- Pin every version (providers, modules, and the Terraform CLI itself):
    terraform {
      required_version = ">= 1.7.0, < 2.0.0"
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
      }
    }
- Enforce `fmt`, `validate`, linting (tflint/tfsec/checkov), and `plan` review in CI before any merge that would trigger an `apply`.
- Use a remote backend with locking, always, even for small teams; the moment more than one person touches a configuration, local state is a liability.
- Keep modules small, composable, and independently versioned, with a clear, minimal interface (variables in, outputs out).
- Require PR review for every change that can trigger `apply`, with mandatory approval gates for production, mirroring the rigor applied to application code.
- Restrict console/manual access to Terraform-managed resources so drift can't creep in silently.
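As an illustration of the remote-backend and split-state items above, here is a minimal sketch of an S3 backend with locking for one component. The bucket, key, region, and table names are hypothetical, and other backends provide locking in their own ways.

```hcl
# environments/prod/network/backend.tf (hypothetical path)
terraform {
  backend "s3" {
    bucket  = "my-org-terraform-state"         # hypothetical bucket name
    key     = "prod/network/terraform.tfstate" # one key per component
    region  = "us-east-1"                      # hypothetical region
    encrypt = true                             # encrypt state at rest

    # Lock table so two applies cannot write the same state at once.
    # Newer Terraform releases also offer S3-native locking through
    # use_lockfile; check the backend docs for the pinned CLI version.
    dynamodb_table = "terraform-locks" # hypothetical table name
  }
}
```

Giving `compute/` and `data/` their own `key` values is what turns the directory layout into separate state files.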
Interview-ready summary
Nearly every anti-pattern above boils down to treating infrastructure code with less rigor than application code. The fix is almost always "apply the same engineering discipline (review, versioning, testing, isolation) that you'd already insist on for an application codebase."