What are common Terraform anti-patterns and best practices for large teams?

Detailed Answer

Interviewers often ask this to see whether you've actually operated Terraform at scale or only used it on a solo project; the failure modes here are specific and recurring.

Common anti-patterns

1. One monolithic state file for the entire organization.

```hcl
# Anti-pattern: every team's resources in one root module / one state file
resource "aws_vpc" "shared" { ... }
resource "aws_eks_cluster" "team_a" { ... }
resource "aws_rds_cluster" "team_b" { ... }
# ...hundreds more, all sharing one terraform.tfstate
```

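Every team's resources live in a single state, so one team's apply holds the state lock and blocks everyone else, a bad apply or corrupted state file hits every team at once, plans and applies slow down as the resource count grows (every plan refreshes every resource), and the blast radius of any single change is enormous.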

2. Hardcoded values instead of variables/data sources.

```hcl
# Anti-pattern
resource "aws_instance" "web" {
  ami = "ami-0abcdef1234567890"   # only valid in one region, one account
}

# Better
resource "aws_instance" "web" {
  ami = data.aws_ami.latest.id     # resolved per-environment via a data source
}
```

Account IDs, AMI IDs, and CIDR ranges baked directly into resource blocks make the same configuration impossible to reuse across environments and force copy-paste-and-edit instead of parameterization.
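The "Better" snippet references data.aws_ami.latest without defining it. A minimal sketch of that data source follows, assuming you want the newest Amazon Linux 2023 x86_64 image; the name pattern is an assumption to adapt to your OS. Because the lookup runs against whichever region the AWS provider is configured for, the same code resolves to the correct AMI ID in every region.

```hcl
# Hypothetical example: newest Amazon Linux 2023 x86_64 image in the provider's region.
data "aws_ami" "latest" {
  most_recent = true
  owners      = ["amazon"] # only images published by Amazon itself

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed naming pattern; adjust for your OS
  }

  filter {
    name   = "architecture"
    values = ["x86_64"]
  }
}
```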

3. Unpinned provider/module versions.

```hcl
# Anti-pattern: no ref, no version, so it silently tracks whatever is newest
module "vpc" {
  source = "git::https://github.com/my-org/modules.git//vpc"
}

# Better
module "vpc" {
  source = "git::https://github.com/my-org/modules.git//vpc?ref=v2.3.0"
}
```

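A git source with no ?ref= tracks the default branch, and a provider with no version constraint resolves to the newest release, so a fresh terraform init (for example on a new CI runner) or any terraform init -upgrade can silently pull in breaking changes. Commit the .terraform.lock.hcl dependency lock file as well, so every machine installs the same provider versions.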
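For modules pulled from a Terraform registry, the module block's version argument does the job that ?ref= does for git sources. A sketch using the public terraform-aws-modules/vpc/aws module as an illustration; the constraint is an example value and all VPC inputs are omitted:

```hcl
# Registry modules accept a version constraint; git sources rely on ?ref= instead.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0" # any 5.x release (>= 5.0, < 6.0); never moves to 6.0 on its own
}
```

The dependency lock file covers providers only, not modules, so this constraint is the only thing holding a registry module in place. Use an exact version instead of ~> if you want no movement at all.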

4. Secrets committed to .tfvars or hardcoded in .tf files. Permanently exposes credentials in git history — see the secrets-management question for the fix (pull from a secrets manager or inject via TF_VAR_* in CI).
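A minimal sketch of the two fixes named above, assuming AWS; the variable name db_password and the secret ID are hypothetical. Keep in mind that sensitive only hides a value from CLI output: anything read by a data source or passed into a resource argument is still recorded in state, so the remote backend must be encrypted and access-controlled.

```hcl
# Option 1: a sensitive input supplied from CI, e.g. by exporting
# TF_VAR_db_password in the pipeline, never from a committed .tfvars file.
variable "db_password" {
  type      = string
  sensitive = true # redacted from plan/apply output, not from state
}

# Option 2: read the secret from AWS Secrets Manager at plan/apply time.
data "aws_secretsmanager_secret_version" "db" {
  secret_id = "prod/app/db-password" # hypothetical secret name
}

locals {
  # In practice pick one source; both appear here only for comparison.
  db_password = data.aws_secretsmanager_secret_version.db.secret_string
}
```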

5. Manual console changes alongside Terraform-managed resources. Causes drift that erodes trust in plan output over time (see the drift-detection question).

6. No plan review step — applying directly from a local machine without anyone else seeing the diff first.

Best practices for large teams

Split state by environment and by service/domain, not one file per org — this limits blast radius and lets teams operate independently:
```
environments/
  prod/
    network/    # own state
    compute/    # own state
    data/       # own state
```
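Each directory in that layout is its own root module with its own backend key, and stacks share data through outputs rather than through a shared state file. A sketch assuming an S3 backend; the bucket name, lock-table name, and the private_subnet_ids output are hypothetical:

```hcl
# environments/prod/network/backend.tf
terraform {
  backend "s3" {
    bucket         = "my-org-tf-state"                # hypothetical bucket
    key            = "prod/network/terraform.tfstate" # unique key per root module
    region         = "us-east-1"
    dynamodb_table = "tf-state-locks"                 # hypothetical lock table
    encrypt        = true
  }
}

# environments/prod/compute/network.tf
# Read the network stack's outputs (read-only) without sharing its state file.
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "my-org-tf-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

# Usage, assuming the network stack declares output "private_subnet_ids":
#   subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
```

Recent Terraform releases can also lock through S3 itself (use_lockfile = true) instead of a DynamoDB table; check which option your Terraform version supports.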

Pin every version — providers, modules, and the Terraform CLI itself:

```hcl
terraform {
  required_version = ">= 1.7.0, < 2.0.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
```

Enforce terraform fmt -check, terraform validate, linting (tflint/tfsec/checkov), and plan review in CI before any merge that would trigger an apply.

Always use a remote backend with state locking, even for small teams: the moment more than one person touches a configuration, local state is a liability.

Keep modules small, composable, and independently versioned, with a clear, minimal interface (variables in, outputs out).

Require PR review for every change that can trigger an apply, with mandatory approval gates for production, mirroring the rigor applied to application code.

Restrict console/manual access to Terraform-managed resources so drift can't creep in silently.
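A small module with a deliberately narrow interface might look like the sketch below; the s3-bucket module, its inputs, and the v1.0.0 tag are hypothetical. Inputs are typed and validated, the single output exposes only what callers need, and the caller pins a tagged release.

```hcl
# modules/s3-bucket/variables.tf (hypothetical module)
variable "name" {
  type        = string
  description = "Globally unique bucket name."

  validation {
    condition     = length(var.name) >= 3 && length(var.name) <= 63
    error_message = "S3 bucket names must be 3-63 characters long."
  }
}

variable "tags" {
  type        = map(string)
  description = "Tags applied to every resource in the module."
  default     = {}
}

# modules/s3-bucket/main.tf
resource "aws_s3_bucket" "this" {
  bucket = var.name
  tags   = var.tags
}

# modules/s3-bucket/outputs.tf
output "arn" {
  description = "Bucket ARN, for callers that need to grant access."
  value       = aws_s3_bucket.this.arn
}

# Caller in a root module, pinned to a tagged release (hypothetical path and tag)
module "logs_bucket" {
  source = "git::https://github.com/my-org/modules.git//s3-bucket?ref=v1.0.0"
  name   = "my-org-prod-logs"
  tags   = { team = "platform" }
}
```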

Interview-ready summary

Nearly every anti-pattern above boils down to treating infrastructure code with less rigor than application code. The fix is almost always to apply the same engineering discipline (review, versioning, testing, isolation) that you'd already insist on for an application codebase.
