What are common Terraform anti-patterns and best practices for large teams?

6 min · advanced · terraform · best-practices · anti-patterns

Quick Answer

Anti-patterns: one giant monolithic state file for the entire org (a mistake in one team's resource can block or corrupt everyone's plan, and applies get slow); hardcoded values instead of variables/data sources; unpinned provider/module versions causing surprise breakage; secrets committed to `.tfvars`; manual console changes alongside Terraform-managed resources, causing drift. Best practices: split state per environment/service to limit blast radius, pin all versions, enforce `fmt`/`validate`/`plan` in CI before merge, use remote state with locking, keep modules small and composable, and require PR review for every `apply`-triggering change — treat infrastructure changes with the same rigor as application code.

Detailed Answer

Interviewers often ask this to see whether you have actually operated Terraform at scale rather than only used it on a solo project; the failure modes here are specific and recurring.

Common anti-patterns

1. One monolithic state file for the entire organization.

# Anti-pattern: every team's resources in one root module / one state file
resource "aws_vpc" "shared" { ... }
resource "aws_eks_cluster" "team_a" { ... }
resource "aws_rds_cluster" "team_b" { ... }
# ...hundreds more, all sharing one terraform.tfstate

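Every team's resources share one state and one apply, so a mistake anywhere blocks (or corrupts) everyone. Each plan has to refresh every resource and hold the single state lock, so runs get slower as the resource count grows and teams queue behind each other. The blast radius of any single change is enormous.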

2. Hardcoded values instead of variables/data sources.

# Anti-pattern
resource "aws_instance" "web" {
  ami = "ami-0abcdef1234567890"   # only valid in one region, one account
}

# Better
resource "aws_instance" "web" {
  ami = data.aws_ami.latest.id     # resolved per-environment via a data source
}

Account IDs, AMI IDs, and CIDR ranges baked directly into resource blocks make the same configuration impossible to reuse across environments and force copy-paste-and-edit instead of parameterization.
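The "Better" snippet above refers to data.aws_ami.latest without defining it. A minimal sketch of what that lookup and its parameters might look like follows. The variable names, the default owner, and the Amazon Linux 2023 name pattern are illustrative assumptions, not part of the original configuration.

```hcl
# Hypothetical inputs: each environment supplies its own values
# through a .tfvars file or TF_VAR_* variables.
variable "ami_owner" {
  type        = string
  description = "Account ID or alias that publishes the trusted image"
  default     = "amazon" # assumption for illustration
}

variable "instance_type" {
  type    = string
  default = "t3.micro" # assumption for illustration
}

# Resolve the newest matching image in whatever region and account
# the provider is configured for, instead of hardcoding an AMI ID.
data "aws_ami" "latest" {
  most_recent = true
  owners      = [var.ami_owner]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"] # assumed name pattern
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest.id
  instance_type = var.instance_type
}
```

One trade-off to be aware of: with most_recent = true the resolved ID changes whenever a newer image is published, and a changed ami forces the instance to be replaced on the next apply. Teams that want stricter control can pass a reviewed AMI ID in as a variable per environment instead.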

3. Unpinned provider/module versions.

# Anti-pattern: no ref, so every fresh init fetches whatever is on the default branch
module "vpc" {
  source = "git::https://github.com/my-org/modules.git//vpc"
}

# Better
module "vpc" {
  source = "git::https://github.com/my-org/modules.git//vpc?ref=v2.3.0"
}

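A Git source with no ?ref= follows the repository's default branch, and a provider with no version constraint resolves to the newest release, so a fresh terraform init (or terraform init -upgrade) can silently pull in breaking changes. Committing .terraform.lock.hcl keeps provider selections stable between runs, but the lock file does not cover modules, so module sources still need an explicit ref or version.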
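Modules pulled from a registry are pinned differently from Git sources: they take a version argument, which is not valid on Git or local sources. The sketch below assumes the public terraform-aws-modules/vpc/aws module and an arbitrary version, name, and CIDR chosen only for illustration.

```hcl
# Registry module: pinned with the version argument.
module "vpc_from_registry" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.0.0" # exact pin; the version number is an assumption

  name = "example-vpc"  # hypothetical
  cidr = "10.0.0.0/16"  # hypothetical
}

# Git module: pinned with ?ref=, which accepts a tag, branch, or commit.
# A tag or commit is preferable to a branch, since a branch keeps moving.
module "vpc_from_git" {
  source = "git::https://github.com/my-org/modules.git//vpc?ref=v2.3.0"
}
```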

4. Secrets committed to .tfvars or hardcoded in .tf files. Permanently exposes credentials in git history — see the secrets-management question for the fix (pull from a secrets manager or inject via TF_VAR_* in CI).
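A minimal sketch of the TF_VAR_* approach mentioned in item 4, assuming a hypothetical variable named db_password and an illustrative database resource. The variable has no default, so the value has to come from the environment (for example TF_VAR_db_password set by the CI system) rather than from a committed file.

```hcl
# No default on purpose: the value is supplied at run time,
# for example through the TF_VAR_db_password environment variable.
variable "db_password" {
  type        = string
  sensitive   = true # redacts the value in plan and apply output
  description = "Database admin password, injected by CI"
}

# Illustrative resource; all names and sizes here are assumptions.
resource "aws_db_instance" "app" {
  identifier          = "app-db"
  engine              = "postgres"
  instance_class      = "db.t3.micro"
  allocated_storage   = 20
  username            = "app_admin"
  password            = var.db_password
  skip_final_snapshot = true
}
```

Marking a variable sensitive only hides it from CLI output. The value is still written to the state file, which is one more reason to keep state in an encrypted, access-controlled remote backend.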

5. Manual console changes alongside Terraform-managed resources. Causes drift that erodes trust in plan output over time (see the drift-detection question).

6. No plan review step. Applying directly from a local machine, without anyone else seeing the diff first, lets unintended changes or destroys go unnoticed until they are live.

Best practices for large teams

  • Split state by environment and by service/domain, not one file per org — this limits blast radius and lets teams operate independently:
    environments/
      prod/
        network/    # own state
        compute/    # own state
        data/       # own state
    
  • Pin every version — providers, modules, and the Terraform CLI itself:
    terraform {
      required_version = ">= 1.7.0, < 2.0.0"
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 5.0"
        }
      }
    }
    
  • Enforce fmt, validate, linting (tflint/tfsec/checkov), and plan review in CI before any merge that would trigger an apply.
  • Use a remote backend with locking, always, even for small teams — the moment more than one person touches a configuration, local state is a liability.
  • Keep modules small, composable, and independently versioned, with a clear, minimal interface (variables in, outputs out).
  • Require PR review for every change that can trigger apply, with mandatory approval gates for production, mirroring the rigor applied to application code.
  • Restrict console/manual access to Terraform-managed resources so drift can't creep in silently.
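The per-stack state layout and the locked remote backend from the list above could be sketched as follows. This assumes an S3 backend with a DynamoDB lock table. The bucket, table, region, and output names are hypothetical, and the two parts belong to two separate root modules.

```hcl
# --- environments/prod/network/backend.tf ---
# Each stack gets its own key, and therefore its own state and lock.
terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state"         # hypothetical
    key            = "prod/network/terraform.tfstate" # unique per stack
    region         = "us-east-1"                      # hypothetical
    dynamodb_table = "terraform-locks"                # hypothetical lock table
    encrypt        = true
  }
}

# --- environments/prod/compute/main.tf ---
# The compute stack reads the network stack's outputs without sharing
# its state. This assumes the network stack declares an output "vpc_id".
data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "my-org-terraform-state"
    key    = "prod/network/terraform.tfstate"
    region = "us-east-1"
  }
}

resource "aws_security_group" "app" {
  name   = "app-sg" # hypothetical
  vpc_id = data.terraform_remote_state.network.outputs.vpc_id
}
```

With this layout a compute change locks and plans only the compute state, so a broken network change cannot block it. Recent Terraform releases also offer S3-native locking through a use_lockfile setting; check the backend documentation for the CLI version you pin before choosing between that and the DynamoDB table.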

Interview-ready summary

Nearly every anti-pattern above boils down to treating infrastructure code with less rigor than application code. The fix is almost always "apply the same engineering discipline (review, versioning, testing, isolation) that you'd already insist on for an application codebase."
