Terraform Fundamentals & HCL

Before Infrastructure as Code, provisioning meant clicking through a cloud console (or running one-off scripts) to create a VPC, a few servers, a load balancer, and some IAM roles. That approach — often called ClickOps — has several structural problems that show up as soon as more than one person or environment is involved:

  1. No source of truth. The only record of "how staging is configured" is whatever's currently live in the console. If someone changes a setting by hand, there's no diff, no history, and often no idea who changed it or why.
  2. Non-reproducible environments. Standing up a second identical environment (a new region, a disaster-recovery copy, a fresh dev sandbox) means manually repeating dozens of clicks and hoping nothing was missed. In practice, environments drift apart over time ("works in staging, breaks in prod").
  3. No review process. A risky change to production infrastructure — deleting a security group rule, resizing a database — happens directly, with no equivalent of a pull request or code review before it takes effect.
  4. Tribal knowledge. Institutional knowledge about why something is configured a certain way lives in people's heads or old Slack threads, not in a durable, searchable artifact.

Terraform addresses this by making infrastructure declarative and versioned:

resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"

  tags = {
    Name = "web-server"
  }
}

  • You describe the desired end state in HCL files that live in git, alongside application code.
  • terraform plan computes what would change before anything happens, the infrastructure equivalent of a diff that can be code-reviewed.
  • terraform apply executes that plan and records the result in a state file, so Terraform knows what it manages and can detect drift on later runs.
  • The same configuration can be reused across environments (dev/staging/prod) with different variable values, which keeps them structurally consistent.
  • Because it's just text, it goes through the same PR review, CI checks, and audit trail as application code.
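As a minimal sketch of the reuse point above, the earlier instance can be parameterized so that one configuration serves several environments. The variable names and the per-environment values file mentioned afterwards are illustrative assumptions, not part of the original example.

```hcl
# Hypothetical inputs: each environment supplies its own values.
variable "environment" {
  type        = string
  description = "Deployment environment name, e.g. dev, staging, or prod"
}

variable "instance_type" {
  type        = string
  description = "EC2 instance size for this environment"
  default     = "t3.micro"
}

resource "aws_instance" "web" {
  ami           = "ami-0abcdef1234567890" # placeholder AMI ID, as in the example above
  instance_type = var.instance_type

  tags = {
    Name        = "web-${var.environment}"
    Environment = var.environment
  }
}
```

Each environment would then carry only a small values file (say, a hypothetical prod.tfvars setting environment and instance_type), passed with terraform apply -var-file=prod.tfvars, while the resource definitions stay identical.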

The net effect: infrastructure changes become reviewable, repeatable, and auditable — the same guarantees you already expect from application code, applied to the servers, networks, and services that code runs on.

It's easy to lump Terraform in with Ansible, Chef, and Puppet as "DevOps automation tools," but they solve different layers of the problem, and understanding the boundary is a common interview probe.

Provisioning vs. configuration management

  • Terraform (provisioning) talks to cloud/platform APIs to create the resources themselves: a VPC, an EC2 instance, an RDS database, a Kubernetes cluster, a DNS record. It answers "what infrastructure exists?"
  • Ansible/Chef/Puppet (configuration management) operate inside a machine that already exists: installing packages, writing config files, starting services, managing users. They answer "what software is running on this box, and how is it configured?"

Other structural differences

|              | Terraform | Ansible |
|--------------|-----------|---------|
| Model        | Declarative, state-tracked | Primarily imperative task lists (though idempotent by convention) |
| Target       | Cloud/platform APIs | Machines (via SSH/WinRM) or APIs |
| State        | Maintains a state file mapping config → real resources | Stateless: re-runs tasks and relies on each task being idempotent |
| Typical unit | A resource (VM, subnet, bucket) | A task (install package, template a file) |

How they're used together

A common pattern: Terraform provisions the VM/cluster and passes it a minimal bootstrap (cloud-init or a startup script), then Ansible (or Chef/Puppet) takes over to configure the software stack running inside it. In containerized/Kubernetes-first shops, this split often disappears entirely — Terraform provisions the cluster and Kubernetes manifests (or Helm) replace the configuration-management layer.

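variable "ami_id" {
  type = string # AMI to launch, supplied per environment
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.medium"
  user_data     = file("${path.module}/bootstrap.sh") # runs at first boot; hands off to a config tool or just starts the app
}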
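One way to make the hand-off concrete, sketched under the assumption that the configuration tool connects to the instance over its public IP, is to expose the address as an output. The output name is hypothetical.

```hcl
# Exposes the address of the instance defined above so that a
# configuration-management tool can be pointed at it after apply.
output "app_public_ip" {
  description = "Address a configuration tool can connect to over SSH"
  value       = aws_instance.app.public_ip
}
```

After an apply, terraform output -raw app_public_ip prints the bare value, which a wrapper script could write into an Ansible inventory.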

The interview-ready summary: Terraform creates the machine; Ansible configures what's on it. They're complementary, not competing.

Terraform's day-to-day usage boils down to a five-step loop. Understanding what each step actually does (not just its name) is essential.

1. Write

You author .tf files describing resources, variables, and outputs:

resource "aws_s3_bucket" "assets" {
  bucket = "my-app-assets"
}
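Since the step mentions variables and outputs as well as resources, here is a small sketch that extends the bucket above with one of each. The variable and output names are assumptions chosen for illustration.

```hcl
# Input variable: lets each environment choose its own bucket name.
variable "assets_bucket_name" {
  type        = string
  description = "Globally unique name for the assets bucket"
  default     = "my-app-assets"
}

resource "aws_s3_bucket" "assets" {
  bucket = var.assets_bucket_name
}

# Output: exposes the bucket ARN after apply, for people or other tooling.
output "assets_bucket_arn" {
  value = aws_s3_bucket.assets.arn
}
```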

2. terraform init

Prepares the working directory:

  • Downloads the providers declared in required_providers (and records their exact versions/checksums in .terraform.lock.hcl).
  • Downloads any modules referenced by source.
  • Configures the backend (where state will be stored — local file, S3, Terraform Cloud, etc.).

You re-run init whenever you add a provider, add a module, or change the backend config.
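The pieces that init acts on can be sketched in one file. The bucket name and state key below are hypothetical, and the module shown is the community VPC module from the public registry, used here only as an example of a source reference.

```hcl
terraform {
  # Backend: where state is stored. init configures this.
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical state bucket
    key    = "my-app/terraform.tfstate" # hypothetical path within the bucket
    region = "us-east-1"
  }
}

# Module: init downloads the code referenced by source into .terraform/modules.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "my-app"
  cidr = "10.0.0.0/16"
}
```

Editing the backend block or adding the module block are both changes that require running terraform init again before the next plan.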

3. terraform plan

  • Refreshes Terraform's view of real infrastructure (unless disabled).
  • Diffs the current state against your configuration.
  • Produces an execution plan listing which resources will be created, updated in place, or destroyed and recreated, without making any changes.

The output ends with a summary line such as:

Plan: 2 to add, 1 to change, 0 to destroy.

This is the "code review" step — plan output is what gets read/approved before anything happens.
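Whether a change shows up as an in-place update or as a destroy-and-recreate depends on the provider's schema for the argument being changed. A hedged sketch using the instance from earlier: the comments describe the AWS provider's documented behavior, but the plan output is always the authority.

```hcl
resource "aws_instance" "web" {
  # Changing the AMI forces replacement: plan reports the instance
  # as "must be replaced" (destroy, then create).
  ami           = "ami-0abcdef1234567890"
  instance_type = "t3.micro"

  # Changing tags is an in-place update: the existing instance is kept.
  tags = {
    Name = "web-server"
  }

  # Optional: when a replacement is needed, create the new instance
  # before destroying the old one.
  lifecycle {
    create_before_destroy = true
  }
}
```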

4. terraform apply

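  • Re-runs the plan, or reuses a saved plan file if you pass one (for example, terraform plan -out=tfplan followed by terraform apply tfplan).
  • Prompts for confirmation (yes) unless you pass -auto-approve (typically only in CI after a review gate) or apply a saved plan file, which runs without a prompt.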
  • Executes the necessary provider API calls in dependency order.
  • Writes the results back into the state file.
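"Dependency order" comes from references between resources rather than from the order of blocks in the file. A minimal sketch, with illustrative CIDR ranges:

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_subnet" "app" {
  # Referencing aws_vpc.main.id creates an implicit dependency,
  # so the VPC is created before the subnet (and destroyed after it).
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"
}
```

Resources with no references between them can be created in parallel. An explicit depends_on is only needed when a dependency exists that Terraform cannot see from the arguments.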

5. terraform destroy

  • Computes a plan that destroys every resource tracked in the current state (it is a shortcut for terraform apply -destroy).
  • Used to tear down throwaway environments (feature branches, temporary test infra) — rarely run against production directly.
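For resources that should never be swept up in a teardown, one common guard is the prevent_destroy lifecycle setting. The sketch below assumes a hypothetical bucket holding data worth protecting.

```hcl
resource "aws_s3_bucket" "audit_logs" {
  bucket = "example-audit-logs" # hypothetical bucket name

  lifecycle {
    # Any plan that would destroy this resource, including one produced
    # by terraform destroy, fails with an error instead of proceeding.
    prevent_destroy = true
  }
}
```

The guard only applies while the block remains in the configuration, so it complements review and access controls rather than replacing them.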

The loop in practice

In real projects this isn't a strict five-step waterfall — you cycle write → plan → apply continuously as you iterate, with init re-run only when dependencies change, and destroy reserved for cleanup. The discipline of always running plan before apply (and reading its output) is what makes Terraform changes safe and predictable.

Terraform's core binary knows nothing about AWS, Azure, GCP, Kubernetes, or any other platform. All of that platform-specific knowledge lives in providers — separate plugin binaries that Terraform Core talks to over a well-defined RPC protocol.

Declaring a provider

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

  • source is the provider's registry address, written as [<hostname>/]<namespace>/<type>. When the hostname is omitted, it defaults to the public Terraform Registry (registry.terraform.io).
  • version is a constraint (~> 5.0 means any 5.x release, i.e. >= 5.0 and < 6.0), preventing an unreviewed major-version upgrade from silently changing behavior.
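The constraint syntax has a few common forms. This sketch shows two of them side by side; the second provider and the specific version numbers are chosen only for illustration.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0, < 6.0" # explicit range, equivalent to "~> 5.0"
    }

    random = {
      source  = "hashicorp/random"
      version = "~> 3.6.0" # patch releases only: >= 3.6.0 and < 3.7.0
    }
  }
}
```

With ~>, only the rightmost component given is allowed to increase, so adding a patch digit tightens the constraint from "any 3.x" to "any 3.6.x".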

What terraform init does with this

  1. Resolves the version constraint against what's available in the registry (or a private/mirrored registry).
  2. Downloads the matching provider binary into .terraform/providers/.
  3. Records the exact version and checksums of the provider package in .terraform.lock.hcl. This lock file is committed to git so that every teammate and every CI run installs the same provider release, much as package-lock.json pins npm dependencies.
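For orientation, a lock file entry has roughly the following shape. The version number is illustrative and the hash values are placeholders, not real checksums; the file is generated by terraform init and is not meant to be edited by hand.

```hcl
# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.

provider "registry.terraform.io/hashicorp/aws" {
  version     = "5.31.0" # illustrative: the exact version init selected
  constraints = "~> 5.0" # copied from required_providers
  hashes = [
    "h1:<placeholder-hash>",
    "zh:<placeholder-hash>",
  ]
}
```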

How the plugin model works at runtime

  • Each provider exposes a schema describing every resource type and data source it supports (aws_instance, aws_s3_bucket, google_compute_instance, etc.), including which arguments exist, their types, and whether changing them forces resource replacement.
  • When you run plan/apply, Terraform Core spawns the provider as a subprocess and communicates over gRPC: "here's the desired config for this resource, here's the last known state — tell me what changed and then perform the create/update/destroy."
  • The provider is what actually calls the platform's API under the hood (for AWS, through the AWS SDK for Go).

Why this design matters

Because everything platform-specific is isolated behind this plugin boundary, the same Terraform language, workflow, and state model works identically whether you're managing AWS resources, a Kubernetes cluster, a Datadog monitor, or a GitHub repository — you just declare a different provider. It's also why community and third-party providers can exist for almost any API with a stable schema, without ever touching Terraform's core codebase.
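To illustrate that only the provider changes, here is a sketch that manages a GitHub repository with the same block structure used for AWS above. The organization and repository names are hypothetical.

```hcl
terraform {
  required_providers {
    github = {
      source  = "integrations/github"
      version = "~> 6.0"
    }
  }
}

provider "github" {
  owner = "example-org" # hypothetical organization; the token is read from the GITHUB_TOKEN environment variable
}

resource "github_repository" "docs" {
  name        = "docs"
  description = "Managed by Terraform"
  visibility  = "private"
}
```

The init, plan, and apply steps are unchanged; only the provider behind them differs.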

These two block types look similar (both reference a provider's schema and expose attributes), but they mean opposite things for who owns the object's lifecycle.

resource — Terraform manages it

resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

  • Terraform creates this VPC, tracks it in state, and will update or destroy it if the configuration changes or the block is removed.
  • It appears in terraform plan as something Terraform will act on.

data — Terraform only reads it

data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "web" {
  ami           = data.aws_ami.latest_amazon_linux.id
  instance_type = "t3.micro"
}

  • Terraform performs a read-only lookup against the provider on every plan/apply; it does not create, modify, or delete the underlying object.
  • Common uses: looking up the latest AMI, referencing a VPC/subnet created by another team or another Terraform configuration, or pulling a secret's value from a secrets manager (keeping in mind that values read this way are stored in the state file).
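The "VPC owned by another team" case can be sketched as follows, assuming the owning team tags its VPC with a known Name. The tag value and security group name are hypothetical.

```hcl
# Read-only lookup: finds the existing VPC by tag. Terraform never
# creates, changes, or destroys it from this configuration.
data "aws_vpc" "shared" {
  tags = {
    Name = "shared-network" # hypothetical tag set by the owning team
  }
}

# Managed resource: this configuration owns the security group,
# but only references the VPC it lives in.
resource "aws_security_group" "app" {
  name   = "app-sg"
  vpc_id = data.aws_vpc.shared.id
}
```

Running terraform destroy here would remove the security group and leave the VPC untouched.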

Why the distinction matters

  • Blast radius: removing a resource block plans a destroy of real infrastructure; removing a data block just stops referencing something — nothing is deleted.
  • Ownership boundaries: data sources are the standard way to consume infrastructure owned by a different configuration/team without taking on responsibility for its lifecycle (avoiding one team's apply accidentally modifying another team's resources).
  • Dependency graph: both participate in Terraform's dependency graph — referencing data.aws_ami.latest_amazon_linux.id still creates an implicit dependency, it just resolves via a read instead of a write.

A good rule of thumb for interviews: if Terraform should be able to delete it, it's a resource; if Terraform should only ever look it up, it's a data source.
