Handling Terraform State Drift in Production

AWS resources changed outside of Terraform — via Console, CLI, or another tool. Detect the drift, reconcile state, and prevent it from happening again.

January 20, 2025 4 min read ~25 min to complete DB
The Situation

A developer manually changed an RDS parameter group through the AWS Console last week to fix a production issue. Terraform doesn't know about it. Today, a routine `terraform apply` is about to revert that change, undoing the parameter setting that is keeping the app stable. How do you detect drift before it causes damage, fix it properly, and prevent it from happening again?

5 Steps
5 Services Used
~25 min Duration
Advanced Difficulty

The Problem

Drift is when your live AWS infrastructure no longer matches what Terraform’s state file describes. It happens when:

  • Engineers make emergency changes via the Console
  • Another team’s automation modifies shared resources
  • AWS makes service-side changes (e.g., auto-updating default parameter groups)
  • A Terraform apply partially succeeds and leaves orphaned resources

Drift stays invisible until something compares live infrastructure against Terraform. If the first comparison happens during terraform apply, it may already be too late.

Step 1: Detect Drift — terraform plan With Refresh

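The most direct way to see drift is a plan, which refreshes state from AWS by default:

# Refresh reads live AWS state, then compares it to your .tf files
# (-refresh=true is the default; it is spelled out here for clarity)
terraform plan -refresh=true -detailed-exitcode

# Exit codes with -detailed-exitcode:
# 0 = no changes (no drift)
# 1 = error
# 2 = changes present (drift, or .tf edits that were never applied)
echo "Exit code: $?"

# Show only changes made outside Terraform, ignoring pending .tf edits
terraform plan -refresh-only

# For a specific resource only
terraform plan -target=aws_db_parameter_group.prod -refresh=true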

The plan output shows exactly what Terraform would change. Before running apply, validate that the “change” is drift (real infra differs from .tf file), not a genuine desired change.
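The exit-code check can be wrapped in a small script so that a CI job can branch on the result. This is a minimal sketch, assuming Python 3 and a `terraform` binary on the PATH; the names `classify_plan_exit` and `run_plan` are hypothetical helpers, not part of Terraform.

```python
import subprocess

# Meaning of `terraform plan -detailed-exitcode` exit codes.
PLAN_OUTCOMES = {0: "clean", 1: "error", 2: "changes"}


def classify_plan_exit(code):
    """Map a plan exit code to 'clean', 'error', or 'changes'."""
    # Anything outside the documented codes is treated as an error.
    return PLAN_OUTCOMES.get(code, "error")


def run_plan(workdir, target=None):
    """Run a plan in workdir and return (outcome, combined output)."""
    cmd = ["terraform", "plan", "-detailed-exitcode", "-no-color", "-input=false"]
    if target:
        cmd.append(f"-target={target}")
    proc = subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)
    return classify_plan_exit(proc.returncode), proc.stdout + proc.stderr
```

A call such as `run_plan("environments/prod/databases", target="aws_db_parameter_group.prod")` would return "changes" when the plan differs from live infrastructure. Note that "changes" covers both drift and .tf edits that have not been applied yet, so a human still has to tell the two apart.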

Step 2: Automated Drift Detection Pipeline

Running manual plans isn’t scalable. Set up a scheduled drift detection job:

# EventBridge rule: run drift detection every 6 hours
# ↓ triggers CodeBuild project ↓ which runs terraform plan

# buildspec-drift-check.yml
version: 0.2
phases:
  build:
    commands:
      - |
        ROOT="$(pwd)"
        for module in networking compute databases security; do
          cd "$ROOT/environments/prod/$module" || exit 1
          terraform init -input=false
          terraform plan -refresh=true -detailed-exitcode -no-color > "/tmp/$module-plan.txt" 2>&1
          EXIT=$?
          if [ "$EXIT" -eq 2 ]; then
            echo "DRIFT DETECTED in $module module"
            # SNS messages are capped at 256 KB, so send only the start of the plan
            aws sns publish \
              --topic-arn "$DRIFT_ALERT_TOPIC" \
              --subject "Terraform Drift: prod/$module" \
              --message "$(head -c 200000 "/tmp/$module-plan.txt")"
          elif [ "$EXIT" -eq 1 ]; then
            # Exit code 1 means the plan itself failed; surface it instead of hiding it
            echo "PLAN FAILED in $module module"
            cat "/tmp/$module-plan.txt"
          fi
        done
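Raw plan text is hard to alert on. If the job also saves the plan with `terraform plan -out=tfplan` and converts it with `terraform show -json tfplan`, the JSON can be reduced to a list of affected resources before it is published. The following is a sketch under that assumption; it relies on the `resource_drift` and `resource_changes` arrays of the JSON plan format found in recent Terraform versions, and `summarize_plan` is a hypothetical name.

```python
import json


def summarize_plan(plan_json_text):
    """Return addresses of drifted resources and of resources the plan would change."""
    plan = json.loads(plan_json_text)

    def changed(entries):
        # Keep entries whose actions are anything other than a pure no-op.
        return sorted(
            e["address"]
            for e in entries
            if e.get("change", {}).get("actions", ["no-op"]) != ["no-op"]
        )

    return {
        # Changes Terraform noticed were made outside of Terraform.
        "drifted": changed(plan.get("resource_drift", [])),
        # Changes Terraform proposes to make on the next apply.
        "planned": changed(plan.get("resource_changes", [])),
    }


# Hand-written sample in the shape of `terraform show -json` output.
sample = json.dumps({
    "resource_drift": [
        {"address": "aws_db_parameter_group.prod", "change": {"actions": ["update"]}}
    ],
    "resource_changes": [
        {"address": "aws_db_parameter_group.prod", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.prod", "change": {"actions": ["no-op"]}},
    ],
})
print(summarize_plan(sample))
```

Separating "drifted" from "planned" lets the alert distinguish out-of-band changes from configuration edits that simply have not been applied yet.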

AWS Config Rule for resource-level detection:

# Lambda-backed Config rule: flag resources that lack the Terraform ownership tag
def evaluate_compliance(configuration_item, rule_parameters):
    # Check if resource was created outside Terraform.
    # This relies on the provider's default_tags adding "ManagedBy=Terraform"
    # (see Step 5); Terraform does not add this tag on its own.
    # AWS Config exposes resource tags as a top-level "tags" map on the item.
    tags = configuration_item.get('tags') or {}
    if tags.get('ManagedBy') != 'Terraform':
        return 'NON_COMPLIANT'
    return 'COMPLIANT'
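The function above only decides compliance; a custom Config rule also has to report the result back to AWS Config. The sketch below shows one way to wire that up, assuming a rule triggered by configuration changes and the boto3 library that ships with the Lambda Python runtime. The names `build_evaluation` and `lambda_handler` are illustrative, and handling of deleted resources is omitted for brevity.

```python
import json


def build_evaluation(configuration_item):
    """Build one AWS Config evaluation from a configuration item."""
    tags = configuration_item.get("tags") or {}
    compliant = tags.get("ManagedBy") == "Terraform"
    return {
        "ComplianceResourceType": configuration_item["resourceType"],
        "ComplianceResourceId": configuration_item["resourceId"],
        "ComplianceType": "COMPLIANT" if compliant else "NON_COMPLIANT",
        "OrderingTimestamp": configuration_item["configurationItemCaptureTime"],
    }


def lambda_handler(event, context):
    # boto3 ships with the Lambda Python runtime; it is imported here so the
    # pure function above can be tested without AWS dependencies.
    import boto3

    # The invoking event arrives as a JSON string inside the Lambda event.
    invoking_event = json.loads(event["invokingEvent"])
    evaluation = build_evaluation(invoking_event["configurationItem"])
    boto3.client("config").put_evaluations(
        Evaluations=[evaluation], ResultToken=event["resultToken"]
    )
    return evaluation["ComplianceType"]
```

Keeping the decision logic in a pure function makes it easy to unit test with a hand-built configuration item before deploying the rule.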

Step 3: Reconcile Drift — Import or Update

Once drift is detected, you have two options:

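Option A: Adopt the Drift Into Configuration

Use this when the manual change was correct and you want Terraform to keep it. The parameter group is already in state, so terraform import does not apply here (it fails with "Resource already managed by Terraform"). Import is only for resources that were created outside Terraform and that state has never tracked.

# 1. Update your .tf file to match the actual parameters

# 2. Accept the live values into state (review the prompt before approving)
terraform apply -refresh-only

# 3. Verify that configuration, state, and AWS now agree
terraform plan  # verify: 0 changes

# Only if the resource was created outside Terraform and is not in state yet:
terraform import aws_db_parameter_group.prod my-prod-pg14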
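Updating the .tf file is easier when you can see exactly which parameters differ. The sketch below compares declared values against live ones. It assumes boto3 is installed; `diff_parameters` and `fetch_user_parameters` are hypothetical helpers, and the sample values are made up for illustration.

```python
def diff_parameters(declared, live):
    """Return {name: (declared_value, live_value)} for every mismatch."""
    names = set(declared) | set(live)
    return {
        name: (declared.get(name), live.get(name))
        for name in sorted(names)
        if declared.get(name) != live.get(name)
    }


def fetch_user_parameters(group_name):
    """Read user-modified parameters of an RDS parameter group via boto3."""
    import boto3  # assumed to be installed; imported lazily for easy testing

    paginator = boto3.client("rds").get_paginator("describe_db_parameters")
    pages = paginator.paginate(DBParameterGroupName=group_name, Source="user")
    return {
        p["ParameterName"]: p.get("ParameterValue")
        for page in pages
        for p in page["Parameters"]
    }


# Hypothetical values: what the .tf file declares vs. what AWS reports.
declared = {"max_connections": "200", "log_min_duration_statement": "500"}
live = {"max_connections": "400", "log_min_duration_statement": "500"}
print(diff_parameters(declared, live))
```

In practice `live` would come from `fetch_user_parameters("my-prod-pg14")`, and each mismatch points at a parameter block to add or edit in the configuration.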

Option B: Override Drift — Apply Back to Desired State

Use this when the manual change was wrong or unauthorized:

# Review exactly what will change
terraform plan -refresh=true

# Apply to bring infrastructure back to the declared state
terraform apply -refresh=true -target=aws_db_parameter_group.prod

Always Review Before Applying

Blindly running terraform apply to fix drift can destroy legitimate configuration. Always read the plan output carefully. If in doubt, capture the manual change in your .tf files first, discuss it with the team, then decide whether to keep or revert it.
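Part of that review can be automated with a gate that flags any plan that deletes or replaces a resource. This is a sketch, assuming the plan was exported with `terraform show -json` and that each entry in `resource_changes` carries a `change.actions` list; `destructive_changes` is a hypothetical name.

```python
import json


def destructive_changes(plan_json_text):
    """List (address, actions) for planned changes that delete a resource."""
    plan = json.loads(plan_json_text)
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement appears as ["delete", "create"] or ["create", "delete"].
        if "delete" in actions:
            risky.append((rc["address"], actions))
    return risky


# Hand-written sample in the shape of `terraform show -json` output.
sample = json.dumps({"resource_changes": [
    {"address": "aws_db_parameter_group.prod", "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.prod", "change": {"actions": ["delete", "create"]}},
]})

blocked = destructive_changes(sample)
if blocked:
    print("Manual review required before apply:", blocked)
```

An in-place update can still revert a setting the application depends on, as in this scenario, so a gate like this supplements reading the plan rather than replacing it.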

Step 4: terraform refresh — Sync State Only

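If you want to update Terraform’s state file to match reality without making any changes to infrastructure:

# Update state to match real AWS (doesn't touch infrastructure).
# terraform refresh is deprecated since Terraform 0.15.4 because it rewrites
# state with no review step.
terraform refresh

# Preferred replacement: shows the detected changes and asks for approval
terraform apply -refresh-only

# Then check what your .tf files declare vs the refreshed state
terraform plan  # now shows what WOULD change if you applied

This is useful when another team legitimately owns part of a shared resource and you need state to reflect their changes before deciding how to handle them.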

Step 5: Prevent Drift — IaC-Only Enforcement

Tag all Terraform resources and use AWS Config to alert on untagged changes:

# In your Terraform provider configuration
provider "aws" {
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      Module      = var.module_name
    }
  }
}

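SCP to deny modifications to tagged production resources by any principal other than the Terraform pipeline role. An SCP cannot tell Console calls from CLI or SDK calls, so this blocks all three. The Environment value in the condition must match whatever var.environment resolves to: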

{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyConsoleModifyTerraformResources",
    "Effect": "Deny",
    "Action": [
      "ec2:ModifyInstanceAttribute",
      "rds:ModifyDBInstance",
      "rds:ModifyDBParameterGroup"
    ],
    "Resource": "*",
    "Condition": {
      "StringEquals": {
        "aws:ResourceTag/ManagedBy": "Terraform",
        "aws:ResourceTag/Environment": "production"
      },
      "ArnNotLike": {
        "aws:PrincipalArn": [
          "arn:aws:iam::*:role/terraform-pipeline-role"
        ]
      }
    }
  }]
}
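Whether the pipeline role is actually exempt depends on how the condition operator treats the `*` in the role ARN. IAM's equality operators compare strings literally, while the Like-style operators (such as StringNotLike and ArnNotLike) expand wildcards. The sketch below is a simplified model of that difference, not IAM's real evaluation engine: it uses Python's `fnmatchcase` as a stand-in for wildcard matching, and the account ID and role names are placeholders.

```python
from fnmatch import fnmatchcase

PATTERN = "arn:aws:iam::*:role/terraform-pipeline-role"


def string_not_equals(value, pattern):
    # Literal comparison: "*" is just another character.
    return value != pattern


def arn_not_like(value, pattern):
    # Wildcard comparison, approximated with shell-style matching.
    return not fnmatchcase(value, pattern)


# Placeholder principals for illustration only.
pipeline = "arn:aws:iam::111122223333:role/terraform-pipeline-role"
developer = "arn:aws:iam::111122223333:role/developer"

for name, arn in [("pipeline", pipeline), ("developer", developer)]:
    print(name,
          "| StringNotEquals denies:", string_not_equals(arn, PATTERN),
          "| ArnNotLike denies:", arn_not_like(arn, PATTERN))
```

With a literal comparison the pipeline role never equals the pattern, so the Deny would apply to it as well; with wildcard matching the pipeline role is exempt and every other principal is denied.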

Drift Resolution Workflow

Drift Detected (plan shows changes)
         │
         ▼
Is the manual change correct/intentional?
   │                    │
  YES                   NO
   │                    │
   ▼                    ▼
update .tf file     terraform apply
+ terraform plan    -target=<resource>
  (0 changes)       (reverts to declared state)
+ git commit
+ PR review

Summary

Scenario              | Command                                    | When to Use
Check for drift       | terraform plan -refresh=true               | Before every apply
Adopt a manual change | update .tf + terraform apply -refresh-only | Change was valid
Revert manual change  | terraform apply -target=...                | Change was wrong
Sync state only       | terraform refresh (or apply -refresh-only) | Another team owns the resource
Continuous monitoring | AWS Config + EventBridge + Lambda          | Catch drift automatically

Interview Angle

Mention the cultural fix: drift usually happens because engineers feel they can’t use Terraform fast enough in an emergency. The answer is a pre-approved “break-glass” runbook: emergency Console changes are allowed, but they must be documented and codified in Terraform within 24 hours.

Services Used

AWS Config, EventBridge, Lambda, S3, IAM

Prerequisites

  • Understanding of Terraform state files and `terraform.tfstate`
  • Basic familiarity with AWS Config rules

What You Learned

  • How to detect drift between Terraform state and real AWS resources
  • How to reconcile drift by updating configuration or reverting with a targeted apply
  • How to use AWS Config for continuous drift detection
  • How to enforce IaC-only changes using SCPs and tagging
