EC2 Instance Communicating with Malicious IP — Incident Response

The Problem

A running EC2 instance may be compromised — malware installed, data being exfiltrated, or the instance being used as a pivot to attack other systems. The tension is between containing the threat quickly and preserving forensic evidence needed to understand how it happened.

GuardDuty Finding

Finding: UnauthorizedAccess:EC2/MaliciousIPCaller.Custom
Instance: i-0abc123def456 (prod-order-service-3)
Communication: Outbound TCP 443 to 203.0.113.100
Threat Intelligence: IP associated with Lazarus Group C2 infrastructure
Time: 2024-01-15 03:15:22 UTC

Step 1: Containment — Isolate the Instance (First 10 Minutes)

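Do not terminate the instance: termination destroys evidence. Instead, attach a quarantine security group that blocks all traffic while the instance keeps running, so its disk and memory stay available for forensic analysis.

# Create a quarantine security group and capture its ID
# (group names cannot begin with "sg-", so the name is just "quarantine")
QUARANTINE_SG=$(aws ec2 create-security-group \
  --group-name quarantine \
  --description "Quarantine - blocks all traffic" \
  --vpc-id vpc-0abc123 \
  --query 'GroupId' --output text)

# A new security group has no inbound rules but allows ALL outbound traffic by default.
# Remove that default egress rule so the group really blocks everything.
aws ec2 revoke-security-group-egress \
  --group-id "$QUARANTINE_SG" \
  --ip-permissions '[{"IpProtocol":"-1","IpRanges":[{"CidrIp":"0.0.0.0/0"}]}]'

# Attach it to the compromised instance, REPLACING all existing SGs
# (connections the old groups were tracking may persist until they time out)
aws ec2 modify-instance-attribute \
  --instance-id i-0abc123def456 \
  --groups "$QUARANTINE_SG"

# Snapshot every attached EBS volume for forensic analysis, not just the first one
for VOLUME_ID in $(aws ec2 describe-instances \
    --instance-ids i-0abc123def456 \
    --query 'Reservations[0].Instances[0].BlockDeviceMappings[].Ebs.VolumeId' \
    --output text); do
  aws ec2 create-snapshot \
    --volume-id "$VOLUME_ID" \
    --description "Forensics-$(date -u +%Y%m%d-%H%M)-i-0abc123def456"
done

# Stop (NOT terminate) the instance. Stopping preserves the EBS volumes but
# discards memory contents, so capture memory first if you need it.
aws ec2 stop-instances --instance-ids i-0abc123def456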
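
A newly created security group normally carries a default allow-all egress rule, so it can be worth confirming that the quarantine group has no rules at all before relying on it. The sketch below is one way to check; it assumes a dictionary shaped like a single entry of boto3's `describe_security_groups()["SecurityGroups"]`, and the function name and sample values are hypothetical.

```python
def is_fully_isolating(security_group: dict) -> bool:
    """Return True only if the group has no inbound and no outbound rules."""
    inbound = security_group.get("IpPermissions", [])
    outbound = security_group.get("IpPermissionsEgress", [])
    return not inbound and not outbound


# Hypothetical sample: a freshly created group still has its default egress rule.
default_new_group = {
    "GroupId": "sg-0example",
    "IpPermissions": [],
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}
    ],
}

# Hypothetical sample: the same group after the egress rule has been revoked.
stripped_group = {
    "GroupId": "sg-0example",
    "IpPermissions": [],
    "IpPermissionsEgress": [],
}

print(is_fully_isolating(default_new_group))  # False
print(is_fully_isolating(stripped_group))     # True
```

Running a check like this before attaching the group avoids quarantining an instance with a group that still permits outbound traffic.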

Step 2: Analyze VPC Flow Logs

Query VPC Flow Logs to understand the full scope of communication with the malicious IP:

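# Find all traffic to/from the malicious IP (CloudWatch Logs Insights)
# To see every destination the instance talked to, filter on its private IP
# instead (srcAddr = "10.0.1.55"). Note: date -d is GNU date syntax.
QUERY_ID=$(aws logs start-query \
  --log-group-name /aws/vpc/flowlogs \
  --start-time $(date -d '24 hours ago' +%s) \
  --end-time $(date +%s) \
  --query-string '
    fields @timestamp, srcAddr, dstAddr, srcPort, dstPort, bytes, action
    | filter srcAddr = "203.0.113.100" or dstAddr = "203.0.113.100"
    | sort @timestamp asc
    | limit 1000
  ' \
  --query 'queryId' --output text)

# start-query runs asynchronously; fetch the results once the query completes
aws logs get-query-results --query-id "$QUERY_ID"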

Look for:

Data volume: Large outbound byte counts suggest exfiltration
Port patterns: TCP 443/80 is consistent with C2 over HTTPS/HTTP; unusual ports may indicate custom malware
Frequency: Connections at regular intervals (every 60s, 300s) are typical of C2 beaconing
Other destinations: Did the instance communicate with any other unusual IPs?
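
The beaconing check can be roughed out in a few lines once the flow log timestamps for one destination have been exported. The following is a minimal sketch, not a detection product: it scores how regular the gaps between connections are using the coefficient of variation. The function name, the sample timestamps, and any threshold you apply to the score are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List, Optional


def beacon_score(timestamps: List[float]) -> Optional[float]:
    """Coefficient of variation of the gaps between connections.

    Values near 0 mean very regular intervals (beacon-like); larger values
    mean irregular traffic. Returns None when there is too little data.
    """
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(gaps) < 2:
        return None
    average_gap = mean(gaps)
    if average_gap == 0:
        return None
    return pstdev(gaps) / average_gap


# Hypothetical epoch-second offsets for connections to a single destination.
regular = [0, 60, 121, 180, 240, 301]     # roughly every 60 seconds
irregular = [0, 5, 200, 230, 900, 905]    # bursty, human-like traffic

print(beacon_score(regular))
print(beacon_score(irregular))
```

A low score is only a hint; legitimate health checks and polling clients also connect at fixed intervals, so the destination still needs to be reviewed.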

Step 3: CloudTrail — What API Calls Did This Instance Make?

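The EC2 instance had an IAM role. Find out whether the attacker used its credentials:

# Get the instance profile ARN (tells you which role's credentials were exposed)
PROFILE_ARN=$(aws ec2 describe-instances \
  --instance-ids i-0abc123def456 \
  --query 'Reservations[0].Instances[0].IamInstanceProfile.Arn' \
  --output text)
echo "$PROFILE_ARN"

# Look up API calls made with the instance's role credentials in the last 24 hours.
# For EC2 instance roles the role session name is normally the instance ID, and
# CloudTrail records it as the Username. (Filtering on ResourceName would instead
# return calls made ABOUT the instance.) lookup-events is per-Region and returns
# management events only; the source IP is inside the CloudTrailEvent JSON.
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=Username,AttributeValue=i-0abc123def456 \
  --start-time $(date -d '24 hours ago' -u +%Y-%m-%dT%H:%M:%SZ) \
  --query 'Events[*].{Time:EventTime,Event:EventName,User:Username,Detail:CloudTrailEvent}'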

Red flags to look for:

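ListBuckets / GetObject from S3 → possible data exfiltration (GetObject is a data event: it is logged only if the trail records S3 data events, and lookup-events does not return it)
DescribeInstances across regions → reconnaissance
CreateUser or CreateAccessKey → persistence
AssumeRole to other roles → lateral movement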
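
The red-flag list can be applied mechanically to the looked-up events. This is a small sketch under stated assumptions: it takes dictionaries shaped like the entries of `lookup-events` output (only the `EventName` key is used), and the mapping, function name, and sample events are illustrative rather than an exhaustive rule set.

```python
from typing import Dict, List, Tuple

# Event names from the red-flag list, mapped to the behavior they may indicate.
RED_FLAGS: Dict[str, str] = {
    "ListBuckets": "data exfiltration",
    "GetObject": "data exfiltration",
    "DescribeInstances": "reconnaissance",
    "CreateUser": "persistence",
    "CreateAccessKey": "persistence",
    "AssumeRole": "lateral movement",
}


def flag_events(events: List[dict]) -> List[Tuple[str, str]]:
    """Return (event name, suspected behavior) for every red-flag event."""
    flagged = []
    for event in events:
        name = event.get("EventName")
        if name in RED_FLAGS:
            flagged.append((name, RED_FLAGS[name]))
    return flagged


# Hypothetical events, trimmed to the one field this sketch reads.
sample_events = [
    {"EventName": "DescribeInstances"},
    {"EventName": "PutMetricData"},
    {"EventName": "CreateAccessKey"},
]

for name, behavior in flag_events(sample_events):
    print(f"{name}: {behavior}")
```

A match is a lead, not a verdict: the service itself may legitimately call some of these APIs, so compare against its normal behavior.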

Step 4: GuardDuty Full Investigation

DETECTOR_ID=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text)

# Get all findings for this instance
aws guardduty list-findings \
  --detector-id $DETECTOR_ID \
  --finding-criteria '{
    "Criterion": {
      "resource.instanceDetails.instanceId": {
        "Equals": ["i-0abc123def456"]
      }
    }
  }'

# Get finding details
aws guardduty get-findings \
  --detector-id $DETECTOR_ID \
  --finding-ids <finding-id-from-above>
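
When an instance has several findings, ordering them by severity makes triage easier. The sketch below assumes a dictionary shaped like the `get-findings` response, using only its `Findings` list and the `Id`, `Type`, and `Severity` fields of each entry; the function name and the sample findings are hypothetical.

```python
from typing import List, Tuple


def summarize_findings(response: dict) -> List[Tuple[float, str, str]]:
    """Return (severity, type, id) tuples, most severe first."""
    findings = sorted(
        response.get("Findings", []),
        key=lambda finding: finding.get("Severity", 0),
        reverse=True,
    )
    return [(f["Severity"], f["Type"], f["Id"]) for f in findings]


# Hypothetical response trimmed to the fields this sketch reads.
sample_response = {
    "Findings": [
        {
            "Id": "finding-a",
            "Type": "UnauthorizedAccess:EC2/MaliciousIPCaller.Custom",
            "Severity": 5.0,
        },
        {
            "Id": "finding-b",
            "Type": "Backdoor:EC2/C&CActivity.B",
            "Severity": 8.0,
        },
    ]
}

for severity, finding_type, finding_id in summarize_findings(sample_response):
    print(severity, finding_type, finding_id)
```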

Step 5: Eradication

After forensic analysis is complete:

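# 1. Terminate the compromised instance
aws ec2 terminate-instances --instance-ids i-0abc123def456

# 2. Revoke credentials already issued to the instance role.
# An unconditional Deny on "*" would also lock out a replacement instance that
# reuses this role, so deny only sessions issued before this moment.
REVOKE_BEFORE=$(date -u +%Y-%m-%dT%H:%M:%SZ)
aws iam put-role-policy \
  --role-name order-service-instance-role \
  --policy-name EmergencyRevoke \
  --policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*","Condition":{"DateLessThan":{"aws:TokenIssueTime":"'"$REVOKE_BEFORE"'"}}}]}'

# 3. Rotate any secrets the instance had access to
# (rotate-secret requires rotation to be configured on the secret)
aws secretsmanager rotate-secret \
  --secret-id prod/order-service/db-password

# 4. Block the malicious IP at the network level
# (network ACLs are stateless; this rule denies outbound traffic to the IP)
aws ec2 create-network-acl-entry \
  --network-acl-id acl-prod \
  --rule-number 50 \
  --protocol "-1" \
  --egress \
  --cidr-block 203.0.113.100/32 \
  --rule-action deny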

Step 6: Recovery and Hardening

Launch a replacement instance from your golden AMI (hardened, patched, known-good state):

aws ec2 run-instances \
  --image-id ami-golden-hardened \
  --instance-type m5.large \
  --subnet-id subnet-private \
  --security-group-ids sg-order-service \
  --iam-instance-profile Name=order-service-profile \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=order-service-replacement}]'

Harden for the future:

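# Lambda: auto-quarantine any GuardDuty HIGH/CRITICAL finding
import json

import boto3

QUARANTINE_SG_ID = 'sg-quarantine-id'  # placeholder: ID of the quarantine security group
ALERT_TOPIC_ARN = 'arn:aws:sns:us-east-1:...:security-alerts'

# Create clients once per execution environment, not on every invocation
ec2 = boto3.client('ec2')
sns = boto3.client('sns')

def auto_quarantine(event, context):
    detail = event['detail']
    if detail['severity'] < 7.0:  # act only on HIGH or CRITICAL
        return

    # Not every GuardDuty finding concerns an EC2 instance
    instance_id = detail.get('resource', {}).get('instanceDetails', {}).get('instanceId')
    if not instance_id:
        return

    # Attach quarantine SG, replacing all existing SGs
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG_ID])

    # Snapshot every attached EBS volume for forensics
    reservations = ec2.describe_instances(InstanceIds=[instance_id])['Reservations']
    for mapping in reservations[0]['Instances'][0].get('BlockDeviceMappings', []):
        if 'Ebs' in mapping:
            ec2.create_snapshot(
                VolumeId=mapping['Ebs']['VolumeId'],
                Description=f'Forensics-{instance_id}',
            )

    # Alert security team
    sns.publish(
        TopicArn=ALERT_TOPIC_ARN,
        Subject=f'Instance {instance_id} auto-quarantined',
        Message=json.dumps(detail, default=str),
    )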

Wire this Lambda to an EventBridge rule on GuardDuty Finding events.
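
The rule needs an event pattern that matches GuardDuty findings at or above the chosen severity. Here is a minimal sketch that builds such a pattern as JSON, assuming the standard `aws.guardduty` source and `GuardDuty Finding` detail type together with EventBridge numeric matching; the function name and the default threshold of 7.0 are illustrative.

```python
import json


def guardduty_event_pattern(min_severity: float = 7.0) -> str:
    """Build an EventBridge event pattern for findings at or above min_severity."""
    pattern = {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        "detail": {
            # Numeric matching: only findings with severity >= min_severity
            "severity": [{"numeric": [">=", min_severity]}],
        },
    }
    return json.dumps(pattern)


print(guardduty_event_pattern())
```

Filtering in the rule means the Lambda is invoked only for findings it would act on. The resulting string can be supplied as the rule's event pattern, for example through the `EventPattern` parameter of boto3's `events.put_rule`.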

Post-Incident Summary

Phase	Actions	Tools
Contain	Quarantine SG, EBS snapshot, stop instance	EC2, GuardDuty
Investigate	VPC Flow Logs, CloudTrail, GuardDuty findings	CloudWatch Logs Insights
Eradicate	Terminate instance, revoke IAM, rotate secrets	IAM, Secrets Manager
Recover	Launch from golden AMI, run hardening playbook	EC2, SSM
Prevent	Auto-quarantine Lambda, Network Firewall threat intel	Lambda, EventBridge

Interview Angle

Interviewers often ask: “Why not just terminate the instance immediately?” The answer: you lose the ability to understand how the attacker got in, which means you can’t close the hole. Isolate and snapshot first, investigate, then terminate. Always preserve forensic artifacts.
