Migrate a Java Monolith From On-Premises to AWS With Minimal Downtime

Plan and execute a phased migration of an on-premises Java monolith to AWS using a re-platform strategy, AWS DMS for database migration, and a 30-minute cutover window.

January 20, 2025 6 min read ~40 min to complete DB
The Situation

Your company runs a 10-year-old Java monolith on bare-metal servers in a data center. The data center contract expires in 6 months. You need to migrate to AWS with minimal downtime — the application processes orders 24/7 and the business allows a maximum 30-minute maintenance window. The application uses Oracle DB (200GB), serves 500 concurrent users, and has 15 external integrations.

6 Steps
8 Services Used
~40 min Duration
Advanced Difficulty

The Problem

A 6-month deadline forces choices. You don’t have time to re-architect the monolith into microservices. You need to move the application to AWS, keep it working, and modernize incrementally over the following 12 months.

Step 1: Discovery and Assessment (Weeks 1-2)

Before writing a line of Terraform, understand what you’re moving:

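# Deploy AWS Application Discovery Service agents on all on-prem servers
# (done via console: Migration Hub → Discover → Data Collectors)

# After 2 weeks of collection, export the collected data
aws discovery start-export-task

# Poll until the export succeeds, then download it from the returned URL
aws discovery describe-export-tasks \
  --query 'exportsInfo[*].{Id:exportId,Status:exportStatus,Url:configurationsDownloadUrl}'

# List discovered servers (attribute keys contain dots, so they are quoted in the query)
aws discovery list-configurations \
  --configuration-type SERVER \
  --query 'configurations[*].{Host:"server.hostName",OS:"server.osName",OSVersion:"server.osVersion"}'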

Key questions the discovery answers:

  • Which servers communicate with each other? (Dependencies to migrate together)
  • What external IPs does the app reach? (Firewall rules to replicate)
  • What are the peak CPU/memory hours? (Right-size EC2 instances)
  • Which processes run as scheduled jobs? (Cron → EventBridge / Lambda)
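
The cron question above has a mechanical part that is easy to get wrong: EventBridge cron expressions have six fields, require `?` in one of the two day fields, and number weekdays 1-7 starting on Sunday. The sketch below converts simple 5-field crontab schedules under those rules. The function name `to_eventbridge_cron` is hypothetical, and the sketch deliberately refuses anything it cannot convert safely (numeric weekday lists or ranges, or both day fields set). It does not validate step syntax or named days, and EventBridge rule schedules run in UTC while on-prem cron usually follows server local time, so the hours may need shifting.

```shell
# to_eventbridge_cron "MIN HOUR DOM MON DOW"
# Prints an EventBridge cron() expression for a simple 5-field crontab schedule.
to_eventbridge_cron() (
  set -f                      # keep "*" literal while splitting the fields
  set -- $1
  [ "$#" -eq 5 ] || { echo "expected 5 cron fields" >&2; exit 1; }
  min=$1 hour=$2 dom=$3 mon=$4 dow=$5

  # One of the two day fields must be "?" in EventBridge.
  if [ "$dow" = "*" ]; then
    dow="?"
  elif [ "$dom" = "*" ]; then
    dom="?"
    # crontab counts weekdays 0-6 from Sunday (7 is also Sunday);
    # EventBridge counts 1-7 from Sunday.
    case "$dow" in
      [0-7]) dow=$(( dow % 7 + 1 )) ;;
      *[0-9]*) echo "numeric day-of-week list/range: convert by hand" >&2; exit 1 ;;
    esac
  else
    echo "day-of-month and day-of-week both set: convert by hand" >&2; exit 1
  fi

  printf 'cron(%s %s %s %s %s *)\n' "$min" "$hour" "$dom" "$mon" "$dow"
)

to_eventbridge_cron "0 2 * * *"     # nightly at 02:00
to_eventbridge_cron "30 6 * * 1"    # Mondays at 06:30
```

The two sample schedules are made up for illustration; run the function over the real crontab entries found during discovery and review anything it rejects by hand.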

Apply the 6 Rs framework:

| Service           | 6R Decision                          | Rationale                                    |
|-------------------|--------------------------------------|----------------------------------------------|
| Java App Server   | Re-platform → EC2 (then ECS/EKS)     | Same app, managed infra, faster migration    |
| Oracle DB         | Re-platform → RDS Oracle → Aurora    | Managed, but convert to PostgreSQL next year |
| File Server       | Re-host → EFS                        | Lift and shift shared file system            |
| Active Directory  | Re-platform → AWS Managed AD         | Keep AD for auth, eliminate on-prem AD       |
| Batch Jobs (cron) | Re-architect → EventBridge + Lambda  | Easy win during migration                    |

Step 2: Foundation (Weeks 3-6)

# Set up Landing Zone via Control Tower
# Minimum 3 accounts:
#   - Management (billing)
#   - Production (app workload)
#   - Shared Services (ECR, logging)

# Deploy network foundation with Terraform
# VPC with 3-tier architecture
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "prod-vpc"
  cidr = "10.0.0.0/16"

  azs              = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets  = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]
  public_subnets   = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
  database_subnets = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = false   # HA: one NAT per AZ
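}

# Set up AWS Direct Connect (private 1Gbps path for data migration)
# - Avoids internet bandwidth limits during 200GB database transfer
# - Gives continuous DMS replication a stable, private path
#   (a Site-to-Site VPN can carry replication until the circuit is live)
# Provisioning takes 4-8 weeks, so order the circuit FIRST, before the rest of this step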
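
To see why the link matters for a 200GB database, it helps to put rough numbers on the wire time. The sketch below is back-of-the-envelope arithmetic only: the function name `transfer_minutes`, the 70% efficiency figure, and the 100 Mbps comparison link are assumptions for illustration, not measurements. It also gives a lower bound, since a DMS full load is usually limited by source reads and target writes rather than by the network alone.

```shell
# transfer_minutes SIZE_GB LINK_MBPS EFFICIENCY_PCT
# Prints the estimated wire time in whole minutes (rounded up).
transfer_minutes() (
  size_gb=$1 link_mbps=$2 eff_pct=$3
  # Decimal units: 1 GB = 8000 megabits.
  megabits=$(( size_gb * 8000 ))
  # Usable throughput in Mbit/s after protocol overhead (assumed percentage).
  effective=$(( link_mbps * eff_pct / 100 ))
  [ "$effective" -gt 0 ] || { echo "throughput must be > 0" >&2; exit 1; }
  seconds=$(( (megabits + effective - 1) / effective ))
  echo $(( (seconds + 59) / 60 ))
)

transfer_minutes 200 1000 70   # 200 GB over the 1 Gbps Direct Connect link
transfer_minutes 200 100 70    # same data over a hypothetical 100 Mbps internet link
```

Under these assumptions the Direct Connect path moves the data in well under an hour, while the slower link needs several hours before any replication catch-up begins.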

Step 3: Database Migration With AWS DMS (Weeks 7-10)

DMS enables near-zero-downtime database migration through continuous replication; only the final cutover needs a short write freeze:

Phase 1: Full Load
  On-Prem Oracle ────────────────────────────► RDS Oracle
  (takes hours, app still writes to on-prem)

Phase 2: Ongoing Replication (CDC)
  On-Prem Oracle ──► Capture changes (CDC) ──► RDS Oracle
  (lag typically a few seconds or less, runs for weeks before cutover)

Phase 3: Cutover (30-minute window)
  1. Stop writes to on-prem (maintenance mode)
  2. Wait for DMS lag to reach 0
  3. Validate row counts and checksums
  4. Redirect app to RDS
  5. Verify app works on AWS
  6. Done

# Create DMS replication instance
aws dms create-replication-instance \
  --replication-instance-identifier prod-migration-dms \
  --replication-instance-class dms.r5.2xlarge \
  --allocated-storage 500 \
  --multi-az \
  --vpc-security-group-ids sg-dms

# Create source endpoint (on-prem Oracle via Direct Connect)
aws dms create-endpoint \
  --endpoint-identifier source-oracle-onprem \
  --endpoint-type source \
  --engine-name oracle \
  --server-name 10.100.0.50 \
  --port 1521 \
  --database-name PRODDB \
  --username dms_user \
  --password "$ORACLE_PASSWORD"

# Create target endpoint (RDS Oracle)
aws dms create-endpoint \
  --endpoint-identifier target-rds-oracle \
  --endpoint-type target \
  --engine-name oracle \
  --server-name prod-oracle.us-east-1.rds.amazonaws.com \
  --port 1521 \
  --database-name PRODDB \
  --username admin \
  --password "$RDS_PASSWORD"

# Create replication task (full load + CDC)
aws dms create-replication-task \
  --replication-task-identifier prod-oracle-migration \
  --source-endpoint-arn $SOURCE_ARN \
  --target-endpoint-arn $TARGET_ARN \
  --replication-instance-arn $REPLICATION_INSTANCE_ARN \
  --migration-type full-load-and-cdc \
  --table-mappings '{
    "rules": [{
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-all",
      "object-locator": {
        "schema-name": "PRODSCHEMA",
        "table-name": "%"
      },
      "rule-action": "include"
    }]
  }'

Validate data consistency:

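-- On-prem Oracle
SELECT COUNT(*) FROM orders;  -- e.g., 4,823,419

-- RDS Oracle (matches once CDC has caught up and writes are paused)
SELECT COUNT(*) FROM orders;  -- 4,823,419 ✓

-- More thorough: checksum validation (run on both sides and compare)
-- The '|' delimiter keeps (1, 23) and (12, 3) from producing the same hash input
SELECT SUM(ORA_HASH(order_id || '|' || total_amount)) FROM orders;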
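
Checking one table by eye does not scale to a full schema, so the comparison is worth scripting. The sketch below assumes you have already dumped `TABLE_NAME ROW_COUNT` pairs from each side into two text files (how you produce them is up to you); the function name `diff_counts`, the file names, and the `CUSTOMERS` row are hypothetical. It prints only the tables whose counts disagree or that exist on one side only, and it expects the source file to be non-empty.

```shell
# diff_counts SOURCE_FILE TARGET_FILE
# Each file holds one "TABLE_NAME ROW_COUNT" pair per line.
# Prints one line per mismatch; prints nothing when every count agrees.
diff_counts() {
  awk '
    NF < 2    { next }                          # skip blank or malformed lines
    NR == FNR { src[$1] = $2; next }            # first file: remember source counts
    { seen[$1] = 1
      s = ($1 in src) ? src[$1] : "missing"
      if (s != $2) printf "%s source=%s target=%s\n", $1, s, $2 }
    END { for (t in src) if (!(t in seen))
            printf "%s source=%s target=missing\n", t, src[t] }
  ' "$1" "$2" | sort
}

# Illustrative data only
tmp=$(mktemp -d)
printf 'ORDERS 4823419\nCUSTOMERS 1200\n' > "$tmp/onprem.txt"
printf 'ORDERS 4823419\nCUSTOMERS 1199\n' > "$tmp/rds.txt"
diff_counts "$tmp/onprem.txt" "$tmp/rds.txt"
rm -rf "$tmp"
```

An empty result is the signal to proceed; any output line names a table to investigate before cutover.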

Step 4: Application Migration (Weeks 7-14, Parallel)

While DMS runs, set up the EC2 environment:

# Launch EC2 instance matching on-prem server specs
# - ami-java-app-server: custom AMI with Java + Tomcat pre-installed
# - m5.4xlarge: sized from Migration Hub EC2 instance recommendations (built on the discovery data)
aws ec2 run-instances \
  --image-id ami-java-app-server \
  --instance-type m5.4xlarge \
  --subnet-id subnet-private-az1 \
  --security-group-ids sg-app-server \
  --iam-instance-profile Name=app-server-role \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=prod-app-1}]'

# Deploy application on EC2 using existing Ansible playbooks
# (adapt on-prem playbooks to use RDS connection strings)
ansible-playbook deploy-app.yml \
  -i aws-inventory.ini \
  -e "db_host=prod-oracle.us-east-1.rds.amazonaws.com" \
  -e "environment=aws-staging"

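Parallel validation: Run the AWS instance in “shadow mode”: it receives a mirrored copy of real traffic, but its responses are discarded rather than returned to users, so you can compare its behavior with on-prem. If the mirrored requests include writes, point the shadow instance at a separate copy of the database (for example, a snapshot restore) so they do not collide with the rows DMS is replicating into RDS.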

Step 5: Cutover (30-Minute Maintenance Window)

Prerequisites before starting the window:

  • DMS replication lag < 5 seconds (CDC is caught up)
  • AWS environment validated with load tests
  • Rollback plan ready (DNS flip back to on-prem)
  • All stakeholders notified
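
The prerequisites above work best as a hard gate rather than a checklist someone skims at the start of the window. A minimal sketch of such a gate follows; the function name `cutover_ready` and its yes/no arguments are hypothetical, and the lag value is assumed to be a whole number of seconds that you read from your DMS monitoring beforehand.

```shell
# cutover_ready LAG_SECONDS LOAD_TEST ROLLBACK NOTIFIED
# LAG_SECONDS is a whole number; the other three arguments are "yes" or "no".
# Prints GO and returns 0 only when every prerequisite holds.
cutover_ready() (
  lag=$1 load_test=$2 rollback=$3 notified=$4
  max_lag=5          # threshold from the prerequisites list
  status=0

  if [ "$lag" -ge "$max_lag" ]; then
    echo "BLOCKED: DMS lag is ${lag}s (must be under ${max_lag}s)"; status=1
  fi
  [ "$load_test" = yes ] || { echo "BLOCKED: load tests not passed"; status=1; }
  [ "$rollback" = yes ]  || { echo "BLOCKED: rollback plan not ready"; status=1; }
  [ "$notified" = yes ]  || { echo "BLOCKED: stakeholders not notified"; status=1; }

  [ "$status" -eq 0 ] && echo "GO"
  exit "$status"
)

cutover_ready 2 yes yes yes
cutover_ready 12 yes no yes || echo "Hold the window"
```

Because the function reports every failed prerequisite instead of stopping at the first, one run shows everything that has to be fixed before rescheduling.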

# T-0: Enable maintenance mode on the on-prem load balancer
# (returns HTTP 503 with "Scheduled maintenance" page)

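# T+2: Verify DMS lag has reached 0
# CDC latency is published as a CloudWatch metric in the AWS/DMS namespace.
# TASK_RESOURCE_ID is assumed to hold the task's resource ID (last segment of its ARN);
# the date syntax below is GNU date.
aws cloudwatch get-metric-statistics \
  --namespace AWS/DMS --metric-name CDCLatencyTarget \
  --dimensions Name=ReplicationInstanceIdentifier,Value=prod-migration-dms \
               Name=ReplicationTaskIdentifier,Value="$TASK_RESOURCE_ID" \
  --start-time "$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 60 --statistics Maximum --query 'Datapoints[*].Maximum'
# Must show: 0 (check CDCLatencySource the same way)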

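# T+5: Run final validation query (the target is Oracle, so use an Oracle client such as sqlplus)
echo "SELECT COUNT(*) FROM orders;" | \
  sqlplus -s "admin/${RDS_PASSWORD}@//prod-oracle.us-east-1.rds.amazonaws.com:1521/PRODDB"
# Must match on-prem count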

# T+7: Stop DMS replication task
aws dms stop-replication-task --replication-task-arn $TASK_ARN

# T+10: Update application config to point to RDS (deploy via SSM Parameter Store)
aws ssm put-parameter \
  --name /prod/app/db-host \
  --value prod-oracle.us-east-1.rds.amazonaws.com \
  --type SecureString \
  --overwrite

# T+12: Restart app servers to pick up new config
# (tag targets match exact values: list each app server's Name tag, comma-separated)
aws ssm send-command \
  --document-name AWS-RunShellScript \
  --targets "Key=tag:Name,Values=prod-app-1" \
  --parameters 'commands=["sudo systemctl restart tomcat"]'

# T+15: Update DNS to point to AWS ALB
aws route53 change-resource-record-sets \
  --hosted-zone-id $ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.company.com",
        "Type": "CNAME",
        "TTL": 60,
        "ResourceRecords": [{"Value": "prod-alb.us-east-1.elb.amazonaws.com"}]
      }
    }]
  }'

# T+18: Verify health checks green on AWS ALB
aws elbv2 describe-target-health --target-group-arn $TG_ARN

# T+25: Remove maintenance mode on-prem (traffic now flowing to AWS)
# T+30: Monitor error rates, latency, and database performance
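
The rollback plan (DNS flip back to on-prem) is easiest to trust when the rollback payload is generated by the same code as the forward change and is ready before the window opens. A sketch under assumptions: the function name `change_batch` and the on-prem hostname `onprem-lb.company.com` are hypothetical, and the record is kept as a CNAME with a 60-second TTL to mirror the T+15 step. Resolvers may keep serving the old answer for up to the record's TTL, so the TTL should already be low well before cutover day.

```shell
# change_batch RECORD_NAME TARGET TTL
# Prints a Route 53 change batch that UPSERTs a CNAME record.
change_batch() {
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$1" "$3" "$2"
}

# Forward change (same record as the T+15 step) and its rollback twin
FORWARD=$(change_batch app.company.com prod-alb.us-east-1.elb.amazonaws.com 60)
ROLLBACK=$(change_batch app.company.com onprem-lb.company.com 60)
echo "$ROLLBACK"

# To roll back during the window:
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id "$ZONE_ID" --change-batch "$ROLLBACK"
```

Rolling back DNS only redirects traffic. Any orders written to RDS after cutover would still need to be reconciled with the on-prem database, which is a separate decision to make before the window.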

Step 6: Post-Migration Optimization (Months 4-6)

After the migration, modernize incrementally:

| Month   | Action                                                      | Benefit                           |
|---------|-------------------------------------------------------------|-----------------------------------|
| Month 4 | Rightsize EC2 with Compute Optimizer                        | 20-30% cost reduction             |
| Month 4 | Convert Oracle → Aurora PostgreSQL (Schema Conversion Tool) | 60-70% license cost saving        |
| Month 5 | Extract stateless batch jobs → Lambda + EventBridge         | Eliminate EC2 for scheduled tasks |
| Month 6 | Containerize app → ECS Fargate                              | Eliminate EC2 management          |
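
# Use AWS Schema Conversion Tool (SCT) for Oracle → PostgreSQL
# SCT is a downloadable GUI application rather than an `aws` CLI subcommand:
#   1. Install SCT plus the Oracle and PostgreSQL JDBC drivers
#   2. Connect to prod-oracle.us-east-1.rds.amazonaws.com as the source
#   3. Run the assessment report to identify incompatible SQL constructs
#   4. Apply the converted schema to Aurora PostgreSQL, then move the data with DMS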

Interview Angle

The most important thing to emphasize: DMS continuous replication is what makes the 30-minute window possible. Without CDC replication running for weeks beforehand, you’d need a 4-8 hour window to copy 200GB and validate data. The business won’t accept that. Starting DMS early (parallel to app migration) is what collapses the maintenance window from hours to minutes.

Services Used

  • AWS DMS
  • AWS Schema Conversion Tool
  • Direct Connect
  • EC2
  • RDS
  • Route 53
  • EFS
  • AWS Application Discovery Service

Prerequisites

  • Understanding of AWS networking fundamentals
  • Familiarity with database migration concepts
  • Basic knowledge of EC2 and RDS

What You Learned

  • How to apply the 6 Rs framework to choose the right migration strategy
  • How to use AWS DMS for continuous replication during migration
  • The difference between a migration cutover and a migration go-live
  • How to use Application Discovery Service to map dependencies
  • How Direct Connect provides a private high-bandwidth path during migration
