boto3 Retry Decorator with Exponential Backoff for ThrottlingException
Python decorator that adds automatic retry logic with exponential backoff and jitter to any boto3 API call, handling ThrottlingException and transient AWS errors.
Production resilience — AWS throttles API calls per account per region. Batch scripts that call describe_instances 500 times will hit limits. Automatic retry with backoff is essential.
Problem Statement
Your compliance script calls describe_instances in a loop across 50 regions and 20 accounts. After 30 seconds, you start getting ThrottlingException: Rate exceeded. Without retry logic, your script crashes and you have incomplete data. With exponential backoff, it slows down automatically and completes successfully.
Why Exponential Backoff + Jitter?
Naive retry (bad): retry immediately → still throttled → fail
Fixed delay (better): wait 1s → retry → wait 1s → retry
Exponential (good): wait 1s → 2s → 4s → 8s → 16s (backs off)
Exponential + Jitter (best): 0.8s → 1.7s → 3.9s → 7.2s (avoids thundering herd)
Jitter adds random variation so that 100 concurrent threads don’t all wake up and retry at the same moment — which would just cause another wave of throttling.
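To make the thundering-herd effect concrete, here is a minimal simulation sketch. The helper names (retry_wake_times, peak_burst), the client count, and the 10 ms bucket size are illustrative assumptions, not part of the decorator in this article; only the ±25% multiplier is taken from it.

```python
import random
from collections import Counter


def retry_wake_times(clients: int, delay: float, jitter: bool, seed: int = 42) -> list:
    """Return the time (seconds) at which each client would retry.

    Assumes every client was throttled at t=0 and computed the same base delay.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    times = []
    for _ in range(clients):
        factor = 0.75 + rng.random() * 0.5 if jitter else 1.0
        times.append(delay * factor)
    return times


def peak_burst(times: list, bucket: float = 0.01) -> int:
    """Largest number of retries that land in the same time bucket."""
    buckets = Counter(int(t / bucket) for t in times)
    return max(buckets.values())


no_jitter = retry_wake_times(100, delay=4.0, jitter=False)
with_jitter = retry_wake_times(100, delay=4.0, jitter=True)

print("peak burst without jitter:", peak_burst(no_jitter))  # 100
print("peak burst with jitter:   ", peak_burst(with_jitter))
```

Without jitter all 100 simulated clients retry in the same instant; with jitter the retries are spread over a window from 3 to 5 seconds, so the largest burst is a small fraction of that.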
Complete Decorator
import boto3
import time
import random
import functools
import logging
from botocore.exceptions import ClientError
from botocore.config import Config
logger = logging.getLogger(__name__)
# ── Decorator: add retry to any function ─────────────────────────
def aws_retry(
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    jitter: bool = True,
    retryable_errors: set = None,
):
    """
    Decorator that wraps any function with automatic retry logic.

    max_retries:      total number of retry attempts (not counting the first try)
    base_delay:       initial wait before first retry (seconds)
    max_delay:        cap on the wait time before jitter is applied (seconds)
    jitter:           add random variation to prevent thundering herd
    retryable_errors: set of AWS error codes to retry on

    Exponential backoff formula:
        delay = min(base_delay × 2^attempt, max_delay)
        with jitter: delay × uniform(0.75, 1.25)

    functools.wraps(func) copies the original function's __name__,
    __doc__, __module__, __qualname__ and __annotations__ to the
    wrapper, which is essential for debugging and introspection.
    """
    if retryable_errors is None:
        retryable_errors = {
            "ThrottlingException",
            "RequestLimitExceeded",
            "TooManyRequestsException",
            "ServiceUnavailable",
            "InternalServerError",
            "RequestTimeout",
            "ProvisionedThroughputExceededException",
            "LimitExceededException",
            "RequestExpired",
            "Throttling",  # Some services use just "Throttling"
            "SlowDown",    # S3 uses this for throttling
        }

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_exception = None
            for attempt in range(max_retries + 1):  # +1: attempt 0 is the initial try
                try:
                    return func(*args, **kwargs)
                except ClientError as e:
                    error_code = e.response["Error"]["Code"]
                    error_msg = e.response["Error"]["Message"]
                    # Non-retryable: re-raise immediately
                    if error_code not in retryable_errors:
                        raise
                    last_exception = e
                    # We've exhausted all retries
                    if attempt == max_retries:
                        logger.error(
                            f"{func.__name__} failed after {max_retries} retries. "
                            f"Last error: {error_code}: {error_msg}"
                        )
                        raise
                    # Calculate wait time (the cap is applied before jitter)
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    if jitter:
                        # Multiply by random value between 0.75 and 1.25
                        # This spreads retries across time instead of bunching them
                        delay *= (0.75 + random.random() * 0.5)
                    # Total attempts = max_retries + 1 (initial call plus retries)
                    logger.warning(
                        f"{func.__name__} attempt {attempt + 1}/{max_retries + 1} failed "
                        f"({error_code}). Retrying in {delay:.2f}s..."
                    )
                    time.sleep(delay)
            raise last_exception  # Should never reach here, but satisfies type checkers
        return wrapper
    return decorator

# ── Usage: function-level decorator ──────────────────────────────
@aws_retry(max_retries=5, base_delay=1.0)
def list_all_instances(region: str) -> list:
    """List all EC2 instances in a region with auto-retry on throttle."""
    ec2 = boto3.client("ec2", region_name=region)
    instances = []
    paginator = ec2.get_paginator("describe_instances")
    # A retry re-runs this whole function, so pagination restarts at page one
    for page in paginator.paginate():
        for reservation in page["Reservations"]:
            instances.extend(reservation["Instances"])
    return instances


@aws_retry(max_retries=3, base_delay=0.5)
def get_secret(secret_name: str) -> dict:
    """Retrieve secret from Secrets Manager with retry."""
    import json
    sm = boto3.client("secretsmanager")
    response = sm.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

# ── Usage: class-based wrapper (all methods auto-retry) ───────────
class AWSClientWithRetry:
    """
    Wraps a boto3 client so that EVERY method call is automatically
    retried on throttling. Useful when you use a single client extensively.

    __getattr__ is called when Python can't find an attribute on the object.
    We intercept it to wrap any callable (boto3 method) with the retry decorator.

    Note: for helpers such as get_paginator, only the call that creates the
    helper is wrapped; the page requests it makes later are not retried here.
    """

    def __init__(
        self,
        service: str,
        region: str = "us-east-1",
        max_retries: int = 5,
        **boto_kwargs,
    ):
        self._client = boto3.client(service, region_name=region, **boto_kwargs)
        self._max_retries = max_retries

    def __getattr__(self, name: str):
        """
        Called when accessing any attribute not found on this object.
        Returns the boto3 method wrapped with the retry decorator.
        """
        attr = getattr(self._client, name)
        if callable(attr):
            return aws_retry(max_retries=self._max_retries)(attr)
        return attr

# ── Method 3: botocore built-in retry config (simpler) ───────────
def get_client_with_builtin_retry(service: str, region: str = "us-east-1"):
    """
    botocore has built-in retry logic via Config.

    retry.mode options:
        "legacy"   - the default; typically 5 total attempts (4 retries)
        "standard" - 3 total attempts by default, exponential backoff
                     (base 2, capped at 20 seconds)
        "adaptive" - standard behavior plus client-side rate limiting with a
                     token bucket (documented by AWS as experimental)

    In Config, retries["max_attempts"] counts retries only, so
    max_attempts=5 means 1 initial call + 5 retries. Use
    "total_max_attempts" to count the initial call as well.
    """
    config = Config(
        retries={
            "mode": "adaptive",   # Client-side rate limiting on top of standard mode
            "max_attempts": 10,   # Up to 10 retries after the initial call
        },
        connect_timeout=5,
        read_timeout=30,
    )
    return boto3.client(service, region_name=region, config=config)

# ── Combining approaches ──────────────────────────────────────────
if __name__ == "__main__":
    # Approach 1: Function decorator (best for specific functions)
    instances = list_all_instances("ap-south-1")
    print(f"Found {len(instances)} instances")

    # Approach 2: Class wrapper (best when reusing a client heavily)
    ec2 = AWSClientWithRetry("ec2", region="ap-south-1", max_retries=5)
    # All these calls will auto-retry on ThrottlingException:
    response = ec2.describe_vpcs()
    sgs = ec2.describe_security_groups()
    subnets = ec2.describe_subnets()
    print(f"VPCs: {len(response['Vpcs'])}")

    # Approach 3: botocore adaptive mode (simplest, built-in)
    s3 = get_client_with_builtin_retry("s3")
    buckets = s3.list_buckets()["Buckets"]
    print(f"S3 buckets: {len(buckets)}")

Retry Timing Comparison
| Retry | Multiplier (2^attempt) | Delay before jitter (base_delay = 0.5s) | Delay with jitter (×0.75–1.25) |
|---|---|---|---|
| 1st retry | 1 | 0.5s | 0.38–0.63s |
| 2nd retry | 2 | 1.0s | 0.75–1.25s |
| 3rd retry | 4 | 2.0s | 1.50–2.50s |
| 4th retry | 8 | 4.0s | 3.00–5.00s |
| 5th retry | 16 | 8.0s | 6.00–10.00s |
| Max cap | n/a | 30.0s | 22.5–37.5s |
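The rows of this table can be reproduced with a few lines of arithmetic. The sketch below is illustrative: the helper name backoff_window is an assumption, and it mirrors the decorator's formula (cap first, then a ±25% jitter multiplier) rather than calling the decorator itself.

```python
def backoff_window(retry_number: int, base_delay: float = 0.5, max_delay: float = 30.0):
    """Return (nominal, low, high) delay in seconds for the Nth retry (1-based)."""
    attempt = retry_number - 1  # the decorator's loop index starts at 0
    nominal = min(base_delay * (2 ** attempt), max_delay)
    # Jitter multiplies the capped delay by a factor in [0.75, 1.25)
    return nominal, nominal * 0.75, nominal * 1.25


for n in range(1, 6):
    nominal, low, high = backoff_window(n)
    print(f"retry {n}: {nominal:.2f}s nominal, {low:.3f}s to {high:.3f}s with jitter")
```

Because the cap is applied before the jitter multiplier, a capped delay of 30 seconds still yields a window of 22.5 to 37.5 seconds.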
Key Commands Explained
| Command | What it does |
|---|---|
| @functools.wraps(func) | Copies original function metadata to the wrapper (preserves __name__, __doc__) |
| e.response["Error"]["Code"] | The AWS error type string (e.g., "ThrottlingException") |
| e.response["Error"]["Message"] | Human-readable error description |
| min(base_delay * (2 ** attempt), max_delay) | Exponential backoff capped at max_delay |
| random.random() * 0.5 | Random value 0.0–0.5, added to 0.75 to get 0.75–1.25 multiplier |
| time.sleep(delay) | Block the current thread for the calculated delay |
| Config(retries={"mode": "adaptive"}) | botocore built-in adaptive retry with token bucket |
| __getattr__(self, name) | Python dunder called for attribute misses; used for transparent method wrapping |
Common Issues
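Decorator not retrying: Check that the error code in your ClientError matches one in retryable_errors. Print e.response["Error"]["Code"] to see the exact value.
Jitter causing delays longer than max_delay: The cap is applied before the jitter multiplier (×0.75 to ×1.25), so the actual sleep can reach 1.25 × max_delay, which is 37.5s with the default max_delay of 30s. If you need a hard ceiling, apply min(delay, max_delay) again after the jitter step.
Don’t wrap write operations blindly: Retrying a non-idempotent call after a transient error can create duplicate resources (for example, run_instances may launch a second set of instances if the first request actually succeeded). Prefer the API's idempotency token where one exists, such as ClientToken on run_instances, or check whether the resource already exists before creating it.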
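If delays beyond max_delay are a problem, one option is to apply the jitter first and the cap second. The sketch below is a hypothetical variant, not the formula used in the decorator above; the function name capped_delay and the injectable rng parameter are assumptions made for illustration.

```python
import random


def capped_delay(attempt: int, base_delay: float = 0.5,
                 max_delay: float = 30.0, rng=random) -> float:
    """Exponential backoff with ±25% jitter that never exceeds max_delay."""
    delay = base_delay * (2 ** attempt)
    delay *= 0.75 + rng.random() * 0.5  # jitter first...
    return min(delay, max_delay)        # ...then cap, so the cap is a hard limit


for attempt in range(8):
    print(f"attempt {attempt}: sleep {capped_delay(attempt):.2f}s")
```

The trade-off is that once 0.75 × base_delay × 2^attempt exceeds max_delay, every caller sleeps exactly max_delay, so late retries lose their jitter. Whether that matters depends on how many clients reach those late attempts.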
🔍 Line-by-Line Code Walkthrough
Imports
| Line | Why It’s Used |
|---|---|
| import functools | Provides functools.wraps, the key tool for writing proper decorators |
| from botocore.exceptions import ClientError | The exception raised when an AWS service returns an error response. Has .response["Error"]["Code"] to identify the error type. Connection failures and client-side timeouts raise other botocore exceptions, so this decorator does not retry them |
| from botocore.config import Config | Allows configuring retry behavior, timeouts, and connection pooling at the client level |
aws_retry(max_retries, base_delay, max_delay, jitter, retryable_errors) — The Outer Decorator Factory
def aws_retry(
    max_retries: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    jitter: bool = True,
    retryable_errors: set = None,
):

| Parameter | Explanation |
|---|---|
| max_retries=5 | Total retry attempts AFTER the first try. So 5 retries = 6 total attempts |
| base_delay=0.5 | Wait 0.5 seconds before the first retry. Each subsequent retry doubles this |
| max_delay=30.0 | Cap the backoff at 30 seconds (before jitter), which prevents waiting 512s on attempt 10 |
| jitter=True | Multiplies the delay by a random factor (0.75–1.25). Prevents 100 threads all retrying at the same moment (“thundering herd”) |
| retryable_errors: set = None | Which AWS error codes trigger a retry. None means use the built-in set of throttling/transient codes |
    if retryable_errors is None:
        retryable_errors = {
            "ThrottlingException",
            "RequestLimitExceeded",
            "SlowDown",
            ...
        }

| Line | Explanation |
|---|---|
| retryable_errors is None | We use None as default (not set()) because mutable default arguments in Python are shared across calls, which is a source of subtle bugs. None plus this check is the safe pattern |
| {"ThrottlingException", ...} | A set for O(1) lookup. When error_code not in retryable_errors is checked on every exception, set lookup is faster than list search |
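The mutable-default pitfall is easy to demonstrate in isolation. The two functions below are hypothetical examples written for this purpose; they are not part of aws_retry, which only reads its set, but they show why the None check is the safer habit.

```python
def add_code_shared(code: str, codes: set = set()) -> set:
    """Anti-pattern: the default set is created once, at function definition."""
    codes.add(code)
    return codes


def add_code_safe(code: str, codes=None) -> set:
    """Safe pattern: a fresh set is created on every call that omits `codes`."""
    if codes is None:
        codes = set()
    codes.add(code)
    return codes


first = add_code_shared("Throttling")
second = add_code_shared("SlowDown")
print(first is second)   # True: both calls returned the same shared set
print(sorted(second))    # ['SlowDown', 'Throttling']

print(add_code_safe("Throttling"))  # {'Throttling'}
print(add_code_safe("SlowDown"))    # {'SlowDown'}
```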
decorator(func) and wrapper(*args, **kwargs) — The Closure
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):

| Line | Explanation |
|---|---|
| def decorator(func): | aws_retry(...) returns decorator. This is the two-level pattern required when a decorator takes arguments. @aws_retry(max_retries=5) calls aws_retry() first, then applies the returned decorator to the function |
| @functools.wraps(func) | Copies func.__name__, func.__doc__, func.__module__, func.__qualname__, func.__annotations__ to wrapper. Without this, your wrapped function’s __name__ would be "wrapper", which breaks logging, tracebacks, and introspection |
| def wrapper(*args, **kwargs): | Accepts any arguments the original function takes and passes them through. The retry logic is completely transparent to callers |
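The effect of functools.wraps is easiest to see side by side. In this small sketch the decorator and function names (without_wraps, with_wraps, fetch_a, fetch_b) are made up for the comparison; neither decorator adds any retry behavior.

```python
import functools


def without_wraps(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


def with_wraps(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


@without_wraps
def fetch_a():
    """Fetch A."""


@with_wraps
def fetch_b():
    """Fetch B."""


print(fetch_a.__name__, fetch_a.__doc__)  # wrapper None
print(fetch_b.__name__, fetch_b.__doc__)  # fetch_b Fetch B.
print(fetch_b.__wrapped__.__name__)       # fetch_b (the original, undecorated function)
```

Since the decorator in this article logs func.__name__, losing the name would make every log line say "wrapper" when decorators are stacked.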
The Retry Loop
for attempt in range(max_retries + 1):  # +1: attempt 0 is the initial try
    try:
        return func(*args, **kwargs)
    except ClientError as e:
        error_code = e.response["Error"]["Code"]
        error_msg = e.response["Error"]["Message"]
        if error_code not in retryable_errors:
            raise

| Line | Explanation |
|---|---|
| range(max_retries + 1) | If max_retries=5, this is range(6) → attempts 0,1,2,3,4,5. Attempt 0 is the initial call, attempts 1-5 are retries |
| return func(*args, **kwargs) | On success, immediately returns the result. The loop stops here, with no further retry overhead |
| e.response["Error"]["Code"] | boto3 puts the AWS error code in e.response["Error"]["Code"]. Examples: "ThrottlingException", "AccessDenied", "NoSuchBucket" |
| e.response["Error"]["Message"] | Human-readable description: "Rate exceeded", "Access Denied", etc. |
| if error_code not in retryable_errors: raise | For non-retryable errors (e.g., "AccessDeniedException", "NoSuchBucket"), re-raise immediately, because retrying would be pointless and wasteful |
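The loop's control flow can be exercised offline, without AWS credentials. The sketch below is an illustration under stated assumptions: FakeClientError is a hypothetical stand-in that only mimics the .response shape of botocore's ClientError, and call_with_retry and make_flaky are helper names invented here. The sleep is injectable so the run does not actually wait.

```python
class FakeClientError(Exception):
    """Hypothetical stand-in for botocore's ClientError (offline testing only)."""

    def __init__(self, code: str, message: str = ""):
        super().__init__(f"{code}: {message}")
        self.response = {"Error": {"Code": code, "Message": message}}


RETRYABLE = {"ThrottlingException", "RequestLimitExceeded"}


def call_with_retry(func, max_retries: int = 3, sleep=lambda seconds: None):
    """Same control flow as the wrapper's loop, with the sleep made injectable."""
    for attempt in range(max_retries + 1):  # attempt 0 is the initial call
        try:
            return func()
        except FakeClientError as e:
            code = e.response["Error"]["Code"]
            if code not in RETRYABLE or attempt == max_retries:
                raise  # non-retryable, or out of retries
            sleep(0.5 * 2 ** attempt)


def make_flaky(failures: int, code: str):
    """Return a function that raises `code` `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise FakeClientError(code, "Rate exceeded")
        return state["calls"]

    return flaky


# Two throttles, then success on the third call
print(call_with_retry(make_flaky(2, "ThrottlingException")))  # 3

# A non-retryable code is re-raised on the first failure
try:
    call_with_retry(make_flaky(1, "AccessDenied"))
except FakeClientError as e:
    print("not retried:", e.response["Error"]["Code"])
```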
Exponential Backoff Formula
delay = min(base_delay * (2 ** attempt), max_delay)
if jitter:
    delay *= (0.75 + random.random() * 0.5)
time.sleep(delay)

| Line | Explanation |
|---|---|
| base_delay * (2 ** attempt) | Exponential growth: attempt 0 → ×1, attempt 1 → ×2, attempt 2 → ×4, attempt 3 → ×8, etc. |
| min(..., max_delay) | Caps the delay. Without this, attempt 10 would wait 0.5 × 2^10 = 512 seconds |
| random.random() | Returns a float in [0.0, 1.0). Multiply by 0.5 gives [0.0, 0.5). Add 0.75 gives [0.75, 1.25) |
| delay *= (0.75 + random.random() * 0.5) | Jitter: random multiplier between 0.75× and 1.25×. Each thread gets a different delay, spreading retries across time |
| time.sleep(delay) | Blocks the current thread. In a Lambda or single-threaded script, this is fine. In async code (asyncio), you’d use await asyncio.sleep(delay) instead |
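For asyncio code, the same backoff logic can be wrapped around a coroutine. This is a minimal sketch under assumptions: boto3 itself is blocking, so it only applies when the wrapped function is a coroutine (for example from an async AWS client library), and async_retry and TransientError are hypothetical names used in place of the article's aws_retry and ClientError.

```python
import asyncio
import functools
import random


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error."""


def async_retry(max_retries: int = 3, base_delay: float = 0.5, max_delay: float = 30.0):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return await func(*args, **kwargs)
                except TransientError:
                    if attempt == max_retries:
                        raise
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    delay *= 0.75 + random.random() * 0.5
                    # Yields to the event loop, so other tasks keep running
                    await asyncio.sleep(delay)
        return wrapper
    return decorator


calls = {"n": 0}


@async_retry(max_retries=3, base_delay=0.01)
async def flaky_call() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("throttled")
    return "ok"


print(asyncio.run(flaky_call()))  # ok
```

The only structural differences from the synchronous version are the async def wrapper, the await on the wrapped call, and asyncio.sleep in place of time.sleep.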
AWSClientWithRetry.__getattr__ — Transparent Method Wrapping
def __getattr__(self, name: str):
    attr = getattr(self._client, name)
    if callable(attr):
        return aws_retry(max_retries=self._max_retries)(attr)
    return attr

| Line | Explanation |
|---|---|
| __getattr__(self, name) | Python calls __getattr__ only when the attribute is NOT found through normal lookup. Since describe_instances is not defined on AWSClientWithRetry, Python calls this method with name="describe_instances" |
| getattr(self._client, name) | Gets the actual method from the underlying boto3 client |
| if callable(attr) | callable() returns True for functions and methods, False for properties, strings, etc. |
| return aws_retry(...)(attr) | aws_retry(max_retries=5) returns decorator. Calling decorator(attr) returns the wrapped method. We return it without calling it, because the caller will call it |
| return attr | For non-callable attributes (like meta), return them as-is; no wrapping is needed |
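The proxy pattern can be tried without boto3. In the sketch below, FakeClient, Proxy, and the logged decorator are hypothetical stand-ins (logged takes the place of aws_retry and just records calls). It also adds a small guard, not present in the article's class, that raises AttributeError for _client so the proxy cannot recurse forever if __init__ never ran, for instance during copying or unpickling.

```python
import functools

CALL_LOG = []


class FakeClient:
    """Hypothetical stand-in for a boto3 client."""

    region = "ap-south-1"

    def describe_vpcs(self):
        return {"Vpcs": []}


def logged(func):
    """Placeholder for aws_retry(...): records each call instead of retrying."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALL_LOG.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


class Proxy:
    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # Only reached when normal attribute lookup on Proxy fails
        if name == "_client":
            # Guard: avoids infinite recursion if _client was never set
            raise AttributeError(name)
        attr = getattr(self._client, name)
        return logged(attr) if callable(attr) else attr


proxy = Proxy(FakeClient())
print(proxy.region)                  # ap-south-1 (non-callable, returned as-is)
print(proxy.describe_vpcs())         # {'Vpcs': []} (wrapped, then called by us)
print(proxy.describe_vpcs.__name__)  # describe_vpcs
print(CALL_LOG)                      # ['describe_vpcs']
```

Note that every attribute access builds a new wrapper. That is cheap, but it means identity checks such as proxy.describe_vpcs is proxy.describe_vpcs are False.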
get_client_with_builtin_retry — botocore’s Built-In Retry
config = Config(
    retries={"mode": "adaptive", "max_attempts": 10},
    connect_timeout=5,
    read_timeout=30,
)

| Field | Explanation |
|---|---|
| "mode": "adaptive" | Standard-mode retries plus client-side rate limiting: a token bucket whose rate is adjusted after throttling responses, so the client slows its own request rate toward what the service accepts. AWS documents this mode as experimental |
| "mode": "standard" | Exponential backoff (base 2, capped at 20 seconds) with a default of 3 total attempts. Simpler, without client-side rate limiting |
| "mode": "legacy" | The original botocore retry handler and the default if you don’t set mode. Typically 5 total attempts, and it retries a narrower set of errors |
| "max_attempts": 10 | In Config, the maximum number of retries after the initial call, so 10 means up to 11 attempts in total. Use "total_max_attempts" if you want the count to include the initial call |
| connect_timeout=5 | Give up connecting to the AWS API endpoint after 5 seconds. Fails fast on unreachable endpoints and network partitions |
| read_timeout=30 | Give up if no response data arrives within 30 seconds. Some operations (like large S3 copies) need longer |
- Python functools.wraps decorator pattern
- botocore ClientError structure
- Exponential backoff formula
- Jitter to prevent thundering herd
- Adaptive retry mode in botocore