
Generic boto3 Pagination Utility — Handle All Paginated AWS APIs

Python utility function that handles pagination for any boto3 API call generically, with automatic NextToken handling, filtering support, and a lazy generator interface.

January 20, 2025 · 8 min read · ~15 min to complete · DB
The Situation

Most AWS list and describe APIs are paginated. Each call returns one page of results, plus a continuation token (NextToken, Marker, or similar) when more results remain. Forgetting to paginate means silently missing resources, a dangerous bug in security and compliance scripts.

4 steps · 5 services used · ~15 min duration · Intermediate difficulty

Problem Statement

You write ec2.describe_instances() and it works in dev with 5 instances. In production with 1,200 instances, it silently returns only the first 1,000. Your security audit reports “no violations” — but 200 instances were never checked. Pagination is not optional.
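The simulation below shows why this failure is silent. It makes no AWS calls. fake_describe is a hypothetical stand-in that serves 1,200 IDs, at most 1,000 per call, and adds a NextToken while more remain. It mimics only the general shape of a paginated response, not EC2's exact response format.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a paginated AWS API: 1,200 fake instance IDs,
# served at most 1,000 per call, with a NextToken while more remain.
ALL_IDS = [f"i-{n:04d}" for n in range(1200)]
PAGE_SIZE = 1000


def fake_describe(NextToken: Optional[str] = None) -> Dict[str, Any]:
    start = int(NextToken) if NextToken else 0
    end = start + PAGE_SIZE
    response: Dict[str, Any] = {"Instances": ALL_IDS[start:end]}
    if end < len(ALL_IDS):
        response["NextToken"] = str(end)  # real tokens are opaque strings
    return response


# One call: no error and plausible output, but only the first page.
single = fake_describe()["Instances"]

# Token loop: keep calling until the response carries no NextToken.
collected: List[str] = []
kwargs: Dict[str, Any] = {}
while True:
    resp = fake_describe(**kwargs)
    collected.extend(resp["Instances"])
    if "NextToken" not in resp:
        break
    kwargs["NextToken"] = resp["NextToken"]

print(len(single), len(collected))  # 1000 1200
```

Nothing in the single-call result hints that 200 IDs are missing. Only checking for the continuation token reveals it.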


Which APIs Are Paginated?

Almost all AWS list/describe APIs are paginated. Common ones:

| API | Default page size | Result key |
|---|---|---|
| ec2:DescribeInstances | 100–1000 | Reservations |
| iam:ListUsers | 100 | Users |
| s3:ListObjectsV2 | 1000 | Contents |
| cloudtrail:LookupEvents | 50 | Events |
| rds:DescribeDBInstances | 100 | DBInstances |
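The result key is not the only thing that varies between these APIs. The name of the continuation token varies too. The sketch below records the request/response token pairs for the five APIs in the table and drives one generic loop with them. drain and the S3-shaped fake are hypothetical names, used only so the example runs offline.

```python
from typing import Any, Callable, Dict, Iterator, Tuple

# (request parameter, response field) carrying each API's continuation token.
# boto3's paginator models record these pairs so you don't have to.
TOKEN_NAMES: Dict[str, Tuple[str, str]] = {
    "ec2:DescribeInstances": ("NextToken", "NextToken"),
    "iam:ListUsers": ("Marker", "Marker"),
    "s3:ListObjectsV2": ("ContinuationToken", "NextContinuationToken"),
    "cloudtrail:LookupEvents": ("NextToken", "NextToken"),
    "rds:DescribeDBInstances": ("Marker", "Marker"),
}


def drain(call: Callable[..., Dict[str, Any]], result_key: str,
          request_token: str, response_token: str, **kwargs: Any) -> Iterator[Any]:
    """Yield every item from a token-paginated call, whatever its token is called."""
    while True:
        response = call(**kwargs)
        yield from response.get(result_key, [])
        token = response.get(response_token)
        if not token:  # absent or empty: this was the last page
            return
        kwargs[request_token] = token


# Hypothetical S3-shaped fake: 5 keys served 2 per page.
KEYS = [{"Key": f"obj{i}"} for i in range(5)]


def fake_list_objects_v2(Bucket: str, ContinuationToken: str = "0") -> Dict[str, Any]:
    start = int(ContinuationToken)
    page: Dict[str, Any] = {"Contents": KEYS[start:start + 2]}
    if start + 2 < len(KEYS):
        page["NextContinuationToken"] = str(start + 2)
    return page


req, resp = TOKEN_NAMES["s3:ListObjectsV2"]
keys = [o["Key"] for o in drain(fake_list_objects_v2, "Contents", req, resp, Bucket="demo")]
print(keys)  # ['obj0', 'obj1', 'obj2', 'obj3', 'obj4']
```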

Complete Utility

import boto3
from typing import Generator, Any, Optional


# ── Method 1: Built-in paginator (preferred) ──────────────────────
def paginate_all(
    client,
    method_name: str,
    result_key: str,
    **kwargs,
) -> Generator[Any, None, None]:
    """
    Generic paginator for any boto3 API that supports pagination.

    client:      a boto3 service client (e.g., boto3.client("ec2"))
    method_name: the API method name as a string (e.g., "describe_instances")
    result_key:  the top-level dict key containing results (e.g., "Reservations")
    **kwargs:    any additional arguments to pass to the API (Filters, etc.)

    How boto3 paginators work:
    - client.get_paginator("method_name") returns a Paginator object.
    - paginator.paginate(**kwargs) returns a PageIterator.
    - Each iteration yields one page (a dict with the same structure as
      a single API call response).
    - boto3 automatically passes the continuation token (NextToken,
      Marker, ContinuationToken, etc., depending on the API) to each
      subsequent request and stops when there are no more pages.

    We use yield from to yield items one at a time — making this a
    lazy generator that never loads all results into memory at once.
    This is critical when you have millions of S3 objects.
    """
    # can_paginate() reports whether boto3 ships a built-in paginator for
    # this method. Deciding up front, instead of wrapping the whole loop in
    # a broad try/except, means an error halfway through the results reaches
    # the caller rather than triggering a fallback that restarts from page
    # one and yields duplicates.
    if client.can_paginate(method_name):
        paginator = client.get_paginator(method_name)
        for page in paginator.paginate(**kwargs):
            yield from page.get(result_key, [])
    else:
        # Fallback: manual NextToken/Marker loop for methods without a paginator
        yield from _manual_paginate(client, method_name, result_key, **kwargs)


def _manual_paginate(
    client,
    method_name: str,
    result_key: str,
    **kwargs,
) -> Generator[Any, None, None]:
    """
    Manual NextToken pagination for APIs that don't have a built-in paginator.
    Some older APIs use 'Marker' instead of 'NextToken'.
    """
    method = getattr(client, method_name)

    while True:
        response = method(**kwargs)
        yield from response.get(result_key, [])

        # Try NextToken first, then Marker (IAM uses Marker)
        next_token = response.get("NextToken") or response.get("Marker")
        if not next_token:
            break
        # Set the appropriate continuation token for the next call
        if "NextToken" in response:
            kwargs["NextToken"] = next_token
        else:
            kwargs["Marker"] = next_token


# ── Method 2: Collect all results into a list (convenience) ───────
def paginate_all_list(
    client,
    method_name: str,
    result_key: str,
    **kwargs,
) -> list:
    """
    Wrapper that collects all paginated results into a list.
    Use when you need to access results multiple times or check length.
    For very large result sets, prefer the generator version.
    """
    return list(paginate_all(client, method_name, result_key, **kwargs))


# ── Method 3: Paginate with a callback ────────────────────────────
def paginate_with_callback(
    client,
    method_name: str,
    result_key: str,
    callback,
    **kwargs,
) -> int:
    """
    Process each item with a callback function as pages arrive.
    Returns total items processed.
    Useful for writing results to a file/DB without buffering everything.
    """
    count = 0
    for item in paginate_all(client, method_name, result_key, **kwargs):
        callback(item)
        count += 1
    return count


# ── Usage examples ─────────────────────────────────────────────────
if __name__ == "__main__":
    ec2 = boto3.client("ec2", region_name="ap-south-1")
    s3  = boto3.client("s3")
    iam = boto3.client("iam")
    ct  = boto3.client("cloudtrail")

    # ── Example 1: List all running EC2 instances ─────────────────
    # Without pagination you'd call ec2.describe_instances() and risk
    # missing instances if there are more than the default page size.
    print("Running EC2 instances:")
    instance_count = 0
    for reservation in paginate_all(
        ec2,
        "describe_instances",
        "Reservations",
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}],
    ):
        for instance in reservation["Instances"]:
            print(f"  {instance['InstanceId']}")
            instance_count += 1
    print(f"Total: {instance_count} running instances\n")

    # ── Example 2: List all IAM users ─────────────────────────────
    # iam.list_users() returns at most 100 users per call by default
    # (MaxItems can raise that to 1,000); the paginator follows Marker for us.
    all_users = paginate_all_list(iam, "list_users", "Users")
    print(f"Total IAM users: {len(all_users)}\n")

    # ── Example 3: List all S3 objects in a bucket ────────────────
    # S3 can have BILLIONS of objects — never load them all into a list.
    # Use the generator to process one at a time.
    bucket_name = "my-data-bucket"
    total_size = 0
    object_count = 0
    for obj in paginate_all(s3, "list_objects_v2", "Contents", Bucket=bucket_name):
        total_size += obj["Size"]
        object_count += 1
    print(f"Bucket {bucket_name}: {object_count:,} objects, {total_size / 1e9:.2f} GB\n")

    # ── Example 4: CloudTrail events with callback ─────────────────
    from datetime import datetime, timedelta, timezone
    login_events = []

    def collect_console_logins(event):
        if event.get("EventName") == "ConsoleLogin":
            login_events.append(event)

    # Timezone-aware timestamps (datetime.utcnow() is deprecated since Python 3.12)
    now = datetime.now(timezone.utc)
    count = paginate_with_callback(
        ct,
        "lookup_events",
        "Events",
        collect_console_logins,
        StartTime=now - timedelta(days=7),
        EndTime=now,
    )
    print(f"Processed {count} CloudTrail events, {len(login_events)} console logins")

    # ── Example 5: Paginate RDS instances ─────────────────────────
    rds = boto3.client("rds")
    db_instances = paginate_all_list(rds, "describe_db_instances", "DBInstances")
    print(f"\nTotal RDS instances: {len(db_instances)}")
    for db in db_instances:
        print(f"  {db['DBInstanceIdentifier']} ({db['DBInstanceStatus']})")

Why Paginators Beat Manual NextToken

# ❌ WRONG — silently misses resources beyond the first page
response = ec2.describe_instances()   # Returns ONLY the first page!
instances = response["Reservations"]  # May be incomplete

# ❌ FRAGILE — manual but verbose and easy to forget
response = ec2.describe_instances()
all_reservations = response["Reservations"]
while "NextToken" in response:
    response = ec2.describe_instances(NextToken=response["NextToken"])
    all_reservations.extend(response["Reservations"])

# ✅ CORRECT — paginator handles everything
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        process(reservation)

# ✅ BEST — use our generic utility
for reservation in paginate_all(ec2, "describe_instances", "Reservations"):
    process(reservation)
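The "fragile" loop has a second trap besides verbosity. Token names differ per API, so a loop copied from EC2 breaks silently when used elsewhere. The self-contained sketch below uses a hypothetical S3-shaped fake: ListObjectsV2 returns NextContinuationToken, never NextToken.

```python
from typing import Any, Dict, List

# Hypothetical S3-shaped fake: 2,500 keys, 1,000 per page.
KEYS = [f"logs/{i:05d}.gz" for i in range(2500)]


def fake_list_objects_v2(ContinuationToken: str = "0") -> Dict[str, Any]:
    start = int(ContinuationToken)
    page: Dict[str, Any] = {"Contents": [{"Key": k} for k in KEYS[start:start + 1000]]}
    if start + 1000 < len(KEYS):
        page["NextContinuationToken"] = str(start + 1000)
    return page


# Loop copied from EC2: S3 never returns "NextToken", so the while body
# never runs and only the first page is kept. No error is raised.
response = fake_list_objects_v2()
fragile: List[Dict[str, Any]] = list(response["Contents"])
while "NextToken" in response:
    response = fake_list_objects_v2(ContinuationToken=response["NextToken"])
    fragile.extend(response["Contents"])

# A correct manual loop for this API must use its own token names.
response = fake_list_objects_v2()
correct: List[Dict[str, Any]] = list(response["Contents"])
while "NextContinuationToken" in response:
    response = fake_list_objects_v2(ContinuationToken=response["NextContinuationToken"])
    correct.extend(response["Contents"])

print(len(fragile), len(correct))  # 1000 2500
```

A paginator avoids this class of bug because the token names come from boto3's per-API paginator model rather than from your code.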

Key Commands Explained

| Command | What it does |
|---|---|
| client.get_paginator("method_name") | Returns a boto3 Paginator for the given API method |
| paginator.paginate(**kwargs) | Returns a PageIterator that yields one page dict per iteration |
| page.get(result_key, []) | Extracts the result list from each page; defaults to [] if the key is absent |
| yield from iterable | Delegates iteration to the inner iterable (lazy generator composition) |
| response.get("NextToken") | Returns None when there are no more pages, which ends the loop |
| getattr(client, method_name) | Gets a method by its name string, allowing dynamic method dispatch |
| PaginationConfig={"MaxItems": 500} | Limits the total number of items returned across all pages |
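The yield from row is what keeps paginate_all lazy. A new page is requested only when the caller asks for an item beyond the pages already fetched. The minimal sketch below uses a hypothetical fake_page function that records each "API call". Taking the first 150 items with itertools.islice touches only two pages. With the generator utility, islice therefore works as a client-side cap, similar in spirit to MaxItems.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List

calls: List[int] = []  # records which page numbers were actually requested


def fake_page(page_no: int) -> Dict[str, Any]:
    """Hypothetical API call: 100 items per page, 10 pages in total."""
    calls.append(page_no)
    items = list(range(page_no * 100, (page_no + 1) * 100))
    return {"Items": items, "NextToken": page_no + 1 if page_no < 9 else None}


def items() -> Iterator[int]:
    page_no = 0
    while True:
        page = fake_page(page_no)
        yield from page.get("Items", [])  # hand items out one at a time
        if not page["NextToken"]:
            return
        page_no = page["NextToken"]


first_150 = list(islice(items(), 150))
print(len(first_150), calls)  # 150 [0, 1]
```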

PaginationConfig Options

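import boto3

client = boto3.client("s3")

# Limit total results (useful for sampling or testing)
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(
    Bucket="my-bucket",
    PaginationConfig={
        "MaxItems": 100,        # Stop after 100 items in total, across all pages
        "PageSize": 50,         # Request 50 items per API call
        "StartingToken": None,  # Resume token from an earlier run (None = start)
    },
):
    # "Contents" is absent from pages with no objects, so default to []
    for obj in page.get("Contents", []):
        print(obj["Key"])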
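StartingToken exists so that a long listing can resume where it stopped. The same checkpoint idea applies to a hand-rolled loop. Yield each page's items together with the token for the next page, persist that token, and pass it back in on restart. The sketch below works under those assumptions. fake_call and pages_from are hypothetical names, and in a real job the checkpoint would be written to durable storage. For boto3's own paginators, our understanding is that botocore's PageIterator exposes a resume_token attribute for this purpose.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple

DATA = list(range(10))


def fake_call(NextToken: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical paginated API: 3 items per page."""
    start = int(NextToken) if NextToken else 0
    resp: Dict[str, Any] = {"Items": DATA[start:start + 3]}
    if start + 3 < len(DATA):
        resp["NextToken"] = str(start + 3)
    return resp


def pages_from(token: Optional[str] = None) -> Iterator[Tuple[List[Any], Optional[str]]]:
    """Yield (items, token_for_next_page); save the token to resume later."""
    while True:
        resp = fake_call(NextToken=token)
        token = resp.get("NextToken")
        yield resp["Items"], token
        if not token:
            return


# First run: process two pages, keep the checkpoint, then "crash".
done: List[Any] = []
checkpoint: Optional[str] = None
for n, (items, checkpoint) in enumerate(pages_from(), start=1):
    done.extend(items)
    if n == 2:
        break

# Second run: resume from the saved token instead of page one.
for items, _ in pages_from(checkpoint):
    done.extend(items)

print(done)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```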

🔍 Line-by-Line Code Walkthrough

Imports

| Line | Why it's used |
|---|---|
| import boto3 | AWS SDK, needed for creating service clients |
| from typing import Generator, Any, Optional | Type hints. Generator[Any, None, None] declares that a function returns a generator that yields values of any type |
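The Generator[YieldType, SendType, ReturnType] form is easiest to read with a generator that actually uses all three slots. countdown below is a toy example unrelated to AWS. It yields ints, accepts strings via send(), and returns a bool.

```python
from typing import Generator, Iterator


def countdown(n: int) -> Generator[int, str, bool]:
    """Yields ints, accepts a str via send(), returns a bool when finished."""
    while n > 0:
        msg = yield n          # the value passed to send() lands here
        if msg == "stop":
            return False       # becomes StopIteration.value
        n -= 1
    return True


gen = countdown(3)
print(next(gen))           # 3
print(gen.send("go"))      # 2
try:
    gen.send("stop")
except StopIteration as done:
    print(done.value)      # False


# A generator that only yields (like paginate_all) can use the shorter Iterator form.
def evens(limit: int) -> Iterator[int]:
    yield from range(0, limit, 2)
```

Because paginate_all never receives sent values and never returns one, Generator[Any, None, None] and Iterator[Any] describe it equally well.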

paginate_all(client, method_name, result_key, **kwargs)

def paginate_all(client, method_name: str, result_key: str, **kwargs) -> Generator[Any, None, None]:
| Part | Explanation |
|---|---|
| client | Any boto3 service client (e.g., boto3.client("ec2"), boto3.client("s3")) |
| method_name: str | The API method name as a string (e.g., "describe_instances", "list_objects_v2"); this lets the function work with any paginated API |
| result_key: str | The key in each page response that holds the list of results (e.g., "Reservations", "Contents", "Users") |
| **kwargs | Any additional arguments passed through to the underlying API (e.g., Filters=[...], Bucket="my-bucket") |
| -> Generator[Any, None, None] | Return type hint: this is a generator function. Any = items can be of any type; the first None = no values are sent into the generator; the second None = no return value |
if client.can_paginate(method_name):
    paginator = client.get_paginator(method_name)
    for page in paginator.paginate(**kwargs):
        yield from page.get(result_key, [])
else:
    yield from _manual_paginate(client, method_name, result_key, **kwargs)

| Line | Explanation |
|---|---|
| client.can_paginate(method_name) | Returns True if boto3 ships a built-in paginator for this method. Checking before iterating, instead of wrapping the loop in a broad try/except, means an error halfway through the results reaches the caller rather than triggering a fallback that restarts from page one and yields duplicates |
| client.get_paginator(method_name) | Creates a Paginator for the named method. The paginator model knows the API's input and output token names (NextToken, Marker, ContinuationToken, etc.), so you never handle them yourself |
| paginator.paginate(**kwargs) | Returns a PageIterator. Each iteration yields one full API response dict (one page) |
| yield from page.get(result_key, []) | Yields each item in the page's list to the caller one at a time, which makes the function a lazy, memory-efficient generator. page.get(result_key, []) defaults to [] when the key is absent (some pages may have no results) |
| yield from _manual_paginate(...) | Falls back to the manual NextToken/Marker loop for methods without a built-in paginator |
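The scope of the fallback's except clause matters. When it wraps the whole loop, any exception raised after some items have already been yielded (a dropped connection on page 2, say) sends control into the fallback, which starts again from page one. The caller then receives duplicates and sees no error. The self-contained sketch below demonstrates this with a hypothetical FlakyPages source that fails once on its second page.

```python
from typing import Any, Dict, Iterator


class FlakyPages:
    """Hypothetical page source: page 1 succeeds, page 2 fails once."""

    def __init__(self) -> None:
        self.failed = False

    def pages(self) -> Iterator[Dict[str, Any]]:
        yield {"Items": ["a", "b"]}
        if not self.failed:
            self.failed = True
            raise ConnectionError("connection reset")  # transient failure
        yield {"Items": ["c", "d"]}


def broad_fallback(src: FlakyPages) -> Iterator[str]:
    # Mirrors try/except Exception wrapped around the whole page loop.
    try:
        for page in src.pages():
            yield from page["Items"]
    except Exception:
        for page in src.pages():  # the "fallback" restarts from page one
            yield from page["Items"]


def no_fallback(src: FlakyPages) -> Iterator[str]:
    # Fallback decided before iterating: mid-stream errors propagate.
    for page in src.pages():
        yield from page["Items"]


print(list(broad_fallback(FlakyPages())))  # ['a', 'b', 'a', 'b', 'c', 'd']
try:
    list(no_fallback(FlakyPages()))
except ConnectionError as err:
    print("surfaced:", err)  # surfaced: connection reset
```

Checking client.can_paginate(method_name) before iterating (or catching only botocore's OperationNotPageableError around get_paginator) keeps the fallback decision separate from the iteration.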

_manual_paginate(client, method_name, result_key, **kwargs)

method = getattr(client, method_name)
while True:
    response = method(**kwargs)
    yield from response.get(result_key, [])
    next_token = response.get("NextToken") or response.get("Marker")
    if not next_token:
        break
    if "NextToken" in response:
        kwargs["NextToken"] = next_token
    else:
        kwargs["Marker"] = next_token
| Line | Explanation |
|---|---|
| getattr(client, method_name) | Gets a method by its name string: getattr(ec2_client, "describe_instances") returns the describe_instances method object. This enables dynamic dispatch |
| while True: | Loops until a break when there are no more pages |
| response = method(**kwargs) | Calls the API, passing all accumulated parameters, including any pagination token |
| yield from response.get(result_key, []) | Yields all items from this page to the caller |
| response.get("NextToken") or response.get("Marker") | Tries NextToken first (most modern APIs), then Marker (older APIs such as IAM and RDS). The or picks whichever is present |
| if not next_token: break | None (key absent) and "" (empty string) are both falsy, so the loop stops |
| kwargs["NextToken"] = next_token | Injects the continuation token into kwargs so the next method(**kwargs) call fetches the next page (Marker is injected the same way for Marker-based APIs) |
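The while True loop has one weakness. If a token never changes, the loop never ends. This can happen with an API that echoes the request's Marker back, as S3's older ListObjects does when a Marker was sent. The sketch below adds a hedged guard, assuming the same NextToken/Marker conventions as _manual_paginate. manual_paginate_guarded and echoing_call are hypothetical names.

```python
from typing import Any, Callable, Dict, Iterator, List, Set


def manual_paginate_guarded(call: Callable[..., Dict[str, Any]],
                            result_key: str, **kwargs: Any) -> Iterator[Any]:
    """NextToken/Marker loop plus a guard against repeated tokens."""
    seen: Set[str] = set()
    while True:
        response = call(**kwargs)
        yield from response.get(result_key, [])
        if response.get("NextToken"):
            name, token = "NextToken", response["NextToken"]
        elif response.get("Marker"):
            name, token = "Marker", response["Marker"]
        else:
            return  # no continuation token: last page
        if token in seen:
            raise RuntimeError(f"{name} {token!r} repeated; stopping instead of looping forever")
        seen.add(token)
        kwargs[name] = token


def echoing_call(Marker: str = "") -> Dict[str, Any]:
    """Hypothetical API that echoes a Marker back even when nothing is left."""
    return {"Items": [1, 2], "Marker": Marker or "start"}


out: List[int] = []
try:
    for x in manual_paginate_guarded(echoing_call, "Items"):
        out.append(x)
except RuntimeError as err:
    print(out, err)  # [1, 2, 1, 2] Marker 'start' repeated; stopping instead of looping forever
```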

paginate_all_list(...)

def paginate_all_list(client, method_name, result_key, **kwargs) -> list:
    return list(paginate_all(client, method_name, result_key, **kwargs))
| Line | Explanation |
|---|---|
| list(paginate_all(...)) | Consumes the entire generator and stores all results in a list. Use it when you need random access (results[5]), a length check (len(results)), or multiple passes |
| When to prefer generator vs. list? | Generator: memory-efficient, processes data as it arrives. List: needed when you must call len(), sort, or iterate multiple times |
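The "iterate multiple times" point is easy to trip over, because a generator can be consumed only once. On a second pass it silently yields nothing. The short sketch below shows this. fake_results is a hypothetical stand-in for a paginate_all(...) call.

```python
from typing import Iterator, List


def fake_results() -> Iterator[str]:
    """Hypothetical stand-in for paginate_all(iam, "list_users", "Users")."""
    yield from ["alice", "bob", "carol"]


gen = fake_results()
first_pass = sum(1 for _ in gen)
second_pass = sum(1 for _ in gen)   # the generator is already exhausted
print(first_pass, second_pass)      # 3 0

users: List[str] = list(fake_results())
print(len(users), users[0], sorted(users)[-1])  # 3 alice carol
print(len(users))                   # 3 (a list can be re-read as often as needed)
```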

paginate_with_callback(client, method_name, result_key, callback, **kwargs)

count = 0
for item in paginate_all(client, method_name, result_key, **kwargs):
    callback(item)
    count += 1
return count
| Line | Explanation |
|---|---|
| callback(item) | Calls the user-provided function with each item. The callback can write to a database or file, or process data, without buffering everything |
| count += 1 | Tracks the total number of items processed; returned for logging or reporting |
| Use case | Streaming processing, e.g., handling 1 million S3 objects without first loading all their metadata into RAM |
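One concrete streaming callback writes each item as a JSON line as soon as it arrives, so memory use stays flat regardless of result count. The sketch below runs offline. run_with_callback is a hypothetical stand-in for the loop inside paginate_with_callback, and the events are fake CloudTrail-shaped dicts.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable, TextIO


def make_jsonl_writer(fh: TextIO) -> Callable[[Dict[str, Any]], None]:
    """Return a callback that writes each item to fh as one JSON line."""
    def write_item(item: Dict[str, Any]) -> None:
        # boto3 responses contain datetime values; default=str serializes them
        fh.write(json.dumps(item, default=str) + "\n")
    return write_item


def run_with_callback(items: Iterable[Any], callback: Callable[[Any], None]) -> int:
    """Offline stand-in for the loop inside paginate_with_callback (no AWS client)."""
    count = 0
    for item in items:
        callback(item)
        count += 1
    return count


# Hypothetical CloudTrail-shaped events, produced lazily like pages would be
events = ({"EventName": "ConsoleLogin", "EventTime": datetime(2025, 1, 20, tzinfo=timezone.utc)}
          for _ in range(3))

path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w", encoding="utf-8") as fh:
    written = run_with_callback(events, make_jsonl_writer(fh))

with open(path, encoding="utf-8") as fh:
    lines = fh.readlines()
print(written, len(lines))  # 3 3
```

With a real client, the same callback would be passed as the callback argument of paginate_with_callback(ct, "lookup_events", "Events", ...) while the file is open.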

Usage Example — Why paginate_all Instead of Direct Call

# ❌ WRONG — silently misses resources beyond the first page
response = ec2.describe_instances()
instances = response["Reservations"]   # May be incomplete!

# ✅ CORRECT — never misses a result
for reservation in paginate_all(ec2, "describe_instances", "Reservations"):
    process(reservation)
| Point | Explanation |
|---|---|
| The silent failure danger | describe_instances() without pagination returns only the first page (up to 1,000 instances). In a small test account it looks correct; in production it silently drops instances |
| No error is raised | AWS doesn't raise an error when more results exist; it simply omits them. The response includes "NextToken", but if you don't check for it, you never know more data exists |
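Sometimes a script must make a single call, such as a quick one-off check. In that case it can at least refuse to trust a truncated page. The minimal sketch below assumes the response-field conventions of the five APIs in the earlier table. is_truncated is a hypothetical helper, not part of boto3.

```python
from typing import Any, Dict


def is_truncated(response: Dict[str, Any]) -> bool:
    """True if a single (non-paginated) response signals that more results exist."""
    if "IsTruncated" in response:
        # IAM and S3 ListObjectsV2 send an explicit flag; trust it
        # (S3's older ListObjects can echo a Marker even when not truncated)
        return bool(response["IsTruncated"])
    # EC2 and CloudTrail (NextToken) and RDS (Marker) signal more data by token presence
    return bool(response.get("NextToken") or response.get("Marker"))


# Hypothetical response shapes
print(is_truncated({"Reservations": [], "NextToken": "abc"}))           # True
print(is_truncated({"Reservations": []}))                               # False
print(is_truncated({"Users": [], "IsTruncated": True, "Marker": "m"}))  # True
```

A script could raise an error whenever is_truncated(response) is True, turning the silent failure into a loud one.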
Services Used
EC2 · IAM · S3 · CloudTrail · RDS (via boto3)
Prerequisites
  • Python 3.8+
  • boto3
What You Learned
  • boto3 paginator objects
  • get_paginator vs manual NextToken
  • Python generators for memory efficiency
  • Type hints for API utilities
