Generic boto3 Pagination Utility — Handle All Paginated AWS APIs

Problem Statement

You write ec2.describe_instances() and it works in dev with 5 instances. In production with 1,200 instances, it silently returns only the first 1,000. Your security audit reports “no violations” — but 200 instances were never checked. Pagination is not optional.

Which APIs Are Paginated?

Almost all AWS list/describe APIs are paginated. Common ones:

API	Default Page Size	Result Key
`ec2:DescribeInstances`	100–1000	`Reservations`
`iam:ListUsers`	100	`Users`
`s3:ListObjectsV2`	1000	`Contents`
`cloudtrail:LookupEvents`	50	`Events`
`rds:DescribeDBInstances`	100	`DBInstances`
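To see the failure mode without an AWS account, here is a stdlib-only sketch. It uses a hypothetical in-memory `FakeEC2` class (not part of boto3) that serves 1,200 fake instances in pages of 1,000, matching the scenario in the problem statement. For simplicity it puts one instance in each reservation. A single call sees only the first page, while following `NextToken` sees everything.

```python
from typing import Optional


class FakeEC2:
    """Hypothetical stand-in for an EC2 client; serves fake instance IDs in pages."""

    def __init__(self, total: int, page_size: int) -> None:
        self.ids = [f"i-{n:05d}" for n in range(total)]
        self.page_size = page_size

    def describe_instances(self, NextToken: Optional[str] = None) -> dict:
        start = int(NextToken) if NextToken else 0
        end = start + self.page_size
        response = {"Reservations": [{"Instances": [{"InstanceId": i}]}
                                     for i in self.ids[start:end]]}
        if end < len(self.ids):
            # More data exists, but nothing forces the caller to notice
            response["NextToken"] = str(end)
        return response


def count_single_call(ec2: FakeEC2) -> int:
    """The buggy pattern: one call, no loop."""
    return len(ec2.describe_instances()["Reservations"])


def count_all_pages(ec2: FakeEC2) -> int:
    """The fixed pattern: keep calling until NextToken disappears."""
    total, token = 0, None
    while True:
        kwargs = {"NextToken": token} if token else {}
        response = ec2.describe_instances(**kwargs)
        total += len(response["Reservations"])
        token = response.get("NextToken")
        if not token:
            return total


ec2 = FakeEC2(total=1200, page_size=1000)
print(count_single_call(ec2))  # 1000
print(count_all_pages(ec2))    # 1200
```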

Complete Utility

import boto3
from typing import Generator, Any, Optional


# ── Method 1: Built-in paginator (preferred) ──────────────────────
def paginate_all(
    client,
    method_name: str,
    result_key: str,
    **kwargs,
) -> Generator[Any, None, None]:
    """
    Generic paginator for any boto3 API that supports pagination.

    client:      a boto3 service client (e.g., boto3.client("ec2"))
    method_name: the API method name as a string (e.g., "describe_instances")
    result_key:  the top-level dict key containing results (e.g., "Reservations")
    **kwargs:    any additional arguments to pass to the API (Filters, etc.)

    How boto3 paginators work:
    - client.get_paginator("method_name") returns a Paginator object.
    - paginator.paginate(**kwargs) returns a PageIterator.
    - Each iteration yields one page (a dict with the same structure as
      a single API call response).
    - boto3 automatically passes the continuation token to each subsequent
      request, using whatever name the API expects (NextToken, Marker,
      ContinuationToken, ...), and stops when there are no more pages.

    We use yield from to yield items one at a time — making this a
    lazy generator that never loads all results into memory at once.
    This is critical when you have millions of S3 objects.
    """
    if not client.can_paginate(method_name):
        # No built-in paginator for this operation: fall back to a manual
        # NextToken/Marker loop. Deciding here, before any request is sent,
        # avoids restarting from page one if a later page fails.
        yield from _manual_paginate(client, method_name, result_key, **kwargs)
        return

    paginator = client.get_paginator(method_name)
    for page in paginator.paginate(**kwargs):
        # Errors raised while fetching a page (throttling, timeouts,
        # AccessDenied) propagate to the caller unchanged.
        yield from page.get(result_key, [])


def _manual_paginate(
    client,
    method_name: str,
    result_key: str,
    **kwargs,
) -> Generator[Any, None, None]:
    """
    Manual NextToken pagination for APIs that don't have a built-in paginator.
    Some older APIs use 'Marker' instead of 'NextToken'.
    """
    method = getattr(client, method_name)

    while True:
        response = method(**kwargs)
        yield from response.get(result_key, [])

        # Try NextToken first, then Marker (IAM uses Marker)
        next_token = response.get("NextToken") or response.get("Marker")
        if not next_token:
            break
        # Set the appropriate continuation token for the next call
        if "NextToken" in response:
            kwargs["NextToken"] = next_token
        else:
            kwargs["Marker"] = next_token


# ── Method 2: Collect all results into a list (convenience) ───────
def paginate_all_list(
    client,
    method_name: str,
    result_key: str,
    **kwargs,
) -> list:
    """
    Wrapper that collects all paginated results into a list.
    Use when you need to access results multiple times or check length.
    For very large result sets, prefer the generator version.
    """
    return list(paginate_all(client, method_name, result_key, **kwargs))


# ── Method 3: Paginate with a callback ────────────────────────────
def paginate_with_callback(
    client,
    method_name: str,
    result_key: str,
    callback,
    **kwargs,
) -> int:
    """
    Process each item with a callback function as pages arrive.
    Returns total items processed.
    Useful for writing results to a file/DB without buffering everything.
    """
    count = 0
    for item in paginate_all(client, method_name, result_key, **kwargs):
        callback(item)
        count += 1
    return count


# ── Usage examples ─────────────────────────────────────────────────
if __name__ == "__main__":
    ec2 = boto3.client("ec2", region_name="ap-south-1")
    s3  = boto3.client("s3")
    iam = boto3.client("iam")
    ct  = boto3.client("cloudtrail")

    # ── Example 1: List all running EC2 instances ─────────────────
    # Without pagination you'd call ec2.describe_instances() and risk
    # missing instances if there are more than the default page size.
    print("Running EC2 instances:")
    instance_count = 0
    for reservation in paginate_all(
        ec2,
        "describe_instances",
        "Reservations",
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}],
    ):
        for instance in reservation["Instances"]:
            print(f"  {instance['InstanceId']}")
            instance_count += 1
    print(f"Total: {instance_count} running instances\n")

    # ── Example 2: List all IAM users ─────────────────────────────
    # iam.list_users() returns at most 100 users per call by default
    # (MaxItems can raise that to 1,000); the paginator follows Marker for you.
    all_users = paginate_all_list(iam, "list_users", "Users")
    print(f"Total IAM users: {len(all_users)}\n")

    # ── Example 3: List all S3 objects in a bucket ────────────────
    # S3 can have BILLIONS of objects — never load them all into a list.
    # Use the generator to process one at a time.
    bucket_name = "my-data-bucket"
    total_size = 0
    object_count = 0
    for obj in paginate_all(s3, "list_objects_v2", "Contents", Bucket=bucket_name):
        total_size += obj["Size"]
        object_count += 1
    print(f"Bucket {bucket_name}: {object_count:,} objects, {total_size / 1e9:.2f} GB\n")

    # ── Example 4: CloudTrail events with callback ─────────────────
    from datetime import datetime, timedelta, timezone
    login_events = []

    def collect_console_logins(event):
        if event.get("EventName") == "ConsoleLogin":
            login_events.append(event)

    # Timezone-aware "now" (datetime.utcnow() is deprecated since Python 3.12);
    # computed once so StartTime and EndTime share the same reference point.
    now = datetime.now(timezone.utc)
    count = paginate_with_callback(
        ct,
        "lookup_events",
        "Events",
        collect_console_logins,
        StartTime=now - timedelta(days=7),
        EndTime=now,
    )
    print(f"Processed {count} CloudTrail events, {len(login_events)} console logins")

    # ── Example 5: Paginate RDS instances ─────────────────────────
    rds = boto3.client("rds")
    db_instances = paginate_all_list(rds, "describe_db_instances", "DBInstances")
    print(f"\nTotal RDS instances: {len(db_instances)}")
    for db in db_instances:
        print(f"  {db['DBInstanceIdentifier']} ({db['DBInstanceStatus']})")
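The utility can be exercised without credentials by handing it a fake client. The sketch below assumes nothing beyond the two calls the paginator path touches (`get_paginator()` and `paginate()`). `FakeClient` and `FakePaginator` are hypothetical test doubles, and `paginate_items` is a condensed copy of just the paginator branch so the block runs on its own. It checks that items are flattened across pages, that a page missing the result key is skipped, and that `**kwargs` reach `paginate()` unchanged.

```python
from typing import Any, Dict, Generator, Iterator, List


class FakePaginator:
    """Hypothetical test double: replays canned pages like a PageIterator."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self.pages = pages
        self.calls: List[Dict[str, Any]] = []

    def paginate(self, **kwargs: Any) -> Iterator[Dict[str, Any]]:
        self.calls.append(kwargs)      # record the arguments passed through
        return iter(self.pages)


class FakeClient:
    """Hypothetical test double exposing only get_paginator()."""

    def __init__(self, pages: List[Dict[str, Any]]) -> None:
        self.paginator = FakePaginator(pages)

    def get_paginator(self, method_name: str) -> FakePaginator:
        return self.paginator


def paginate_items(client: Any, method_name: str, result_key: str,
                   **kwargs: Any) -> Generator[Any, None, None]:
    """Paginator branch of paginate_all(), condensed for testing."""
    paginator = client.get_paginator(method_name)
    for page in paginator.paginate(**kwargs):
        yield from page.get(result_key, [])


pages = [
    {"Contents": [{"Key": "a"}, {"Key": "b"}]},
    {},                                  # a page with no "Contents" key
    {"Contents": [{"Key": "c"}]},
]
client = FakeClient(pages)
keys = [o["Key"] for o in paginate_items(client, "list_objects_v2",
                                          "Contents", Bucket="demo")]
print(keys)                      # ['a', 'b', 'c']
print(client.paginator.calls)    # [{'Bucket': 'demo'}]
```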

Why Paginators Beat Manual NextToken

# ❌ WRONG — silently misses resources beyond the first page
response = ec2.describe_instances()   # Returns ONLY the first page!
instances = response["Reservations"]  # May be incomplete

# ❌ FRAGILE — manual but verbose and easy to forget
response = ec2.describe_instances()
all_reservations = response["Reservations"]
while "NextToken" in response:
    response = ec2.describe_instances(NextToken=response["NextToken"])
    all_reservations.extend(response["Reservations"])

# ✅ CORRECT — paginator handles everything
paginator = ec2.get_paginator("describe_instances")
for page in paginator.paginate():
    for reservation in page["Reservations"]:
        process(reservation)

# ✅ BEST — use our generic utility
for reservation in paginate_all(ec2, "describe_instances", "Reservations"):
    process(reservation)

Key Commands Explained

Command	What it does
`client.get_paginator("method_name")`	Returns a boto3 Paginator for the given API method
`paginator.paginate(**kwargs)`	Returns a PageIterator — yields one page dict per iteration
`page.get(result_key, [])`	Extracts the result list from each page — defaults to `[]` if key absent
`yield from iterable`	Delegates iteration to the inner iterable (lazy generator composition)
`response.get("NextToken")`	Returns `None` if no more pages (loop terminates)
`getattr(client, method_name)`	Gets a method by name string — allows dynamic method dispatch
`PaginationConfig={"MaxItems": 500}`	Limit total results across all pages

PaginationConfig Options

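# Limit total results (useful for sampling or testing)
import boto3

client = boto3.client("s3")
paginator = client.get_paginator("list_objects_v2")
for page in paginator.paginate(
    Bucket="my-bucket",
    PaginationConfig={
        "MaxItems": 100,        # Stop after 100 items in total
        "PageSize": 50,         # Ask for 50 items per API call
        "StartingToken": None,  # None = start at the beginning; pass a saved token to resume
    },
):
    # Pages with no results (e.g. an empty bucket) have no "Contents" key
    for obj in page.get("Contents", []):
        print(obj["Key"])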

🔍 Line-by-Line Code Walkthrough

Imports

Line	Why It’s Used
`import boto3`	AWS SDK — needed for creating service clients
`from typing import Generator, Any, Optional`	Type hints. `Generator[Any, None, None]` declares that a function returns a generator that yields `Any` type values

`paginate_all(client, method_name, result_key, **kwargs)`

def paginate_all(client, method_name: str, result_key: str, **kwargs) -> Generator[Any, None, None]:

Part	Explanation
`client`	Any boto3 service client (e.g., `boto3.client("ec2")`, `boto3.client("s3")`)
`method_name: str`	The API method name as a string (e.g., `"describe_instances"`, `"list_objects_v2"`) — allows this function to work with ANY paginated API
`result_key: str`	The key in each page response that contains the list of results (e.g., `"Reservations"`, `"Contents"`, `"Users"`)
`**kwargs`	Any additional arguments to pass through to the underlying API (e.g., `Filters=[...]`, `Bucket="my-bucket"`)
`-> Generator[Any, None, None]`	Return type hint: this is a generator function. `Any` = items can be any type. First `None` = no values are sent into the generator. Second `None` = no return value
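One practical consequence of the `Generator[...]` return type is that calling `paginate_all(...)` makes no API call at all. The function body, including `get_paginator()` and the first request, only runs when the caller starts iterating, so errors such as throttling or access denied surface inside the `for` loop rather than at the call site. The stdlib-only sketch below uses a hypothetical `lazy_fetch` generator and simulates a failing request with `PermissionError`.

```python
from typing import Any, Generator, List

log: List[str] = []


def lazy_fetch() -> Generator[Any, None, None]:
    """Hypothetical generator standing in for paginate_all()."""
    log.append("first API call")           # runs on first next(), not at call time
    raise PermissionError("AccessDenied")  # simulate a failing request
    yield  # unreachable, but makes this function a generator


gen = lazy_fetch()
log_before_iteration = list(log)            # calling the function ran nothing

caught = None
try:
    for _ in gen:                           # the body (and the error) starts here
        pass
except PermissionError as exc:
    caught = str(exc)

print(log_before_iteration, log, caught)   # [] ['first API call'] AccessDenied
```

So any `try/except` meant to handle AWS errors has to wrap the loop that consumes the generator, not just the line that creates it.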

if not client.can_paginate(method_name):
    yield from _manual_paginate(client, method_name, result_key, **kwargs)
    return

Line	Explanation
`client.can_paginate(method_name)`	Returns `True` when botocore ships a paginator for this operation. Deciding before any request is sent means an error on a later page (throttling, a network timeout) reaches the caller instead of triggering the fallback, which would restart from page one and yield duplicates
`yield from _manual_paginate(...)`	Falls back to the manual NextToken/Marker loop for operations without a built-in paginator. The bare `return` then ends the generator so the paginator path never runs

paginator = client.get_paginator(method_name)
for page in paginator.paginate(**kwargs):
    yield from page.get(result_key, [])

Line	Explanation
`client.get_paginator(method_name)`	Dynamically creates a Paginator for the named method. Its paginator model already knows which token names this API uses (`NextToken`, `Marker`, `ContinuationToken`, ...)
`paginator.paginate(**kwargs)`	Returns a `PageIterator`. Each iteration yields one full API response dict (one page)
`yield from page.get(result_key, [])`	`yield from` delegates iteration: it yields each item in the list one at a time to the caller, which is what makes this a lazy, memory-efficient generator. `page.get(result_key, [])` defaults to `[]` if the key is absent (some pages may have no results)
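Where the fallback decision is made matters. If the whole paging loop sits inside `try/except Exception`, an error on page 2 (for example a network timeout) sends control to the fallback after page 1's items have already been yielded, and the fallback starts again from page 1. This stdlib-only sketch reproduces that with hypothetical `flaky_pages` and `manual_pages` helpers. Choosing the path up front, which `client.can_paginate()` allows, keeps the error visible instead.

```python
from typing import Generator, List


class Timeout(Exception):
    """Hypothetical stand-in for a network error raised mid-pagination."""


def flaky_pages() -> Generator[List[str], None, None]:
    yield ["a", "b"]
    raise Timeout("page 2 timed out")


def manual_pages() -> Generator[List[str], None, None]:
    yield ["a", "b"]
    yield ["c"]


def broad_except() -> Generator[str, None, None]:
    """Fallback wrapped around the whole loop (the risky shape)."""
    try:
        for page in flaky_pages():
            yield from page
    except Exception:
        for page in manual_pages():      # restarts from the first page
            yield from page


def decide_up_front(can_paginate: bool) -> Generator[str, None, None]:
    """Path chosen before any request; later errors propagate."""
    source = flaky_pages() if can_paginate else manual_pages()
    for page in source:
        yield from page


print(list(broad_except()))     # ['a', 'b', 'a', 'b', 'c']  (duplicates)
try:
    list(decide_up_front(can_paginate=True))
except Timeout as exc:
    print("surfaced:", exc)     # surfaced: page 2 timed out
```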

`_manual_paginate(client, method_name, result_key, **kwargs)`

method = getattr(client, method_name)
while True:
    response = method(**kwargs)
    yield from response.get(result_key, [])
    next_token = response.get("NextToken") or response.get("Marker")
    if not next_token:
        break
    if "NextToken" in response:
        kwargs["NextToken"] = next_token
    else:
        kwargs["Marker"] = next_token

Line	Explanation
`getattr(client, method_name)`	Gets a method by name string. `getattr(ec2_client, "describe_instances")` returns the `describe_instances` method object. This enables dynamic dispatch
`while True:`	Infinite loop — continues until we `break` when there are no more pages
`response = method(**kwargs)`	Calls the API. `**kwargs` passes all accumulated parameters including any pagination tokens
`yield from response.get(result_key, [])`	Yields all items from this page to the caller
`response.get("NextToken") or response.get("Marker")`	Tries `NextToken` first (modern APIs), then `Marker` (older APIs like IAM use `Marker`). The `or` ensures we get whichever is present
`if not next_token: break`	`None` (key absent) or `""` (empty string) both evaluate to falsy — stops the loop
`kwargs["NextToken"] = next_token` / `kwargs["Marker"] = next_token`	Injects the continuation token into kwargs under the same name the response used, so the next `method(**kwargs)` call fetches the next page
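To check the Marker branch without AWS, here is a sketch with a hypothetical `FakeIAM` class that mimics IAM's convention: the response carries a `Marker` only while `IsTruncated` is true. `manual_paginate` is a self-contained copy of the loop walked through above.

```python
from typing import Any, Dict, Generator, List, Optional


class FakeIAM:
    """Hypothetical IAM-like client: 250 users, 100 per page, Marker-based."""

    def __init__(self) -> None:
        self.users = [{"UserName": f"user{n}"} for n in range(250)]
        self.requests: List[Dict[str, Any]] = []

    def list_users(self, Marker: Optional[str] = None) -> Dict[str, Any]:
        self.requests.append({"Marker": Marker})
        start = int(Marker) if Marker else 0
        end = start + 100
        response: Dict[str, Any] = {"Users": self.users[start:end],
                                    "IsTruncated": end < len(self.users)}
        if response["IsTruncated"]:
            response["Marker"] = str(end)   # only present while truncated
        return response


def manual_paginate(client: Any, method_name: str, result_key: str,
                    **kwargs: Any) -> Generator[Any, None, None]:
    method = getattr(client, method_name)
    while True:
        response = method(**kwargs)
        yield from response.get(result_key, [])
        next_token = response.get("NextToken") or response.get("Marker")
        if not next_token:
            break
        if "NextToken" in response:
            kwargs["NextToken"] = next_token
        else:
            kwargs["Marker"] = next_token


iam = FakeIAM()
users = list(manual_paginate(iam, "list_users", "Users"))
print(len(users), [r["Marker"] for r in iam.requests])
# 250 [None, '100', '200']
```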

`paginate_all_list(...)`

def paginate_all_list(client, method_name, result_key, **kwargs) -> list:
    return list(paginate_all(client, method_name, result_key, **kwargs))

Line	Explanation
`list(paginate_all(...))`	Consumes the entire generator and stores all results in a list. Use when you need random access (`results[5]`), length check (`len(results)`), or need to iterate multiple times
When to prefer generator vs list?	Generator = memory efficient, process as data arrives. List = needed when you must check `len()`, sort, or iterate multiple times
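The "iterate multiple times" case is worth seeing concretely. A generator is exhausted after one pass, so a second loop over the same object silently sees nothing. The stand-in generator `items()` below is hypothetical, but the generator returned by `paginate_all` behaves the same way.

```python
from typing import Generator


def items() -> Generator[str, None, None]:
    """Hypothetical stand-in for paginate_all(...)."""
    yield from ["i-1", "i-2", "i-3"]


gen = items()
first_pass = [x for x in gen]
second_pass = [x for x in gen]          # already exhausted: empty
print(first_pass, second_pass)          # ['i-1', 'i-2', 'i-3'] []

results = list(items())                 # materialise once, then reuse freely
print(len(results), sorted(results, reverse=True)[0])   # 3 i-3
```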

`paginate_with_callback(client, method_name, result_key, callback, **kwargs)`

count = 0
for item in paginate_all(client, method_name, result_key, **kwargs):
    callback(item)
    count += 1
return count

Line	Explanation
`callback(item)`	Calls the user-provided function with each item. The callback can write to a database, file, or process data without buffering everything
`count += 1`	Tracks total items processed. Returned for logging or reporting
Use case	Streaming processing — e.g., processing 1 million S3 objects without loading all their metadata into RAM first
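As a concrete callback, the sketch below streams each item to a JSON Lines file as it arrives, so memory use stays flat however many items there are. `fake_objects` is a hypothetical stand-in for `paginate_all(...)` output, and the file name is arbitrary. `default=str` is there because real boto3 responses contain `datetime` values that `json` cannot encode on its own. `process_with_callback` repeats the `paginate_with_callback` loop over any iterable so the block runs without AWS.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterable


def process_with_callback(items: Iterable[Any], callback: Callable[[Any], None]) -> int:
    """Same loop as paginate_with_callback(), over any iterable."""
    count = 0
    for item in items:
        callback(item)
        count += 1
    return count


# Hypothetical S3-style items; real ones would come from paginate_all(...)
fake_objects = ({"Key": f"logs/{n}.gz", "Size": n * 10,
                 "LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc)}
                for n in range(3))

path = os.path.join(tempfile.mkdtemp(), "objects.jsonl")
with open(path, "w", encoding="utf-8") as out:
    def write_line(obj: Dict[str, Any]) -> None:
        # default=str converts datetime values, which json can't encode natively
        out.write(json.dumps(obj, default=str) + "\n")

    written = process_with_callback(fake_objects, write_line)

with open(path, encoding="utf-8") as f:
    lines = [json.loads(line) for line in f]
print(written, lines[0]["Key"], lines[0]["LastModified"])
# 3 logs/0.gz 2024-01-01 00:00:00+00:00
```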

Usage Example — Why `paginate_all` Instead of Direct Call

# ❌ WRONG — silently misses resources beyond the first page
response = ec2.describe_instances()
instances = response["Reservations"]   # May be incomplete!

# ✅ CORRECT — never misses a result
for reservation in paginate_all(ec2, "describe_instances", "Reservations"):
    process(reservation)

Point	Explanation
The silent failure danger	`describe_instances()` without pagination returns the first page only (up to 1000 instances). In a small test account it looks correct. In production it silently drops instances
No error is raised	AWS doesn’t error when there are more results — it just silently omits them. The response includes `"NextToken"` but if you don’t check for it, you never know more data exists
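If a single, unpaginated call is ever deliberate (for example in a quick one-off script), a cheap safeguard is to fail loudly when the response says more data exists. `assert_single_page` is a hypothetical helper, not part of boto3. It is a best-effort check of common continuation signals: `NextToken`, `Marker` (IAM, RDS), `NextContinuationToken` and `NextMarker` (S3 listings), and the `IsTruncated` flag that IAM and S3 list responses carry.

```python
from typing import Any, Dict

CONTINUATION_KEYS = ("NextToken", "Marker", "NextContinuationToken", "NextMarker")


class IncompleteResultError(RuntimeError):
    """Raised when a single-call response indicates more pages exist."""


def assert_single_page(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical guard: raise instead of silently dropping later pages."""
    present = [k for k in CONTINUATION_KEYS if response.get(k)]
    if present or response.get("IsTruncated"):
        raise IncompleteResultError(
            f"response has more pages ({present or ['IsTruncated']}); paginate instead")
    return response


complete = {"Reservations": [{"Instances": []}]}
truncated = {"Reservations": [{"Instances": []}], "NextToken": "abc"}

assert_single_page(complete)            # passes through unchanged
try:
    assert_single_page(truncated)
except IncompleteResultError as exc:
    print(exc)   # response has more pages (['NextToken']); paginate instead
```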

Related Scenarios

Auto Stop/Start EC2 Instances Using Schedule Tags with Python

Production-Grade Python Scripts for AWS — Best Practices & Patterns

boto3 Retry Decorator with Exponential Backoff for ThrottlingException
