A complete guide to Kubernetes Storage — covering Volumes, Persistent Volumes, PVCs, StorageClasses, StatefulSets, Cloud Storage, CSI, Backup strategies, and hands-on labs with real-world troubleshooting examples.

Module 6 — Kubernetes Storage

Introduction to Kubernetes Storage
Kubernetes Volumes
Persistent Storage Concepts
Storage Classes
Stateful Applications
Cloud Storage Integration
CSI (Container Storage Interface)
Backup and Data Protection
Troubleshooting Kubernetes Storage
Hands-On Labs

1. Introduction to Kubernetes Storage

Why Storage is Required in Kubernetes

Kubernetes is a container orchestration platform where workloads are designed to be distributed, scalable, and self-healing. Containers are ephemeral by design: when a container restarts, or its Pod is rescheduled to another node, everything written to the container’s own filesystem is lost.

This creates a fundamental challenge for real-world applications:

Without Storage:                     With Storage:
┌────────────────────┐               ┌────────────────────┐
│  Pod Crashes       │               │  Pod Crashes       │
│  ┌──────────────┐  │               │  ┌──────────────┐  │
│  │  Container   │  │               │  │  Container   │  │
│  │  /data ──X   │  │               │  │  /data ──────┼──┼──▶ Volume
│  └──────────────┘  │               │  └──────────────┘  │    (persists)
│   Data is LOST     │               │   Data SURVIVES    │
└────────────────────┘               └────────────────────┘

Storage is required for:

Requirement	Example
Data Persistence	Database files surviving Pod restarts
Data Sharing	Multiple Pods reading the same config file
Configuration Injection	Mounting ConfigMaps/Secrets as files
Stateful Workloads	MySQL, PostgreSQL, MongoDB, Kafka, Elasticsearch
Log Aggregation	Centralising container logs on a shared volume
Cache Persistence	Redis RDB/AOF files surviving restarts
ML Model Storage	Large model files shared across inference Pods

Stateless vs Stateful Applications

Understanding this distinction is the foundation of Kubernetes storage design.

┌──────────────────────────────────────────────────────────────────────┐
│                    STATELESS APPLICATION                             │
│                                                                      │
│  Request ─▶ Pod A ─▶ Response     All Pods are identical            │
│  Request ─▶ Pod B ─▶ Response     Any Pod can serve any request     │
│  Request ─▶ Pod C ─▶ Response     Pod death = zero data loss        │
│                                                                      │
│  Examples: REST APIs, Web servers, Microservices, Nginx              │
└──────────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────────┐
│                    STATEFUL APPLICATION                              │
│                                                                      │
│  Client ─▶ Pod A (primary DB)    Pods have unique identities        │
│  Client ─▶ Pod B (replica DB)    Each Pod has its own storage       │
│                                  Pod order and names matter          │
│                                  Pod death = must restore state      │
│                                                                      │
│  Examples: MySQL, PostgreSQL, Kafka, Zookeeper, Elasticsearch        │
└──────────────────────────────────────────────────────────────────────┘

Characteristic	Stateless	Stateful
Data persistence	Not required	Critical
Pod identity	Interchangeable	Unique (pod-0, pod-1…)
Scaling	Simple horizontal	Complex (order matters)
Storage	Ephemeral or none	Persistent volumes
Kubernetes resource	Deployment	StatefulSet
Failure impact	Replace immediately	Must maintain state

Ephemeral Storage in Containers

Every container gets a writable layer on top of its image. This writable layer is:

Tied to the container lifecycle — gone when the container is removed
Not shared between containers (even in the same Pod)
Local to the node — data cannot follow a rescheduled Pod
Counted against node disk — excessive writes can evict Pods

Container Filesystem Layers (Union Mount):
┌─────────────────────────────────────────┐
│  Writable Layer (ephemeral)             │  ← Container writes here
│  /app/logs, /tmp, /var/cache            │     LOST on container death
├─────────────────────────────────────────┤
│  Image Layer 3 (read-only)             │
│  /app/config.json                       │
├─────────────────────────────────────────┤
│  Image Layer 2 (read-only)             │
│  /usr/local/bin/node                    │
├─────────────────────────────────────────┤
│  Image Layer 1 (read-only)             │
│  /etc, /usr, /lib                       │
└─────────────────────────────────────────┘

Consequences of relying on ephemeral storage:

# Demonstrate data loss — run a container, write data, kill it
kubectl run ephemeral-demo --image=busybox -it --rm -- sh

# Inside container:
echo "Important data" > /tmp/mydata.txt
cat /tmp/mydata.txt
# Important data

# Exit the shell (Ctrl+D); because of --rm, the Pod is deleted on exit
# Run the same command again → a brand-new Pod starts and /tmp/mydata.txt is gone!

Kubernetes provides Volumes to overcome this limitation.

2. Kubernetes Volumes

What are Volumes?

A Kubernetes Volume is a directory accessible to containers in a Pod. Unlike the container’s ephemeral writable layer, a Volume:

Survives container restarts within the same Pod (data persists as long as the Pod exists)
Can be shared between multiple containers in the same Pod
Supports many backends — local disk, NFS, cloud disks, ConfigMaps, Secrets, etc.
Is declared in the Pod spec — mounted into containers at specified paths

Pod Spec Structure:
┌─────────────────────────────────────────────────────────────────┐
│  Pod                                                            │
│  ┌──────────────────────┐   ┌──────────────────────┐           │
│  │  Container A          │   │  Container B          │           │
│  │  volumeMounts:        │   │  volumeMounts:        │           │
│  │    - /data → vol1    │   │    - /shared → vol1  │           │
│  └──────────────────────┘   └──────────────────────┘           │
│                                    │                            │
│  volumes:                          │                            │
│    - name: vol1 ───────────────────┘                           │
│      emptyDir: {}                                               │
└─────────────────────────────────────────────────────────────────┘

Volume Types Overview:

Type	Persists container restart?	Persists Pod deletion?	Shared across Pods?
`emptyDir`	✅ Yes	❌ No	❌ No
`hostPath`	✅ Yes	✅ Yes (on same node)	Only Pods on the same node
`configMap`	✅ Yes	✅ Yes	✅ Yes (read-only)
`secret`	✅ Yes	✅ Yes	✅ Yes (read-only)
`persistentVolumeClaim`	✅ Yes	✅ Yes	Depends on AccessMode
`nfs`	✅ Yes	✅ Yes	✅ Yes

EmptyDir Volume

An emptyDir volume is created empty when a Pod is assigned to a Node. It exists as long as the Pod is running on that node. All containers in the Pod share the same emptyDir and can read/write to it.

Lifecycle: Pod scheduled → emptyDir created → Pod deleted → emptyDir deleted

Use Cases:

Scratch space for disk-based merge sort
Checkpoint files for long computations
Sharing files between a main container and a sidecar (e.g., log processor)
Cache directory shared between containers

# emptydir-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: emptydir-demo
spec:
  containers:
    - name: writer
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - |
          while true; do
            echo "$(date): Writing data" >> /shared/output.log
            sleep 5
          done
      volumeMounts:
        - name: shared-data
          mountPath: /shared

    - name: reader
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - |
          while true; do
            echo "=== Reading shared log ==="
            cat /shared/output.log 2>/dev/null || echo "File not yet created"
            sleep 10
          done
      volumeMounts:
        - name: shared-data
          mountPath: /shared   # Same volume, same path

  volumes:
    - name: shared-data
      emptyDir: {}             # Empty directory, lives with the Pod

EmptyDir with Memory-Backed Storage:

volumes:
  - name: cache-volume
    emptyDir:
      medium: Memory          # Stored in RAM (tmpfs) — faster, but counts against memory limit
      sizeLimit: 512Mi        # Limit size to 512 MB

Test it:

kubectl apply -f emptydir-example.yaml

# Check writer is producing data
kubectl exec emptydir-demo -c writer -- cat /shared/output.log

# Check reader can see the same data
kubectl exec emptydir-demo -c reader -- cat /shared/output.log

# Restart the writer container: the data survives
# (PID 1 may ignore a signal it has no handler for; check the RESTARTS column to confirm the restart happened)
kubectl exec emptydir-demo -c writer -- kill 1
kubectl get pod emptydir-demo
kubectl exec emptydir-demo -c writer -- cat /shared/output.log
# Previous data is still there ← emptyDir survived container restart

# Delete the Pod — data is lost
kubectl delete pod emptydir-demo

HostPath Volume

A hostPath volume mounts a file or directory from the host Node’s filesystem into the Pod. The data persists beyond the Pod’s lifetime but is tied to the specific node.

Use Cases:

Accessing Docker socket (/var/run/docker.sock) for container monitoring tools
Reading node-level log files (/var/log)
DaemonSet workloads that need node-local data (log collectors like Fluentd)
Development/testing where you need node-persistent storage

# hostpath-example.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostpath-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      volumeMounts:
        - name: host-logs
          mountPath: /var/log/nginx-host    # Inside container

        - name: docker-sock
          mountPath: /var/run/docker.sock   # Docker socket access

  volumes:
    - name: host-logs
      hostPath:
        path: /tmp/k8s-logs               # Path on the HOST node
        type: DirectoryOrCreate           # Create if it doesn't exist

    - name: docker-sock
      hostPath:
        path: /var/run/docker.sock
        type: Socket                      # Only if it's a socket file

HostPath Type Values:

Type	Behaviour
`""` (empty)	No checks — path is used as-is
`DirectoryOrCreate`	Create directory if not exists
`Directory`	Directory must already exist
`FileOrCreate`	Create file if not exists
`File`	File must already exist
`Socket`	Unix socket must exist
`CharDevice`	Character device must exist
`BlockDevice`	Block device must exist
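
The type checks in the table can be approximated with ordinary file-system calls. The following is a minimal Python sketch, not the kubelet's implementation: the function name `validate_host_path` is hypothetical, and it returns False where the kubelet would refuse to mount the path.

```python
import os
import stat
import tempfile

# Maps each "must already exist" hostPath type to the matching stat check.
TYPE_CHECKS = {
    "Directory": stat.S_ISDIR,
    "File": stat.S_ISREG,
    "Socket": stat.S_ISSOCK,
    "CharDevice": stat.S_ISCHR,
    "BlockDevice": stat.S_ISBLK,
}


def validate_host_path(path, path_type=""):
    """Return True if `path` satisfies the given hostPath type (toy model)."""
    if path_type == "":
        return True  # no checks, the path is used as-is
    if path_type == "DirectoryOrCreate":
        os.makedirs(path, exist_ok=True)
        return os.path.isdir(path)
    if path_type == "FileOrCreate":
        if not os.path.exists(path):
            open(path, "a").close()  # the parent directory must already exist
        return os.path.isfile(path)
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return False
    return TYPE_CHECKS[path_type](mode)


with tempfile.TemporaryDirectory() as node_root:
    logs = os.path.join(node_root, "k8s-logs")
    print(validate_host_path(logs, "Directory"))          # False: not created yet
    print(validate_host_path(logs, "DirectoryOrCreate"))  # True: created on demand
    print(validate_host_path(logs, "File"))               # False: it is a directory
```

The `...OrCreate` variants are the only ones that change the node; every other type is a pure existence check.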

⚠️ Security Warning: hostPath gives containers access to the node filesystem. It should be used sparingly and avoided in multi-tenant clusters. Prefer PersistentVolumes for data persistence.

ConfigMap Volume

Mounts a ConfigMap as a directory of files inside a container. Each key in the ConfigMap becomes a filename; the value becomes the file content.

Use Cases:

Injecting application configuration files (nginx.conf, app.properties)
Providing environment-specific settings without rebuilding images
Storing non-sensitive configuration that can be updated at runtime

# 1. Create a ConfigMap with config file contents
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: default
data:
  app.properties: |
    server.port=8080
    db.host=postgres-service
    db.port=5432
    log.level=INFO
    cache.ttl=3600

  nginx.conf: |
    server {
        listen 80;
        location / {
            proxy_pass http://backend-service:3000;
            proxy_set_header Host $host;
        }
    }

  feature-flags.json: |
    {
      "newDashboard": true,
      "betaCheckout": false,
      "darkMode": true
    }
---
# 2. Mount ConfigMap as volume in a Pod
apiVersion: v1
kind: Pod
metadata:
  name: configmap-volume-demo
spec:
  containers:
    - name: app
      image: mycompany/backend:v1.0
      volumeMounts:
        - name: config-volume
          mountPath: /etc/app-config      # All ConfigMap keys appear as files here
          readOnly: true

        - name: nginx-config
          mountPath: /etc/nginx/conf.d
          readOnly: true

  volumes:
    - name: config-volume
      configMap:
        name: app-config                  # Reference the ConfigMap
        items:                            # Optional: select specific keys
          - key: app.properties
            path: application.properties  # Rename the file on mount

    - name: nginx-config
      configMap:
        name: app-config
        items:
          - key: nginx.conf
            path: default.conf

Verify the mount:

kubectl exec configmap-volume-demo -- ls /etc/app-config
# application.properties

kubectl exec configmap-volume-demo -- cat /etc/app-config/application.properties
# server.port=8080
# db.host=postgres-service
# ...

# ConfigMap updates propagate to the volume automatically (within ~1 minute)
kubectl edit configmap app-config
# Change log.level=DEBUG
# After ~60s:
kubectl exec configmap-volume-demo -- cat /etc/app-config/application.properties
# log.level=DEBUG  ← Updated without Pod restart!
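
To make the key-to-file mapping concrete, here is a small Python sketch that projects a dictionary into a directory the way a ConfigMap volume presents its keys, including the optional `items` renaming. It is an illustration only: `project_configmap` is a hypothetical helper, and the real kubelet additionally uses symlinks so that updates appear atomically.

```python
import tempfile
from pathlib import Path


def project_configmap(data, mount_dir, items=None):
    """Write ConfigMap-style data as files under mount_dir.

    `items` maps key -> file path; when given, only those keys are written.
    Returns the sorted list of file paths that were created.
    """
    selected = items if items is not None else {key: key for key in data}
    written = []
    for key, rel_path in selected.items():
        target = Path(mount_dir) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(data[key])
        written.append(rel_path)
    return sorted(written)


config = {
    "app.properties": "server.port=8080\nlog.level=INFO\n",
    "nginx.conf": "server { listen 80; }\n",
}

with tempfile.TemporaryDirectory() as mount_dir:
    files = project_configmap(
        config, mount_dir, items={"app.properties": "application.properties"}
    )
    content = (Path(mount_dir) / "application.properties").read_text()
    print(files)  # ['application.properties']
```

Without `items`, every key becomes a file named after the key; with `items`, only the listed keys appear, under the paths you choose.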

Secret Volume

Mounts a Kubernetes Secret as files into a container. Similar to ConfigMap volumes but the data is base64-decoded and the volume is backed by tmpfs (in-memory) for security — secrets never touch the node disk.

Use Cases:

TLS certificates and private keys
Database passwords
API keys and tokens
SSH private keys

# 1. Create a Secret
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: default
type: Opaque
data:
  # Values must be base64 encoded: echo -n "value" | base64
  db-password: cGFzc3dvcmQxMjM=        # "password123"
  api-key: c2VjcmV0LWFwaS1rZXktMTIz   # "secret-api-key-123"
stringData:
  # stringData is auto-encoded by Kubernetes — no manual base64 needed
  db-url: "postgresql://user:password123@postgres:5432/mydb"
---
# 2. Create TLS Secret from files
# kubectl create secret tls tls-secret \
#   --cert=path/to/tls.crt \
#   --key=path/to/tls.key

# 3. Mount Secret as volume
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-demo
spec:
  containers:
    - name: app
      image: mycompany/backend:v1.0
      volumeMounts:
        - name: secret-volume
          mountPath: /etc/secrets
          readOnly: true

        - name: tls-certs
          mountPath: /etc/ssl/app
          readOnly: true

  volumes:
    - name: secret-volume
      secret:
        secretName: app-secrets
        defaultMode: 0400           # Restrictive file permissions (owner read-only)

    - name: tls-certs
      secret:
        secretName: tls-secret
        items:
          - key: tls.crt
            path: server.crt
          - key: tls.key
            path: server.key
            mode: 0400              # Extra-restrictive for private key

Verify and inspect:

# Each key is a symlink into a hidden ..data directory; -L shows the target files
kubectl exec secret-volume-demo -- ls -lL /etc/secrets
# total 0
# -r-------- 1 root root 18 Jan 20 10:00 api-key
# -r-------- 1 root root 11 Jan 20 10:00 db-password
# -r-------- 1 root root 48 Jan 20 10:00 db-url

kubectl exec secret-volume-demo -- cat /etc/secrets/db-password
# password123  ← Already base64-decoded by Kubernetes!

# Secret files are stored in memory (tmpfs) — not on disk
kubectl exec secret-volume-demo -- df /etc/secrets
# tmpfs  ← Confirms in-memory storage
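
The encoding used in the `data` field is plain base64, which is an encoding and not encryption. The short Python sketch below reproduces the values from the manifest above; the helper names are hypothetical. It also shows why `defaultMode: 0400` must be written as `256` in a JSON manifest, since JSON has no octal notation.

```python
import base64


def encode_secret_value(plain):
    """Encode a string the way the `data` field of a Secret expects it."""
    return base64.b64encode(plain.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded):
    """Decode a `data` value back to the text a mounted file would contain."""
    return base64.b64decode(encoded).decode("utf-8")


print(encode_secret_value("password123"))               # cGFzc3dvcmQxMjM=
print(decode_secret_value("c2VjcmV0LWFwaS1rZXktMTIz"))  # secret-api-key-123

# YAML accepts octal file modes; JSON needs the decimal equivalent.
print(0o400)  # 256
```

Anyone who can read the Secret object can decode it, so access to Secrets should be restricted with RBAC.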

3. Persistent Storage Concepts

What is a Persistent Volume (PV)?

A Persistent Volume (PV) is a piece of storage in the cluster that has been provisioned by an administrator (or dynamically by a StorageClass). It is a cluster-level resource — not tied to any namespace or Pod — and represents physical storage on a disk, NAS, cloud volume, NFS share, etc.

Persistent Volume = The actual storage resource
                    (like a hard drive in the cluster)

┌─────────────────────────────────────────────────────────────────┐
│                     Kubernetes Cluster                          │
│                                                                 │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │  Persistent Volume (PV) — Cluster Scoped                 │  │
│  │                                                          │  │
│  │  Name:         pv-postgres-data                          │  │
│  │  Capacity:     50Gi                                      │  │
│  │  AccessMode:   ReadWriteOnce                             │  │
│  │  StorageClass: fast-ssd                                  │  │
│  │  ReclaimPolicy: Retain                                   │  │
│  │  Source:       AWS EBS vol-0a1b2c3d4e5f                  │  │
│  └──────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Example PV manifest:

# persistent-volume.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-postgres-data
  labels:
    type: ssd
    environment: production
spec:
  capacity:
    storage: 50Gi                   # Total size of this volume

  accessModes:
    - ReadWriteOnce                 # Only one Node can mount read-write at a time

  persistentVolumeReclaimPolicy: Retain   # Keep data after PVC is deleted

  storageClassName: fast-ssd        # Must match PVC's storageClassName

  # Storage backend — choose ONE:

  # Option A: Local path (for testing/on-premise)
  hostPath:
    path: /mnt/data/postgres

  # Option B: NFS
  # nfs:
  #   server: nfs-server.company.com
  #   path: /exports/postgres-data

  # Option C: AWS EBS (static provisioning; legacy in-tree plugin, superseded by the EBS CSI driver)
  # awsElasticBlockStore:
  #   volumeID: vol-0a1b2c3d4e5f6789
  #   fsType: ext4

What is a Persistent Volume Claim (PVC)?

A Persistent Volume Claim (PVC) is a request for storage made by a user or application. It’s namespace-scoped and acts like a storage “order form” — specifying how much storage is needed, what access mode is required, and optionally which StorageClass to use.

Persistent Volume Claim = The storage request
                          (like ordering a hard drive)

Namespace: production
┌──────────────────────────────────────────────────────────────┐
│  PVC: pvc-postgres-claim                                     │
│  Request: 20Gi                                               │
│  AccessMode: ReadWriteOnce                                   │
│  StorageClass: fast-ssd                                      │
│  Status: Bound → bound to pv-postgres-data                  │
└──────────────────────────────────────────────────────────────┘

Example PVC manifest:

# persistent-volume-claim.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-postgres-claim
  namespace: production             # PVCs are namespace-scoped
spec:
  accessModes:
    - ReadWriteOnce                 # Must be compatible with the PV

  resources:
    requests:
      storage: 20Gi                 # Request 20Gi (PV must offer >= 20Gi)

  storageClassName: fast-ssd        # Must match PV's storageClassName
                                    # Omit to fall back to the default StorageClass

  # Optional: select a specific PV by labels
  selector:
    matchLabels:
      environment: production

How PV and PVC Work Together

The relationship between PV and PVC follows a bind-and-use lifecycle:

┌─────────────────────────────────────────────────────────────────────┐
│                      PV / PVC Lifecycle                             │
│                                                                     │
│  1. PROVISION                                                       │
│     Admin creates PV  ──▶  PV Status: Available                    │
│     (or StorageClass auto-provisions)                               │
│                                                                     │
│  2. BIND                                                            │
│     User creates PVC  ──▶  Control plane matches PVC to PV         │
│     PVC Status: Bound ◀──  PV Status: Bound                        │
│                                                                     │
│  3. USE                                                             │
│     Pod references PVC ──▶ Volume mounted into container           │
│     Data read/written ──▶  persists to backend storage             │
│                                                                     │
│  4. RELEASE                                                         │
│     Pod deleted ──▶  PVC still exists (data safe)                  │
│     PVC deleted ──▶  PV Status: Released                           │
│                                                                     │
│  5. RECLAIM (based on ReclaimPolicy)                                │
│     Retain  ──▶  PV stays, data intact, manual cleanup needed      │
│     Delete  ──▶  PV and underlying storage deleted automatically   │
│     Recycle ──▶  Data wiped, PV made Available again (deprecated)  │
└─────────────────────────────────────────────────────────────────────┘
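
The BIND step can be pictured as a filter over the Available PVs. The Python sketch below is a toy model rather than the real controller: the `Volume` and `Claim` classes and the `bind` function are hypothetical, and picking the smallest suitable volume is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Volume:
    name: str
    capacity_gi: int
    access_modes: frozenset
    storage_class: str
    labels: dict = field(default_factory=dict)
    claimed_by: Optional[str] = None  # None means the PV is Available


@dataclass
class Claim:
    name: str
    request_gi: int
    access_modes: frozenset
    storage_class: str
    selector: dict = field(default_factory=dict)


def bind(claim, volumes):
    """Bind `claim` to the smallest Available volume that satisfies it, if any."""
    candidates = [
        pv for pv in volumes
        if pv.claimed_by is None
        and pv.storage_class == claim.storage_class
        and pv.capacity_gi >= claim.request_gi
        and claim.access_modes <= pv.access_modes
        and all(pv.labels.get(k) == v for k, v in claim.selector.items())
    ]
    if not candidates:
        return None  # the claim stays Pending
    chosen = min(candidates, key=lambda pv: pv.capacity_gi)
    chosen.claimed_by = claim.name
    return chosen


RWO = frozenset({"ReadWriteOnce"})
volumes = [
    Volume("pv-scratch", 10, RWO, "fast-ssd"),
    Volume("pv-postgres-data", 50, RWO, "fast-ssd", {"environment": "production"}),
]
claim = Claim("pvc-postgres-claim", 20, RWO, "fast-ssd", {"environment": "production"})

bound = bind(claim, volumes)
print(bound.name, bound.capacity_gi)  # pv-postgres-data 50
print(bind(claim, volumes))           # None: that PV is already claimed
```

Binding is one-to-one and takes the whole volume, which is why a 20Gi claim bound to a 50Gi PV reports a capacity of 50Gi.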

Using a PVC in a Pod:

# pod-with-pvc.yaml
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  namespace: production
spec:
  containers:
    - name: postgres
      image: postgres:15
      env:
        - name: POSTGRES_DB
          value: "myapp"
        - name: POSTGRES_USER
          value: "admin"
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
      ports:
        - containerPort: 5432
      volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data  # Where PostgreSQL stores data

  volumes:
    - name: postgres-storage
      persistentVolumeClaim:
        claimName: pvc-postgres-claim          # Reference the PVC

Verify the binding:

# Check PV status
kubectl get pv pv-postgres-data
# NAME                STATUS   CLAIM                          STORAGECLASS   AGE
# pv-postgres-data    Bound    production/pvc-postgres-claim  fast-ssd       5m

# Check PVC status
kubectl get pvc pvc-postgres-claim -n production
# NAME                  STATUS   VOLUME              CAPACITY   ACCESS MODES
# pvc-postgres-claim    Bound    pv-postgres-data    50Gi       RWO

# Check Pod is using the PVC
kubectl describe pod postgres-pod -n production | grep -A5 Volumes
# Volumes:
#   postgres-storage:
#     Type:       PersistentVolumeClaim
#     ClaimName:  pvc-postgres-claim
#     ReadOnly:   false

Access Modes in PV

Access modes define how many Nodes can mount the volume simultaneously and in what mode.

Access Mode	Short	Description	Example Storage
`ReadWriteOnce`	RWO	One Node mounts read-write	AWS EBS, GCE PD, Azure Disk
`ReadOnlyMany`	ROX	Many Nodes mount read-only	NFS, CephFS
`ReadWriteMany`	RWX	Many Nodes mount read-write	NFS, CephFS, Azure Files
`ReadWriteOncePod`	RWOP	One Pod mounts read-write (k8s 1.22+)	CSI volumes

ReadWriteOnce (RWO):                ReadWriteMany (RWX):
┌────────────┐                      ┌────────────┐
│   Node 1   │◀─── Mounted RW       │   Node 1   │◀─── Mounted RW
│  (Pod A)   │                      │  (Pod A)   │
└────────────┘     ┌──────────┐     └────────────┘
                   │   PV     │
┌────────────┐     └──────────┘     ┌────────────┐
│   Node 2   │                      │   Node 2   │◀─── Mounted RW
│   (empty)  │ ✗ Cannot mount       │  (Pod B)   │
└────────────┘                      └────────────┘

⚠️ Important: Access modes describe what the storage supports, not what is currently active. A PV with RWX can still be mounted by just one node — but it allows many.
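
A point that is easy to miss is that ReadWriteOnce restricts the volume to one node, not one Pod. The Python sketch below encodes the four modes as a toy decision function; `can_mount_read_write` is a hypothetical name, and the real enforcement happens in the scheduler, the attach/detach controller, and the storage driver.

```python
def can_mount_read_write(access_mode, node, pod, current_mounts):
    """Toy check: may `pod` on `node` mount the volume read-write?

    current_mounts is a list of (node, pod) pairs that already use the volume.
    """
    if access_mode == "ReadWriteMany":
        return True
    if access_mode == "ReadWriteOnce":
        # The restriction is per node: Pods on the same node may share the volume.
        return all(n == node for n, _ in current_mounts)
    if access_mode == "ReadWriteOncePod":
        # The restriction is per Pod: no second Pod may use the volume at all.
        return all(p == pod for _, p in current_mounts)
    if access_mode == "ReadOnlyMany":
        return False  # read-only mounts only
    raise ValueError(f"unknown access mode: {access_mode}")


in_use = [("node-1", "pod-a")]
print(can_mount_read_write("ReadWriteOnce", "node-1", "pod-b", in_use))     # True
print(can_mount_read_write("ReadWriteOnce", "node-2", "pod-b", in_use))     # False
print(can_mount_read_write("ReadWriteOncePod", "node-1", "pod-b", in_use))  # False
print(can_mount_read_write("ReadWriteMany", "node-2", "pod-b", in_use))     # True
```

If exactly one writer must ever exist, ReadWriteOncePod is the mode that expresses it.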

Reclaim Policies

When a PVC is deleted, the ReclaimPolicy on the PV determines what happens to the data:

spec:
  persistentVolumeReclaimPolicy: Retain   # or Delete, or Recycle

Policy	What Happens to PV	What Happens to Data	Use Case
Retain	PV stays in `Released` state	Data intact — manual admin action needed	Production databases
Delete	PV deleted automatically	Underlying storage deleted	Dynamic provisioning, cloud disks
Recycle	PV scrubbed (`rm -rf`) and made `Available`	Data wiped	Deprecated — use dynamic provisioning

Retain workflow (most important for production):

# 1. Delete PVC
kubectl delete pvc pvc-postgres-claim -n production

# 2. Check PV status — it becomes Released
kubectl get pv pv-postgres-data
# STATUS: Released  ← PV is not re-usable yet (claimRef still set)

# 3. Manual cleanup: remove claimRef to make the PV Available again
#    (existing data stays on the volume, so archive or wipe it first if needed)
kubectl patch pv pv-postgres-data -p '{"spec":{"claimRef": null}}'

# 4. Now PV is Available for a new PVC
kubectl get pv pv-postgres-data
# STATUS: Available
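
The three policies and the Retain workflow can be summarised as a small state table. This Python sketch is a simplified model with hypothetical function names; it tracks only the PV phase and whether the data survives.

```python
def after_pvc_deleted(reclaim_policy):
    """Return (pv_phase, data_kept) once the bound PVC is deleted.

    A phase of None means the PV object itself no longer exists.
    """
    outcomes = {
        "Retain": ("Released", True),
        "Delete": (None, False),
        "Recycle": ("Available", False),  # deprecated policy
    }
    return outcomes[reclaim_policy]


def clear_claim_ref(pv):
    """Mimic the manual patch: drop claimRef so a Released PV becomes Available."""
    if pv["phase"] != "Released":
        raise ValueError("only a Released PV needs its claimRef cleared")
    return {**pv, "claimRef": None, "phase": "Available"}


phase, data_kept = after_pvc_deleted("Retain")
pv = {
    "name": "pv-postgres-data",
    "phase": phase,
    "claimRef": "production/pvc-postgres-claim",
}
print(pv["phase"], data_kept)        # Released True
print(clear_claim_ref(pv)["phase"])  # Available
```

Only Retain leaves you a chance to recover data after an accidental PVC deletion, which is why it is the usual choice for production databases.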

4. Storage Classes

What is a StorageClass?

A StorageClass defines a class or tier of storage and how it should be dynamically provisioned. Think of it like a storage catalogue — administrators create StorageClasses describing different storage offerings (fast SSD, slow HDD, replicated NFS, etc.), and users reference them in PVCs without needing to know the underlying infrastructure.

Without StorageClass:            With StorageClass:
Admin manually creates PVs  ─▶  PVs auto-created on demand
User waits for admin         ─▶  User creates PVC → PV auto-provisioned
Static, slow process         ─▶  Dynamic, instant storage

StorageClass anatomy:

# storage-class.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"  # Not the default

provisioner: ebs.csi.aws.com            # Which driver provisions the storage
                                         # (legacy in-tree name: kubernetes.io/aws-ebs)

parameters:
  type: gp3                              # AWS EBS volume type
  iops: "3000"
  throughput: "125"
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
  kmsKeyId: "arn:aws:kms:us-east-1:..."

reclaimPolicy: Delete                    # Delete PV when PVC is deleted
allowVolumeExpansion: true              # Allow PVC resize (kubectl edit pvc)
volumeBindingMode: WaitForFirstConsumer  # Delay PV creation until Pod is scheduled
                                         # (vs Immediate — create PV on PVC creation)
mountOptions:
  - debug
  - discard

Dynamic Provisioning

Dynamic provisioning automatically creates a PV when a PVC is created that references a StorageClass. No admin pre-provisioning required.

Dynamic Provisioning Flow:

User creates PVC with StorageClass "fast-ssd"
         │
         ▼
Control plane calls StorageClass provisioner plugin
         │
         ▼
Provisioner creates the physical volume
(e.g., calls AWS API to create a new EBS volume)
         │
         ▼
PV is automatically created in Kubernetes
         │
         ▼
PV is bound to PVC automatically
         │
         ▼
Pod mounts the PVC — storage ready!
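
When the flow above actually starts depends on the StorageClass's `volumeBindingMode`. The Python sketch below models that timing under simplifying assumptions: `provision` is a hypothetical function, and the `pvc-<uid>` naming follows the PV names shown in this guide's sample output.

```python
def provision(pvc_uid, request_gi, binding_mode, pod_scheduled):
    """Toy model of the PVC phase and the PV (if any) for a binding mode."""
    if binding_mode == "WaitForFirstConsumer" and not pod_scheduled:
        # Nothing is created until a Pod that uses the PVC has been scheduled.
        return {"pvc_phase": "Pending", "pv": None}
    pv = {"name": f"pvc-{pvc_uid}", "capacity_gi": request_gi, "phase": "Bound"}
    return {"pvc_phase": "Bound", "pv": pv}


uid = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
waiting = provision(uid, 10, "WaitForFirstConsumer", pod_scheduled=False)
ready = provision(uid, 10, "WaitForFirstConsumer", pod_scheduled=True)
eager = provision(uid, 10, "Immediate", pod_scheduled=False)

print(waiting["pvc_phase"])  # Pending
print(ready["pv"]["name"])   # pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890
print(eager["pvc_phase"])    # Bound
```

A PVC that sits in Pending under WaitForFirstConsumer is therefore expected behaviour until a Pod references it, not an error.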

Dynamic provisioning example:

# 1. StorageClass (admin sets up once)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-ssd
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# 2. PVC (user creates — no PV needed!)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard-ssd   # References the StorageClass
---
# 3. Pod using the dynamically provisioned PVC
apiVersion: v1
kind: Pod
metadata:
  name: app-with-dynamic-storage
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:
    - name: app-data
      persistentVolumeClaim:
        claimName: dynamic-pvc

# Apply the StorageClass, PVC, and Pod
kubectl apply -f dynamic-pvc.yaml

# Watch the PV get created automatically
# (with WaitForFirstConsumer this happens once the Pod has been scheduled)
kubectl get pv -w
# NAME                                       CAPACITY   STATUS
# pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890  10Gi       Bound

# The PV was auto-created by the provisioner!
kubectl get pvc dynamic-pvc
# STATUS: Bound  ← Pending until the Pod is scheduled, then Bound

Default StorageClass

When a PVC doesn’t specify a storageClassName, Kubernetes uses the default StorageClass (if one is configured).

# View all StorageClasses and find the default
kubectl get storageclass
# NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE
# standard (default)   rancher.io/local-path   Delete          WaitForFirstConsumer   ← marked "(default)"
# fast-ssd             ebs.csi.aws.com         Delete          WaitForFirstConsumer

# Set a StorageClass as default
kubectl patch storageclass fast-ssd \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# Unset the old default
kubectl patch storageclass standard \
  -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'

PVC without storageClassName (uses default):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: auto-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  # storageClassName omitted → uses default StorageClass
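
The difference between omitting `storageClassName` and setting it to an empty string is subtle, so here is a small Python sketch of the lookup. `resolve_storage_class` is a hypothetical helper; refusing to choose when several classes are marked default is a simplification, because real clusters handle that case differently depending on the Kubernetes version.

```python
DEFAULT_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def resolve_storage_class(pvc_spec, storage_classes):
    """Return the StorageClass name a PVC ends up with, or None for no class.

    storage_classes maps a class name to its annotations.
    """
    if "storageClassName" in pvc_spec:
        # An explicit empty string means "no class": only static PVs can match.
        return pvc_spec["storageClassName"] or None
    defaults = [
        name for name, annotations in storage_classes.items()
        if annotations.get(DEFAULT_ANNOTATION) == "true"
    ]
    if len(defaults) > 1:
        raise ValueError(f"more than one default StorageClass: {defaults}")
    return defaults[0] if defaults else None


classes = {
    "standard": {DEFAULT_ANNOTATION: "true"},
    "fast-ssd": {DEFAULT_ANNOTATION: "false"},
}
print(resolve_storage_class({}, classes))                                # standard
print(resolve_storage_class({"storageClassName": "fast-ssd"}, classes))  # fast-ssd
print(resolve_storage_class({"storageClassName": ""}, classes))          # None
```

Keeping exactly one class marked as default avoids the ambiguous case entirely.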

Storage Provisioners

A provisioner is the plugin that handles the actual storage creation. The provisioner field in StorageClass tells Kubernetes which plugin to call.

Provisioner	Storage Backend	Environment
`ebs.csi.aws.com`	AWS EBS	AWS
`disk.csi.azure.com`	Azure Managed Disk	Azure
`pd.csi.storage.gke.io`	Google Persistent Disk	GCP
`file.csi.azure.com`	Azure Files (SMB/NFS)	Azure
`nfs.csi.k8s.io`	NFS Server	On-premise/Any
`rancher.io/local-path`	Local node path	Local/Dev
`docker.io/hostpath`	Host path	Docker Desktop
`rook-ceph.rbd.csi.ceph.com`	Ceph RBD	On-premise
`linstor.csi.linbit.com`	LINSTOR/DRBD	On-premise

5. Stateful Applications

Introduction to StatefulSets

A StatefulSet is a Kubernetes workload resource designed for stateful applications. Unlike a Deployment, where all Pods are interchangeable, a StatefulSet gives each Pod a stable name, a fixed position in the ordering, and its own storage:

Deployment Pods:                    StatefulSet Pods:
┌─────────────────────────┐         ┌─────────────────────────┐
│ web-7d9f8c-abc12        │         │ mysql-0  (Primary)      │
│ web-7d9f8c-def34        │         │ mysql-1  (Replica)      │
│ web-7d9f8c-ghi56        │         │ mysql-2  (Replica)      │
│                         │         │                         │
│ Random names            │         │ Stable, ordered names   │
│ Any order               │         │ Created 0→1→2           │
│ Any node                │         │ Deleted 2→1→0           │
│ Shared or no storage    │         │ Each has own PVC        │
└─────────────────────────┘         └─────────────────────────┘

StatefulSet guarantees:

Stable network identities — <statefulset>-<ordinal> (mysql-0, mysql-1)
Stable storage — Each Pod gets its own PVC that persists across rescheduling
Ordered deployment/scaling — Pods start/stop in a predictable sequence
Ordered rolling updates — Updates proceed in reverse ordinal order

Storage in StatefulSets

Each Pod in a StatefulSet gets its own dedicated PVC — not shared. When a Pod is rescheduled (even to a different node), it reattaches to the same PVC and thus the same data.

StatefulSet Storage Architecture:

mysql-0  ──uses──▶  data-mysql-0  ──▶  PV (50Gi) ──▶  Actual Disk A
mysql-1  ──uses──▶  data-mysql-1  ──▶  PV (50Gi) ──▶  Actual Disk B
mysql-2  ──uses──▶  data-mysql-2  ──▶  PV (50Gi) ──▶  Actual Disk C

If mysql-1 is rescheduled to Node 3:
mysql-1  ──still uses──▶  data-mysql-1  ──▶  PV (50Gi) ──▶  Same Disk B
(data unchanged!)

VolumeClaimTemplates

The volumeClaimTemplates field in a StatefulSet spec creates a unique PVC for each Pod, named after the pattern <template-name>-<pod-name>.
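
As a quick illustration of the naming pattern, the Python sketch below derives the PVC names for a three-replica StatefulSet, and also mirrors the ordinal-to-server-id calculation that the MySQL example performs in its init container. Both function names are hypothetical.

```python
import re


def claim_names(template, statefulset, replicas):
    """PVC names follow <template-name>-<pod-name>."""
    return [f"{template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]


def server_id(hostname, base=100):
    """Derive a MySQL server-id from the trailing ordinal of a Pod hostname."""
    match = re.search(r"-(\d+)$", hostname)
    if match is None:
        raise ValueError(f"hostname has no ordinal suffix: {hostname}")
    return base + int(match.group(1))


print(claim_names("data", "mysql", 3))  # ['data-mysql-0', 'data-mysql-1', 'data-mysql-2']
print(server_id("mysql-2"))             # 102
```

Since the PVC name depends only on the template and the Pod name, a recreated `mysql-1` always finds `data-mysql-1` again.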

# statefulset-mysql.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: production
spec:
  serviceName: mysql-headless     # Required: headless service for DNS
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
        - name: init-mysql
          image: mysql:8.0
          command:
            - bash
            - "-c"
            - |
              # Assign server-id based on pod ordinal
              [[ $(hostname) =~ -([0-9]+)$ ]] || exit 1
              ordinal=${BASH_REMATCH[1]}
              echo [mysqld] > /mnt/conf.d/server-id.cnf
              echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          volumeMounts:
            - name: conf
              mountPath: /mnt/conf.d

      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: data              # References volumeClaimTemplate name
              mountPath: /var/lib/mysql
            - name: conf
              mountPath: /etc/mysql/conf.d
          readinessProbe:
            exec:
              # Exec probes do not expand $(VAR); run through a shell so the env var resolves
              command: ["sh", "-c", "mysqladmin ping -u root -p\"$MYSQL_ROOT_PASSWORD\""]
            initialDelaySeconds: 30
            periodSeconds: 10
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"

      volumes:
        - name: conf
          emptyDir: {}

  # This is the key: unique PVC per Pod
  volumeClaimTemplates:
    - metadata:
        name: data                    # Creates: data-mysql-0, data-mysql-1, data-mysql-2
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
---
# Headless Service (required for StatefulSet DNS)
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: production
spec:
  clusterIP: None                   # Headless — no virtual IP
  selector:
    app: mysql
  ports:
    - port: 3306
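
The init container in the manifest above derives a unique MySQL server-id from the Pod's ordinal. The same logic can be sketched in portable shell as below; the function name server_id_for is hypothetical, and the offset of 100 simply mirrors the manifest.

```shell
#!/bin/sh
# Derive a MySQL server-id from a StatefulSet Pod hostname such as "mysql-2".
# The default offset of 100 mirrors the init container in the manifest.
server_id_for() (
  host=$1
  offset=${2:-100}
  ordinal=${host##*-}          # text after the last "-"
  case $ordinal in
    ''|*[!0-9]*) echo "no ordinal in hostname: $host" >&2; exit 1 ;;
  esac
  echo $((offset + ordinal))
)

server_id_for mysql-0   # prints 100
server_id_for mysql-2   # prints 102
```

A hostname without a numeric suffix (for example a Deployment Pod such as web-7d9f8c-abc12) makes the function fail, which is the same guard the init container applies with its exit 1.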

Verify StatefulSet storage:

kubectl apply -f statefulset-mysql.yaml

# Watch pods come up in order (0 → 1 → 2)
kubectl get pods -n production -l app=mysql -w
# mysql-0   1/1   Running  0   30s
# mysql-1   1/1   Running  0   60s
# mysql-2   1/1   Running  0   90s

# Verify PVCs — one per pod, auto-named
kubectl get pvc -n production
# NAME           STATUS  VOLUME            CAPACITY  ACCESS MODES
# data-mysql-0   Bound   pvc-abc123...     50Gi      RWO
# data-mysql-1   Bound   pvc-def456...     50Gi      RWO
# data-mysql-2   Bound   pvc-ghi789...     50Gi      RWO

# DNS for each pod (via headless service):
# mysql-0.mysql-headless.production.svc.cluster.local
# mysql-1.mysql-headless.production.svc.cluster.local
# mysql-2.mysql-headless.production.svc.cluster.local
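
The PVC names and per-Pod DNS names shown above follow fixed patterns, so they can be predicted before the StatefulSet is applied. A minimal sketch, assuming the default cluster domain cluster.local; the function name statefulset_names is hypothetical.

```shell
#!/bin/sh
# Print "<pvc-name> <pod-dns-name>" for each replica of a StatefulSet.
# Arguments: volumeClaimTemplate name, StatefulSet name, headless Service name,
# namespace, replica count.
statefulset_names() (
  template=$1 sts=$2 service=$3 namespace=$4 replicas=$5
  i=0
  while [ "$i" -lt "$replicas" ]; do
    pod="$sts-$i"
    echo "$template-$pod $pod.$service.$namespace.svc.cluster.local"
    i=$((i + 1))
  done
)

statefulset_names data mysql mysql-headless production 3
```

Comparing this output with kubectl get pvc is a quick way to spot a leftover claim from an earlier StatefulSet with the same name.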

6. Cloud Storage Integration

AWS EBS with Kubernetes

AWS Elastic Block Store (EBS) provides block storage volumes for AWS EC2 instances. With the EBS CSI driver, Kubernetes can dynamically provision EBS volumes for PVCs.

Architecture:
┌────────────────────────────────────────────────────────┐
│                    AWS EKS Cluster                     │
│                                                        │
│  ┌──────────────────────────┐                          │
│  │  Pod                     │                          │
│  │  ┌────────────────────┐  │                          │
│  │  │ Container           │  │                          │
│  │  │ /var/lib/postgres  ─┼──┼──▶ PVC ──▶ EBS Volume  │
│  │  └────────────────────┘  │         (gp3, 50Gi)     │
│  └──────────────────────────┘                          │
└────────────────────────────────────────────────────────┘

Set up the EBS CSI Driver (EKS):

# Install EBS CSI Driver using Helm
helm repo add aws-ebs-csi-driver \
  https://kubernetes-sigs.github.io/aws-ebs-csi-driver

helm install aws-ebs-csi-driver aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system \
  --set 'controller.serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::ACCOUNT:role/AmazonEKS_EBS_CSI_DriverRole'

EBS StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # Critical for EBS (zone-aware)
allowVolumeExpansion: true

⚠️ EBS Limitation: A filesystem-mode EBS volume is ReadWriteOnce: it attaches to one EC2 instance at a time, and that instance must be in the volume's Availability Zone. For ReadWriteMany, use EFS (Elastic File System) instead.

Azure Disk Storage

Azure Managed Disks provide block storage for Azure Kubernetes Service (AKS).

# Azure Disk StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-premium-ssd
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_LRS              # Premium SSD locally redundant
  # skuName: StandardSSD_LRS       # Standard SSD
  # skuName: Standard_LRS          # Standard HDD
  kind: Managed
  cachingMode: ReadOnly             # None, ReadOnly, ReadWrite
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Azure Files (ReadWriteMany):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-files-premium
provisioner: file.csi.azure.com
parameters:
  skuName: Premium_LRS
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - uid=0
  - gid=0
  - mfsymlinks
  - cache=strict

Google Persistent Disk

Google Persistent Disks are block storage for Google Kubernetes Engine (GKE).

# GKE StorageClass with pd-ssd
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-ssd
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd                      # pd-standard, pd-ssd, pd-balanced, pd-extreme
  replication-type: regional-pd     # For regional redundancy
  disk-encryption-kms-key: projects/.../cryptoKeyVersions/...
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

GKE Filestore (ReadWriteMany):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: filestore-rwx
provisioner: filestore.csi.storage.gke.io
parameters:
  tier: standard                    # standard, premium, enterprise
  network: default
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

NFS Storage

NFS (Network File System) is a popular on-premises storage solution that supports the ReadWriteMany access mode: multiple Pods on multiple Nodes can mount the same NFS share simultaneously.

Set up the NFS CSI Driver:

# Install NFS CSI Driver
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
  --namespace kube-system

NFS StorageClass:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-storage
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.company.com    # NFS server hostname or IP
  share: /exports/k8s-volumes       # NFS export path
  subDir: ${pvc.metadata.name}      # Create subdirectory per PVC
reclaimPolicy: Delete
volumeBindingMode: Immediate
mountOptions:
  - nfsvers=4.1
  - hard
  - timeo=600
  - retrans=3

NFS PVC (ReadWriteMany):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  accessModes:
    - ReadWriteMany                 # Multiple Pods can write simultaneously
  storageClassName: nfs-storage
  resources:
    requests:
      storage: 100Gi

Multi-Pod NFS usage (shared model):

# Three Pods all writing to the same NFS volume
apiVersion: apps/v1
kind: Deployment
metadata:
  name: log-processors
spec:
  replicas: 3
  selector:                       # Required in apps/v1; must match the template labels
    matchLabels:
      app: log-processor
  template:
    metadata:
      labels:
        app: log-processor
    spec:
      containers:
        - name: processor
          image: mycompany/log-processor:v1.0
          volumeMounts:
            - name: shared-logs
              mountPath: /shared/logs
      volumes:
        - name: shared-logs
          persistentVolumeClaim:
            claimName: shared-data-pvc    # All 3 pods share the same PVC
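
The provisioners in this section fall into two groups: block storage (one node at a time) and file storage (shared across nodes). The lookup below is a simplified sketch of that split using the provisioner names from the StorageClasses above; the function name access_mode_for is hypothetical, and real drivers may support additional modes, so treat the result as a starting point only.

```shell
#!/bin/sh
# Map a CSI provisioner name to the access mode typically requested with it.
access_mode_for() {
  case $1 in
    ebs.csi.aws.com|disk.csi.azure.com|pd.csi.storage.gke.io)
      echo ReadWriteOnce ;;      # block devices: attached to a single node
    file.csi.azure.com|filestore.csi.storage.gke.io|nfs.csi.k8s.io)
      echo ReadWriteMany ;;      # file shares: mountable from many nodes
    *)
      echo "unknown provisioner: $1" >&2
      return 1 ;;
  esac
}

access_mode_for ebs.csi.aws.com   # prints ReadWriteOnce
access_mode_for nfs.csi.k8s.io    # prints ReadWriteMany
```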

7. CSI (Container Storage Interface)

What is CSI?

The Container Storage Interface (CSI) is a standardised API that enables storage vendors to write one driver that works across any container orchestration system (Kubernetes, Mesos, Nomad, etc.) without modifying the orchestrator’s core code.

Before CSI (in-tree plugins):          After CSI:
┌──────────────────────────────┐        ┌──────────────────────────────┐
│     Kubernetes Core          │        │     Kubernetes Core          │
│  ┌────────┐ ┌────────┐       │        │  ┌─────────────────────────┐│
│  │AWS EBS │ │GCE PD  │       │        │  │   CSI Interface (stable) ││
│  │plugin  │ │plugin  │       │        │  └───────────────┬─────────┘│
│  └────────┘ └────────┘       │        └──────────────────┼──────────┘
│  ┌────────┐ ┌────────┐       │                           │
│  │Azure   │ │Ceph    │       │        ┌──────────────────┼──────────┐
│  │plugin  │ │plugin  │       │        │  CSI Driver Pods │          │
│  └────────┘ └────────┘       │        │  ┌─────────┐ ┌──┴──────┐   │
│  (compiled into k8s binary!) │        │  │AWS EBS  │ │GCE PD   │   │
└──────────────────────────────┘        │  │CSI      │ │CSI      │   │
                                        │  └─────────┘ └─────────┘   │
                                        │  (deployed independently!)  │
                                        └────────────────────────────┘

CSI Drivers

A CSI driver consists of Kubernetes-deployed components:

CSI Driver Architecture:
┌────────────────────────────────────────────────────────────┐
│                 CSI Driver Deployment                      │
│                                                            │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Controller Plugin (Deployment)                     │  │
│  │  - CreateVolume / DeleteVolume                      │  │
│  │  - ControllerPublishVolume / UnpublishVolume        │  │
│  │  - CreateSnapshot / DeleteSnapshot                  │  │
│  └─────────────────────────────────────────────────────┘  │
│                                                            │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Node Plugin (DaemonSet — runs on every Node)       │  │
│  │  - NodeStageVolume (format/mount on node)           │  │
│  │  - NodePublishVolume (bind-mount into Pod)          │  │
│  │  - NodeUnpublishVolume (unmount from Pod)           │  │
│  └─────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
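
The calls in the diagram happen in a fixed order when a volume is brought up for a Pod, and their counterparts run in reverse when it is torn down. The sketch below only prints that order as a memory aid. NodeUnstageVolume does not appear in the diagram; it is the CSI counterpart of NodeStageVolume. CreateVolume and DeleteVolume apply only to dynamically provisioned volumes, and the ControllerPublish calls apply only to drivers that require an attach step.

```shell
#!/bin/sh
# Calls made while bringing a volume up, in order, tagged with the plugin that serves them.
csi_setup_sequence() {
  echo "controller CreateVolume"
  echo "controller ControllerPublishVolume"
  echo "node NodeStageVolume"
  echo "node NodePublishVolume"
}

# Teardown runs the counterparts in reverse order.
csi_teardown_sequence() {
  echo "node NodeUnpublishVolume"
  echo "node NodeUnstageVolume"
  echo "controller ControllerUnpublishVolume"
  echo "controller DeleteVolume"
}

csi_setup_sequence
csi_teardown_sequence
```

When a Pod is stuck, the last call that succeeded tells you which plugin's logs to read: the controller Deployment for the first two calls, the node DaemonSet for the rest.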

Popular CSI Drivers:

Driver	Storage	Install Command
AWS EBS CSI	Amazon EBS	`helm install aws-ebs-csi-driver`
Azure Disk CSI	Azure Managed Disk	Built-in AKS
GCE PD CSI	Google PD	Built-in GKE
NFS CSI	NFS Servers	`helm install csi-driver-nfs`
Rook/Ceph CSI	Ceph cluster	`helm install rook-ceph`
Longhorn	Distributed block	`helm install longhorn`
OpenEBS	Local/Distributed	`helm install openebs`

# List installed CSI drivers in your cluster
kubectl get csidrivers
# NAME                   ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY
# ebs.csi.aws.com        true             false            false
# efs.csi.aws.com        false            false            false
# nfs.csi.k8s.io         false            false            false

Benefits of CSI

Benefit	Description
Vendor-agnostic	One standard for all storage vendors
Out-of-tree drivers	Drivers deployed independently, not compiled into Kubernetes
Independent versioning	Drivers update without Kubernetes upgrades
Rich feature set	Snapshots, cloning, expansion — all standardised
Simpler deprecation	In-tree plugins can be removed without breaking existing drivers
Faster innovation	Vendors release new features faster without waiting for Kubernetes release cycles

8. Backup and Data Protection

Volume Snapshots

Kubernetes VolumeSnapshots allow you to take a point-in-time copy of a PVC. Like StorageClasses for PVCs, VolumeSnapshotClasses define how snapshots are taken.

Setup:

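# Install the snapshot CRDs (if not pre-installed).
# The snapshot-controller itself must also be running; its manifests live under
# deploy/kubernetes/snapshot-controller in the same external-snapshotter repository.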
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/master/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

Create a VolumeSnapshotClass:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-vsc
driver: ebs.csi.aws.com
deletionPolicy: Delete            # Delete or Retain the snapshot when VolumeSnapshot is deleted
parameters:
  tagSpecification_1: "backup-by=k8s-snapshot-controller"

Take a snapshot:

# volumesnapshot.yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-backup-2026-01-20
  namespace: production
spec:
  volumeSnapshotClassName: ebs-vsc
  source:
    persistentVolumeClaimName: pvc-postgres-claim   # The PVC to snapshot

kubectl apply -f volumesnapshot.yaml

# Check snapshot status
kubectl get volumesnapshot -n production
# NAME                         READYTOUSE   SOURCEPVC              AGE
# postgres-backup-2026-01-20   true         pvc-postgres-claim     2m

kubectl describe volumesnapshot postgres-backup-2026-01-20 -n production
# Status:
#   Ready To Use:  true
#   Restore Size:  50Gi

Restore from snapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restored
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: ebs-gp3
  dataSource:                               # Restore from snapshot
    name: postgres-backup-2026-01-20
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io

Backup Strategies

Strategy 1: Application-Level Backup (Recommended for Databases)

The most reliable approach for databases — use the database’s own backup tools:

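# PostgreSQL logical backup using pg_dumpall
# No -it: a TTY would corrupt the piped dump. gzip and the redirect run on your
# workstation, so /backup is a local directory.
kubectl exec postgres-0 -n production -- \
  pg_dumpall -U postgres | gzip > /backup/postgres-$(date +%Y%m%d).sql.gz

# MySQL logical backup using mysqldump
# sh -c makes $MYSQL_ROOT_PASSWORD expand inside the container, not locally
kubectl exec mysql-0 -n production -- \
  sh -c 'mysqldump --all-databases -u root -p"$MYSQL_ROOT_PASSWORD"' | \
  gzip > /backup/mysql-$(date +%Y%m%d).sql.gz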

Strategy 2: Volume Snapshot Backup (Block-Level)

# Automated snapshot script (run as CronJob)
kubectl apply -f - <<EOF
apiVersion: batch/v1
kind: CronJob
metadata:
  name: daily-snapshot
  namespace: production
spec:
  schedule: "0 2 * * *"          # Every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-sa
          containers:
            - name: snapshot-creator
              image: bitnami/kubectl
              command:
                - /bin/sh
                - -c
                - |
                  # Escaped so that date runs inside the Job, not when this manifest is applied
                  DATE=\$(date +%Y%m%d)
                  kubectl apply -f - <<SNAP
                  apiVersion: snapshot.storage.k8s.io/v1
                  kind: VolumeSnapshot
                  metadata:
                    name: auto-snapshot-\${DATE}
                    namespace: production
                  spec:
                    volumeSnapshotClassName: ebs-vsc
                    source:
                      persistentVolumeClaimName: pvc-postgres-claim
                  SNAP
          restartPolicy: OnFailure
EOF
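
The CronJob above creates one snapshot per day and never removes old ones, so snapshots accumulate. A minimal sketch of a retention helper follows, assuming snapshot names end in a sortable YYYYMMDD date as in the CronJob; the function name snapshots_to_prune is hypothetical.

```shell
#!/bin/sh
# Read snapshot names on stdin and print the oldest ones, so that only the
# newest $1 remain. Relies on the date suffix sorting chronologically.
snapshots_to_prune() (
  keep=$1
  sorted=$(sort)
  total=$(printf '%s\n' "$sorted" | grep -c .)
  excess=$((total - keep))
  [ "$excess" -gt 0 ] || exit 0
  printf '%s\n' "$sorted" | head -n "$excess"
)

printf '%s\n' auto-snapshot-20260120 auto-snapshot-20260118 auto-snapshot-20260119 \
  | snapshots_to_prune 2
# prints auto-snapshot-20260118
```

In a cluster, the input could come from kubectl get volumesnapshot -n production -o name and the output could be passed to kubectl delete; review the list before deleting anything.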

Strategy 3: Velero — Full Cluster Backup

Velero is a widely used open-source tool for backing up Kubernetes cluster resources and persistent volumes:

# Install Velero with AWS S3 backend
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.9.0 \
  --bucket my-k8s-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1

# Create a backup of a namespace
velero backup create production-backup \
  --include-namespaces production \
  --wait

# Schedule daily backups
velero schedule create daily-backup \
  --schedule="0 1 * * *" \
  --include-namespaces production \
  --ttl 720h              # Keep backups for 30 days

# Restore from backup
velero restore create --from-backup production-backup

Disaster Recovery Basics

Recovery Time Objective (RTO):  How long can the system be down?
Recovery Point Objective (RPO): How much data loss is acceptable?

┌─────────────────────────────────────────────────────────────┐
│  Strategy           │  RPO          │  RTO        │  Cost  │
├─────────────────────┼───────────────┼─────────────┼────────┤
│  Manual backup      │  Hours/Days   │  Hours      │  Low   │
│  Daily snapshots    │  Up to 24h    │  30-60 min  │  Med   │
│  Hourly snapshots   │  Up to 1h     │  15-30 min  │  Med   │
│  Continuous repl.   │  Near-zero    │  Minutes    │  High  │
│  Active-active      │  Zero         │  Seconds    │  High  │
└─────────────────────────────────────────────────────────────┘
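
The table can be read as a lookup from the data loss you can tolerate to the least expensive automated strategy that meets it. The sketch below encodes that reading, with the thresholds taken from the RPO column; the function name strategy_for_rpo is hypothetical, and RTO and budget still need to be checked separately.

```shell
#!/bin/sh
# Map a tolerable data-loss window (RPO, in minutes) to a strategy from the table.
strategy_for_rpo() (
  rpo_minutes=$1
  if   [ "$rpo_minutes" -ge 1440 ]; then echo "Daily snapshots"         # loses up to 24h
  elif [ "$rpo_minutes" -ge 60 ];   then echo "Hourly snapshots"        # loses up to 1h
  elif [ "$rpo_minutes" -gt 0 ];    then echo "Continuous replication"  # near-zero loss
  else                                   echo "Active-active"           # zero loss
  fi
)

strategy_for_rpo 1440   # prints Daily snapshots
strategy_for_rpo 90     # prints Hourly snapshots
strategy_for_rpo 5      # prints Continuous replication
strategy_for_rpo 0      # prints Active-active
```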

Multi-region DR pattern:

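# Back up in the primary region
velero backup create dr-snapshot \
  --storage-location primary-us-east \
  --volume-snapshot-locations primary-us-east

# Replicate the whole backup bucket to the secondary region.
# Velero restores from its own bucket layout, so copy the bucket as-is
# (my-k8s-backups is assumed to be the bucket behind primary-us-east).
aws s3 sync s3://my-k8s-backups s3://dr-bucket-us-west

# In the secondary cluster: point Velero at the replicated bucket, read-only
# (the location name and region below are assumptions)
velero backup-location create secondary-us-west \
  --provider aws \
  --bucket dr-bucket-us-west \
  --config region=us-west-2 \
  --access-mode ReadOnly

# Restore in the secondary cluster once the backup has synced
velero restore create --from-backup dr-snapshot
# Note: EBS volume snapshots are regional and must be copied to the DR region separately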

9. Troubleshooting Kubernetes Storage

Troubleshooting Decision Tree

Storage Problem Reported
        │
        ▼
Is the PVC in Pending or Bound state?
kubectl get pvc
        │
        ├── Pending ──▶  [Problem 1: PVC Pending Issues]
        │
        ├── Bound
        │     │
        │     ▼
        │  Is the Pod Running?
        │  kubectl get pods
        │     │
        │     ├── Pending/ContainerCreating ──▶  [Problem 2: Volume Mount Errors]
        │     │
        │     ├── CrashLoopBackOff
        │     │     │
        │     │     ▼
        │     │  kubectl logs / describe ──▶  [Problem 4: Permission Issues]
        │     │
        │     └── Running but app fails ──▶  Check app-level storage usage
        │
        └── Lost/Failed ──▶  [Problem 3: StorageClass Troubleshooting]
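
The decision tree above can also be written as a small lookup, which is handy in runbooks. This is a sketch only: the function name triage_storage is hypothetical, and the two arguments are the STATUS values you read from kubectl get pvc and kubectl get pods.

```shell
#!/bin/sh
# triage_storage PVC_STATUS [POD_STATUS] -> name of the section to follow
triage_storage() (
  pvc_status=$1
  pod_status=$2
  case $pvc_status in
    Pending)     echo "Problem 1: PVC Pending Issues" ;;
    Lost|Failed) echo "Problem 3: StorageClass Troubleshooting" ;;
    Bound)
      case $pod_status in
        Pending|ContainerCreating) echo "Problem 2: Volume Mount Errors" ;;
        CrashLoopBackOff)          echo "Problem 4: Permission Issues" ;;
        Running)                   echo "Check app-level storage usage" ;;
        *)                         echo "Unrecognised Pod status: $pod_status" ;;
      esac ;;
    *)           echo "Unrecognised PVC status: $pvc_status" ;;
  esac
)

triage_storage Pending                   # prints Problem 1: PVC Pending Issues
triage_storage Bound ContainerCreating   # prints Problem 2: Volume Mount Errors
```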

Problem 1: PVC Pending Issues

Symptom:

kubectl get pvc
# NAME           STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
# my-pvc         Pending                                      fast-ssd       5m

Real-Time Diagnosis — Case A: No matching PV (static provisioning):

# Check PVC events
kubectl describe pvc my-pvc
# Events:
#   Warning  FailedBinding  3m  persistentvolume-controller
#   no persistent volumes available for this claim and no storage class is set
#   (exact wording varies with whether the claim sets a storageClassName)

# Check available PVs
kubectl get pv
# "No resources found" here means no PVs exist at all.
# If PVs are listed, compare their StorageClass with the claim's:

# Check if storageClassName matches
kubectl get pvc my-pvc -o jsonpath='{.spec.storageClassName}'
# fast-ssd

kubectl get pv -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.storageClassName}{"\n"}{end}'
# pv-001   slow-hdd   ← StorageClass doesn't match!

Fix:

# Option A: Create a PV with matching storageClassName
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-fast-ssd-001
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-ssd    # ← Must match PVC
  hostPath:
    path: /mnt/fast-ssd-001
EOF

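# Option B: Recreate the PVC with a StorageClass that has matching PVs
# (spec.storageClassName cannot be changed on an existing claim, so patching is rejected)
kubectl get pvc my-pvc -o yaml > my-pvc.yaml
# Edit my-pvc.yaml: set storageClassName: slow-hdd and remove the status section
kubectl delete pvc my-pvc
kubectl apply -f my-pvc.yaml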

Real-Time Diagnosis — Case B: Dynamic provisioning failing:

kubectl describe pvc my-pvc
# Events:
#   Warning  ProvisioningFailed  2m  ebs.csi.aws.com
#   failed to provision volume: UnauthorizedOperation: You are not authorized
#   to perform this operation

# The CSI driver doesn't have IAM permissions!

# Check CSI driver pod status
kubectl get pods -n kube-system | grep ebs-csi
# ebs-csi-controller-xxx   4/6   Running   0   10m
# ← Only 4/6 containers running — something is wrong

kubectl logs -n kube-system ebs-csi-controller-xxx -c csi-provisioner
# Error: failed to assume role: AccessDenied

Fix:

# Attach the IAM policy to the role assumed by the EBS CSI driver's service account
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
  --role-name AmazonEKS_EBS_CSI_DriverRole

# Restart the CSI driver pods
kubectl rollout restart deployment ebs-csi-controller -n kube-system

# Re-check PVC
kubectl get pvc my-pvc
# STATUS: Bound  ← Fixed!

Real-Time Diagnosis — Case C: WaitForFirstConsumer — PVC stays Pending until Pod exists:

kubectl describe pvc my-pvc
# Events:
#   Normal  WaitForFirstConsumer  1m  persistentvolume-controller
#   waiting for first consumer to be created before binding

# This is EXPECTED behaviour for WaitForFirstConsumer binding mode
# The PVC will bind once a Pod that uses it is scheduled
# This is correct — not an error!

# Solution: Create a Pod that uses the PVC
kubectl apply -f pod-with-pvc.yaml
# PVC will bind once the Pod is scheduled
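
The three cases above are distinguished by the event reason shown in kubectl describe pvc. A minimal sketch of that mapping follows; the function name pending_cause is hypothetical, and the reason strings are the ones that appear in the event output above.

```shell
#!/bin/sh
# Map the event reason of a Pending PVC to the case it belongs to.
pending_cause() {
  case $1 in
    FailedBinding)
      echo "Case A: no matching PV; compare storageClassName, size and access modes" ;;
    ProvisioningFailed)
      echo "Case B: provisioner error; check the CSI controller logs and cloud permissions" ;;
    WaitForFirstConsumer)
      echo "Case C: expected; the claim binds once a Pod using it is scheduled" ;;
    *)
      echo "Unknown reason: $1; read the full event message" ;;
  esac
}

pending_cause WaitForFirstConsumer
```

Only Case C is harmless. The other two need a change to the cluster before the claim can bind.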

Problem 2: Volume Mount Errors

Symptom:

kubectl get pods
# NAME        READY   STATUS              RESTARTS   AGE
# my-pod      0/1     ContainerCreating   0          3m

Real-Time Diagnosis:

# Check pod events
kubectl describe pod my-pod
# Events:
#   Warning  FailedAttachVolume  2m  attachdetach-controller
#   AttachVolume.Attach failed for volume "my-pvc" :
#   rpc error: code = Internal desc = Could not attach volume
#   "vol-0abc123" to node "ip-10-0-1-50":
#   attachment of disk "vol-0abc123" failed,
#   current node: "ip-10-0-1-50",
#   attachment node: "ip-10-0-1-99"

# The EBS volume is still attached to a different node!
# This happens when a Pod moves to a new node but the old node
# didn't fully detach the volume.

Fix:

# Step 1: Find which node the volume is still attached to
aws ec2 describe-volumes --volume-ids vol-0abc123 \
  --query 'Volumes[0].Attachments'
# [{"InstanceId": "i-0xyz...", "State": "attached"}]
# ← Still attached to old node!

# Step 2: Force detach from AWS (use with caution!)
aws ec2 detach-volume --volume-id vol-0abc123 --force

# Step 3: Wait and retry
sleep 30
kubectl delete pod my-pod
kubectl apply -f pod-with-pvc.yaml

# Monitor
kubectl get pod my-pod -w
# STATUS: Running  ← Fixed

Symptom 2: FailedMount — Wrong filesystem or corrupted volume:

kubectl describe pod my-pod
# Events:
#   Warning  FailedMount  kubelet
#   MountVolume.MountDevice failed:
#   fsType "xfs" on "/dev/xvdbf": exit status 32
#   stderr: mount: /var/lib/kubelet/plugins/.../mount:
#   wrong fs type, bad option, bad superblock

# Volume was formatted as ext4 but StorageClass requests xfs

Fix:

# Check StorageClass fsType
kubectl get storageclass fast-ssd -o jsonpath='{.parameters.fsType}'
# xfs

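# The volume was previously formatted as ext4.
# Options:
# A) Create a new PVC and migrate data
# B) Recreate the StorageClass with fsType: ext4 so new volumes match

# StorageClass parameters are immutable, so kubectl edit is rejected.
# Export the class, change fsType: xfs → fsType: ext4 in the file, then replace it:
kubectl get storageclass fast-ssd -o yaml > fast-ssd.yaml
kubectl delete storageclass fast-ssd
kubectl apply -f fast-ssd.yaml
# Note: This affects new volumes only; an existing PV keeps the fsType recorded in its own spec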

Problem 3: StorageClass Troubleshooting

Symptom: Dynamic provisioning not working, no PV created.

Real-Time Diagnosis:

# Check if StorageClass exists
kubectl get storageclass
# No resources found.  ← StorageClass missing!

# Or: StorageClass exists but provisioner is wrong
kubectl get storageclass fast-ssd -o yaml | grep provisioner
# provisioner: kubernetes.io/aws-ebs   ← Old in-tree plugin (deprecated)
# Should be: ebs.csi.aws.com

# Check if the provisioner (CSI driver) is running
kubectl get pods -n kube-system | grep csi
# (no output)  ← CSI driver not installed!

# Check CSI drivers registered
kubectl get csidrivers
# No resources found.  ← No CSI drivers installed

Fix:

# Install the EBS CSI driver
helm repo add aws-ebs-csi-driver \
  https://kubernetes-sigs.github.io/aws-ebs-csi-driver
helm install aws-ebs-csi-driver \
  aws-ebs-csi-driver/aws-ebs-csi-driver \
  --namespace kube-system

# Update StorageClass to use CSI provisioner
kubectl delete storageclass fast-ssd
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com    # ← Updated to CSI
parameters:
  type: gp3
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
EOF

Problem 4: Permission Issues

Symptom:

kubectl logs my-pod
# Error: EACCES: permission denied, open '/data/app.db'
# or:
# mkdir: cannot create directory '/data/uploads': Permission denied

Real-Time Diagnosis:

# Check what user the container runs as
kubectl exec my-pod -- id
# uid=1001(appuser) gid=1001(appgroup)

# Check permissions on the mounted volume
kubectl exec my-pod -- ls -la /data
# drwxr-xr-x 2 root root 6 Jan 20 10:00 .
# ← Owned by root, app runs as uid 1001 — no write access!
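
The diagnosis above comes down to the standard Unix permission check: pick the owner, group, or other bits depending on who the process is, then test the write bit. A simplified sketch follows, assuming a three-digit octal mode and ignoring supplementary groups and ACLs; the function name can_write is hypothetical.

```shell
#!/bin/sh
# can_write UID GID OWNER_UID OWNER_GID MODE
# Exit status 0 if a process with UID/GID may write to a directory with the
# given owner, group and three-digit octal mode (for example 755).
can_write() (
  uid=$1 gid=$2 owner=$3 group=$4 mode=$5
  [ "$uid" -eq 0 ] && exit 0               # root bypasses the mode bits
  # Pick the octal digit that applies: owner, group, or other.
  if   [ "$uid" -eq "$owner" ]; then digit=$(printf '%s' "$mode" | cut -c1)
  elif [ "$gid" -eq "$group" ]; then digit=$(printf '%s' "$mode" | cut -c2)
  else                               digit=$(printf '%s' "$mode" | cut -c3)
  fi
  # The write bit has value 2 within each octal digit.
  [ $(( digit & 2 )) -ne 0 ]
)

# The situation above: uid 1001 against a root-owned 755 directory
can_write 1001 1001 0 0 755 || echo "denied"
# After the group is changed to 1001 and the directory is group-writable
can_write 1001 1001 0 1001 775 && echo "allowed"
```

The second call shows why group ownership matters here: once the volume's group matches the process and the group write bit is set, the same user can write without running as root.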

Fix Option A: fsGroup in Pod Security Context (Recommended)

spec:
  securityContext:
    fsGroup: 1001                   # All volume files owned by group 1001
    runAsUser: 1001                 # Container runs as user 1001
    runAsGroup: 1001
  containers:
    - name: app
      image: mycompany/app:v1.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true   # Force read-only root (good security)
      volumeMounts:
        - name: data
          mountPath: /data

Fix Option B: Init Container to chmod/chown:

spec:
  initContainers:
    - name: volume-permissions
      image: busybox
      command: ["sh", "-c", "chown -R 1001:1001 /data && chmod 755 /data"]
      volumeMounts:
        - name: data
          mountPath: /data
      securityContext:
        runAsUser: 0               # Run init as root to change permissions
  containers:
    - name: app
      image: mycompany/app:v1.0
      securityContext:
        runAsUser: 1001
      volumeMounts:
        - name: data
          mountPath: /data

Fix Option C: StorageClass mount options (SMB/CIFS volumes such as Azure Files):

# dir_mode, file_mode, uid and gid are CIFS mount options, not NFS options.
# For NFS, fix ownership on the server export or use Option B instead.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azure-files-app           # Hypothetical name
provisioner: file.csi.azure.com
mountOptions:
  - dir_mode=0777                 # World-writable directory
  - file_mode=0666                # World-writable files
  - uid=1001
  - gid=1001

Quick Troubleshooting Cheat Sheet

# === PVC INSPECTION ===
kubectl get pvc -A                              # All PVCs all namespaces
kubectl get pvc -n <ns>                        # PVCs in namespace
kubectl describe pvc <name> -n <ns>            # Full PVC details + events
kubectl get pv                                 # All Persistent Volumes
kubectl describe pv <name>                     # Full PV details

# === STORAGECLASS ===
kubectl get storageclass                       # All storage classes
kubectl describe storageclass <name>           # StorageClass details
kubectl get csidrivers                         # Installed CSI drivers

# === POD STORAGE ===
kubectl describe pod <name>                    # Volume mount events
kubectl exec <pod> -- df -h                    # Disk usage inside pod
kubectl exec <pod> -- ls -la /mountpath        # File permissions
kubectl exec <pod> -- mount | grep <path>      # Verify volume is mounted

# === VOLUME SNAPSHOTS ===
kubectl get volumesnapshot -A                  # All snapshots
kubectl get volumesnapshotcontent              # Underlying snapshot content
kubectl describe volumesnapshot <name>         # Snapshot details

# === EVENTS (most useful for storage debugging) ===
kubectl get events -n <ns> --sort-by='.lastTimestamp' | grep -i volume
kubectl get events -n <ns> --sort-by='.lastTimestamp' | grep -i pvc
kubectl get events -n <ns> --sort-by='.lastTimestamp' | grep -i mount

# === CAPACITY ===
kubectl get pvc -A -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,STATUS:.status.phase,CAPACITY:.status.capacity.storage,STORAGECLASS:.spec.storageClassName'

10. Hands-On Labs

Lab 1: Create EmptyDir Volume

Objective: Create a Pod with two containers sharing an EmptyDir volume.

# Step 1: Create the Pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: lab1-emptydir
spec:
  containers:
    - name: producer
      image: busybox
      command: ["sh", "-c", "while true; do date >> /shared/timestamps.txt; sleep 3; done"]
      volumeMounts:
        - name: shared
          mountPath: /shared
    - name: consumer
      image: busybox
      command: ["sh", "-c", "while true; do echo '--- File contents ---'; cat /shared/timestamps.txt 2>/dev/null; sleep 5; done"]
      volumeMounts:
        - name: shared
          mountPath: /shared
  volumes:
    - name: shared
      emptyDir: {}
EOF

# Step 2: Verify both containers are running
kubectl get pod lab1-emptydir
kubectl describe pod lab1-emptydir | grep -A2 "Containers:"

# Step 3: Watch the consumer output
kubectl logs lab1-emptydir -c consumer -f

# Step 4: Verify shared data from producer side
kubectl exec lab1-emptydir -c producer -- cat /shared/timestamps.txt

# Step 5: Simulate container restart — data survives
kubectl exec lab1-emptydir -c producer -- kill 1
# (producer restarts)
kubectl exec lab1-emptydir -c producer -- cat /shared/timestamps.txt
# Previous timestamps still there!

# Step 6: Delete pod — data is lost
kubectl delete pod lab1-emptydir
kubectl run verify-gone --image=busybox --rm -it -- sh
# No /shared directory exists here — ephemeral as expected

echo "✅ Lab 1 Complete"

Lab 2: Create Persistent Volume

Objective: Manually create a static PV backed by a hostPath.

# Step 1: Create the directory on the node (Minikube)
minikube ssh -- sudo mkdir -p /mnt/lab2-data
minikube ssh -- sudo chmod 777 /mnt/lab2-data

# Step 2: Create the PV
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: lab2-pv
  labels:
    lab: lab2
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: lab-storage
  hostPath:
    path: /mnt/lab2-data
    type: DirectoryOrCreate
EOF

# Step 3: Verify PV is Available
kubectl get pv lab2-pv
# STATUS: Available  ← Ready to be claimed

kubectl describe pv lab2-pv

echo "✅ Lab 2 Complete"

Lab 3: Create Persistent Volume Claim

Objective: Create a PVC that binds to the PV from Lab 2.

# Step 1: Create PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lab3-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 500Mi
  storageClassName: lab-storage
  selector:
    matchLabels:
      lab: lab2
EOF

# Step 2: Check binding
kubectl get pvc lab3-pvc
# STATUS: Bound  ← PVC is now bound to lab2-pv

kubectl get pv lab2-pv
# STATUS: Bound  ← PV is now claimed

# Step 3: Inspect the binding details
kubectl describe pvc lab3-pvc
# Volume: lab2-pv  ← Shows which PV it's bound to

echo "✅ Lab 3 Complete"
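
The 500Mi claim in this lab binds to the 1Gi PV because a PV only has to be at least as large as the request, with the same storage class and a compatible access mode. A simplified sketch of that check follows. It compares a single access mode and ignores label selectors, volume mode and node affinity; the function names to_bytes and pv_satisfies_claim are hypothetical.

```shell
#!/bin/sh
# Convert a quantity such as 500Mi or 1Gi to bytes (Ki, Mi, Gi, Ti only).
to_bytes() (
  q=$1
  num=${q%??}            # everything except the two-letter unit
  unit=${q#"$num"}       # the unit itself
  case $unit in
    Ki) echo $((num * 1024)) ;;
    Mi) echo $((num * 1024 * 1024)) ;;
    Gi) echo $((num * 1024 * 1024 * 1024)) ;;
    Ti) echo $((num * 1024 * 1024 * 1024 * 1024)) ;;
    *)  echo "unsupported quantity: $q" >&2; exit 1 ;;
  esac
)

# pv_satisfies_claim PV_SIZE PV_CLASS PV_MODE PVC_SIZE PVC_CLASS PVC_MODE
# Exit status 0 if the PV could satisfy the claim under the simplified rules.
pv_satisfies_claim() (
  pv_bytes=$(to_bytes "$1") || exit 2
  pvc_bytes=$(to_bytes "$4") || exit 2
  [ "$2" = "$5" ] && [ "$3" = "$6" ] && [ "$pv_bytes" -ge "$pvc_bytes" ]
)

# Lab 2 PV versus Lab 3 PVC
pv_satisfies_claim 1Gi lab-storage ReadWriteOnce 500Mi lab-storage ReadWriteOnce \
  && echo "would bind"
```

Once bound, the claim reports the PV's full 1Gi as its capacity, not the 500Mi it asked for, because a PV is bound as a whole.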

Lab 4: Attach PVC to Pod

Objective: Mount the PVC from Lab 3 into a Pod and verify data persistence.

# Step 1: Create a Pod using the PVC
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: lab4-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: lab3-pvc
EOF

# Step 2: Wait for pod to be running
kubectl wait --for=condition=ready pod/lab4-pod --timeout=60s

# Step 3: Write data to the persistent volume
kubectl exec lab4-pod -- sh -c "echo 'Hello Persistent World!' > /data/test.txt"
kubectl exec lab4-pod -- cat /data/test.txt
# Hello Persistent World!

# Step 4: Delete and recreate the Pod — data must survive!
kubectl delete pod lab4-pod

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: lab4-pod-v2
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: persistent-storage
          mountPath: /data
  volumes:
    - name: persistent-storage
      persistentVolumeClaim:
        claimName: lab3-pvc    # Same PVC!
EOF

kubectl wait --for=condition=ready pod/lab4-pod-v2 --timeout=60s

# Step 5: Verify data persisted across pod deletion
kubectl exec lab4-pod-v2 -- cat /data/test.txt
# Hello Persistent World!  ← Data survived Pod deletion! ✅

echo "✅ Lab 4 Complete"
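
Lab 4 leaves the Pod, the claim, and the volume in place. Because lab2-pv uses the Retain reclaim policy, deleting lab3-pvc moves the PV to Released rather than back to Available, and a new claim will not bind to it while the old claimRef is still recorded. The sketch below shows one common way to make such a volume claimable again. The function name `release_pv` is hypothetical, and the approach assumes you are happy for the next claimant to see the existing data:

```shell
#!/usr/bin/env bash
# Hypothetical helper: clear the stale claimRef on a Released PV so it becomes
# Available again. The data on the volume is left untouched.
release_pv() {
  local pv=$1
  kubectl patch pv "$pv" --type json \
    -p '[{"op":"remove","path":"/spec/claimRef"}]'
}

# Suggested Lab 4 cleanup order (run by hand against your cluster):
#   kubectl delete pod lab4-pod-v2
#   kubectl delete pvc lab3-pvc      # lab2-pv moves to Released
#   release_pv lab2-pv               # lab2-pv returns to Available
```

If the old data should not be reused, delete the PV and clean /mnt/lab2-data on the node instead of patching it.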

Lab 5: Dynamic Provisioning Example

Objective: Use Minikube’s default StorageClass to dynamically provision a PV for a new PVC.

# Step 1: Check the default StorageClass in Minikube
kubectl get storageclass
# NAME                 PROVISIONER                RECLAIMPOLICY
# standard (default)   k8s.io/minikube-hostpath   Delete

# Step 2: Create a PVC without specifying a PV — dynamic provisioning!
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lab5-dynamic-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Mi
  # storageClassName omitted → uses default (standard)
EOF

# Step 3: Check — PVC and PV should both be created automatically
kubectl get pvc lab5-dynamic-pvc
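# STATUS: Bound  ← Bound without creating a PV by hand
# (a StorageClass using WaitForFirstConsumer would show Pending until the Pod in Step 4 exists)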

kubectl get pv
# A new PV was automatically created by the provisioner!
# NAME                                       CAPACITY   STATUS
# pvc-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  200Mi      Bound

# Step 4: Use the dynamically provisioned volume in a Pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: lab5-dynamic-pod
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: dynamic-storage
          mountPath: /usr/share/nginx/html
  volumes:
    - name: dynamic-storage
      persistentVolumeClaim:
        claimName: lab5-dynamic-pvc
EOF

kubectl wait --for=condition=ready pod/lab5-dynamic-pod --timeout=60s
kubectl exec lab5-dynamic-pod -- df -h /usr/share/nginx/html
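# Shows the filesystem backing the mount. Minikube's hostPath provisioner stores the
# volume on the node's own disk, so Size reflects the node, not the 200Mi requested
# (the request is not enforced as a quota here).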

echo "✅ Lab 5 Complete"
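
Step 3 checks the claim once and expects it to be Bound already. In scripts it is safer to poll until the phase changes. The sketch below is a generic polling helper. The name `wait_for_output` and its argument order are assumptions made for illustration, not part of kubectl:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, one second apart, until its
# output equals the expected string. Returns 0 on a match, 1 if attempts run out.
wait_for_output() {
  local expected=$1 attempts=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    [ "$("$@" 2>/dev/null)" = "$expected" ] && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Example against a cluster (run by hand):
#   wait_for_output Bound 60 \
#     kubectl get pvc lab5-dynamic-pvc -o jsonpath='{.status.phase}'
```

With a WaitForFirstConsumer StorageClass, call the helper only after the Pod that uses the claim has been created, since the claim stays Pending until then.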

Lab 6: StatefulSet with Persistent Storage

Objective: Deploy a StatefulSet and verify each Pod gets its own unique PVC.

# Step 1: Create the StatefulSet with volumeClaimTemplates
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: lab6-headless
spec:
  clusterIP: None
  selector:
    app: lab6-stateful
  ports:
    - port: 80
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: lab6-stateful
spec:
  serviceName: lab6-headless
  replicas: 3
  selector:
    matchLabels:
      app: lab6-stateful
  template:
    metadata:
      labels:
        app: lab6-stateful
    spec:
      containers:
        - name: app
          image: busybox
          command: ["sh", "-c", "echo Pod \$(hostname) started > /data/pod-identity.txt && sleep 3600"]
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 100Mi
EOF

# Step 2: Watch pods come up in order (press Ctrl+C to stop watching)
kubectl get pods -l app=lab6-stateful -w
# lab6-stateful-0   Running  ← First
# lab6-stateful-1   Running  ← Second
# lab6-stateful-2   Running  ← Third (ordered!)

# Step 3: Verify 3 separate PVCs were created (one per pod)
kubectl get pvc | grep lab6
# data-lab6-stateful-0   Bound   100Mi
# data-lab6-stateful-1   Bound   100Mi
# data-lab6-stateful-2   Bound   100Mi

# Step 4: Verify each pod wrote to ITS OWN volume
kubectl exec lab6-stateful-0 -- cat /data/pod-identity.txt
# Pod lab6-stateful-0 started

kubectl exec lab6-stateful-1 -- cat /data/pod-identity.txt
# Pod lab6-stateful-1 started

kubectl exec lab6-stateful-2 -- cat /data/pod-identity.txt
# Pod lab6-stateful-2 started

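# Step 5: Delete a pod and verify it reattaches to its own PVC
# pod-identity.txt is rewritten on every start, so write a separate marker file first
kubectl exec lab6-stateful-1 -- sh -c "echo 'written before delete' > /data/marker.txt"
kubectl delete pod lab6-stateful-1
kubectl get pods -l app=lab6-stateful -w
# lab6-stateful-1 is recreated automatically (press Ctrl+C once it is Running)

kubectl exec lab6-stateful-1 -- cat /data/marker.txt
# written before delete  ← Same data! PVC reattached!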

# Step 6: Clean up
kubectl delete statefulset lab6-stateful
kubectl delete svc lab6-headless
# Note: PVCs are NOT auto-deleted when StatefulSet is deleted (by design)
kubectl delete pvc data-lab6-stateful-0 data-lab6-stateful-1 data-lab6-stateful-2

echo "✅ Lab 6 Complete — All Labs Done!"
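
The PVC names seen in Step 3 follow a fixed pattern: the volumeClaimTemplate name, the StatefulSet name, and the Pod ordinal, joined by hyphens. A small sketch that generates those names can save typing during cleanup. The function name `sts_pvc_names` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PVC names a StatefulSet creates from one
# volumeClaimTemplate, following <template>-<statefulset>-<ordinal>.
sts_pvc_names() {
  local template=$1 sts=$2 replicas=$3 i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${template}-${sts}-${i}"
    i=$((i + 1))
  done
}

sts_pvc_names data lab6-stateful 3
# data-lab6-stateful-0
# data-lab6-stateful-1
# data-lab6-stateful-2
```

The cleanup in Step 6 could then be written as `kubectl delete pvc $(sts_pvc_names data lab6-stateful 3)`.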

Summary

Concept	Key Takeaway
Ephemeral Storage	Container writable layer — lost on container restart
EmptyDir	Shared scratch space within a Pod — lost on Pod deletion
HostPath	Node filesystem mount — persists beyond Pod, tied to a node
ConfigMap Volume	Injects config files; mounted files refresh within about a minute without a restart (subPath mounts do not refresh)
Secret Volume	Injects secrets as files; stored in memory (tmpfs)
PersistentVolume (PV)	Cluster-wide storage resource — the actual storage
PersistentVolumeClaim (PVC)	Namespace-scoped storage request — binds to a PV
Access Modes	RWO (read-write, one node), ROX (read-only, many nodes), RWX (read-write, many nodes)
Reclaim Policies	Retain (keep data), Delete (remove storage), Recycle (deprecated)
StorageClass	Storage tier definition; enables dynamic provisioning
Dynamic Provisioning	Auto-creates PV when PVC is created — no manual admin work
StatefulSet	Ordered Pods with stable identity and per-Pod PVCs
VolumeClaimTemplates	Auto-creates PVC per StatefulSet Pod using a template
CSI	Standard interface for storage drivers — vendor-agnostic
VolumeSnapshots	Point-in-time copies of PVCs for backup and restore

Previous: ← Module 5 - Kubernetes Networking
Next Up: Module 7 - Kubernetes Security → — Learn about RBAC, Network Policies, Pod Security Standards, and Secrets management best practices.
