Kubernetes Interview Questions & Answers (2026) part 04

50+ Kubernetes interview questions and answers from basic to advanced — covering Pods, Deployments, Services, Networking, RBAC, Helm, Autoscaling, Security, and real-world troubleshooting scenarios.

May 18, 2025 89 min read 30 Questions DB
80+ scenario-based questions · Real-world answers · Production patterns · YAML manifests


🎯 Scenario: A developer joining your team asks: “What is Kubernetes and how does it actually work under the hood?”

Answer:

Kubernetes is an open-source container orchestration platform that automates deployment, scaling, and management of containerized applications.

Control Plane (Master) components:

| Component | Role |
|---|---|
| kube-apiserver | Front door to K8s: all REST calls go here; validates and persists to etcd |
| etcd | Distributed key-value store; single source of truth for all cluster state |
| kube-scheduler | Watches for unscheduled pods and assigns them to best-fit nodes |
| kube-controller-manager | Runs reconciliation loops (ReplicaSet, Node, Endpoints, etc.) |
| cloud-controller-manager | Interacts with cloud provider APIs (AWS, GCP, Azure) for LBs, volumes, routes |

Worker Node components:

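| Component | Role |
|---|---|
| kubelet | Agent on each node; ensures containers match the desired spec |
| kube-proxy | Maintains iptables/IPVS rules that route Service traffic (ClusterIP, NodePort) to pods; direct pod-to-pod connectivity comes from the CNI plugin |
| Container Runtime | Runs containers (containerd, CRI-O) |
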
┌──────────────────────────────────────────────────────────┐
│                      CONTROL PLANE                       │
│  ┌────────────┐  ┌──────┐  ┌───────────┐  ┌──────────┐   │
│  │ APIServer  │  │ etcd │  │ Scheduler │  │Controller│   │
│  └────────────┘  └──────┘  └───────────┘  └──────────┘   │
└──────────────────────────────────────────────────────────┘
         │                  │                  │
  ┌──────▼──────┐   ┌───────▼─────┐   ┌───────▼─────┐
  │ Worker Node │   │ Worker Node │   │ Worker Node │
  │  kubelet    │   │  kubelet    │   │  kubelet    │
  │  kube-proxy │   │  kube-proxy │   │  kube-proxy │
  │  [Pods...]  │   │  [Pods...]  │   │  [Pods...]  │
  └─────────────┘   └─────────────┘   └─────────────┘

💡 Key insight: The API server is the only component that reads/writes to etcd. All others communicate exclusively through the API server. This makes the API server the single source of authority.
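
The reconciliation loops run by kube-controller-manager can be pictured as a function that compares desired state with observed state and emits the actions needed to converge. The sketch below is a toy model, not real controller code: the `reconcile` function, the pod names, and the "delete the first surplus pods" policy are all illustrative assumptions.

```python
# Toy reconciliation step: compare desired state with observed state and
# return the actions needed to converge. Names and policy are illustrative.

def reconcile(desired_replicas, running_pods):
    """Return the actions a ReplicaSet-style controller would take."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: ask the API server to create the missing ones
        return [f"create pod #{i + 1}" for i in range(diff)]
    if diff < 0:
        # Too many pods: delete the surplus (selection policy simplified)
        return [f"delete {name}" for name in running_pods[:-diff]]
    return []  # Already converged: nothing to do


print(reconcile(3, ["web-a"]))
print(reconcile(1, ["web-a", "web-b", "web-c"]))
print(reconcile(2, ["web-a", "web-b"]))
```

A real controller runs this comparison every time a watch event arrives from the API server, which is why deleting a pod by hand simply causes a replacement to appear.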

🎯 Scenario: An interviewer asks you to walk through the complete internal flow when applying a manifest.

Answer:

Step 1: kubectl serializes YAML → HTTP POST to kube-apiserver

Step 2: API Server processing pipeline
  ├── Authentication  (who are you? — cert, bearer token, OIDC)
  ├── Authorization   (are you allowed? — RBAC check)
  ├── Admission Controllers
  │     ├── Mutating  (webhooks that can modify the object)
  │     └── Validating (webhooks that can reject the object)
  └── Persists desired state to etcd

Step 3: Deployment Controller (part of kube-controller-manager)
  → Detects new Deployment via a watch on the API server (never etcd directly)
  → Creates a ReplicaSet object

Step 4: ReplicaSet Controller
  → Detects new ReplicaSet
  → Creates Pod objects (just spec stored in etcd — not running yet)

Step 5: Scheduler
  → Watches for pods with no nodeName assigned
  → Filters nodes: resource fit, taints/tolerations, affinity rules
  → Scores remaining nodes (least requested, node affinity weight, etc.)
  → Writes chosen nodeName to the Pod through the API server (binding), which persists it to etcd

Step 6: kubelet on target node
  → Watches API server for pods assigned to its node
  → Pulls container image via container runtime (containerd)
  → Sets up networking (calls CNI plugin: Calico/Cilium/Flannel)
  → Starts containers, runs probes
  → Reports pod status back to API server

Step 7: kube-proxy (one instance per node)
  → Watches the API server for Service and endpoint changes
  → Updates iptables/IPVS rules on its own node
  → Once Ready, the pod is reachable via the ClusterIP of any Service that selects it

Admission controllers run BEFORE etcd persistence — this is where OPA/Gatekeeper, PodSecurity, LimitRanger, and ResourceQuota are enforced. If any validating webhook rejects the object, it never reaches etcd.
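
The ordering of the request pipeline can be sketched as a chain of stages where any stage may reject the request before it is stored. This is a toy model under stated assumptions: the `Rejected` exception, the request dictionary layout, the injected `team` label, and the "limits are required" policy are all hypothetical stand-ins, not real API server behavior.

```python
# Toy model of the API server pipeline: authn -> authz -> mutate -> validate -> persist.
# Every name and policy here is illustrative.

class Rejected(Exception):
    """Raised when a pipeline stage refuses the request."""


def authenticate(req):
    if not req.get("user"):
        raise Rejected("401: unauthenticated")


def authorize(req):
    # Stand-in for an RBAC check
    if req["verb"] not in req.get("allowed_verbs", []):
        raise Rejected("403: forbidden")


def mutate(req):
    # Mutating admission: inject a default label if none is set
    req["object"].setdefault("labels", {}).setdefault("team", "unassigned")


def validate(req):
    # Validating admission: reject objects without resource limits
    if "limits" not in req["object"]:
        raise Rejected("admission denied: limits are required")


def handle(req, store):
    for stage in (authenticate, authorize, mutate, validate):
        stage(req)
    # Persist only after every stage has passed (the "etcd write")
    store[req["object"]["name"]] = req["object"]
    return "created"


store = {}
good = {"user": "dev", "verb": "create", "allowed_verbs": ["create"],
        "object": {"name": "web", "limits": {"cpu": "500m"}}}
print(handle(good, store), store["web"]["labels"])
```

Because mutation runs before validation, a validating check always sees the object as it will actually be stored.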

🎯 Scenario: Your colleague is confused about which workload type to use. Explain with examples.

Answer:

| Resource | Purpose | Use When |
|---|---|---|
| Pod | Smallest unit; one or more containers | Almost never directly; use higher-level objects |
| ReplicaSet | Ensures N copies of a Pod are running | Rarely directly; Deployment manages this |
| Deployment | Manages ReplicaSets; rolling updates, rollback | Stateless apps (web servers, APIs) |
| StatefulSet | Ordered, stable pod identities + persistent storage per pod | Databases, message queues, Elasticsearch |
| DaemonSet | One pod per node (auto) | Log collectors, monitoring agents, CNI plugins |
| Job | Run-to-completion task | Batch processing, migrations |
| CronJob | Scheduled Job | Backups, reports, cleanup tasks |

# Deployment — stateless web app
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 256Mi

# StatefulSet: PostgreSQL with stable identity
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres-headless   # Required: headless service name
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: pg-secret
              key: password
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  # Each pod gets its own dedicated PVC
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 20Gi

💡 StatefulSet key properties: Pods get stable names (postgres-0, postgres-1, postgres-2) that resolve as DNS hostnames under the headless Service (e.g., postgres-0.postgres-headless), ordered startup (0 → 1 → 2), ordered shutdown (2 → 1 → 0), and each gets its own PersistentVolumeClaim.
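
The stable identities follow fixed naming patterns, which a short sketch can make concrete. The helper name `statefulset_identities` is hypothetical, and the `default` namespace and `cluster.local` cluster domain are assumptions (the manifest above sets neither).

```python
def statefulset_identities(name, service, namespace, replicas, claim="data"):
    """Derive the per-pod names Kubernetes assigns to a StatefulSet."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Pattern: <pod>.<headless-service>.<namespace>.svc.<cluster-domain>
            "dns": f"{pod}.{service}.{namespace}.svc.cluster.local",
            # Pattern: <volumeClaimTemplate-name>-<statefulset-name>-<ordinal>
            "pvc": f"{claim}-{pod}",
        })
    return identities


for identity in statefulset_identities("postgres", "postgres-headless", "default", 3):
    print(identity)
```

Because the PVC name is derived from the ordinal, a rescheduled `postgres-1` reattaches to the same claim rather than getting a fresh volume.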

🎯 Scenario: Your company is onboarding 5 teams onto a shared Kubernetes cluster. How do you isolate them properly?

Answer:

Namespaces provide logical isolation — separate RBAC, resource quotas, and network policies per team or environment.

# Namespace per team+environment
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha-prod
  labels:
    team: alpha
    environment: production
    cost-center: "cc-1234"

# ResourceQuota: cap total resources per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha-prod
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "10"
    persistentvolumeclaims: "20"
    count/deployments.apps: "20"

# LimitRange: set default container resource requests/limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha-prod
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 256Mi
    max:
      cpu: "4"
      memory: 4Gi
    min:
      cpu: 50m
      memory: 64Mi

# Set default namespace for current session
kubectl config set-context --current --namespace=team-alpha-prod

# Cross-namespace DNS format
# <service>.<namespace>.svc.cluster.local
# Check that the name resolves and the port is reachable (PostgreSQL does not speak HTTP)
nc -zv db-service.team-alpha-prod.svc.cluster.local 5432

Best practice: Use a consistent <team>-<environment> naming scheme (e.g., team-alpha-prod, team-alpha-staging, team-alpha-dev). Never use the default namespace for production workloads.
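
To see how the ResourceQuota and LimitRange defaults interact, the rough calculation below estimates how many single-container pods fit in the namespace when every container falls back to the default requests. It is a sketch under assumptions: the helper names are hypothetical, only the `m`, `Ki`, `Mi`, and `Gi` suffixes are parsed, and only the `requests.*` and `pods` quota entries are considered.

```python
def parse_cpu(value):
    """Convert a CPU quantity ("100m" or "10") to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


_MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(value):
    """Convert a memory quantity ("128Mi", "20Gi") to bytes."""
    for suffix, factor in _MEM_UNITS.items():
        if value.endswith(suffix):
            return int(value[:-len(suffix)]) * factor
    return int(value)  # Plain bytes


def max_pods(quota, per_pod):
    """How many single-container pods with per_pod requests fit the quota."""
    by_cpu = parse_cpu(quota["requests.cpu"]) // parse_cpu(per_pod["cpu"])
    by_mem = parse_memory(quota["requests.memory"]) // parse_memory(per_pod["memory"])
    return min(by_cpu, by_mem, int(quota["pods"]))


quota = {"requests.cpu": "10", "requests.memory": "20Gi", "pods": "50"}
defaults = {"cpu": "100m", "memory": "128Mi"}   # LimitRange defaultRequest
print(max_pods(quota, defaults))
```

With the values above, the `pods: "50"` cap is the binding constraint; heavier per-pod requests shift the limit to CPU or memory instead.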

🎯 Scenario: One node of your production etcd cluster fails. What is the impact and how do you recover?

Answer:

etcd is a distributed key-value store using the Raft consensus algorithm. It stores ALL cluster state: pods, services, secrets, configmaps, RBAC, and custom resources.

Impact by failure scenario:

| Scenario | Impact |
|---|---|
| 1 of 3 nodes down | Cluster fully operational; quorum maintained (2 of 3) |
| 2 of 3 nodes down | Quorum lost: no writes are possible, so the control plane is read-only at best; existing workloads keep running |
| All nodes down | Complete outage; control plane unresponsive |
| Data corruption | Must restore from snapshot backup |

# Check etcd health
kubectl exec -n kube-system etcd-master -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health

# Check etcd member list
etcdctl member list --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Take a snapshot backup
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-$(date +%Y%m%d-%H%M%S).db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify snapshot
etcdctl snapshot status /backup/etcd-20240101.db --write-out=table

# Restore from snapshot
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-20240101.db \
  --data-dir=/var/lib/etcd-restored \
  --name=master \
  --initial-cluster=master=https://127.0.0.1:2380 \
  --initial-advertise-peer-urls=https://127.0.0.1:2380

⚠️ Production requirement: Always run etcd with 3 or 5 nodes (odd number for quorum). Automate etcd snapshots to S3 every 30 minutes. Test restores regularly.
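
The reason for odd member counts falls out of the Raft quorum arithmetic: a cluster needs a strict majority to commit writes. A minimal sketch (function names are illustrative):

```python
def quorum(members):
    """Smallest strict majority of a cluster with the given member count."""
    return members // 2 + 1


def fault_tolerance(members):
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


for n in range(1, 8):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 3 to 4 members raises the quorum without raising the number of tolerated failures, so the extra member adds cost and replication traffic but no resilience.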

🎯 Scenario: A worker node in your cluster suddenly becomes unreachable. Walk through what Kubernetes does automatically.

Answer:

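T+0s     Node stops sending heartbeats (Lease renewals, status updates) to the API server
T+40s    node-monitor-grace-period expires → node marked "NotReady"
         (controlled by --node-monitor-grace-period on controller-manager;
         40s is the long-standing default and may differ by version)
         Node is tainted node.kubernetes.io/unreachable:NoExecute
T+~6min  Default tolerationSeconds (300s) for that taint expires → pods on the
         node are marked for deletion and show "Terminating"
         (taint-based eviction supersedes the older --pod-eviction-timeout flag)
         ReplicaSet controller creates replacement pods; the scheduler places
         them on healthy nodes
T+~6min+ New pods start on healthy nodes, pass readiness probes
         Service endpoints updated → traffic routed to new pods
T+...    If node recovers: kubelet reconciles and removes stale pods
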
# PodDisruptionBudget — prevents too many pods being evicted at once
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2          # Always keep at least 2 pods running
  # OR: maxUnavailable: 1  # Allow at most 1 pod down at a time
  selector:
    matchLabels:
      app: web-app

# Node maintenance workflow
kubectl cordon node-1         # Mark unschedulable: no new pods land here

# --ignore-daemonsets: skip DaemonSet pods (can't be moved)
# --delete-emptydir-data: also evict pods using emptyDir volumes (their data is lost)
# --grace-period=60: give pods 60s to terminate gracefully
# (comments cannot follow a trailing backslash, so they are listed here)
kubectl drain node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=60
# Do maintenance...
kubectl uncordon node-1       # Mark schedulable again

# Check node conditions
kubectl describe node node-1 | grep -A 20 "Conditions:"

💡 PodDisruptionBudgets prevent Kubernetes from evicting too many pods at once during voluntary disruptions (node drains, cluster upgrades). Always define PDBs for production workloads with replicas > 1.
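
How a PDB translates into "may this pod be evicted right now?" can be sketched with a little arithmetic. This is a simplified model that handles integer values only (not percentages); the function name is hypothetical.

```python
from typing import Optional


def allowed_disruptions(healthy: int, desired: int,
                        min_available: Optional[int] = None,
                        max_unavailable: Optional[int] = None) -> int:
    """Number of pods that may be voluntarily evicted right now."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        required_healthy = min_available
    else:
        required_healthy = desired - max_unavailable
    # Never negative: an already-degraded workload allows zero evictions
    return max(0, healthy - required_healthy)


print(allowed_disruptions(healthy=3, desired=3, min_available=2))
print(allowed_disruptions(healthy=2, desired=3, min_available=2))
```

When the result is 0, `kubectl drain` keeps retrying the eviction until another pod becomes healthy, which is why a PDB with `minAvailable` equal to the replica count can block node maintenance indefinitely.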

🎯 Scenario: A junior engineer asks when to use which command. Explain with examples.

Answer:

| Command | Behavior | Idempotent | Use Case |
|---|---|---|---|
| kubectl create | Creates; fails if exists | No | One-time creation |
| kubectl apply | Creates OR updates declaratively | Yes | GitOps, CI/CD pipelines |
| kubectl replace | Replaces entire resource; fails if missing | Only if the resource exists | Force full replacement |
| kubectl patch | Partially updates a resource | Depends on the patch | Quick targeted edits |
| kubectl edit | Opens resource in editor | N/A (interactive) | Manual one-off changes |

# apply — idempotent, tracks last-applied-configuration annotation
kubectl apply -f deployment.yaml

# create — fails if already exists
kubectl create -f deployment.yaml

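# replace: full in-place update (PUT); fails if the resource does not exist
kubectl replace -f deployment.yaml
kubectl replace --force -f deployment.yaml   # Delete first, then create (downtime, loses rollout history)

# patch: JSON merge patch (without --type, kubectl defaults to strategic merge)
kubectl patch deployment web-app \
  --type=merge \
  -p '{"spec":{"replicas":5}}'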

# patch — JSON patch (precise)
kubectl patch deployment web-app \
  --type=json \
  -p='[{"op":"replace","path":"/spec/replicas","value":5}]'

# patch — strategic merge patch (aware of K8s object structure)
kubectl patch deployment web-app \
  --type=strategic \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"web","image":"nginx:1.26"}]}}}}'

# Diff before applying
kubectl diff -f deployment.yaml

Always use kubectl apply in production pipelines. It is declarative, idempotent, and enables accurate drift detection via the kubectl.kubernetes.io/last-applied-configuration annotation.
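
The difference between patch types is easier to see with a small implementation of JSON merge patch semantics (RFC 7386). The sketch below is illustrative: `merge_patch` is a hand-written helper, not kubectl's code, and the `envoy:v1` sidecar image is a made-up example.

```python
import copy


def merge_patch(target, patch):
    """Apply an RFC 7386 JSON merge patch and return the patched copy."""
    if not isinstance(patch, dict):
        # Scalars and lists replace the target wholesale
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)      # null deletes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


deployment = {"spec": {"replicas": 3, "template": {"spec": {"containers": [
    {"name": "web", "image": "nginx:1.25"},
    {"name": "sidecar", "image": "envoy:v1"},   # hypothetical second container
]}}}}

scaled = merge_patch(deployment, {"spec": {"replicas": 5}})
print(scaled["spec"]["replicas"])

# A merge patch replaces the whole containers list, so the sidecar is dropped.
patched = merge_patch(deployment, {"spec": {"template": {"spec": {"containers": [
    {"name": "web", "image": "nginx:1.26"}]}}}})
print(len(patched["spec"]["template"]["spec"]["containers"]))
```

This is the case strategic merge patch is designed for: it merges the `containers` list by container `name`, so updating one image leaves the other containers in place.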

🎯 Scenario: Your Service is not routing traffic to pods. The pods are running but endpoints are empty. What’s happening?

Answer:

Labels are key-value metadata on objects. Selectors filter objects by labels. Services route traffic only to pods whose labels match the Service selector.

# Deployment — pods get these labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  selector:
    matchLabels:
      app: web-app        # ReplicaSet selects pods with this label
      tier: frontend
  template:
    metadata:
      labels:
        app: web-app      # Pod label — MUST match selector above
        tier: frontend
        version: v2.0
        env: production

# Service: routes to pods matching selector
apiVersion: v1
kind: Service
metadata:
  name: web-app-svc
spec:
  selector:
    app: web-app          # Matches pods with this label
    tier: frontend        # AND this label
  ports:
  - port: 80
    targetPort: 8080

# Diagnose empty endpoints
kubectl get endpoints web-app-svc
# NAME          ENDPOINTS   AGE
# web-app-svc   <none>      5m  ← PROBLEM

# Check service selector
kubectl get svc web-app-svc -o jsonpath='{.spec.selector}'
# {"app":"web-app","tier":"frontend"}

# Check what labels pods actually have
kubectl get pods --show-labels | grep web-app
# web-app-xxxx  Running  app=web-app,tier=backend  ← MISMATCH!

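# Quick fix: relabel the running pod so it matches the Service selector
# (if tier is part of the Deployment selector, the ReplicaSet will orphan
# this pod and create a replacement with the old label)
kubectl label pod web-app-xxxx tier=frontend --overwrite
# Durable fix: correct the labels in the Deployment's pod template, or the
# Service selector, so new pods match automatically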

# Other label operations
kubectl get pods -l 'app=web-app,env=production'   # Select by multiple labels
kubectl get pods -l 'version in (v1.0, v2.0)'      # Set-based selector
kubectl get pods -l 'env notin (dev, staging)'      # Exclusion selector
kubectl label pod web-app-xxxx version-             # Remove a label
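
The matching rule behind the empty endpoints list is simple enough to sketch: an equality-based selector matches a pod only if every selector pair is present on the pod with the same value. The `matches` helper and the pod names below are illustrative.

```python
def matches(selector, labels):
    """Equality-based selector: every selector pair must be present on the pod."""
    return all(labels.get(key) == value for key, value in selector.items())


service_selector = {"app": "web-app", "tier": "frontend"}
pods = {
    "web-app-aaaa": {"app": "web-app", "tier": "backend"},
    "web-app-bbbb": {"app": "web-app", "tier": "frontend", "version": "v2.0"},
}

endpoints = [name for name, labels in pods.items() if matches(service_selector, labels)]
print(endpoints)
```

Extra labels on the pod (such as `version`) do not prevent a match; a single differing value (such as `tier=backend`) does.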

🔵 Pods & Workloads

🎯 Scenario: You deployed a new application and pods show CrashLoopBackOff. Production is down. What is your step-by-step debugging process?

Answer:

# Step 1: Get overview
kubectl get pods -n production
# NAME           READY   STATUS             RESTARTS   AGE
# app-xxx-yyy    0/1     CrashLoopBackOff   8          12m

# Step 2: Describe pod — read Events section carefully
kubectl describe pod app-xxx-yyy -n production
# Look for:
#   Exit Code (137=SIGKILL, often OOMKilled; 1=app error, 139=segfault, 143=SIGTERM)
#   Last State: reason, exitCode, finishedAt
#   Events: Failed to pull image, Failed to mount volume, etc.

# Step 3: Check CURRENT logs
kubectl logs app-xxx-yyy -n production

# Step 4: Check PREVIOUS container logs (before crash — most useful!)
kubectl logs app-xxx-yyy -n production --previous

# Step 5: Check resource pressure
kubectl top pod app-xxx-yyy -n production
kubectl describe pod app-xxx-yyy -n production | grep -A5 "Limits\|Requests"

# Step 6: Override entrypoint to keep container alive for debugging
kubectl run debug \
  --image=your-app-image:tag \
  --restart=Never \
  --command -- sleep 3600
kubectl exec -it debug -- /bin/sh

# Step 7: Check events across namespace
kubectl get events -n production --sort-by='.lastTimestamp' | tail -20

CrashLoopBackOff diagnosis table:

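| Exit Code | Cause | Fix |
|---|---|---|
| 1 | Application error at startup | Check app logs, fix code |
| 137 | SIGKILL (most often OOMKilled; confirm via Reason in describe output) | Increase memory limit or reduce memory usage |
| 139 | Segmentation fault | Debug application |
| 143 | SIGTERM (graceful shutdown requested) | Check liveness probe aggressiveness |
| 126 | Command not executable | Fix file permissions |
| 127 | Command not found | Fix command/args, check image |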
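
Exit codes above 128 follow the shell convention of 128 plus the signal number, which a small helper can decode. This is a sketch assuming a Linux or macOS Python runtime; the function name and the wording of the returned strings are illustrative.

```python
import signal


def describe_exit_code(code):
    """Translate a container exit code into a short human-readable cause."""
    if code == 0:
        return "success"
    if code > 128:
        number = code - 128            # Convention: 128 + signal number
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        return f"killed by {name}"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    return "application error"


for code in (1, 126, 127, 137, 139, 143):
    print(code, describe_exit_code(code))
```

Note that 137 only tells you the process received SIGKILL; check the container's `Reason` field in `kubectl describe` to confirm it was actually OOMKilled.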

🎯 Scenario: You need to update a production web app from v1.0 to v2.0 with zero service interruption and immediate rollback capability.

Answer:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # Create 1 extra pod above desired count
      maxUnavailable: 0     # Never go below desired count (zero-downtime)
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: myapp:v2.0
        # Readiness probe gates traffic — CRITICAL for zero-downtime
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          failureThreshold: 3
          successThreshold: 1
        # Liveness probe restarts deadlocked pods
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
        # Give in-flight requests time to complete
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 15"]
      # Must be >= preStop sleep + app shutdown time (the countdown includes preStop)
      terminationGracePeriodSeconds: 30

# Trigger update
kubectl set image deployment/web-app web=myapp:v2.0 -n production

# Watch real-time rollout progress
kubectl rollout status deployment/web-app -n production

# Inspect revision history
kubectl rollout history deployment/web-app -n production

# Instant rollback to previous version
kubectl rollout undo deployment/web-app -n production

# Rollback to specific revision
kubectl rollout undo deployment/web-app --to-revision=3 -n production

# Pause mid-rollout (canary-style manual gate)
kubectl rollout pause deployment/web-app -n production
# Check metrics, error rates...
kubectl rollout resume deployment/web-app -n production

Without a readiness probe, Kubernetes has no way to know if a new pod is actually serving traffic. New pods will receive traffic immediately upon startup — before they’re ready — causing errors.
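
The effect of `maxSurge` and `maxUnavailable` on pod counts during a rollout can be worked out with a few lines of arithmetic. In this sketch the helper names are hypothetical; the rounding follows the documented behavior that percentage surge rounds up and percentage unavailable rounds down.

```python
import math


def resolve(value, replicas, round_up):
    """Resolve an absolute number or a percentage string against replicas."""
    if isinstance(value, str) and value.endswith("%"):
        exact = replicas * int(value[:-1]) / 100
        return math.ceil(exact) if round_up else math.floor(exact)
    return int(value)


def rollout_bounds(replicas, max_surge, max_unavailable):
    """Upper bound on total pods and lower bound on available pods mid-rollout."""
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


print(rollout_bounds(4, 1, 0))          # The manifest above
print(rollout_bounds(10, "25%", "25%"))  # The Deployment defaults on 10 replicas
```

For the manifest above the rollout never drops below 4 available pods and never exceeds 5 in total, so the cluster needs spare capacity for one extra pod.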

🎯 Scenario: Your app container needs the database to be up and a config file generated before it starts. How do you handle this reliably?

Answer:

Init containers run sequentially, each to completion, before the main containers start. If an init container fails, the kubelet restarts it until it succeeds (with restartPolicy: Never, the pod is marked Failed instead).

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  initContainers:

  # 1. Wait for database to accept connections
  - name: wait-for-postgres
    image: busybox:1.35
    command:
    - sh
    - -c
    - |
      until nc -z postgres-service.db.svc.cluster.local 5432; do
        echo "Waiting for PostgreSQL..."
        sleep 3
      done
      echo "PostgreSQL is ready!"

  # 2. Run database migrations
  - name: run-migrations
    image: myapp:v2.0
    command: ["python", "manage.py", "migrate", "--no-input"]
    env:
    - name: DATABASE_URL
      valueFrom:
        secretKeyRef:
          name: db-secret
          key: url

  # 3. Fetch runtime config from S3
  - name: fetch-config
    image: amazon/aws-cli:latest
    command:
    - sh
    - -c
    - aws s3 cp s3://my-bucket/config/app.json /config/app.json
    volumeMounts:
    - name: config-vol
      mountPath: /config

  containers:
  - name: app
    image: myapp:v2.0
    volumeMounts:
    - name: config-vol
      mountPath: /app/config
      readOnly: true
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5

  volumes:
  - name: config-vol
    emptyDir: {}

💡 Init containers can have different images from the main container — useful for running tools (aws-cli, curl, migrate) that you don’t want in your production image. They share the same network and volumes as the main container.
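
If the init container image ships Python rather than `nc`, the same wait-for-dependency logic can be written as a small function. This is a sketch with hypothetical names and default timings; it only checks that the TCP port accepts connections, not that the database is ready to serve queries.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=3.0):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unreachable, or DNS not resolvable yet: retry until deadline
            if time.monotonic() + interval > deadline:
                return False
            time.sleep(interval)
```

An init container would call something like `wait_for_port("postgres-service.db.svc.cluster.local", 5432)` and exit non-zero when it returns False, so the kubelet retries the init step.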

🎯 Scenario: Pods are getting killed unnecessarily during slow startup, and traffic is being sent to pods that aren’t ready yet.

Answer:

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    image: myapp:v2.0

    # STARTUP PROBE — Checked first. Disables liveness/readiness until it passes.
    # Use for apps with slow/variable startup times.
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      failureThreshold: 30   # 30 * 10s = 5 minutes max startup time
      periodSeconds: 10
      # If this fails 30 times, container is killed and restarted

    # LIVENESS PROBE — Is the app alive? Kills and restarts if failing.
    # Don't call slow external dependencies here (DB, external APIs)
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      periodSeconds: 10
      failureThreshold: 3       # Kill after 3 consecutive failures (30s)
      timeoutSeconds: 5
      successThreshold: 1

    # READINESS PROBE — Is the app ready to serve traffic?
    # Removes pod from Service endpoints when failing.
    readinessProbe:
      httpGet:
        path: /health/ready    # Can check DB connectivity here
        port: 8080
      periodSeconds: 5
      failureThreshold: 3      # Remove from LB after 15s of failures
      successThreshold: 2      # Re-add only after 2 consecutive passes
      initialDelaySeconds: 0   # startupProbe handles initial delay

    # TCP socket probe (databases, non-HTTP services)
    # livenessProbe:
    #   tcpSocket:
    #     port: 5432
    #   periodSeconds: 10

    # Exec probe (custom health check command)
    # livenessProbe:
    #   exec:
    #     command: ["redis-cli", "ping"]
    #   periodSeconds: 10

| Probe | Purpose | On Failure |
|---|---|---|
| startupProbe | App finished initializing | Container killed and restarted |
| livenessProbe | App is not deadlocked/broken | Container killed and restarted |
| readinessProbe | App is ready to accept traffic | Removed from Service endpoints |

⚠️ Common mistake: Setting livenessProbe without a startupProbe on slow-starting apps. The liveness probe fires too soon, kills the app before it finishes starting, and you get a crash loop. Always use startupProbe for apps that take more than 30 seconds to start.
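
The crash loop described above comes down to comparing the app's startup time with the window a probe tolerates. The sketch below approximates that window as initial delay plus failureThreshold times periodSeconds (ignoring timeoutSeconds); the helper names and the 90-second startup time are illustrative assumptions.

```python
def failure_window(failure_threshold, period_seconds, initial_delay_seconds=0):
    """Approximate seconds a container may keep failing a probe before action is taken."""
    return initial_delay_seconds + failure_threshold * period_seconds


def survives_startup(app_startup_seconds, probe):
    """True if the app finishes starting before the probe gives up on it."""
    return app_startup_seconds <= failure_window(**probe)


liveness = {"failure_threshold": 3, "period_seconds": 10}    # 30s window
startup = {"failure_threshold": 30, "period_seconds": 10}    # 300s window

app_startup_seconds = 90  # hypothetical slow-starting app
print(survives_startup(app_startup_seconds, liveness))  # Liveness alone guards startup
print(survives_startup(app_startup_seconds, startup))   # startupProbe guards startup
```

With only the liveness probe, the 90-second app is killed after about 30 seconds on every attempt; with the startup probe it gets up to 5 minutes before liveness checks begin.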

🎯 Scenario: You want to route 10% of production traffic to v2.0 to validate it, then gradually increase to 100%.

Answer:

# Stable Deployment — 9 replicas = 90% traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-stable
  namespace: production
spec:
  replicas: 9
  selector:
    matchLabels:
      app: web-app
      track: stable
  template:
    metadata:
      labels:
        app: web-app     # ← shared label
        track: stable
    spec:
      containers:
      - name: web
        image: myapp:v1.0
---
# Canary Deployment — 1 replica = 10% traffic
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-canary
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: web-app
      track: canary
  template:
    metadata:
      labels:
        app: web-app     # ← same shared label
        track: canary
    spec:
      containers:
      - name: web
        image: myapp:v2.0
---
# Service selects ALL pods with app=web-app
# Traffic split is proportional to replica count (9:1 = 90%:10%)
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app         # Matches BOTH stable and canary pods
  ports:
  - port: 80
    targetPort: 8080

# Step 1: Deploy canary at 10%
kubectl apply -f canary-deployment.yaml

# Step 2: Monitor error rate and latency
kubectl logs -l track=canary --tail=200 -n production
kubectl top pods -l track=canary -n production

# Step 3a: Promote by gradually increasing canary and decreasing stable
kubectl scale deployment/web-app-canary --replicas=3 -n production   # 30%
kubectl scale deployment/web-app-stable --replicas=7 -n production   # 70%
# ... eventually
kubectl scale deployment/web-app-canary --replicas=10 -n production  # 100%
kubectl scale deployment/web-app-stable --replicas=0 -n production

# Step 3b: Rollback if issues found
kubectl scale deployment/web-app-canary --replicas=0 -n production
kubectl delete deployment/web-app-canary -n production

💡 For header/cookie-based traffic splitting, use NGINX Ingress canary annotations or a service mesh (Istio, Linkerd). The replica-ratio approach splits randomly which is fine for simple cases.
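
The replica-ratio split is just a proportion, sketched below. The function name is hypothetical, and the result is the expected long-run share assuming connections are spread evenly across ready pods; real traffic over short intervals or with long-lived connections can deviate from it.

```python
def canary_share(stable_replicas, canary_replicas):
    """Expected fraction of traffic reaching canary pods behind one Service."""
    total = stable_replicas + canary_replicas
    if total == 0:
        raise ValueError("no replicas behind the Service")
    return canary_replicas / total


# Promotion plan from the commands above: (stable, canary)
for stable, canary in [(9, 1), (7, 3), (0, 10)]:
    print(f"stable={stable} canary={canary} -> {canary_share(stable, canary):.0%} canary")
```

The granularity is limited by the pod count: with 10 pods in total, the smallest possible canary step is 10%.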

🎯 Scenario: Your ops team needs a Datadog agent and a log collector running on every node, including new nodes that join automatically.

Answer:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: datadog-agent
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: datadog-agent
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: datadog-agent
    spec:
      # Required to run on control-plane nodes too
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node.kubernetes.io/not-ready
        operator: Exists
        effect: NoExecute

      # Only run on Linux nodes
      nodeSelector:
        kubernetes.io/os: linux

      containers:
      - name: agent
        image: datadog/agent:latest
        env:
        - name: DD_API_KEY
          valueFrom:
            secretKeyRef:
              name: datadog-secret
              key: api-key
        - name: DD_KUBERNETES_KUBELET_HOST
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        volumeMounts:
        - name: dockersocket
          mountPath: /var/run/docker.sock
        - name: procdir
          mountPath: /host/proc
          readOnly: true
        - name: cgroups
          mountPath: /host/sys/fs/cgroup
          readOnly: true

      volumes:
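      # Assumes a Docker-based node; on containerd nodes mount the runtime's
      # socket instead (commonly /run/containerd/containerd.sock)
      - name: dockersocket
        hostPath:
          path: /var/run/docker.sock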
      - name: procdir
        hostPath:
          path: /proc
      - name: cgroups
        hostPath:
          path: /sys/fs/cgroup

💡 DaemonSet common use cases: Log collectors (Fluentd, Promtail), metrics agents (Datadog, node-exporter), network plugins (CNI), storage daemons (Ceph), security scanners, GPU device plugins.

🎯 Scenario: You need to run a data processing job that processes 1000 items in parallel chunks, retrying on failure, and a nightly cleanup job.

Answer:

# Parallel batch Job — process items in parallel
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
  namespace: production
spec:
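  completionMode: Indexed  # Required so each pod gets a completion index (0-9)
  completions: 10          # Total tasks to complete
  parallelism: 3           # Run 3 pods simultaneously
  backoffLimit: 4          # Retry up to 4 times across the Job before marking it failed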
  activeDeadlineSeconds: 3600   # Kill job after 1 hour
  ttlSecondsAfterFinished: 86400  # Auto-delete after 24h
  template:
    spec:
      restartPolicy: OnFailure   # OnFailure or Never (not Always)
      containers:
      - name: processor
        image: data-processor:v1.0
        command:
        - python
        - process.py
        - --chunk-index=$(JOB_COMPLETION_INDEX)
        env:
        - name: JOB_COMPLETION_INDEX
          valueFrom:
            fieldRef:
              fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
        resources:
          requests:
            cpu: 500m
            memory: 1Gi

# CronJob: nightly database cleanup at 2 AM UTC
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-cleanup
  namespace: production
spec:
  schedule: "0 2 * * *"          # Cron: minute hour dom month dow
  timeZone: "UTC"                 # K8s 1.27+
  concurrencyPolicy: Forbid       # Skip if previous job still running
  successfulJobsHistoryLimit: 3   # Keep last 3 successful job records
  failedJobsHistoryLimit: 3       # Keep last 3 failed job records
  startingDeadlineSeconds: 300    # If missed window, only retry within 5 min
  jobTemplate:
    spec:
      backoffLimit: 2
      activeDeadlineSeconds: 1800
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: cleanup-sa
          containers:
          - name: cleanup
            image: myapp:v2.0
            command: ["python", "cleanup.py", "--days=30"]
            resources:
              requests:
                cpu: 100m
                memory: 256Mi

# Manually trigger a CronJob immediately
kubectl create job --from=cronjob/nightly-cleanup manual-run-$(date +%s) -n production

# Monitor Job progress
kubectl get jobs -n production
kubectl describe job data-processor -n production
kubectl logs -l job-name=data-processor --prefix=true -n production
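
Inside the worker, the completion index has to be mapped to a slice of the 1000 items. The source does not show `process.py`, so the following is only one plausible way to do it, assuming the Job runs in Indexed completion mode so that `JOB_COMPLETION_INDEX` is set; the `chunk_bounds` helper is hypothetical.

```python
import os


def chunk_bounds(index, total_items, completions):
    """Half-open range of item numbers handled by the pod with this index."""
    base, extra = divmod(total_items, completions)
    # The first `extra` chunks take one additional item each
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return range(start, stop)


# Falls back to index 0 when run outside a Job pod
index = int(os.environ.get("JOB_COMPLETION_INDEX", "0"))
items = chunk_bounds(index, total_items=1000, completions=10)
print(f"index {index} processes items {items.start}..{items.stop - 1}")
```

Deriving the slice purely from the index keeps retries safe: a restarted pod with the same index reprocesses exactly the same items, so the work itself should be idempotent.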

🎯 Scenario: You need to add request logging, TLS termination, and secrets injection to your app without modifying the application code.

Answer:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecars
spec:
  containers:
  # Main application (HTTP on localhost:8080)
  - name: app
    image: myapp:v2.0
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app

  # Sidecar 1: NGINX reverse proxy / TLS terminator
  - name: nginx-proxy
    image: nginx:1.25-alpine
    ports:
    - containerPort: 443
    volumeMounts:
    - name: nginx-config
      mountPath: /etc/nginx/conf.d
    - name: tls-certs
      mountPath: /etc/nginx/certs
      readOnly: true

  # Sidecar 2: Log shipper
  - name: log-shipper
    image: fluent/fluent-bit:latest
    volumeMounts:
    - name: shared-logs
      mountPath: /var/log/app
      readOnly: true

  # Sidecar 3: Vault Agent (secrets injection)
  - name: vault-agent
    image: hashicorp/vault:latest
    command: ["vault", "agent", "-config=/vault/config/agent.hcl"]
    volumeMounts:
    - name: vault-config
      mountPath: /vault/config
    - name: secrets-vol
      mountPath: /vault/secrets

  initContainers:
  # Init: Set up Vault auth token before main containers start
  - name: vault-init
    image: hashicorp/vault:latest
    command: ["sh", "-c", "vault login -method=kubernetes role=my-app"]

  # All volumes live in ONE list: a second `volumes:` key in the same spec
  # is a duplicate key and would drop or reject the first list
  volumes:
  # Shared volume for inter-container communication
  - name: shared-logs
    emptyDir: {}
  - name: nginx-config
    configMap:
      name: nginx-config
  - name: tls-certs
    secret:
      secretName: app-tls-cert
  - name: vault-config
    configMap:
      name: vault-agent-config
  - name: secrets-vol
    emptyDir:
      medium: Memory  # Store secrets in RAM, not disk

💡 Istio service mesh uses this pattern at scale — automatically injecting an Envoy proxy sidecar into every pod for mTLS, traffic management, and observability without changing application code.

🎯 Scenario: Your pods are being killed abruptly during deployments, causing dropped HTTP connections and failed requests.

Answer:

Kubernetes pod termination sequence:

1. Pod moves to Terminating state (deletionTimestamp set);
   the terminationGracePeriodSeconds countdown begins now
2. In parallel:
   a. Pod removed from Service endpoints (stops receiving new traffic)
      ← BUT there's a propagation delay! iptables/IPVS rules take ~1-2s to update
   b. preStop hook executes (if defined)
3. After preStop finishes, SIGTERM is sent to the container process (PID 1)
4. If the process is still running when the grace period expires → SIGKILL

apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      # Must be >= preStop sleep + app shutdown time
      terminationGracePeriodSeconds: 60

      containers:
      - name: app
        image: myapp:v2.0

        lifecycle:
          preStop:
            exec:
              # Sleep bridges the gap between endpoint removal
              # and actual traffic drain (~5-15s is typical)
              command: ["/bin/sh", "-c", "sleep 15"]

          # OR: call a graceful shutdown endpoint
          # preStop:
          #   httpGet:
          #     path: /shutdown
          #     port: 8080

        # Your app should handle SIGTERM gracefully:
        # - Stop accepting new connections
        # - Finish processing in-flight requests
        # - Close DB connections cleanly
        # - Exit with code 0
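# Example: Python app handling SIGTERM
import signal
import threading
from http.server import HTTPServer, SimpleHTTPRequestHandler

# Stand-in server for illustration; replace with your app's server object
server = HTTPServer(("", 8080), SimpleHTTPRequestHandler)

def handle_sigterm(signum, frame):
    print("Received SIGTERM, finishing in-flight requests...")
    # shutdown() blocks until serve_forever() returns, so run it in another
    # thread; calling it directly from the handler would deadlock
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
server.serve_forever()   # Returns after shutdown() is called
server.server_close()    # Release the socket; the process then exits with code 0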

⚠️ Without preStop sleep, new traffic can still arrive at the pod for 1-2 seconds after endpoint removal starts (due to iptables propagation lag). The preStop sleep ensures the pod stays alive long enough to drain.
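
The timing budget can be checked with a small calculation. The sketch below is a simplified model, assuming the grace period starts when the pod is marked for deletion and that the preStop hook and the application's own shutdown run back to back. The function name and the durations are illustrative and not part of any Kubernetes API.

```python
def termination_timeline(grace_period_s, prestop_s, app_shutdown_s):
    """Model when SIGTERM, exit, and SIGKILL happen, in seconds after deletion.

    Assumes the grace-period clock starts at deletion, so the preStop hook
    consumes part of the same budget as the application's own shutdown.
    """
    sigterm_at = min(prestop_s, grace_period_s)
    clean_exit_at = sigterm_at + app_shutdown_s
    killed = clean_exit_at > grace_period_s
    return {
        "sigterm_at": sigterm_at,
        "ends_at": grace_period_s if killed else clean_exit_at,
        "sigkilled": killed,
    }


# 60s grace period, 15s preStop sleep, app needs 20s to drain
print(termination_timeline(60, 15, 20))
# Same app with only a 30s grace period: it is killed before it finishes
print(termination_timeline(30, 15, 20))
```

If the second case matches your settings, either raise terminationGracePeriodSeconds or shorten the shutdown path.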

🎯 Scenario: You have a web server and a metrics exporter that need to share a socket file and log directory.

Answer:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-exporter
spec:
  volumes:
  # Shared socket directory between containers
  - name: socket-dir
    emptyDir: {}
  # Shared log directory
  - name: log-dir
    emptyDir: {}

  containers:
  # Main app: writes to /tmp/app.sock and /var/log/app/
  - name: web-server
    image: myapp:v2.0
    volumeMounts:
    - name: socket-dir
      mountPath: /tmp
    - name: log-dir
      mountPath: /var/log/app

  # Exporter: listens on the shared socket that the web server writes to
  - name: metrics-exporter
    image: prom/statsd-exporter:latest
    args:
    - --statsd.listen-unixgram=/tmp/app.sock
    volumeMounts:
    - name: socket-dir
      mountPath: /tmp   # Same path — shares the socket file
    ports:
    - containerPort: 9102
      name: metrics
    readinessProbe:
      httpGet:
        path: /metrics
        port: 9102

  # Log rotator sidecar
  - name: log-rotator
    image: busybox:1.35
    command: ["/bin/sh", "-c"]
    args:
    - |
      while true; do
        find /var/log/app -name "*.log" -mtime +7 -delete
        sleep 3600
      done
    volumeMounts:
    - name: log-dir
      mountPath: /var/log/app

💡 Key multi-container rules: All containers in a pod share the same network namespace (same IP, same localhost), same IPC namespace, and can share volumes. But each container has its own root filesystem and, unless shareProcessNamespace: true is set, its own PID namespace (so each has its own PID 1).

🎯 Scenario: A developer asks when to use ClusterIP vs NodePort vs LoadBalancer.

Answer:

# ClusterIP (default) — cluster-internal only
apiVersion: v1
kind: Service
metadata:
  name: backend-api
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
  - port: 8080
    targetPort: 8080
  # DNS: backend-api.namespace.svc.cluster.local:8080
# NodePort — expose on every node's IP:port
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080    # Valid range: 30000-32767
  # Access: <any-node-ip>:30080
# LoadBalancer: provisions a cloud load balancer (on AWS, a Classic LB or NLB; ALBs are created from Ingress resources instead)
apiVersion: v1
kind: Service
metadata:
  name: web-lb
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 443
    targetPort: 8080
    protocol: TCP
  # AWS provisions an NLB; service gets external DNS/IP
# Headless — no ClusterIP; DNS returns individual pod IPs
apiVersion: v1
kind: Service
metadata:
  name: postgres-headless
spec:
  clusterIP: None
  selector:
    app: postgres
  ports:
  - port: 5432
  # DNS: postgres-0.postgres-headless.ns.svc.cluster.local → pod-0 IP
  #       postgres-1.postgres-headless.ns.svc.cluster.local → pod-1 IP
| Type | Scope | Best For |
|---|---|---|
| ClusterIP | Inside cluster | Microservice communication |
| NodePort | Node IPs | Dev, on-prem without cloud LB |
| LoadBalancer | Internet | Production external services |
| Headless | Direct pod DNS | StatefulSets, Cassandra, Kafka |
| ExternalName | DNS alias | Route to external service by name |

🎯 Scenario: You have 5 microservices and want them at api.example.com/users, api.example.com/orders, etc. with auto-renewed HTTPS certificates.

Answer:

# Install NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.metrics.enabled=true

# Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set installCRDs=true
# ClusterIssuer — Let's Encrypt certificate authority
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx
# Ingress — path-based routing + automatic TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    # Rate limiting
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-example-com-tls   # cert-manager creates this
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: users-service
            port:
              number: 8080
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: orders-service
            port:
              number: 8080
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend-service
            port:
              number: 3000
# Monitor certificate issuance
kubectl get certificate -n production
kubectl describe certificate api-example-com-tls -n production
kubectl get certificaterequest -n production
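
How the three `Prefix` paths in the Ingress are chosen can be sketched in a few lines. This is a simplified model of the matching described in the Ingress API (element-by-element comparison, longest match wins), not the NGINX controller's actual implementation; the function names are illustrative.

```python
def path_elements(path):
    """Split a URL path into its non-empty '/'-separated elements."""
    return [p for p in path.split("/") if p]


def prefix_matches(rule_path, request_path):
    """Prefix pathType: compare whole path elements, not raw string prefixes."""
    rule, req = path_elements(rule_path), path_elements(request_path)
    return req[:len(rule)] == rule


def route(rules, request_path):
    """Return the backend of the longest matching rule, or None."""
    matches = [
        (len(path_elements(path)), backend)
        for path, backend in rules
        if prefix_matches(path, request_path)
    ]
    return max(matches, key=lambda m: m[0])[1] if matches else None


# Paths and backends taken from the Ingress above
RULES = [
    ("/users", "users-service"),
    ("/orders", "orders-service"),
    ("/", "frontend-service"),
]

for request in ("/users/42", "/orders", "/usersx", "/about"):
    print(request, "->", route(RULES, request))
```

Note that `/usersx` falls through to the frontend: `users` and `usersx` are different path elements even though one string is a prefix of the other.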

🎯 Scenario: Pods in your cluster can’t resolve service names. Requests fail with “connection refused” or DNS lookup failures.

Answer:

Kubernetes runs CoreDNS as a cluster DNS server. Every pod’s /etc/resolv.conf points to the CoreDNS ClusterIP.

DNS resolution hierarchy:
  my-service                          → searches: default.svc.cluster.local
  my-service.other-ns                 → searches: svc.cluster.local
  my-service.other-ns.svc            → searches: cluster.local
  my-service.other-ns.svc.cluster.local → full FQDN (resolved directly)
# Launch a debug pod with DNS tools
kubectl run dns-debug \
  --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 \
  --restart=Never -it -- bash

# Inside the pod:
# Check resolv.conf
cat /etc/resolv.conf
# nameserver 10.96.0.10         ← CoreDNS ClusterIP
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

# Test service DNS
nslookup kubernetes.default.svc.cluster.local
nslookup my-service.production.svc.cluster.local

# Test external DNS
nslookup google.com
# Debug CoreDNS
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --prefix

# Check CoreDNS config
kubectl get configmap coredns -n kube-system -o yaml

# Common CoreDNS issues:
# 1. CoreDNS pods OOMKilled → increase memory limits
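# 2. ndots:5 causing slow resolution → names with fewer than 5 dots (e.g. google.com)
#    are tried against every search domain before the name itself
#    Fix: set ndots:1 in pod dnsConfig, or use a trailing dot (google.com.), for external-heavy workloads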
# 3. DNS cache poisoning → use separate upstream resolvers

# Override DNS per pod
apiVersion: v1
kind: Pod
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "1"    # Reduces unnecessary search path queries
    nameservers:
    - 8.8.8.8       # Additional custom nameservers
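
The effect of `ndots` and the search list can be made concrete with a short sketch. It models the usual resolver behaviour (names with fewer dots than `ndots` go through the search list first; a trailing dot skips the search list entirely) and is not a real resolver. The function name is illustrative, and the search list is the one shown in the resolv.conf above.

```python
def candidate_queries(name, search, ndots=5):
    """Return the names a resolver would try, in order, for `name`."""
    if name.endswith("."):
        # Fully qualified: the search list is skipped entirely
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        # "Enough" dots: try the name as-is first, then the search list
        return [name] + expanded
    # Too few dots: walk the search list first, the bare name last
    return expanded + [name]


SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

print(candidate_queries("my-service", SEARCH))
print(candidate_queries("google.com", SEARCH))           # 3 wasted lookups first
print(candidate_queries("google.com", SEARCH, ndots=1))  # tried as-is first
print(candidate_queries("google.com.", SEARCH))          # single lookup
```

Each candidate is typically queried for both A and AAAA records, so the wasted lookups for external names are roughly doubled in practice.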

🎯 Scenario: Security audit requires that only frontend can reach the API, only API can reach the database, and nothing else.

Answer:

# Step 1: Default-deny ALL ingress AND egress in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}     # Applies to ALL pods
  policyTypes:
  - Ingress
  - Egress
# Step 2: Allow frontend → API (port 8080 only)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
# Step 3: Allow API → PostgreSQL (port 5432 only)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-postgres
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api-server
    ports:
    - protocol: TCP
      port: 5432
# Step 4: Allow all pods to query CoreDNS (essential!)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
# Step 5: Allow API egress to external services
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-external-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Egress
  egress:
  - ports:
    - protocol: TCP
      port: 443    # HTTPS to external APIs
  - to:
    - podSelector:
        matchLabels:
          app: postgres
    ports:
    - protocol: TCP
      port: 5432

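⚠️ NetworkPolicies require a CNI plugin that enforces them, such as Calico, Cilium, or Weave Net. Flannel on its own does not enforce NetworkPolicies. Policies are additive: multiple policies combine with OR logic. Because Step 1 also denies egress, every allowed flow needs an egress rule on the sending side as well as an ingress rule on the receiving side; the frontend therefore also needs an egress rule to api-server on port 8080.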
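
Evaluating Steps 1 to 5 as a simplified model shows one flow worth double-checking: with egress denied by default, the frontend has no egress rule toward the API. The sketch below is a hedged approximation that treats every pod as living in one namespace and identifies pods by their `app` label only; the `Rule` type and function names are invented for illustration, and DNS (Step 4) is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    direction: str          # "ingress" or "egress"
    selects: Optional[str]  # app label the policy applies to; None = all pods
    peer: Optional[str]     # app label of the allowed peer; None = any peer
    port: Optional[int]     # allowed TCP port; None = isolate only, allow nothing


def side_allows(rules, direction, pod, peer, port):
    """Check one direction: unselected pods are open, selected pods need a match."""
    selecting = [r for r in rules if r.direction == direction and r.selects in (None, pod)]
    if not selecting:
        return True
    return any(r.port == port and r.peer in (None, peer) for r in selecting)


def flow_allowed(rules, src, dst, port):
    """A flow needs egress permission at the source AND ingress at the destination."""
    return side_allows(rules, "egress", src, dst, port) and side_allows(rules, "ingress", dst, src, port)


POLICIES = [
    Rule("ingress", None, None, None),             # Step 1: default deny ingress
    Rule("egress", None, None, None),              # Step 1: default deny egress
    Rule("ingress", "api-server", "frontend", 8080),  # Step 2
    Rule("ingress", "postgres", "api-server", 5432),  # Step 3
    Rule("egress", "api-server", None, 443),          # Step 5
    Rule("egress", "api-server", "postgres", 5432),   # Step 5
]

print(flow_allowed(POLICIES, "api-server", "postgres", 5432))  # True
print(flow_allowed(POLICIES, "frontend", "api-server", 8080))  # False: no frontend egress rule

# Hypothetical extra policy: frontend egress to the API on 8080
fixed = POLICIES + [Rule("egress", "frontend", "api-server", 8080)]
print(flow_allowed(fixed, "frontend", "api-server", 8080))     # True
```

The same model confirms that the frontend still cannot reach PostgreSQL directly after the extra rule is added.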

🎯 Scenario: Your app in Kubernetes needs to call an RDS database and a third-party payment API by logical names, not hardcoded endpoints.

Answer:

# ExternalName — DNS alias to external service
# App uses: postgres-prod.production.svc.cluster.local
# Resolves to: prod-db.abc123.us-east-1.rds.amazonaws.com
apiVersion: v1
kind: Service
metadata:
  name: postgres-prod
  namespace: production
spec:
  type: ExternalName
  externalName: prod-db.abc123.us-east-1.rds.amazonaws.com
  # No selector — no pods. Pure DNS CNAME alias.
# Endpoints + Service — point to external IP directly
# Use when external service doesn't have a hostname
apiVersion: v1
kind: Service
metadata:
  name: legacy-payment-api
  namespace: production
spec:
  ports:
  - port: 443
    targetPort: 443
---
apiVersion: v1
kind: Endpoints
metadata:
  name: legacy-payment-api   # Must match Service name
  namespace: production      # Must match Service namespace
subsets:
- addresses:
  - ip: 10.20.30.40          # External server IP
  - ip: 10.20.30.41
  ports:
  - port: 443
# ServiceEntry (Istio) — fine-grained external service control
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: stripe-api
spec:
  hosts:
  - api.stripe.com
  ports:
  - number: 443
    name: https
    protocol: HTTPS
  resolution: DNS
  location: MESH_EXTERNAL

🎯 Scenario: Your cluster has high traffic and you suspect kube-proxy iptables rules are becoming a performance bottleneck. What are your options?

Answer:

kube-proxy runs on every node and maintains network rules that redirect traffic from Service ClusterIPs to pod IPs.

Three modes:

| Mode | How It Works | Performance | Notes |
|---|---|---|---|
| userspace | Proxy in userspace (old) | Slowest | Removed in Kubernetes 1.26 |
| iptables | Kernel netfilter rules | Good | Default; O(n) rules for n services |
| ipvs | Linux Virtual Server | Best | O(1) lookups; for 1000+ services |
# Check current kube-proxy mode
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode

# Switch to IPVS mode
kubectl edit configmap kube-proxy -n kube-system
# Set: mode: "ipvs"

# Restart kube-proxy pods to apply
kubectl rollout restart daemonset/kube-proxy -n kube-system

# Verify IPVS rules
ipvsadm -Ln   # Run on a node

# Cilium — can replace kube-proxy entirely (eBPF)
# No iptables, no kube-proxy, direct eBPF kernel datapath
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<api-server-ip> \
  --set k8sServicePort=<api-server-port>

💡 For clusters with > 500 services, switch to IPVS mode or use Cilium’s eBPF kube-proxy replacement. iptables rule evaluation is O(n) — performance degrades linearly with service count.
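
The O(n) versus O(1) difference can be illustrated with a toy comparison. This is only an analogy, assuming one sequentially scanned rule per Service for the iptables case and a hash table for the IPVS case; the addresses and backend names are made up.

```python
def iptables_style_lookup(rules, dest_ip):
    """Walk an ordered rule list until one matches; cost grows with len(rules)."""
    for examined, (cluster_ip, backend) in enumerate(rules, start=1):
        if cluster_ip == dest_ip:
            return backend, examined
    return None, len(rules)


def ipvs_style_lookup(table, dest_ip):
    """Single hash-table probe, independent of how many services exist."""
    return table.get(dest_ip)


# Hypothetical cluster with 1000 services
rules = [(f"10.96.{i // 256}.{i % 256}", f"backend-{i}") for i in range(1000)]
table = dict(rules)
last_ip = rules[-1][0]

backend, examined = iptables_style_lookup(rules, last_ip)
print(f"list scan: {backend} after examining {examined} rules")
print(f"hash lookup: {ipvs_style_lookup(table, last_ip)} in one probe")
```

A packet for the last Service in the list pays for every rule before it, which is why the cost shows up as service count grows.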

🎯 Scenario: Your compliance team requires all service-to-service communication to be encrypted and mutually authenticated.

Answer:

# Option 1: Istio service mesh — automatic mTLS
helm repo add istio https://istio-release.storage.googleapis.com/charts
helm install istio-base istio/base -n istio-system --create-namespace
helm install istiod istio/istiod -n istio-system

# Label namespace for automatic sidecar injection
kubectl label namespace production istio-injection=enabled

# Istio PeerAuthentication: enforce strict mTLS across the namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: enforce-mtls
  namespace: production
spec:
  mtls:
    mode: STRICT    # STRICT: only mTLS, no plaintext
    # PERMISSIVE: both mTLS and plaintext (migration mode)
    # DISABLE: no mTLS
# Istio AuthorizationPolicy — service-level access control
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: api-server-policy
  namespace: production
spec:
  selector:
    matchLabels:
      app: api-server
  action: ALLOW
  rules:
  - from:
    - source:
        principals:
        # Only allow calls from frontend service account
        - "cluster.local/ns/production/sa/frontend-sa"
    to:
    - operation:
        methods: ["GET", "POST"]
        paths: ["/api/*"]
# Verify mTLS is working
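# (istioctl authn tls-check was removed from newer Istio releases; describe reports the pod's mTLS settings)
istioctl x describe pod api-server-pod-xxx -n production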

# View certificate details
istioctl proxy-config secret api-server-pod-xxx.production

🎯 Scenario: You have a cluster spanning multiple availability zones and want traffic to prefer pods in the same zone to reduce latency and egress costs.

Answer:

# Topology Aware Routing (K8s 1.27+)
apiVersion: v1
kind: Service
metadata:
  name: web-app
  annotations:
    service.kubernetes.io/topology-mode: "Auto"
    # "Auto": route to same-zone endpoints when each zone has enough of them
    # Falls back to cluster-wide routing when zone hints cannot be assigned safely
spec:
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 8080
# TrafficPolicy in Istio (more fine-grained control)
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
  namespace: production
spec:
  host: web-app
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: UPGRADE
        idleTimeout: 10s
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
        - from: "us-east-1/us-east-1a/*"
          to:
            "us-east-1/us-east-1a/*": 80   # 80% same-AZ
            "us-east-1/us-east-1b/*": 20   # 20% different AZ
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s

🎯 Scenario: A developer needs persistent storage for their PostgreSQL pod that survives pod restarts and node failures.

Answer:

Storage abstraction layers:

StorageClass  → defines HOW storage is created (AWS EBS gp3, NFS, local SSD)
      ↓ (dynamic provisioning — auto-creates PV when PVC is created)
PersistentVolume (PV) → the actual storage resource
      ↓ (binding — K8s matches PVC to PV)
PersistentVolumeClaim (PVC) → app's request for storage
      ↓ (mount)
Pod → uses PVC as a volume
# StorageClass — tells K8s how to provision storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  iops: "3000"
  throughput: "125"
  encrypted: "true"
reclaimPolicy: Retain                    # Keep EBS volume when PVC deleted
allowVolumeExpansion: true               # Allow PVC resize
volumeBindingMode: WaitForFirstConsumer  # Create volume in same AZ as pod
# PVC — developer requests storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce      # RWO: one node read-write
  # ReadWriteMany      # RWX: multiple nodes read-write (NFS, EFS)
  # ReadOnlyMany       # ROX: multiple nodes read-only
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
# Pod using the PVC
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: postgres
    image: postgres:15
    volumeMounts:
    - name: data
      mountPath: /var/lib/postgresql/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: postgres-data
# Check PV/PVC binding status
kubectl get pv,pvc -n production

# STATUS lifecycle:
# PVC: Pending → Bound → Lost (if PV deleted)
# PV:  Available → Bound → Released (then deleted or kept for manual reclaim, per reclaimPolicy); Failed if reclamation errors
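
The binding step can be pictured as a matching problem. The sketch below is a simplified model of binding a claim to a pre-existing PV, assuming the controller prefers the smallest volume that satisfies the claim; with dynamic provisioning a new PV is created instead. The `PV` type and function name are illustrative, not Kubernetes API objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PV:
    name: str
    storage_class: str
    capacity_gi: int
    access_modes: frozenset
    phase: str = "Available"


def find_match(pvs, storage_class, request_gi, access_mode):
    """Pick the smallest Available PV that satisfies the claim, or None."""
    candidates = [
        pv for pv in pvs
        if pv.phase == "Available"
        and pv.storage_class == storage_class
        and pv.capacity_gi >= request_gi
        and access_mode in pv.access_modes
    ]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


RWO = frozenset({"ReadWriteOnce"})
pvs = [
    PV("pv-small", "fast-ssd", 50, RWO),
    PV("pv-big", "fast-ssd", 500, RWO),
    PV("pv-fit", "fast-ssd", 100, RWO),
    PV("pv-taken", "fast-ssd", 100, RWO, phase="Bound"),
]

match = find_match(pvs, "fast-ssd", 100, "ReadWriteOnce")
print(match.name if match else "Pending: no matching PV")
```

When nothing matches and no provisioner can create a volume, the claim stays Pending, which is the first thing to check in `kubectl get pvc`.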

🎯 Scenario: Your PostgreSQL PVC is 80% full and you need to expand from 100Gi to 200Gi without stopping the database.

Answer:

# Step 1: Verify StorageClass supports expansion
kubectl get storageclass fast-ssd -o jsonpath='{.allowVolumeExpansion}'
# Must return: true

# Step 2: Expand the PVC — just edit the request size
kubectl patch pvc postgres-data -n production \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

# Step 3: Monitor expansion
kubectl get pvc postgres-data -n production -w
# Conditions will show:
# FileSystemResizePending → True (volume resized, waiting for fs resize)
# Then PVC capacity updates to 200Gi

# Step 4: For most CSI drivers — online expansion, no restart needed
# Verify inside the pod
kubectl exec -it postgres-0 -n production -- df -h /var/lib/postgresql/data
# If driver requires pod restart (older drivers):
# StatefulSet will automatically recreate the pod after deletion
kubectl delete pod postgres-0 -n production
# kubelet will resize filesystem during pod startup

⚠️ Volume expansion is one-directional — you can only increase, never decrease PVC size. The underlying cloud volume may charge for the new size immediately.

🎯 Scenario: You need a robust backup strategy for databases running in your cluster.

Answer:

# Option A: Application-level backup (recommended for databases)
# PostgreSQL nightly backup to S3
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
  namespace: production
spec:
  schedule: "0 2 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: backup-sa
          restartPolicy: OnFailure
          containers:
          - name: pg-backup
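            # NOTE: the stock postgres image ships pg_dump but not the AWS CLI;
            # use an image that bundles both, or the upload step below will fail
            image: postgres:15-alpine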
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-secret
                  key: password
            - name: BACKUP_BUCKET
              value: s3://my-company-backups
            command:
            - /bin/sh
            - -c
            - |
              set -eo pipefail   # Fail the job if pg_dump or the upload fails
              DATE=$(date +%Y%m%d_%H%M%S)
              pg_dump -h postgres-service -U postgres -d mydb \
                | gzip \
                | aws s3 cp - ${BACKUP_BUCKET}/postgres/${DATE}.sql.gz
              echo "Backup completed: ${DATE}"
# Option B: Velero — Kubernetes-native backup (resources + volumes)

# Install Velero with AWS plugin
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.7.0 \
  --bucket my-velero-backups \
  --backup-location-config region=us-east-1 \
  --snapshot-location-config region=us-east-1 \
  --use-node-agent   # File-system backup of PVC data (replaces --use-restic in Velero 1.10+)

# Create immediate backup
velero backup create prod-backup-$(date +%Y%m%d) \
  --include-namespaces production \
  --include-resources pods,deployments,statefulsets,pvc,secrets,configmaps

# Schedule automated backups
velero schedule create daily-prod \
  --schedule="0 3 * * *" \
  --include-namespaces production \
  --ttl 720h    # Keep for 30 days

# List backups
velero backup get

# Restore from backup
velero restore create --from-backup prod-backup-20240101

# Restore specific namespace only
velero restore create \
  --from-backup prod-backup-20240101 \
  --include-namespaces production

🎯 Scenario: Your team wants to use AWS EFS for shared storage (ReadWriteMany) across multiple pods.

Answer:

CSI (Container Storage Interface) is the standard API for storage plugins in Kubernetes. Cloud providers and storage vendors implement CSI drivers.

# Install AWS EFS CSI driver
helm repo add aws-efs-csi-driver https://kubernetes-sigs.github.io/aws-efs-csi-driver/
helm install aws-efs-csi-driver aws-efs-csi-driver/aws-efs-csi-driver \
  --namespace kube-system \
  --set controller.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789:role/efs-csi-role
# StorageClass for EFS (ReadWriteMany)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0abc123def456     # Your EFS filesystem ID
  directoryPerms: "700"
  gidRangeStart: "1000"
  gidRangeEnd: "2000"
# PVC using EFS — ReadWriteMany (multiple pods can write simultaneously)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-storage
  namespace: production
spec:
  accessModes:
  - ReadWriteMany      # Multiple pods across multiple nodes can write
  storageClassName: efs-sc
  resources:
    requests:
      storage: 100Gi
# Multiple pods sharing the same EFS PVC
apiVersion: apps/v1
kind: Deployment
metadata:
  name: media-processor
spec:
  replicas: 5
  template:
    spec:
      containers:
      - name: processor
        image: media-processor:v1
        volumeMounts:
        - name: shared-media
          mountPath: /data/media
      volumes:
      - name: shared-media
        persistentVolumeClaim:
          claimName: shared-storage  # All 5 replicas share this PVC

🎯 Scenario: You’re running a 3-node Elasticsearch cluster across 3 AZs. How do you ensure each node gets storage in the correct AZ?

Answer:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: search
spec:
  serviceName: elasticsearch-headless
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      # Spread pods evenly across AZs
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: elasticsearch

      containers:
      - name: elasticsearch
        image: elasticsearch:8.11.0
        env:
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms2g -Xmx2g"
        resources:
          requests:
            cpu: "1"
            memory: 4Gi
          limits:
            cpu: "2"
            memory: 4Gi
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data

  # Each pod gets its own PVC — provisioned in the same AZ as the pod
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: gp3-wait-for-consumer  # WaitForFirstConsumer binding!
      resources:
        requests:
          storage: 500Gi
# Critical: WaitForFirstConsumer binding mode
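# With Immediate binding the PV is created before the pod is scheduled, possibly in an AZ
# that conflicts with the pod's spread constraints, which leaves the pod Pending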
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-wait-for-consumer
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # ← Waits for pod to schedule, then creates EBS in same AZ
allowVolumeExpansion: true
reclaimPolicy: Retain

🎯 Scenario: You need to migrate your PostgreSQL data from a gp2 EBS volume to gp3 without any data loss or extended downtime.

Answer:

# Method 1: Snapshot-based migration (preferred for large datasets)

# Step 1: Stop writes first (e.g. scale PostgreSQL to 0 replicas) so the snapshot is consistent,
# then create a VolumeSnapshot of the source PVC
kubectl apply -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snapshot
  namespace: production
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: postgres-data-old
EOF

# Step 2: Wait for snapshot to be ready
kubectl get volumesnapshot postgres-data-snapshot -n production -w

# Step 3: Create new PVC from snapshot
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-new
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd-gp3
  resources:
    requests:
      storage: 100Gi
  dataSource:
    name: postgres-data-snapshot
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
EOF
# Method 2: rsync migration (for smaller datasets or when snapshots unavailable)

# Step 1: Stop PostgreSQL (scale it to 0) so the data is static and the RWO volume is free,
# then deploy a migration pod that mounts both PVCs
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: pvc-migrator
  namespace: production
spec:
  restartPolicy: Never
  containers:
  - name: migrator
    image: alpine:latest
    command:
    - sh
    - -c
    - |
      apk add --no-cache rsync
      rsync -avz --progress /source/ /destination/
      echo "Migration complete!"
    volumeMounts:
    - name: source
      mountPath: /source
      readOnly: true
    - name: destination
      mountPath: /destination
  volumes:
  - name: source
    persistentVolumeClaim:
      claimName: postgres-data-old
  - name: destination
    persistentVolumeClaim:
      claimName: postgres-data-new
EOF

kubectl logs -f pvc-migrator -n production

🎯 Scenario: Your app needs both non-sensitive configuration (feature flags, timeouts) and sensitive data (DB credentials, API keys). How do you manage both?

Answer:

# ConfigMap — non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  LOG_LEVEL: "info"
  MAX_CONNECTIONS: "100"
  CACHE_TTL: "300"
  FEATURE_NEW_UI: "true"

  # Multi-line config file
  app.yaml: |
    server:
      port: 8080
      timeout: 30s
    database:
      pool_size: 20
      max_idle: 5

  nginx.conf: |
    upstream backend { server 127.0.0.1:8080; }
    server {
      listen 80;
      location / { proxy_pass http://backend; }
    }
# Secret — sensitive data (base64-encoded, but NOT encrypted by default)
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: production
type: Opaque
stringData:           # stringData auto-base64-encodes
  DB_PASSWORD: "super-secret-password"
  JWT_SIGNING_KEY: "very-long-random-secret-key"
  tls.crt: |
    -----BEGIN CERTIFICATE-----
    MIID...
    -----END CERTIFICATE-----
# Use in a Deployment
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      - name: app
        image: myapp:v2.0

        # Inject all ConfigMap keys as env vars
        envFrom:
        - configMapRef:
            name: app-config
        # Inject all Secret keys as env vars
        - secretRef:
            name: app-secrets

        # Or inject individual keys
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: DB_PASSWORD

        # Mount ConfigMap as config files
        volumeMounts:
        - name: app-config-vol
          mountPath: /etc/app
          readOnly: true
        - name: secrets-vol
          mountPath: /etc/secrets
          readOnly: true

      volumes:
      - name: app-config-vol
        configMap:
          name: app-config
          items:                     # Mount only specific keys
          - key: app.yaml
            path: config.yaml       # Filename in container
      - name: secrets-vol
        secret:
          secretName: app-secrets
          defaultMode: 0400          # Owner read-only

⚠️ By default, Kubernetes Secrets are only base64-encoded, NOT encrypted at rest! Anyone with RBAC access to read secrets can decode them. For production: enable encryption at rest with an EncryptionConfiguration on the API server, and consider External Secrets Operator with AWS Secrets Manager, GCP Secret Manager, or HashiCorp Vault.
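
How little protection base64 offers is easy to demonstrate. The sketch below uses only the Python standard library; the helper names are illustrative, and the sample value is the placeholder password from the Secret above.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """What the API server stores for a stringData value: plain base64."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """What anyone with read access to the Secret can do: decode it, no key needed."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("super-secret-password")
print("stored in the Secret:", stored)
print("recovered by a reader:", decode_secret_value(stored))
```

The shell equivalent is `kubectl get secret app-secrets -n production -o jsonpath='{.data.DB_PASSWORD}' | base64 -d`, which is why RBAC on Secrets matters as much as encryption at rest.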

🎯 Scenario: Your security policy forbids storing secrets in Kubernetes. All secrets must live in AWS Secrets Manager and be synced automatically.

Answer:

# Install External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets \
  --create-namespace \
  --set installCRDs=true
# SecretStore — configure AWS Secrets Manager connection
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: aws-secretsmanager
  namespace: production
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa   # Uses IRSA — no static keys!
# ExternalSecret — sync a specific secret from AWS → K8s
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: database-credentials
  namespace: production
spec:
  refreshInterval: 1h         # Re-sync from AWS every hour
  secretStoreRef:
    name: aws-secretsmanager
    kind: SecretStore
  target:
    name: database-credentials  # K8s Secret that gets created
    creationPolicy: Owner        # ESO owns this secret
    template:
      type: Opaque
      engineVersion: v2
  data:
  - secretKey: DB_PASSWORD       # Key in K8s Secret
    remoteRef:
      key: prod/myapp/database   # AWS SM secret name
      property: password         # JSON field within secret
  - secretKey: DB_USERNAME
    remoteRef:
      key: prod/myapp/database
      property: username
  # Bulk import all fields from a JSON secret
  dataFrom:
  - extract:
      key: prod/myapp/all-secrets
# Verify sync status
kubectl get externalsecret database-credentials -n production
# STATUS column shows: SecretSynced or error

kubectl describe externalsecret database-credentials -n production
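
The mapping that `remoteRef.property` and `dataFrom.extract` describe can be pictured as picking fields out of a JSON document. The operator does this internally; the sketch below only models the idea, assuming the AWS secret holds a flat JSON object of strings. The function names and sample values are invented for illustration.

```python
import json


def extract_property(secret_string, prop):
    """Model of remoteRef.property: pick one field from the JSON secret."""
    return json.loads(secret_string)[prop]


def extract_all(secret_string):
    """Model of dataFrom.extract: every JSON field becomes a key in the K8s Secret."""
    return dict(json.loads(secret_string))


# Hypothetical content of the AWS secret prod/myapp/database
aws_secret = '{"username": "app_user", "password": "s3cr3t", "host": "db.internal"}'

k8s_secret_data = {
    "DB_PASSWORD": extract_property(aws_secret, "password"),
    "DB_USERNAME": extract_property(aws_secret, "username"),
}
print(k8s_secret_data)
print(sorted(extract_all(aws_secret)))
```

Explicit `data` entries let you rename keys, while `dataFrom` keeps the names used in the remote secret.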

🎯 Scenario: Your database password is compromised. You need to rotate it immediately with zero application downtime.

Answer:

# Rotation strategy for env-var injected secrets (requires pod restart):

# Step 1: Update secret in database first (allow both old and new password)
# Step 2: Update the K8s secret
kubectl create secret generic db-secret -n production \
  --from-literal=DB_PASSWORD=new-secure-password-v2 \
  --dry-run=client -o yaml | kubectl apply -f -

# Step 3: Rolling restart — zero-downtime with 4 replicas and maxUnavailable=0
kubectl rollout restart deployment/api-server -n production
kubectl rollout status deployment/api-server -n production

# Step 4: Remove old password from database after all pods are updated
# Step 5: Verify no old-password connections in DB
# For volume-mounted secrets: automatic file update (no restart needed)
# The kubelet refreshes mounted secret files on its next sync, typically within about a minute
# (files mounted via subPath are never refreshed)
# App must watch for file changes and reload

# Check file was updated in pod
kubectl exec -it api-server-xxx -n production -- cat /etc/secrets/DB_PASSWORD

# Force immediate update (before kubelet sync)
# Annotate pod to trigger kubelet reconcile
kubectl annotate pod api-server-xxx -n production --overwrite rotation-timestamp=$(date +%s)
# Best practice: version your secrets
apiVersion: v1
kind: Secret
metadata:
  name: db-secret-v2          # Include version in name
  namespace: production
  annotations:
    rotation-date: "2024-01-15"
    rotated-by: "security-team"
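
For volume-mounted secrets, the application has to notice the new file content itself. A minimal sketch of that reload logic, assuming the secret is mounted at a path such as /etc/secrets/DB_PASSWORD and that the app calls poll() periodically (the class name and polling approach are illustrative, not part of any Kubernetes API). It compares content hashes rather than modification times, because the kubelet replaces the file through an atomic symlink swap:

```python
import hashlib
from pathlib import Path


class SecretFileReloader:
    """Re-reads a mounted secret file whenever its content changes."""

    def __init__(self, path):
        self.path = Path(path)
        self._digest = None   # hash of the last content we loaded
        self.value = None     # current secret value, stripped of whitespace

    def poll(self):
        """Return True if the secret changed since the last poll."""
        data = self.path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self._digest:
            return False
        self._digest = digest
        self.value = data.decode("utf-8").strip()
        return True
```

When poll() returns True, the app would typically rebuild its database connection pool with the new value while letting in-flight connections finish.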

🎯 Scenario: Your base application YAML works for dev, but production needs 5 replicas, different resource limits, a different image tag, and production database URLs.

Answer:

Directory structure:
k8s/
├── base/
│   ├── deployment.yaml
│   ├── service.yaml
│   └── kustomization.yaml
└── overlays/
    ├── dev/
    │   ├── kustomization.yaml
    │   └── dev-patch.yaml
    ├── staging/
    │   └── kustomization.yaml
    └── production/
        ├── kustomization.yaml
        ├── prod-patch.yaml
        └── prod-configmap.yaml
# base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
commonLabels:
  app: web-api
  managed-by: kustomize
# base/deployment.yaml (minimal baseline)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: api
        image: myapp:latest
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
# overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:                 # "bases" is deprecated; list the base under resources
- ../../base
namespace: production
images:
- name: myapp
  newTag: "v2.1.0-prod"    # Override image tag
patches:
- path: prod-patch.yaml
configMapGenerator:
- name: app-config
  literals:
  - DB_URL=postgresql://prod-db.internal:5432/myapp
  - LOG_LEVEL=warn
  - REPLICAS=5
# overlays/production/prod-patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-api
spec:
  replicas: 5
  template:
    spec:
      containers:
      - name: api
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: "2"
            memory: 1Gi
# Preview production output
kubectl kustomize overlays/production/

# Apply production config
kubectl apply -k overlays/production/

# Apply dev config
kubectl apply -k overlays/dev/

🎯 Scenario: A security audit flags that Kubernetes Secrets are stored in plaintext in etcd. How do you fix this?

Answer:

# /etc/kubernetes/enc/encryption-config.yaml
# Create this on the control plane node

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
- resources:
  - secrets
  - configmaps         # Optionally encrypt ConfigMaps too
  providers:
  - aescbc:            # AES-CBC encryption
      keys:
      - name: key1
        secret: <base64-encoded-32-byte-key>  # openssl rand -base64 32
  - identity: {}       # Fallback: allows reading unencrypted secrets
                       # Remove this after re-encrypting all existing secrets
# Generate a 32-byte key
head -c 32 /dev/urandom | base64

# Add to kube-apiserver flags (in /etc/kubernetes/manifests/kube-apiserver.yaml)
# --encryption-provider-config=/etc/kubernetes/enc/encryption-config.yaml

# Mount the config file in the apiserver static pod
# volumeMounts:
# - name: enc
#   mountPath: /etc/kubernetes/enc
#   readOnly: true
# volumes:
# - name: enc
#   hostPath:
#     path: /etc/kubernetes/enc
#     type: DirectoryOrCreate

# After apiserver restarts, re-encrypt all existing secrets
kubectl get secrets -A -o json | kubectl replace -f -

# Verify a secret is encrypted in etcd
ETCDCTL_API=3 etcdctl get /registry/secrets/default/my-secret \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  | hexdump -C | head
# Should show: k8s:enc:aescbc:v1:key1:... (encrypted, not plaintext)

🎯 Scenario: Your team uses GitOps (all config in Git), but Kubernetes Secrets can’t be committed to Git as they’re only base64-encoded. How do you solve this?

Answer:

Sealed Secrets encrypts K8s Secrets with a public key. The encrypted SealedSecret is safe to commit to Git. Only the in-cluster controller can decrypt it.

# Install Sealed Secrets controller
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets sealed-secrets/sealed-secrets \
  --namespace kube-system

# Install kubeseal CLI
brew install kubeseal

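# Fetch the public key (for offline use)
# The Helm release above names the controller "sealed-secrets", while kubeseal
# looks for "sealed-secrets-controller" by default, so pass the name explicitly
kubeseal --fetch-cert \
  --controller-name sealed-secrets \
  --controller-namespace kube-system > pub-cert.pem
# Create a SealedSecret from a regular Secret
# (the namespace is part of the encryption scope, so set it when sealing)
kubectl create secret generic db-secret \
  --namespace production \
  --from-literal=DB_PASSWORD=my-super-secret \
  --dry-run=client -o yaml \
  | kubeseal \
  --cert pub-cert.pem \
  --format yaml > sealed-db-secret.yaml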

# Now commit sealed-db-secret.yaml to Git — it's safe!
git add sealed-db-secret.yaml
git commit -m "Add sealed DB secret"

# Apply to cluster — controller decrypts and creates regular Secret
kubectl apply -f sealed-db-secret.yaml

# Verify Secret was created
kubectl get secret db-secret -n production
# sealed-db-secret.yaml (safe to commit to Git)
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-secret
  namespace: production
spec:
  encryptedData:
    DB_PASSWORD: AgB7x9... (long base64 encrypted string)
  template:
    metadata:
      name: db-secret
      namespace: production
    type: Opaque

🎯 Scenario: Your cluster nodes are running out of memory and pods are getting OOMKilled randomly. How do you properly configure resources?

Answer:

apiVersion: v1
kind: Pod
spec:
  containers:
  - name: app
    image: myapp:v2.0
    resources:
      requests:
        cpu: "250m"     # Used for SCHEDULING: node must have 250m CPU free
        memory: "512Mi" # Used for SCHEDULING: node must have 512Mi RAM free
      limits:
        cpu: "1000m"    # THROTTLED if exceeded (never killed for CPU)
        memory: "1Gi"   # KILLED (OOMKill) if exceeded
CPU behavior:
  Request → Scheduling guarantee + proportional share
  Limit   → Hard throttle (CPU is compressible — pod slows down but doesn't die)

Memory behavior:
  Request → Scheduling guarantee
  Limit   → Hard kill (memory is incompressible — OOMKill if exceeded)

Quality of Service (QoS) classes — affects eviction priority:

# Guaranteed QoS: every container sets CPU and memory, with requests == limits
# LAST to be evicted under node pressure
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"      # Same as request
    memory: "512Mi"  # Same as request
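# Burstable QoS: at least one request or limit is set, but the pod does not
# meet the Guaranteed criteria (typically requests < limits)
# Middle priority for eviction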
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
# BestEffort QoS — no requests or limits set at all
# FIRST to be evicted under node pressure
# Never use for production workloads
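# Find pods where at least one container has no resource limits (dangerous!)
# any(...) prints each pod once, even if several containers lack limits
kubectl get pods -A -o json | jq -r '
  .items[] |
  select(any(.spec.containers[]; .resources.limits == null)) |
  "\(.metadata.namespace)/\(.metadata.name)"
'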

# Check actual resource usage
kubectl top pods -A --sort-by=memory
kubectl top nodes
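
The QoS rules above can be sketched as a small classifier. This is a simplified model, not the kubelet's code: it takes each container's resources as plain dicts, compares quantities as strings (so "500m" and "0.5" would count as different), and only considers cpu and memory. The function name is hypothetical.

```python
def qos_class(containers):
    """Classify a pod from a list of {"requests": {...}, "limits": {...}} dicts."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

Note that a pod with limits only still counts as Guaranteed, because the missing requests default to the limits.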

🎯 Scenario: You have GPU nodes for ML and spot nodes for batch workloads. Critical services must run on on-demand nodes only.

Answer:

# Label nodes by type
kubectl label node gpu-node-1 accelerator=nvidia-v100
kubectl label node spot-node-1 lifecycle=spot
kubectl label node ondemand-1 lifecycle=on-demand

# Taint GPU nodes (repel all pods that don't explicitly tolerate it)
kubectl taint node gpu-node-1 gpu=true:NoSchedule

# Taint spot nodes (only pods that tolerate interruption may schedule there)
kubectl taint node spot-node-1 spot=true:NoSchedule
# ML Training Job: targets GPU nodes via toleration + node affinity
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  tolerations:
  - key: "gpu"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"

  affinity:
    nodeAffinity:
      # HARD requirement — must run on GPU nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: accelerator
            operator: In
            values: ["nvidia-v100", "nvidia-a100"]

  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        nvidia.com/gpu: 2   # Request 2 GPUs
# Batch Job — prefers spot nodes, tolerates interruption
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      tolerations:
      - key: "spot"
        operator: "Equal"
        value: "true"
        effect: "NoSchedule"

      affinity:
        nodeAffinity:
          # SOFT preference — prefer spot but fallback to on-demand
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: lifecycle
                operator: In
                values: ["spot"]
# Critical API — must run on on-demand, spread across nodes
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 6
  template:
    spec:
      affinity:
        # HARD: must be on on-demand
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: lifecycle
                operator: In
                values: ["on-demand"]
        # HARD: spread across different nodes
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: critical-api
            topologyKey: kubernetes.io/hostname

🎯 Scenario: Your API handles traffic spikes. You want it to scale from 2 to 20 pods based on CPU and a custom requests-per-second metric.

Answer:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  metrics:
  # Built-in: scale on CPU utilization
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # Scale when average CPU > 70%
  # Built-in: scale on memory
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 400Mi
  # Custom metric from Prometheus Adapter
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"     # Scale when >1000 RPS per pod
  # External metric (e.g., SQS queue depth)
  - type: External
    external:
      metric:
        name: sqs_queue_depth
        selector:
          matchLabels:
            queue: orders-queue
      target:
        type: Value
        value: "500"

  # Scaling behavior — prevents thrashing
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60      # Act on the lowest recommendation from the last 60s
      policies:
      - type: Pods
        value: 4                          # Add at most 4 pods per 60s
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300     # Act on the highest recommendation from the last 5 min
      policies:
      - type: Percent
        value: 10                         # Remove at most 10% per 60s
        periodSeconds: 60
# Check HPA status
kubectl get hpa -n production
# TARGETS: 85%/70%  means current=85%, target=70% → will scale up
# (a reading like 75%/70% is inside the default 10% tolerance and does not trigger scaling)

kubectl describe hpa api-hpa -n production
# Shows scaling events and decisions

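# Resource metrics (CPU/memory) require Metrics Server;
# Pods and External metrics additionally need a metrics adapter (e.g. Prometheus Adapter)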
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
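
The replica count the HPA aims for follows the documented formula desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within the tolerance (0.1 by default). A rough sketch of that calculation, ignoring the behavior policies, stabilization windows, and not-ready pods; the function name is illustrative:

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minReplicas / maxReplicas.
    return max(min_replicas, min(max_replicas, desired))
```

With several metrics configured, as in the manifest above, the HPA computes a count per metric and uses the largest one.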

🎯 Scenario: You have a consumer pod that processes SQS messages. You want it to scale based on queue depth, including scaling to zero when the queue is empty.

Answer:

# Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace
# ScaledObject — scale Deployment based on SQS queue depth
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sqs-consumer
  minReplicaCount: 0    # Scale to ZERO when queue is empty!
  maxReplicaCount: 50
  pollingInterval: 15   # Check queue every 15 seconds
  cooldownPeriod: 60    # Wait 60s before scaling to zero
  triggers:
  - type: aws-sqs-queue
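    authenticationRef:
      name: keda-aws-auth     # TriggerAuthentication with AWS credentials (not shown)
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders-queue
      queueLength: "10"       # Target of 10 messages per pod
      awsRegion: us-east-1
      identityOwner: operator # Use the KEDA operator's IAM identity; authenticationRef is not needed in this mode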
# ScaledJob — scale Jobs based on Kafka topic lag
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: kafka-processor
  namespace: production
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: OnFailure
        containers:
        - name: processor
          image: kafka-processor:v1
  triggers:
  - type: kafka
    metadata:
      bootstrapServers: kafka.kafka.svc.cluster.local:9092
      consumerGroup: my-consumer-group
      topic: orders-topic
      lagThreshold: "100"   # 1 pod per 100 messages lag
# Cron-based scaling — scale to 0 at night for dev clusters
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-cron-scaler
  namespace: development
spec:
  scaleTargetRef:
    name: dev-api
  minReplicaCount: 0
  maxReplicaCount: 3
  triggers:
  - type: cron
    metadata:
      timezone: "America/New_York"
      start: "0 8 * * 1-5"    # Scale up 8 AM weekdays
      end: "0 20 * * 1-5"     # Scale to 0 at 8 PM weekdays
      desiredReplicas: "3"
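
For the SQS ScaledObject above, the relationship between queue depth and replica count can be approximated as follows. This is a simplified model under stated assumptions: KEDA handles the 0 ↔ 1 transition itself and hands 1..N scaling to an HPA with an average-value target of queueLength; pollingInterval, cooldownPeriod, and HPA tolerance are ignored, and the function name is hypothetical.

```python
import math


def queue_replicas(queue_messages, queue_length, min_replicas, max_replicas):
    """Estimate replicas for a queue-depth trigger (simplified)."""
    if queue_messages <= 0:
        # Empty queue: fall back to the floor, which may be zero.
        return min_replicas
    wanted = math.ceil(queue_messages / queue_length)
    # Any pending message needs at least one consumer.
    floor = max(min_replicas, 1)
    return max(floor, min(max_replicas, wanted))
```

With queueLength 10 and maxReplicaCount 50, a backlog of 95 messages maps to 10 pods and anything above 500 messages is capped at 50.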

🎯 Scenario: You suspect your pods are over-provisioned (costing money) or under-provisioned (causing OOMKills). How do you determine the right sizing?

Answer:

# VPA in "Off" mode — recommendations only, don't apply automatically
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"      # Off / Initial / Recreate / Auto
    # Off      = show recommendations only
    # Initial  = apply only when pod is first created
    # Recreate = evict and recreate pods when requests drift significantly from the recommendation
    # Auto     = currently equivalent to Recreate (may switch to in-place updates in future)
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "4"
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
# After running VPA in "Off" mode for 24-48 hours, check recommendations
kubectl describe vpa api-vpa -n production

# Output shows:
# Container Recommendations:
#   Container Name: api
#     Lower Bound:
#       Cpu:    100m
#       Memory: 256Mi
#     Target:                  ← USE THIS for requests
#       Cpu:    250m
#       Memory: 512Mi
#     Upper Bound:
#       Cpu:    500m
#       Memory: 1Gi
#     Uncapped Target:
#       Cpu:    250m
#       Memory: 512Mi

⚠️ Don’t use VPA + HPA on the same CPU/memory metric simultaneously — they fight each other. Use VPA to right-size, then use HPA based on custom metrics (RPS, queue depth).

🎯 Scenario: Your 6-replica deployment must have exactly 2 pods per availability zone for fault tolerance.

Answer:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ha-web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: ha-web-app
  template:
    metadata:
      labels:
        app: ha-web-app
    spec:
      topologySpreadConstraints:
      # Spread evenly across AZs
      - maxSkew: 1                           # Max pods difference between zones
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule      # Hard requirement
        labelSelector:
          matchLabels:
            app: ha-web-app
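        matchLabelKeys:
        - pod-template-hash                   # K8s 1.27+: count only pods of the same ReplicaSet revision, so rollouts stay balanced
        minDomains: 3                         # Until 3 zones are eligible, treat the emptiest zone as having 0 pods (needs DoNotSchedule)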

      # Also spread across nodes within each zone
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway     # Soft preference for nodes
        labelSelector:
          matchLabels:
            app: ha-web-app
# Verify pod distribution
kubectl get pods -o wide -l app=ha-web-app

# Check which zone each node is in
kubectl get nodes -L topology.kubernetes.io/zone

# Expected output: 2 pods in each of us-east-1a, us-east-1b, us-east-1c
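
How maxSkew: 1 leads to a 2/2/2 layout can be checked with a small model of the zone constraint. This sketch assumes every zone is eligible and ignores minDomains, node affinity, and the per-node constraint; the function names are illustrative.

```python
from collections import Counter


def zone_skew(pod_zones, all_zones):
    """Difference between the fullest and emptiest zone."""
    counts = Counter(pod_zones)
    per_zone = [counts.get(zone, 0) for zone in all_zones]
    return max(per_zone) - min(per_zone)


def can_schedule(pod_zones, all_zones, candidate_zone, max_skew=1):
    """Would placing one more pod in candidate_zone respect max_skew?"""
    counts = Counter(pod_zones)
    global_min = min(counts.get(zone, 0) for zone in all_zones)
    return counts.get(candidate_zone, 0) + 1 - global_min <= max_skew
```

With five pods placed as 2/2/1, the sixth is only allowed into the zone holding one pod, which yields the even split. The constraint applies at scheduling time only, so evictions or scale-downs can still leave zones uneven afterwards.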

🎯 Scenario: Your cluster runs out of nodes during peak hours, and pods sit Pending. But at night there are underutilized nodes wasting money.

Answer:

# Install Cluster Autoscaler for EKS
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-east-1 \
  --set rbac.serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::123456789:role/ClusterAutoscalerRole
# Tag node groups for Cluster Autoscaler auto-discovery
# (tags on the EC2 Auto Scaling Group in AWS, not Kubernetes annotations)
# k8s.io/cluster-autoscaler/enabled: "true"
# k8s.io/cluster-autoscaler/<cluster-name>: "owned"

# Cluster Autoscaler behavior:
# SCALE UP:  Pod is Pending due to insufficient resources → adds nodes
# SCALE DOWN: Sum of pod requests on a node < 50% of allocatable for 10min (defaults)
#             → drains and terminates node
#             (only if all pods can be rescheduled elsewhere)
# Prevent specific pods from being evicted during scale-down
apiVersion: v1
kind: Pod
metadata:
  name: critical-job
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
spec:
  containers:
  - name: critical-job
    image: long-running-job:v1
# Check Cluster Autoscaler status and decisions
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

# View scale events
kubectl get events -n kube-system | grep cluster-autoscaler
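
The scale-down check is based on requested resources, not live usage. A minimal sketch of that test, assuming the default 50% threshold and that both CPU and memory requests must fall below it (parameter names and units are illustrative; the unneeded-time window and pod rescheduling checks are left out):

```python
def scale_down_candidate(cpu_requests_m, cpu_allocatable_m,
                         mem_requests_mi, mem_allocatable_mi,
                         threshold=0.5):
    """True if the node's requested share of CPU and memory is below threshold."""
    utilization = max(cpu_requests_m / cpu_allocatable_m,
                      mem_requests_mi / mem_allocatable_mi)
    return utilization < threshold
```

A node with idle CPU but memory requests at 73% of allocatable is therefore not a candidate, even if `kubectl top nodes` shows it nearly idle.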

🎯 Scenario: A developer needs to view pods, logs, and deployments in the staging namespace but must not modify anything.

Answer:

# Role — namespace-scoped permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer-readonly
  namespace: staging
rules:
# View pods and their logs
- apiGroups: [""]
  resources: ["pods", "pods/log", "pods/status"]
  verbs: ["get", "list", "watch"]
# View services, configmaps, events
- apiGroups: [""]
  resources: ["services", "endpoints", "configmaps",
               "events", "persistentvolumeclaims", "replicationcontrollers"]
  verbs: ["get", "list", "watch"]
# View deployments, replicasets, statefulsets, daemonsets
# (replicationcontrollers belong to the core group above, not to "apps")
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets", "daemonsets"]
  verbs: ["get", "list", "watch"]
# View jobs and cronjobs
- apiGroups: ["batch"]
  resources: ["jobs", "cronjobs"]
  verbs: ["get", "list", "watch"]
# View HPA
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["get", "list", "watch"]
# No exec, port-forward, or Secret access: RBAC is deny-by-default, so anything not listed is refused
# RoleBinding — assigns Role to user/group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-readonly-binding
  namespace: staging
subjects:
- kind: User
  name: jane.doe@company.com
  apiGroup: rbac.authorization.k8s.io
- kind: Group
  name: dev-team                    # All members of this group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-readonly
  apiGroup: rbac.authorization.k8s.io
# Verify permissions
kubectl auth can-i list pods \
  --namespace=staging --as=jane.doe@company.com      # → yes

kubectl auth can-i delete deployments \
  --namespace=staging --as=jane.doe@company.com      # → no

# exec is the "create" verb on the pods/exec subresource
kubectl auth can-i create pods/exec \
  --namespace=staging --as=jane.doe@company.com      # → no

# List all permissions for a user
kubectl auth can-i --list --namespace=staging \
  --as=jane.doe@company.com

🎯 Scenario: Your app pod needs to read from an S3 bucket and write to DynamoDB. How do you give it AWS permissions without static credentials?

Answer:

IRSA (IAM Roles for Service Accounts) binds a Kubernetes ServiceAccount to an AWS IAM role using OIDC federation. No static AWS keys in pods.

# Step 1: Create IAM OIDC provider for the EKS cluster
eksctl utils associate-iam-oidc-provider \
  --cluster my-cluster \
  --region us-east-1 \
  --approve

# Step 2: Create IAM role with trust policy for the ServiceAccount
OIDC_ISSUER=$(aws eks describe-cluster --name my-cluster \
  --query "cluster.identity.oidc.issuer" --output text)

aws iam create-role \
  --role-name MyAppRole \
  --assume-role-policy-document "{
    \"Version\": \"2012-10-17\",
    \"Statement\": [{
      \"Effect\": \"Allow\",
      \"Principal\": {\"Federated\": \"arn:aws:iam::123456789:oidc-provider/${OIDC_ISSUER#*//}\"},
      \"Action\": \"sts:AssumeRoleWithWebIdentity\",
      \"Condition\": {
        \"StringEquals\": {
          \"${OIDC_ISSUER#*//}:aud\": \"sts.amazonaws.com\",
          \"${OIDC_ISSUER#*//}:sub\": \"system:serviceaccount:production:my-app-sa\"
        }
      }
    }]
  }"

# Step 3: Attach minimal IAM policy
aws iam put-role-policy --role-name MyAppRole \
  --policy-name MyAppPolicy \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::my-app-bucket", "arn:aws:s3:::my-app-bucket/*"]
      },
      {
        "Effect": "Allow",
        "Action": ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:Query"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789:table/MyTable"
      }
    ]
  }'
# Step 4: Create annotated ServiceAccount in K8s
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: "arn:aws:iam::123456789:role/MyAppRole"
    eks.amazonaws.com/token-expiration: "86400"  # 24h token expiry
# Step 5: Use ServiceAccount in Deployment
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      serviceAccountName: my-app-sa   # Automatically gets AWS credentials
      containers:
      - name: app
        image: myapp:v2.0
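        # The EKS pod identity webhook injects AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE;
        # the AWS SDK reads them and exchanges the projected token for temporary credentials
        # No AWS_ACCESS_KEY_ID or AWS_SECRET_ACCESS_KEY needed!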

🎯 Scenario: Security audit requires that no pods in production run as root, use privileged mode, or mount host paths.

Answer:

# Enforce Pod Security Standards at namespace level (K8s 1.25+, built-in)
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # enforce: deny pods that violate the policy
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.28
    # warn: allow but show warning (use during migration)
    pod-security.kubernetes.io/warn: restricted
    # audit: record in audit log
    pod-security.kubernetes.io/audit: restricted

Three built-in policy levels:

Level | Restrictions
privileged | No restrictions; for trusted system pods
baseline | Minimal restrictions; blocks privileged containers, hostNetwork, hostPID, hostIPC, and hostPath volumes
restricted | Most secure; requires non-root, no privilege escalation, all capabilities dropped, and a seccomp profile
# Pod that satisfies the "restricted" policy
apiVersion: v1
kind: Pod
metadata:
  name: secure-app
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true           # Must not run as root
    runAsUser: 10000             # Specific non-root UID
    runAsGroup: 10000
    fsGroup: 10000
    seccompProfile:
      type: RuntimeDefault       # Default seccomp profile required

  containers:
  - name: app
    image: myapp:v2.0
    securityContext:
      allowPrivilegeEscalation: false    # Cannot gain more privileges
      readOnlyRootFilesystem: true       # Immutable container filesystem (extra hardening; not required by "restricted")
      runAsNonRoot: true
      capabilities:
        drop: ["ALL"]                    # Drop all Linux capabilities
        add: ["NET_BIND_SERVICE"]        # Only add what's needed

    volumeMounts:
    - name: tmp-dir                      # Writable temp dir (since rootFS is RO)
      mountPath: /tmp
    - name: cache-dir
      mountPath: /app/cache

  volumes:
  - name: tmp-dir
    emptyDir: {}
  - name: cache-dir
    emptyDir: {}

🎯 Scenario: Someone deleted a production deployment. How do you trace exactly who did it, when, and from where?

Answer:

# Audit policy — what to log and at what verbosity
# /etc/kubernetes/audit-policy.yaml (on control plane)
apiVersion: audit.k8s.io/v1
kind: Policy
omitStages:
- RequestReceived   # Don't log every incoming request
rules:
# Secrets and ConfigMaps: metadata only, so their values never land in the audit log.
# Rules are evaluated in order and the first match wins, so this rule must come first.
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets", "configmaps"]

# Log all modifications to critical workload resources in detail
- level: RequestResponse
  verbs: ["create", "update", "patch", "delete"]
  resources:
  - group: "apps"
    resources: ["deployments", "statefulsets", "daemonsets"]
  - group: ""
    resources: ["serviceaccounts"]
  namespaces: ["production", "staging"]

# Log requests from unauthenticated (anonymous) users
- level: Metadata
  users: ["system:anonymous"]

# Don't log health check noise
- level: None
  nonResourceURLs: ["/healthz*", "/readyz*", "/livez*"]

# Default: log metadata for everything else
- level: Metadata
# Enable in kube-apiserver (add to /etc/kubernetes/manifests/kube-apiserver.yaml)
# --audit-policy-file=/etc/kubernetes/audit-policy.yaml
# --audit-log-path=/var/log/kubernetes/audit.log
# --audit-log-maxage=30
# --audit-log-maxbackup=10
# --audit-log-maxsize=100   # 100MB per file

# Search for who deleted the deployment
cat /var/log/kubernetes/audit.log | \
  jq 'select(.verb=="delete" and .objectRef.resource=="deployments" and .objectRef.name=="web-app")' | \
  jq '{time:.requestReceivedTimestamp, user:.user.username, userAgent:.userAgent, sourceIP:.sourceIPs[0]}'

# Output: {"time":"2024-01-15T14:23:07Z", "user":"john.doe", "userAgent":"kubectl/v1.28.0", "sourceIP":"10.0.1.5"}
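
The same search can be scripted when the log is too large to inspect by hand or needs to feed another tool. A minimal sketch, assuming the default JSON-lines audit format with one event per line; the function name is hypothetical, while the field names are those used in the jq query above:

```python
import json


def find_deletions(lines, resource, name):
    """Return who/when/where for delete events on a given object."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        ref = event.get("objectRef", {})
        if (event.get("verb") == "delete"
                and ref.get("resource") == resource
                and ref.get("name") == name):
            hits.append({
                "time": event.get("requestReceivedTimestamp"),
                "user": event.get("user", {}).get("username"),
                "userAgent": event.get("userAgent"),
                "sourceIP": (event.get("sourceIPs") or [None])[0],
            })
    return hits
```

Typical use would be `find_deletions(open("/var/log/kubernetes/audit.log"), "deployments", "web-app")`, run on the control plane node or against shipped copies of the log.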

🎯 Scenario: You want to enforce that all pods have resource limits set, all images come from your private registry, and all namespaces have a required label.

Answer:

# Install Gatekeeper (for production, pin a release tag instead of master)
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
# ConstraintTemplate — defines the policy logic in Rego
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlimits
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLimits
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8srequiredlimits

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.limits.memory
        msg := sprintf("Container '%v' must have memory limits set", [container.name])
      }

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        not container.resources.limits.cpu
        msg := sprintf("Container '%v' must have CPU limits set", [container.name])
      }
# Constraint — applies the template as an actual policy
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLimits
metadata:
  name: require-resource-limits
spec:
  enforcementAction: deny    # deny / warn / dryrun
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    namespaces: ["production", "staging"]
    excludedNamespaces: ["kube-system", "monitoring"]
# Policy: images must come from approved registries
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sallowedrepos
spec:
  crd:
    spec:
      names:
        kind: K8sAllowedRepos
      validation:
        openAPIV3Schema:
          type: object          # v1 ConstraintTemplates require a structural schema
          properties:
            repos:
              type: array
              items:
                type: string
  targets:
  - target: admission.k8s.gatekeeper.sh
    rego: |
      package k8sallowedrepos

      violation[{"msg": msg}] {
        container := input.review.object.spec.containers[_]
        satisfied := [good | repo := input.parameters.repos[_]; good := startswith(container.image, repo)]
        not any(satisfied)
        msg := sprintf("Image '%v' is not from an approved registry", [container.image])
      }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repos
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    repos:
    - "gcr.io/my-company/"
    - "123456789.dkr.ecr.us-east-1.amazonaws.com/"

🎯 Scenario: You want to prevent unscanned or unsigned container images from being deployed.

Answer:

# GitHub Actions: scan image before push
name: Build and Scan
on: [push]
jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    - name: Build image
      run: docker build -t myapp:${{ github.sha }} .

    - name: Scan with Trivy
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: myapp:${{ github.sha }}
        format: sarif
        output: trivy-results.sarif
        severity: CRITICAL,HIGH
        exit-code: 1          # Fail pipeline on CRITICAL/HIGH vulns
        ignore-unfixed: true  # Skip vulns with no fix available

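    - name: Install Cosign
      uses: sigstore/cosign-installer@v3

    - name: Sign with Cosign (keyless)
      # Keyless signing needs "permissions: id-token: write" on the job, and the
      # image must already be pushed to a registry (sign the digest, not a local tag)
      run: |
        cosign sign --yes myapp:${{ github.sha }}
        # Creates a signature stored in the OCI registry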
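# Enforce signature verification at admission (illustrative shape only:
# Connaisseur is configured through its Helm values and Sigstore policy-controller
# uses ClusterImagePolicy, so check the chosen tool's docs for the exact schema)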
apiVersion: connaisseur.io/v1beta1
kind: ValidationPolicy
metadata:
  name: require-signed-images
spec:
  validators:
  - name: cosign
    type: cosign
    host: https://sigstore.dev
  policy:
  - pattern: "123456789.dkr.ecr.us-east-1.amazonaws.com/*:*"
    validators:
    - name: cosign
      with:
        key: k8s://cosign-keys/cosign-pub-key

🎯 Scenario: You have dev, staging, and production namespaces on the same cluster. Dev pods must not be able to talk to production services.

Answer:

# Block all cross-namespace traffic INTO production
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-cross-namespace
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  ingress:
  # Only allow traffic from WITHIN production namespace
  - from:
    - podSelector: {}     # Any pod in THIS namespace
  # Also allow the monitoring namespace (for Prometheus scraping); separate "from" rules are ORed
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
# Allow Ingress controller (lives in ingress-nginx namespace) to reach production
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: production
spec:
  podSelector:
    matchLabels:
      tier: frontend          # Only frontend pods accessible externally
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080

🎯 Scenario: Your cluster has no monitoring. Set up CPU, memory, pod health metrics with dashboards and alerts.

Answer:

# Install kube-prometheus-stack (Prometheus + Grafana + Alertmanager + node-exporter)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=changeme \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=fast-ssd \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
# ServiceMonitor — tell Prometheus to scrape your app
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-server-monitor
  namespace: production
  labels:
    release: monitoring       # Must match Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: api-server
  endpoints:
  - port: metrics
    path: /metrics
    interval: 15s
    scrapeTimeout: 10s
---
# PrometheusRule — define alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-server-alerts
  namespace: production
  labels:
    release: monitoring
spec:
  groups:
  - name: api-server
    rules:
    - alert: HighErrorRate
      expr: |
        sum by (pod) (rate(http_requests_total{status=~"5..",job="api-server"}[5m])) /
        sum by (pod) (rate(http_requests_total{job="api-server"}[5m])) > 0.05
      for: 3m
      labels:
        severity: critical
        team: backend
      annotations:
        summary: "High error rate on {{ $labels.pod }}"
        description: "Error rate {{ $value | humanizePercentage }} for 3+ minutes"
        runbook_url: "https://wiki/runbook/high-error-rate"

    - alert: PodCrashLooping
      expr: increase(kube_pod_container_status_restarts_total[1h]) > 3
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"

    - alert: PodMemoryNearLimit
      expr: |
        container_memory_working_set_bytes{container!=""} /
        (container_spec_memory_limit_bytes{container!=""} > 0) > 0.90
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.pod }} memory at {{ $value | humanizePercentage }} of limit"

    - alert: NodeDiskSpaceLow
      expr: |
        (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.10
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node {{ $labels.instance }} disk is {{ $value | humanizePercentage }} free"
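
The interaction between the threshold and the `for:` clause in `HighErrorRate` is worth spelling out. The sketch below is a simplified model of that behaviour, not Prometheus itself; the function names and the sample format are assumptions made for illustration. The alert only fires once the ratio has stayed above the threshold for the whole `for` window, and any healthy evaluation resets the clock.

```python
def error_ratio(errors_per_s: float, total_per_s: float) -> float:
    """5xx rate divided by total rate; 0 when there is no traffic."""
    return errors_per_s / total_per_s if total_per_s > 0 else 0.0

def alert_fires(samples, threshold=0.05, for_seconds=180) -> bool:
    """samples: iterable of (timestamp_s, errors_per_s, total_per_s), oldest first.

    Returns True once the condition has held continuously for `for_seconds`.
    """
    pending_since = None
    for ts, errors, total in samples:
        if error_ratio(errors, total) > threshold:
            if pending_since is None:
                pending_since = ts          # alert enters the pending state
            if ts - pending_since >= for_seconds:
                return True                 # pending long enough: firing
        else:
            pending_since = None            # condition cleared: reset
    return False
```

A short spike that recovers within the window never fires, which is the point of `for: 3m`: it trades a few minutes of detection latency for fewer false pages.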

🎯 Scenario: You deployed a new pod and it’s been in Pending state for 10 minutes. How do you diagnose and fix it?

Answer:

# Step 1: Always start with describe — read Events section
kubectl describe pod <pod-name> -n <namespace>
# Focus on the LAST few lines of the Events section

# ─── COMMON PENDING REASONS AND FIXES ───────────────────────────────

# Error 1: "0/3 nodes are available: 3 Insufficient memory"
# → Node doesn't have enough resources
kubectl describe nodes | grep -A 10 "Allocated resources"
kubectl top nodes
# Fix: Reduce pod requests, scale up node group, or delete unused pods

# Error 2: "0/3 nodes are available: 3 node(s) had taint {key: value}"
# → Pod doesn't tolerate node taints
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
# Fix: Add toleration to pod spec

# Error 3: "0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector"
# → Node labels don't match pod's nodeSelector or nodeAffinity
kubectl get nodes --show-labels
# Fix: Correct nodeSelector/affinity or add missing labels to nodes

# Error 4: "pod has unbound immediate PersistentVolumeClaims"
# → PVC not bound to a PV
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
# Fix: Create StorageClass, create PV, or wait for dynamic provisioning

# Error 5: "persistentvolumeclaim ... not found"
# → PVC doesn't exist at all
# Fix: kubectl apply -f pvc.yaml

# Error 6: "0/3 nodes are available: 3 Too many pods"
# → Node has hit pod limit (default 110 pods/node)
kubectl describe node <node> | grep "pods:"
# Fix: Add more nodes, reduce pods per node, or increase --max-pods kubelet flag

# Check scheduler events specifically
kubectl get events -n <namespace> \
  --field-selector reason=FailedScheduling \
  --sort-by='.lastTimestamp'

# Check if ResourceQuota is blocking
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota -n <namespace>

# Check LimitRange
kubectl get limitrange -n <namespace>
kubectl describe limitrange -n <namespace>
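
The lookup from event message to likely cause can be automated for a first pass. This is a minimal sketch that only knows the message fragments quoted in errors 1 to 5 above; the table and the `diagnose` helper are hypothetical, and real scheduler messages vary between Kubernetes versions.

```python
# (required substrings, likely cause, first fix to try)
KNOWN_REASONS = [
    (("Insufficient memory",),
     "nodes lack allocatable memory", "lower pod requests or add nodes"),
    (("had taint",),
     "pod does not tolerate the node taints", "add a matching toleration"),
    (("node affinity/selector",),
     "no node labels match the pod", "fix nodeSelector/affinity or label the nodes"),
    (("unbound immediate PersistentVolumeClaims",),
     "PVC is not bound to a PV", "check the StorageClass and provisioning"),
    (("persistentvolumeclaim", "not found"),
     "PVC does not exist", "apply the PVC manifest"),
]

def diagnose(message: str):
    """Return (cause, fix) for a FailedScheduling message, or None if unknown."""
    for parts, cause, fix in KNOWN_REASONS:
        if all(part in message for part in parts):
            return cause, fix
    return None
```

Feeding it the `message` field of each `FailedScheduling` event gives a quick triage list; anything that returns `None` still needs a manual read of `kubectl describe pod`.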

🎯 Scenario: Pod A cannot connect to Service B in the same namespace. How do you diagnose the root cause?

Answer:

# Step 1: Verify pod and service exist
kubectl get pods -l app=service-b -n production
kubectl get svc service-b -n production

# Step 2: Check endpoints — are any pods backing the service?
kubectl get endpoints service-b -n production
# NAME        ENDPOINTS           AGE
# service-b   <none>              5m  ← PROBLEM: empty endpoints

# Diagnose empty endpoints:
# a) Get service selector
kubectl get svc service-b -o jsonpath='{.spec.selector}' -n production
# {"app":"service-b","version":"v1"}

# b) Check if any pods match ALL selector labels
kubectl get pods -n production -l app=service-b,version=v1 --show-labels
# If no pods found → label mismatch!

# c) Check pod readiness (unready pods are excluded from endpoints)
kubectl get pods -n production -l app=service-b
# If READY=0/1 → pod failing readiness probe

# Step 3: Test DNS resolution from pod-a
kubectl exec -it pod-a -n production -- nslookup service-b
kubectl exec -it pod-a -n production -- \
  nslookup service-b.production.svc.cluster.local

# Step 4: Test port connectivity
kubectl exec -it pod-a -n production -- \
  nc -zv service-b 8080
# or:
kubectl exec -it pod-a -n production -- \
  curl -v http://service-b:8080/health

# Step 5: Check NetworkPolicies blocking traffic
kubectl get networkpolicy -n production
kubectl describe networkpolicy <policy-name> -n production

# Step 6: Debug with ephemeral container (K8s 1.23+)
kubectl debug -it pod-a -n production \
  --image=nicolaka/netshoot \
  --target=app-container

# Inside netshoot: full network debugging toolkit
curl -v http://service-b:8080
tcpdump -i eth0 host service-b
nmap -p 8080 service-b
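
Step 2 is the one that most often explains the failure, so here is a small sketch of the rule behind it. It is a simplified model with made-up pod data, not the endpoints controller: a pod backs a Service only if it carries every selector label and is Ready.

```python
def endpoints_for(selector: dict, pods: list) -> list:
    """Return the IPs of pods that match ALL selector labels and are Ready."""
    return [
        pod["ip"]
        for pod in pods
        if pod["ready"]
        and all(pod["labels"].get(k) == v for k, v in selector.items())
    ]

# Hypothetical pods for illustration
PODS = [
    {"ip": "10.0.0.1", "ready": True,  "labels": {"app": "service-b", "version": "v1"}},
    {"ip": "10.0.0.2", "ready": False, "labels": {"app": "service-b", "version": "v1"}},
    {"ip": "10.0.0.3", "ready": True,  "labels": {"app": "service-b"}},
]
```

With the selector `{"app": "service-b", "version": "v1"}` only the first pod qualifies: the second fails readiness and the third is missing the `version` label. If none qualify, the Service has empty endpoints, which matches the `<none>` output shown above.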

🎯 Scenario: Your cluster is slow and nodes are under pressure. How do you identify the culprit pods?

Answer:

# Check node pressure
kubectl get nodes
# STATUS only shows Ready/NotReady; pressure conditions appear in describe

kubectl describe node <node-name>
# Look at: Conditions (MemoryPressure, DiskPressure, PIDPressure), Allocated resources, Events

# Find the top CPU consumers
kubectl top pods -A --sort-by=cpu | head -20

# Find the top memory consumers
kubectl top pods -A --sort-by=memory | head -20

# Check container-level usage
kubectl top pods -A --containers --sort-by=memory | head -30

# List every container's memory limit, then compare against live usage
# (containers close to their limit risk an OOMKill)
kubectl get pods -A -o json | jq -r '
  .items[] |
  .metadata.namespace as $ns | .metadata.name as $pod |
  .spec.containers[] |
  [$ns, $pod, .name, (.resources.limits.memory // "no limit")] | @tsv
'
kubectl top pods -A --containers --sort-by=memory 2>/dev/null | head -30

# PromQL queries for investigation:
# Top memory consumers:
# topk(10, container_memory_working_set_bytes{container!="POD"})

# Memory usage % of limit:
# container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.8

# OOMKill history:
# increase(kube_pod_container_status_restarts_total[24h]) > 0
# kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
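
# Find and clean up resources wasting cluster capacity
# Deployments with 0 replicas (field selectors do not support spec.replicas, so filter with jq)
kubectl get deployments -A -o json | \
  jq -r '.items[] | select(.spec.replicas == 0) | .metadata.namespace + "/" + .metadata.name'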

# Completed/failed pods not cleaned up
kubectl get pods -A --field-selector=status.phase=Succeeded
kubectl get pods -A --field-selector=status.phase=Failed

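# List completed pods that started more than 1 day ago (review the list before deleting)
cutoff=$(date -u -d '1 day ago' +%Y-%m-%dT%H:%M:%SZ)   # GNU date syntax
kubectl get pods -A --field-selector=status.phase=Succeeded -o json | \
  jq -r --arg cutoff "$cutoff" '.items[] | select(.status.startTime < $cutoff) | .metadata.namespace + "/" + .metadata.name'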
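
Comparing usage against limits by eye is error-prone because Kubernetes quantities mix binary and decimal suffixes. The sketch below is a minimal helper covering only the common memory suffixes; the function names are hypothetical and it is not a full quantity parser. It applies the same 90% rule as the PromQL query above.

```python
_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,   # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,      # decimal suffixes
}

def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '2Gi' to bytes."""
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'
    for suffix in sorted(_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * _UNITS[suffix])
    return int(quantity)  # plain bytes

def near_limit(usage: str, limit, threshold: float = 0.9) -> bool:
    """True when usage exceeds `threshold` of the limit; False if no limit is set."""
    if not limit:
        return False
    return parse_memory(usage) / parse_memory(limit) > threshold
```

For instance, `near_limit("1900Mi", "2Gi")` is true (about 93% of the limit), while a container without a limit is never flagged, since it cannot hit a container-level OOM limit.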

🎯 Scenario: A distroless production container has no shell or debugging tools. You need to debug it in production without restarting it.

Answer:

# Ephemeral containers (beta in K8s 1.23, GA in 1.25): inject a debug container into a running pod
# The debug container shares the pod's network namespace; --target also shares the target container's PID namespace

# Basic debug with busybox
kubectl debug -it <pod-name> \
  --image=busybox:1.35 \
  --target=<container-name>   # --target shares PID namespace

# Debug with netshoot (full network toolkit)
kubectl debug -it <pod-name> \
  --image=nicolaka/netshoot \
  --target=main-app

# Debug with a copy of the pod (with modifications)
# --image adds a debug container to the copy (use --set-image=<container>=<image> to swap the app's own image)
# --share-processes shares the PID namespace so the original app process is visible
kubectl debug <pod-name> \
  -it \
  --copy-to=debug-pod \
  --image=myapp:debug \
  --share-processes

# Debug a node (pod runs in the host namespaces with the node filesystem mounted at /host)
kubectl debug node/<node-name> \
  -it \
  --image=ubuntu:22.04

# Inside node debug pod:
# chroot /host   → access the full node filesystem
# systemctl status kubelet
# journalctl -u kubelet -f
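
# Alternative: add the ephemeral container through the API directly.
# Ephemeral containers can only be changed via the pod's "ephemeralcontainers"
# subresource, so a plain `kubectl patch pod` on spec.ephemeralContainers is rejected.
kubectl get pod <pod-name> -n production -o json | \
  jq '.spec.ephemeralContainers += [{
    "name":"debug",
    "image":"nicolaka/netshoot",
    "stdin":true,
    "tty":true,
    "targetContainerName":"main-app"
  }]' | \
  kubectl replace --raw /api/v1/namespaces/production/pods/<pod-name>/ephemeralcontainers -f -

kubectl attach <pod-name> -n production -c debug -it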

🎯 Scenario: Your API response time is 3 seconds. You have 5 microservices in the call chain. How do you find where the slowness is?

Answer:

# Install Jaeger (distributed tracing)
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm install jaeger jaegertracing/jaeger \
  --namespace tracing \
  --create-namespace \
  --set allInOne.enabled=true \
  --set provisionDataStore.cassandra=false
# OpenTelemetry Collector — collects traces from all services
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
  namespace: tracing
spec:
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
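    processors:
      batch: {}
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
    exporters:
      # The dedicated jaeger exporter was removed from recent Collector releases;
      # Jaeger accepts OTLP directly (assumes OTLP is enabled on jaeger-collector)
      otlp/jaeger:
        endpoint: jaeger-collector:4317
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/jaeger]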
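# Python app: instrument with OpenTelemetry (Flask assumed as the web framework)
from flask import Flask, request
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

app = Flask(__name__)

# service.name is what the tracing UI groups spans by
provider = TracerProvider(resource=Resource.create({"service.name": "my-service"}))
provider.add_span_processor(
    # insecure=True: plaintext gRPC to the in-cluster collector (no TLS configured above)
    BatchSpanProcessor(OTLPSpanExporter(endpoint="otel-collector:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-service")

@app.route("/orders")
def get_orders():
    with tracer.start_as_current_span("get-orders") as span:
        # X-User-Id is a placeholder header; use whatever identifies the caller
        span.set_attribute("user.id", request.headers.get("X-User-Id", "unknown"))
        # The trace ID reaches downstream services only if the HTTP client is
        # instrumented (e.g. opentelemetry-instrumentation-requests)
        result = call_inventory_service()   # your own downstream call
        return result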
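
Once traces are flowing, the question "which of the 5 services is slow?" comes down to self time: a span's own duration minus the time spent in its children. The sketch below uses a made-up span format and invented service names, not Jaeger's API, and it assumes child spans run sequentially.

```python
def self_times(spans: list) -> dict:
    """Map span name -> own duration minus the duration of its direct children."""
    child_total = {}
    for span in spans:
        if span["parent"] is not None:
            child_total[span["parent"]] = (
                child_total.get(span["parent"], 0) + span["duration_ms"]
            )
    return {
        span["name"]: span["duration_ms"] - child_total.get(span["id"], 0)
        for span in spans
    }

def slowest_service(spans: list) -> str:
    """Name of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)

# Hypothetical 3-second request across 5 services
TRACE = [
    {"id": 1, "parent": None, "name": "gateway",   "duration_ms": 3000},
    {"id": 2, "parent": 1,    "name": "orders",    "duration_ms": 2900},
    {"id": 3, "parent": 2,    "name": "inventory", "duration_ms": 2600},
    {"id": 4, "parent": 3,    "name": "pricing",   "duration_ms": 150},
    {"id": 5, "parent": 3,    "name": "db",        "duration_ms": 2300},
]
```

Here every upstream span looks slow in total duration, but the self times show that nearly all of the 3 seconds is spent inside the `db` span. The same reading applies to the waterfall view in the Jaeger UI.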

🎯 Scenario: Your team wants to search and correlate logs across hundreds of pods in Grafana (same tool you use for metrics).

Answer:

# Install Loki + Promtail + Grafana (Loki Stack)
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki-stack grafana/loki-stack \
  --namespace logging \
  --create-namespace \
  --set loki.enabled=true \
  --set promtail.enabled=true \
  --set grafana.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi
# Promtail ConfigMap — scrape pod logs with metadata enrichment
apiVersion: v1
kind: ConfigMap
metadata:
  name: promtail-config
  namespace: logging
data:
  promtail.yaml: |
    server:
      http_listen_port: 3101
    clients:
    - url: http://loki:3100/loki/api/v1/push
    scrape_configs:
    - job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      pipeline_stages:
      - cri: {}
      - labeldrop:
        - filename
      relabel_configs:
      # Enrich logs with K8s metadata
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_label_app]
        target_label: app
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      # Tell Promtail which log files belong to each discovered container
      - source_labels: [__meta_kubernetes_pod_uid, __meta_kubernetes_pod_container_name]
        separator: /
        target_label: __path__
        replacement: /var/log/pods/*$1/*.log
# LogQL queries (Loki Query Language) in Grafana:

# View all logs from production namespace
{namespace="production"}

# View error logs from specific app
{app="api-server", namespace="production"} |= "ERROR"

# Count error rate
rate({app="api-server"} |= "ERROR" [5m])

# Parse JSON logs and filter by field
{app="api-server"} | json | status_code >= 500

# View logs from specific pod
{pod="api-server-7d9f4b-xxxx"}

# Correlate with trace ID
{namespace="production"} | json | traceID = "abc123def456"

🎯 Scenario: A node is in NotReady state and pods on it are being evicted. How do you investigate?

Answer:

# Step 1: Check node status
kubectl get nodes
# NAME        STATUS     ROLES    AGE
# node-1      NotReady   <none>   30d  ← Problem

# Step 2: Describe the node — check Conditions
kubectl describe node node-1
# Conditions:
#   MemoryPressure    False   (if True: node is low on memory)
#   DiskPressure      False   (if True: disk almost full)
#   PIDPressure       False   (if True: too many processes)
#   Ready             False   ← PROBLEM
#
# Events:
#   Warning NodeNotReady  kubelet stopped posting node status

# Step 3: SSH to the node and check kubelet
ssh node-1
sudo systemctl status kubelet
sudo journalctl -u kubelet -f --since "1 hour ago"

# Common kubelet issues:
# "Unable to connect to the server" → networking issue
# "certificate has expired" → TLS certs expired
# "failed to get node info" → API server unreachable
# "PLEG is not healthy" → container runtime stuck

# Step 4: Check container runtime
sudo systemctl status containerd
sudo crictl ps   # List running containers via CRI

# Step 5: Check disk space
df -h
du -sh /var/lib/containerd   # Check container storage

# Step 6: Check memory
free -h
sudo dmesg | grep -i "oom\|killed"   # OOM kill events

# Step 7: Check network connectivity to control plane
nc -zv <api-server-ip> 6443

# Step 8: Restart the runtime and kubelet if needed (runtime first, so kubelet reconnects to a healthy CRI)
sudo systemctl restart containerd
sudo systemctl restart kubelet

🎯 Scenario: Your team wants every change to Kubernetes to go through Git — no manual kubectl apply in production.

Answer:

# Install ArgoCD
kubectl create namespace argocd
kubectl apply -n argocd \
  -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Get initial admin password
argocd admin initial-password -n argocd

# Expose UI via LoadBalancer
kubectl patch svc argocd-server -n argocd \
  -p '{"spec":{"type":"LoadBalancer"}}'
# ArgoCD Application — watches Git and syncs to cluster
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app-production
  namespace: argocd
  finalizers:
  - resources-finalizer.argocd.argoproj.io   # Cascade delete
spec:
  project: production-apps
  source:
    repoURL: https://github.com/my-org/k8s-manifests.git
    targetRevision: main
    path: apps/web-app/overlays/production
    # For Helm:
    # chart: web-app
    # helm:
    #   releaseName: web-app
    #   valueFiles: [values-prod.yaml]
    #   parameters:
    #   - name: image.tag
    #     value: v2.1.0
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true         # Delete resources removed from Git
      selfHeal: true      # Revert manual changes back to Git state
      allowEmpty: false
    syncOptions:
    - CreateNamespace=true
    - PrunePropagationPolicy=foreground
    - ApplyOutOfSyncOnly=true   # Only sync changed resources
    retry:
      limit: 5
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 5m
---
# ArgoCD AppProject — group apps and restrict permissions
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production-apps
  namespace: argocd
spec:
  description: Production applications
  # Which repos this project can use
  sourceRepos:
  - https://github.com/my-org/k8s-manifests.git
  # Which clusters/namespaces this project can deploy to
  destinations:
  - namespace: production
    server: https://kubernetes.default.svc
  # Which K8s resources this project can create
  clusterResourceWhitelist:
  - group: ''
    kind: Namespace
  namespaceResourceWhitelist:
  - group: apps
    kind: Deployment
  - group: ''
    kind: Service
  # Warn about resources in the project's namespaces that no Application manages
  # (this reports orphans; it does not block deletion)
  orphanedResources:
    warn: true
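
The `retry` block in the Application above describes an exponential backoff. As a rough illustration of what `limit: 5`, `duration: 10s`, `factor: 2`, and `maxDuration: 5m` add up to, the sketch below computes the wait before each retry. The helper is hypothetical and assumes the usual "base times factor to the power of attempt, capped at the maximum" rule.

```python
def retry_delays(limit: int, duration_s: int, factor: int, max_duration_s: int) -> list:
    """Wait time in seconds before each retry attempt, capped at max_duration_s."""
    return [
        min(duration_s * factor ** attempt, max_duration_s)
        for attempt in range(limit)
    ]

# Values from the syncPolicy.retry block above
DELAYS = retry_delays(limit=5, duration_s=10, factor=2, max_duration_s=300)
```

With these settings the waits are 10, 20, 40, 80, and 160 seconds, a little over 5 minutes in total before the sync is reported as failed. The `maxDuration` cap only starts to matter if `limit` is raised.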

🎯 Scenario: After an image passes tests in staging, you want to automatically promote it to production without changing the manifest files.

Answer:

# CI/CD image promotion workflow

# .github/workflows/promote.yml
name: Promote to Production
on:
  workflow_dispatch:
    inputs:
      image_tag:
        description: 'Image tag to promote to production'
        required: true
  # Or trigger automatically when staging tests pass:
  # workflow_run:
  #   workflows: ["Staging Tests"]
  #   types: [completed]
  #   branches: [main]

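jobs:
  promote:
    runs-on: ubuntu-latest
    # Manual runs have no workflow_run payload, so allow them explicitly
    if: github.event_name == 'workflow_dispatch' || github.event.workflow_run.conclusion == 'success'
    steps:
    - uses: actions/checkout@v4

    - name: Install Kustomize
      run: |
        curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash
        sudo mv kustomize /usr/local/bin/   # the script drops the binary in the current directory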

    - name: Update production image tag
      run: |
        cd overlays/production
        kustomize edit set image myapp=myapp:${{ inputs.image_tag }}

    - name: Create Pull Request
      uses: peter-evans/create-pull-request@v5
      with:
        title: "Promote ${{ inputs.image_tag }} to production"
        body: |
          Promoting image tag `${{ inputs.image_tag }}` to production.
          Tested and validated in staging.
        branch: promote/${{ inputs.image_tag }}
        base: main
        labels: ["promotion", "production"]

# ArgoCD Image Updater: automatic image promotion based on tag policy
helm repo add argo https://argoproj.github.io/argo-helm
helm install argocd-image-updater \
  argo/argocd-image-updater \
  --namespace argocd

# ArgoCD Application with Image Updater annotations
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app-staging
  namespace: argocd
  annotations:
    # The semver constraint (here 1.x) goes after the image name
    argocd-image-updater.argoproj.io/image-list: "myapp=123456.dkr.ecr.us-east-1.amazonaws.com/myapp:1.x"
    argocd-image-updater.argoproj.io/myapp.update-strategy: semver
    argocd-image-updater.argoproj.io/myapp.allow-tags: 'regexp:^1\.[0-9]+\.[0-9]+$'   # Only 1.x.x tags
    argocd-image-updater.argoproj.io/write-back-method: git        # Write tag to Git

🎯 Scenario: Your team manages both AWS infrastructure (Terraform) and Kubernetes workloads (Helm charts) in the same repository. How do you automate deployments safely?

Answer:

# .github/workflows/deploy.yml
name: Deploy to Production
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  id-token: write        # OIDC
  contents: read
  pull-requests: write

jobs:
  # ─── TERRAFORM ──────────────────────────────────────────
  terraform:
    name: Terraform Plan/Apply
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: ./infrastructure
    steps:
    - uses: actions/checkout@v4

    - name: Configure AWS credentials (OIDC)
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789:role/github-actions-role
        aws-region: us-east-1

    - uses: hashicorp/setup-terraform@v3
      with:
        terraform_version: 1.7.0

    - run: terraform init
    - run: terraform validate
    - run: terraform fmt -check

    - name: Terraform Plan
      id: plan
      run: |
        set -o pipefail   # keep terraform's exit code despite the pipe to tee
        terraform plan -out=tfplan -no-color 2>&1 | tee plan.txt

    - name: Comment Plan on PR
      if: github.event_name == 'pull_request'
      uses: actions/github-script@v7
      with:
        script: |
          const plan = require('fs').readFileSync('./infrastructure/plan.txt','utf8')
          github.rest.issues.createComment({
            issue_number: context.issue.number,
            owner: context.repo.owner,
            repo: context.repo.repo,
            body: '## Terraform Plan\n```\n' + plan.slice(0,60000) + '\n```'
          })

    - name: Terraform Apply
      if: github.ref == 'refs/heads/main' && github.event_name == 'push'
      run: terraform apply -auto-approve tfplan

  # ─── HELM DEPLOY ────────────────────────────────────────
  helm-deploy:
    name: Deploy to Kubernetes
    runs-on: ubuntu-latest
    needs: terraform
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
    - uses: actions/checkout@v4

    - name: Configure AWS credentials
      uses: aws-actions/configure-aws-credentials@v4
      with:
        role-to-assume: arn:aws:iam::123456789:role/github-actions-role
        aws-region: us-east-1

    - name: Update kubeconfig
      run: aws eks update-kubeconfig --name my-cluster --region us-east-1

    - name: Build and push image
      run: |
        aws ecr get-login-password | docker login --username AWS \
          --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
        # Tag with the full registry path so the push finds the image
        docker build -t 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:${{ github.sha }} .
        docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:${{ github.sha }}

    - name: Helm upgrade
      run: |
        helm upgrade --install web-app ./charts/web-app \
          --namespace production \
          --set image.tag=${{ github.sha }} \
          --set image.repository=123456789.dkr.ecr.us-east-1.amazonaws.com/myapp \
          --values charts/web-app/values-prod.yaml \
          --wait \
          --timeout 10m \
          --atomic   # Auto-rollback if upgrade fails

    - name: Verify deployment
      run: |
        kubectl rollout status deployment/web-app -n production
        kubectl get pods -n production -l app=web-app

🎯 Scenario: You need an instant traffic switch to a new version with instant rollback capability — rolling update is too slow.

Answer:

# Blue deployment — current production (receives all traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
      slot: blue
  template:
    metadata:
      labels:
        app: web-app
        slot: blue
        version: v1.0
    spec:
      containers:
      - name: web
        image: myapp:v1.0
---
# Green deployment — new version (deployed, but receives no traffic yet)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
      slot: green
  template:
    metadata:
      labels:
        app: web-app
        slot: green
        version: v2.0
    spec:
      containers:
      - name: web
        image: myapp:v2.0
---
# Service — currently pointing to blue
apiVersion: v1
kind: Service
metadata:
  name: web-app
spec:
  selector:
    app: web-app
    slot: blue       # ← Change to "green" to switch all traffic instantly
  ports:
  - port: 80
    targetPort: 8080
# Deploy green alongside blue (no traffic yet)
kubectl apply -f green-deployment.yaml

# Test green directly before switching
kubectl port-forward deployment/web-app-green 8081:8080
curl http://localhost:8081/health
curl http://localhost:8081/api/test

# Run smoke tests against green
# ...all good?

# SWITCH TRAFFIC: new connections move to green as soon as the endpoints update (typically within seconds)
kubectl patch service web-app \
  -p '{"spec":{"selector":{"slot":"green"}}}'

# Verify switch took effect
kubectl get endpoints web-app   # Should show green pod IPs

# Monitor error rate for 15 minutes...

# If issues: INSTANT ROLLBACK (1 command)
kubectl patch service web-app \
  -p '{"spec":{"selector":{"slot":"blue"}}}'

# After successful validation: clean up blue
kubectl delete deployment web-app-blue

🎯 Scenario: You need to manage Helm deployments across dev, staging, and production with different values and encrypted secrets in each.

Answer:

Chart structure:
my-app/
├── Chart.yaml
├── values.yaml            # Defaults
├── values-dev.yaml        # Dev overrides
├── values-staging.yaml    # Staging overrides
├── values-prod.yaml       # Production overrides
└── templates/
    ├── deployment.yaml
    ├── service.yaml
    ├── ingress.yaml
    └── hpa.yaml
# values.yaml (base defaults)
replicaCount: 1
image:
  repository: myapp
  tag: latest
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
autoscaling:
  enabled: false
ingress:
  enabled: false
# values-prod.yaml (production overrides)
replicaCount: 5
image:
  repository: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp
  tag: "v2.1.0"
  pullPolicy: Always
resources:
  requests:
    cpu: 500m
    memory: 512Mi
  limits:
    cpu: "2"
    memory: 2Gi
autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
ingress:
  enabled: true
  host: api.example.com
  tlsEnabled: true
podDisruptionBudget:
  enabled: true
  minAvailable: 3

# Deploy to production (values.yaml inside the chart is loaded automatically)
helm upgrade --install web-app ./my-app \
  --namespace production \
  --values ./my-app/values-prod.yaml \
  --set image.tag=$IMAGE_TAG \
  --atomic \
  --wait \
  --timeout 10m

# Helm Secrets plugin: encrypt secret values with SOPS
# (helm-secrets 4.x syntax; 3.x used `enc` / `dec`)
helm secrets encrypt -i secrets-prod.yaml   # Encrypt in place
helm secrets decrypt -i secrets-prod.yaml   # Decrypt in place

# Run through the plugin wrapper so secrets-prod.yaml is decrypted on the fly
helm secrets upgrade --install web-app ./my-app \
  --values ./my-app/values-prod.yaml \
  --values secrets-prod.yaml

# Helmfile — manage multiple Helm releases declaratively
helmfile sync                         # Apply all releases
helmfile diff                         # Show pending changes
helmfile apply --selector app=web-app # Apply specific release
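
How the layered values files combine is a common follow-up question. The sketch below imitates the merge rule in plain Python; it is a simplified model, not Helm's implementation, and it uses only a subset of the keys from the two files above. Later files win, nested maps are merged key by key, and scalars are replaced.

```python
def merge_values(base: dict, override: dict) -> dict:
    """Deep-merge `override` onto `base` without mutating either input."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)   # merge nested maps
        else:
            merged[key] = value                              # replace scalars/lists
    return merged

# Subset of values.yaml
BASE = {
    "replicaCount": 1,
    "image": {"repository": "myapp", "tag": "latest", "pullPolicy": "IfNotPresent"},
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
    "ingress": {"enabled": False},
}

# Subset of values-prod.yaml
PROD = {
    "replicaCount": 5,
    "image": {"tag": "v2.1.0", "pullPolicy": "Always"},
    "resources": {
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"cpu": "2", "memory": "2Gi"},
    },
}

EFFECTIVE = merge_values(BASE, PROD)
```

In `EFFECTIVE`, `image.repository` is still `myapp` because this subset of the production file does not override it, `resources.limits` is added alongside the overridden requests, and `ingress.enabled` keeps its default. A `--set` flag would be applied last, on top of this result.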

🎯 Scenario: Every deployment requires running database migrations. How do you ensure migrations run exactly once, in the right order, and deployments don’t start if migrations fail?

Answer:

# Helm hook — runs before deployment upgrade
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration-{{ .Release.Revision }}
  namespace: {{ .Release.Namespace }}
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-5"           # Run early (lower = first)
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2
  activeDeadlineSeconds: 300
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: migration-sa
      initContainers:
      # Wait for database to be ready before migrating
      - name: wait-for-db
        image: busybox:1.35
        command: ['sh', '-c',
          'until nc -z postgres-service 5432; do sleep 2; done']
      containers:
      - name: migrate
        image: {{ .Values.image.repository }}:{{ .Values.image.tag }}
        command: ["python", "manage.py", "migrate", "--no-input"]
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: url
        - name: MIGRATION_LOCK_TIMEOUT
          value: "60s"          # App-defined setting (assumed): give up rather than hang if another migration holds the lock
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
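
# Helm deployment order with hooks:
# 1. pre-install / pre-upgrade → migration job runs
# 2. If migration succeeds → deployment proceeds
# 3. If migration fails → the release is marked failed and the Deployment is
#    not updated; --atomic additionally rolls back to the previous release

# --atomic: roll back if a hook or the upgrade fails
helm upgrade --install web-app ./my-app \
  --atomic \
  --timeout 15m \
  --wait

# Check migration job logs (the Job name includes the release revision;
# with hook-succeeded the Job is deleted after a successful run)
kubectl logs job/db-migration-<revision> -n production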

# For ArgoCD: use Resource Hooks
# annotations:
#   argocd.argoproj.io/hook: PreSync
#   argocd.argoproj.io/hook-delete-policy: HookSucceeded

🎯 Scenario: Broken YAML or misconfigured manifests reach production, causing deployment failures. How do you catch these in CI?

Answer:

# GitHub Actions — comprehensive manifest validation
name: Validate K8s Manifests
on: [pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4

    # 1. YAML syntax validation
    - name: Validate YAML
      run: |
        pip install yamllint
        yamllint -d relaxed k8s/

    # 2. Kubernetes schema validation with kubeval
    #    (kubeval is no longer maintained and its schema source may lack 1.28;
    #    kubeconform in step 3 is its successor)
    - name: Kubeval
      run: |
        wget https://github.com/instrumenta/kubeval/releases/latest/download/kubeval-linux-amd64.tar.gz
        tar xf kubeval-linux-amd64.tar.gz
        shopt -s globstar   # make ** match nested directories
        ./kubeval --kubernetes-version=1.28.0 k8s/**/*.yaml

    # 3. Advanced validation with kubeconform
    - name: Kubeconform
      uses: docker://ghcr.io/yannh/kubeconform:latest
      with:
        entrypoint: '/kubeconform'
        args: "-strict -summary -kubernetes-version 1.28.0 k8s/"

    # 4. Security scanning with Trivy
    - name: Trivy K8s scan
      uses: aquasecurity/trivy-action@master
      with:
        scan-type: config
        scan-ref: k8s/
        severity: CRITICAL,HIGH
        exit-code: 1

    # 5. Policy compliance with Checkov
    - name: Checkov K8s policies
      uses: bridgecrewio/checkov-action@master
      with:
        directory: k8s/
        framework: kubernetes
        soft_fail: false

    # 6. Kustomize build verification
    - name: Kustomize build
      run: |
        for env in dev staging production; do
          echo "Building $env..."
          kubectl kustomize k8s/overlays/$env > /dev/null
          echo "$env: OK"
        done

    # 7. Helm chart linting
    - name: Helm lint
      run: |
        helm lint charts/web-app/ \
          --values charts/web-app/values-prod.yaml

🎯 Scenario: You have 12 pods and 4 nodes across 2 AZs. You want the pods spread evenly across nodes (3 per node) and evenly across AZs.

Answer:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api
spec:
  replicas: 12
  selector:
    matchLabels:
      app: critical-api
  template:
    metadata:
      labels:
        app: critical-api
    spec:
      topologySpreadConstraints:
      # Constraint 1: Spread across AZs — max 1 pod difference between zones
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: critical-api

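      # Constraint 2: Spread across nodes. maxSkew: 1 keeps per-node counts within 1 of
      # each other (3 per node for 12 pods on 4 nodes); it is a relative bound, not an absolute cap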
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: critical-api

      # ANTI-AFFINITY: Soft preference to avoid same node as DB
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: postgres
              topologyKey: kubernetes.io/hostname
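
To see what `maxSkew: 1` with `DoNotSchedule` actually enforces, the following sketch replays the scheduling decision for the 12 replicas. It is a simplified model, not the scheduler: the node and zone names are hypothetical, nodes are tried in a fixed order, and scoring is ignored. A placement is allowed only if the target domain's count after placement minus the smallest domain's count stays within `maxSkew`.

```python
NODE_ZONE = {"node-1": "az-a", "node-2": "az-a", "node-3": "az-b", "node-4": "az-b"}

def within_skew(counts: dict, domain: str, max_skew: int = 1) -> bool:
    """Would one more pod in `domain` keep (count - global minimum) <= max_skew?"""
    return counts[domain] + 1 - min(counts.values()) <= max_skew

def schedule(replicas: int, max_skew: int = 1) -> dict:
    """Greedy placement honouring both the zone and the hostname constraint."""
    per_node = {node: 0 for node in NODE_ZONE}
    per_zone = {zone: 0 for zone in set(NODE_ZONE.values())}
    for _ in range(replicas):
        for node, zone in NODE_ZONE.items():
            if within_skew(per_node, node, max_skew) and within_skew(per_zone, zone, max_skew):
                per_node[node] += 1
                per_zone[zone] += 1
                break
        else:
            raise RuntimeError("pod would stay Pending: no node satisfies both constraints")
    return per_node

PLACEMENT = schedule(12)
```

The result is 3 pods on each node and 6 in each zone. Note that nothing here caps a node at a fixed number: if the replica count were 16, the same constraints would happily place 4 per node.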

🎯 Scenario: Your team wants to run Kafka in Kubernetes with automatic partition rebalancing, rolling upgrades, and self-healing. Should you write a StatefulSet or use an Operator?

Answer:

Operators encode human operational knowledge into code — they extend Kubernetes with domain-specific controllers.

# Install Strimzi Kafka Operator
helm repo add strimzi https://strimzi.io/charts
helm install strimzi-kafka-operator strimzi/strimzi-kafka-operator \
  --namespace kafka --create-namespace
# Kafka cluster managed by Strimzi Operator
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: production-kafka
  namespace: kafka
spec:
  kafka:
    version: 3.6.0
    replicas: 3
    listeners:
    - name: plain
      port: 9092
      type: internal
      tls: false
    - name: tls
      port: 9093
      type: internal
      tls: true
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      default.replication.factor: 3
      min.insync.replicas: 2
    storage:
      type: persistent-claim
      size: 100Gi
      class: fast-ssd
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
  # ZooKeeper-based cluster; recent Strimzi releases are KRaft-only, so check the operator version
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 10Gi
      class: fast-ssd
  entityOperator:
    topicOperator: {}
    userOperator: {}
---
# Strimzi manages Topic creation declaratively
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders-topic
  namespace: kafka
  labels:
    strimzi.io/cluster: production-kafka
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 604800000     # 7 days
    segment.bytes: 1073741824   # 1GB segments

💡 When to use an Operator: For complex stateful applications (Kafka, PostgreSQL, Elasticsearch, Redis) where the operational runbook has many steps. Check OperatorHub.io before writing your own.
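
To make "operational knowledge as code" concrete, here is a minimal sketch of the reconcile pattern an Operator's controller follows: compare desired state with observed state and emit the steps a human would otherwise run by hand. The `ClusterState` type, the `reconcile` function, and the action strings are hypothetical illustrations, not Strimzi's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterState:
    """Hypothetical, simplified view of a Kafka cluster (not a Strimzi type)."""
    replicas: int
    version: str


def reconcile(desired: ClusterState, observed: ClusterState) -> List[str]:
    """Return the ordered actions needed to move observed state toward desired."""
    actions = []
    if observed.version != desired.version:
        # Upgrade brokers one at a time so the cluster stays available.
        for broker in range(observed.replicas):
            actions.append(f"rolling-upgrade broker-{broker} to {desired.version}")
    if observed.replicas < desired.replicas:
        for broker in range(observed.replicas, desired.replicas):
            actions.append(f"add broker-{broker}")
        actions.append("rebalance partitions")
    elif observed.replicas > desired.replicas:
        # Move partitions off the brokers before removing them.
        actions.append("rebalance partitions")
        for broker in range(observed.replicas - 1, desired.replicas - 1, -1):
            actions.append(f"remove broker-{broker}")
    return actions


desired = ClusterState(replicas=3, version="3.6.0")
observed = ClusterState(replicas=2, version="3.6.0")
print(reconcile(desired, observed))
```

A real controller runs this comparison in a loop, triggered by watch events on the custom resource, and acts on the cluster instead of returning strings.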

🎯 Scenario: You’re building a SaaS platform where each customer gets an isolated environment on your Kubernetes cluster.

Answer:

Multi-tenancy approaches:

1. Namespace-per-tenant (soft isolation)
   ├── Pro: Simple, low overhead
   ├── Con: Shared kernel, shared cluster DNS
   └── Use: Internal teams, trusted tenants

2. Cluster-per-tenant (hard isolation)
   ├── Pro: Complete isolation
   ├── Con: Expensive, high operational overhead
   └── Use: Highly regulated industries, untrusted code

3. vCluster (virtual clusters)
   ├── Pro: Near-complete isolation, cheaper than full clusters
   ├── Con: Complexity of nested Kubernetes
   └── Use: Best balance for SaaS multi-tenancy
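# vCluster: virtual Kubernetes clusters inside namespaces
# (--repo below makes a separate "helm repo add" unnecessary; the values use the
#  pre-0.20 chart schema, and newer chart versions moved these keys)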
helm install vcluster-tenant-a vcluster \
  --repo https://charts.loft.sh \
  --namespace tenant-a \
  --create-namespace \
  --values - <<EOF
sync:
  ingresses:
    enabled: true
storage:
  size: 5Gi
isolation:
  enabled: true
  podSecurityStandard: baseline
  resourceQuota:
    enabled: true
    quota:
      requests.cpu: "10"
      requests.memory: 20Gi
      pods: "50"
EOF

# Connect to vCluster as tenant-a admin
vcluster connect vcluster-tenant-a -n tenant-a
kubectl get pods   # Connected to tenant's virtual cluster
# HNC (Hierarchical Namespace Controller) — namespace trees
# parent namespace propagates RBAC and policies to children
apiVersion: hnc.x-k8s.io/v1alpha2
kind: HierarchyConfiguration
metadata:
  name: hierarchy
  namespace: tenant-a-prod
spec:
  parent: tenant-a   # Inherits RBAC from tenant-a namespace

🎯 Scenario: Your AWS EKS bill is $60K/month. How do you reduce it by 30% without impacting production performance?

Answer:

# 1. Install Kubecost for cost visibility
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set global.prometheus.enabled=true

# Access Kubecost UI to see cost breakdown per namespace/workload
kubectl port-forward svc/kubecost-cost-analyzer 9090:9090 -n kubecost
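# 2. Use Spot instances for non-critical workloads
# An EKS managed node group is either ON_DEMAND or SPOT, so create a separate
# SPOT node group with multiple instance types for Spot diversity
# (EKS labels its nodes with eks.amazonaws.com/capacityType automatically)

# Deployment that prefers Spot nodes
# The toleration only matters if you add a matching taint to the Spot node group yourself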
spec:
  template:
    spec:
      tolerations:
      - key: "eks.amazonaws.com/capacityType"
        operator: "Equal"
        value: "SPOT"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values: ["SPOT"]
# 3. Scale to zero dev/staging at night with KEDA
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-scale-to-zero
  namespace: development
spec:
  scaleTargetRef:
    name: dev-api
  minReplicaCount: 0         # Scales to ZERO
  maxReplicaCount: 5
  triggers:
  - type: cron
    metadata:
      timezone: America/New_York
      start: "0 9 * * 1-5"   # Up at 9 AM weekdays
      end: "0 19 * * 1-5"    # Down at 7 PM weekdays
      desiredReplicas: "3"
# 4. Right-size with VPA recommendations
kubectl get vpa -A -o json | jq '.items[] | {
  name: .metadata.name,
  namespace: .metadata.namespace,
  policy: .spec.resourcePolicy,
  recommended: .status.recommendation.containerRecommendations
}'

# 5. Remove unused resources
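# Identify unused PVCs (no pod mounting them)
# List the PVCs referenced by pods, list all PVCs, then print the difference
kubectl get pods -A -o json | jq -r '
  .items[] | .metadata.namespace as $ns |
  .spec.volumes[]? | select(.persistentVolumeClaim) |
  $ns + "/" + .persistentVolumeClaim.claimName
' | sort -u > /tmp/used-pvcs.txt
kubectl get pvc -A -o json | jq -r '
  .items[] | .metadata.namespace + "/" + .metadata.name
' | sort -u > /tmp/all-pvcs.txt
comm -13 /tmp/used-pvcs.txt /tmp/all-pvcs.txt   # PVCs that no pod references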

# 6. Use Descheduler to rebalance pods after scale-down
helm repo add descheduler https://kubernetes-sigs.github.io/descheduler/
helm install descheduler descheduler/descheduler \
  --namespace kube-system \
  --set cronJobApiVersion=batch/v1 \
  --set schedule="0 */2 * * *"

Cost saving levers:

| Action | Expected Savings |
| --- | --- |
| Right-size over-provisioned pods via VPA | 20–40% |
| Spot instances for dev/batch workloads | 60–80% on those nodes |
| Scale dev/staging to zero at night | 50–70% on those envs |
| Cluster Autoscaler aggressive scale-down | 15–25% |
| Spot + Cluster Autoscaler for production batch | 40–60% |
| Remove unused PVs and idle LoadBalancers | 5–10% |
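
Because each lever applies only to part of the bill, it helps to check the arithmetic against the 30% target. The sketch below takes the low end of three ranges from the table; the split of the $60K bill across workload types is a made-up assumption for illustration, so substitute the breakdown Kubecost reports for your cluster.

```python
MONTHLY_BILL = 60_000  # from the scenario

# Hypothetical split of the bill (assumption, not measured data).
spend = {
    "prod_services": 36_000,
    "dev_staging": 15_000,
    "batch": 9_000,
}

# Low-end savings rates (in percent) taken from the table of cost levers.
low_end_pct = {
    "prod_services": 20,  # right-size via VPA: 20-40%
    "dev_staging": 50,    # scale to zero at night: 50-70%
    "batch": 40,          # Spot + Cluster Autoscaler: 40-60%
}

# Integer arithmetic keeps the dollar amounts exact.
savings = {name: spend[name] * low_end_pct[name] // 100 for name in spend}
total = sum(savings.values())

for name, amount in savings.items():
    print(f"{name:>14}: ${amount:,}")
print(f"{'total':>14}: ${total:,} ({total * 100 / MONTHLY_BILL:.1f}% of the bill)")
```

Under these assumed numbers the conservative estimate just clears the 30% goal, which shows why right-sizing production matters even though its percentage is the smallest: it applies to the largest share of spend.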

🎯 Scenario: Your microservices need automatic mTLS, circuit breaking, retry logic, and distributed tracing without changing application code.

Answer:

# Install Istio ("default" is the profile intended for production; there is no "production" profile)
istioctl install --set profile=default

# Enable automatic sidecar injection for production namespace
kubectl label namespace production istio-injection=enabled

# New pods in "production" now get an Envoy proxy sidecar automatically;
# restart existing workloads so they are re-created with one
kubectl rollout restart deployment -n production
# Traffic management — canary with header-based routing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: web-app
  namespace: production
spec:
  hosts:
  - web-app
  http:
  # Header-based routing for internal testers
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: web-app
        subset: v2
  # Weight-based traffic split (10% canary)
  - route:
    - destination:
        host: web-app
        subset: v1
      weight: 90
    - destination:
        host: web-app
        subset: v2
      weight: 10
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: "gateway-error,connect-failure,retriable-4xx"
    timeout: 10s
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: web-app
spec:
  host: web-app
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 100
        http2MaxRequests: 1000
    outlierDetection:              # Circuit breaker
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
  subsets:
  - name: v1
    labels:
      version: v1.0
  - name: v2
    labels:
      version: v2.0
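
The `outlierDetection` block is easier to reason about with a small model. The sketch below mimics only the consecutive-5xx rule for a single upstream host, using the same thresholds as the DestinationRule. It is a simplification: the class and method names are hypothetical, and real Envoy also evaluates hosts on the `interval`, lengthens ejection time for repeat offenders, and honors `maxEjectionPercent`.

```python
class OutlierDetector:
    """Simplified model of consecutive-5xx ejection for one upstream host."""

    def __init__(self, consecutive_5xx_errors=5, base_ejection_time=30.0):
        self.threshold = consecutive_5xx_errors
        self.base_ejection_time = base_ejection_time
        self.consecutive_errors = 0
        self.ejected_until = 0.0

    def record(self, status_code, now):
        """Record a response; eject the host once the error streak hits the threshold."""
        if 500 <= status_code <= 599:
            self.consecutive_errors += 1
            if self.consecutive_errors >= self.threshold:
                self.ejected_until = now + self.base_ejection_time
                self.consecutive_errors = 0
        else:
            # Any non-5xx response resets the streak.
            self.consecutive_errors = 0

    def is_ejected(self, now):
        """True while the host is removed from the load-balancing pool."""
        return now < self.ejected_until


detector = OutlierDetector()
for second in range(5):
    detector.record(503, now=float(second))
print(detector.is_ejected(now=5.0))   # host is out of the pool
print(detector.is_ejected(now=40.0))  # ejection window has passed
```

The key property for interviews: a single success resets the counter, so only an unbroken run of failures trips the breaker.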

🎯 Scenario: Your EKS cluster is on K8s 1.26 and needs to upgrade to 1.28. How do you do it with zero downtime?

Answer:

# Pre-upgrade checklist:

# 1. Check deprecated APIs being used
# Install pluto (a standalone CLI, not a Helm chart)
brew install FairwindsOps/tap/pluto
pluto detect-helm -o wide    # Check Helm releases
pluto detect-files -d ./k8s  # Check manifest files

# Common API removals in recent versions:
# K8s 1.25: PodSecurityPolicy removed
# K8s 1.26: FlowSchema v1beta1 removed
# K8s 1.27: CSIStorageCapacity v1beta1 removed

# 2. Check addon compatibility (CoreDNS, kube-proxy, CSI drivers)
kubectl get pods -n kube-system -o wide

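# 3. Backup etcd (self-managed clusters only; on EKS AWS manages etcd,
#    so back up cluster resources with a tool such as Velero instead)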
ETCDCTL_API=3 etcdctl snapshot save /backup/pre-upgrade.db ...

# 4. Ensure PodDisruptionBudgets exist for critical workloads
kubectl get pdb -A

# 5. Test upgrade in staging first!
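# EKS upgrade process:
# EKS upgrades the control plane one minor version at a time, so 1.26 → 1.28
# takes two passes: run Steps 1-3 with 1.27 (and matching addon versions) first,
# then repeat them with 1.28 as shown below.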

# Step 1: Upgrade control plane (AWS manages this)
aws eks update-cluster-version \
  --name my-cluster \
  --kubernetes-version 1.28

# Wait for control plane upgrade
aws eks wait cluster-active --name my-cluster

# Step 2: Update managed addons
aws eks update-addon --cluster-name my-cluster --addon-name kube-proxy \
  --addon-version v1.28.0-eksbuild.1
aws eks update-addon --cluster-name my-cluster --addon-name coredns \
  --addon-version v1.10.1-eksbuild.1
aws eks update-addon --cluster-name my-cluster --addon-name aws-ebs-csi-driver \
  --addon-version v1.25.0-eksbuild.1

# Step 3: Upgrade node groups (rolling replacement of nodes)
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name main \
  --kubernetes-version 1.28

# Monitor node group update
aws eks wait nodegroup-active --cluster-name my-cluster --nodegroup-name main

# Verify cluster is healthy
kubectl get nodes
kubectl get pods -A | grep -v Running | grep -v Completed
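
Since EKS only moves the control plane one minor version per update, a 1.26 to 1.28 upgrade is really two upgrades. A minimal sketch for planning the hops follows; the `upgrade_path` helper is hypothetical and assumes plain `major.minor` version strings.

```python
def upgrade_path(current, target):
    """List every minor version to step through, ending at the target."""
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if cur_major != tgt_major or tgt_minor < cur_minor:
        raise ValueError(f"cannot upgrade from {current} to {target}")
    return [f"{cur_major}.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


for version in upgrade_path("1.26", "1.28"):
    # For each hop: control plane, then addons, then node groups.
    print(f"upgrade control plane, addons, and node groups to {version}")
```

Each hop repeats the full sequence, including the deprecated-API check, because removals differ per release.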

🎯 Scenario: You want to test your system’s resilience by intentionally causing failures and verifying your application handles them gracefully.

Answer:

# Install Chaos Mesh
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --create-namespace \
  --set chaosDaemon.runtime=containerd \
  --set chaosDaemon.socketPath=/run/containerd/containerd.sock
# PodChaos — randomly kill pods (simulates node failures)
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: pod-failure-test
  namespace: production
spec:
  action: pod-kill             # pod-kill / pod-failure / container-kill
  mode: random-max-percent
  value: "30"                  # Kill up to 30% of matching pods
  selector:
    namespaces:
    - production
    labelSelectors:
      app: web-app
  scheduler:                   # Chaos Mesh 1.x only; 2.x replaced this field with the Schedule CRD
    cron: "@every 10m"         # Run every 10 minutes
# NetworkChaos — simulate network latency (test timeout handling)
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: network-delay-test
spec:
  action: delay
  mode: all
  selector:
    namespaces:
    - production
    labelSelectors:
      app: api-server
  delay:
    latency: "200ms"          # Add 200ms latency
    correlation: "25"
    jitter: "50ms"
  direction: to               # Affects outgoing traffic
  duration: "5m"
# StressChaos — memory pressure test (verify OOMKill behavior)
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: memory-stress-test
spec:
  mode: one
  selector:
    namespaces: [production]
    labelSelectors:
      app: api-server
  stressors:
    memory:
      workers: 1
      size: "512MB"       # Allocate 512MB in the target container
  duration: "1m"

🎯 Scenario: Your team wants a lightweight GitOps solution that automatically syncs Git to the cluster without a UI dependency.

Answer:

# Install Flux CLI
curl -s https://fluxcd.io/install.sh | sudo bash

# Bootstrap Flux — installs controllers + configures Git repo
flux bootstrap github \
  --owner=my-org \
  --repository=k8s-manifests \
  --branch=main \
  --path=clusters/production
# Add --personal only when --owner is a user account rather than an organization
# GitRepository — tells Flux where to pull manifests from
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: k8s-manifests
  namespace: flux-system
spec:
  interval: 1m           # Check for changes every minute
  url: https://github.com/my-org/k8s-manifests
  ref:
    branch: main
  secretRef:
    name: github-token
# Kustomization — applies manifests from the GitRepository
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: production-apps
  namespace: flux-system
spec:
  interval: 5m
  path: ./apps/production
  prune: true              # Delete resources removed from Git
  sourceRef:
    kind: GitRepository
    name: k8s-manifests
  healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: web-app
    namespace: production
  timeout: 5m
# HelmRelease — manage Helm charts through Flux
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: web-app
  namespace: production
spec:
  interval: 10m
  chart:
    spec:
      chart: web-app
      version: ">=1.0.0 <2.0.0"
      sourceRef:
        kind: HelmRepository
        name: my-charts
        namespace: flux-system
  values:
    replicaCount: 5
    image:
      tag: v2.1.0
  upgrade:
    remediation:
      retries: 3
  rollback:
    timeout: 5m
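
The `version: ">=1.0.0 <2.0.0"` constraint in the HelmRelease lets Flux pick up new minor and patch releases while blocking a breaking 2.x chart. A minimal sketch of that range check follows. It is a simplification with hypothetical helper names, and it ignores pre-release and build suffixes, which real SemVer matching handles.

```python
def parse(version):
    """Turn "1.4.2" (or "v1.4.2") into a comparable tuple of integers."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def in_range(version, lower="1.0.0", upper="2.0.0"):
    """True when lower <= version < upper, compared numerically per component."""
    return parse(lower) <= parse(version) < parse(upper)


for candidate in ("0.9.9", "1.0.0", "1.12.3", "2.0.0"):
    print(candidate, in_range(candidate))
```

Comparing tuples of integers rather than strings matters: as strings, "1.12.3" would sort before "1.2.0".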

🎯 Scenario: Your PostgreSQL StatefulSet needs maintenance (failover, upgrade, resize) without application downtime.

Answer:

# Use CloudNativePG operator — the production-grade PostgreSQL Operator
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: postgres-ha
  namespace: production
spec:
  instances: 3              # 1 primary + 2 replicas
  primaryUpdateStrategy: unsupervised   # Switch over the primary automatically during rolling updates
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "512MB"
      wal_level: logical
  bootstrap:
    initdb:
      database: myapp
      owner: appuser
      secret:
        name: app-db-credentials
  storage:
    storageClass: fast-ssd
    size: 100Gi
  backup:
    retentionPolicy: "30d"
    barmanObjectStore:
      destinationPath: s3://my-pg-backups
      s3Credentials:
        inheritFromIAMRole: true
  monitoring:
    enablePodMonitor: true
  # Seconds to wait before failing over once the primary is detected unhealthy (0 = immediately)
  failoverDelay: 0
# CloudNativePG operations
kubectl cnpg status postgres-ha -n production
kubectl cnpg promote postgres-ha postgres-ha-2 -n production   # Manual switchover to instance 2
kubectl cnpg backup postgres-ha -n production       # On-demand backup
kubectl cnpg restart postgres-ha -n production      # Rolling restart

# Check replication lag
kubectl exec -it postgres-ha-1 -n production -- \
  psql -c "SELECT * FROM pg_stat_replication;"

🎯 Scenario: During a cluster upgrade, Kubernetes drained 3 out of 4 pods of your API simultaneously, causing a brief outage.

Answer:

# PodDisruptionBudget — protects against voluntary disruptions
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-api-pdb
  namespace: production
spec:
  # Choose ONE of:
  minAvailable: 3          # Always keep at least 3 pods running
  # maxUnavailable: 1      # Allow at most 1 pod down at a time
  # minAvailable: "75%"    # Keep at least 75% of pods running

  selector:
    matchLabels:
      app: web-api
# PDB for StatefulSet (database cluster — at most 1 unavailable)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: production
spec:
  maxUnavailable: 1       # Only 1 DB pod can be down at a time
  selector:
    matchLabels:
      app: postgres
# Check PDB status
kubectl get pdb -n production
# NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
# web-api-pdb   3               N/A               1                     5d
# postgres-pdb  N/A             1                 1                     5d

# ALLOWED DISRUPTIONS shows how many pods can currently be voluntarily disrupted

# PDB prevents kubectl drain from draining too many pods:
kubectl drain node-1 --ignore-daemonsets
# error: Cannot evict pod as it would violate the pod's disruption budget
# ← This is expected! PDB is protecting the service

Always create PDBs before cluster upgrades. Node drain respects PDBs — if you can’t drain without violating a PDB, the drain blocks until conditions are met (e.g., a pod is rescheduled on another node first).
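
The ALLOWED DISRUPTIONS column follows from simple arithmetic: healthy pods minus the number that must stay available. The sketch below reproduces that calculation for both PDB styles. The function names are hypothetical and the model is simplified, but rounding percentages up matches how Kubernetes resolves them.

```python
def resolve(value, total):
    """Turn an int or a percentage string such as "75%" into a pod count (rounded up)."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        return (total * percent + 99) // 100  # integer ceiling
    return int(value)


def allowed_disruptions(healthy, desired, min_available=None, max_unavailable=None):
    """How many pods may be evicted right now without violating the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        required = resolve(min_available, desired)
    else:
        required = desired - resolve(max_unavailable, desired)
    return max(0, healthy - required)


# The scenario: 4 API replicas protected by minAvailable: 3
print(allowed_disruptions(healthy=4, desired=4, min_available=3))
# One pod is already being rescheduled, so the next eviction must wait
print(allowed_disruptions(healthy=3, desired=4, min_available=3))
```

With this budget in place, the drain from the scenario could have evicted only one API pod at a time instead of three.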

🎯 Scenario: Your app needs dynamic database credentials from HashiCorp Vault, rotated every hour, without pod restarts.

Answer:

# Install Vault with HA in Kubernetes
helm repo add hashicorp https://helm.releases.hashicorp.com
helm install vault hashicorp/vault \
  --namespace vault \
  --create-namespace \
  --set server.ha.enabled=true \
  --set server.ha.replicas=3 \
  --set server.auditStorage.enabled=true
# Vault Agent Injector — sidecars that inject and renew secrets
# Annotate pods to get automatic secret injection

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  template:
    metadata:
      annotations:
        # Enable Vault Agent sidecar injection
        vault.hashicorp.com/agent-inject: "true"
        vault.hashicorp.com/role: "web-app"
        # Inject DB credentials at /vault/secrets/db-creds
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/web-app"
        # Template to format the secret as a .env file
        vault.hashicorp.com/agent-inject-template-db-creds: |
          {{- with secret "database/creds/web-app" -}}
          export DB_USERNAME="{{ .Data.username }}"
          export DB_PASSWORD="{{ .Data.password }}"
          {{- end }}
        # Keep the agent sidecar running so it renews the lease and re-renders the file
        vault.hashicorp.com/agent-pre-populate-only: "false"
    spec:
      serviceAccountName: web-app-sa  # Vault uses K8s SA for auth
      containers:
      - name: app
        image: myapp:v2.0
        command:
        - /bin/sh
        - -c
        - |
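          # Loads credentials once at startup; the app must re-read this file
          # (e.g. on reconnect) to pick up rotated credentials without a restart
          source /vault/secrets/db-creds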
          exec python app.py
# Configure Vault database secrets engine
vault secrets enable database
vault write database/config/postgres \
  plugin_name=postgresql-database-plugin \
  allowed_roles="web-app" \
  connection_url="postgresql://{{username}}:{{password}}@postgres:5432/mydb?sslmode=disable" \
  username="vault-admin" \
  password="vault-admin-password"

vault write database/roles/web-app \
  db_name=postgres \
  creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO \"{{name}}\";" \
  default_ttl="1h" \
  max_ttl="24h"
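
Rotation without pod restarts only works if the application re-reads the rendered file, because environment variables loaded at startup never change. One way to do that, sketched under assumptions: the file uses the `export NAME="value"` format from the template in the Deployment, and `CredentialWatcher` and `parse_env_file` are hypothetical helpers, not part of any Vault client library.

```python
import os
import re

# Matches lines such as: export DB_USERNAME="v-kubernetes-web-app-abc123"
_EXPORT = re.compile(r'^export\s+(\w+)="(.*)"$')


def parse_env_file(path):
    """Read export lines from the rendered secret file into a dict."""
    creds = {}
    with open(path) as handle:
        for line in handle:
            match = _EXPORT.match(line.strip())
            if match:
                creds[match.group(1)] = match.group(2)
    return creds


class CredentialWatcher:
    """Reloads credentials whenever the Vault Agent rewrites the file."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self.creds = {}

    def refresh(self):
        """Reload if the file changed; return True when new credentials were loaded."""
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        self._mtime = mtime
        self.creds = parse_env_file(self.path)
        return True


# Usage inside the app (path taken from the pod annotations):
#   watcher = CredentialWatcher("/vault/secrets/db-creds")
#   if watcher.refresh():
#       reconnect with watcher.creds["DB_USERNAME"] / watcher.creds["DB_PASSWORD"]
```

Calling `refresh()` before opening each new database connection, or from a connection pool's reconnect hook, lets the app move to fresh credentials before the one-hour lease expires.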

🎯 Scenario: During a resource crunch, you want critical services to always get resources, and best-effort workloads to be evicted first.

Answer:

# PriorityClass — defines scheduling and eviction priority
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-production
value: 1000000        # Higher value = higher priority
globalDefault: false
description: "Critical production services — never evict"
preemptionPolicy: PreemptLowerPriority   # Can preempt lower-priority pods
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: standard-production
value: 100000
globalDefault: true
description: "Standard production workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch-workload
value: 1000
description: "Batch jobs — evict first during pressure"
preemptionPolicy: Never
# Critical service uses high-priority class
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  template:
    spec:
      priorityClassName: critical-production   # High priority
      containers:
      - name: payment
        resources:
          requests:
            cpu: "1"
            memory: 1Gi
          limits:
            cpu: "1"       # Guaranteed QoS (requests == limits)
            memory: 1Gi
# Batch job uses low-priority class
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      priorityClassName: batch-workload       # Low priority — evicted first
      containers:
      - name: batch
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
# ResourceQuota per PriorityClass per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: critical-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
  scopeSelector:
    matchExpressions:
    - scopeName: PriorityClass
      operator: In
      values:
      - critical-production
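
Priority decides who gets preempted at scheduling time, while the QoS class noted in the payment-service manifest (requests == limits) influences who is evicted under node pressure. The sketch below shows how the three QoS classes are derived from requests and limits. The `qos_class` function is a hypothetical illustration that compares quantity strings literally, so it treats "1" and "1000m" as different values, unlike the real kubelet.

```python
def qos_class(containers):
    """Classify a pod from its containers' optional "requests"/"limits" mappings."""
    any_set = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            any_set = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A missing request defaults to the limit, so only the limit is mandatory.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


payment = [{"requests": {"cpu": "1", "memory": "1Gi"},
            "limits": {"cpu": "1", "memory": "1Gi"}}]
batch = [{"requests": {"cpu": "500m", "memory": "512Mi"}}]

print(qos_class(payment))  # requests equal limits for cpu and memory
print(qos_class(batch))    # requests only
print(qos_class([{}]))     # nothing set
```

The batch Job in this answer sets requests but no limits, so it is Burstable rather than BestEffort; dropping the requests entirely would put it first in line for eviction.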

🎯 Scenario: You’re launching a new production Kubernetes cluster. What does your production-readiness checklist include?

Answer:

☐ CLUSTER SETUP
  ☐ Multi-master control plane (3+ masters for HA)
  ☐ etcd with 3+ nodes, automated backups to S3 every 15 min
  ☐ Worker nodes across 3+ availability zones
  ☐ Cluster Autoscaler configured for node scaling
  ☐ Managed K8s (EKS/GKE/AKS) preferred over self-managed

☐ NETWORKING
  ☐ CNI plugin with NetworkPolicy support (Calico/Cilium)
  ☐ NGINX or AWS Load Balancer Ingress controller
  ☐ cert-manager for automatic TLS certificate management
  ☐ CoreDNS with anti-affinity (spread across nodes)
  ☐ Default-deny NetworkPolicies per namespace

☐ SECURITY
  ☐ RBAC configured — least privilege for all service accounts
  ☐ Pod Security Standards enforced (restricted level for production)
  ☐ Secrets encryption at rest (EncryptionConfiguration or KMS)
  ☐ External Secrets Operator with AWS Secrets Manager
  ☐ Image vulnerability scanning in CI (Trivy, Grype)
  ☐ OPA/Gatekeeper policies (resource limits required, allowed registries)
  ☐ Audit logging enabled with 30-day retention
  ☐ No hostNetwork, hostPID, privileged containers in production
  ☐ Container images built FROM non-root base images

☐ WORKLOADS
  ☐ All Deployments have resource requests AND limits
  ☐ All Deployments have liveness + readiness probes
  ☐ All Deployments have PodDisruptionBudgets (minAvailable: 2+)
  ☐ maxUnavailable: 0 in rolling update strategy
  ☐ terminationGracePeriodSeconds >= 30 + preStop hook
  ☐ Pods spread across AZs with topologySpreadConstraints
  ☐ No bare pods (use Deployment/StatefulSet/DaemonSet)

☐ STORAGE
  ☐ StorageClass with WaitForFirstConsumer binding mode
  ☐ allowVolumeExpansion: true on StorageClasses
  ☐ reclaimPolicy: Retain for production databases
  ☐ Regular PVC snapshots / application-level backups
  ☐ Backup restore tested regularly

☐ OBSERVABILITY
  ☐ Prometheus + Alertmanager + Grafana (kube-prometheus-stack)
  ☐ Loki or EFK for centralized log aggregation
  ☐ Jaeger or Tempo for distributed tracing
  ☐ Alerts: PodCrashLooping, HighErrorRate, NodeNotReady, DiskPressure
  ☐ Grafana dashboards for SLIs (latency, error rate, saturation)
  ☐ SLO tracking and error budget monitoring

☐ CI/CD
  ☐ GitOps (ArgoCD/Flux) — all changes via Git
  ☐ Manifest validation in CI (kubeval, kubeconform, Checkov)
  ☐ Image scanning before push
  ☐ Helm or Kustomize for environment-specific configs
  ☐ Automated rollback on health check failure

☐ CAPACITY & COST
  ☐ VPA in recommendation mode — review monthly
  ☐ HPA on all customer-facing deployments
  ☐ Spot instances for dev/staging/batch
  ☐ Kubecost or AWS Cost Explorer for K8s cost allocation
  ☐ ResourceQuotas per namespace to prevent runaway costs
  ☐ Descheduler for pod bin-packing

📋 Quick Reference Cheatsheet

Pod Debugging

kubectl get pods -A -o wide                              # All pods + node/IP info
kubectl describe pod <pod> -n <ns>                       # Full details + events
kubectl logs <pod> --previous -c <container>             # Pre-crash logs
kubectl logs -l app=web --all-containers --prefix -f     # Tail multi-pod
kubectl exec -it <pod> -c <container> -- bash            # Shell into container
kubectl debug -it <pod> --image=nicolaka/netshoot        # Inject debug container
kubectl debug node/<node> -it --image=ubuntu:22.04       # Debug a node
kubectl port-forward pod/<pod> 8080:8080                 # Local port forwarding
kubectl cp <pod>:/path/file ./local                      # Copy from pod
kubectl get events --sort-by='.lastTimestamp' -n <ns>    # Recent events
kubectl get events --field-selector reason=Failed        # Only failures

Deployments & Rollouts

kubectl apply -f manifest.yaml --dry-run=server          # Preview apply
kubectl diff -f manifest.yaml                            # Show pending changes
kubectl rollout status deployment/<name>                 # Watch rollout
kubectl rollout history deployment/<name>                # Revision history
kubectl rollout undo deployment/<name>                   # Rollback
kubectl rollout undo deployment/<name> --to-revision=3   # Specific revision
kubectl rollout restart deployment/<name>                # Force restart all pods
kubectl scale deployment/<name> --replicas=5             # Manual scale
kubectl set image deployment/<name> app=myapp:v2.0       # Update image
kubectl autoscale deployment/<name> --min=2 --max=10     # Quick HPA

Nodes & Cluster

kubectl get nodes -o wide                                # Node status
kubectl describe node <node>                             # Node details + capacity
kubectl drain <node> --ignore-daemonsets \
  --delete-emptydir-data                                 # Drain for maintenance
kubectl cordon <node>                                    # Stop scheduling
kubectl uncordon <node>                                  # Resume scheduling
kubectl top nodes                                        # CPU/memory usage
kubectl top pods --containers --sort-by=memory           # Container metrics
kubectl get --raw='/readyz?verbose'                      # Control plane health (componentstatuses is deprecated)
kubectl cluster-info                                     # Cluster endpoints

Resources & Config

kubectl get all -n <namespace>                           # All resources in ns
kubectl get pv,pvc -A                                    # Storage overview
kubectl get networkpolicy -A                             # All network policies
kubectl get ingress -A                                   # All ingresses
kubectl get hpa,vpa,scaledobject -A                      # All autoscalers
kubectl api-resources                                    # All resource types
kubectl explain deployment.spec.strategy                 # Inline docs
kubectl get quota -A                                     # Resource quotas
kubectl get limitrange -A                                # Limit ranges

Security & RBAC

kubectl auth can-i create pods -n prod --as=user@co.com  # Check permissions
kubectl auth can-i --list -n prod --as=user@co.com       # List all permissions
kubectl get rolebindings,clusterrolebindings -A          # All RBAC bindings
kubectl get serviceaccounts -A                           # All service accounts
kubectl get secrets -A --field-selector type=Opaque      # Opaque secrets

State & Troubleshooting

kubectl get pods --field-selector=status.phase=Pending   # Pending pods
kubectl get pods --field-selector=status.phase=Failed    # Failed pods
KUBECONFIG=~/.kube/config kubectl config get-contexts    # List clusters
kubectl config use-context <context>                     # Switch cluster
kubectl config set-context --current --namespace=<ns>    # Set default ns
kubectl delete pod <pod> --grace-period=0 --force        # Force delete
kubectl apply -f file.yaml -v=8                          # Verbose kubectl (logs API requests)

🎯 Interview Tips

| Topic | What Interviewers Want to Hear |
| --- | --- |
| Architecture | Control plane vs data plane, API server is the hub, etcd quorum |
| Scheduling | Filters → Scores → Binds; taints/tolerations; affinity |
| Probes | startup → liveness → readiness; consequences of each failing |
| Services | ClusterIP/NodePort/LoadBalancer; headless for StatefulSets |
| Networking | CNI; CoreDNS; NetworkPolicy requires compatible CNI |
| Storage | PV/PVC/StorageClass abstraction; WaitForFirstConsumer for multi-AZ |
| Security | RBAC least privilege; PSS; non-root; secrets encryption; IRSA |
| Scaling | HPA (stateless) + VPA (right-size) + KEDA (event-driven) + CA (nodes) |
| GitOps | ArgoCD/Flux; declarative; self-healing; drift detection |
| Resilience | PDBs; maxUnavailable=0; multi-AZ spread; graceful shutdown |
| Observability | Prometheus/Grafana metrics; Loki logs; Jaeger traces |
| Troubleshooting | describe → logs --previous → events → exec → network debug |
| Cost | Spot instances; scale-to-zero; right-sizing; Kubecost |
| Upgrades | Check deprecated APIs; PDBs first; control plane then nodes |

Good luck with your Kubernetes interviews! ☸️
