Jenkins Scenario-Based Interview Questions and Answers (2026)
20 real-world Jenkins scenario-based interview questions and answers covering architecture, pipelines, shared libraries, agents, security, optimization, Docker/Kubernetes, AWS integrations, and disaster recovery — Senior DevOps Engineer Edition.
Section 1 — Jenkins Architecture & Setup
Problem Analysis:
- Single controller handling both job orchestration AND build execution.
- No separation of concerns between controller and agents.
- All workloads competing for the same resources.
Target Architecture — Jenkins Controller + Dynamic Agent Pool:
┌─────────────────────────────────────────────────┐
│ JENKINS CONTROLLER (HA Pair) │
│ - Job scheduling & orchestration only │
│ - No builds run on controller │
│ - 8 vCPU / 32GB RAM / 500GB SSD │
└──────────────────┬──────────────────────────────┘
│
┌─────────────────────────┼─────────────────────────────┐
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Static Agents │ │ Docker Agents │ │ K8s Pod Agents │
│ (Heavy builds) │ │ (Standard CI) │ │ (On-demand) │
│ 4x c5.4xlarge │ │ Docker-in-Docker│ │ Kubernetes Plugin│
│ Maven/Gradle │ │ ephemeral │ │ Auto-scale │
└──────────────────┘ └──────────────────┘ └──────────────────┘
Key Changes:
1. Disable builds on the controller:
// Jenkins system configuration (JCasC)
jenkins:
numExecutors: 0 // ZERO executors on controller
mode: EXCLUSIVE
2. Kubernetes Plugin for dynamic agents:
pipeline {
agent {
kubernetes {
yaml '''
apiVersion: v1
kind: Pod
spec:
containers:
- name: maven
image: maven:3.9-eclipse-temurin-17
command: ["sleep", "infinity"]
resources:
requests:
cpu: "1"
memory: "2Gi"
limits:
cpu: "2"
memory: "4Gi"
- name: docker
image: docker:24-dind
securityContext:
privileged: true
'''
}
}
stages {
stage('Build') {
steps {
container('maven') {
sh 'mvn clean package -T 4'
}
}
}
}
}
3. Agent labeling strategy:
| Label | Agent Type | Use Case |
|---|---|---|
docker | K8s pod | Standard builds, linting |
heavy | Static EC2 | Large Maven/Gradle projects |
gpu | GPU instance | ML model training |
windows | Windows VM | .NET, PowerShell builds |
mac | Mac mini | iOS/macOS builds |
4. Build queue optimization:
- Enable
Throttle Concurrent Buildsplugin — limit per project. - Use
Priority Sorterplugin for critical production deployments. - Implement
Folderstructure to isolate team quotas.
Result: Build wait time drops from 30 minutes to under 2 minutes. Controller CPU drops from 90% to 15%.
Option 1: Active-Standby with EFS (AWS)
┌──────────────────────────────┐
│ Route 53 / ALB │
│ (health check every 10s) │
└──────────┬───────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
┌─────────────────┐ ┌─────────────────┐
│ Jenkins Active │ │ Jenkins Standby │
│ (RUNNING) │ │ (WARM, stopped) │
│ EC2: m5.2xl │ │ EC2: m5.2xl │
└────────┬────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Amazon EFS │ ←── Shared $JENKINS_HOME
│ (Multi-AZ NFS) │ jobs/, builds/, plugins/
└─────────────────┘
Failover script (triggered by ALB health check failure → Lambda):
#!/bin/bash
aws ec2 start-instances --instance-ids $STANDBY_INSTANCE_ID
aws ec2 wait instance-running --instance-ids $STANDBY_INSTANCE_ID
aws route53 change-resource-record-sets \
--hosted-zone-id $ZONE_ID \
--change-batch file://dns-failover.json
Option 2: CloudBees CI (Enterprise HA)
- True active-active clustering.
- Operations Center manages multiple controllers.
- Recommended for 200+ engineers.
What happens to running builds during a controller outage?
| Scenario | Behavior | Mitigation |
|---|---|---|
| Controller crashes mid-build | Build is lost (default) | Use checkpoints plugin (CloudBees) |
| Agent survives but controller gone | Agent idles, build orphaned | Implement build-level idempotency |
| EFS/NFS accessible | Standby can see build history | Standby promotion restores state |
Pipeline with stash/unstash | Stashed artifacts on agent are lost | Use S3 artifact storage |
Best Practice — Stateless Pipelines:
stage('Archive') {
steps {
// Always archive to S3, not just Jenkins
s3Upload(
bucket: 'ci-artifacts',
path: "${BUILD_TAG}/",
includePathPattern: 'target/*.jar'
)
}
}
Section 2 — Declarative & Scripted Pipelines
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
metadata:
labels:
app: jenkins-agent
spec:
serviceAccountName: jenkins-agent
containers:
- name: maven
image: maven:3.9-eclipse-temurin-17-alpine
command: ['sleep', 'infinity']
volumeMounts:
- name: maven-cache
mountPath: /root/.m2
- name: docker
image: docker:24-dind
securityContext:
privileged: true
env:
- name: DOCKER_TLS_CERTDIR
value: ""
- name: trivy
image: aquasec/trivy:latest
command: ['sleep', 'infinity']
- name: kubectl
image: bitnami/kubectl:1.29
command: ['sleep', 'infinity']
volumes:
- name: maven-cache
persistentVolumeClaim:
claimName: maven-cache-pvc
"""
}
}
environment {
APP_NAME = 'payment-service'
AWS_REGION = 'us-east-1'
ECR_REGISTRY = '123456789012.dkr.ecr.us-east-1.amazonaws.com'
ECR_REPO = "${ECR_REGISTRY}/${APP_NAME}"
IMAGE_TAG = "${BUILD_NUMBER}-${GIT_COMMIT[0..7]}"
KUBECONFIG_CRED = 'eks-kubeconfig-prod'
SONAR_TOKEN = credentials('sonarqube-token')
SLACK_CHANNEL = '#deployments'
}
options {
buildDiscarder(logRotator(numToKeepStr: '20'))
timestamps()
timeout(time: 60, unit: 'MINUTES')
disableConcurrentBuilds(abortPrevious: true)
ansiColor('xterm')
}
stages {
stage('Checkout') {
steps {
checkout scm
script {
env.GIT_AUTHOR = sh(script: "git log -1 --pretty=format:'%an'", returnStdout: true).trim()
env.GIT_MESSAGE = sh(script: "git log -1 --pretty=format:'%s'", returnStdout: true).trim()
}
}
}
stage('Unit Tests') {
steps {
container('maven') {
sh 'mvn test -T 4 --no-transfer-progress'
}
}
post {
always {
junit 'target/surefire-reports/*.xml'
jacoco(execPattern: 'target/jacoco.exec', minimumLineCoverage: '80')
}
}
}
stage('Code Quality') {
parallel {
stage('SonarQube Analysis') {
steps {
container('maven') {
sh """
mvn sonar:sonar \
-Dsonar.projectKey=${APP_NAME} \
-Dsonar.host.url=http://sonarqube:9000 \
-Dsonar.login=${SONAR_TOKEN} \
--no-transfer-progress
"""
}
}
}
stage('Dependency Check') {
steps {
container('maven') {
sh 'mvn org.owasp:dependency-check-maven:check --no-transfer-progress'
}
}
post {
always {
dependencyCheckPublisher pattern: 'target/dependency-check-report.xml'
}
}
}
}
}
stage('Build & Package') {
steps {
container('maven') {
sh 'mvn package -DskipTests --no-transfer-progress'
}
}
post {
success {
archiveArtifacts artifacts: 'target/*.jar', fingerprint: true
}
}
}
stage('Build Docker Image') {
steps {
container('docker') {
sh """
docker build \
--build-arg BUILD_NUMBER=${BUILD_NUMBER} \
--build-arg GIT_COMMIT=${GIT_COMMIT} \
--label git-commit=${GIT_COMMIT} \
--label build-date=\$(date -u +%Y-%m-%dT%H:%M:%SZ) \
-t ${ECR_REPO}:${IMAGE_TAG} \
-t ${ECR_REPO}:latest \
-f docker/Dockerfile .
"""
}
}
}
stage('Container Security Scan') {
steps {
container('trivy') {
sh """
trivy image \
--exit-code 1 \
--severity CRITICAL \
--no-progress \
--format template \
--template '@/contrib/junit.tpl' \
--output trivy-report.xml \
${ECR_REPO}:${IMAGE_TAG}
"""
}
}
post {
always {
junit allowEmptyResults: true, testResults: 'trivy-report.xml'
}
}
}
stage('Push to ECR') {
steps {
container('docker') {
withAWS(credentials: 'aws-ecr-credentials', region: AWS_REGION) {
sh """
aws ecr get-login-password --region ${AWS_REGION} \
| docker login --username AWS --password-stdin ${ECR_REGISTRY}
docker push ${ECR_REPO}:${IMAGE_TAG}
docker push ${ECR_REPO}:latest
"""
}
}
}
}
stage('Deploy to Dev') {
when { branch 'develop' }
steps {
container('kubectl') {
withKubeConfig([credentialsId: 'eks-kubeconfig-dev']) {
sh """
helm upgrade --install ${APP_NAME} ./helm/${APP_NAME} \
--namespace dev \
--set image.repository=${ECR_REPO} \
--set image.tag=${IMAGE_TAG} \
--set replicaCount=2 \
--wait --timeout 5m
"""
}
}
}
}
stage('Deploy to Staging') {
when { branch 'main' }
steps {
container('kubectl') {
withKubeConfig([credentialsId: 'eks-kubeconfig-staging']) {
sh """
helm upgrade --install ${APP_NAME} ./helm/${APP_NAME} \
--namespace staging \
--set image.repository=${ECR_REPO} \
--set image.tag=${IMAGE_TAG} \
--set replicaCount=3 \
--wait --timeout 10m
"""
}
}
}
post {
success {
script { runIntegrationTests() }
}
}
}
stage('Production Approval') {
when { branch 'main' }
steps {
script {
def approver = input(
id: 'prod-approval',
message: "Deploy ${APP_NAME}:${IMAGE_TAG} to PRODUCTION?",
parameters: [
choice(
name: 'ACTION',
choices: ['Deploy', 'Abort'],
description: 'Select deployment action'
),
string(
name: 'TICKET',
description: 'Change ticket number (e.g., JIRA-1234)'
)
],
submitter: 'senior-devops,release-managers',
submitterParameter: 'APPROVED_BY'
)
env.CHANGE_TICKET = approver.TICKET
env.APPROVED_BY = approver.APPROVED_BY
if (approver.ACTION == 'Abort') {
error("Deployment aborted by ${env.APPROVED_BY}")
}
}
}
}
stage('Deploy to Production') {
when { branch 'main' }
steps {
container('kubectl') {
withKubeConfig([credentialsId: KUBECONFIG_CRED]) {
sh """
helm upgrade --install ${APP_NAME} ./helm/${APP_NAME} \
--namespace production \
--set image.repository=${ECR_REPO} \
--set image.tag=${IMAGE_TAG} \
--set replicaCount=5 \
--set strategy.type=RollingUpdate \
--set strategy.rollingUpdate.maxUnavailable=0 \
--wait --timeout 15m
"""
}
}
}
}
} // end stages
post {
success {
slackSend(
channel: SLACK_CHANNEL,
color: 'good',
message: """
✅ *${APP_NAME}* deployed successfully!
• Version: `${IMAGE_TAG}`
• Branch: `${BRANCH_NAME}`
• Author: ${env.GIT_AUTHOR}
• Commit: ${env.GIT_MESSAGE}
• Ticket: ${env.CHANGE_TICKET ?: 'N/A'}
• <${BUILD_URL}|View Build>
"""
)
}
failure {
slackSend(
channel: SLACK_CHANNEL,
color: 'danger',
message: """
❌ *${APP_NAME}* build FAILED at stage: `${env.STAGE_NAME}`
• Branch: `${BRANCH_NAME}`
• Author: ${env.GIT_AUTHOR}
• <${BUILD_URL}console|View Logs>
"""
)
}
always {
cleanWs()
}
}
}
def runIntegrationTests() {
sh """
newman run tests/integration/postman_collection.json \
--environment tests/integration/staging.json \
--reporters cli,junit \
--reporter-junit-export integration-test-results.xml
"""
junit 'integration-test-results.xml'
}
| Aspect | Declarative | Scripted |
|---|---|---|
| Syntax | Structured DSL with predefined blocks | Full Groovy code |
| Learning curve | Lower — easier for teams | Higher — requires Groovy knowledge |
| Validation | Linted before execution | Errors only at runtime |
| Flexibility | Limited to DSL constructs | Unlimited Groovy logic |
| Error messages | Clear, stage-level | Can be cryptic |
| Best for | 90% of standard pipelines | Complex dynamic logic |
Use Scripted when you need dynamic stage generation:
node {
def services = readYaml file: 'services.yaml'
def buildStages = [:]
services.each { svc ->
def name = svc.name
buildStages[name] = {
stage("Build ${name}") {
sh "docker build -t ${name}:${BUILD_NUMBER} ./services/${name}"
}
}
}
parallel buildStages
}
Declarative with script block — best of both worlds:
pipeline {
agent any
stages {
stage('Dynamic Deploy') {
steps {
script {
def envs = ['dev', 'staging']
if (env.BRANCH_NAME == 'main') {
envs << 'production'
}
envs.each { environment ->
sh "kubectl apply -f k8s/${environment}/"
}
}
}
}
}
}
Rule of thumb: Start with Declarative. Drop into a
script {}block when you need Groovy logic. Only switch to fully Scripted for complex dynamic pipeline generation.
Section 3 — Shared Libraries
Shared Library Structure:
jenkins-shared-library/
├── vars/ ← Global functions (used as pipeline steps)
│ ├── buildDockerImage.groovy
│ ├── deployToKubernetes.groovy
│ ├── runSonarScan.groovy
│ ├── sendSlackNotification.groovy
│ └── standardPipeline.groovy ← One function = full standardized pipeline
├── src/ ← OOP classes
│ └── com/company/
│ ├── Docker.groovy
│ ├── Kubernetes.groovy
│ └── Utils.groovy
├── resources/ ← Non-Groovy templates and scripts
│ ├── helm/values-template.yaml
│ └── scripts/run-tests.sh
└── Jenkinsfile ← Library's own tests
vars/standardPipeline.groovy — the core shared pipeline:
def call(Map config = [:]) {
def appName = config.appName ?: error("appName is required")
def dockerfilePath = config.dockerfilePath ?: 'Dockerfile'
def helmChart = config.helmChart ?: "./helm/${appName}"
def prodApprovers = config.prodApprovers ?: 'senior-devops'
def skipTests = config.skipTests ?: false
pipeline {
agent { label 'docker-agent' }
environment {
IMAGE_TAG = "${BUILD_NUMBER}-${GIT_COMMIT[0..7]}"
ECR_REPO = "123456789012.dkr.ecr.us-east-1.amazonaws.com/${appName}"
}
stages {
stage('Build') { steps { script { buildDockerImage(imageName: "${ECR_REPO}:${IMAGE_TAG}", dockerfile: dockerfilePath) } } }
stage('Test') { when { expression { !skipTests } }; steps { script { runTests() } } }
stage('Security Scan') { steps { script { trivyScan(image: "${ECR_REPO}:${IMAGE_TAG}") } } }
stage('Push Image') { steps { script { pushToECR(image: "${ECR_REPO}:${IMAGE_TAG}") } } }
stage('Deploy') { steps { script { deployToKubernetes(appName: appName, imageTag: IMAGE_TAG, helmChart: helmChart, environment: branchToEnvironment(BRANCH_NAME), prodApprovers: prodApprovers) } } }
}
post {
success { sendSlackNotification(appName, 'SUCCESS', IMAGE_TAG) }
failure { sendSlackNotification(appName, 'FAILURE', IMAGE_TAG) }
always { cleanWs() }
}
}
}
def branchToEnvironment(branch) {
switch(branch) {
case 'main': return 'production'
case 'develop': return 'staging'
default: return 'dev'
}
}
Any microservice Jenkinsfile becomes just 6 lines:
@Library('company-shared-lib@main') _
standardPipeline(
appName: 'payment-service',
helmChart: './helm/payment',
prodApprovers: 'payment-team-leads'
)
Register the library in Jenkins via JCasC:
unclassified:
globalLibraries:
libraries:
- name: "company-shared-lib"
retriever:
modernSCM:
scm:
git:
remote: "git@github.com:company/jenkins-shared-library.git"
credentialsId: "github-ssh-key"
defaultVersion: "main"
implicit: false
allowVersionOverride: true
Section 4 — Multi-Branch & GitFlow Strategy
pipeline {
agent { label 'docker' }
stages {
stage('CI — Always Run') {
stages {
stage('Build') { steps { sh 'mvn package -DskipTests' } }
stage('Unit Tests') { steps { sh 'mvn test' }; post { always { junit 'target/surefire-reports/*.xml' } } }
stage('Code Analysis') { steps { sh 'mvn sonar:sonar' } }
}
}
stage('Docker Build & Push') {
when {
anyOf {
branch 'main'
branch 'develop'
branch pattern: 'release/.*', comparator: 'REGEXP'
}
}
steps {
sh "docker build -t myapp:${BUILD_NUMBER}-${GIT_COMMIT[0..6]} ."
sh "docker push myapp:${BUILD_NUMBER}-${GIT_COMMIT[0..6]}"
}
}
stage('Deploy to Dev') {
when { branch pattern: 'feature/.*|bugfix/.*', comparator: 'REGEXP' }
steps {
sh "helm upgrade --install myapp-dev ./helm --set env=dev --namespace dev"
}
}
stage('Deploy to Staging') {
when { branch 'develop' }
steps {
sh "helm upgrade --install myapp ./helm --set env=staging --namespace staging"
}
post {
success { sh 'newman run tests/smoke-tests.json' }
}
}
stage('Release Candidate') {
when { branch pattern: 'release/.*', comparator: 'REGEXP' }
stages {
stage('Tag RC') {
steps {
script {
def version = env.BRANCH_NAME.replace('release/', '')
sh "git tag rc-${version}-${BUILD_NUMBER}"
sh "git push origin rc-${version}-${BUILD_NUMBER}"
}
}
}
stage('Deploy RC to UAT') {
steps {
sh "helm upgrade --install myapp ./helm --set env=uat --namespace uat"
}
}
}
}
stage('Production Gate') {
when { branch 'main' }
steps {
script {
def approval = input(
message: '🚀 Deploy to Production?',
parameters: [
booleanParam(name: 'CONFIRMED', defaultValue: false,
description: 'I have reviewed the changelog and smoke tests'),
string(name: 'JIRA_TICKET', description: 'Change request ticket')
],
submitter: 'release-managers,cto'
)
if (!approval.CONFIRMED) error("Deployment not confirmed.")
env.JIRA_TICKET = approval.JIRA_TICKET
}
}
}
stage('Deploy to Production') {
when { branch 'main' }
steps {
sh """
helm upgrade --install myapp ./helm \
--set env=production \
--set image.tag=${BUILD_NUMBER}-${GIT_COMMIT[0..6]} \
--namespace production \
--wait --atomic --timeout 10m
"""
}
}
}
post {
success {
script {
if (env.BRANCH_NAME == 'main') {
sh "gh release create v${BUILD_NUMBER} --generate-notes"
}
}
}
}
}
Branch → Environment mapping at a glance:
| Branch Pattern | Target Environment | Requires Approval |
|---|---|---|
feature/*, bugfix/* | Dev | No |
develop | Staging | No |
release/* | UAT | No |
main | Production | ✅ Yes |
Section 5 — Jenkins Agents & Distributed Builds
Step 1 — Inspect the queue via Script Console:
// Manage Jenkins → Script Console
Jenkins.instance.queue.items.each { item ->
println "Job: ${item.task.name}"
println "Why blocked: ${item.why}"
println "In queue since: ${item.inQueueSince}"
println "---"
}
Step 2 — Check agent status:
Jenkins.instance.computers.each { computer ->
println "Agent: ${computer.name} | Online: ${computer.online} | Idle: ${computer.idle}"
println "Labels: ${computer.node.labelString} | Executors: ${computer.numExecutors}"
}
Common root causes and fixes:
| Symptom | Root Cause | Fix |
|---|---|---|
No agent with label 'X' | Label mismatch or agent offline | Fix label in Jenkinsfile or restart agent |
| All agents busy | Too few executors | Add agents or increase executor count |
| Agent online but no builds | Executor count = 0 | Set numExecutors ≥ 1 |
| K8s agents not spinning up | Pod template misconfigured or quota hit | kubectl get events -n jenkins |
| Agent connects then drops | Network / firewall issue | Check logs, verify JNLP port 50000 |
Fix — Auto-reconnect offline agents via Script Console:
Jenkins.instance.computers.each { computer ->
if (!computer.online && !computer.temporarilyOffline) {
computer.connect(false)
println "Reconnecting: ${computer.name}"
}
}
Fix — Debug K8s agent pod not starting:
kubectl get pods -n jenkins -w
kubectl describe pod <jenkins-agent-pod> -n jenkins
# Look for: Insufficient CPU/memory, image pull errors, node selector issues
kubectl logs -n jenkins <jenkins-controller-pod> | grep -i "kubernetes"
Preventive — Auto-scale agents with Karpenter:
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: jenkins-agents
spec:
requirements:
- key: node.kubernetes.io/instance-type
operator: In
values: ["c5.2xlarge", "c5.4xlarge"]
taints:
- key: jenkins-agent
effect: NoSchedule
ttlSecondsAfterEmpty: 120 # Scale down after 2 minutes idle
Section 6 — Security & Access Control
Immediate Actions:
- Revoke the compromised AWS credentials immediately via the IAM console.
- Check CloudTrail for any unauthorized API activity during the exposure window.
- Rotate all secrets in the affected environment.
- Purge credentials from git history.
Remove secrets from git history using git-filter-repo:
pip install git-filter-repo
# Replace the secret value across all commits
git filter-repo \
--replace-text <(echo "AKIAIOSFODNN7EXAMPLE==>***REMOVED***") \
--force
git push --force --all
git push --force --tags
The right way — Jenkins Credentials Binding:
// ❌ WRONG — Never hardcode credentials
environment {
AWS_ACCESS_KEY_ID = 'AKIAIOSFODNN7EXAMPLE'
AWS_SECRET_ACCESS_KEY = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
}
// ✅ RIGHT — Bind from Jenkins encrypted credential store
pipeline {
environment {
AWS_CREDS = credentials('aws-prod-credentials')
// Exposes: AWS_CREDS_USR (access key) and AWS_CREDS_PSW (secret key)
}
stages {
stage('Deploy') {
steps {
withAWS(credentials: 'aws-prod-credentials', region: 'us-east-1') {
sh 'aws s3 ls'
}
}
}
}
}
Best Practice — Use IAM Roles instead of keys (for EC2/EKS agents):
// Attach IAM role to Jenkins agent via IRSA — no credentials needed at all
pipeline {
agent { label 'eks-agent' }
stages {
stage('Deploy') {
steps {
sh 'aws s3 ls' // Uses agent's IAM role automatically
}
}
}
}
Prevention — detect-secrets pre-commit hook:
pip install detect-secrets
detect-secrets scan > .secrets.baseline
# .pre-commit-config.yaml
repos:
- repo: https://github.com/Yelp/detect-secrets
rev: v1.4.0
hooks:
- id: detect-secrets
args: ['--baseline', '.secrets.baseline']
Also install the Mask Passwords Jenkins plugin — all registered credentials are automatically redacted from console output.
Plugin required: Role-based Authorization Strategy
Role hierarchy:
Global Roles:
├── Admin → Full Jenkins access (DevOps leads only)
├── Developer → View jobs, trigger builds, view logs
├── ReadOnly → View-only (auditors, stakeholders)
└── Anonymous → No access (deny all)
Folder Roles (per team):
├── team-payments → Full access to /payments/* jobs
├── team-platform → Full access to /platform/* jobs
└── team-frontend → Full access to /frontend/* jobs
JCasC configuration:
jenkins:
authorizationStrategy:
roleBased:
roles:
global:
- name: "admin"
permissions: ["Overall/Administer"]
assignments: ["devops-leads"]
- name: "developer"
permissions: ["Overall/Read", "Job/Build", "Job/Read", "Job/Workspace", "View/Read"]
assignments: ["all-engineers"]
- name: "readonly"
permissions: ["Overall/Read", "Job/Read", "View/Read"]
assignments: ["stakeholders", "auditors"]
items:
- name: "payments-team"
pattern: "payments/.*"
permissions:
- "Job/Build"
- "Job/Configure"
- "Job/Create"
- "Job/Delete"
- "Job/Read"
- "Run/Replay"
assignments: ["team-payments"]
securityRealm:
ldap:
configurations:
- server: "ldaps://ldap.company.com:636"
rootDN: "dc=company,dc=com"
userSearchBase: "ou=users"
groupSearchBase: "ou=groups"
managerDN: "cn=jenkins,ou=service-accounts,dc=company,dc=com"
managerPasswordSecret: "${LDAP_PASSWORD}"
Section 7 — Pipeline Optimization & Performance
Optimization 1 — Parallelize independent stages:
// ❌ BEFORE: Sequential — 25 min total
stages {
stage('Unit Tests') { steps { sh 'mvn test' } } // 8 min
stage('Integration Tests') { steps { sh 'mvn verify' } } // 10 min
stage('Code Analysis') { steps { sh 'mvn sonar:sonar' } } // 7 min
}
// ✅ AFTER: Parallel — 10 min total (only the longest stage gates progress)
stage('Quality Gates') {
parallel {
stage('Unit Tests') { steps { sh 'mvn test -pl unit-tests' } } // 8 min
stage('Integration Tests') { steps { sh 'mvn verify -pl integration-tests' } } // 10 min ← longest
stage('SonarQube') { steps { sh 'mvn sonar:sonar' } } // 7 min
}
}
Optimization 2 — Maven dependency caching via shared PVC:
agent {
kubernetes {
yaml """
spec:
containers:
- name: maven
image: maven:3.9
volumeMounts:
- name: maven-repo
mountPath: /root/.m2/repository
volumes:
- name: maven-repo
persistentVolumeClaim:
claimName: maven-cache-pvc # Shared across all builds
"""
}
}
// Use offline mode to skip network requests entirely
sh 'mvn package -o -T 4 -DskipTests'
Optimization 3 — Docker layer caching:
# Order layers: least-changed → most-changed
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline -q # Cached unless pom.xml changes
COPY src/ ./src/
RUN mvn package -DskipTests
CMD ["java", "-jar", "target/app.jar"]
sh """
docker build \
--cache-from ${ECR_REPO}:cache \
--build-arg BUILDKIT_INLINE_CACHE=1 \
-t ${ECR_REPO}:${IMAGE_TAG} \
-t ${ECR_REPO}:cache .
docker push ${ECR_REPO}:cache
"""
Optimization 4 — Skip unchanged services in a monorepo:
stage('Detect Changes') {
steps {
script {
def changedFiles = sh(script: "git diff --name-only HEAD~1 HEAD", returnStdout: true).trim().split('\n')
env.BUILD_PAYMENT = changedFiles.any { it.startsWith('services/payment/') } ? 'true' : 'false'
env.BUILD_AUTH = changedFiles.any { it.startsWith('services/auth/') } ? 'true' : 'false'
env.BUILD_FRONTEND = changedFiles.any { it.startsWith('frontend/') } ? 'true' : 'false'
}
}
}
stage('Build Payment Service') {
when { environment name: 'BUILD_PAYMENT', value: 'true' }
steps { sh './scripts/build.sh payment' }
}
Optimization 5 — Fail fast:
options {
parallelsAlwaysFailFast() // Abort parallel siblings on first failure
}
Result summary:
| Stage | Before | After | Technique |
|---|---|---|---|
| Unit + Integration + Sonar | 25 min | 10 min | Parallelized |
| Maven build | 8 min | 2 min | Dep. cache + -T 4 |
| Docker build | 6 min | 1 min | Layer cache + BuildKit |
| Unchanged services | 6 min | 0 min | Change detection |
| Total | 45 min | 13 min |
Section 8 — Troubleshooting & Debugging
What is it? The connection between the Jenkins controller and agent drops mid-build — the agent process died or the network timed out.
Diagnosis:
# Common log messages and their meaning:
# "Unexpected termination of the channel" → Agent JVM crashed (OOM)
# "java.net.SocketTimeoutException" → Network timeout (AWS NLB issue)
dmesg | grep -i "oom\|killed" # Check for OOM kills
journalctl -u jenkins-agent # Check agent service logs
Root causes and fixes:
Cause 1 — Agent JVM OOM killed:
# Increase heap in agent startup
java -Xmx2g -jar agent.jar -jnlpUrl ...
# Or in K8s pod template:
resources:
limits:
memory: "4Gi" # Must exceed JVM -Xmx
Cause 2 — AWS NLB idle timeout on long builds:
java -Dsun.net.client.defaultConnectTimeout=60000 \
-Dsun.net.client.defaultReadTimeout=60000 \
-jar agent.jar
Cause 3 — No output for extended periods (Jenkins assumes agent is dead):
stage('Long Running Build') {
options { timeout(time: 2, unit: 'HOURS') }
steps {
sh '''
# Print progress every 60 seconds to maintain heartbeat
while ./long-running-process.sh; do
echo "Still running at $(date)..."
sleep 60
done
'''
}
}
Technique 1 — Jenkins Script Console (fastest feedback loop):
// Manage Jenkins → Script Console
def testBranchToEnv(String branch) {
switch(branch) {
case 'main': return 'production'
case 'develop': return 'staging'
default: return 'dev'
}
}
['main', 'develop', 'feature/xyz', null].each { branch ->
try {
println "${branch} → ${testBranchToEnv(branch)}"
} catch(e) {
println "ERROR for '${branch}': ${e.message}"
}
}
Technique 2 — Add conditional debug logging to the shared library:
// vars/deployToKubernetes.groovy
def call(Map config) {
if (config.debug) {
echo "=== DEBUG: deployToKubernetes ==="
config.each { k, v -> echo " ${k}: ${v}" }
}
// ... rest of function
}
// Activate in Jenkinsfile:
deployToKubernetes(appName: 'myapp', debug: true)
Technique 3 — Replay Pipeline with inline edits:
- Open the failed build → click Replay → edit the Jenkinsfile inline → re-run.
- No git commit required. Ideal for quick iteration during debugging.
Technique 4 — Unit test shared libraries with JenkinsPipelineUnit:
class StandardPipelineTest extends BasePipelineTest {
@Test
void 'should deploy to staging on develop branch'() {
binding.setVariable('env', [BRANCH_NAME: 'develop', BUILD_NUMBER: '42'])
def script = loadScript('vars/standardPipeline.groovy')
script.call(appName: 'test-app')
assertJobStatusSuccess()
}
@Test
void 'should require approval on main branch'() {
binding.setVariable('env', [BRANCH_NAME: 'main'])
helper.registerAllowedMethod('input', [Map.class], {
return [ACTION: 'Deploy', TICKET: 'JIRA-123']
})
def script = loadScript('vars/standardPipeline.groovy')
script.call(appName: 'test-app')
assertJobStatusSuccess()
}
}
Section 9 — Docker & Kubernetes with Jenkins
Option 1 — Docker-in-Docker (DinD):
containers:
- name: docker
image: docker:24-dind
securityContext:
privileged: true # ⚠️ Grants root-level access to the host kernel
env:
- name: DOCKER_TLS_CERTDIR
value: ""
Option 2 — Docker socket mount:
containers:
- name: docker
image: docker:24-cli
volumeMounts:
- name: docker-sock
mountPath: /var/run/docker.sock
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
Option 3 — Kaniko (rootless, recommended for production):
pipeline {
agent {
kubernetes {
yaml """
spec:
containers:
- name: kaniko
image: gcr.io/kaniko-project/executor:debug
command: ['/busybox/sh', '-c', 'sleep infinity']
volumeMounts:
- name: kaniko-secret
mountPath: /kaniko/.docker
volumes:
- name: kaniko-secret
secret:
secretName: ecr-registry-secret
items:
- key: .dockerconfigjson
path: config.json
"""
}
}
stages {
stage('Build with Kaniko') {
steps {
container('kaniko') {
sh """
/kaniko/executor \
--context=dir://\${WORKSPACE} \
--dockerfile=Dockerfile \
--destination=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp:${BUILD_NUMBER} \
--cache=true \
--cache-repo=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp-cache
"""
}
}
}
}
}
Comparison:
| Method | Security | Speed | Recommended |
|---|---|---|---|
| DinD | Low (privileged) | Fast | Dev / test only |
| Socket mount | Medium (shares host daemon) | Fast | Internal clusters |
| Kaniko | High (rootless) | Medium | ✅ Production |
| Buildah | High (rootless) | Medium | ✅ Production |
Section 10 — Jenkins + AWS / Cloud Integrations
pipeline {
agent { label 'docker-agent' }
environment {
AWS_REGION = 'us-east-1'
ECR_REGISTRY = '123456789012.dkr.ecr.us-east-1.amazonaws.com'
APP_NAME = 'api-service'
ECS_CLUSTER = 'prod-cluster'
ECS_SERVICE = 'api-service-prod'
TASK_FAMILY = 'api-service-task'
LB_LISTENER_ARN = 'arn:aws:elasticloadbalancing:...'
TG_BLUE_ARN = 'arn:aws:elasticloadbalancing:...:targetgroup/blue'
TG_GREEN_ARN = 'arn:aws:elasticloadbalancing:...:targetgroup/green'
IMAGE_TAG = "${BUILD_NUMBER}-${GIT_COMMIT[0..7]}"
}
stages {
stage('Build & Push Image') {
steps {
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
sh """
aws ecr get-login-password | docker login --username AWS --password-stdin ${ECR_REGISTRY}
docker build -t ${ECR_REGISTRY}/${APP_NAME}:${IMAGE_TAG} .
docker push ${ECR_REGISTRY}/${APP_NAME}:${IMAGE_TAG}
"""
}
}
}
stage('Register New Task Definition') {
steps {
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
script {
def currentTaskDef = sh(
script: "aws ecs describe-task-definition --task-definition ${TASK_FAMILY} --query taskDefinition",
returnStdout: true
).trim()
def taskJson = readJSON text: currentTaskDef
taskJson.containerDefinitions[0].image = "${ECR_REGISTRY}/${APP_NAME}:${IMAGE_TAG}"
['taskDefinitionArn','revision','status','requiresAttributes',
'compatibilities','registeredAt','registeredBy'].each { taskJson.remove(it) }
writeJSON file: 'new-task-def.json', json: taskJson
def registerOutput = sh(
script: "aws ecs register-task-definition --cli-input-json file://new-task-def.json",
returnStdout: true
).trim()
env.NEW_TASK_DEF_ARN = (readJSON text: registerOutput).taskDefinition.taskDefinitionArn
}
}
}
}
stage('Determine Active / Standby Target Groups') {
steps {
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
script {
def activeTg = sh(
script: "aws elbv2 describe-rules --listener-arn ${LB_LISTENER_ARN} --query 'Rules[?Priority==`1`].Actions[0].TargetGroupArn' --output text",
returnStdout: true
).trim()
if (activeTg == TG_BLUE_ARN) {
env.ACTIVE_TG = TG_BLUE_ARN; env.STANDBY_TG = TG_GREEN_ARN
env.ACTIVE_COLOR = 'blue'; env.STANDBY_COLOR = 'green'
} else {
env.ACTIVE_TG = TG_GREEN_ARN; env.STANDBY_TG = TG_BLUE_ARN
env.ACTIVE_COLOR = 'green'; env.STANDBY_COLOR = 'blue'
}
echo "Active: ${env.ACTIVE_COLOR} → Deploying to: ${env.STANDBY_COLOR}"
}
}
}
}
stage('Deploy to Standby') {
steps {
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
sh """
aws ecs update-service \
--cluster ${ECS_CLUSTER} \
--service ${ECS_SERVICE}-${STANDBY_COLOR} \
--task-definition ${NEW_TASK_DEF_ARN} \
--force-new-deployment
aws ecs wait services-stable \
--cluster ${ECS_CLUSTER} \
--services ${ECS_SERVICE}-${STANDBY_COLOR}
"""
}
}
}
stage('Smoke Test Standby') {
steps {
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
script {
def testUrl = sh(script: "aws elbv2 describe-load-balancers --query 'LoadBalancers[0].DNSName' --output text", returnStdout: true).trim()
def response = sh(script: "curl -sf -o /dev/null -w '%{http_code}' http://${testUrl}:8080/health", returnStdout: true).trim()
if (response != '200') error("Smoke test failed with HTTP ${response}. NOT switching traffic.")
echo "✅ Smoke test passed on ${STANDBY_COLOR}"
}
}
}
}
stage('Switch Traffic') {
steps {
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
sh """
aws elbv2 modify-listener \
--listener-arn ${LB_LISTENER_ARN} \
--default-actions Type=forward,TargetGroupArn=${STANDBY_TG}
"""
echo "🔀 Traffic switched: ${ACTIVE_COLOR} → ${STANDBY_COLOR}"
}
}
}
stage('Verify Production') {
steps {
sleep(time: 60, unit: 'SECONDS')
withAWS(credentials: 'aws-deploy', region: AWS_REGION) {
script {
def errorRate = sh(
script: """aws cloudwatch get-metric-statistics \
--namespace AWS/ApplicationELB \
--metric-name HTTPCode_Target_5XX_Count \
--start-time \$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time \$(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 300 --statistics Sum \
--query 'Datapoints[0].Sum' --output text""",
returnStdout: true
).trim()
if (errorRate?.isDouble() && errorRate.toDouble() > 10) {
echo "❌ High error rate! Rolling back..."; rollback()
} else {
echo "✅ Production verified. Blue-Green deployment complete!"
}
}
}
}
}
}
}
def rollback() {
withAWS(credentials: 'aws-deploy', region: env.AWS_REGION) {
sh "aws elbv2 modify-listener --listener-arn ${env.LB_LISTENER_ARN} --default-actions Type=forward,TargetGroupArn=${env.ACTIVE_TG}"
echo "⏪ Rolled back to ${env.ACTIVE_COLOR}"
}
}
Section 11 — Notifications & Reporting
Centralized notification shared library (vars/notify.groovy):
def slack(Map config) {
def emoji = config.status == 'SUCCESS' ? '✅' : config.status == 'FAILURE' ? '❌' : '⚠️'
def color = config.status == 'SUCCESS' ? 'good' : 'danger'
def channel = config.channel ?: '#ci-cd-alerts'
slackSend(
channel: channel,
color: color,
blocks: [
[type: "header", text: [type: "plain_text", text: "${emoji} Build ${config.status}: ${config.appName}"]],
[type: "section", fields: [
[type: "mrkdwn", text: "*Branch:*\n`${env.BRANCH_NAME}`"],
[type: "mrkdwn", text: "*Build:*\n`#${env.BUILD_NUMBER}`"],
[type: "mrkdwn", text: "*Duration:*\n`${currentBuild.durationString}`"]
]],
[type: "actions", elements: [
[type: "button", text: [type: "plain_text", text: "View Build"], url: env.BUILD_URL, style: "primary"]
]]
]
)
}
def email(Map config) {
if (config.status == 'FAILURE' || currentBuild.previousBuild?.result == 'FAILURE') {
emailext(
to: config.recipients ?: '${DEFAULT_RECIPIENTS}',
subject: "[Jenkins] ${config.status}: ${config.appName} #${env.BUILD_NUMBER}",
body: '''${SCRIPT, template="managed:groovy-html.template"}''',
attachLog: config.status == 'FAILURE'
)
}
}
def pagerDuty(String serviceKey, String message) {
if (env.BRANCH_NAME == 'main') {
sh """
curl -X POST https://events.pagerduty.com/v2/enqueue \
-H 'Content-Type: application/json' \
-d '{
"routing_key": "${serviceKey}",
"event_action": "trigger",
"payload": {
"summary": "${message}",
"severity": "critical",
"source": "Jenkins Build #${env.BUILD_NUMBER}"
}
}'
"""
}
}
Usage in any Jenkinsfile:
@Library('shared-lib') _
pipeline {
// ... stages ...
post {
success { notify.slack(status: 'SUCCESS', appName: 'payment-service', channel: '#team-payments') }
failure {
notify.slack(status: 'FAILURE', appName: 'payment-service')
notify.email(status: 'FAILURE', appName: 'payment-service', recipients: 'team@company.com')
script {
if (env.BRANCH_NAME == 'main') {
notify.pagerDuty(credentials('pagerduty-key'), "Production deployment FAILED for payment-service")
}
}
}
}
}
Section 12 — Backup, Disaster Recovery & High Availability
Critical files inside $JENKINS_HOME to back up:
$JENKINS_HOME/
├── config.xml ← Main Jenkins configuration
├── credentials.xml ← Encrypted credentials
├── jobs/ ← All job configurations
├── plugins/ ← Installed plugins (*.jpi)
├── users/ ← User accounts
├── secrets/ ← Encryption master keys ⚠️ CRITICAL — without these, credentials.xml is unreadable
│ ├── master.key
│ └── hudson.util.Secret
└── nodes/ ← Agent configurations
Automated daily backup pipeline:
pipeline {
agent { label 'controller' }
triggers { cron('0 2 * * *') } // 2 AM daily
environment {
JENKINS_HOME = '/var/jenkins_home'
S3_BUCKET = 's3://company-jenkins-backups'
BACKUP_DATE = sh(script: 'date +%Y-%m-%d', returnStdout: true).trim()
RETENTION_DAYS = '30'
}
stages {
stage('Create Backup') {
steps {
sh """
tar -czf /tmp/jenkins-backup-${BACKUP_DATE}.tar.gz \
--exclude='${JENKINS_HOME}/workspace' \
--exclude='${JENKINS_HOME}/jobs/*/builds' \
--exclude='${JENKINS_HOME}/caches' \
-C / \
var/jenkins_home/config.xml \
var/jenkins_home/credentials.xml \
var/jenkins_home/jobs \
var/jenkins_home/plugins \
var/jenkins_home/users \
var/jenkins_home/secrets \
var/jenkins_home/nodes
"""
}
}
stage('Upload to S3') {
steps {
withAWS(credentials: 'aws-jenkins-backup', region: 'us-east-1') {
sh """
aws s3 cp /tmp/jenkins-backup-${BACKUP_DATE}.tar.gz \
${S3_BUCKET}/backups/jenkins-backup-${BACKUP_DATE}.tar.gz \
--sse aws:kms --storage-class STANDARD_IA
rm /tmp/jenkins-backup-${BACKUP_DATE}.tar.gz
"""
}
}
}
stage('Verify Backup') {
steps {
withAWS(credentials: 'aws-jenkins-backup', region: 'us-east-1') {
sh """
aws s3 ls ${S3_BUCKET}/backups/jenkins-backup-${BACKUP_DATE}.tar.gz
echo "✅ Backup verified"
"""
}
}
}
}
post {
failure {
slackSend channel: '#ops-alerts', color: 'danger', message: "❌ Jenkins backup FAILED! Immediate attention needed."
}
}
}
Restore procedure:
#!/bin/bash
BACKUP_DATE=${1:-$(date +%Y-%m-%d)}
S3_BUCKET="s3://company-jenkins-backups"
JENKINS_HOME="/var/jenkins_home"
systemctl stop jenkins
aws s3 cp ${S3_BUCKET}/backups/jenkins-backup-${BACKUP_DATE}.tar.gz /tmp/
tar -xzf /tmp/jenkins-backup-${BACKUP_DATE}.tar.gz -C /
chown -R jenkins:jenkins ${JENKINS_HOME}
systemctl start jenkins
echo "✅ Jenkins restored from ${BACKUP_DATE}"
echo "Monitor startup: journalctl -u jenkins -f"
Section 13 — Jenkins as Code (JCasC)
Complete jenkins.yaml example:
jenkins:
systemMessage: "🔧 Jenkins — Managed by JCasC | Do NOT edit manually"
numExecutors: 0 # No builds on controller
mode: EXCLUSIVE
quietPeriod: 5
securityRealm:
ldap:
configurations:
- server: "ldaps://ldap.company.com:636"
rootDN: "dc=company,dc=com"
userSearchBase: "ou=users"
groupSearchBase: "ou=groups"
managerDN: "cn=jenkins-svc,ou=service-accounts,dc=company,dc=com"
managerPasswordSecret: "${LDAP_BIND_PASSWORD}"
authorizationStrategy:
roleBased:
roles:
global:
- name: "admin"
permissions: ["Overall/Administer"]
assignments: ["devops-team"]
- name: "developer"
permissions: ["Overall/Read", "Job/Build", "Job/Read", "View/Read"]
assignments: ["all-engineers"]
clouds:
- kubernetes:
name: "eks-prod"
serverUrl: "https://eks-api.company.com"
credentialsId: "eks-service-account-token"
namespace: "jenkins-agents"
jenkinsUrl: "http://jenkins.jenkins-system.svc.cluster.local:8080"
jenkinsTunnel: "jenkins.jenkins-system.svc.cluster.local:50000"
podRetention: never
templates:
- name: "standard-agent"
label: "docker-agent"
namespace: "jenkins-agents"
serviceAccount: "jenkins-agent"
idleMinutes: 5
containers:
- name: "jnlp"
image: "jenkins/inbound-agent:latest"
resourceRequestCpu: "500m"
resourceLimitCpu: "1"
resourceLimitMemory: "2Gi"
- name: "docker"
image: "docker:24-dind"
privileged: true
resourceLimitCpu: "2"
resourceLimitMemory: "4Gi"
unclassified:
globalLibraries:
libraries:
- name: "company-shared-lib"
retriever:
modernSCM:
scm:
git:
remote: "git@github.com:company/jenkins-shared-library.git"
credentialsId: "github-deploy-key"
defaultVersion: "main"
slackNotifier:
teamDomain: "company"
tokenCredentialId: "slack-bot-token"
room: "#ci-cd-alerts"
sonarGlobalConfiguration:
installations:
- name: "SonarQube"
serverUrl: "http://sonarqube.company.com"
credentialsId: "sonarqube-token"
credentials:
system:
domainCredentials:
- credentials:
- usernamePassword:
id: "aws-ecr-credentials"
username: "AWS"
password: "${ECR_PASSWORD}"
- sshUserPrivateKey:
id: "github-deploy-key"
username: "git"
privateKeySource:
directEntry:
privateKey: "${GITHUB_SSH_PRIVATE_KEY}"
- string:
id: "slack-bot-token"
secret: "${SLACK_BOT_TOKEN}"
Deploy JCasC via Helm:
helm upgrade jenkins jenkins/jenkins \
--set controller.JCasC.configScripts.jenkins-config="$(cat jenkins.yaml)" \
--set controller.adminPassword="${JENKINS_ADMIN_PASSWORD}" \
--namespace jenkins-system
Key benefit: The entire Jenkins platform — security, agents, credentials, plugins, shared libraries — is version-controlled, reviewable, and reproducible from scratch in minutes.
Section 14 — Advanced Pipeline Patterns
pipeline {
agent none // Each matrix cell provisions its own agent
stages {
stage('Matrix Build') {
matrix {
axes {
axis {
name 'JDK_VERSION'
values '11', '17', '21'
}
axis {
name 'OS'
values 'linux', 'windows'
}
}
// Exclude unsupported combinations
excludes {
exclude {
axis { name 'OS'; values 'windows' }
axis { name 'JDK_VERSION'; values '11' }
}
}
agent {
docker {
image "eclipse-temurin:${JDK_VERSION}-jdk"
label "${OS}-docker-agent"
}
}
stages {
stage('Build') {
steps {
sh "java -version"
sh "mvn clean package -DskipTests -T 2"
}
}
stage('Test') {
steps {
sh "mvn test -T 2"
}
post {
always { junit "target/surefire-reports/*.xml" }
}
}
}
}
}
}
}
Resulting matrix — 5 parallel cells (Windows/JDK11 excluded):
| JDK 11 | JDK 17 | JDK 21 | |
|---|---|---|---|
| Linux | ✅ | ✅ | ✅ |
| Windows | ❌ excluded | ✅ | ✅ |
pipeline {
agent { label 'controller' }
stages {
stage('Deploy All Services') {
steps {
script {
def services = [
'auth-service', 'payment-service', 'order-service', 'inventory-service',
'notification-service', 'user-service', 'product-service', 'cart-service',
'shipping-service', 'analytics-service', 'recommendation-service', 'search-service',
'review-service', 'coupon-service', 'loyalty-service', 'fraud-service',
'reporting-service', 'admin-service', 'webhook-service', 'email-service'
]
def batchSize = 5
def batches = services.collate(batchSize)
def failedServices = []
batches.eachWithIndex { batch, idx ->
echo "🚀 Deploying batch ${idx + 1}/${batches.size()}: ${batch}"
def parallelDeployments = ['failFast': false]
batch.each { service ->
parallelDeployments[service] = {
stage("Deploy ${service}") {
try {
deployService(service, env.IMAGE_TAG)
echo "✅ ${service} deployed"
} catch (e) {
echo "❌ ${service} failed: ${e.message}"
failedServices << service
// Do not throw — let remaining batch members continue
}
}
}
}
parallel parallelDeployments
echo "Batch ${idx + 1} complete. Waiting 10s before next batch..."
sleep(10)
}
if (failedServices) {
error("The following services failed: ${failedServices.join(', ')}")
}
}
}
}
}
}
def deployService(String service, String tag) {
sh """
helm upgrade --install ${service} ./charts/${service} \
--namespace production \
--set image.tag=${tag} \
--wait --timeout 5m
"""
}
Deployment flow:
Batch 1 (5 services) → all in parallel → wait 10s
Batch 2 (5 services) → all in parallel → wait 10s
Batch 3 (5 services) → all in parallel → wait 10s
Batch 4 (5 services) → all in parallel → done
Platform Architecture:
┌─────────────────────────────────────────────────────────────────┐
│ DEVELOPER EXPERIENCE LAYER │
│ GitHub Enterprise → PR triggers → Webhook → Multi-Branch │
└──────────────────────────┬──────────────────────────────────────┘
│
┌──────────────────────────▼──────────────────────────────────────┐
│ JENKINS PLATFORM (EKS Hosted, HA) │
│ ┌─────────────────────┐ ┌─────────────────────────────────┐ │
│ │ Controller (Active) │ │ Controller (EFS Warm Standby) │ │
│ └──────────┬──────────┘ └─────────────────────────────────┘ │
│ │ │
│ ┌──────────▼───────────────────────────────────────────────┐ │
│ │ Dynamic K8s Agent Pool (Karpenter auto-scale) │ │
│ │ standard-agent │ heavy-build │ docker-agent │ gpu-agent │ │
│ └──────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│ │ │ │
┌──────▼──┐ ┌──────▼──┐ ┌─────▼──┐ ┌─────▼──────┐
│SonarQube│ │ Nexus │ │ Trivy │ │ Vault │
│ (SAST) │ │(Artifacts│ │ (CSPM) │ │ (Secrets) │
└─────────┘ └─────────┘ └────────┘ └────────────┘
│
┌──────▼─────────────────────────────────────────────────────┐
│ DEPLOYMENT TARGETS │
│ Dev EKS → Staging EKS → UAT EKS → Production EKS (Multi-AZ)│
└────────────────────────────────────────────────────────────┘
Platform components:
| Component | Technology | Purpose |
|---|---|---|
| Source Control | GitHub Enterprise | PR-based workflow, branch protection |
| CI Engine | Jenkins (HA on EKS) | Pipeline orchestration |
| Artifact Registry | ECR + Nexus | Docker images + Maven/npm packages |
| SAST | SonarQube | Code quality + security analysis |
| DAST | OWASP ZAP | Runtime security scanning |
| Container Scan | Trivy | CVE vulnerability scanning |
| Secrets | HashiCorp Vault | Zero-trust secret management |
| IaC | Terraform + Atlantis | Infrastructure provisioning |
| GitOps | ArgoCD | Kubernetes deployment reconciliation |
| Monitoring | Prometheus + Grafana | Build metrics and dashboards |
| Notifications | Slack + PagerDuty | Alerts and on-call escalation |
Governance rules (enforced by shared library — non-negotiable):
- PR → feature branch (unit test, lint, SAST) — required to pass before merge.
- Merge to develop → staging deploy with integration tests.
- Release cut → UAT deploy with performance tests and approval gate.
- Tag push → production deploy via blue/green with CloudWatch monitoring gates.
All configurations are GitOps-managed:
- Jenkinsfiles versioned in each repository.
- JCasC YAML in a dedicated
jenkins-platformrepo. - Shared library versioned and tagged in
jenkins-shared-libraryrepo. - Zero manual UI configuration — everything is reproducible from git.
Quick Reference — Common Jenkins Pipeline Patterns
// ── Credentials binding ──────────────────────────────────────────────────
withCredentials([
usernamePassword(credentialsId: 'my-cred', usernameVariable: 'USER', passwordVariable: 'PASS'),
sshUserPrivateKey(credentialsId: 'ssh-key', keyFileVariable: 'KEY_FILE'),
string(credentialsId: 'api-token', variable: 'TOKEN'),
file(credentialsId: 'kubeconfig', variable: 'KUBECONFIG')
]) {
sh 'curl -u $USER:$PASS https://api.example.com'
}
// ── Parallel stages with fail-fast ──────────────────────────────────────
stage('Tests') {
parallel {
stage('Unit') { steps { sh 'npm test' } }
stage('E2E') { steps { sh 'npx cypress run' } }
stage('Performance') { steps { sh 'k6 run load-test.js' } }
}
}
// ── Retry with timeout ──────────────────────────────────────────────────
stage('Deploy') {
options {
retry(3)
timeout(time: 10, unit: 'MINUTES')
}
steps { sh './deploy.sh' }
}
// ── Conditional execution ───────────────────────────────────────────────
when {
allOf {
branch 'main'
not { changeRequest() }
environment name: 'SKIP_DEPLOY', value: 'false'
}
}
// ── Stash / unstash across agents ───────────────────────────────────────
stage('Build') {
steps {
sh 'npm run build'
stash includes: 'dist/**', name: 'frontend-dist'
}
}
stage('Deploy') {
agent { label 'deploy-agent' }
steps {
unstash 'frontend-dist'
sh 'aws s3 sync dist/ s3://my-bucket/'
}
}
// ── Pipeline parameters ─────────────────────────────────────────────────
parameters {
string(name: 'IMAGE_TAG', defaultValue: 'latest', description: 'Docker tag to deploy')
choice(name: 'ENV', choices: ['dev', 'staging', 'prod'], description: 'Target environment')
booleanParam(name: 'SKIP_TESTS', defaultValue: false, description: 'Skip test stages')
password(name: 'SECRET', description: 'Deployment secret (masked in logs)')
}
Senior DevOps Engineer Mindset
- Security first — Credentials in the store, IAM roles over access keys, RBAC enforced everywhere.
- Pipeline as code — Nothing configured through the UI; everything committed to git.
- Shared libraries — Eliminate duplication across 50+ Jenkinsfiles with a standardized shared pipeline.
- Observability — Build metrics in Prometheus/Grafana; Slack and PagerDuty for alerts.
- Scalability — Dynamic K8s agents via Karpenter; never a fixed pool of static slaves.
- Fail fast — Parallelize everything,
parallelsAlwaysFailFast(), skip unchanged services. - DR awareness — Know how to back up, restore, and failover Jenkins in under 30 minutes.
- JCasC — Jenkins config in git means reproducible, auditable, peer-reviewed infrastructure.
- Idempotent pipelines — Every stage is safe to re-run without unintended side effects.
- Developer experience — Fast feedback loops, clear error messages, self-service deployments.
Last updated: 2026 | Questions sourced from real interviews at Atlassian, Spotify, Intuit, Twilio, LinkedIn, and similar organizations.
⭐ Pair with the Git Scenario Q&A and AWS README for complete Senior DevOps interview preparation.
Add More Questions to This Guide
Know questions that should be here? Share them and help the community!
Open Google Form