Kubernetes Best Practices

Production-ready patterns and practices for building reliable, scalable, and secure Kubernetes applications.

Application Design Best Practices

The Twelve-Factor App

  • Codebase: One codebase tracked in version control
  • Dependencies: Explicitly declare and isolate dependencies
  • Config: Store config in environment variables (see the sketch after this list)
  • Backing Services: Treat as attached resources
  • Build, Release, Run: Strictly separate stages
  • Processes: Execute app as stateless processes
  • Port Binding: Export services via port binding
  • Concurrency: Scale out via process model
  • Disposability: Fast startup and graceful shutdown
  • Dev/Prod Parity: Keep environments similar
  • Logs: Treat logs as event streams
  • Admin Processes: Run admin tasks as one-off processes
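
Factor III in particular maps directly onto Kubernetes: configuration lives in the environment, not in the image. A minimal sketch (the image and variable names are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: twelve-factor-app
spec:
  containers:
  - name: app
    image: myapp:1.2.3  # hypothetical image
    ports:
    - containerPort: 8080  # factor VII: export the service via port binding
    env:
    # Factor III: config comes from the environment, not the image
    - name: LOG_LEVEL
      value: "info"
    - name: DATABASE_HOST
      value: "postgres.default.svc.cluster.local"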

Container Best Practices

# Multi-stage Dockerfile for optimal image size
# Build stage
FROM golang:1.19-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o main .

# Final stage
FROM alpine:3.17
RUN apk --no-cache add ca-certificates
WORKDIR /app

# Create non-root user
RUN addgroup -g 1000 -S appgroup && \
    adduser -u 1000 -S appuser -G appgroup

# Copy binary from builder, owned by the non-root user
COPY --from=builder --chown=appuser:appgroup /app/main .

USER appuser
EXPOSE 8080
CMD ["./main"]

Resource Management

Resource Requests and Limits

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: app
        image: myapp:latest  # example only; pin an immutable tag in production
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        # Start from measured usage and refine these values with the
        # Vertical Pod Autoscaler's recommendations (sketch below)
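
Right-sizing is easier with the Vertical Pod Autoscaler running in recommendation-only mode. A minimal sketch, assuming the VPA components are installed in the cluster:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  updatePolicy:
    updateMode: "Off"  # report recommendations without evicting pods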

Quality of Service Classes

QoS Class  | Criteria                            | Priority | Use Case
Guaranteed | requests = limits for all resources | Highest  | Critical workloads
Burstable  | requests < limits                   | Medium   | Most applications
BestEffort | No requests or limits               | Lowest   | Non-critical batch
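
Setting requests equal to limits for every container yields the Guaranteed class, which is evicted last under node pressure. A minimal sketch:

apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"  # equal to requests => Guaranteed QoS
        cpu: "500m"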

Horizontal Pod Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-deployment
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 30
      - type: Pods
        value: 4
        periodSeconds: 60

Health Checks and Lifecycle

Probes Configuration

apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    ports:
    - containerPort: 8080
    
    # Startup probe for slow-starting containers
    startupProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 30
    
    # Liveness probe to restart unhealthy containers
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10
      timeoutSeconds: 5
      successThreshold: 1
      failureThreshold: 3
    
    # Readiness probe to control traffic routing
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      timeoutSeconds: 1
      successThreshold: 1
      failureThreshold: 3
    
    # Graceful shutdown
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 15"]
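  # The pod-level grace period (default 30s) must cover the preStop sleep
  # plus the application's own shutdown time
  terminationGracePeriodSeconds: 60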

Graceful Shutdown Pattern

// Go example for graceful shutdown
package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8080"}
    
    // Handle shutdown signals
    done := make(chan bool, 1)
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
    
    go func() {
        <-quit
        log.Println("Server is shutting down...")
        
        // Give k8s time to stop routing traffic
        time.Sleep(5 * time.Second)
        
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        
        srv.SetKeepAlivesEnabled(false)
        if err := srv.Shutdown(ctx); err != nil {
            log.Fatalf("Could not gracefully shutdown: %v\n", err)
        }
        close(done)
    }()
    
    log.Println("Server is ready to handle requests at :8080")
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        log.Fatalf("Could not listen on :8080: %v\n", err)
    }
    
    <-done
    log.Println("Server stopped")
}
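
Note that the 5-second drain delay plus the 30-second shutdown timeout above must fit within the pod's terminationGracePeriodSeconds; once the grace period expires, the kubelet sends SIGKILL regardless of in-flight requests.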

Configuration Management

ConfigMaps and Secrets

# ConfigMap for non-sensitive configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_host: "postgres.default.svc.cluster.local"
  log_level: "info"
  app.properties: |
    server.port=8080
    cache.size=1000

---
# Secret for sensitive data
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  database_password: cGFzc3dvcmQxMjM=  # base64 encoded, not encrypted
  api_key: YWJjZGVmZ2hpams=

---
# Using ConfigMap and Secret in Pod
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    env:
    - name: DATABASE_HOST
      valueFrom:
        configMapKeyRef:
          name: app-config
          key: database_host
    - name: DATABASE_PASSWORD
      valueFrom:
        secretKeyRef:
          name: app-secrets
          key: database_password
    volumeMounts:
    - name: config
      mountPath: /etc/config
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: app-config

External Secrets Management

# Using Sealed Secrets
# Install sealed-secrets controller
kubectl apply -f https://github.com/bitnami-labs/sealed-secrets/releases/download/v0.18.0/controller.yaml

# Create sealed secret
echo -n mypassword | \
  kubectl create secret generic mysecret \
    --dry-run=client --from-file=password=/dev/stdin -o yaml | \
  kubeseal -o yaml > mysealedsecret.yaml

# Using External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "https://vault.example.com:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "demo"

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 15s
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: app-secrets
  data:
  - secretKey: password
    remoteRef:
      key: secret/data/database
      property: password

Networking Best Practices

Service Types and Use Cases

Service Type | Use Case                   | Access
ClusterIP    | Internal services          | Cluster internal only
NodePort     | External access (dev/test) | Node IP:NodePort
LoadBalancer | External production access | External IP
ExternalName | External service mapping   | DNS CNAME
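
ClusterIP is the default and covers most service-to-service traffic. A minimal sketch:

apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: ClusterIP  # the default; stated here for clarity
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080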

Network Policies

# Zero-trust network policy
# Default deny all
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

---
# Allow specific ingress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # set automatically on every namespace since Kubernetes 1.22
          kubernetes.io/metadata.name: production
      podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080

---
# Allow DNS egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP  # DNS falls back to TCP for large responses
      port: 53

Security Best Practices

Pod Security Standards

# Restricted security context
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    fsGroup: 10001
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
        add:
        - NET_BIND_SERVICE
    volumeMounts:
    - name: tmp
      mountPath: /tmp
    - name: cache
      mountPath: /app/cache
  volumes:
  - name: tmp
    emptyDir: {}
  - name: cache
    emptyDir: {}
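
The same Restricted profile can be enforced cluster-side by the built-in Pod Security admission controller, using namespace labels:

apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted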

RBAC Best Practices

# Principle of least privilege
# Developer role with limited permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer
  namespace: development
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods/log", "pods/exec"]
  verbs: ["get", "create"]

---
# Bind to specific users
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
- kind: User
  name: [email protected]
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io

Observability

Structured Logging

// Structured logging example
{
  "timestamp": "2023-12-01T10:30:45.123Z",
  "level": "info",
  "message": "Request processed",
  "service": "api",
  "version": "1.2.3",
  "trace_id": "abc123",
  "span_id": "def456",
  "user_id": "user789",
  "method": "POST",
  "path": "/api/orders",
  "duration_ms": 145,
  "status_code": 200
}

Distributed Tracing

# OpenTelemetry configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
    exporters:
      jaeger:
        endpoint: jaeger-collector:14250
        tls:
          insecure: true
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [jaeger]
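
One way to run the collector is a Deployment that mounts this ConfigMap; the image tag below is an assumption, so pin whatever version you have validated:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: collector
        image: otel/opentelemetry-collector-contrib:0.88.0  # assumed tag
        args: ["--config=/etc/otel/config.yaml"]
        ports:
        - containerPort: 4317  # OTLP gRPC
        - containerPort: 4318  # OTLP HTTP
        volumeMounts:
        - name: config
          mountPath: /etc/otel
      volumes:
      - name: config
        configMap:
          name: otel-collector-config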

Multi-Tenancy

Namespace Isolation

# Resource quotas per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"

---
# Limit ranges
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-limits
  namespace: tenant-a
spec:
  limits:
  - max:
      cpu: "2"
      memory: 4Gi
    min:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 250m
      memory: 256Mi
    type: Container

GitOps and CI/CD

GitOps Principles

  • Declarative: Entire system described declaratively
  • Versioned: Canonical desired state in Git
  • Automated: Approved changes automatically applied
  • Observable: Easy to see what's deployed
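
These principles are typically implemented by a GitOps controller such as Argo CD or Flux. A minimal Argo CD Application sketch (the repository URL and paths are illustrative):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/app-manifests  # hypothetical repo
    targetRevision: main
    path: overlays/production
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true     # remove resources deleted from Git
      selfHeal: true  # revert drift back to the Git state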

Deployment Strategies

# Blue-Green Deployment
apiVersion: v1
kind: Service
metadata:
  name: app-service
spec:
  selector:
    app: myapp
    version: green  # Switch between blue/green
  ports:
  - port: 80
    targetPort: 8080

---
# Canary Deployment with Flagger
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: app-canary
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  progressDeadlineSeconds: 60
  service:
    port: 80
    targetPort: 8080
  analysis:
    interval: 30s
    threshold: 5
    maxWeight: 50
    stepWeight: 10
    metrics:
    - name: request-success-rate
      threshold: 99
      interval: 1m
    - name: request-duration
      threshold: 500
      interval: 30s

Disaster Recovery

Backup Strategy

  • Regular etcd backups
  • Persistent volume snapshots
  • Application state backups (see the Velero sketch after this list)
  • Configuration management in Git
  • Cross-region replication
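
Tools such as Velero can cover several of these items in one schedule. A minimal sketch, assuming Velero is installed in the velero namespace:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-production-backup
  namespace: velero
spec:
  schedule: "0 2 * * *"      # daily at 02:00
  template:
    includedNamespaces:
    - production
    snapshotVolumes: true    # take PV snapshots alongside resource backups
    ttl: 720h                # retain for 30 days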

Multi-Cluster Setup

# Federation configuration
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: app-deployment
  namespace: production
spec:
  template:
    metadata:
      labels:
        app: myapp
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: app
            image: myapp:latest
  placement:
    clusters:
    - name: us-east-1
    - name: us-west-2
    - name: eu-central-1
  overrides:
  - clusterName: us-east-1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 3
  - clusterName: us-west-2
    clusterOverrides:
    - path: "/spec/replicas"
      value: 2
  - clusterName: eu-central-1
    clusterOverrides:
    - path: "/spec/replicas"
      value: 1

Cost Optimization

Resource Optimization

  • Right-size resource requests/limits using VPA
  • Use spot/preemptible instances for non-critical workloads
  • Implement pod disruption budgets (see the sketch after this list)
  • Schedule batch jobs during off-peak hours
  • Use cluster autoscaler efficiently
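
A PodDisruptionBudget keeps voluntary disruptions, such as node drains during autoscaler scale-down, from taking too many replicas offline at once. A minimal sketch:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  minAvailable: 2  # never voluntarily evict below 2 ready pods
  selector:
    matchLabels:
      app: myapp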

Cost Monitoring

# Kubecost labels for cost allocation
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-deployment
  labels:
    app: myapp
    team: platform
    environment: production
    cost-center: engineering
spec:
  template:
    metadata:
      labels:
        app: myapp
        team: platform
        environment: production
        cost-center: engineering
  # selector and container spec omitted for brevity

Checklist Summary

  • □ Resource requests and limits defined
  • □ Health checks configured
  • □ Security context applied
  • □ Network policies implemented
  • □ Monitoring and logging in place
  • □ Backup strategy defined
  • □ CI/CD pipeline integrated
  • □ Documentation maintained