Kubernetes Deployment Guide
Prerequisites
Before deploying Kubernetes, ensure you meet these requirements:
| Component | Minimum | Recommended | Production |
|---|---|---|---|
| Master Nodes | 1 node, 2 CPU, 4GB RAM | 3 nodes, 4 CPU, 8GB RAM | 3+ nodes, 8 CPU, 16GB RAM |
| Worker Nodes | 1 node, 2 CPU, 4GB RAM | 3 nodes, 4 CPU, 16GB RAM | 5+ nodes, 16 CPU, 64GB RAM |
| Storage | 50GB per node | 100GB SSD per node | 500GB+ NVMe per node |
| Network | 1 Gbps | 10 Gbps | 25 Gbps+ |
| Load Balancer | Optional | Required | HA Load Balancer |
Software Requirements
- Linux OS (Ubuntu 20.04+, CentOS 8+, or RHEL 8+)
- Container runtime (containerd 1.6+ or CRI-O 1.24+)
- kubectl CLI tool (matching cluster version)
- Network connectivity between all nodes
- Swap disabled on all nodes (a quick verification sketch follows below)
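These checks can be scripted before you begin. Below is a minimal verification sketch, assuming Ubuntu hosts with systemd; the `k8s-api.example.com:6443` endpoint is the control-plane address used later in this guide.

#!/bin/bash
# Illustrative prerequisite check; run on each node before installing Kubernetes.
set -u

# Swap must be disabled or the kubelet will refuse to start with default settings
if [ "$(swapon --show | wc -l)" -gt 0 ]; then
  echo "FAIL: swap is enabled"
else
  echo "OK: swap is disabled"
fi

# A supported container runtime should be installed and running
if systemctl is-active --quiet containerd || systemctl is-active --quiet crio; then
  echo "OK: container runtime is active"
else
  echo "FAIL: no container runtime (containerd/CRI-O) running"
fi

# Hostname and product_uuid must be unique across all nodes
echo "hostname: $(hostname)"
echo "product_uuid: $(sudo cat /sys/class/dmi/id/product_uuid)"

# The API server port must be reachable between nodes once the cluster is up
nc -zvw3 k8s-api.example.com 6443 || echo "WARN: API endpoint not reachable yet (expected before init)"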
Deployment Options
Choose the deployment method that best fits your requirements. Managed services such as Amazon EKS, Google GKE, and Azure AKS offload control-plane operations, while kubeadm gives you full control over a self-managed cluster; the remainder of this guide follows the kubeadm path.
Cluster Setup with kubeadm
This guide demonstrates setting up a production Kubernetes cluster using kubeadm.
Step 1: Prepare All Nodes
#!/bin/bash
# Run on all nodes (master and workers)

# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install required packages
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

# Add Kubernetes repository (legacy apt.kubernetes.io repo; newer installs use the pkgs.k8s.io community repo)
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo add-apt-repository "deb https://apt.kubernetes.io/ kubernetes-xenial main"

# Install Kubernetes components and pin their versions
sudo apt-get update
sudo apt-get install -y kubelet=1.27.0-00 kubeadm=1.27.0-00 kubectl=1.27.0-00
sudo apt-mark hold kubelet kubeadm kubectl

# Install containerd
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Configure containerd for systemd cgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo systemctl restart containerd

# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load kernel modules required by container networking
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Apply the sysctl settings Kubernetes networking depends on
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
Step 2: Initialize Master Node
#!/bin/bash
# Run only on the first master node

# Initialize cluster with custom configuration
sudo kubeadm init \
  --control-plane-endpoint="k8s-api.example.com:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Configure kubectl for the admin user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Save join commands
kubeadm token create --print-join-command > worker-join-command.sh
kubeadm init phase upload-certs --upload-certs > control-plane-join-command.sh
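After initialization, each worker joins the cluster by running the saved join command; additional control-plane nodes join similarly using the control-plane join command together with the uploaded certificate key. A minimal sketch, with placeholder hostnames:

# On the first master: copy the saved join command to each worker (worker-1 is a placeholder hostname)
scp worker-join-command.sh admin@worker-1:/tmp/

# On each worker: run the join command as root
sudo bash /tmp/worker-join-command.sh

# Back on the master: confirm the node registers and becomes Ready once the CNI is installed
kubectl get nodes -o wide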
Step 3: Configure High Availability (Optional)
┌─────────────────────────────────────────────────────┐
│ Load Balancer (HAProxy/NGINX) │
│ k8s-api.example.com:6443 │
└────────────┬────────────┬────────────┬──────────────┘
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Master 1 │ │Master 2 │ │Master 3 │
│ etcd │ │ etcd │ │ etcd │
│ API │ │ API │ │ API │
│ Sched │ │ Sched │ │ Sched │
│ CM │ │ CM │ │ CM │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
┌───────┴────────────┴────────────┴───────┐
│ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Worker 1 │ │Worker 2 │ │Worker 3 │ │Worker N │
│ kubelet │ │ kubelet │ │ kubelet │ │ kubelet │
│ kube- │ │ kube- │ │ kube- │ │ kube- │
│ proxy │ │ proxy │ │ proxy │ │ proxy │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
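The load balancer in front of the API servers can be anything that does TCP pass-through on port 6443. Below is a hedged HAProxy sketch for the k8s-api.example.com endpoint; the 10.0.0.x master addresses and the timeouts are placeholders, not values from this guide.

# Illustrative HAProxy configuration for the control-plane endpoint (TCP mode, no TLS termination).
cat <<'EOF' | sudo tee /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 10s
    timeout client  1m
    timeout server  1m

frontend k8s-api
    bind *:6443
    default_backend k8s-masters

backend k8s-masters
    option tcp-check
    balance roundrobin
    server master-1 10.0.0.11:6443 check
    server master-2 10.0.0.12:6443 check
    server master-3 10.0.0.13:6443 check
EOF
sudo systemctl restart haproxy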
Networking Configuration
Kubernetes requires a Container Network Interface (CNI) plugin for pod networking.
Install Calico CNI
# Download and customize Calico manifests
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/tigera-operator.yaml -O
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/custom-resources.yaml -O

# Modify CIDR in custom-resources.yaml to match pod-network-cidr
sed -i 's/cidr: 192.168.0.0\/16/cidr: 10.244.0.0\/16/g' custom-resources.yaml

# Apply Calico
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml

# Verify installation
kubectl get pods -n calico-system
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-allow-http
namespace: production
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: database
ports:
- protocol: TCP
port: 5432
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
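Pods that are not selected by any NetworkPolicy accept all traffic, so allow-rules like the one above are usually paired with a baseline default-deny policy for the namespace. A minimal sketch for the production namespace:

# Baseline default-deny for both ingress and egress in the production namespace.
# Applied alongside the allow-policy above; pods without a matching allow rule lose connectivity.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF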
Storage Configuration
Configure persistent storage for stateful applications.
Storage Classes
---
# Fast SSD storage for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com  # EBS CSI driver; the in-tree kubernetes.io/aws-ebs provisioner does not support gp3 iops/throughput
parameters:
  type: gp3
  iops: "10000"
  throughput: "250"
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Standard storage for general use
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Shared storage for multi-pod access
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: efs-shared
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: fs-12345678
directoryPerms: "700"
mountOptions:
- tls
- iam
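A PersistentVolumeClaim referencing one of these classes might look like the sketch below; the 100Gi size is illustrative, and the claim name matches the database-pvc used by the snapshot example that follows. With WaitForFirstConsumer, the volume is not provisioned until a pod using the claim is scheduled.

# Illustrative PVC against the fast-ssd class; provisioning waits for the first consuming pod.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
EOF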
Volume Snapshots
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Retain
parameters:
encrypted: "true"
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: database-backup-20240108
spec:
volumeSnapshotClassName: csi-aws-vsc
source:
persistentVolumeClaimName: database-pvc
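Restoring works by creating a new PVC whose dataSource points at the VolumeSnapshot. A hedged sketch; the restored claim name is illustrative and the requested size must be at least the size of the snapshot's source volume.

# Restore the snapshot into a new PVC; the CSI driver provisions a volume pre-populated with the snapshot data.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc-restored
spec:
  storageClassName: fast-ssd
  dataSource:
    name: database-backup-20240108
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF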
Security Hardening
Implement security best practices for production clusters.
RBAC Configuration
---
# Developer role with limited permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "services", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods/log", "pods/exec"]
  verbs: ["get", "list"]
---
# Bind role to user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
- kind: User
  name: [email protected]
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
---
# Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
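You can verify the binding by impersonating the user with kubectl auth can-i. A quick sketch; the placeholder must match the subject name in the RoleBinding above.

# Expect "yes" for verbs granted by the developer Role and "no" for everything else
kubectl auth can-i create deployments --namespace development --as "<developer-email>"
kubectl auth can-i delete pods --namespace development --as "<developer-email>"
kubectl auth can-i list secrets --namespace development --as "<developer-email>"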
Security Policies
- Enable audit logging for all API requests
- Enforce Pod Security Standards through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- Use service accounts with minimal permissions
- Enable encryption at rest for Secrets stored in etcd (see the sketch after this list)
- Regularly rotate certificates and secrets
- Implement network policies for all namespaces
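For the encryption-at-rest item above, a hedged sketch of an EncryptionConfiguration follows. The generated key is an example value and the /etc/kubernetes/enc path is an assumption; the kube-apiserver must be restarted with --encryption-provider-config pointing at the file, and the directory mounted into its static pod.

# Generate a 32-byte key and write an EncryptionConfiguration covering Secrets.
# Keep the file readable only by root; existing Secrets are re-encrypted only when rewritten.
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
sudo mkdir -p /etc/kubernetes/enc
cat <<EOF | sudo tee /etc/kubernetes/enc/enc.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_KEY}
      - identity: {}
EOF
# Then add to the kube-apiserver manifest:
#   --encryption-provider-config=/etc/kubernetes/enc/enc.yaml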
Monitoring and Observability
Deploy comprehensive monitoring for your Kubernetes cluster.
Prometheus Stack Installation
#!/bin/bash
# Install Prometheus Operator using Helm

# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=fast-ssd \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=standard \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Install Loki for log aggregation
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=fast-ssd \
  --set loki.persistence.size=50Gi
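Once the charts are installed, you can confirm the stack is running and reach Grafana locally. The sketch below assumes the chart's default resource names for a release called prometheus (service and secret named prometheus-grafana).

# Verify the monitoring components are up
kubectl get pods -n monitoring

# Port-forward Grafana locally (default service name for this release)
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &

# Retrieve the generated Grafana admin password (default secret created by the chart)
kubectl get secret -n monitoring prometheus-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo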
Custom Metrics and Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: kubernetes-apps
namespace: monitoring
spec:
groups:
- name: kubernetes-apps
interval: 30s
rules:
- alert: PodCrashLooping
expr: |
rate(kube_pod_container_status_restarts_total[5m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
- alert: HighMemoryUsage
expr: |
(container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.container }} memory usage above 90%"
- alert: PersistentVolumeSpaceLow
expr: |
(kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) < 0.1
for: 5m
labels:
severity: critical
annotations:
summary: "PV {{ $labels.persistentvolumeclaim }} has less than 10% free space"
Deploying Applications
Best practices for deploying production applications on Kubernetes.
Production-Ready Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
labels:
app: api
version: v1.0.0
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
version: v1.0.0
spec:
serviceAccountName: api-service-account
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: api
image: myregistry.com/api:v1.0.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
name: http
protocol: TCP
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: api-secrets
key: database-url
- name: LOG_LEVEL
value: "info"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
volumeMounts:
- name: config
mountPath: /etc/api
readOnly: true
- name: cache
mountPath: /var/cache/api
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumes:
- name: config
configMap:
name: api-config
- name: cache
emptyDir:
medium: Memory
sizeLimit: 1Gi
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- api
              topologyKey: topology.kubernetes.io/zone
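With three replicas and maxUnavailable: 0 during rollouts, it also makes sense to protect the deployment against voluntary disruptions such as node drains and autoscaler scale-down. A minimal PodDisruptionBudget sketch:

# Keep at least two api pods running during voluntary disruptions (drains, upgrades, scale-down).
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
EOF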
Service and Ingress
---
apiVersion: v1
kind: Service
metadata:
name: api-service
namespace: production
labels:
app: api
spec:
type: ClusterIP
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app: api
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api-ingress
namespace: production
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.example.com
secretName: api-tls
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
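The Ingress above references a letsencrypt-prod ClusterIssuer that is not defined elsewhere in this guide. A hedged sketch follows, assuming cert-manager is already installed and HTTP-01 challenges can be served through the nginx ingress class; the contact email is a placeholder.

# ClusterIssuer backing the cert-manager.io/cluster-issuer annotation on the Ingress.
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]  # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
EOF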
Scaling and Performance
Configure automatic scaling and optimize performance.
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Min
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 5
periodSeconds: 15
selectPolicy: Max
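The http_requests_per_second pod metric in the HPA above is only available if a custom metrics adapter (for example prometheus-adapter) is installed and exposes it through the custom metrics API. A quick check sketch:

# Confirm the custom metrics API is served by an adapter (e.g. prometheus-adapter)
kubectl get apiservices | grep custom.metrics.k8s.io

# List the metrics the adapter exposes and look for the one the HPA references
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | python3 -m json.tool | grep -i http_requests || \
  echo "metric not exposed yet - configure the adapter rules"

# Observe the HPA reading all three metrics
kubectl describe hpa api-hpa -n production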
Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
spec:
template:
spec:
containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false
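Auto-discovery only finds node groups whose Auto Scaling groups carry the tags named in --node-group-auto-discovery, and the deployment needs IAM permissions for the Auto Scaling API. A hedged AWS CLI sketch for tagging one ASG; the ASG name is a placeholder.

# Tag an ASG so the cluster autoscaler discovers it (my-worker-asg is a placeholder name).
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-worker-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-worker-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/production,Value=owned,PropagateAtLaunch=true"

# Watch the autoscaler's scaling decisions
kubectl -n kube-system logs deployment/cluster-autoscaler | grep -iE "scale[_ -]?(up|down)" | tail -n 20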
Troubleshooting
Common issues and debugging techniques.
Debugging Commands
# Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces
kubectl cluster-info
kubectl get componentstatuses   # deprecated, but still informative on older clusters

# Debug pod issues
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Check events
kubectl get events --sort-by='.lastTimestamp' -A

# Resource usage
kubectl top nodes
kubectl top pods -A

# Network debugging (run the exec commands from a second terminal while the debug pod is running)
kubectl run debug --image=nicolaka/netshoot -it --rm
kubectl exec -it debug -- nslookup kubernetes.default
kubectl exec -it debug -- curl -k https://kubernetes.default:443

# Check RBAC
kubectl auth can-i --list --as=system:serviceaccount:default:default
kubectl get rolebindings,clusterrolebindings -A

# etcd health
kubectl exec -it -n kube-system etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
Common Issues and Solutions
- ImagePullBackOff: Check image name, registry credentials, and network connectivity
- CrashLoopBackOff: Review container logs and ensure proper health checks
- Pending Pods: Verify resource requests, node capacity, and PVC bindings
- Network Issues: Check CNI plugin status, network policies, and DNS resolution
- Certificate Errors: Verify certificate expiration and proper CA configuration