
Kubernetes Deployment Guide

Updated Jun 18, 2025

Prerequisites

Before deploying Kubernetes, ensure you meet these requirements:

Component      | Minimum                  | Recommended               | Production
---------------|--------------------------|---------------------------|----------------------------
Master Nodes   | 1 node, 2 CPU, 4GB RAM   | 3 nodes, 4 CPU, 8GB RAM   | 3+ nodes, 8 CPU, 16GB RAM
Worker Nodes   | 1 node, 2 CPU, 4GB RAM   | 3 nodes, 4 CPU, 16GB RAM  | 5+ nodes, 16 CPU, 64GB RAM
Storage        | 50GB per node            | 100GB SSD per node        | 500GB+ NVMe per node
Network        | 1 Gbps                   | 10 Gbps                   | 25 Gbps+
Load Balancer  | Optional                 | Required                  | HA Load Balancer

Software Requirements

  • Linux OS (Ubuntu 20.04+, CentOS 8+, or RHEL 8+)
  • Container runtime (containerd 1.6+ or CRI-O 1.24+)
  • kubectl CLI tool (matching cluster version)
  • Network connectivity between all nodes
  • Swap disabled on all nodes (see the pre-flight check below)
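
A quick pre-flight sketch for checking the host-level requirements above on each node (assumes a Debian/Ubuntu host; the peer address is a placeholder):

preflight-check.sh bash
#!/bin/bash
# Run on every node before installing Kubernetes components

# Swap must be disabled
if [ -n "$(swapon --show)" ]; then
  echo "WARNING: swap is enabled; run 'swapoff -a' and update /etc/fstab"
fi

# Minimum footprint is 2 CPUs and 4GB RAM
echo "CPUs: $(nproc)"
echo "Memory (MB): $(free -m | awk '/^Mem:/ {print $2}')"

# Network connectivity between nodes (replace with a peer node's address)
ping -c 2 <other-node-ip> || echo "WARNING: cannot reach peer node"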

Deployment Options

Choose the deployment method that best fits your requirements:

Managed Kubernetes

Cloud provider managed services like EKS, GKE, or AKS.

✓ Automated updates and patches
✓ Integrated with cloud services
✓ Built-in high availability
✗ Vendor lock-in
✗ Less control over configuration

kubeadm

Official Kubernetes deployment tool for bare metal or VMs.

✓ Full control and customization
✓ Platform agnostic
✓ Production-grade clusters
✗ Manual management required
✗ Complex initial setup

Kubespray

Ansible-based deployment for production clusters.

✓ Highly customizable
✓ Supports multiple OS
✓ Automated deployment
✗ Ansible knowledge required
✗ Longer deployment time

Cluster Setup with kubeadm

This guide demonstrates setting up a production Kubernetes cluster using kubeadm.

Step 1: Prepare All Nodes

prepare-nodes.sh bash
#!/bin/bash
# Run on all nodes (master and workers)

# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install required packages
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

# Add the Kubernetes package repository (the legacy apt.kubernetes.io repo is
# deprecated; pkgs.k8s.io is the current community-owned repository)
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.27/deb/Release.key | \
  sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.27/deb/ /" | \
  sudo tee /etc/apt/sources.list.d/kubernetes.list

# Install Kubernetes components (pin all three to the same 1.27.x patch release;
# list available versions with: apt-cache madison kubeadm)
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Install containerd
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Configure containerd for systemd cgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo systemctl restart containerd

# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load required kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Enable bridged traffic filtering and IP forwarding
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system

Step 2: Initialize Master Node

init-master.sh bash
#!/bin/bash
# Run only on the first master node

# Initialize cluster with custom configuration
sudo kubeadm init \
  --control-plane-endpoint="k8s-api.example.com:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Configure kubectl for the admin user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Save the worker join command
sudo kubeadm token create --print-join-command > worker-join-command.sh

# Re-upload the control-plane certificates and capture the certificate key;
# combine it with the join command above plus --control-plane --certificate-key <key>
# when adding more master nodes
sudo kubeadm init phase upload-certs --upload-certs | tee control-plane-certificate-key.txt
Expected output:

  Your Kubernetes control-plane has initialized successfully!

  To start using your cluster, you need to run the following as a regular user:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

  You can now join any number of control-plane nodes by copying certificate
  authorities and service account keys on each node and then running the
  following as root:

    kubeadm join k8s-api.example.com:6443 --token abcdef.0123456789abcdef \
      --discovery-token-ca-cert-hash sha256:1234... \
      --control-plane --certificate-key 5678...
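
Workers are not joined automatically; a minimal sketch of the join step, using the command saved to worker-join-command.sh (the token and hash shown are the placeholders from the example output):

join-worker.sh bash
#!/bin/bash
# Run on each worker node after Step 1 has been completed there
sudo kubeadm join k8s-api.example.com:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:1234...

# Back on the master, confirm the new node registers and becomes Ready
kubectl get nodes -o wide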

Step 3: Configure High Availability (Optional)

For production, run three (or another odd number of) control-plane nodes behind a load balancer that serves the control-plane endpoint passed to kubeadm init (k8s-api.example.com:6443 in the example above):

┌─────────────────────────────────────────────────────┐
│              Load Balancer (HAProxy/NGINX)          │
│                  k8s-api.example.com:6443           │
└────────────┬────────────┬────────────┬──────────────┘
             │            │            │
        ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
        │Master 1 │  │Master 2 │  │Master 3 │
        │ etcd    │  │ etcd    │  │ etcd    │
        │ API     │  │ API     │  │ API     │
        │ Sched   │  │ Sched   │  │ Sched   │
        │ CM      │  │ CM      │  │ CM      │
        └────┬────┘  └────┬────┘  └────┬────┘
             │            │            │
     ┌───────┴────────────┴────────────┴───────┐
     │                                         │
┌────▼────┐  ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
│Worker 1 │  │Worker 2 │  │Worker 3 │  │Worker N │
│ kubelet │  │ kubelet │  │ kubelet │  │ kubelet │
│ kube-   │  │ kube-   │  │ kube-   │  │ kube-   │
│ proxy   │  │ proxy   │  │ proxy   │  │ proxy   │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
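
A minimal HAProxy sketch for this load balancer, run on a separate host (or a pair of hosts with keepalived); the master addresses 10.0.1.11-13 are hypothetical placeholders:

setup-haproxy.sh bash
#!/bin/bash
# Run on the load balancer host, not on the cluster nodes
# (k8s-api.example.com should resolve to this host's address)
sudo apt-get install -y haproxy

# Append a TCP passthrough frontend/backend for the Kubernetes API
cat <<'EOF' | sudo tee -a /etc/haproxy/haproxy.cfg

frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-backend

backend k8s-api-backend
    mode tcp
    balance roundrobin
    option tcp-check
    server master1 10.0.1.11:6443 check
    server master2 10.0.1.12:6443 check
    server master3 10.0.1.13:6443 check
EOF

sudo systemctl restart haproxy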

Networking Configuration

Kubernetes requires a Container Network Interface (CNI) plugin for pod networking.

Install Calico CNI

install-calico.sh bash
# Download and customize Calico manifest
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/tigera-operator.yaml -O
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/custom-resources.yaml -O

# Modify CIDR in custom-resources.yaml to match pod-network-cidr
sed -i 's/cidr: 192.168.0.0\/16/cidr: 10.244.0.0\/16/g' custom-resources.yaml

# Apply Calico
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml

# Verify installation
kubectl get pods -n calico-system

Network Policies

network-policy-example.yaml yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-http
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    - namespaceSelector:
        matchLabels:
          name: monitoring
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: database
    ports:
    - protocol: TCP
      port: 5432
  - to:
    - namespaceSelector: {}
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
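
Many teams pair an allow-list policy like the one above with a per-namespace default deny, so that only explicitly permitted traffic flows. A minimal sketch for the production namespace:

default-deny.yaml yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}   # empty selector matches every pod in the namespace
  policyTypes:
  - Ingress
  - Egress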

Storage Configuration

Configure persistent storage for stateful applications.

Storage Classes

storage-classes.yaml yaml
---
# Fast SSD storage for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: ebs.csi.aws.com   # gp3 with custom IOPS/throughput requires the EBS CSI driver
parameters:
  type: gp3
  iops: "10000"
  throughput: "250"
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

---
# Standard storage for general use
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer

---
# Shared storage for multi-pod access
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12345678
  directoryPerms: "700"
mountOptions:
  - tls
  - iam
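
Workloads consume these classes through PersistentVolumeClaims. A minimal sketch of a claim against fast-ssd (this is the database-pvc referenced by the snapshot example in the next subsection; the size is illustrative):

database-pvc.yaml yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi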

Volume Snapshots

volume-snapshot.yaml yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Retain
parameters:
  encrypted: "true"

---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-backup-20240108
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: database-pvc
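
To restore, a new PVC can use the snapshot as its data source. A minimal sketch (the claim must live in the same namespace as the snapshot; the name and size are illustrative):

restore-from-snapshot.yaml yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc-restored
spec:
  storageClassName: fast-ssd
  dataSource:
    name: database-backup-20240108
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi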

Security Hardening

Implement security best practices for production clusters.

RBAC Configuration

rbac-example.yaml yaml
---
# Developer role with limited permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "services", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods/log", "pods/exec"]
  verbs: ["get", "list"]

---
# Bind role to user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
- kind: User
  name: [email protected]
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io

---
# Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted

Security Policies

  • Enable audit logging for all API requests
  • Implement Pod Security Standards (the replacement for PodSecurityPolicy)
  • Use service accounts with minimal permissions
  • Enable encryption at rest for etcd (see the sketch below)
  • Regularly rotate certificates and secrets
  • Implement network policies for all namespaces

Security Note: Always run containers as non-root users and use read-only root filesystems where possible.
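
Encryption at rest for Secrets is configured per API server through an EncryptionConfiguration file referenced by the kube-apiserver --encryption-provider-config flag. A minimal sketch (the key is a placeholder; generate one with `head -c 32 /dev/urandom | base64`):

encryption-config.yaml yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>   # placeholder
      - identity: {}   # allows reading values written before encryption was enabled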

Monitoring and Observability

Deploy comprehensive monitoring for your Kubernetes cluster.

Prometheus Stack Installation

install-monitoring.sh bash
#!/bin/bash
# Install Prometheus Operator using Helm

# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=fast-ssd \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=standard \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Install Loki for log aggregation
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=fast-ssd \
  --set loki.persistence.size=50Gi

Custom Metrics and Alerts

prometheus-alerts.yaml yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-apps
  namespace: monitoring
spec:
  groups:
  - name: kubernetes-apps
    interval: 30s
    rules:
    - alert: PodCrashLooping
      expr: |
        rate(kube_pod_container_status_restarts_total[5m]) > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        
    - alert: HighMemoryUsage
      expr: |
        (container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} memory usage above 90%"
        
    - alert: PersistentVolumeSpaceLow
      expr: |
        (kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) < 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "PV {{ $labels.persistentvolumeclaim }} has less than 10% free space"

Deploying Applications

Best practices for deploying production applications on Kubernetes.

Production-Ready Deployment

app-deployment.yaml yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v1.0.0
    spec:
      serviceAccountName: api-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
      - name: api
        image: myregistry.com/api:v1.0.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: api-secrets
              key: database-url
        - name: LOG_LEVEL
          value: "info"
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: http
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        volumeMounts:
        - name: config
          mountPath: /etc/api
          readOnly: true
        - name: cache
          mountPath: /var/cache/api
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
      volumes:
      - name: config
        configMap:
          name: api-config
      - name: cache
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - api
              topologyKey: topology.kubernetes.io/zone
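
A PodDisruptionBudget is a common companion to a multi-replica Deployment like this one, preventing voluntary disruptions (node drains, cluster upgrades) from taking down too many replicas at once. A minimal sketch:

api-pdb.yaml yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api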

Service and Ingress

service-ingress.yaml yaml
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  labels:
    app: api
spec:
  type: ClusterIP
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
    name: http
  selector:
    app: api

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
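
The Ingress above assumes a cert-manager ClusterIssuer named letsencrypt-prod already exists. A minimal sketch of such an issuer (requires cert-manager to be installed; the e-mail address is a placeholder):

letsencrypt-issuer.yaml yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]   # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx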

Scaling and Performance

Configure automatic scaling and optimize performance.

Horizontal Pod Autoscaler

hpa-example.yaml yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
      - type: Pods
        value: 2
        periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 5
        periodSeconds: 15
      selectPolicy: Max
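
Note that the http_requests_per_second Pods metric only resolves if a custom-metrics adapter (for example prometheus-adapter) is installed alongside the Prometheus stack above; the CPU and memory metrics require only metrics-server. A quick way to check what the HPA is seeing:

check-hpa.sh bash
# Show current vs. target values for each metric and recent scaling events
kubectl get hpa api-hpa -n production
kubectl describe hpa api-hpa -n production

# Confirm the custom metrics API is being served at all
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | head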

Cluster Autoscaler

cluster-autoscaler.yaml yaml
# Abbreviated manifest: the upstream example also defines the cluster-autoscaler
# service account and RBAC rules, which are required for this to run
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
        - --balance-similar-node-groups
        - --skip-nodes-with-system-pods=false

Troubleshooting

Common issues and debugging techniques.

Debugging Commands

debug-commands.sh bash
# Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces
kubectl cluster-info
kubectl get componentstatuses

# Debug pod issues
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Check events
kubectl get events --sort-by='.lastTimestamp' -A

# Resource usage
kubectl top nodes
kubectl top pods -A

# Network debugging (start a long-running throwaway pod, then clean it up)
kubectl run debug --image=nicolaka/netshoot --command -- sleep 3600
kubectl exec -it debug -- nslookup kubernetes.default
kubectl exec -it debug -- curl -k https://kubernetes.default:443
kubectl delete pod debug

# Check RBAC
kubectl auth can-i --list --as=system:serviceaccount:default:default
kubectl get rolebindings,clusterrolebindings -A

# etcd health
kubectl exec -it -n kube-system etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health

Pro Tip: Enable verbose logging with the `-v=8` flag on kubectl commands when debugging complex issues.

Common Issues and Solutions

  • ImagePullBackOff: Check image name, registry credentials, and network connectivity
  • CrashLoopBackOff: Review container logs and ensure proper health checks
  • Pending Pods: Verify resource requests, node capacity, and PVC bindings
  • Network Issues: Check CNI plugin status, network policies, and DNS resolution
  • Certificate Errors: Verify certificate expiration and proper CA configuration