Kubernetes Deployment Guide
Prerequisites
Before deploying Kubernetes, ensure you meet these requirements:
| Component | Minimum | Recommended | Production |
|---|---|---|---|
| Master Nodes | 1 node, 2 CPU, 4GB RAM | 3 nodes, 4 CPU, 8GB RAM | 3+ nodes, 8 CPU, 16GB RAM |
| Worker Nodes | 1 node, 2 CPU, 4GB RAM | 3 nodes, 4 CPU, 16GB RAM | 5+ nodes, 16 CPU, 64GB RAM |
| Storage | 50GB per node | 100GB SSD per node | 500GB+ NVMe per node |
| Network | 1 Gbps | 10 Gbps | 25 Gbps+ |
| Load Balancer | Optional | Required | HA Load Balancer |
Software Requirements
- Linux OS (Ubuntu 20.04+, CentOS 8+, or RHEL 8+)
- Container runtime (containerd 1.6+ or CRI-O 1.24+)
- kubectl CLI tool (matching cluster version)
- Network connectivity between all nodes
- Swap disabled on all nodes
Deployment Options
Choose the deployment method that best fits your requirements; the rest of this guide uses kubeadm to build a self-managed cluster.
Cluster Setup with kubeadm
This guide demonstrates setting up a production Kubernetes cluster using kubeadm.
Step 1: Prepare All Nodes
```bash
#!/bin/bash
# Run on all nodes (master and workers)

# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install required packages
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

# Add Kubernetes repository
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo add-apt-repository "deb https://apt.kubernetes.io/ kubernetes-xenial main"

# Install Kubernetes components
sudo apt-get update
sudo apt-get install -y kubelet=1.27.0-00 kubeadm=1.27.0-00 kubectl=1.27.0-00
sudo apt-mark hold kubelet kubeadm kubectl

# Install containerd
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Configure containerd for systemd cgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo systemctl restart containerd

# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load kernel modules
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Enable bridged traffic filtering and IP forwarding (required by kubeadm preflight)
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
```
Step 2: Initialize Master Node
```bash
#!/bin/bash
# Run only on the first master node

# Initialize cluster with custom configuration
sudo kubeadm init \
  --control-plane-endpoint="k8s-api.example.com:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Configure kubectl for the admin user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Save join commands
kubeadm token create --print-join-command > worker-join-command.sh
kubeadm init phase upload-certs --upload-certs > control-plane-join-command.sh
```
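Copy worker-join-command.sh to each worker node and run it there to join the workers to the cluster. A minimal sketch, assuming the file has been copied over (for example with scp) and the token is still valid; bootstrap tokens expire after 24 hours by default and can be regenerated with `kubeadm token create --print-join-command`:

```bash
#!/bin/bash
# Run on each worker node after completing Step 1 on it.
# worker-join-command.sh was generated on the first master in Step 2.
sudo bash worker-join-command.sh

# Back on the master, confirm the worker has registered
# (it reports NotReady until the CNI plugin is installed below).
kubectl get nodes
```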
Step 3: Configure High Availability (Optional)
```
┌───────────────────────────────────┐
│   Load Balancer (HAProxy/NGINX)   │
│     k8s-api.example.com:6443      │
└────┬────────────┬────────────┬────┘
     │            │            │
┌────▼────┐  ┌────▼────┐  ┌────▼────┐
│Master 1 │  │Master 2 │  │Master 3 │
│  etcd   │  │  etcd   │  │  etcd   │
│  API    │  │  API    │  │  API    │
│  Sched  │  │  Sched  │  │  Sched  │
│  CM     │  │  CM     │  │  CM     │
└────┬────┘  └────┬────┘  └────┬────┘
     │            │            │
     ├────────────┼────────────┼────────────┐
     │            │            │            │
┌────▼────┐  ┌────▼────┐  ┌────▼────┐  ┌────▼────┐
│Worker 1 │  │Worker 2 │  │Worker 3 │  │Worker N │
│ kubelet │  │ kubelet │  │ kubelet │  │ kubelet │
│ kube-   │  │ kube-   │  │ kube-   │  │ kube-   │
│ proxy   │  │ proxy   │  │ proxy   │  │ proxy   │
└─────────┘  └─────────┘  └─────────┘  └─────────┘
```
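To complete the HA control plane shown above, the two additional master nodes join through the load-balanced endpoint using the certificate key uploaded in Step 2. A minimal sketch, assuming the token and discovery hash from worker-join-command.sh and the certificate key captured in control-plane-join-command.sh; uploaded certificates expire after two hours, so re-run `kubeadm init phase upload-certs --upload-certs` if necessary. The placeholder values must be substituted:

```bash
#!/bin/bash
# Run on each additional master node after completing Step 1 on it.
# <token>, <hash>, and <certificate-key> are placeholders taken from the
# join commands saved in Step 2.
sudo kubeadm join k8s-api.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <certificate-key>

# Verify that all control-plane nodes are present
kubectl get nodes -l node-role.kubernetes.io/control-plane
```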
Networking Configuration
Kubernetes requires a Container Network Interface (CNI) plugin for pod networking.
Install Calico CNI
```bash
# Download and customize the Calico manifests
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/tigera-operator.yaml -O
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/custom-resources.yaml -O

# Modify CIDR in custom-resources.yaml to match pod-network-cidr
sed -i 's/cidr: 192.168.0.0\/16/cidr: 10.244.0.0\/16/g' custom-resources.yaml

# Apply Calico
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml

# Verify installation
kubectl get pods -n calico-system
```
Network Policies
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-http
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 8080
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:
        - namespaceSelector: {}
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
```
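The policy above only governs pods it selects; all other traffic in the namespace stays open until a baseline policy closes it. A minimal default-deny sketch for the production namespace (an addition for illustration, not part of the original manifests), which allow-policies such as api-allow-http then selectively open up:

```yaml
# Deny all ingress and egress for every pod in the production namespace;
# allow-policies such as api-allow-http then grant the required traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}        # an empty selector matches every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```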
Storage Configuration
Configure persistent storage for stateful applications.
Storage Classes
```yaml
---
# Fast SSD storage for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
  iops: "10000"
  throughput: "250"
  fsType: ext4
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Standard storage for general use
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
  fsType: ext4
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Shared storage for multi-pod access
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-12345678
  directoryPerms: "700"
mountOptions:
  - tls
  - iam
```
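Workloads consume these classes through PersistentVolumeClaims. A minimal sketch using the fast-ssd class and the database-pvc name that the snapshot example below refers to (the claim itself and its size are illustrative); with WaitForFirstConsumer, the volume is only provisioned once a pod using the claim is scheduled:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc           # referenced by the VolumeSnapshot example below
spec:
  storageClassName: fast-ssd   # provisioned from the fast-ssd class defined above
  accessModes:
    - ReadWriteOnce            # single-node attachment, typical for block storage
  resources:
    requests:
      storage: 100Gi           # illustrative size
```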
Volume Snapshots
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Retain
parameters:
  encrypted: "true"
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: database-backup-20240108
spec:
  volumeSnapshotClassName: csi-aws-vsc
  source:
    persistentVolumeClaimName: database-pvc
```
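Restoring from a snapshot is done by creating a new claim whose dataSource points at it. A minimal sketch based on the snapshot above; the claim name is hypothetical and must live in the same namespace as the VolumeSnapshot:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-restore               # hypothetical name for the restored claim
spec:
  storageClassName: fast-ssd
  dataSource:
    name: database-backup-20240108     # the VolumeSnapshot created above
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi                   # at least the size of the snapshotted volume
```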
Security Hardening
Implement security best practices for production clusters.
RBAC Configuration
```yaml
---
# Developer role with limited permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
  - apiGroups: ["", "apps", "batch"]
    resources: ["pods", "deployments", "services", "jobs"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods/log", "pods/exec"]
    verbs: ["get", "list"]
---
# Bind role to user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
  - kind: User
    name: [email protected]
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
---
# Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```
Security Policies
- Enable audit logging for all API requests
- Implement Pod Security Standards (which replaced PodSecurityPolicy, removed in v1.25)
- Use service accounts with minimal permissions
- Enable encryption at rest for etcd (see the EncryptionConfiguration sketch after this list)
- Regularly rotate certificates and secrets
- Implement network policies for all namespaces
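For encryption at rest, the kube-apiserver reads an EncryptionConfiguration file passed through its --encryption-provider-config flag. A minimal sketch covering Secrets only; the file path is an example and the key is a placeholder (generate one with `head -c 32 /dev/urandom | base64`):

```yaml
# Example path: /etc/kubernetes/enc/encryption-config.yaml, referenced from the
# kube-apiserver static pod manifest via
#   --encryption-provider-config=/etc/kubernetes/enc/encryption-config.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: <base64-encoded-32-byte-key>   # placeholder
      - identity: {}   # fallback so previously written plaintext data stays readable
```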
Monitoring and Observability
Deploy comprehensive monitoring for your Kubernetes cluster.
Prometheus Stack Installation
```bash
#!/bin/bash
# Install Prometheus Operator using Helm

# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=fast-ssd \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=standard \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Install Loki for log aggregation
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=fast-ssd \
  --set loki.persistence.size=50Gi
```
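Once the stack is running, application metrics are scraped by creating ServiceMonitor objects. A minimal sketch targeting the api-service defined later in this guide; it assumes the application exposes Prometheus metrics at /metrics on its http port and that the operator selects monitors labelled with the Helm release name (the chart's default behaviour):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-servicemonitor
  namespace: monitoring
  labels:
    release: prometheus        # matches the Helm release so the operator picks it up
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: api                 # matches the labels on api-service
  endpoints:
    - port: http               # named port on the Service
      path: /metrics           # assumed metrics path
      interval: 30s
```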
Custom Metrics and Alerts
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-apps
  namespace: monitoring
spec:
  groups:
    - name: kubernetes-apps
      interval: 30s
      rules:
        - alert: PodCrashLooping
          expr: |
            rate(kube_pod_container_status_restarts_total[5m]) > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
        - alert: HighMemoryUsage
          expr: |
            (container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} memory usage above 90%"
        - alert: PersistentVolumeSpaceLow
          expr: |
            (kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) < 0.1
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "PV {{ $labels.persistentvolumeclaim }} has less than 10% free space"
```
Deploying Applications
Best practices for deploying production applications on Kubernetes.
Production-Ready Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
        version: v1.0.0
    spec:
      serviceAccountName: api-service-account
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault   # required by the "restricted" Pod Security Standard enforced on this namespace
      containers:
        - name: api
          image: myregistry.com/api:v1.0.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8080
              name: http
              protocol: TCP
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
            - name: LOG_LEVEL
              value: "info"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            successThreshold: 1
            failureThreshold: 3
          volumeMounts:
            - name: config
              mountPath: /etc/api
              readOnly: true
            - name: cache
              mountPath: /var/cache/api
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
      volumes:
        - name: config
          configMap:
            name: api-config
        - name: cache
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values:
                        - api
                topologyKey: topology.kubernetes.io/zone
```
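Because the rollout strategy keeps maxUnavailable at 0 and the autoscaler later in this guide never drops below three replicas, a PodDisruptionBudget is a natural companion so voluntary disruptions such as node drains always leave enough replicas running. A minimal sketch (the budget is an addition for illustration, not part of the original manifests):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2          # keep at least two api pods up during voluntary disruptions
  selector:
    matchLabels:
      app: api
```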
Service and Ingress
```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  labels:
    app: api
spec:
  type: ClusterIP
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app: api
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"   # requests per second per client IP
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
```
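The cert-manager.io/cluster-issuer annotation above assumes a ClusterIssuer named letsencrypt-prod already exists in the cluster. A minimal sketch of one using the ACME HTTP-01 solver through the same NGINX ingress class; the contact e-mail is a placeholder:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                 # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key     # stores the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx                     # solved through the nginx ingress class
```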
Scaling and Performance
Configure automatic scaling and optimize performance.
Horizontal Pod Autoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Custom pod metric; requires a metrics adapter (e.g. prometheus-adapter)
    # exposing http_requests_per_second through the custom metrics API
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 5
          periodSeconds: 15
      selectPolicy: Max
```
Cluster Autoscaler
```yaml
# Abridged: the upstream AWS example manifest also defines the
# cluster-autoscaler service account and RBAC this deployment needs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      serviceAccountName: cluster-autoscaler
      containers:
        - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
          name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --stderrthreshold=info
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste
            - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
            - --balance-similar-node-groups
            - --skip-nodes-with-system-pods=false
```
Troubleshooting
Common issues and debugging techniques.
Debugging Commands
```bash
# Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces
kubectl cluster-info
kubectl get componentstatuses

# Debug pod issues
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Check events
kubectl get events --sort-by='.lastTimestamp' -A

# Resource usage
kubectl top nodes
kubectl top pods -A

# Network debugging
kubectl run debug --image=nicolaka/netshoot -it --rm
kubectl exec -it debug -- nslookup kubernetes.default
kubectl exec -it debug -- curl -k https://kubernetes.default:443

# Check RBAC
kubectl auth can-i --list --as=system:serviceaccount:default:default
kubectl get rolebindings,clusterrolebindings -A

# etcd health
kubectl exec -it -n kube-system etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
```
Common Issues and Solutions
- ImagePullBackOff: Check image name, registry credentials, and network connectivity (see the pull-secret example after this list)
- CrashLoopBackOff: Review container logs and ensure proper health checks
- Pending Pods: Verify resource requests, node capacity, and PVC bindings
- Network Issues: Check CNI plugin status, network policies, and DNS resolution
- Certificate Errors: Verify certificate expiration and proper CA configuration
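For a private registry such as the myregistry.com host used in the Deployment above, ImagePullBackOff frequently comes down to missing pull credentials. A minimal sketch of creating a pull secret and attaching it to the service account; the registry, username, and password are placeholders:

```bash
# Create a docker-registry secret holding the registry credentials
kubectl create secret docker-registry regcred \
  --namespace production \
  --docker-server=myregistry.com \
  --docker-username=<username> \
  --docker-password=<password>

# Attach it to the service account used by the Deployment so its pods
# can pull from the private registry
kubectl patch serviceaccount api-service-account \
  --namespace production \
  -p '{"imagePullSecrets": [{"name": "regcred"}]}'
```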