Kubernetes Deployment Guide
Prerequisites
Before deploying Kubernetes, ensure you meet these requirements:
| Component | Minimum | Recommended | Production |
|---|---|---|---|
| Master Nodes | 1 node, 2 CPU, 4GB RAM | 3 nodes, 4 CPU, 8GB RAM | 3+ nodes, 8 CPU, 16GB RAM |
| Worker Nodes | 1 node, 2 CPU, 4GB RAM | 3 nodes, 4 CPU, 16GB RAM | 5+ nodes, 16 CPU, 64GB RAM |
| Storage | 50GB per node | 100GB SSD per node | 500GB+ NVMe per node |
| Network | 1 Gbps | 10 Gbps | 25 Gbps+ |
| Load Balancer | Optional | Required | HA Load Balancer |
Software Requirements
- Linux OS (Ubuntu 20.04+, CentOS 8+, or RHEL 8+)
- Container runtime (containerd 1.6+ or CRI-O 1.24+)
- kubectl CLI tool (matching cluster version)
- Network connectivity between all nodes
- Swap disabled on all nodes (a quick verification sketch follows below)
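These checks can be scripted before you begin. Below is a minimal verification sketch, assuming Ubuntu hosts with systemd; the `k8s-api.example.com:6443` endpoint is the control-plane address used later in this guide.

#!/bin/bash
# Illustrative prerequisite check; run on each node before installing Kubernetes.
set -u

# Swap must be disabled or the kubelet will refuse to start with default settings
if [ "$(swapon --show | wc -l)" -gt 0 ]; then
  echo "FAIL: swap is enabled"
else
  echo "OK: swap is disabled"
fi

# A supported container runtime should be installed and running
if systemctl is-active --quiet containerd || systemctl is-active --quiet crio; then
  echo "OK: container runtime is active"
else
  echo "FAIL: no container runtime (containerd/CRI-O) running"
fi

# Hostname and product_uuid must be unique across all nodes
echo "hostname: $(hostname)"
echo "product_uuid: $(sudo cat /sys/class/dmi/id/product_uuid)"

# The API server port must be reachable between nodes once the cluster is up
nc -zvw3 k8s-api.example.com 6443 || echo "WARN: API endpoint not reachable yet (expected before init)"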
Deployment Options
Choose the deployment method that best fits your requirements. Managed services such as Amazon EKS, Google GKE, and Azure AKS offload control-plane operations, while kubeadm gives you full control over a self-managed cluster; the remainder of this guide follows the kubeadm path.
Cluster Setup with kubeadm
This guide demonstrates setting up a production Kubernetes cluster using kubeadm.
Step 1: Prepare All Nodes
#!/bin/bash
# Run on all nodes (master and workers)

# Update system
sudo apt-get update
sudo apt-get upgrade -y

# Install required packages
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common

# Add Kubernetes repository (legacy apt.kubernetes.io repo; newer installs use the pkgs.k8s.io community repo)
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo add-apt-repository "deb https://apt.kubernetes.io/ kubernetes-xenial main"

# Install Kubernetes components and pin their versions
sudo apt-get update
sudo apt-get install -y kubelet=1.27.0-00 kubeadm=1.27.0-00 kubectl=1.27.0-00
sudo apt-mark hold kubelet kubeadm kubectl

# Install containerd
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml

# Configure containerd for systemd cgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/g' /etc/containerd/config.toml
sudo systemctl restart containerd

# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load kernel modules required by container networking
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Apply the sysctl settings Kubernetes networking depends on
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
Step 2: Initialize Master Node
#!/bin/bash
# Run only on the first master node

# Initialize cluster with custom configuration
sudo kubeadm init \
  --control-plane-endpoint="k8s-api.example.com:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12

# Configure kubectl for the admin user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Save join commands
kubeadm token create --print-join-command > worker-join-command.sh
kubeadm init phase upload-certs --upload-certs > control-plane-join-command.sh
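After initialization, each worker joins the cluster by running the saved join command; additional control-plane nodes join similarly using the control-plane join command together with the uploaded certificate key. A minimal sketch, with placeholder hostnames:

# On the first master: copy the saved join command to each worker (worker-1 is a placeholder hostname)
scp worker-join-command.sh admin@worker-1:/tmp/

# On each worker: run the join command as root
sudo bash /tmp/worker-join-command.sh

# Back on the master: confirm the node registers and becomes Ready once the CNI is installed
kubectl get nodes -o wide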
Step 3: Configure High Availability (Optional)
┌─────────────────────────────────────────────────────┐
│ Load Balancer (HAProxy/NGINX) │
│ k8s-api.example.com:6443 │
└────────────┬────────────┬────────────┬──────────────┘
│ │ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Master 1 │ │Master 2 │ │Master 3 │
│ etcd │ │ etcd │ │ etcd │
│ API │ │ API │ │ API │
│ Sched │ │ Sched │ │ Sched │
│ CM │ │ CM │ │ CM │
└────┬────┘ └────┬────┘ └────┬────┘
│ │ │
┌───────┴────────────┴────────────┴───────┐
│ │
┌────▼────┐ ┌────▼────┐ ┌────▼────┐ ┌────▼────┐
│Worker 1 │ │Worker 2 │ │Worker 3 │ │Worker N │
│ kubelet │ │ kubelet │ │ kubelet │ │ kubelet │
│ kube- │ │ kube- │ │ kube- │ │ kube- │
│ proxy │ │ proxy │ │ proxy │ │ proxy │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
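The load balancer in front of the API servers can be anything that does TCP pass-through on port 6443. Below is a hedged HAProxy sketch for the k8s-api.example.com endpoint; the 10.0.0.x master addresses and the timeouts are placeholders, not values from this guide.

# Illustrative HAProxy configuration for the control-plane endpoint (TCP mode, no TLS termination).
cat <<'EOF' | sudo tee /etc/haproxy/haproxy.cfg
defaults
    mode tcp
    timeout connect 10s
    timeout client  1m
    timeout server  1m

frontend k8s-api
    bind *:6443
    default_backend k8s-masters

backend k8s-masters
    option tcp-check
    balance roundrobin
    server master-1 10.0.0.11:6443 check
    server master-2 10.0.0.12:6443 check
    server master-3 10.0.0.13:6443 check
EOF
sudo systemctl restart haproxy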
Networking Configuration
Kubernetes requires a Container Network Interface (CNI) plugin for pod networking.
Install Calico CNI
# Download and customize Calico manifests
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/tigera-operator.yaml -O
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/custom-resources.yaml -O

# Modify CIDR in custom-resources.yaml to match pod-network-cidr
sed -i 's/cidr: 192.168.0.0\/16/cidr: 10.244.0.0\/16/g' custom-resources.yaml

# Apply Calico
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml

# Verify installation
kubectl get pods -n calico-system
Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: api-allow-http
namespace: production
spec:
podSelector:
matchLabels:
app: api
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: database
ports:
- protocol: TCP
port: 5432
- to:
- namespaceSelector: {}
podSelector:
matchLabels:
k8s-app: kube-dns
ports:
- protocol: UDP
port: 53
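Pods that are not selected by any NetworkPolicy accept all traffic, so allow-rules like the one above are usually paired with a baseline default-deny policy for the namespace. A minimal sketch for the production namespace:

# Baseline default-deny for both ingress and egress in the production namespace.
# Applied alongside the allow-policy above; pods without a matching allow rule lose connectivity.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
EOF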
Storage Configuration
Configure persistent storage for stateful applications.
Storage Classes
---
# Fast SSD storage for databases
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com  # EBS CSI driver; the in-tree kubernetes.io/aws-ebs provisioner does not support gp3 iops/throughput
parameters:
  type: gp3
  iops: "10000"
  throughput: "250"
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Standard storage for general use
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: standard
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
---
# Shared storage for multi-pod access
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: efs-shared
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: fs-12345678
directoryPerms: "700"
mountOptions:
- tls
- iam
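A PersistentVolumeClaim referencing one of these classes might look like the sketch below; the 100Gi size is illustrative, and the claim name matches the database-pvc used by the snapshot example that follows. With WaitForFirstConsumer, the volume is not provisioned until a pod using the claim is scheduled.

# Illustrative PVC against the fast-ssd class; provisioning waits for the first consuming pod.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 100Gi
EOF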
Volume Snapshots
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
name: csi-aws-vsc
driver: ebs.csi.aws.com
deletionPolicy: Retain
parameters:
encrypted: "true"
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
name: database-backup-20240108
spec:
volumeSnapshotClassName: csi-aws-vsc
source:
persistentVolumeClaimName: database-pvc
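Restoring works by creating a new PVC whose dataSource points at the VolumeSnapshot. A hedged sketch; the restored claim name is illustrative and the requested size must be at least the size of the snapshot's source volume.

# Restore the snapshot into a new PVC; the CSI driver provisions a volume pre-populated with the snapshot data.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: database-pvc-restored
spec:
  storageClassName: fast-ssd
  dataSource:
    name: database-backup-20240108
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
EOF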
Security Hardening
Implement security best practices for production clusters.
RBAC Configuration
---
# Developer role with limited permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: developer
rules:
- apiGroups: ["", "apps", "batch"]
  resources: ["pods", "deployments", "services", "jobs"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
- apiGroups: [""]
  resources: ["pods/log", "pods/exec"]
  verbs: ["get", "list"]
---
# Bind role to user
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: development
subjects:
- kind: User
  name: [email protected]
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer
  apiGroup: rbac.authorization.k8s.io
---
# Pod Security Standards
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
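You can verify the binding by impersonating the user with kubectl auth can-i. A quick sketch; the placeholder must match the subject name in the RoleBinding above.

# Expect "yes" for verbs granted by the developer Role and "no" for everything else
kubectl auth can-i create deployments --namespace development --as "<developer-email>"
kubectl auth can-i delete pods --namespace development --as "<developer-email>"
kubectl auth can-i list secrets --namespace development --as "<developer-email>"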
Security Policies
- Enable audit logging for all API requests
- Enforce Pod Security Standards through Pod Security Admission (PodSecurityPolicy was removed in Kubernetes 1.25)
- Use service accounts with minimal permissions
- Enable encryption at rest for Secrets stored in etcd (see the sketch after this list)
- Regularly rotate certificates and secrets
- Implement network policies for all namespaces
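For the encryption-at-rest item above, a hedged sketch of an EncryptionConfiguration follows. The generated key is an example value and the /etc/kubernetes/enc path is an assumption; the kube-apiserver must be restarted with --encryption-provider-config pointing at the file, and the directory mounted into its static pod.

# Generate a 32-byte key and write an EncryptionConfiguration covering Secrets.
# Keep the file readable only by root; existing Secrets are re-encrypted only when rewritten.
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)
sudo mkdir -p /etc/kubernetes/enc
cat <<EOF | sudo tee /etc/kubernetes/enc/enc.yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${ENCRYPTION_KEY}
      - identity: {}
EOF
# Then add to the kube-apiserver manifest:
#   --encryption-provider-config=/etc/kubernetes/enc/enc.yaml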
Monitoring and Observability
Deploy comprehensive monitoring for your Kubernetes cluster.
Prometheus Stack Installation
#!/bin/bash
# Install Prometheus Operator using Helm

# Add Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create monitoring namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=fast-ssd \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=100Gi \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=standard \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi

# Install Loki for log aggregation
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set loki.persistence.enabled=true \
  --set loki.persistence.storageClassName=fast-ssd \
  --set loki.persistence.size=50Gi
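Once the charts are installed, you can confirm the stack is running and reach Grafana locally. The sketch below assumes the chart's default resource names for a release called prometheus (service and secret named prometheus-grafana).

# Verify the monitoring components are up
kubectl get pods -n monitoring

# Port-forward Grafana locally (default service name for this release)
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80 &

# Retrieve the generated Grafana admin password (default secret created by the chart)
kubectl get secret -n monitoring prometheus-grafana \
  -o jsonpath='{.data.admin-password}' | base64 -d; echo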
Custom Metrics and Alerts
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: kubernetes-apps
namespace: monitoring
spec:
groups:
- name: kubernetes-apps
interval: 30s
rules:
- alert: PodCrashLooping
expr: |
rate(kube_pod_container_status_restarts_total[5m]) > 0
for: 5m
labels:
severity: critical
annotations:
summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
- alert: HighMemoryUsage
expr: |
(container_memory_working_set_bytes / container_spec_memory_limit_bytes) > 0.9
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.container }} memory usage above 90%"
- alert: PersistentVolumeSpaceLow
expr: |
(kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes) < 0.1
for: 5m
labels:
severity: critical
annotations:
summary: "PV {{ $labels.persistentvolumeclaim }} has less than 10% free space"
Deploying Applications
Best practices for deploying production applications on Kubernetes.
Production-Ready Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
namespace: production
labels:
app: api
version: v1.0.0
spec:
replicas: 3
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
version: v1.0.0
spec:
serviceAccountName: api-service-account
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: api
image: myregistry.com/api:v1.0.0
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8080
name: http
protocol: TCP
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: api-secrets
key: database-url
- name: LOG_LEVEL
value: "info"
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
volumeMounts:
- name: config
mountPath: /etc/api
readOnly: true
- name: cache
mountPath: /var/cache/api
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumes:
- name: config
configMap:
name: api-config
- name: cache
emptyDir:
medium: Memory
sizeLimit: 1Gi
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- api
              topologyKey: topology.kubernetes.io/zone
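With three replicas and maxUnavailable: 0 during rollouts, it also makes sense to protect the deployment against voluntary disruptions such as node drains and autoscaler scale-down. A minimal PodDisruptionBudget sketch:

# Keep at least two api pods running during voluntary disruptions (drains, upgrades, scale-down).
kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
EOF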
Service and Ingress
---
apiVersion: v1
kind: Service
metadata:
name: api-service
namespace: production
labels:
app: api
spec:
type: ClusterIP
ports:
- port: 80
targetPort: http
protocol: TCP
name: http
selector:
app: api
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api-ingress
namespace: production
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"  # requests per second per client IP
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
spec:
ingressClassName: nginx
tls:
- hosts:
- api.example.com
secretName: api-tls
rules:
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 80
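The Ingress above references a letsencrypt-prod ClusterIssuer that is not defined elsewhere in this guide. A hedged sketch follows, assuming cert-manager is already installed and HTTP-01 challenges can be served through the nginx ingress class; the contact email is a placeholder.

# ClusterIssuer backing the cert-manager.io/cluster-issuer annotation on the Ingress.
kubectl apply -f - <<'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]  # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
EOF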
Scaling and Performance
Configure automatic scaling and optimize performance.
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: "1000"
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
- type: Pods
value: 2
periodSeconds: 60
selectPolicy: Min
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 5
periodSeconds: 15
selectPolicy: Max
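The http_requests_per_second pod metric in the HPA above is only available if a custom metrics adapter (for example prometheus-adapter) is installed and exposes it through the custom metrics API. A quick check sketch:

# Confirm the custom metrics API is served by an adapter (e.g. prometheus-adapter)
kubectl get apiservices | grep custom.metrics.k8s.io

# List the metrics the adapter exposes and look for the one the HPA references
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | python3 -m json.tool | grep -i http_requests || \
  echo "metric not exposed yet - configure the adapter rules"

# Observe the HPA reading all three metrics
kubectl describe hpa api-hpa -n production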
Cluster Autoscaler
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
spec:
template:
spec:
containers:
      - image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.0
name: cluster-autoscaler
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/production
- --balance-similar-node-groups
- --skip-nodes-with-system-pods=false
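Auto-discovery only finds node groups whose Auto Scaling groups carry the tags named in --node-group-auto-discovery, and the deployment needs IAM permissions for the Auto Scaling API. A hedged AWS CLI sketch for tagging one ASG; the ASG name is a placeholder.

# Tag an ASG so the cluster autoscaler discovers it (my-worker-asg is a placeholder name).
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-worker-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/enabled,Value=true,PropagateAtLaunch=true" \
  "ResourceId=my-worker-asg,ResourceType=auto-scaling-group,Key=k8s.io/cluster-autoscaler/production,Value=owned,PropagateAtLaunch=true"

# Watch the autoscaler's scaling decisions
kubectl -n kube-system logs deployment/cluster-autoscaler | grep -iE "scale[_ -]?(up|down)" | tail -n 20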
Troubleshooting
Common issues and debugging techniques.
Debugging Commands
# Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces
kubectl cluster-info
kubectl get componentstatuses   # deprecated, but still informative on older clusters

# Debug pod issues
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Check events
kubectl get events --sort-by='.lastTimestamp' -A

# Resource usage
kubectl top nodes
kubectl top pods -A

# Network debugging (run the exec commands from a second terminal while the debug pod is running)
kubectl run debug --image=nicolaka/netshoot -it --rm
kubectl exec -it debug -- nslookup kubernetes.default
kubectl exec -it debug -- curl -k https://kubernetes.default:443

# Check RBAC
kubectl auth can-i --list --as=system:serviceaccount:default:default
kubectl get rolebindings,clusterrolebindings -A

# etcd health
kubectl exec -it -n kube-system etcd-master-1 -- etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  endpoint health
Common Issues and Solutions
- ImagePullBackOff: Check image name, registry credentials, and network connectivity
- CrashLoopBackOff: Review container logs and ensure proper health checks
- Pending Pods: Verify resource requests, node capacity, and PVC bindings
- Network Issues: Check CNI plugin status, network policies, and DNS resolution
- Certificate Errors: Verify certificate expiration and proper CA configuration