# Container Orchestration
Master container orchestration platforms and patterns for deploying, scaling, and managing containerized applications in production.
## Overview
Container orchestration automates the deployment, management, scaling, and networking of containers, enabling efficient operation of containerized applications at scale.
### Key Orchestration Features
- Service Discovery: Automatic container location and communication
- Load Balancing: Distribute traffic across containers
- Scaling: Automatic scaling based on demand
- Self-Healing: Restart failed containers automatically
- Rolling Updates: Zero-downtime deployments
- Secret Management: Secure handling of sensitive data
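A single Kubernetes Deployment manifest exercises several of these features at once. A minimal sketch (the `myapp` image and `/health` endpoint are placeholders carried through the rest of this guide):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3            # desired state; the controller self-heals toward it
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0  # keep full capacity during a rollout
      maxSurge: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:latest
          ports:
            - containerPort: 8080
          livenessProbe:   # failed probes trigger automatic container restarts
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
```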
## Orchestration Platforms Comparison

| Platform | Best For | Complexity | Key Features |
|---|---|---|---|
| Kubernetes | Large-scale production | High | Extensible, rich ecosystem, multi-cloud |
| Docker Swarm | Simple deployments | Low | Native Docker integration |
| Amazon ECS | AWS workloads | Medium | Deep AWS integration, Fargate serverless option |
| HashiCorp Nomad | Mixed workloads | Medium | Also schedules non-container workloads |
## Docker Swarm

### Swarm Initialization
```bash
# Initialize the swarm manager
docker swarm init --advertise-addr 10.0.1.10

# Join worker nodes (run on each worker)
docker swarm join --token SWMTKN-1-xxxxx 10.0.1.10:2377

# Print the join token/command for additional managers
docker swarm join-token manager

# Deploy a stack
docker stack deploy -c docker-compose.yml myapp

# Scale a service
docker service scale myapp_web=5

# Update a service image
docker service update --image myapp:v2 myapp_web
```
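A few standard Swarm commands for verifying cluster and service health after deployment:

```bash
# Verify cluster and service state
docker node ls                  # list nodes and their availability
docker service ls               # list services and replica counts
docker service ps myapp_web     # show tasks, placement, and failures
docker service logs myapp_web   # aggregate logs across replicas
```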
### Docker Compose for Swarm
```yaml
# docker-compose.yml
version: '3.8'

services:
  web:
    image: myapp:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
          - node.labels.zone == us-east
    ports:
      - "80:8080"
    networks:
      - webnet
    secrets:
      - db_password
    configs:
      - source: app_config
        target: /app/config.json

  db:
    image: postgres:14
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.type == database
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - webnet
    secrets:
      - db_password

networks:
  webnet:
    driver: overlay
    driver_opts:
      encrypted: "true"

volumes:
  db-data:
    driver: local

secrets:
  db_password:
    external: true

configs:
  app_config:
    file: ./config.json
```
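Because `db_password` is declared `external: true`, it must exist in the swarm before the stack deploys. A minimal sketch (the password value is a placeholder):

```bash
# Create the secret the stack expects ("-" reads the value from stdin)
printf 'S3cureP@ss' | docker secret create db_password -

# Now the stack can resolve the external secret
docker stack deploy -c docker-compose.yml myapp
```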
## Amazon ECS

### ECS Task Definition
```json
{
  "family": "myapp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "containerDefinitions": [
    {
      "name": "app",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "protocol": "tcp"
        }
      ],
      "essential": true,
      "environment": [
        {
          "name": "NODE_ENV",
          "value": "production"
        }
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:db-password"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/myapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8080/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3
      }
    }
  ],
  "taskRoleArn": "arn:aws:iam::123456789:role/ecsTaskRole",
  "executionRoleArn": "arn:aws:iam::123456789:role/ecsTaskExecutionRole"
}
```
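The definition is registered with the CLI before any service can reference it; assuming the JSON above is saved as `task-def.json`:

```bash
# Register (or create a new revision of) the task definition
aws ecs register-task-definition --cli-input-json file://task-def.json
```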
### ECS Service with Auto Scaling
```bash
# Create the ECS service
aws ecs create-service \
    --cluster production \
    --service-name myapp \
    --task-definition myapp:1 \
    --desired-count 3 \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxxx],securityGroups=[sg-xxxxx],assignPublicIp=ENABLED}" \
    --load-balancers targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=app,containerPort=8080

# Register the service as a scalable target
aws application-autoscaling register-scalable-target \
    --service-namespace ecs \
    --resource-id service/production/myapp \
    --scalable-dimension ecs:service:DesiredCount \
    --min-capacity 3 \
    --max-capacity 20

# Add a target-tracking policy on average CPU utilization
aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id service/production/myapp \
    --policy-name cpu-scaling \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
      "TargetValue": 70.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
      }
    }'
```
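CPU is rarely the only useful signal; ECS also publishes memory utilization as a predefined target-tracking metric. A sketch mirroring the policy above:

```bash
# Companion policy tracking average memory utilization (threshold illustrative)
aws application-autoscaling put-scaling-policy \
    --service-namespace ecs \
    --scalable-dimension ecs:service:DesiredCount \
    --resource-id service/production/myapp \
    --policy-name memory-scaling \
    --policy-type TargetTrackingScaling \
    --target-tracking-scaling-policy-configuration '{
      "TargetValue": 75.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageMemoryUtilization"
      }
    }'
```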
## HashiCorp Nomad

### Nomad Job Specification
```hcl
# myapp.nomad
job "myapp" {
  datacenters = ["dc1"]
  type        = "service"

  group "web" {
    count = 3

    update {
      max_parallel     = 1
      min_healthy_time = "30s"
      healthy_deadline = "5m"
      auto_revert      = true
      canary           = 1
    }

    network {
      mode = "bridge" # bridge networking is required for Consul Connect sidecars
      port "http" {
        to = 8080
      }
    }

    service {
      name = "myapp-web"
      port = "http"

      check {
        type     = "http"
        path     = "/health"
        interval = "10s"
        timeout  = "2s"
      }

      connect {
        sidecar_service {}
      }
    }

    task "app" {
      driver = "docker"

      config {
        image = "myapp:latest"
        ports = ["http"]

        auth {
          username = "${DOCKER_USER}"
          password = "${DOCKER_PASS}"
        }
      }

      env {
        NODE_ENV = "production"
        PORT     = "${NOMAD_PORT_http}"
      }

      resources {
        cpu    = 500 # MHz
        memory = 256 # MB
      }

      template {
        data = <<EOH
DB_HOST={{ with service "postgres" }}{{ with index . 0 }}{{ .Address }}{{ end }}{{ end }}
DB_PORT={{ with service "postgres" }}{{ with index . 0 }}{{ .Port }}{{ end }}{{ end }}
DB_PASSWORD={{ with secret "database/creds/myapp" }}{{ .Data.password }}{{ end }}
EOH

        destination = "local/env"
        env         = true
      }
    }
  }
}
```
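The usual CLI workflow for taking this job to a cluster, using Nomad's standard commands (`<alloc-id>` is a placeholder from `job status` output):

```bash
# Validate and preview, then submit the job
nomad job validate myapp.nomad
nomad job plan myapp.nomad        # dry run: shows the scheduler's placement diff
nomad job run myapp.nomad

# Inspect running allocations and task logs
nomad job status myapp
nomad alloc logs <alloc-id> app
```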
## Service Mesh Integration

### Istio Service Mesh

```bash
# Install Istio (built-in profiles include default, demo, and minimal;
# "default" is the recommended starting point for production)
istioctl install --set profile=default

# Enable automatic sidecar injection for the namespace
kubectl label namespace production istio-injection=enabled
```
A VirtualService routes requests (here, header-based routing to v2 plus a 90/10 canary split), and a DestinationRule defines connection pooling, load balancing, and outlier detection:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - match:
        - headers:
            x-version:
              exact: v2
      route:
        - destination:
            host: myapp
            subset: v2
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 90
        - destination:
            host: myapp
            subset: v2
          weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 10
        http2MaxRequests: 100
    loadBalancer:
      simple: LEAST_REQUEST
    outlierDetection:
      consecutive5xxErrors: 5   # replaces the deprecated consecutiveErrors field
      interval: 30s
      baseEjectionTime: 30s
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
```
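Traffic policy is only half of a mesh rollout; Istio can also require mutual TLS between sidecars. A minimal sketch scoped to the `production` namespace used earlier:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT   # reject plaintext traffic between workloads in this namespace
```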
## Multi-Cluster Orchestration

### Federation Setup

Kubernetes Federation v2 (KubeFed) distributes workloads across clusters from a single host cluster. Note that the KubeFed project has since been archived; for new multi-cluster work, consider alternatives such as Karmada or GitOps-based fleet management. The original workflow:

```bash
# Install KubeFed
helm repo add kubefed-charts https://raw.githubusercontent.com/kubernetes-sigs/kubefed/master/charts
helm install kubefed kubefed-charts/kubefed \
    --namespace kube-federation-system --create-namespace

# Join clusters (cluster1 acts as the host cluster)
kubefedctl join cluster1 --cluster-context cluster1 --host-cluster-context cluster1
kubefedctl join cluster2 --cluster-context cluster2 --host-cluster-context cluster1
```
A FederatedDeployment places the workload on both clusters, with per-cluster replica overrides:

```yaml
apiVersion: types.kubefed.io/v1beta1
kind: FederatedDeployment
metadata:
  name: myapp
  namespace: production
spec:
  template:
    metadata:
      labels:
        app: myapp
    spec:
      replicas: 6
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
            - name: app
              image: myapp:latest
  placement:
    clusters:
      - name: cluster1
      - name: cluster2
  overrides:
    - clusterName: cluster1
      clusterOverrides:
        - path: "/spec/replicas"
          value: 4
    - clusterName: cluster2
      clusterOverrides:
        - path: "/spec/replicas"
          value: 2
```
## Container Security

### Runtime Security
```yaml
# Falco runtime security rule (allowed_processes and trusted_repos are
# user-defined Falco lists that must be declared elsewhere in the ruleset)
- rule: Unauthorized Process
  desc: Detect unauthorized process execution inside containers
  condition: >
    spawned_process and container and
    not proc.name in (allowed_processes) and
    not container.image.repository in (trusted_repos)
  output: >
    Unauthorized process started
    (user=%user.name command=%proc.cmdline container=%container.name image=%container.image.repository)
  priority: WARNING
```
Pod Security Policies were deprecated in Kubernetes 1.21 and removed in 1.25; the manifest below applies only to older clusters (a current-cluster equivalent follows after it):

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: restricted
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  hostNetwork: false
  hostIPC: false
  hostPID: false
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  supplementalGroups:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
  readOnlyRootFilesystem: true
```
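On clusters running 1.25 or later, the built-in replacement is Pod Security Admission, configured with namespace labels rather than a cluster-scoped resource. A sketch:

```bash
# Enforce the "restricted" Pod Security Standard on the production namespace
kubectl label namespace production \
    pod-security.kubernetes.io/enforce=restricted \
    pod-security.kubernetes.io/warn=restricted \
    pod-security.kubernetes.io/audit=restricted
```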
## Monitoring and Observability

### Container Metrics
```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      # Honor a custom metrics path from the prometheus.io/path annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      # Rewrite the scrape address to the port from prometheus.io/port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```
Example Grafana dashboard queries (PromQL):

```promql
# Container CPU usage (cores) by pod
sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)

# Container memory usage by pod
sum(container_memory_usage_bytes) by (pod)

# Container restarts over the last hour
sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod)
```
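Queries become actionable when wired into alerts. A sketch of a Prometheus alerting rule built on the restart query above (file name, threshold, and duration are illustrative):

```yaml
# alert-rules.yml
groups:
  - name: container-alerts
    rules:
      - alert: ContainerRestartLoop
        expr: sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod) > 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} restarted more than 3 times in the last hour"
```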
## Disaster Recovery

### Backup and Restore
```bash
# Velero backup of the production namespace (TTL of 720h = 30 days)
velero backup create prod-backup \
    --include-namespaces production \
    --include-resources deployments,services,configmaps,secrets \
    --ttl 720h
```
Disaster recovery runbook:

1. Verify backup integrity: `velero backup describe prod-backup`
2. Prepare the recovery cluster: `kubectl create namespace production`
3. Restore the application: `velero restore create --from-backup prod-backup`
4. Verify the restoration: `kubectl get all -n production`
5. Re-expose the service and update DNS to the new load balancer: `kubectl patch service myapp -p '{"spec":{"type":"LoadBalancer"}}'`
6. Validate the application: `curl https://recovery.example.com/health`
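A one-off backup ages quickly; Velero's scheduler keeps restore points current. A sketch using a daily 02:00 cron expression:

```bash
# Daily scheduled backup with the same scope and 30-day retention
velero schedule create prod-daily \
    --schedule "0 2 * * *" \
    --include-namespaces production \
    --ttl 720h
```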
## Performance Optimization

### Resource Optimization

- Right-sizing: use the Vertical Pod Autoscaler (VPA) to tune resource requests from observed usage
- Node affinity: place containers on nodes suited to the workload
- Pod disruption budgets: maintain availability during voluntary disruptions (see the sketch below)
- Horizontal scaling: scale on actual utilization metrics rather than static guesses
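A minimal PodDisruptionBudget for the hypothetical `myapp` workload, keeping at least two pods running through voluntary disruptions such as node drains:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb
spec:
  minAvailable: 2      # voluntary evictions never drop below 2 ready pods
  selector:
    matchLabels:
      app: myapp
```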
### Network Optimization

Host networking bypasses the container network namespace for latency-sensitive applications (pod-spec fragment):

```yaml
apiVersion: v1
kind: Pod
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
```

A service mesh can also be tuned for efficient routing with timeouts and retries (VirtualService fragment):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  http:
    - timeout: 30s
      retries:
        attempts: 3
        perTryTimeout: 10s
        retryOn: 5xx
```
## Cost Management

### Cost Optimization Strategies

- Use spot instances for fault-tolerant, non-critical workloads (see the sketch below)
- Implement cluster autoscaling so capacity tracks demand
- Schedule batch jobs during off-peak hours
- Use multi-tenant clusters where isolation requirements allow
- Optimize container images to reduce storage and transfer costs
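As a sketch of the spot-instance strategy, a pod-spec fragment that steers a fault-tolerant workload onto spot capacity; the `node.kubernetes.io/lifecycle` label and taint are assumptions here, since the exact names depend on how the node pool is provisioned:

```yaml
# Pod-spec fragment; label/taint names are provisioner-specific assumptions
spec:
  nodeSelector:
    node.kubernetes.io/lifecycle: spot   # assumed label on the spot node pool
  tolerations:
    - key: node.kubernetes.io/lifecycle  # assumed taint guarding spot nodes
      operator: Equal
      value: spot
      effect: NoSchedule
```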
## Best Practices

- Image Management: use minimal base images and scan them for vulnerabilities
- Configuration: externalize configuration and use secrets management
- Monitoring: implement comprehensive observability (metrics, logs, traces)
- Security: apply least privilege and use network policies (see the sketch below)
- Updates: use rolling updates and test in staging first
- Documentation: maintain runbooks and architecture diagrams
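As a starting point for the network-policy practice above, a default-deny ingress policy that later, narrower policies can selectively open:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}    # selects every pod in the namespace
  policyTypes:
    - Ingress        # no ingress rules listed, so all inbound traffic is denied
```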