
Auto-Scaling Setup

9 min read
Updated Jun 26, 2025

Configure auto-scaling to adjust compute resources automatically in response to demand, maintaining performance while controlling costs.

Auto-Scaling Fundamentals

Auto-scaling automatically adjusts the number of compute resources based on predefined metrics and policies. This ensures applications can handle varying loads while optimizing costs.

Key Components

  • Scaling Groups: Collections of instances that scale together
  • Launch Templates: Configuration for new instances
  • Scaling Policies: Rules that trigger scaling actions
  • Health Checks: Ensure only healthy instances serve traffic

Scaling Strategies

Reactive Scaling

Scale based on current metrics:

  • CPU-based: Scale when CPU utilization exceeds threshold
  • Memory-based: Scale based on memory usage
  • Request-based: Scale on request count or response time
  • Custom metrics: Business-specific metrics
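
A minimal sketch of how these signals can feed a reactive decision (the metric names and thresholds are hypothetical): scale out when any metric breaches its upper threshold, but scale in only when all metrics sit below their lower thresholds, which leaves a buffer zone that avoids thrashing.

```python
# Hypothetical thresholds per metric: (scale-out above, scale-in below)
THRESHOLDS = {
    "cpu_pct": (75.0, 25.0),
    "requests_per_min": (1000.0, 200.0),
}

def scaling_decision(metrics: dict) -> str:
    """Scale out if ANY metric is hot; scale in only if ALL are cold.

    The gap between the two thresholds is deliberate: a metric in the
    middle band triggers no action, preventing rapid up/down cycles.
    """
    if any(metrics[name] > out for name, (out, _) in THRESHOLDS.items()):
        return "scale-out"
    if all(metrics[name] < low for name, (_, low) in THRESHOLDS.items()):
        return "scale-in"
    return "hold"

print(scaling_decision({"cpu_pct": 82.0, "requests_per_min": 400.0}))  # scale-out
print(scaling_decision({"cpu_pct": 40.0, "requests_per_min": 400.0}))  # hold
```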

Predictive Scaling

Scale based on historical patterns:

  • Machine learning algorithms analyze past usage
  • Anticipate traffic spikes before they occur
  • Ideal for predictable patterns (daily, weekly cycles)
  • Reduces response lag during scale-up
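
To illustrate the idea (not any platform's actual model), a crude stand-in for those ML algorithms is averaging the load seen at the same hour on previous days, then provisioning capacity for the forecast before the spike arrives. All numbers here are hypothetical.

```python
import math

def forecast_load(same_hour_history: list) -> float:
    """Forecast next period's load as the mean of the same hour on
    previous days -- a toy stand-in for real predictive models."""
    return sum(same_hour_history) / len(same_hour_history)

def capacity_for(load: float, capacity_per_instance: float) -> int:
    """Instances needed to serve the forecast load."""
    return math.ceil(load / capacity_per_instance)

# Requests/min observed at 09:00 on the last five weekdays (hypothetical).
history = [950, 1010, 980, 1020, 990]
predicted = forecast_load(history)                         # 990.0
print(capacity_for(predicted, capacity_per_instance=200))  # 5
```

Because the capacity is provisioned ahead of the observed spike, instances are already healthy when traffic lands, which is the "reduced response lag" benefit above.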

Scheduled Scaling

Scale based on known events:

  • Marketing campaigns or product launches
  • Business hours vs. off-hours
  • Seasonal variations
  • Batch processing windows
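
The business-hours case can be sketched as a simple time-window lookup (the schedule and capacities below are illustrative, not a real API):

```python
from datetime import time

# Hypothetical schedule: business hours get a larger baseline.
SCHEDULE = [
    (time(8, 0), time(18, 0), 8),   # 08:00-18:00 -> 8 instances
]
DEFAULT_CAPACITY = 3                # off-hours baseline

def scheduled_capacity(now: time) -> int:
    """Return the desired capacity for the current time of day."""
    for start, end, capacity in SCHEDULE:
        if start <= now < end:
            return capacity
    return DEFAULT_CAPACITY

print(scheduled_capacity(time(9, 30)))   # 8
print(scheduled_capacity(time(22, 0)))   # 3
```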

Platform-Specific Implementation

AWS Auto Scaling

Configuration Steps:

  1. Create Launch Template:
    aws ec2 create-launch-template \
      --launch-template-name my-app-template \
      --version-description "v1.0" \
      --launch-template-data '{
        "ImageId": "ami-12345678",
        "InstanceType": "t3.medium",
        "SecurityGroupIds": ["sg-12345678"],
        "UserData": "base64-encoded-script"
      }'
  2. Create Auto Scaling Group:
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name my-app-asg \
      --launch-template '{"LaunchTemplateName":"my-app-template"}' \
      --min-size 2 \
      --max-size 10 \
      --desired-capacity 4 \
      --target-group-arns arn:aws:elasticloadbalancing:...
  3. Configure Scaling Policies:
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name my-app-asg \
      --policy-name cpu-target-tracking \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 70.0
      }'

Kubernetes Horizontal Pod Autoscaler

HPA Configuration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
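
The manifest above relies on the HPA's documented core formula, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), clamped to the replica bounds. A minimal sketch using the manifest's numbers (the 95% utilization figure is a hypothetical example):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_utilization: float,
                         target_utilization: float,
                         min_replicas: int, max_replicas: int) -> int:
    """HPA core formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    clamped to [minReplicas, maxReplicas]."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With the manifest above: 3 replicas at 95% CPU against a 70% target.
print(hpa_desired_replicas(3, 95, 70, min_replicas=3, max_replicas=20))  # 5
```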

Azure Virtual Machine Scale Sets

  • Define instance template with size, OS, and configuration
  • Set capacity limits (min, max, default)
  • Configure autoscale rules based on metrics
  • Integrate with Azure Load Balancer

Scaling Metrics and Thresholds

Common Metrics

  Metric            Scale-Out Threshold   Scale-In Threshold   Use Case
  CPU Utilization   70-80%                20-30%               Compute-intensive apps
  Memory Usage      75-85%                30-40%               Memory-intensive apps
  Request Count     1000 req/min          200 req/min          Web applications
  Response Time     > 500ms               < 100ms              Latency-sensitive apps
  Queue Length      > 100 messages        < 10 messages        Message processing

Custom Metrics

  • Business metrics: Active users, transactions per second
  • Application metrics: Cache hit rate, database connections
  • External metrics: Third-party API response times

Advanced Auto-Scaling Patterns

Step Scaling

Different scaling actions based on metric severity:

  • 60-70% CPU: Add 1 instance
  • 70-85% CPU: Add 2 instances
  • 85%+ CPU: Add 4 instances
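
The steps above amount to a severity-to-adjustment mapping, which can be sketched as (function name is illustrative):

```python
def step_scaling_adjustment(cpu_pct: float) -> int:
    """Map CPU severity to an instance adjustment, per the steps above."""
    if cpu_pct >= 85:
        return 4
    if cpu_pct >= 70:
        return 2
    if cpu_pct >= 60:
        return 1
    return 0  # below the lowest step: no scale-out action

print(step_scaling_adjustment(88.0))  # 4
print(step_scaling_adjustment(65.0))  # 1
```

The larger jumps at higher severity let the group catch up quickly under sudden load instead of adding one instance per evaluation cycle.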

Warm Pool Management

Pre-initialized instances for faster scaling:

  • Reduce application startup time
  • Maintain pool of stopped instances
  • Quick transition to running state
  • Cost-effective for predictable spikes

Multi-Tier Scaling

Coordinate scaling across application layers:

  1. Web tier scales based on request rate
  2. Application tier scales based on processing queue
  3. Cache tier scales based on memory pressure
  4. Database read replicas scale based on query load
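
As one concrete piece of this coordination, the queue-driven application tier can derive its worker count from the current backlog and per-worker throughput. The numbers and function name below are hypothetical:

```python
import math

def workers_for_backlog(queue_depth: int, msgs_per_worker_per_min: int,
                        target_drain_minutes: int,
                        min_workers: int = 2, max_workers: int = 50) -> int:
    """Size the worker tier so the current backlog drains within the
    target window, clamped to the tier's capacity limits."""
    needed = math.ceil(queue_depth / (msgs_per_worker_per_min * target_drain_minutes))
    return max(min_workers, min(max_workers, needed))

# 1200 queued messages, 30 msgs/worker/min, drain within 5 minutes.
print(workers_for_backlog(1200, msgs_per_worker_per_min=30,
                          target_drain_minutes=5))  # 8
```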

Cost Optimization

Instance Type Selection

  • Burstable instances: For variable workloads (T3, T4g)
  • Compute optimized: For CPU-intensive tasks (C5, C6g)
  • Memory optimized: For in-memory databases (R5, X2)
  • Spot instances: For fault-tolerant workloads

Scaling Policies for Cost

  • Scale in more aggressively than you scale out to avoid paying for idle capacity
  • Time-based instance selection: run larger (more expensive) instance types only during business hours
  • Reserved capacity for baseline, auto-scaling for peaks
  • Spot instance integration for non-critical capacity

Monitoring and Alerting

Key Metrics to Monitor

  • Scaling activity frequency
  • Instance launch/termination success rate
  • Time to scale (launch to healthy)
  • Cost per scaling event
  • Application performance during scaling

Alert Configuration

  • Scaling failures or errors
  • Hitting max capacity limits
  • Unusual scaling patterns
  • Cost threshold breaches
  • Health check failures

Testing Auto-Scaling

Load Testing

  1. Start with baseline capacity
  2. Gradually increase load
  3. Verify scaling triggers correctly
  4. Confirm application handles new instances
  5. Test scale-in behavior

Chaos Engineering

  • Randomly terminate instances
  • Simulate availability zone failures
  • Test health check accuracy
  • Verify replacement instance provisioning

Best Practices

  • Gradual scaling: Avoid aggressive scaling that causes thrashing
  • Health checks: Ensure comprehensive health validation
  • Cooldown periods: Prevent rapid scale up/down cycles
  • Multiple metrics: Don't rely on a single metric
  • Regular reviews: Adjust thresholds based on usage patterns
  • Graceful shutdown: Allow instances to complete work before termination
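
The cooldown and anti-thrashing points can be illustrated with a small guard that suppresses actions arriving inside the cooldown window (timings are illustrative):

```python
class CooldownGuard:
    """Reject scaling actions that arrive before the cooldown has elapsed."""

    def __init__(self, cooldown_seconds: int):
        self.cooldown = cooldown_seconds
        self.last_action_at = None  # timestamp of the last allowed action

    def allow(self, now: float) -> bool:
        """Allow the action and start a new cooldown, or reject it."""
        if self.last_action_at is not None and now - self.last_action_at < self.cooldown:
            return False
        self.last_action_at = now
        return True

guard = CooldownGuard(cooldown_seconds=300)
print(guard.allow(now=0))     # True  - first action proceeds
print(guard.allow(now=120))   # False - still inside the 300 s window
print(guard.allow(now=400))   # True  - cooldown elapsed
```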

Troubleshooting Common Issues

Scaling Not Triggering

  • Verify CloudWatch/metrics agent is installed
  • Check IAM permissions for auto-scaling
  • Review scaling policy configuration
  • Ensure metrics are being published

Instances Failing Health Checks

  • Increase health check grace period
  • Verify application startup sequence
  • Check security group rules
  • Review application logs

Cost Overruns

  • Set maximum instance limits
  • Configure budget alerts
  • Review scaling history
  • Optimize instance types
Note: This documentation is provided for reference purposes only. It reflects general best practices and industry-aligned guidelines, and any examples, claims, or recommendations are intended as illustrative—not definitive or binding.