Resource management in cloud: Autoscaling vs manual scaling for production workloads

Our team is running the Opcenter Execution 4.2 resource management module in Azure, and we’re evaluating autoscaling versus manual scaling strategies for handling production workload variations. We have predictable peak periods during shift changes (6am, 2pm, 10pm) when hundreds of operators log in simultaneously and production orders are released.

Autoscaling seems attractive for handling these peaks automatically, but I’m concerned about the configuration complexity and cost predictability. With manual scaling, we can schedule scale-up before known peaks, but we might waste resources during unexpected slow periods.

What have others found works best for MES workloads in cloud? Are there specific autoscaling configurations that work well with the resource management module’s usage patterns?

Based on experience with multiple Opcenter cloud deployments, here’s a comprehensive framework for choosing between autoscaling and manual scaling for resource management workloads.

Autoscaling Configuration: Autoscaling makes sense when your workload variation is significant and somewhat unpredictable. For Opcenter resource management, this means:

  • Peak load is at least 3x the baseline load
  • Unexpected events (rush orders, equipment failures) create load spikes
  • You want to minimize costs during off-shifts or weekends

Key configuration principles:

  1. Use scheduled scaling for known peaks (shift changes) - scale out 15-20 minutes before the peak
  2. Use metric-based scaling for unexpected spikes - trigger on CPU >70% sustained for 5 minutes
  3. Set conservative scale-in policies - only scale down after 30+ minutes of low load to avoid thrashing
  4. Configure minimum instance count to handle baseline load without scaling
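
As a rough illustration of principles 2 to 4, here is a minimal sketch of the decision logic in Python. It is not Azure's autoscale engine or an Opcenter API. The 70% / 5 minute and 30 minute values come from the list above; the 30% scale-in threshold and the instance bounds are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalePolicy:
    scale_out_cpu: float = 70.0  # from the list above: CPU > 70%
    scale_out_minutes: int = 5   # from the list above: sustained for 5 minutes
    scale_in_cpu: float = 30.0   # assumption: the post gives no "low load" value
    scale_in_minutes: int = 30   # from the list above: 30+ minutes of low load
    min_instances: int = 2       # assumption: enough for baseline load
    max_instances: int = 6       # assumption: upper bound for the example


def decide(cpu_per_minute, current, policy=ScalePolicy()):
    """Return the new instance count.

    cpu_per_minute holds one-minute CPU averages in percent, oldest first.
    """
    out_window = cpu_per_minute[-policy.scale_out_minutes:]
    in_window = cpu_per_minute[-policy.scale_in_minutes:]

    # Scale out only if every sample in the short window is above the threshold.
    if len(out_window) == policy.scale_out_minutes and all(
        cpu > policy.scale_out_cpu for cpu in out_window
    ):
        return min(current + 1, policy.max_instances)

    # Scale in only after a full long window of low load, to avoid thrashing.
    if len(in_window) == policy.scale_in_minutes and all(
        cpu < policy.scale_in_cpu for cpu in in_window
    ):
        return max(current - 1, policy.min_instances)

    return current
```

The asymmetry between the two windows is the point: a short window reacts quickly to a spike, while the long scale-in window keeps a brief dip from removing capacity that is needed again minutes later.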

For your shift change scenario, a schedule-based rule works best:

  • 5:45am: Scale from 2 to 6 instances (before 6am shift)
  • 6:30am: Scale back to 3 instances (after login peak)
  • Similar patterns for 2pm and 10pm shifts
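
The schedule above can be sketched as a simple time-window lookup. This is only an illustration of the idea, not an Azure autoscale profile. The 5:45am to 6:30am window is from the list; the 2pm and 10pm windows are assumed to mirror it, and the sketch simplifies the 2/3 instance off-peak counts to a single off-peak value.

```python
from datetime import time

# (start, end) of each scale-out window; end is exclusive.
PEAK_WINDOWS = [
    (time(5, 45), time(6, 30)),    # from the schedule above
    (time(13, 45), time(14, 30)),  # assumed to mirror the 6am pattern
    (time(21, 45), time(22, 30)),  # assumed to mirror the 6am pattern
]


def scheduled_instances(now, peak=6, off_peak=3):
    """Return the scheduled instance count for a time of day."""
    for start, end in PEAK_WINDOWS:
        if start <= now < end:
            return peak
    return off_peak
```

For example, `scheduled_instances(time(5, 50))` falls inside the first window and returns the peak count, while `scheduled_instances(time(12, 0))` returns the off-peak count.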

Manual Scaling: Manual scaling is often more cost-effective when:

  • Load patterns are highly predictable
  • Peak-to-baseline ratio is less than 3x
  • Application warm-up time is significant
  • You want predictable monthly costs

For Opcenter, manual scaling means sizing your infrastructure to handle peak load comfortably (with 20-30% headroom) and running that configuration continuously. This seems wasteful, but consider:

  • No autoscaling delays during critical shift changes
  • Predictable performance for production planning
  • Simpler troubleshooting without dynamic infrastructure
  • Often lower total cost when peak/baseline ratio is modest

Cost Predictability: This is where manual scaling has a clear advantage. With autoscaling, your monthly costs vary based on actual load patterns. If you have unexpected production increases, costs rise proportionally. Manual scaling gives you a fixed monthly infrastructure cost that’s easier to budget.

However, autoscaling provides better cost optimization if you have significant off-hours. For 24/7 manufacturing, the savings potential is limited. For facilities with clear off-shifts or weekend shutdowns, autoscaling can reduce costs by 40-50% by scaling down during those periods.
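
To see why off-hours drive the savings, it helps to count instance-hours per week. The sketch below does that arithmetic. The function names and the example numbers (4 instances in production, 1 when idle, a 5-day week) are hypothetical, and the result is not meant to reproduce the 40-50% figure, which also depends on pricing and on how far you can scale down.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_instance_hours(production_hours, production_instances, idle_instances):
    """Instance-hours per week when scaling down outside production hours."""
    if not 0 <= production_hours <= HOURS_PER_WEEK:
        raise ValueError("production_hours must be between 0 and 168")
    idle_hours = HOURS_PER_WEEK - production_hours
    return production_hours * production_instances + idle_hours * idle_instances


def savings_fraction(fixed_hours, scaled_hours):
    """Fraction of instance-hours saved relative to a fixed configuration."""
    return (fixed_hours - scaled_hours) / fixed_hours


# Hypothetical plant: 24h production Monday to Friday, shut down on weekends.
fixed = weekly_instance_hours(HOURS_PER_WEEK, 4, 4)  # 4 instances all week
scaled = weekly_instance_hours(5 * 24, 4, 1)         # 1 instance on weekends
print(fixed, scaled, round(savings_fraction(fixed, scaled), 3))
```

With `production_hours` set to 168 (true 24/7 operation) the idle term vanishes and the saving is zero, which matches the point about 24/7 manufacturing above.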

Hybrid Recommendation: For most Opcenter resource management deployments, I recommend a hybrid approach:

  1. Manually size for average load across all shifts (not peak)
  2. Use scheduled autoscaling to add capacity before known peaks
  3. Keep metric-based autoscaling as a safety net for unexpected spikes
  4. Scale down aggressively during planned maintenance windows or known low-production periods

This gives you cost predictability (baseline infrastructure is fixed), performance during peaks (scheduled scaling), and protection against unexpected events (metric-based scaling). The baseline manual sizing handles 70-80% of your load, autoscaling handles the rest.
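
One way to picture how the three layers combine is to take the largest of the three requests and cap it. This is a sketch of the idea only; the function and parameter names are hypothetical, and a real autoscale service resolves competing rules through its own configuration.

```python
def hybrid_target(baseline, scheduled, metric_driven, ceiling):
    """Combine the three layers of the hybrid approach.

    baseline:      fixed, manually sized instance count
    scheduled:     count requested by the shift-change schedule
    metric_driven: count requested by the metric-based safety net
    ceiling:       hard upper bound that keeps cost bounded
    """
    wanted = max(baseline, scheduled, metric_driven)
    return min(wanted, ceiling)
```

For example, `hybrid_target(3, 6, 3, 8)` returns 6 during a scheduled peak, and `hybrid_target(3, 3, 5, 8)` returns 5 when an unexpected spike triggers the safety net outside a scheduled window.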

From a cost perspective, autoscaling can actually increase costs if it is not configured properly. We found that the constant scaling up and down consumed more compute hours than running a slightly larger manual configuration 24/7. For us, autoscaling only broke even once peak load was more than 3x the baseline load. Below that ratio, manual scaling with appropriate sizing was more cost-effective.

Another factor to consider is database connection pooling. When you autoscale application servers, you need to ensure your database can handle the increased connection count. We hit Azure SQL DTU limits during autoscale events because each new instance opened 50 connections. We had to upgrade our database tier to support the peak connection count, which negated some of the cost savings from autoscaling the application servers.
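
A quick capacity check before setting the maximum instance count would have caught this. The sketch below uses the 50 connections per instance from our case. The `db_connection_limit` value is a placeholder you would look up for your own database tier, and `reserve` is a hypothetical allowance for other clients such as reporting or admin tools.

```python
def peak_connections(instances, connections_per_instance=50):
    """Connections opened when every instance fills its pool."""
    return instances * connections_per_instance


def fits(instances, db_connection_limit, connections_per_instance=50, reserve=0):
    """True if the database tier can absorb the fully scaled-out app tier."""
    needed = peak_connections(instances, connections_per_instance) + reserve
    return needed <= db_connection_limit
```

For example, 6 instances at 50 connections each need 300 connections, so `fits(6, 299)` is False. Shrinking the per-instance pool is an alternative to upgrading the tier, if the application tolerates it.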