Real-time anomaly detection for AI workload costs – how granular is enough?

We’re trying to implement real-time cost anomaly detection for our AI inference workloads running across AWS and GCP. Right now we get daily cost reports, and by the time we spot something unusual, we’ve already burned through a few thousand dollars. Finance is pushing us to catch these anomalies faster, ideally within an hour or two of them starting.

We’ve experimented with setting up alerts on total daily spend increases (like 10% over baseline), but we’re drowning in false positives. Legitimate traffic spikes trigger alerts constantly, and the team has started ignoring them. We’ve also tried tracking cost-per-request as a unit metric, but our different AI features use different models with very different token costs, so it’s not clear what the baseline should be at an aggregate level.

Has anyone managed to get real-time anomaly detection working without overwhelming the team with noise? What level of granularity are you tracking – per model, per feature, per environment? And how do you handle the fact that AI costs are so volatile compared to traditional infrastructure?

From the finance side, the most useful metric we’ve found is cost-per-outcome rather than just cost-per-request. For example, if you’re running a recommendation engine, track cost per conversion or cost per user session, not just cost per API call. That way you can distinguish between healthy growth (more users, proportional cost increase) and efficiency problems (same users, higher cost per outcome).
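To make that concrete, here's a minimal sketch of the cost-per-outcome comparison. All figures are invented, and the "outcome" can be whatever your business counts (conversions, sessions, etc.):

```python
# Sketch: compare cost-per-outcome across periods to separate healthy
# growth from efficiency regressions. All numbers are hypothetical.

def cost_per_outcome(total_cost: float, outcomes: int) -> float:
    """Spend divided by business outcomes (conversions, sessions, ...)."""
    if outcomes == 0:
        return float("inf")
    return total_cost / outcomes

# Healthy growth: cost and conversions both double -> ratio unchanged.
baseline = cost_per_outcome(total_cost=500.0, outcomes=1000)   # $0.50
growth   = cost_per_outcome(total_cost=1000.0, outcomes=2000)  # $0.50

# Efficiency problem: same conversions, higher spend -> ratio jumps.
problem = cost_per_outcome(total_cost=900.0, outcomes=1000)    # $0.90
```

The point is that `growth` looks identical to `baseline` even though absolute spend doubled, while `problem` stands out even though absolute spend grew less.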

We had the exact same problem. Tracking aggregate spend was useless because everything scales together during normal growth. The breakthrough for us was switching to per-feature cost tracking with separate baselines. Each AI feature gets its own cost anomaly threshold based on its historical behavior and expected usage patterns. That way a legitimate spike in Feature A doesn’t mask a configuration problem in Feature B.
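A rough sketch of what the independent per-feature thresholds look like. The feature histories and the 3-sigma cutoff here are purely illustrative:

```python
# Sketch: each feature gets an alert threshold derived from its own
# history, so features with different scales don't share one baseline.
import statistics

def feature_threshold(history: list[float], n_sigma: float = 3.0) -> float:
    """Alert threshold = historical mean + n_sigma standard deviations."""
    return statistics.fmean(history) + n_sigma * statistics.stdev(history)

def is_anomalous(cost: float, history: list[float]) -> bool:
    return cost > feature_threshold(history)

# Feature A runs hot and spiky; Feature B is small and steady.
history_a = [120.0, 135.0, 110.0, 150.0, 128.0]
history_b = [8.0, 9.0, 8.5, 9.2, 8.8]
```

With these made-up histories, a $160 interval is normal noise for Feature A, but $15 is a clear anomaly for Feature B, even though it's a tenth of the absolute amount.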

Thanks everyone, this is really helpful. Sounds like the consensus is per-feature or per-endpoint baselines with 15–30 minute detection windows, plus integration with deployment events to reduce false positives. We’ll start with tagging cleanup and then build out the auto-baselining. Appreciate the concrete direction.

Do you have tagging in place for all your AI workloads? We struggled with attribution until we enforced consistent tagging by team, feature, and environment. Once we had that, we could set per-team baselines and alert the right people when their specific workloads went off-track. Without it, everything was just a black box of aggregate spend.
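The enforcement check itself can be very simple. Here's a sketch; the required tag keys are just the three we use, and the sample records are made up rather than any real billing-export schema:

```python
# Sketch: flag cost line items missing the tags we attribute spend by.
REQUIRED_TAGS = {"team", "feature", "environment"}

def missing_tags(resource_tags: dict[str, str]) -> set[str]:
    """Tags a resource still needs before its spend can be attributed."""
    return REQUIRED_TAGS - set(resource_tags)

tagged   = {"team": "search", "feature": "rerank", "environment": "prod"}
untagged = {"team": "search"}
```

We run this over the billing export and route anything with missing tags to an "unattributed" bucket that gets reviewed weekly; shrinking that bucket to near zero is what made per-team baselines possible.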

You need to build dynamic baselines that account for time-of-day and day-of-week patterns, not static thresholds. We implemented auto-baselining where the system learns what normal spending looks like for each service, environment, and model endpoint over the past few weeks. The key is that normal spending at 2pm on a Tuesday looks very different from 3am on a Sunday, and the baseline adjusts for that.
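A toy version of the hour-of-week baselining, assuming you can pull timestamped cost samples out of your billing data (the samples below are made up, and a real system would use several weeks of them):

```python
# Sketch: dynamic baselines keyed by (weekday, hour), so "normal" at
# 2pm Tuesday differs from 3am Sunday.
from collections import defaultdict
from datetime import datetime
import statistics

def build_baselines(samples: list[tuple[datetime, float]]) -> dict:
    """Mean historical cost per (weekday, hour) slot."""
    slots = defaultdict(list)
    for ts, cost in samples:
        slots[(ts.weekday(), ts.hour)].append(cost)
    return {slot: statistics.fmean(costs) for slot, costs in slots.items()}

def expected_cost(baselines: dict, ts: datetime) -> float:
    return baselines[(ts.weekday(), ts.hour)]

# Two weeks of samples for two slots (values invented).
samples = [
    (datetime(2024, 6, 4, 14), 40.0),   # Tuesday 2pm
    (datetime(2024, 6, 11, 14), 44.0),  # Tuesday 2pm
    (datetime(2024, 6, 2, 3), 5.0),     # Sunday 3am
    (datetime(2024, 6, 9, 3), 7.0),     # Sunday 3am
]
baselines = build_baselines(samples)
```

Then the anomaly check compares the current interval's cost against `expected_cost` for its slot instead of a single static number.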

For granularity, we track cost-per-request at the model endpoint level, not aggregated. So we know if a particular inference endpoint suddenly starts costing 40% more per call even if total volume hasn’t changed. That catches configuration problems like accidental routing to the wrong region, expanded prompt sizes, or falling back to more expensive model variants during peak load.
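The unit-cost drift check itself is trivial once you have per-endpoint cost and volume; a sketch with invented figures:

```python
# Sketch: flag an endpoint whose cost per call drifts above its own
# baseline even when request volume is flat.
def per_call_cost(total_cost: float, calls: int) -> float:
    return total_cost / calls

def cost_drift(current: float, baseline: float) -> float:
    """Fractional change in per-call cost vs. baseline."""
    return (current - baseline) / baseline

baseline = per_call_cost(total_cost=200.0, calls=100_000)  # $0.0020/call
# Same volume, higher spend: e.g. routing to a pricier region.
current = per_call_cost(total_cost=280.0, calls=100_000)   # $0.0028/call

drift = cost_drift(current, baseline)  # 40% more per call, flat volume
```

An aggregate-spend alert would need total cost to move before firing; this fires on the per-call ratio alone.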

On detection latency, we ingest cost and usage data every 15 minutes and flag anomalies within that window. That’s fast enough to catch issues before they scale across all traffic but slow enough to avoid reacting to momentary blips. When an anomaly fires, the alert includes exactly which endpoint or feature changed, recent deployments in that area, and who owns it, so the right people can act on it immediately. We’ve caught runaway training jobs, misconfigured model routing, and API retry storms this way before they became serious cost problems.
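The alert enrichment is mostly lookups; a stripped-down sketch, where the deploy log and ownership registry are stand-ins for whatever systems you already have:

```python
# Sketch: attach context to a cost anomaly so the alert is actionable.
# DEPLOYS and OWNERS are hypothetical stand-ins for a deploy log and
# an ownership registry.
DEPLOYS = {"rerank-endpoint": ["2024-06-11 13:40 rerank v2.3"]}
OWNERS = {"rerank-endpoint": "search-team"}

def build_alert(endpoint: str, baseline: float, observed: float) -> dict:
    return {
        "endpoint": endpoint,
        "baseline_cost": baseline,
        "observed_cost": observed,
        "recent_deploys": DEPLOYS.get(endpoint, []),
        "owner": OWNERS.get(endpoint, "unassigned"),
    }

alert = build_alert("rerank-endpoint", baseline=40.0, observed=95.0)
```

Routing the alert to `alert["owner"]` instead of a shared channel is what stopped people from tuning the alerts out.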

We’ve also found that token-level tracking is critical for LLM workloads. A small change in prompt structure or response length can double token usage without changing the number of API calls. If you’re only tracking request volume, you’ll miss it completely. We instrument every LLM call to log input tokens, output tokens, and total cost, then aggregate that by feature.
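Our instrumentation boils down to something like this; the per-token prices below are placeholders, not any provider's actual rates:

```python
# Sketch: log tokens and cost per LLM call, aggregated by feature.
from collections import defaultdict

PRICE_IN, PRICE_OUT = 0.000003, 0.000015  # assumed $/token, not real rates

def log_call(ledger: dict, feature: str, tokens_in: int, tokens_out: int) -> None:
    """Record one LLM call's token counts and cost against its feature."""
    entry = ledger[feature]
    entry["tokens_in"] += tokens_in
    entry["tokens_out"] += tokens_out
    entry["cost"] += tokens_in * PRICE_IN + tokens_out * PRICE_OUT

ledger = defaultdict(lambda: {"tokens_in": 0, "tokens_out": 0, "cost": 0.0})
log_call(ledger, "summarize", tokens_in=1200, tokens_out=300)
log_call(ledger, "summarize", tokens_in=1100, tokens_out=280)
log_call(ledger, "chat", tokens_in=400, tokens_out=150)
```

With this in place, a prompt change that doubles `tokens_in` shows up in the feature's cost trend even when the request count is unchanged.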