Our ERP integration makes thousands of API calls daily to external suppliers and internal microservices. We’ve configured Cloud Monitoring alerts to notify us when API requests fail, but alerts aren’t triggering even though we can see failed requests in the logs.
The log-based metric is configured to count log entries with severity ERROR and jsonPayload.apiStatus >= 400. The alerting policy checks whether the metric exceeds 10 failures in 5 minutes. We’ve manually verified that failed API calls are being logged correctly with the right severity and status codes.
```
filter: severity="ERROR" AND jsonPayload.apiStatus>=400
metric.type: "logging.googleapis.com/user/erp_api_failures"
```
This is critical for incident detection - we’re missing failures that impact business operations. What could prevent the alerts from firing despite matching log entries?
First thing to check - is the log-based metric actually incrementing? Go to Metrics Explorer and query your custom metric to see if data points are being recorded. Sometimes the metric filter syntax doesn’t match what’s actually in the logs. Also verify the metric is being created in the same project where you’re checking alerts.
Carlos, the issue is likely your filter syntax. Log-based metric filters use the same Logging query language as Logs Explorer, but a few features aren’t supported, so test your filter directly in the log-based metric preview before saving. For JSON payload fields you need the full jsonPayload.apiStatus path, and the comparison has to match the field’s type: check whether apiStatus is a string or an integer, because numeric operators won’t match string values.
Alex, checked Metrics Explorer and the metric shows zero data points over the past 7 days, even though Cloud Logging clearly shows ERROR entries with apiStatus fields. The logs and metric are in the same project. Could the jsonPayload path be incorrect?
Here’s the comprehensive solution covering log-based metric configuration, alerting policy setup, and API error monitoring:
Root Cause Analysis:
Your alerts weren’t triggering due to three issues: incorrect filter syntax for string comparison, inconsistent log field names across services, and alignment period misconfiguration.
Log-Based Metric Configuration:
Create a filter that handles both string status codes and multiple field paths (string equality is exact, so list each status code your services actually log):
```
severity="ERROR" AND (
  jsonPayload.apiStatus="400" OR jsonPayload.apiStatus="500" OR
  httpResponse.status>=400
) AND (
  resource.type="cloud_run_revision" OR resource.type="k8s_container"
)
```
Key Configuration Steps:
- Create the metric with the correct data type:
  - Metric type: Counter (for counting failures)
  - Value extractor: leave empty for simple counting
  - Labels: extract jsonPayload.serviceName and jsonPayload.endpoint for granular alerting
- Test the filter before saving:
  - Use the preview feature in the log-based metrics console
  - Verify it matches recent log entries
  - Check that data appears in Metrics Explorer within 2-3 minutes
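To keep the string-equality clauses and resource types from drifting out of sync as services change, the filter above can be generated programmatically. This is a minimal sketch; `build_failure_filter` is an illustrative helper, not part of any Google Cloud SDK.

```python
# Sketch: build the log-based metric filter from explicit lists so each
# string-typed status code is spelled out (numeric operators would
# silently not match string values like "400").

def build_failure_filter(status_codes, resource_types):
    """Return a Cloud Logging filter string matching failed API calls."""
    status_clauses = " OR ".join(
        f'jsonPayload.apiStatus="{code}"' for code in status_codes
    )
    resource_clauses = " OR ".join(
        f'resource.type="{rt}"' for rt in resource_types
    )
    # Explicit parentheses keep the OR groups from being split by AND.
    return (
        'severity="ERROR" AND '
        f"({status_clauses} OR httpResponse.status>=400) AND "
        f"({resource_clauses})"
    )

filter_ = build_failure_filter(
    ["400", "500"], ["cloud_run_revision", "k8s_container"]
)
print(filter_)
```

Paste the generated string into the log-based metric preview to confirm it matches recent entries before saving.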
Alerting Policy Setup:
Create a properly configured alerting policy:
- Condition configuration:
  - Metric: your log-based metric `logging.googleapis.com/user/erp_api_failures`
  - Rolling window: 5 minutes
  - Alignment period: 1 minute (matches log ingestion frequency)
  - Aggregator: Sum
  - Threshold: > 10 failures
  - Duration: 1 minute (trigger after the threshold is exceeded for 1 minute)
- Group by fields: service_name, endpoint (for targeted alerts)
- Notification channels: configure multiple channels (email, Slack, PagerDuty) with proper webhook URLs
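If it helps to see how the condition behaves, here is a pure-Python simulation of the evaluation: failure counts aligned into 1-minute buckets, summed over a 5-minute rolling window, breaching once the windowed sum exceeds 10. It is a stand-in for intuition only, not a Cloud Monitoring API call.

```python
# Sketch: simulate a Sum aggregator over a 5-minute rolling window of
# 1-minute aligned buckets, with a "> 10 failures" threshold.

def rolling_sums(per_minute_counts, window=5):
    """5-minute rolling sum over 1-minute aligned buckets."""
    sums = []
    for i in range(len(per_minute_counts)):
        lo = max(0, i - window + 1)
        sums.append(sum(per_minute_counts[lo : i + 1]))
    return sums

def breaches(per_minute_counts, threshold=10, window=5):
    """True wherever the windowed sum exceeds the threshold."""
    return [s > threshold for s in rolling_sums(per_minute_counts, window)]

# 13 failures spread over minutes 1-5 push the window sum past 10.
counts = [0, 1, 0, 4, 5, 3, 0, 0]
print(breaches(counts))
```

Note the strict `>` at minute 4: a windowed sum of exactly 10 does not breach, matching the "> 10 failures" threshold above.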
API Error Monitoring Best Practices:
- Standardize logging across services:
  - Use consistent field names (httpResponse.status recommended)
  - Always log severity, timestamp, service name, endpoint, and error details
  - Include correlation IDs for request tracing
- Create tiered alerting:
  - Warning: 5-10 failures in 5 minutes (email/Slack)
  - Critical: >10 failures in 5 minutes (PagerDuty)
  - Separate alerts for 4xx (client errors) vs 5xx (server errors)
- Set up a dashboard for visibility:
  - Chart showing API failure rate over time
  - Breakdown by service and endpoint
  - Latency metrics alongside error rates
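Until the field names are standardized, a small normalization shim in each service keeps one metric filter viable across all of them. This sketch assumes the two payload shapes mentioned in the thread (string-typed `jsonPayload.apiStatus` and integer `httpResponse.status`); the helper names are illustrative.

```python
# Sketch: coerce either status field to an int, then classify for the
# tiered 4xx/5xx alerting described above.

def normalized_status(payload):
    """Return the status as an int from either field path, or None."""
    raw = payload.get("apiStatus")  # may be a string like "400"
    if raw is None:
        raw = payload.get("httpResponse", {}).get("status")
    return int(raw) if raw is not None else None

def severity_tier(status):
    """Map a status code to an alerting tier."""
    if status is None or status < 400:
        return "ok"
    return "client_error" if status < 500 else "server_error"

print(severity_tier(normalized_status({"apiStatus": "404"})))
print(severity_tier(normalized_status({"httpResponse": {"status": 503}})))
```

Logging the normalized integer under one agreed field also lets you go back to a simple numeric comparison in the metric filter.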
Validation Steps:
- Generate test failures and verify metric increments in Metrics Explorer
- Manually trigger alert by exceeding threshold
- Confirm notifications arrive at all configured channels
- Test alert auto-resolution when failures drop below threshold
- Review alert history after 48 hours to tune thresholds
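For the first validation step, a small generator makes it easy to produce a reproducible burst of test failures. This sketch only builds the structured payloads; in practice you would emit each one with your logging client (e.g. `log_struct` in google-cloud-logging). The field names mirror the metric filter above and `failure_entry` is a hypothetical helper.

```python
# Sketch: build >10 ERROR entries shaped like the real logs so the
# metric crosses the "> 10 failures in 5 minutes" threshold.
import json
from datetime import datetime, timezone

def failure_entry(service, endpoint, status="500"):
    return {
        "severity": "ERROR",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "jsonPayload": {
            "serviceName": service,
            "endpoint": endpoint,
            "apiStatus": status,  # string, matching the real logs
        },
    }

# 12 entries in one burst exceeds the threshold of 10.
burst = [failure_entry("erp-sync", "/suppliers", "500") for _ in range(12)]
print(json.dumps(burst[0], indent=2))
```

Remember the ~2 minute ingestion delay when checking Metrics Explorer afterwards.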
Common Pitfalls to Avoid:
- Don’t use regex in log filters (performance impact)
- Avoid alignment periods longer than your rolling window
- Don’t set thresholds too low (alert fatigue) or too high (miss critical issues)
- Remember log-based metrics have ~2 minute ingestion delay
Implement Cloud Trace for distributed tracing alongside these alerts for complete API observability across your ERP integration.
Another gotcha - make sure your alerting policy alignment period matches your data granularity. If you’re using 1-minute alignment but data only arrives every 5 minutes, alerts might not trigger as expected.
We had similar issues with inconsistent log structures across services. Recommend standardizing your logging format or creating multiple log-based metrics for different patterns. Also set up notification channels properly - we discovered our alerts were firing but going to a deprecated Slack channel.
Maya, you’re right! The apiStatus field in our logs is actually a string “400” not an integer 400. The numeric comparison was failing silently. Also found that some logs use httpResponse.status instead of jsonPayload.apiStatus depending on the service.
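For anyone else who hits this, here is a minimal Python stand-in for the type mismatch: a numeric comparison against a string-typed field matches nothing, while string equality does. It models the behavior observed in this thread, not the real Logging filter parser.

```python
# Sketch: why jsonPayload.apiStatus>=400 matched zero entries while
# jsonPayload.apiStatus="400" matched them.

def numeric_match(payload, field, threshold):
    """Numeric comparison: only matches when the value is numeric."""
    value = payload.get(field)
    return isinstance(value, (int, float)) and value >= threshold

def string_match(payload, field, literal):
    """String equality: matches the value exactly as logged."""
    return payload.get(field) == literal

payload = {"apiStatus": "400"}  # logged as a string
print(numeric_match(payload, "apiStatus", 400))   # metric stays at zero
print(string_match(payload, "apiStatus", "400"))  # metric increments
```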