Let me provide a comprehensive solution covering all three focus areas: custom metric ingestion, alerting policy configuration, and notification channel setup.
Custom Metric Ingestion:
The root cause of your issue is incomplete resource labels in your metric writes. For the gce_instance resource type, you must provide all required labels (project_id, instance_id, and zone). Here’s the corrected ingestion code:
import time
from google.cloud import monitoring_v3

# project_id, instance_id, zone, and error_rate are assumed to be
# defined elsewhere in your application
client = monitoring_v3.MetricServiceClient()
project_name = f"projects/{project_id}"

# Create the time series with complete resource labels
series = monitoring_v3.TimeSeries()
series.metric.type = 'custom.googleapis.com/app/error_rate'
series.resource.type = 'gce_instance'
series.resource.labels['instance_id'] = instance_id
series.resource.labels['zone'] = zone  # REQUIRED - was missing
series.resource.labels['project_id'] = project_id

point = monitoring_v3.Point({
    'interval': {'end_time': {'seconds': int(time.time())}},
    'value': {'double_value': error_rate},
})
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])
Before writing metrics, ensure your custom metric descriptor is properly created with the correct value type and metric kind. For error rates, use GAUGE metric kind with DOUBLE value type.
Alerting Policy Configuration:
Your alerting policy needs proper alignment and aggregation settings that match your metric ingestion frequency. Since you’re writing every 30 seconds, configure your policy with these parameters:
- Alignment Period: 1 minute (60 seconds)
- Per-series aligner: ALIGN_MEAN or ALIGN_MAX (use MAX for error rates to catch spikes)
- Cross-series reducer: REDUCE_MEAN (if aggregating across multiple instances)
- Condition Duration: 5 minutes (as you specified)
- Threshold: > 5.0
Critically, ensure your policy filter matches the exact resource labels you’re using in metric ingestion. Use the Metrics Explorer to construct the filter by selecting your custom metric and examining the available labels. The filter should look like:
resource.type = "gce_instance"
AND metric.type = "custom.googleapis.com/app/error_rate"
If you want to alert on specific instances, add instance-level filters. For alerting across all instances, use the cross-series reducer.
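Put together, the settings above can be expressed as an alerting policy file (usable with gcloud alpha monitoring policies create --policy-from-file); the display names here are placeholders:

```json
{
  "displayName": "High error rate",
  "combiner": "OR",
  "conditions": [
    {
      "displayName": "error_rate above 5 for 5 minutes",
      "conditionThreshold": {
        "filter": "resource.type = \"gce_instance\" AND metric.type = \"custom.googleapis.com/app/error_rate\"",
        "comparison": "COMPARISON_GT",
        "thresholdValue": 5.0,
        "duration": "300s",
        "aggregations": [
          {
            "alignmentPeriod": "60s",
            "perSeriesAligner": "ALIGN_MAX",
            "crossSeriesReducer": "REDUCE_MEAN"
          }
        ]
      }
    }
  ]
}
```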
Notification Channel Setup:
Verify your notification channels are properly configured and linked:
- Navigate to Monitoring > Alerting > Edit Policy > Notifications
- Confirm your notification channels (email, PagerDuty, Slack, etc.) are explicitly listed
- Check notification channel settings for any filtering rules that might suppress alerts
- Set an “Auto-close duration” so incidents resolve automatically once the condition clears
- Consider setting up multiple notification channels for redundancy
Validation and Troubleshooting:
After implementing these fixes:
- Use Metrics Explorer to verify metrics are being ingested with complete labels
- Check the alerting policy’s “Incident” page for evaluation history - “No data” should disappear
- Temporarily lower your threshold to trigger a test alert and verify the entire pipeline
- Monitor the “Alerting Policies” dashboard for policy health and evaluation status
- Set up a synthetic test that deliberately exceeds thresholds to validate alerting
Additional Best Practices:
- Write metrics at consistent intervals (every 60 seconds is optimal for most use cases)
- Implement retry logic with exponential backoff for metric writes
- Monitor the Monitoring API quota usage to ensure you’re not hitting rate limits
- Use structured logging to capture metric write failures for debugging
- Consider using OpenTelemetry for standardized metric collection and export
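For the retry recommendation, a minimal backoff wrapper you can put around the create_time_series call (the helper name and delay parameters are illustrative, not part of any Google library):

```python
import random
import time

def with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn, retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            # Sleep base_delay * 2^attempt, plus proportional random jitter
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

Usage would look like with_backoff(lambda: client.create_time_series(name=project_name, time_series=[series])), so transient API errors don’t drop metric points.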
By ensuring complete resource labels in your metric ingestion, properly configuring alignment and aggregation in your alerting policies, and verifying notification channel linkage, your alerts should start triggering correctly. The missing zone label was preventing policy evaluation, which is why you were seeing “No data” conditions and missing critical incidents.