Metrics REST API ingestion delays causing stale dashboard data in production monitoring

We’re experiencing significant delays when ingesting custom metrics through the OCI Monitoring REST API. Our production dashboards are showing stale data with a 5-10 minute lag, which is unacceptable for real-time monitoring of critical services.

We’re posting metrics in batches every minute using the PostMetricData endpoint. The API calls succeed (200 OK), but the metrics don’t appear in our dashboards until much later. I’m wondering if this is related to metrics ingestion latency inherent to the service, or if we’re hitting API rate limits without proper error indication.


POST /20180401/metrics
{"metricData": [{"datapoints": [{"timestamp": "2025-02-08T14:30:00Z", "value": ...}], ...}]}
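For reference, here's a minimal sketch of how we assemble the request body before signing and sending it (the helper name and the sample values are ours; the field names follow the PostMetricData API reference):

```python
from datetime import datetime, timezone

def build_post_metric_data_body(namespace, compartment_id, name, dimensions, samples):
    """Build a PostMetricData request body.

    `samples` is a list of (datetime, float) pairs; each timestamp is
    normalized to UTC RFC3339 with an explicit 'Z' suffix and
    millisecond precision.
    """
    datapoints = [
        {
            # strftime gives 6 fractional digits; trim to 3 and append 'Z'
            "timestamp": ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z",
            "value": value,
        }
        for ts, value in samples
    ]
    return {
        "metricData": [
            {
                "namespace": namespace,
                "compartmentId": compartment_id,
                "name": name,
                "dimensions": dimensions,
                "datapoints": datapoints,
            }
        ]
    }
```

The actual POST needs OCI request signing, which we let the SDK handle; this just shows the payload shape we send.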

Anyone else dealt with timestamp formatting issues or batching strategies that improve ingestion speed?

Interesting point about rate limits. I’m not seeing any explicit throttling headers in responses. Our batch size is around 150 data points per call, posted once per minute. According to OCI docs, the limit is 50,000 data points per minute per tenancy, so we should be well under that threshold. Could the issue be with how we’re formatting the timestamp? We’re using ISO8601 format but maybe not exactly RFC3339?

I’ve seen this exact behavior. The issue is usually rate limiting that’s not being surfaced properly. OCI Monitoring has limits on both the number of API calls and the number of metric data points per minute. When you exceed these, the API still returns 200 OK but your metrics get throttled and queued. Check the response headers for any throttling indicators. We had to implement exponential backoff and reduce our batch frequency to every 2-3 minutes instead of every minute.

Another thing to check: are you setting the correct namespace and dimensions? If your namespace is too generic or you’re using high-cardinality dimensions, the backend processing can slow down significantly. We saw similar delays until we optimized our dimension structure. Also, consider using metric streams instead of REST API for high-frequency metrics - streams have better throughput and lower latency for real-time use cases.

I can provide a comprehensive solution based on your symptoms and what I’ve learned deploying high-frequency monitoring in OCI.

Metrics Ingestion Latency: OCI Monitoring has inherent processing latency of 1-3 minutes for custom metrics. This is by design for aggregation and indexing. However, 5-10 minute delays indicate additional issues. The service processes metrics asynchronously, so even a 200 OK response doesn’t guarantee immediate availability in queries/dashboards.

API Rate Limits: You mentioned 150 data points per minute - this seems low, but the issue is likely per-namespace or per-dimension-combination limits, not just the overall tenancy limit. OCI enforces:

  • 50,000 data points per minute per tenancy (overall)
  • Rate limiting per namespace (undocumented but real)
  • Throttling based on dimension cardinality

The API doesn’t always return explicit throttling errors. Monitor the ‘opc-request-id’ header and check OCI audit logs for throttling events. Implement proper retry logic with exponential backoff:


// Pseudocode - retry strategy:
1. Make the PostMetricData API call
2. If the response takes > 2 seconds, treat the batch as throttled (heuristic)
3. Wait 2^attempt_count seconds before retrying (true exponential backoff)
4. Cap at 3 retry attempts per batch
5. Log exhausted batches for later analysis
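That loop can be sketched in Python. This is a sketch, not our exact client: `post_batch` stands in for whatever HTTP/SDK call you use, and the 2-second threshold is a heuristic. Note one caveat: retrying a slow-but-successful call would double-post datapoints, so this version only retries hard failures and logs slow responses as a throttling signal.

```python
import logging
import time

log = logging.getLogger("metrics")

def post_with_backoff(post_batch, batch, max_retries=3, slow_threshold=2.0):
    """Post one metric batch, retrying hard failures with exponential backoff.

    Slow responses are logged as a possible throttling signal rather
    than retried, since re-posting a successful call duplicates data.
    Returns True on success, False once retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        start = time.monotonic()
        try:
            post_batch(batch)
        except Exception:
            if attempt < max_retries:
                time.sleep(2 ** attempt)  # 1s, 2s, 4s between attempts
            continue
        if time.monotonic() - start > slow_threshold:
            log.warning("slow PostMetricData response; possible throttling")
        return True
    return False  # exhausted retries; caller should log the batch
```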

Timestamp Formatting: This was likely your primary issue. OCI requires strict RFC3339 format:


2025-02-14T08:30:00.000Z   // Correct (UTC with 'Z')
2025-02-14T08:30:00+00:00  // Also correct (explicit offset)
2025-02-14T08:30:00        // WRONG - missing timezone designator

Incorrect timestamps cause metrics to be queued for delayed processing or silently dropped. Always use UTC with explicit ‘Z’ suffix. Ensure your source systems have NTP configured - time drift beyond 2 hours causes rejection.
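A small validator you can run client-side before posting. This is a sketch: it assumes the 2-hour drift window mentioned above, so adjust `max_drift` to whatever the service actually enforces for your case.

```python
from datetime import datetime, timedelta, timezone

def validate_metric_timestamp(ts, now=None, max_drift=timedelta(hours=2)):
    """Return the parsed UTC datetime if `ts` is an RFC3339 timestamp
    with an explicit timezone and within the drift window; otherwise
    raise ValueError."""
    # After the date portion, require 'Z' or a numeric offset
    if not (ts.endswith("Z") or "+" in ts[10:] or "-" in ts[10:]):
        raise ValueError(f"missing timezone designator: {ts}")
    # fromisoformat on older Pythons doesn't accept 'Z', so map it to +00:00
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    parsed = parsed.astimezone(timezone.utc)
    now = now or datetime.now(timezone.utc)
    if abs(now - parsed) > max_drift:
        raise ValueError(f"timestamp drifts more than {max_drift} from now: {ts}")
    return parsed
```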

Optimization Recommendations:

  1. Reduce posting frequency to every 2-3 minutes instead of every minute
  2. Keep batch size under 100 data points per call
  3. Use low-cardinality dimensions (avoid unique identifiers)
  4. Implement client-side aggregation before posting
  5. For true real-time needs, consider OCI Monitoring Metric Streams (Kafka-based) instead of REST API
  6. Add client-side timestamp validation before posting
  7. Monitor your own API call latency - if responses take >1 second, you’re being throttled

The 5-10 minute delay you’re seeing is likely a combination of timestamp issues causing reprocessing and soft rate limiting. Fix the timestamp format first, then optimize batch size and frequency.

I’ve been testing different configurations. The timestamp format was definitely part of the issue, but there’s more to it.

Check your timestamp format carefully. OCI Monitoring requires RFC3339 format with timezone. If you’re using local time without proper timezone designation, the service might be queuing your metrics for processing rather than accepting them immediately. Also, batch size matters - posting too many metrics in a single call can cause processing delays.
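On the batch-size point, a small helper we use to keep individual calls small (a sketch; the 100-point cap is the rule of thumb from this thread, not a documented hard limit):

```python
def chunk_datapoints(datapoints, max_per_call=100):
    """Split a datapoint list into batches of at most `max_per_call`,
    so no single PostMetricData request exceeds the chosen cap."""
    return [
        datapoints[i:i + max_per_call]
        for i in range(0, len(datapoints), max_per_call)
    ]
```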