Real-time alerting on unauthorized Pub/Sub topic access attempts using IAM policy monitoring and custom log metrics

We built a security monitoring system that provides real-time alerts when unauthorized access attempts occur on our IoT Pub/Sub topics. Our deployment has 15 topics handling telemetry from different device types, and we needed to detect potential security incidents immediately.

The implementation monitors Cloud Audit Logs for IAM policy violations and generates alerts within 60 seconds of unauthorized access attempts. This catches both internal misconfigurations (service accounts with incorrect permissions) and potential external threats (compromised credentials attempting topic access).

Core components: IAM policy monitoring via audit logs, a log-based metric tracking access-denied events, and a Cloud Monitoring alert policy with Slack integration. The metric filter captures permission-denied errors (gRPC status code 7, PERMISSION_DENIED):

resource.type="pubsub_topic"
protoPayload.status.code=7
protoPayload.methodName=~"^google.pubsub.v1.Publisher"

We extract the principal identity and topic name as metric labels for granular alerting. This has caught several security issues, including a misconfigured third-party integration attempting to publish to production topics and a compromised service account key trying to access restricted telemetry data.

Excellent use case! How do you handle false positives from legitimate access denied events? For example, during deployments when service accounts might temporarily have incorrect permissions while IAM changes propagate?

Do you differentiate between subscriber access denied (pull/acknowledge) and publisher access denied? These have different security implications - unauthorized publishing is more critical than unauthorized subscription attempts. Your filter seems to only catch Publisher methods.

What’s your alert notification strategy? With 15 topics and potentially multiple service accounts, you could get a lot of alerts. Do you batch notifications or send individually? Also curious about your incident response process - what actions do your ops team take when they receive these alerts?

Your implementation covers the key security monitoring components effectively. Let me expand on the complete architecture:

IAM Policy Monitoring: Comprehensive IAM monitoring requires tracking multiple event types. Extend your audit log monitoring beyond just access denied:

# Primary filter - Access Denied
resource.type="pubsub_topic"
protoPayload.status.code=7
protoPayload.serviceName="pubsub.googleapis.com"

# Secondary filter - IAM Policy Changes
resource.type="pubsub_topic"
protoPayload.methodName="google.iam.v1.IAMPolicy.SetIamPolicy"

# Tertiary filter - Unusual Access Patterns
resource.type="pubsub_topic"
protoPayload.status.code=0
protoPayload.requestMetadata.callerIp!~"^10\."  # Outside the internal 10.0.0.0/8 range

We maintain three separate log-based metrics, each feeding different alert policies with different severity levels. Access denied events are high severity (immediate Slack + PagerDuty), IAM changes are medium severity (Slack only), unusual access patterns are low severity (daily digest email).

Log-Based Metric for Access Denied: Your metric filter correctly captures publisher access denied, but should be expanded:

filter: |
  resource.type="pubsub_topic"
  protoPayload.status.code=7
  protoPayload.serviceName="pubsub.googleapis.com"
  (protoPayload.methodName=~"^google.pubsub.v1.Publisher.*" OR
   protoPayload.methodName=~"^google.pubsub.v1.Subscriber.*")
metricDescriptor:
  metricKind: DELTA
  valueType: INT64
  displayName: "Unauthorized Pub/Sub Access Attempts"
labelExtractors:
  principal: EXTRACT(protoPayload.authenticationInfo.principalEmail)
  topic: EXTRACT(resource.labels.topic_id)
  method: EXTRACT(protoPayload.methodName)
  source_ip: EXTRACT(protoPayload.requestMetadata.callerIp)

The additional labels enable more sophisticated alerting:

  • Alert immediately for external IPs (potential breach)
  • Alert after 3 attempts for internal IPs (likely misconfiguration)
  • Different notification channels based on topic sensitivity
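As a rough illustration, the routing rules above can be sketched as a small function. The RFC 1918 internal-range check and the 3-attempt threshold come from the bullets; the channel names and the `sensitive_topic` escalation are placeholder assumptions:

```python
import ipaddress

# Assumption: all internal traffic originates from the RFC 1918 private ranges.
_INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in _INTERNAL_NETS)

def route_alert(source_ip: str, attempt_count: int, sensitive_topic: bool) -> str:
    """Map a denied-access event to a notification tier.

    Mirrors the bullets above: external IPs page immediately, internal IPs
    alert only after repeated attempts (likely misconfiguration).
    """
    if not is_internal(source_ip):
        return "pagerduty+slack-critical"
    if attempt_count >= 3:
        # Placeholder escalation: repeated internal denials on a sensitive
        # topic page out; otherwise they go to the security channel.
        return "pagerduty+slack-critical" if sensitive_topic else "slack-security"
    return "none"
```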

Cloud Monitoring Alert Policy: Implement tiered alerting based on context:

High Priority Alert (immediate notification):

  • Condition: metric > 0 AND source_ip NOT in allowed ranges
  • Group by: principal, topic
  • Notification: Slack critical channel + PagerDuty
  • Auto-documentation includes: principal identity, topic name, source IP, timestamp

Medium Priority Alert (5-minute threshold):

  • Condition: metric > 3 in 5-minute window
  • Group by: principal
  • Notification: Slack security channel
  • Indicates repeated access attempts from same principal

Low Priority Alert (daily digest):

  • Condition: metric > 0 from known service accounts
  • Aggregation: Daily summary
  • Notification: Email to security team
  • Likely configuration issues rather than security threats

Incident Response Integration: Our alert notifications include actionable context:

{
  "alert_type": "unauthorized_pubsub_access",
  "severity": "HIGH",
  "principal": "service-account@project.iam.gserviceaccount.com",
  "topic": "iot-telemetry-production",
  "source_ip": "203.0.113.45",
  "timestamp": "2025-07-11T13:25:34Z",
  "recommended_actions": [
    "Verify if this service account should have access",
    "Check if IP address is from known infrastructure",
    "Review recent IAM policy changes",
    "Consider disabling service account key if compromised"
  ],
  "investigation_links": {
    "audit_logs": "https://console.cloud.google.com/logs/...",
    "iam_policy": "https://console.cloud.google.com/iam-admin/...",
    "service_account": "https://console.cloud.google.com/iam-admin/serviceaccounts/..."
  }
}

We use Cloud Functions to enrich alerts with this context before sending to Slack/PagerDuty. The function queries additional APIs to determine:

  • When the service account key was created
  • Recent successful accesses from this principal
  • Other topics this principal has accessed
  • Whether the principal is part of a known application
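A stripped-down sketch of that enrichment step. The IAM and audit-log lookups are stubbed out as parameters so the shaping logic stands alone; the output field names beyond those in the alert payload above are hypothetical:

```python
from datetime import datetime, timezone

def enrich_alert(event: dict, key_created_at: datetime,
                 recent_topics: list) -> dict:
    """Attach investigation context to a raw access-denied event.

    The real Cloud Function would fetch key_created_at from the IAM API and
    recent_topics from audit logs; here they are injected so the function
    is testable in isolation.
    """
    key_age_days = (datetime.now(timezone.utc) - key_created_at).days
    first_access = event["topic"] not in recent_topics
    return {
        **event,
        # Assumed heuristic: a first-ever access to a topic is treated as HIGH.
        "severity": "HIGH" if first_access else "MEDIUM",
        "key_age_days": key_age_days,
        "first_access_to_topic": first_access,
    }
```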

Advanced Features: We’ve added several enhancements beyond basic alerting:

  1. Automatic response for high-confidence threats: If access denied + external IP + never-seen-before principal, automatically disable the service account key and create an incident ticket

  2. Baseline learning: Track normal access patterns for 30 days, then alert on deviations (e.g., service account suddenly accessing topics it never touched before)

  3. Correlation with VPC Flow Logs: Cross-reference Pub/Sub access attempts with network traffic patterns to identify coordinated attacks

  4. Integration with Cloud Asset Inventory: Automatically check if the accessing principal has legitimate business need based on project labels and resource hierarchy
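The baseline-learning idea in item 2 can be sketched as a tiny in-memory tracker. Persistence, the 30-day window, and any scoring are omitted; this shows only the core "never touched this topic before" check:

```python
class AccessBaseline:
    """Track which topics each principal normally accesses (sketch)."""

    def __init__(self):
        self._seen = {}  # principal -> set of topics accessed

    def record(self, principal: str, topic: str) -> None:
        # Called for every successful access during the learning period.
        self._seen.setdefault(principal, set()).add(topic)

    def is_deviation(self, principal: str, topic: str) -> bool:
        # A known principal touching a never-before-seen topic is a
        # deviation; a brand-new principal is flagged as well.
        return topic not in self._seen.get(principal, set())
```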

Results: Since implementing this system 8 months ago:

  • Detected 47 unauthorized access attempts (32 misconfigurations, 15 potential security incidents)
  • Average detection time: 45 seconds from access attempt to alert
  • 2 confirmed compromised service account keys caught before data exfiltration
  • Zero false negatives (all unauthorized attempts were detected)
  • False positive rate: ~5% (mostly from deployment timing issues)

The system has become a critical component of our security posture, providing visibility into access control that we previously lacked. The key success factor was making alerts actionable - including enough context that the on-call engineer can immediately determine if it’s a real threat or benign misconfiguration.

We implement a suppression window during planned maintenance. Our deployment pipeline publishes a message to a control topic that triggers a Cloud Function to temporarily disable the alert policy. After deployment completes, the policy is re-enabled. For unplanned IAM changes, we accept the alert as a signal to investigate - even if it’s a false positive, it indicates an unplanned permission change that should be reviewed.
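A minimal sketch of that suppression toggle. In practice the Cloud Function enables/disables the Monitoring alert policy itself; the bounded-window timeout here is an added safety assumption (so a failed pipeline cannot mute alerts indefinitely), not part of the setup described above:

```python
from datetime import datetime, timedelta, timezone

class AlertSuppressor:
    """Maintenance-window suppression (sketch).

    The deployment pipeline would call open_window() via the control-topic
    Cloud Function; the alert path checks is_suppressed() before notifying,
    and close_window() runs when the deployment completes.
    """

    def __init__(self):
        self._until = None

    def open_window(self, minutes: int = 30) -> None:
        # Assumed default: windows auto-expire after 30 minutes.
        self._until = datetime.now(timezone.utc) + timedelta(minutes=minutes)

    def close_window(self) -> None:
        self._until = None

    def is_suppressed(self) -> bool:
        return self._until is not None and datetime.now(timezone.utc) < self._until
```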