Real-time alerting on unauthorized Pub/Sub topic access attempts using IAM policy monitoring and custom log metrics

We built a security monitoring system that provides real-time alerts when unauthorized access attempts occur on our IoT Pub/Sub topics. Our deployment has 15 topics handling telemetry from different device types, and we needed to detect potential security incidents immediately.

The implementation monitors Cloud Audit Logs for IAM policy violations and generates alerts within 60 seconds of unauthorized access attempts. This catches both internal misconfigurations (service accounts with incorrect permissions) and potential external threats (compromised credentials attempting topic access).

Core components: IAM policy monitoring via audit logs, a log-based metric tracking access-denied events, and a Cloud Monitoring alert policy with Slack integration. The metric filter captures permission-denied errors (gRPC status code 7, PERMISSION_DENIED):

resource.type="pubsub_topic"
protoPayload.status.code=7
protoPayload.methodName=~"^google.pubsub.v1.Publisher"

We extract the principal identity and topic name as metric labels for granular alerting. This has caught several security issues, including a misconfigured third-party integration attempting to publish to production topics and a compromised service account key trying to access restricted telemetry data.

Excellent use case! How do you handle false positives from legitimate access denied events? For example, during deployments when service accounts might temporarily have incorrect permissions while IAM changes propagate?

Do you differentiate between subscriber access denied (pull/acknowledge) and publisher access denied? These have different security implications - unauthorized publishing is more critical than unauthorized subscription attempts. Your filter seems to only catch Publisher methods.

What’s your alert notification strategy? With 15 topics and potentially multiple service accounts, you could get a lot of alerts. Do you batch notifications or send individually? Also curious about your incident response process - what actions do your ops team take when they receive these alerts?

Your implementation covers the key security monitoring components effectively. Let me expand on the complete architecture:

IAM Policy Monitoring: Comprehensive IAM monitoring requires tracking multiple event types. Extend your audit log monitoring beyond just access denied:

# Primary filter - Access Denied
resource.type="pubsub_topic"
protoPayload.status.code=7
protoPayload.serviceName="pubsub.googleapis.com"

# Secondary filter - IAM Policy Changes
resource.type="pubsub_topic"
protoPayload.methodName="google.iam.v1.IAMPolicy.SetIamPolicy"

# Tertiary filter - Unusual Access Patterns
resource.type="pubsub_topic"
protoPayload.status.code=0
protoPayload.requestMetadata.callerIp!~"^10\."  # Outside the internal 10.0.0.0/8 range

We maintain three separate log-based metrics, each feeding different alert policies with different severity levels. Access denied events are high severity (immediate Slack + PagerDuty), IAM changes are medium severity (Slack only), unusual access patterns are low severity (daily digest email).

Log-Based Metric for Access Denied: Your metric filter correctly captures publisher access denied, but should be expanded:

filter: |
  resource.type="pubsub_topic"
  protoPayload.status.code=7
  protoPayload.serviceName="pubsub.googleapis.com"
  (protoPayload.methodName=~"^google.pubsub.v1.Publisher.*" OR
   protoPayload.methodName=~"^google.pubsub.v1.Subscriber.*")
metricDescriptor:
  metricKind: DELTA
  valueType: INT64
  displayName: "Unauthorized Pub/Sub Access Attempts"
labelExtractors:
  principal: EXTRACT(protoPayload.authenticationInfo.principalEmail)
  topic: EXTRACT(resource.labels.topic_id)
  method: EXTRACT(protoPayload.methodName)
  source_ip: EXTRACT(protoPayload.requestMetadata.callerIp)

The additional labels enable more sophisticated alerting:

  • Alert immediately for external IPs (potential breach)
  • Alert after 3 attempts for internal IPs (likely misconfiguration)
  • Different notification channels based on topic sensitivity
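As a rough illustration, the routing rules above can be sketched as a small function. The RFC 1918 internal-range check and the 3-attempt threshold come from the bullets; the channel names and the `sensitive_topic` escalation are placeholder assumptions:

```python
import ipaddress

# Assumption: all internal traffic originates from the RFC 1918 private ranges.
_INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_internal(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in _INTERNAL_NETS)

def route_alert(source_ip: str, attempt_count: int, sensitive_topic: bool) -> str:
    """Map a denied-access event to a notification tier.

    Mirrors the bullets above: external IPs page immediately, internal IPs
    alert only after repeated attempts (likely misconfiguration).
    """
    if not is_internal(source_ip):
        return "pagerduty+slack-critical"
    if attempt_count >= 3:
        # Placeholder escalation: repeated internal denials on a sensitive
        # topic page out; otherwise they go to the security channel.
        return "pagerduty+slack-critical" if sensitive_topic else "slack-security"
    return "none"
```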

Cloud Monitoring Alert Policy: Implement tiered alerting based on context:

High Priority Alert (immediate notification):

  • Condition: metric > 0 AND source_ip NOT in allowed ranges
  • Group by: principal, topic
  • Notification: Slack critical channel + PagerDuty
  • Auto-documentation includes: principal identity, topic name, source IP, timestamp

Medium Priority Alert (5-minute threshold):

  • Condition: metric > 3 in 5-minute window
  • Group by: principal
  • Notification: Slack security channel
  • Indicates repeated access attempts from same principal

Low Priority Alert (daily digest):

  • Condition: metric > 0 from known service accounts
  • Aggregation: Daily summary
  • Notification: Email to security team
  • Likely configuration issues rather than security threats

Incident Response Integration: Our alert notifications include actionable context:

{
  "alert_type": "unauthorized_pubsub_access",
  "severity": "HIGH",
  "principal": "service-account@project.iam.gserviceaccount.com",
  "topic": "iot-telemetry-production",
  "source_ip": "203.0.113.45",
  "timestamp": "2025-07-11T13:25:34Z",
  "recommended_actions": [
    "Verify if this service account should have access",
    "Check if IP address is from known infrastructure",
    "Review recent IAM policy changes",
    "Consider disabling service account key if compromised"
  ],
  "investigation_links": {
    "audit_logs": "https://console.cloud.google.com/logs/...",
    "iam_policy": "https://console.cloud.google.com/iam-admin/...",
    "service_account": "https://console.cloud.google.com/iam-admin/serviceaccounts/..."
  }
}

We use Cloud Functions to enrich alerts with this context before sending to Slack/PagerDuty. The function queries additional APIs to determine:

  • When the service account key was created
  • Recent successful accesses from this principal
  • Other topics this principal has accessed
  • Whether the principal is part of a known application
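A stripped-down sketch of that enrichment step. The IAM and audit-log lookups are stubbed out as parameters so the shaping logic stands alone; the output field names beyond those in the alert payload above are hypothetical:

```python
from datetime import datetime, timezone

def enrich_alert(event: dict, key_created_at: datetime,
                 recent_topics: list) -> dict:
    """Attach investigation context to a raw access-denied event.

    The real Cloud Function would fetch key_created_at from the IAM API and
    recent_topics from audit logs; here they are injected so the function
    is testable in isolation.
    """
    key_age_days = (datetime.now(timezone.utc) - key_created_at).days
    first_access = event["topic"] not in recent_topics
    return {
        **event,
        # Assumed heuristic: a first-ever access to a topic is treated as HIGH.
        "severity": "HIGH" if first_access else "MEDIUM",
        "key_age_days": key_age_days,
        "first_access_to_topic": first_access,
    }
```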

Advanced Features: We’ve added several enhancements beyond basic alerting:

  1. Automatic response for high-confidence threats: If access denied + external IP + never-seen-before principal, automatically disable the service account key and create an incident ticket

  2. Baseline learning: Track normal access patterns for 30 days, then alert on deviations (e.g., service account suddenly accessing topics it never touched before)

  3. Correlation with VPC Flow Logs: Cross-reference Pub/Sub access attempts with network traffic patterns to identify coordinated attacks

  4. Integration with Cloud Asset Inventory: Automatically check if the accessing principal has legitimate business need based on project labels and resource hierarchy
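The baseline-learning idea in item 2 can be sketched as a tiny in-memory tracker. Persistence, the 30-day window, and any scoring are omitted; this shows only the core "never touched this topic before" check:

```python
class AccessBaseline:
    """Track which topics each principal normally accesses (sketch)."""

    def __init__(self):
        self._seen = {}  # principal -> set of topics accessed

    def record(self, principal: str, topic: str) -> None:
        # Called for every successful access during the learning period.
        self._seen.setdefault(principal, set()).add(topic)

    def is_deviation(self, principal: str, topic: str) -> bool:
        # A known principal touching a never-before-seen topic is a
        # deviation; a brand-new principal is flagged as well.
        return topic not in self._seen.get(principal, set())
```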

Results: Since implementing this system 8 months ago:

  • Detected 47 unauthorized access attempts (32 misconfigurations, 15 potential security incidents)
  • Average detection time: 45 seconds from access attempt to alert
  • 2 confirmed compromised service account keys caught before data exfiltration
  • Zero false negatives (all unauthorized attempts were detected)
  • False positive rate: ~5% (mostly from deployment timing issues)

The system has become a critical component of our security posture, providing visibility into access control that we previously lacked. The key success factor was making alerts actionable - including enough context that the on-call engineer can immediately determine if it’s a real threat or benign misconfiguration.

We implement a suppression window during planned maintenance. Our deployment pipeline publishes a message to a control topic that triggers a Cloud Function to temporarily disable the alert policy. After deployment completes, the policy is re-enabled. For unplanned IAM changes, we accept the alert as a signal to investigate - even if it’s a false positive, it indicates an unplanned permission change that should be reviewed.
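A minimal sketch of that suppression toggle. In practice the Cloud Function enables/disables the Monitoring alert policy itself; the bounded-window timeout here is an added safety assumption (so a failed pipeline cannot mute alerts indefinitely), not part of the setup described above:

```python
from datetime import datetime, timedelta, timezone

class AlertSuppressor:
    """Maintenance-window suppression (sketch).

    The deployment pipeline would call open_window() via the control-topic
    Cloud Function; the alert path checks is_suppressed() before notifying,
    and close_window() runs when the deployment completes.
    """

    def __init__(self):
        self._until = None

    def open_window(self, minutes: int = 30) -> None:
        # Assumed default: windows auto-expire after 30 minutes.
        self._until = datetime.now(timezone.utc) + timedelta(minutes=minutes)

    def close_window(self) -> None:
        self._until = None

    def is_suppressed(self) -> bool:
        return self._until is not None and datetime.now(timezone.utc) < self._until
```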