We’re attempting to deploy a trained predictive maintenance ML model from Vertex AI to our IoT device registry for real-time inference on edge devices. The deployment consistently fails with a permission denied error when trying to access the Vertex AI endpoint.
Error trace:
PermissionDenied: 403 POST https://aiplatform.googleapis.com/v1/projects/iot-prod/locations/us-central1/endpoints
IAM permission 'aiplatform.endpoints.create' denied
at VertexAIClient.deploy(vertex_client.py:245)
The service account has the roles/iot.registryEditor role assigned and can manage device registries without issues. However, when we try to deploy the model for predictive maintenance on device telemetry, the pipeline fails at the Vertex AI endpoint creation step. We need this model deployed to analyze sensor data streams and predict equipment failures before they occur. What IAM configuration are we missing for this cross-service integration?
The iot.registryEditor role only covers IoT Core operations, not Vertex AI resources. For ML model deployment, your service account needs separate Vertex AI permissions. The error shows it's missing the aiplatform.endpoints.create permission, which is included in the Vertex AI User role (roles/aiplatform.user). You'll need to grant additional IAM roles that bridge both services for predictive maintenance workflows.
One thing that's often overlooked: check whether your Vertex AI endpoint is trying to access models in a different project or region. Cross-project model deployment requires aiplatform.models.get in the source project where the trained model resides. Also verify that the service account has storage.objects.get if your model artifacts live in Cloud Storage buckets, which is typical for predictive maintenance models trained on historical device data.
Thanks for clarifying. Should I add the full roles/aiplatform.user role or are there more granular permissions? We’re concerned about over-permissioning since this service account will be used in production for automated model deployments across multiple device registries.
I'll walk through the complete IAM configuration needed for Vertex AI model deployment in an IoT device registry workflow, covering endpoint permissions, service account roles, and the deployment itself.
Vertex AI Endpoint Permissions:
Your service account needs these specific permissions for endpoint management:
aiplatform.endpoints.create
aiplatform.endpoints.deploy
aiplatform.endpoints.get
aiplatform.endpoints.predict
Create a custom role or use roles/aiplatform.user, which includes these. The endpoint permissions are essential for creating the serving infrastructure that your device registry will query for real-time predictions.
Service Account IAM Roles:
You need a multi-layered IAM setup:
1. Primary Service Account (the one making deployments):
- roles/iot.admin or roles/iot.registryEditor for device registry operations
- A custom role with the Vertex AI permissions listed above
- roles/storage.objectViewer on buckets containing model artifacts
2. Service Account Impersonation:
- Grant roles/iam.serviceAccountUser on the Compute Engine default service account
- This allows your deployment SA to act as the serving SA
3. Model Access:
- aiplatform.models.get and aiplatform.models.list in the project where models are trained
- If cross-project: grant these in the source project to your deployment SA
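To make the layered grants concrete, here is a small sketch that generates the corresponding `gcloud projects add-iam-policy-binding` commands. The project ID, service account email, and custom role ID (`vertexDeployer`) are hypothetical placeholders; substitute your own.

```python
# Sketch: generate one gcloud IAM binding command per role for the deployment SA.
# PROJECT_ID, the SA email, and the custom role ID are made-up examples.
PROJECT_ID = "iot-prod"
DEPLOY_SA = f"serviceAccount:ml-deployer@{PROJECT_ID}.iam.gserviceaccount.com"

ROLES = [
    "roles/iot.registryEditor",                     # device registry operations
    f"projects/{PROJECT_ID}/roles/vertexDeployer",  # custom role with Vertex AI permissions
    "roles/storage.objectViewer",                   # read model artifacts from GCS
]

def grant_commands(project: str, member: str, roles: list[str]) -> list[str]:
    """Build one add-iam-policy-binding command per role."""
    return [
        f"gcloud projects add-iam-policy-binding {project} "
        f"--member={member} --role={role}"
        for role in roles
    ]

for cmd in grant_commands(PROJECT_ID, DEPLOY_SA, ROLES):
    print(cmd)
```

Running the printed commands (or feeding them to Terraform/Config Connector equivalents) applies all three layers in one pass, which is easier to audit than ad-hoc console grants.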
Predictive Maintenance Model Deployment:
For the actual deployment workflow:
# Pseudocode - Complete deployment steps:
1. Authenticate using service account with combined permissions
2. Load trained model from Vertex AI Model Registry
3. Create endpoint with machine type suitable for inference load
4. Deploy model to endpoint with traffic split configuration
5. Update IoT device registry config with endpoint URL
6. Configure device telemetry routing to trigger predictions
7. Set up monitoring for model drift and latency
# Reference: Vertex AI Python Client Library v1.25+
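Steps 1-4 above can be sketched with the `google-cloud-aiplatform` client. This is a minimal illustration, not a production script: the model ID, endpoint display name, and machine type are placeholder assumptions, and the call requires application-default credentials with the permissions discussed above.

```python
# Sketch of steps 1-4 using the Vertex AI Python client (google-cloud-aiplatform).
# Project, region, model ID, and machine type below are hypothetical examples.
PROJECT = "iot-prod"
REGION = "us-central1"

def model_resource_name(project: str, region: str, model_id: str) -> str:
    """Build the fully qualified Vertex AI model resource name."""
    return f"projects/{project}/locations/{region}/models/{model_id}"

def deploy_predictive_maintenance_model(model_id: str, display_name: str):
    # Import lazily so the helpers above work without the library installed.
    # Requires: pip install google-cloud-aiplatform, plus ADC credentials.
    from google.cloud import aiplatform

    aiplatform.init(project=PROJECT, location=REGION)
    model = aiplatform.Model(model_resource_name(PROJECT, REGION, model_id))
    endpoint = aiplatform.Endpoint.create(display_name=display_name)
    endpoint.deploy(
        model=model,
        machine_type="n1-standard-4",  # size for your inference load
        min_replica_count=1,
        traffic_percentage=100,        # all traffic to this deployed model
    )
    return endpoint
```

The returned endpoint's resource name is what you would then write into the device registry config (step 5) so telemetry consumers know where to send prediction requests.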
Common Pitfalls:
- API Enablement Timing: Enable aiplatform.googleapis.com, iot.googleapis.com, and storage.googleapis.com ahead of deployment; newly enabled APIs can take several minutes to propagate
- Regional Constraints: Endpoint and model must be in same region. IoT Core device registries can call any region, but latency matters for predictive maintenance
- Quotas: Check Vertex AI endpoint quotas in your project - default is often 10 endpoints per region
- Service Account Keys: If using key files (not recommended), ensure the JSON key is for the correct SA and hasn’t expired
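The regional constraint above is cheap to guard against in your deployment automation, since the region is embedded in every Vertex AI resource name. A small sketch (the resource names are made-up examples):

```python
# Sketch: fail fast if model and endpoint regions differ, since Vertex AI
# requires a deployed model and its endpoint to be co-located.

def region_of(resource_name: str) -> str:
    """Extract the region from a Vertex AI resource name
    of the form projects/P/locations/REGION/..."""
    parts = resource_name.split("/")
    return parts[parts.index("locations") + 1]

model = "projects/iot-prod/locations/us-central1/models/1234567890"
endpoint = "projects/iot-prod/locations/us-central1/endpoints/9876543210"

if region_of(model) != region_of(endpoint):
    raise ValueError(
        f"Region mismatch: model in {region_of(model)}, "
        f"endpoint in {region_of(endpoint)}"
    )
```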
Verification Steps:
- Test endpoint creation independently: `gcloud ai endpoints create --region=us-central1 --display-name=test`
- Verify SA permissions: `gcloud projects get-iam-policy PROJECT_ID --flatten="bindings[].members" --filter="bindings.members:serviceAccount:YOUR_SA"`
- Test model access: Try listing models using the service account credentials
- Check audit logs in Cloud Logging for detailed permission denial reasons
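For the permission check, it can help to post-process the JSON output of `gcloud projects get-iam-policy PROJECT_ID --format=json` rather than eyeball it. A sketch, using a made-up policy and SA email for illustration:

```python
import json

# Sketch: list the roles bound to a service account in an IAM policy.
# The policy document and SA email below are fabricated examples of the
# JSON shape that `gcloud projects get-iam-policy --format=json` returns.
policy = json.loads("""
{
  "bindings": [
    {"role": "roles/iot.registryEditor",
     "members": ["serviceAccount:ml-deployer@iot-prod.iam.gserviceaccount.com"]},
    {"role": "roles/aiplatform.user",
     "members": ["user:admin@example.com"]}
  ]
}
""")

def roles_for(policy: dict, member: str) -> list[str]:
    """Return every role whose binding includes the given member."""
    return [
        b["role"]
        for b in policy.get("bindings", [])
        if member in b.get("members", [])
    ]

sa = "serviceAccount:ml-deployer@iot-prod.iam.gserviceaccount.com"
print(roles_for(sa and policy, sa))
```

In this example the SA holds only roles/iot.registryEditor and no Vertex AI role, which is exactly the situation that produces the 403 in the original question.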
Production Best Practices:
- Use Workload Identity instead of service account keys when possible
- Implement separate service accounts for training vs. serving vs. deployment
- Set up Cloud Monitoring alerts for endpoint health and prediction latency
- Enable request-response logging for debugging model predictions
- Configure auto-scaling on endpoints based on device telemetry volume
Once you’ve applied these IAM roles, your predictive maintenance model should deploy successfully and be accessible from your device registry for real-time inference on sensor data streams.
For production, I'd recommend creating a custom role with only the necessary permissions rather than using predefined roles. You need aiplatform.endpoints.create, aiplatform.endpoints.deploy, and aiplatform.models.get at minimum. Also ensure your service account has the iam.serviceAccounts.actAs permission on the Compute Engine default service account if your endpoint uses it for serving. This follows the least-privilege principle while still enabling your predictive maintenance pipeline.
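As a starting point, a custom role along those lines could be defined in YAML and created with `gcloud iam roles create vertexDeployer --project=PROJECT_ID --file=role.yaml`. The role ID, title, and exact permission set are a sketch to trim down to your workflow, not an authoritative minimum:

```yaml
# role.yaml - hypothetical least-privilege deployment role
title: "Vertex AI Model Deployer"
description: "Deploy trained models to Vertex AI endpoints"
stage: "GA"
includedPermissions:
  - aiplatform.endpoints.create
  - aiplatform.endpoints.deploy
  - aiplatform.endpoints.get
  - aiplatform.models.get
```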