We’re seeing a big shift in how our platform team is being pulled into AI adoption conversations. Historically we focused on CI/CD, self-service infrastructure, and dev productivity tooling. Now we’re being asked to stand up MLOps pipelines, govern agent deployments, and provide orchestration for autonomous workflows that span multiple business systems.
The pressure is coming from two directions. Development teams are experimenting with coding assistants and building one-off proof-of-concepts using local scripts and uncoordinated tools. That’s creating sprawl—every team is reinventing prompt orchestration, data access, and safety evaluations. Meanwhile, the business side is asking why pilots aren’t translating into production systems that deliver measurable ROI. The gap isn’t technology; it’s operational structure.
We’re now deciding whether platform engineering should own the agent control plane—identity management for agents, sandboxing, observability, and governance—or whether that belongs with data science, IT security, or a new AI-specific team. The risk is that if we don’t take ownership, we end up with shadow AI and fragmented accountability. But if we do, we’re expanding our scope significantly into areas where we don’t yet have deep expertise.
Curious how other platform teams are navigating this. Are you treating agents as infrastructure? Who owns the orchestration layer, and how are you balancing speed with governance?
Security and compliance are the main reasons we got pulled into this. Legal and InfoSec were getting nervous about agents with elevated permissions operating without audit trails. We implemented sandboxing where agents execute in isolated environments with explicit restrictions, and everything generates pull requests that still go through normal code review and CI/CD checks. That preserved our engineering standards while enabling agent-assisted workflows. It also gave us a forcing function to document what access patterns are actually safe.
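A rough sketch of what that kind of per-agent sandbox policy can look like. All names here (`SandboxPolicy`, the action shapes, the `refactor-bot` example) are hypothetical, not any specific product's API; the point is the default-deny allowlists plus the rule that changes only land via pull request:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a per-agent sandbox policy checked before any
# agent action is executed. Field names are illustrative.
@dataclass
class SandboxPolicy:
    agent_id: str
    writable_paths: set = field(default_factory=set)   # paths the agent may modify
    allowed_commands: set = field(default_factory=set) # commands it may invoke
    requires_pr: bool = True                           # all changes land via pull request

def check_action(policy: SandboxPolicy, action: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed agent action."""
    if action["type"] == "run_command":
        cmd = action["command"].split()[0]
        if cmd not in policy.allowed_commands:
            return False, f"command '{cmd}' not in allowlist for {policy.agent_id}"
    elif action["type"] == "write_file":
        path = action["path"]
        if not any(path.startswith(p) for p in policy.writable_paths):
            return False, f"path '{path}' outside sandbox for {policy.agent_id}"
    elif action["type"] == "merge":
        if policy.requires_pr:
            return False, "direct merges blocked; open a PR for normal review"
    return True, "ok"

policy = SandboxPolicy(
    agent_id="refactor-bot",
    writable_paths={"services/payments/"},
    allowed_commands={"pytest", "ruff"},
)
print(check_action(policy, {"type": "write_file", "path": "services/payments/api.py"}))
print(check_action(policy, {"type": "merge"}))
```

The denial reasons double as audit-trail entries, which is what kept Legal and InfoSec comfortable in our case.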
We’re using a multi-account architecture to separate concerns. One account for data governance, separate accounts for model training versus serving, and strict IAM boundaries between them. That helps with compliance because we can demonstrate clear separation of duties. The trade-off is complexity—we went from managing a few thousand services to nearly double that overnight. We ended up creating an architecture review board that evaluates new AI projects and ensures they use consistent tools and infrastructure.
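To make the separation-of-duties claim concrete, here is a toy model of those account boundaries. The account names and the trust table are assumptions for illustration, not our real topology; the key property is that cross-account access is an explicit allowlist and everything else is denied by default:

```python
# Illustrative sketch of the account-separation rules described above.
# Account names and the trust table are assumptions, not a real setup.
ACCOUNT_OF = {
    "raw-training-data": "data-governance",
    "model-registry": "training",
    "inference-endpoint": "serving",
}

# Explicit cross-account trust: (caller_account, target_account) pairs an
# IAM role may bridge; everything else is denied by default.
ALLOWED_CROSS_ACCOUNT = {
    ("training", "data-governance"),  # training reads governed datasets
    ("serving", "training"),          # serving pulls approved model artifacts
}

def can_access(caller_account: str, resource: str) -> bool:
    target = ACCOUNT_OF[resource]
    if caller_account == target:
        return True
    return (caller_account, target) in ALLOWED_CROSS_ACCOUNT

assert can_access("training", "raw-training-data")
assert not can_access("serving", "raw-training-data")  # serving never sees raw data
```

Being able to point auditors at a table like this, rather than at scattered per-resource policies, is most of the compliance win.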
We adopted Model Context Protocol as our integration layer and it’s been a game changer for interoperability. Both supervised agents (interactive) and autonomous agents (batch-style) can use the same tools and access the same enterprise context—code indexes, historical pull requests, service documentation. That solved the fragmentation problem where different teams were building incompatible agents. Now we have a unified protocol and teams can build domain-specific agents that still plug into common infrastructure.
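The shape of that unified layer, reduced to a stdlib sketch. This is not the real MCP SDK and the tool names are made up; it just shows the property that matters: one registry and one dispatch entry point that supervised and autonomous agents both call, so every team's agent speaks the same protocol:

```python
import json

# Minimal sketch of a unified tool layer in the spirit of MCP.
# NOT the real MCP SDK; tool names and schemas are illustrative.
TOOLS = {}

def tool(name: str):
    """Register a function as a tool callable by any agent, supervised or autonomous."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@tool("search_code_index")
def search_code_index(query: str) -> list:
    # In reality this would hit the shared code index; stubbed here.
    return [f"match for {query!r} in services/payments/api.py"]

@tool("get_service_docs")
def get_service_docs(service: str) -> str:
    return f"runbook for {service}"

def handle_request(raw: str) -> str:
    """Dispatch a JSON tool call: the same entry point for interactive and batch agents."""
    req = json.loads(raw)
    result = TOOLS[req["tool"]](**req["args"])
    return json.dumps({"result": result})

print(handle_request('{"tool": "search_code_index", "args": {"query": "refund"}}'))
```

Domain teams add entries to the registry; the transport, auth, and logging around `handle_request` stay common infrastructure.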
The cultural shift has been as important as the technical architecture. We had to help development teams understand that we’re not trying to slow them down or own their AI experiments—we’re providing the scaffolding so they can move faster safely. Self-service provisioning through approved catalogs and pre-built templates reduced friction significantly. Teams can still innovate and experiment, but they’re doing it within guardrails that ensure governance and security from day one.
The evaluation piece is critical and often overlooked. We built checks that measure whether agents are producing correct outputs—does the build pass, does the code follow team conventions, does it achieve the business outcome. That shifted evaluation from pass/fail to nuanced measurement of agent behavior. It also gave us a feedback loop to refine prompts and tool integrations over time. Without rigorous evaluation, you’re flying blind on whether agents are actually helping or just creating technical debt.
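A minimal sketch of that layered, non-binary evaluation, assuming hypothetical check names and weights (they are illustrative, not our production values). Each check returns a score in [0, 1] and the weighted sum becomes the feedback signal for refining prompts and tool integrations:

```python
# Hypothetical sketch of layered agent-evaluation checks: each check
# returns a score in [0, 1] rather than pass/fail. Weights are illustrative.
def check_build(change: dict) -> float:
    return 1.0 if change.get("build_passed") else 0.0

def check_conventions(change: dict) -> float:
    violations = change.get("lint_violations", 0)
    return max(0.0, 1.0 - 0.1 * violations)  # each violation costs a tenth

def check_outcome(change: dict) -> float:
    # e.g. did the generated fix pass the linked ticket's acceptance test?
    return 1.0 if change.get("acceptance_test_passed") else 0.0

CHECKS = [(check_build, 0.4), (check_conventions, 0.2), (check_outcome, 0.4)]

def evaluate(change: dict) -> float:
    """Weighted score used as the feedback signal for prompt/tool tuning."""
    return sum(weight * check(change) for check, weight in CHECKS)

score = evaluate({"build_passed": True, "lint_violations": 3, "acceptance_test_passed": True})
print(round(score, 2))  # 0.4 + 0.2*0.7 + 0.4 = 0.94
```

Tracking this score over time per agent is what separates "the agent is helping" from "the agent is quietly accumulating technical debt."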
We took ownership of the orchestration layer about six months ago and it’s been the right call. The key was recognizing that agents need the same kind of standardized plumbing we already provide for app deployments—identity, access controls, audit trails, deployment pipelines. We built an internal catalog where teams register their agents, and every agent gets a unique identity that plugs into our existing IAM infrastructure. That solved the shadow AI problem almost immediately because now we have visibility into what’s running and where.
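The registration flow can be sketched roughly as below. The schema and naming scheme are assumptions for illustration; the essential ideas are that identity minting is the single entry point and that the minted identity keys into the existing IAM roles rather than a parallel system:

```python
import uuid
from dataclasses import dataclass

# Illustrative sketch of an internal agent catalog: every agent gets a
# unique identity that downstream IAM and audit systems can key on.
# Field names are assumptions about such a registry, not a real schema.
@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    team: str
    iam_role: str  # existing IAM role the agent assumes

REGISTRY: dict = {}

def register_agent(name: str, team: str) -> AgentIdentity:
    """Mint a unique identity and map it onto existing IAM plumbing."""
    agent_id = f"agent-{name}-{uuid.uuid4().hex[:8]}"
    ident = AgentIdentity(agent_id=agent_id, team=team, iam_role=f"role/{team}-agents")
    REGISTRY[agent_id] = ident
    return ident

def audit_inventory() -> list:
    """Visibility into what's running and where: the anti-shadow-AI view."""
    return [(i.agent_id, i.team) for i in REGISTRY.values()]

ident = register_agent("incident-triage", team="sre")
print(ident.agent_id, ident.iam_role)
print(audit_inventory())
```

Because registration is the only way to get credentials, unregistered agents simply can't reach anything, which is why the shadow AI problem collapsed so quickly for us.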
One thing we learned the hard way: don’t try to build a generic agent platform. We wasted three months building abstractions that were too generic to be useful. What worked was focusing on specific domain problems—code generation workflows, incident triage, data pipeline orchestration—and building deep organizational context into those agents. Once you have a few working examples with real ROI, the patterns become clearer and you can start generalizing.