Jam with AI

Agent Ops in the Real World

How you should run AI Agents in Production

Shantanu Ladhwe and Shirin Khosravi Jam
Mar 05, 2026
∙ Paid

Hey there 👋,

Welcome to the detailed blog on AgentOps.

Everyone talks about building “AI agents”.

But how are they operated and managed at scale?

Once an agent leaves the notebook and starts answering real customer questions, you stop thinking in terms of “prompts” and start thinking in terms of:

  • SLOs and incident runbooks

  • Scaling, caching, rate limits and cost controls

  • Secrets, IAM, and data access

  • Cross-account networking

  • Rollbacks and safe experiments
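
To make one of these concrete: cost controls often start as a simple per-user token budget checked before each model call. Here is a minimal sketch; the class, limits, and in-memory counters are illustrative (a real service would keep counters in Redis or DynamoDB):

```python
import time
from collections import defaultdict

class TokenBudget:
    """Hypothetical per-user daily token budget (illustrative only).

    Real systems would persist counters in Redis or DynamoDB rather
    than process memory, so limits survive restarts and scale-out.
    """
    def __init__(self, daily_limit: int = 100_000):
        self.daily_limit = daily_limit
        self._used: dict[str, int] = defaultdict(int)
        self._reset_at = time.time() + 86_400

    def allow(self, user_id: str, estimated_tokens: int) -> bool:
        # Reset all counters once the 24h window elapses.
        if time.time() >= self._reset_at:
            self._used.clear()
            self._reset_at = time.time() + 86_400
        if self._used[user_id] + estimated_tokens > self.daily_limit:
            return False
        self._used[user_id] += estimated_tokens
        return True

budget = TokenBudget(daily_limit=1_000)
assert budget.allow("alice", 800)      # within budget
assert not budget.allow("alice", 500)  # would exceed 1,000
```

The same pattern generalizes to request rate limits: swap the token counter for a request counter and a shorter window.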

In this post, we want to walk through how a real customer-facing AI agent runs in production, and how we think about Agent Ops: the operational stack and practices behind a stateful, LLM-powered service.

We will use concrete components (ECS, Redis, OpenSearch, Postgres, DynamoDB, Datadog, Opik, etc.), but the principles carry over to any modern stack.


What We Mean by “Agent Ops”

When we say Agent Ops, we don’t mean “how to write a better system prompt”.

For us, Agent Ops is everything that happens after you have a working prototype:

How you deploy, monitor, debug, scale, secure, and iterate on an AI agent that lives inside your production environment.

It sits on top of traditional MLOps / DevOps, but with some extra complexity:

  • Agents are stateful (conversations, tools, retrieval).

  • They depend on external LLM APIs and observability systems.

  • Their behavior is driven not only by code, but also by prompts, configs, and tool wiring.

  • They often run RAG pipelines and multi-step workflows that can fail in many places.

So let’s look at what this actually means for a real, customer-facing agentic workflow.


High-Level Architecture: The Agent’s “Ops Skeleton”

At a high level, the agent looks like this:

  • A FastAPI (or similar) application running on EC2 instances or ECS in private subnets

  • Fronted by an Application Load Balancer (ALB, HTTPS only)

  • Using:

    • Redis (ElastiCache) for caching

    • Postgres (RDS) / OpenSearch as a shared vector DB + application state

    • DynamoDB for chat history and session memory

    • S3 + Athena + Glue + KMS for analytics and data integrations

  • Integrated with:

    • Opik (or similar) for tracing, evaluations and prompt/config management

    • Internal systems/data pipelines via SNS / SQS / EventBridge or an event bus

    • Model APIs from providers such as OpenAI, Anthropic, Bedrock, or orq.ai
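
Tying the pieces together, a single request roughly follows cache → retrieval → model call → session write. The sketch below stands in for Redis, OpenSearch, and DynamoDB with plain dicts to show the control flow only; every name here is illustrative:

```python
cache: dict = {}     # stand-in for Redis (ElastiCache) response cache
sessions: dict = {}  # stand-in for DynamoDB chat history / session memory

def handle_turn(session_id: str, question: str, retrieve, call_llm) -> str:
    """One conversational turn: cache hit, else RAG + LLM, then persist."""
    if question in cache:             # cheap exact-match cache lookup
        answer = cache[question]
    else:
        docs = retrieve(question)     # stand-in for OpenSearch / pgvector
        answer = call_llm(question, docs)
        cache[question] = answer      # cache for subsequent identical asks
    # Append the turn to session memory (DynamoDB in the real stack).
    sessions.setdefault(session_id, []).append((question, answer))
    return answer

# First call misses the cache; the repeat is served without an LLM call.
a1 = handle_turn("s1", "refund policy?",
                 retrieve=lambda q: ["doc"], call_llm=lambda q, d: "14 days")
a2 = handle_turn("s1", "refund policy?",
                 retrieve=lambda q: 1 / 0, call_llm=None)  # never invoked
assert a1 == a2 == "14 days"
```

In the real service each of these dict operations is a network call with its own latency, failure modes, and IAM permissions, which is exactly why they show up as separate blocks in the ops skeleton.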

During experimentation and in production you often need fallbacks across providers; this is where a model router such as orq.ai Router helps.

It is one API endpoint.
300+ models across OpenAI, Anthropic, Google, AWS.
Built-in retries, fallbacks, and caching.
And it's free to get started.
No minimum spend.

Link - Check here
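
Whether you use a hosted router or roll your own, the core idea is the same: try providers in order, retrying transient failures before falling back. A generic sketch of that pattern; this is not orq.ai's actual API, and the provider callables are placeholders:

```python
def call_with_fallback(prompt: str, providers: list, retries: int = 2) -> str:
    """Try each (name, callable) provider in order, retrying transient errors.

    Raises RuntimeError with the accumulated error log if every
    provider exhausts its retries.
    """
    errors = []
    for name, call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except Exception as exc:
                errors.append((name, attempt, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# A flaky primary falls through to a working secondary:
def flaky(prompt):
    raise TimeoutError("primary down")

def stable(prompt):
    return "answer"

result = call_with_fallback("hi", [("primary", flaky), ("secondary", stable)])
assert result == "answer"
```

A hosted router adds the pieces that are tedious to maintain yourself: provider health tracking, response caching, and per-model rate limits behind one endpoint.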


Back to the architecture: you would typically run two main environments (this can be extended):

  • Dev (lower autoscaling limits, used for tests and experiments)

  • Production (higher capacity, stricter SLOs and alerts)

Each environment has its own Postgres, DynamoDB tables and Redis, with separate networking.
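
Keeping dev and production isolated is easier when every resource name and limit derives from a single environment object. A hedged sketch using plain dataclasses; the resource names, replica counts, and thresholds are made up for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvConfig:
    name: str
    max_replicas: int       # autoscaling ceiling
    alert_latency_ms: int   # SLO alert threshold

    @property
    def dynamo_table(self) -> str:
        return f"agent-chat-history-{self.name}"

    @property
    def redis_endpoint(self) -> str:
        return f"agent-cache-{self.name}.internal"

DEV = EnvConfig("dev", max_replicas=2, alert_latency_ms=5_000)
PROD = EnvConfig("prod", max_replicas=20, alert_latency_ms=1_500)

# Deriving names guarantees dev and prod never share state by accident.
assert DEV.dynamo_table != PROD.dynamo_table
```

The same pattern extends to IAM role names, SNS topics, and ALB listener rules: one `name` field fans out into every per-environment resource.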

Now let’s zoom into the individual components from an Agent Ops point of view.


Important Announcements:

Announcement 1:

On March 22, premium subscribers will have their monthly call at 16:00 CET to discuss this post’s topic further and ask questions.

Zoom meeting invites will follow. 🙂



Now, we walk through each block of Agent Ops discussed above in more detail:

1. Serving & Orchestration: The Agent’s Runtime
