Jam with AI

Agent Ops in the Real World

How you should run AI Agents in Production

Shantanu Ladhwe and Shirin Khosravi Jam
Mar 05, 2026
∙ Paid

Hey there 👋,

Welcome to the detailed blog on AgentOps.

Everyone talks about building “AI agents”.

But how are they operated and managed at scale?

Once an agent leaves the notebook and starts answering real customer questions, you stop thinking in terms of “prompts” and start thinking in terms of:

  • SLOs and incident runbooks

  • Scaling, caching, rate limits and cost controls

  • Secrets, IAM, and data access

  • Cross-account networking

  • Rollbacks and safe experiments
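
To make one of these concrete: cost controls often start as a simple per-user token budget checked before each model call. Here is a minimal sketch; the class, limits, and in-memory counters are illustrative (a real service would keep counters in Redis or DynamoDB):

```python
import time
from collections import defaultdict

class TokenBudget:
    """Hypothetical per-user daily token budget (illustrative only).

    Real systems would persist counters in Redis or DynamoDB rather
    than process memory, so limits survive restarts and scale-out.
    """
    def __init__(self, daily_limit: int = 100_000):
        self.daily_limit = daily_limit
        self._used: dict[str, int] = defaultdict(int)
        self._reset_at = time.time() + 86_400

    def allow(self, user_id: str, estimated_tokens: int) -> bool:
        # Reset all counters once the 24h window elapses.
        if time.time() >= self._reset_at:
            self._used.clear()
            self._reset_at = time.time() + 86_400
        if self._used[user_id] + estimated_tokens > self.daily_limit:
            return False
        self._used[user_id] += estimated_tokens
        return True

budget = TokenBudget(daily_limit=1_000)
assert budget.allow("alice", 800)      # within budget
assert not budget.allow("alice", 500)  # would exceed 1,000
```

The same pattern generalizes to request rate limits: swap the token counter for a request counter and a shorter window.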

In this post, we want to walk through how a real customer-facing AI agent runs in production, and how we think about Agent Ops: the operational stack and practices behind a stateful, LLM-powered service.

We will use concrete components (ECS, Redis, OpenSearch, Postgres, DynamoDB, Datadog, Opik, etc.), but the principles carry over to any modern stack.


What We Mean by “Agent Ops”

When we say Agent Ops, we don’t mean “how to write a better system prompt”.

For us, Agent Ops is everything that happens after you have a working prototype:

How you deploy, monitor, debug, scale, secure, and iterate on an AI agent that lives inside your production environment.

It sits on top of traditional MLOps / DevOps, but with some extra complexity:

  • Agents are stateful (conversations, tools, retrieval).

  • They depend on external LLM APIs and observability systems.

  • Their behavior is driven not only by code, but also by prompts, configs, and tool wiring.

  • They often run RAG pipelines and multi-step workflows that can fail in many places.

So let’s look at what this actually means for a real, customer-facing agentic workflow.


High-Level Architecture: The Agent’s “Ops Skeleton”

At a high level, the agent looks like this:

  • A FastAPI (or similar) application running on EC2 instances or ECS in private subnets

  • Fronted by an Application Load Balancer (ALB, HTTPS only)

  • Using:

    • Redis (ElastiCache) for caching

    • Postgres (RDS) / OpenSearch as a shared vector DB + application state

    • DynamoDB for chat history and session memory

    • S3 + Athena + Glue + KMS for analytics and data integrations

  • Integrated with:

    • Opik (or similar) for tracing, evaluations and prompt/config management

    • Internal systems/data pipelines via SNS / SQS / EventBridge or an event bus

    • Model APIs from providers such as OpenAI, Anthropic, Bedrock, or orq.ai
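
Tying the pieces together, a single request roughly follows cache → retrieval → model call → session write. The sketch below stands in for Redis, OpenSearch, and DynamoDB with plain dicts to show the control flow only; every name here is illustrative:

```python
cache: dict = {}     # stand-in for Redis (ElastiCache) response cache
sessions: dict = {}  # stand-in for DynamoDB chat history / session memory

def handle_turn(session_id: str, question: str, retrieve, call_llm) -> str:
    """One conversational turn: cache hit, else RAG + LLM, then persist."""
    if question in cache:             # cheap exact-match cache lookup
        answer = cache[question]
    else:
        docs = retrieve(question)     # stand-in for OpenSearch / pgvector
        answer = call_llm(question, docs)
        cache[question] = answer      # cache for subsequent identical asks
    # Append the turn to session memory (DynamoDB in the real stack).
    sessions.setdefault(session_id, []).append((question, answer))
    return answer

# First call misses the cache; the repeat is served without an LLM call.
a1 = handle_turn("s1", "refund policy?",
                 retrieve=lambda q: ["doc"], call_llm=lambda q, d: "14 days")
a2 = handle_turn("s1", "refund policy?",
                 retrieve=lambda q: 1 / 0, call_llm=None)  # never invoked
assert a1 == a2 == "14 days"
```

In the real service each of these dict operations is a network call with its own latency, failure modes, and IAM permissions, which is exactly why they show up as separate blocks in the ops skeleton.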

During experimentation and in production you often need fallbacks across providers; this is where a model router such as orq.ai Router helps.

It is one API endpoint.
300+ models across OpenAI, Anthropic, Google, AWS.
Built-in retries, fallbacks, and caching.
And it's free to get started.
No minimum spend.

Link - Check here
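
Whether you use a hosted router or roll your own, the core idea is the same: try providers in order, retrying transient failures before falling back. A generic sketch of that pattern; this is not orq.ai's actual API, and the provider callables are placeholders:

```python
def call_with_fallback(prompt: str, providers: list, retries: int = 2) -> str:
    """Try each (name, callable) provider in order, retrying transient errors.

    Raises RuntimeError with the accumulated error log if every
    provider exhausts its retries.
    """
    errors = []
    for name, call in providers:
        for attempt in range(retries):
            try:
                return call(prompt)
            except Exception as exc:
                errors.append((name, attempt, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

# A flaky primary falls through to a working secondary:
def flaky(prompt):
    raise TimeoutError("primary down")

def stable(prompt):
    return "answer"

result = call_with_fallback("hi", [("primary", flaky), ("secondary", stable)])
assert result == "answer"
```

A hosted router adds the pieces that are tedious to maintain yourself: provider health tracking, response caching, and per-model rate limits behind one endpoint.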


Back to the architecture: you would typically run two main environments (this can be extended):

  • Dev (lower autoscaling limits, used for tests and experiments)

  • Production (higher capacity, stricter SLOs and alerts)

Each environment has its own Postgres, DynamoDB tables and Redis, with separate networking.
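
Keeping dev and production isolated is easier when every resource name and limit derives from a single environment object. A hedged sketch using plain dataclasses; the resource names, replica counts, and thresholds are made up for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EnvConfig:
    name: str
    max_replicas: int       # autoscaling ceiling
    alert_latency_ms: int   # SLO alert threshold

    @property
    def dynamo_table(self) -> str:
        return f"agent-chat-history-{self.name}"

    @property
    def redis_endpoint(self) -> str:
        return f"agent-cache-{self.name}.internal"

DEV = EnvConfig("dev", max_replicas=2, alert_latency_ms=5_000)
PROD = EnvConfig("prod", max_replicas=20, alert_latency_ms=1_500)

# Deriving names guarantees dev and prod never share state by accident.
assert DEV.dynamo_table != PROD.dynamo_table
```

The same pattern extends to IAM role names, SNS topics, and ALB listener rules: one `name` field fans out into every per-environment resource.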

Now let’s zoom into the individual components from an Agent Ops point of view.


Important Announcements:

Announcement 1:

On March 22, premium subscribers will have their monthly call at 16:00 CET to discuss this post’s topic further and ask questions.

Zoom meeting invites will follow. 🙂



Now, we walk through each block of Agent Ops discussed above in more detail:

1. Serving & Orchestration: The Agent’s Runtime
