January 23, 2026
6 min read
Shubh Pundir

Building Agentic Workflows for Administration That Actually Run — v1

Diving into the architecture of Mudda AI: Why we separate LLM reasoning from durable system orchestration using Temporal and LangGraph.


I’m building Mudda AI to solve a very unglamorous problem: civic issues don’t fail fast, they fail silently. Water leaks, power outages, road damage — the hardest part isn’t detecting them, it’s orchestrating the messy, long-running administrative workflows that follow.

This isn’t a chatbot. It’s an agentic workflow system designed to turn vague, human problem statements into durable, auditable, long-running system executions — and to do so without relying on LLMs to babysit state.

This post covers the first concrete version of the system: what exists today, why it’s structured this way, and where I drew hard architectural boundaries.

The Problem I’m Actually Solving

Most “agentic AI” demos stop at planning. They generate a graph, maybe call a few APIs, and then quietly fall apart when:

  • A step takes 3 days instead of 3 seconds
  • A human approval is required
  • A service is down at 2 in the morning
  • The process needs an audit trail six months later

Civic systems live entirely in this failure mode. So I started from a simple constraint: LLMs may plan. They must never own execution state. Everything that follows is a consequence of that.

High-Level Architecture

At a glance, Mudda AI is split into two distinct orchestration layers:

  1. LLM Orchestration → Reasoning, planning, interpretation
  2. System Orchestration → Execution, retries, waiting, auditability

They serve different failure models, so I refused to collapse them into one abstraction.

HLD — Agentic Workflow v1: architecture showing the split between LLM orchestration and system orchestration

LLM Orchestration vs System Orchestration

| Dimension | LLM Orchestration (LangGraph-style) | System Orchestration (Temporal.io) |
| --- | --- | --- |
| Primary role | Interpret intent, plan workflows | Execute workflows reliably |
| Typical duration | Milliseconds to seconds | Hours, days, or weeks |
| State ownership | Ephemeral / best-effort | Durable and persisted |
| Failure model | Retry by re-prompting | Automatic retries, resumable |
| Human-in-the-loop | Poor fit | First-class support |
| Auditability | Limited | Full execution history |
| Cost sensitivity | Token-bound | Infra-bound |
| When it breaks | Silent or partial | Observable and recoverable |

Components: What They Are and Why They Live Under System Orchestration

Components are not tools in the LLM sense. I’ve added MCP-like components specifically for system orchestration. Each one behaves like a standalone API or microservice (e.g., sending emails, booking contractors, scheduling inspections, releasing budgets). These are not abstract capabilities; they are concrete system actions. Every component:

  • Is invoked like an API call
  • Executes as part of a Temporal workflow
  • Returns a structured response on completion
  • Leaves a linear, traceable execution path inside the system orchestration service
  • Records the content of every request sent and response received

Why they live under Temporal:

LLMs are not designed to wait for days, retry across deploys, track partial completion, or remain legally auditable. So components are executed exclusively under system orchestration (Temporal), where waiting and failure are normal states, not edge cases.
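To make the contract concrete, here is a minimal sketch of what a component invocation might look like. Everything here is illustrative: the `ComponentResult` shape, the `book_contractor` name, and the hard-coded response are my assumptions, not Mudda AI's actual schema, and in the real system the function body would run as a Temporal activity with server-managed retries.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical contract: a component is invoked like an API call and
# returns a structured, auditable response (request and response both
# recorded, so the execution path stays traceable).
@dataclass
class ComponentResult:
    component: str
    status: str        # e.g. "completed" | "failed" | "awaiting_approval"
    request: dict      # what was sent
    response: dict     # what came back
    finished_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def book_contractor(request: dict) -> ComponentResult:
    # Stand-in for a real booking API call; the response is faked here.
    response = {"booking_id": "bk-001", "crew": request["skill"]}
    return ComponentResult(
        component="book_contractor",
        status="completed",
        request=request,
        response=response,
    )

result = book_contractor({"skill": "plumbing", "ward": 12})
print(result.status, result.response["booking_id"])  # completed bk-001
```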

The Two-Agent Planning Engine

At the heart of the system is a deliberately boring but reliable two-agent pipeline running on Gemini 2.0 Flash.

Agent 1: Component Selector

The first agent does nothing except reduce blast radius. It filters a subset of system components based on the user's problem statement. This exists for context control—keeping prompts smaller, plans tighter, and debugging easier.
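The selection step itself can be sketched without any model in the loop. Below, a keyword filter stands in for Agent 1's actual Gemini call, and the registry contents are invented; the point is the shape of the contract: full component registry in, small relevant subset out.

```python
# Illustrative only: a keyword filter standing in for Agent 1's LLM call.
# Component names and tags are hypothetical examples.
COMPONENT_REGISTRY = {
    "send_email":          {"tags": ["notify", "email", "alert"]},
    "book_contractor":     {"tags": ["repair", "leak", "road", "contractor"]},
    "schedule_inspection": {"tags": ["inspect", "leak", "damage"]},
    "release_budget":      {"tags": ["budget", "funds", "payment"]},
}

def select_components(problem: str, registry: dict) -> list[str]:
    """Return only the components whose tags appear in the problem text."""
    words = set(problem.lower().split())
    return [name for name, meta in registry.items()
            if words & set(meta["tags"])]

print(select_components("water leak near the road", COMPONENT_REGISTRY))
# → ['book_contractor', 'schedule_inspection']
```

Whatever replaces the keyword match, the output contract is what keeps Agent 2's prompt small: it only ever sees the filtered subset, never the whole registry.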

Agent 2: Workflow Plan Maker

The second agent plans within strict boundaries. It generates a Directed Acyclic Graph (DAG) represented as executable JSON, binding concrete API actions to each step and marking steps requiring human approval.
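A hypothetical example of what such an executable-JSON plan might look like, plus the validation the system side can run before accepting it. The field names (`id`, `action`, `depends_on`, `requires_approval`) are my illustration, not the actual schema; the DAG check uses the standard library's `graphlib`.

```python
import json
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Invented example plan: three steps, one gated on human approval.
plan = json.loads("""
{
  "steps": [
    {"id": "inspect", "action": "schedule_inspection", "depends_on": [],
     "requires_approval": false},
    {"id": "budget",  "action": "release_budget",      "depends_on": ["inspect"],
     "requires_approval": true},
    {"id": "repair",  "action": "book_contractor",     "depends_on": ["budget"],
     "requires_approval": false}
  ]
}
""")

def execution_order(plan: dict) -> list[str]:
    """Validate the plan is a DAG and return one valid execution order.
    Raises graphlib.CycleError if the LLM emitted a cyclic plan."""
    graph = {s["id"]: set(s["depends_on"]) for s in plan["steps"]}
    return list(TopologicalSorter(graph).static_order())

print(execution_order(plan))  # ['inspect', 'budget', 'repair']
```

Rejecting cyclic or malformed plans at this boundary is cheap; letting one reach the executor is not.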

Why Planning Stops at JSON

This is where most agentic systems get sloppy. LLMs are excellent at interpreting intent and synthesizing plans, but terrible at holding state across days or guaranteeing exactly-once execution.

LLMs generate plans. Systems execute them.
No retries. No waiting. No callbacks inside the model loop.

Temporal.io: System Orchestration That Can Wait

Temporal handles properties that normal job queues don't: steps that take weeks, human-in-the-loop approvals without polling hacks, and partial failures that must resume. It provides durable execution where state survives crashes and deploys.
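This is emphatically not Temporal's API, but the property it provides can be shown in miniature: if completed-step results are persisted after every step, a run killed mid-flight resumes without redoing finished work. The file path, `flaky` executor, and step names below are all invented for the demo.

```python
import json
import pathlib

HISTORY = pathlib.Path("workflow_history.json")  # hypothetical store

def run_workflow(steps, execute):
    # Load whatever survived the last crash, skip it, persist as we go.
    done = json.loads(HISTORY.read_text()) if HISTORY.exists() else {}
    for step in steps:
        if step in done:                # already executed before the crash
            continue
        done[step] = execute(step)      # may raise; earlier progress survives
        HISTORY.write_text(json.dumps(done))
    return done

# Demo: a mid-run "deploy" kills the workflow, then it resumes.
HISTORY.unlink(missing_ok=True)         # start fresh for the demo
calls = []

def flaky(step):
    calls.append(step)
    if step == "budget" and len(calls) == 2:
        raise RuntimeError("deploy happened mid-run")
    return f"{step}: ok"

steps = ["inspect", "budget", "repair"]
try:
    run_workflow(steps, flaky)          # crashes after "inspect" completes
except RuntimeError:
    pass
result = run_workflow(steps, flaky)     # resumes; "inspect" is not re-run
print(result)
```

Temporal does this with event-sourced workflow histories and deterministic replay rather than a JSON file, but the contract is the same: state outlives the process.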

LLM Orchestration Lives Elsewhere

Separate flows (LangGraph-style) handle RAG calls for legal and policy grounding, prompt injection resistance, and reasoning chains that must stay model-native. These are short-lived, stateless, and cheap to retry.
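"Cheap to retry" on this side means something very specific: there is no durable state to clean up, so a bad output is handled by re-prompting. A minimal sketch of that failure model, with `fake_model` standing in for a real Gemini/LangGraph invocation and the validator invented for the example:

```python
import json

def with_reprompt(call_model, prompt, validate, max_attempts=3):
    """Stateless retry-by-re-prompting: feed the validation error back
    into the next attempt. Nothing durable exists between attempts."""
    last_error = None
    for attempt in range(max_attempts):
        output = call_model(
            prompt if attempt == 0
            else f"{prompt}\n\nPrevious output was invalid: {last_error}"
        )
        try:
            validate(output)
            return output
        except ValueError as err:
            last_error = err
    raise RuntimeError("model failed validation after re-prompting")

# Fake model: bad output first, valid JSON on the retry.
responses = iter(["not json", '{"steps": []}'])

def fake_model(prompt):
    return next(responses)

def must_be_json(text):
    json.loads(text)  # JSONDecodeError is a ValueError subclass

print(with_reprompt(fake_model, "Plan the workflow", must_be_json))
# → {"steps": []}
```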

Backend Structure (Today)

Directory breakdown

backend/
├── main.py
├── temporal_workflows.py
├── temporal_worker.py
├── services/
│   ├── ai_service.py
│   ├── workflow_service.py
│   └── component_service.py
├── models/
├── routers/
└── sessions/

What This Buys Me

  • Workflows survive deploys
  • Failures are replayable
  • AI decisions are inspectable
  • Humans stay in control
  • Cost stays predictable

What Comes Next

This is the foundation. Next iterations will harden plan validation, partial re-planning for failed branches, multi-tenant isolation, and LangSmith integration for evaluation and visibility.

“I’d rather ship a system that waits correctly than one that reasons beautifully and fails quietly.”

Thanks for reading!