Building Agentic Workflows for Administration That Actually Run — v1
Diving into the architecture of Mudda AI: Why we separate LLM reasoning from durable system orchestration using Temporal and LangGraph.
I’m building Mudda AI to solve a very unglamorous problem: civic issues don’t fail fast, they fail silently. Water leaks, power outages, road damage — the hardest part isn’t detecting them, it’s orchestrating the messy, long-running administrative workflows that follow.
This isn’t a chatbot. It’s an agentic workflow system designed to turn vague, human problem statements into durable, auditable, long-running system executions — and to do so without relying on LLMs to babysit state.
This post covers the first concrete version of the system: what exists today, why it’s structured this way, and where I drew hard architectural boundaries.
The Problem I’m Actually Solving
Most “agentic AI” demos stop at planning. They generate a graph, maybe call a few APIs, and then quietly fall apart when:
- A step takes 3 days instead of 3 seconds
- A human approval is required
- A service is down at 2 in the morning
- The process needs an audit trail six months later
Civic systems live entirely in this failure mode. So I started from a simple constraint: LLMs may plan. They must never own execution state. Everything that follows is a consequence of that.
High-Level Architecture
At a glance, Mudda AI is split into two distinct orchestration layers:
- LLM Orchestration → Reasoning, planning, interpretation
- System Orchestration → Execution, retries, waiting, auditability
They serve different failure models, so I refused to collapse them into one abstraction.

*HLD — Agentic Workflow v1: architecture showing the split between LLM and System Orchestration*
LLM Orchestration vs System Orchestration
| Dimension | LLM Orchestration (LangGraph-style) | System Orchestration (Temporal.io) |
|---|---|---|
| Primary role | Interpret intent, plan workflows | Execute workflows reliably |
| Typical duration | Milliseconds to seconds | Hours, days, or weeks |
| State ownership | Ephemeral / best-effort | Durable and persisted |
| Failure model | Retry by re-prompting | Automatic retries, resumable |
| Human-in-the-loop | Poor fit | First-class support |
| Auditability | Limited | Full execution history |
| Cost sensitivity | Token-bound | Infra-bound |
| When it breaks | Silent or partial | Observable and recoverable |
Components: What They Are and Why They Live Under System Orchestration
Components are not tools in the LLM sense; they are MCP-like building blocks I’ve added specifically for system orchestration. Each component behaves like a standalone API or microservice (e.g., sending emails, booking contractors, scheduling inspections, releasing budgets). These are not abstract capabilities; they are concrete system actions. Each component:
- Is invoked like an API call
- Executes as part of a Temporal workflow
- Returns a structured response on completion
- Leaves a linear, traceable execution path inside the system orchestration service
- Captures the content of every request sent and received
Why they live under Temporal:
LLMs are not designed to wait for days, retry across deploys, track partial completion, or remain legally auditable. So components are executed exclusively under system orchestration (Temporal), where waiting and failure are normal states, not edge cases.
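To make that concrete, here is a minimal sketch of a component as a Temporal activity using the Python SDK. The `EmailRequest`/`EmailResult` shapes and the `deliver_via_provider` stub are illustrative, not Mudda AI’s actual code:

```python
from dataclasses import dataclass
from temporalio import activity

@dataclass
class EmailRequest:
    to: str
    subject: str
    body: str

@dataclass
class EmailResult:
    message_id: str
    status: str

async def deliver_via_provider(req: EmailRequest) -> str:
    # Stand-in for a real mail-provider API call.
    return "msg-12345"

@activity.defn
async def send_email_component(req: EmailRequest) -> EmailResult:
    # A component is a concrete system action wrapped as a Temporal
    # activity: Temporal persists the input, retries on failure, and
    # records the structured response in the workflow history.
    message_id = await deliver_via_provider(req)
    return EmailResult(message_id=message_id, status="sent")
```

Because the activity is a plain function, Temporal can retry it across deploys and keep its inputs and outputs in the workflow history, which is exactly the linear, traceable execution path described above.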
The Two-Agent Planning Engine
At the heart of the system is a deliberately boring but reliable two-agent pipeline running on Gemini 2.0 Flash.
Agent 1: Component Selector
The first agent does nothing except reduce blast radius. It filters a subset of system components based on the user’s problem statement. This exists for context control — keeping prompts smaller, plans tighter, and debugging easier.
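A minimal sketch of what this filtering step might look like with the `google-generativeai` client; the catalog shape, prompt wording, `select_components` name, and `GEMINI_API_KEY` environment variable are all assumptions for illustration:

```python
import json
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

def select_components(problem: str, catalog: dict[str, str]) -> list[str]:
    """Return the subset of component IDs relevant to a problem statement."""
    listing = "\n".join(f"- {cid}: {desc}" for cid, desc in catalog.items())
    prompt = (
        "From the component list below, return a JSON array containing "
        "only the component IDs needed to resolve this civic issue.\n"
        f"Components:\n{listing}\n"
        f"Problem: {problem}\n"
        "Respond with the JSON array only."
    )
    response = model.generate_content(prompt)
    return json.loads(response.text)
```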
Agent 2: Workflow Plan Maker
The second agent plans within strict boundaries. It generates a Directed Acyclic Graph (DAG) represented as executable JSON, binding concrete API actions to each step and marking steps requiring human approval.
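The output looks like the sketch below, written as a Python literal; field names like `depends_on` and `requires_approval` illustrate the shape rather than the production schema:

```python
# Illustrative shape of a planner-emitted DAG (example fields, not
# the production schema).
plan = {
    "workflow_id": "pothole-repair-001",
    "steps": [
        {
            "id": "inspect",
            "component": "schedule_inspection",
            "params": {"location": "MG Road, ward 12"},
            "depends_on": [],
            "requires_approval": False,
        },
        {
            "id": "book",
            "component": "book_contractor",
            "params": {"job_type": "road_repair"},
            "depends_on": ["inspect"],
            "requires_approval": True,  # human sign-off gate
        },
    ],
}
```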
Why Planning Stops at JSON
This is where most agentic systems get sloppy. LLMs are excellent at interpreting intent and synthesizing plans, but terrible at holding state across days or guaranteeing exactly-once execution.
LLMs generate plans. Systems execute them.
No retries. No waiting. No callbacks inside the model loop.
Temporal.io: System Orchestration That Can Wait
Temporal handles requirements that normal job queues don't: steps that take weeks, human-in-the-loop approvals without polling hacks, and partial failures that must resume. It provides durable execution where state survives crashes and deploys.
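Here is a sketch of what that looks like in Temporal’s Python SDK: a workflow that blocks on a human-approval signal for however long it takes, with no polling. The signal name, activity name, and timeout are assumptions for illustration:

```python
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class CivicIssueWorkflow:
    def __init__(self) -> None:
        self.approved = False

    @workflow.signal
    def approve(self) -> None:
        # A human clicking "approve" sends this signal; no polling needed.
        self.approved = True

    @workflow.run
    async def run(self, issue_id: str) -> str:
        # Durable wait: this can block for days and survive deploys,
        # because Temporal persists the workflow's state and history.
        await workflow.wait_condition(lambda: self.approved)
        await workflow.execute_activity(
            "book_contractor",  # activity registered on the worker by name
            issue_id,
            start_to_close_timeout=timedelta(hours=1),
        )
        return "resolved"
```

The `await workflow.wait_condition(...)` line is the whole point: the workflow can sit there for three days, survive a deploy, and resume exactly where it left off.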
LLM Orchestration Lives Elsewhere
Separate flows (LangGraph-style) handle RAG calls for legal and policy grounding, prompt injection resistance, and reasoning chains that must stay model-native. These are short-lived, stateless, and cheap to retry.
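For flavor, a minimal LangGraph-style sketch of one such flow; the state fields and node bodies are placeholders for the actual RAG and guardrail logic:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class GroundingState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve_policy(state: GroundingState) -> dict:
    # Stand-in for a RAG lookup against legal and policy documents.
    return {"context": "relevant policy excerpts"}

def answer_grounded(state: GroundingState) -> dict:
    # Stand-in for the model call that reasons over retrieved context.
    return {"answer": f"grounded answer using: {state['context']}"}

graph = StateGraph(GroundingState)
graph.add_node("retrieve", retrieve_policy)
graph.add_node("answer", answer_grounded)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", END)
app = graph.compile()  # short-lived, stateless, cheap to retry

result = app.invoke({"question": "Who is liable for pothole damage?"})
```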
Backend Structure (Today)
Directory breakdown
```
backend/
├── main.py
├── temporal_workflows.py
├── temporal_worker.py
├── services/
│   ├── ai_service.py
│   ├── workflow_service.py
│   └── component_service.py
├── models/
├── routers/
└── sessions/
```
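temporal_worker.py reduces to standard Temporal worker wiring. A sketch, assuming the task-queue name and reusing the workflow and activity from the earlier sketches:

```python
import asyncio
from temporalio.client import Client
from temporalio.worker import Worker

async def main() -> None:
    # Connect to the Temporal server and register workflows plus
    # component activities on a task queue the API server dispatches to.
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="mudda-workflows",  # assumed queue name
        workflows=[CivicIssueWorkflow],        # from the workflow sketch above
        activities=[send_email_component],     # from the component sketch above
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```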
What This Buys Me
- Workflows survive deploys
- Failures are replayable
- AI decisions are inspectable
- Humans stay in control
- Cost stays predictable
What Comes Next
This is the foundation. Next iterations will harden plan validation, add partial re-planning for failed branches, introduce multi-tenant isolation, and wire in LangSmith for evaluation and visibility.
“I’d rather ship a system that waits correctly than one that reasons beautifully and fails quietly.”