Building Agentic Workflows for Administration That Actually Run — v1
Diving into the architecture of Mudda AI: Why we separate LLM reasoning from durable system orchestration using Temporal and LangGraph.
I’m building Mudda AI to solve a very unglamorous problem: civic issues don’t fail fast, they fail silently. Water leaks, power outages, road damage — the hardest part isn’t detecting them, it’s orchestrating the messy, long-running administrative workflows that follow.
This isn’t a chatbot. It’s an agentic workflow system designed to turn vague, human problem statements into durable, auditable, long-running system executions — and to do so without relying on LLMs to babysit state.
This post covers the first concrete version of the system: what exists today, why it’s structured this way, and where I drew hard architectural boundaries.
The Problem I’m Actually Solving
Most “agentic AI” demos stop at planning. They generate a graph, maybe call a few APIs, and then quietly fall apart when:
- A step takes 3 days instead of 3 seconds
- A human approval is required
- A service is down at 2 in the morning
- The process needs an audit trail six months later
Civic systems live entirely in this failure mode. So I started from a simple constraint: LLMs may plan. They must never own execution state. Everything that follows is a consequence of that.
High-Level Architecture
At a glance, Mudda AI is split into two distinct orchestration layers:
- LLM Orchestration → Reasoning, planning, interpretation
- System Orchestration → Execution, retries, waiting, auditability
They serve different failure models, so I refused to collapse them into one abstraction.

*HLD — Agentic Workflow v1: architecture showing the split between LLM and System Orchestration*
LLM Orchestration vs System Orchestration
| Dimension | LLM Orchestration (LangGraph-style) | System Orchestration (Temporal.io) |
|---|---|---|
| Primary role | Interpret intent, plan workflows | Execute workflows reliably |
| Typical duration | Milliseconds to seconds | Hours, days, or weeks |
| State ownership | Ephemeral / best-effort | Durable and persisted |
| Failure model | Retry by re-prompting | Automatic retries, resumable |
| Human-in-the-loop | Poor fit | First-class support |
| Auditability | Limited | Full execution history |
| Cost sensitivity | Token-bound | Infra-bound |
| When it breaks | Silent or partial | Observable and recoverable |
Components: What They Are and Why They Live Under System Orchestration
Components are not tools in the LLM sense; they are MCP-like building blocks I’ve added specifically for system orchestration. Each component behaves like a standalone API or microservice (e.g., sending emails, booking contractors, scheduling inspections, releasing budgets). These are not abstract capabilities; they are concrete system actions. Each component:
- Is invoked like an API call
- Executes as part of a Temporal workflow
- Returns a structured response on completion
- Leaves a linear, traceable execution path inside the system orchestration service
- Captures the content of every request sent and received
Why they live under Temporal:
LLMs are not designed to wait for days, retry across deploys, track partial completion, or remain legally auditable. So components are executed exclusively under system orchestration (Temporal), where waiting and failure are normal states, not edge cases.
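To make that concrete, here is a minimal sketch of a component as a Temporal activity using the Python SDK. The `EmailRequest`/`EmailResult` shapes and the `deliver_via_provider` stub are illustrative, not Mudda AI’s actual code:

```python
from dataclasses import dataclass
from temporalio import activity

@dataclass
class EmailRequest:
    to: str
    subject: str
    body: str

@dataclass
class EmailResult:
    message_id: str
    status: str

async def deliver_via_provider(req: EmailRequest) -> str:
    # Stand-in for a real mail-provider API call.
    return "msg-12345"

@activity.defn
async def send_email_component(req: EmailRequest) -> EmailResult:
    # A component is a concrete system action wrapped as a Temporal
    # activity: Temporal persists the input, retries on failure, and
    # records the structured response in the workflow history.
    message_id = await deliver_via_provider(req)
    return EmailResult(message_id=message_id, status="sent")
```

Because the activity is a plain function, Temporal can retry it across deploys and keep its inputs and outputs in the workflow history, which is exactly the linear, traceable execution path described above.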
The Two-Agent Planning Engine
At the heart of the system is a deliberately boring but reliable two-agent pipeline running on Gemini 2.0 Flash.
Agent 1: Component Selector
The first agent does nothing except reduce blast radius. It filters a subset of system components based on the user’s problem statement. This exists for context control — keeping prompts smaller, plans tighter, and debugging easier.
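A minimal sketch of what this filtering step might look like with the `google-generativeai` client; the catalog shape, prompt wording, `select_components` name, and `GEMINI_API_KEY` environment variable are all assumptions for illustration:

```python
import json
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-2.0-flash")

def select_components(problem: str, catalog: dict[str, str]) -> list[str]:
    """Return the subset of component IDs relevant to a problem statement."""
    listing = "\n".join(f"- {cid}: {desc}" for cid, desc in catalog.items())
    prompt = (
        "From the component list below, return a JSON array containing "
        "only the component IDs needed to resolve this civic issue.\n"
        f"Components:\n{listing}\n"
        f"Problem: {problem}\n"
        "Respond with the JSON array only."
    )
    response = model.generate_content(prompt)
    return json.loads(response.text)
```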
Agent 2: Workflow Plan Maker
The second agent plans within strict boundaries. It generates a Directed Acyclic Graph (DAG) represented as executable JSON, binding concrete API actions to each step and marking steps requiring human approval.
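The output looks like the sketch below, written as a Python literal; field names like `depends_on` and `requires_approval` illustrate the shape rather than the production schema:

```python
# Illustrative shape of a planner-emitted DAG (example fields, not
# the production schema).
plan = {
    "workflow_id": "pothole-repair-001",
    "steps": [
        {
            "id": "inspect",
            "component": "schedule_inspection",
            "params": {"location": "MG Road, ward 12"},
            "depends_on": [],
            "requires_approval": False,
        },
        {
            "id": "book",
            "component": "book_contractor",
            "params": {"job_type": "road_repair"},
            "depends_on": ["inspect"],
            "requires_approval": True,  # human sign-off gate
        },
    ],
}
```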
Why Planning Stops at JSON
This is where most agentic systems get sloppy. LLMs are excellent at interpreting intent and synthesizing plans, but terrible at holding state across days or guaranteeing exactly-once execution.
LLMs generate plans. Systems execute them.
No retries. No waiting. No callbacks inside the model loop.
Temporal.io: System Orchestration That Can Wait
Temporal handles requirements that normal job queues don't: steps that take weeks, human-in-the-loop approvals without polling hacks, and partial failures that must resume. It provides durable execution where state survives crashes and deploys.
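Here is a sketch of what that looks like in Temporal’s Python SDK: a workflow that blocks on a human-approval signal for however long it takes, with no polling. The signal name, activity name, and timeout are assumptions for illustration:

```python
from datetime import timedelta
from temporalio import workflow

@workflow.defn
class CivicIssueWorkflow:
    def __init__(self) -> None:
        self.approved = False

    @workflow.signal
    def approve(self) -> None:
        # A human clicking "approve" sends this signal; no polling needed.
        self.approved = True

    @workflow.run
    async def run(self, issue_id: str) -> str:
        # Durable wait: this can block for days and survive deploys,
        # because Temporal persists the workflow's state and history.
        await workflow.wait_condition(lambda: self.approved)
        await workflow.execute_activity(
            "book_contractor",  # activity registered on the worker by name
            issue_id,
            start_to_close_timeout=timedelta(hours=1),
        )
        return "resolved"
```

The `await workflow.wait_condition(...)` line is the whole point: the workflow can sit there for three days, survive a deploy, and resume exactly where it left off.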
LLM Orchestration Lives Elsewhere
Separate flows (LangGraph-style) handle RAG calls for legal and policy grounding, prompt injection resistance, and reasoning chains that must stay model-native. These are short-lived, stateless, and cheap to retry.
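For flavor, a minimal LangGraph-style sketch of one such flow; the state fields and node bodies are placeholders for the actual RAG and guardrail logic:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class GroundingState(TypedDict):
    question: str
    context: str
    answer: str

def retrieve_policy(state: GroundingState) -> dict:
    # Stand-in for a RAG lookup against legal and policy documents.
    return {"context": "relevant policy excerpts"}

def answer_grounded(state: GroundingState) -> dict:
    # Stand-in for the model call that reasons over retrieved context.
    return {"answer": f"grounded answer using: {state['context']}"}

graph = StateGraph(GroundingState)
graph.add_node("retrieve", retrieve_policy)
graph.add_node("answer", answer_grounded)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "answer")
graph.add_edge("answer", END)
app = graph.compile()  # short-lived, stateless, cheap to retry

result = app.invoke({"question": "Who is liable for pothole damage?"})
```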
Backend Structure (Today)
Directory breakdown
```
backend/
├── main.py
├── temporal_workflows.py
├── temporal_worker.py
├── services/
│   ├── ai_service.py
│   ├── workflow_service.py
│   └── component_service.py
├── models/
├── routers/
└── sessions/
```
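temporal_worker.py reduces to standard Temporal worker wiring. A sketch, assuming the task-queue name and reusing the workflow and activity from the earlier sketches:

```python
import asyncio
from temporalio.client import Client
from temporalio.worker import Worker

async def main() -> None:
    # Connect to the Temporal server and register workflows plus
    # component activities on a task queue the API server dispatches to.
    client = await Client.connect("localhost:7233")
    worker = Worker(
        client,
        task_queue="mudda-workflows",  # assumed queue name
        workflows=[CivicIssueWorkflow],        # from the workflow sketch above
        activities=[send_email_component],     # from the component sketch above
    )
    await worker.run()

if __name__ == "__main__":
    asyncio.run(main())
```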
What This Buys Me
- Workflows survive deploys
- Failures are replayable
- AI decisions are inspectable
- Humans stay in control
- Cost stays predictable
What Comes Next
This is the foundation. Next iterations will harden plan validation, add partial re-planning for failed branches, introduce multi-tenant isolation, and wire in LangSmith for evaluation and visibility.
“I’d rather ship a system that waits correctly than one that reasons beautifully and fails quietly.”