Definition
Model the agent flow as an explicit graph, state machine, or workflow, rather than letting the LLM decide every transition.
Category: Control structure
When to use
Production systems, resumable tasks, long-running work, agent platforms that need audit trails and permission boundaries.
When not to use
Lightweight demos, pure chat, or exploratory tasks where the flow is entirely unknown.
How to implement
- Model the flow as nodes + edges + state.
- The LLM can choose an edge but cannot bypass state-machine constraints.
- Every node should be replayable, resumable, and cancellable.
- Graph state stores messages, tool calls, task status, and permission state.
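The bullets above can be sketched roughly as follows. This is an illustration under stated assumptions, not a particular framework's API: `GraphState`, `ALLOWED_EDGES`, `selectEdge`, and the `canWriteCode` permission flag are hypothetical names. The LLM proposes the next node as a string, and the graph accepts it only if the edge is declared and the permission state allows it.

```typescript
// Hypothetical names throughout; a sketch, not a specific framework's API.
type NodeName = "plan" | "research" | "code" | "test" | "review" | "final";

// Graph state keeps messages, tool calls, task status, and permissions in one place.
interface GraphState {
  node: NodeName;
  messages: string[];
  toolCalls: { tool: string; args: unknown }[];
  taskStatus: "running" | "done" | "cancelled";
  permissions: { canWriteCode: boolean };
}

// Edges are declared up front; anything not listed is not a legal transition.
const ALLOWED_EDGES: Record<NodeName, readonly NodeName[]> = {
  plan: ["research", "code"],
  research: ["plan", "code"],
  code: ["test"],
  test: ["code", "review"],
  review: ["code", "final"],
  final: [],
};

// The LLM proposes an edge; the state machine has the last word.
function selectEdge(state: GraphState, proposed: string): NodeName {
  const legal = ALLOWED_EDGES[state.node];
  const fallback = legal[0] ?? "final";
  const candidate = legal.find((n) => n === proposed);
  if (candidate === undefined) return fallback; // unknown or undeclared edge
  if (candidate === "code" && !state.permissions.canWriteCode) {
    // Permission state can veto an otherwise legal edge.
    return legal.find((n) => n !== "code") ?? state.node;
  }
  return candidate;
}
```

Falling back to the first declared edge is one possible policy. Rejecting the proposal and asking the LLM again would fit the same structure.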
Minimal pseudocode
TypeScript
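type Node = "plan" | "research" | "code" | "test" | "review" | "final";
type State = { node: Node; [key: string]: unknown };
type Edge = (state: State) => Node; // route() has this shape

// runNode, checkpoint, and route are assumed to be defined elsewhere.
let state: State = { node: "plan" };
while (state.node !== "final") {
  const output = await runNode(state.node, state);
  const merged = { ...state, ...output };
  // Route first, then checkpoint, so a resumed run starts at the next node.
  state = checkpoint({ ...merged, node: route(merged) });
}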
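For readers who want something they can run, here is one possible self-contained version of the loop above. Everything in it is an assumption made for illustration: the reduced `Step` set, the in-memory `checkpoints` map (a real system would use durable storage), the stub `handlers`, and the deterministic `route` function that stands in for an LLM-proposed edge.

```typescript
// Hypothetical sketch: handlers, router, and store are illustrative stand-ins.
type Step = "plan" | "code" | "test" | "final";

interface RunState {
  runId: string;
  node: Step;
  attempts: number;
  log: string[];
}

type Handler = (state: RunState) => Promise<Partial<RunState>>;

// In-memory checkpoint store keyed by run id.
const checkpoints = new Map<string, string>();

function saveCheckpoint(state: RunState): RunState {
  checkpoints.set(state.runId, JSON.stringify(state));
  return state;
}

function loadCheckpoint(runId: string): RunState | undefined {
  const raw = checkpoints.get(runId);
  return raw === undefined ? undefined : (JSON.parse(raw) as RunState);
}

const handlers: Record<Exclude<Step, "final">, Handler> = {
  plan: async (s) => ({ log: [...s.log, "planned"] }),
  code: async (s) => ({ log: [...s.log, "coded"], attempts: s.attempts + 1 }),
  test: async (s) => ({ log: [...s.log, "tested"] }),
};

// Deterministic router for the sketch; an LLM could propose the edge instead.
function route(s: RunState): Step {
  if (s.node === "plan") return "code";
  if (s.node === "code") return "test";
  return "final";
}

// Resumes from the last checkpoint if one exists, otherwise starts from `initial`.
async function run(initial: RunState, maxSteps = 20): Promise<RunState> {
  let state = loadCheckpoint(initial.runId) ?? initial;
  for (let i = 0; i < maxSteps; i++) {
    const node = state.node;
    if (node === "final") break;
    const output = await handlers[node](state);
    const merged = { ...state, ...output };
    // Checkpoint after routing so a resumed run starts at the next node.
    state = saveCheckpoint({ ...merged, node: route(merged) });
  }
  return state;
}
```

Calling `run` with `maxSteps` set to 1 and then calling it again with the same `runId` continues from the stored checkpoint instead of repeating the `plan` node. The `maxSteps` bound is an added safeguard against a router that never reaches `final`.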
Recommended trace events
- workflow.node.enter
- workflow.node.exit
- workflow.edge.selected
- workflow.checkpoint.created
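A minimal sketch of how these events might be emitted, assuming a hypothetical `createTracer` factory and `traced` wrapper. The four event names come from the list above; the payload fields (`runId`, `node`, `at`, `detail`) are assumptions, not a defined schema.

```typescript
// Hypothetical sketch: event names are from the document, payload fields are assumed.
type TraceEventName =
  | "workflow.node.enter"
  | "workflow.node.exit"
  | "workflow.edge.selected"
  | "workflow.checkpoint.created";

interface TraceEvent {
  name: TraceEventName;
  runId: string;
  node: string;
  at: number; // epoch milliseconds
  detail?: Record<string, unknown>;
}

type Sink = (event: TraceEvent) => void;
type Emit = (name: TraceEventName, node: string, detail?: Record<string, unknown>) => void;

// Binds a run id and a sink once so every event carries the same run id.
function createTracer(runId: string, sink: Sink): Emit {
  return (name, node, detail) => {
    sink({ name, runId, node, at: Date.now(), ...(detail ? { detail } : {}) });
  };
}

// Wraps one synchronous node execution so enter and exit are always emitted as a pair.
// An async node would need an awaiting variant of this wrapper.
function traced<T>(emit: Emit, node: string, fn: () => T): T {
  emit("workflow.node.enter", node);
  try {
    const result = fn();
    emit("workflow.node.exit", node, { ok: true });
    return result;
  } catch (err) {
    emit("workflow.node.exit", node, { ok: false, error: String(err) });
    throw err;
  }
}
```

Emitting the exit event on the failure path as well keeps enter and exit counts balanced, which makes incomplete runs easier to spot in a trace.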
Common failure modes
- The graph is too rigid and the agent loses flexibility.
- State fields are scattered, making resume hard.
- LLM routing conflicts with business rules.
Implementation checklist
- Input/output schemas defined.
- Each agent's permission boundary defined.
- Every agent call carries a run id / trace id.
- Failure, timeout, cancel, and retry strategies defined.
- Context passed is the minimum required, not the full history.
- High-risk actions are gated by approval or a verifier.
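Several checklist items can be enforced at a single choke point before any agent call is dispatched. The sketch below is one way to do that, with hypothetical names (`AgentCall`, `Policy`, `checkCall`) and an assumed rule that a high-risk action needs an explicit approval flag. It covers the run id / trace id, permission boundary, timeout and retry settings, and the approval gate. Schemas and context minimization are out of scope here.

```typescript
// Hypothetical sketch: field names and the risk rule are assumptions for illustration.
interface AgentCall {
  runId: string;
  traceId: string;
  agent: string;
  action: string;
  timeoutMs: number;
  maxRetries: number;
}

interface Policy {
  allowedActions: Record<string, readonly string[]>; // per-agent permission boundary
  highRiskActions: readonly string[];
}

type Decision =
  | { kind: "allow" }
  | { kind: "needs_approval"; reason: string }
  | { kind: "deny"; reason: string };

function checkCall(call: AgentCall, policy: Policy, approved: boolean): Decision {
  // Every call must be traceable.
  if (!call.runId || !call.traceId) {
    return { kind: "deny", reason: "missing run id or trace id" };
  }
  // Timeout and retry strategy must be set explicitly.
  if (call.timeoutMs <= 0 || call.maxRetries < 0) {
    return { kind: "deny", reason: "invalid timeout or retry settings" };
  }
  // Permission boundary: the agent may only perform actions listed for it.
  const allowed = policy.allowedActions[call.agent] ?? [];
  if (!allowed.includes(call.action)) {
    return { kind: "deny", reason: `${call.agent} may not perform ${call.action}` };
  }
  // High-risk actions wait for approval or a verifier.
  if (policy.highRiskActions.includes(call.action) && !approved) {
    return { kind: "needs_approval", reason: `${call.action} is high risk` };
  }
  return { kind: "allow" };
}
```

Returning a decision value rather than throwing lets the workflow route a `needs_approval` result to a human or verifier node instead of treating it as a failure.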