Definition
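Multi-Agent Reinforcement Learning (MARL) trains multiple agents to cooperate or compete in a shared environment by optimizing reward signals. CTDE (Centralized Training, Decentralized Execution) is a common paradigm in which training may use shared information while each agent acts only on its own local observations.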
Category: Learning
When to use
Robotics, games, traffic control, scheduling, control systems, and research on multi-agent collaborative decision-making.
When not to use
Day-to-day LLM coding agent orchestration. If you cannot define a reward and a simulator, don't use MARL.
How to implement
- Define environment, observation space, action space, rewards, and episodes.
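- Pick a training paradigm: CTE (centralized training and execution), CTDE (centralized training, decentralized execution), or DTE (decentralized training and execution).
- Under CTDE, training can use global state, while execution uses only each agent's local observations.
- For LLM agent platforms, MARL is better suited to policy research than to serving as the main runtime.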
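The steps above can be sketched as a handful of types. This is a minimal sketch, not any particular library's API: every name here (MultiAgentEnv, Actor, CentralCritic, actAll, and the two toy implementations) is hypothetical. The point it illustrates is the CTDE split: the actor interface only accepts a local observation, while the critic used during training may also see the global state and the joint action.

```typescript
// Hypothetical types; none of these names come from a specific library.
type Observation = number[]; // what one agent sees locally
type GlobalState = number[]; // full environment state, available in training only
type Action = number;        // index into a discrete action set

interface StepResult {
  observations: Observation[]; // one entry per agent
  state: GlobalState;
  rewards: number[];           // one entry per agent
  done: boolean;               // true when the episode has ended
}

interface MultiAgentEnv {
  numAgents: number;
  reset(): StepResult;
  step(actions: Action[]): StepResult;
}

// Decentralized actor: its signature only admits the agent's own observation.
interface Actor {
  act(observation: Observation): Action;
}

// Centralized critic: sees global state and the joint action, used only while training.
interface CentralCritic {
  value(state: GlobalState, jointAction: Action[]): number;
}

// Execution path: each actor is handed its own observation and nothing else.
function actAll(actors: Actor[], observations: Observation[]): Action[] {
  if (actors.length !== observations.length) {
    throw new Error("expected exactly one observation per agent");
  }
  return observations.map((obs, i) => actors[i]!.act(obs));
}

// Toy implementations, only to show the interfaces being satisfied.
const thresholdActor: Actor = {
  act: (observation) => ((observation[0] ?? 0) > 0 ? 1 : 0),
};

const sumCritic: CentralCritic = {
  value: (state, jointAction) =>
    state.reduce((acc, x) => acc + x, 0) + jointAction.length,
};
```

Keeping the global state out of the Actor signature means an accidental dependency on training-only information shows up as a type error instead of a deployment surprise.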
Minimal pseudocode
TypeScript
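// Pseudocode: env, agents, trainer, and episodes are placeholders.
for (const episode of episodes) {
  let obs = env.reset(); // one local observation per agent
  while (!env.done()) {
    // Decentralized execution: each agent acts on its own observation only.
    const actions = agents.map((a, i) => a.policy(obs[i]));
    const { nextObs, rewards } = env.step(actions);
    // Centralized training: the trainer sees the joint transition.
    trainer.update({ obs, actions, rewards, nextObs });
    obs = nextObs;
  }
}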
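To make the loop above executable, the placeholders can be filled in with toy stand-ins. The sketch below is one possible filling, under loud assumptions: the environment is an invented two-agent coordination game (shared reward of 1 when both actions match), the agents are tabular epsilon-greedy learners, and the trainer nudges each table entry toward the immediate reward without bootstrapping from nextObs. The helpers makeRng, makeEnv, makeAgent, makeTrainer, and train are all hypothetical names, and a real project would likely swap in an established algorithm and environment.

```typescript
// Toy stand-ins for env, agents, and trainer (all hypothetical names).
type Action = 0 | 1;
type Transition = { obs: number[]; actions: Action[]; rewards: number[]; nextObs: number[] };

// Seeded linear congruential generator so runs are reproducible.
function makeRng(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296;
  };
}

// Two-agent coordination game: shared reward 1 when both actions match.
// Each agent's observation is just the current step index.
function makeEnv(horizon: number) {
  let t = 0;
  return {
    reset(): number[] { t = 0; return [0, 0]; },
    done(): boolean { return t >= horizon; },
    step(actions: Action[]): { nextObs: number[]; rewards: number[] } {
      t += 1;
      const r = actions[0] === actions[1] ? 1 : 0;
      return { nextObs: [t, t], rewards: [r, r] };
    },
  };
}

// Tabular epsilon-greedy agent: q[obs][action], ties broken toward action 0.
function makeAgent(numObs: number, epsilon: number, rng: () => number) {
  const q: number[][] = Array.from({ length: numObs }, () => [0, 0]);
  const policy = (obs: number): Action => {
    if (rng() < epsilon) return rng() < 0.5 ? 0 : 1; // explore
    return q[obs][1] > q[obs][0] ? 1 : 0;            // exploit
  };
  return { q, policy };
}

// One trainer object updates every agent's table from the joint transition.
// Simplification: moves q toward the immediate reward and ignores nextObs.
function makeTrainer(agents: { q: number[][] }[], alpha: number) {
  let updates = 0;
  return {
    update({ obs, actions, rewards }: Transition): void {
      agents.forEach((agent, i) => {
        const row = agent.q[obs[i]];
        row[actions[i]] += alpha * (rewards[i] - row[actions[i]]);
      });
      updates += 1;
    },
    count: (): number => updates,
  };
}

function train(numEpisodes: number, seed: number) {
  const horizon = 3;
  const rng = makeRng(seed);
  const env = makeEnv(horizon);
  const agents = [makeAgent(horizon, 0.1, rng), makeAgent(horizon, 0.1, rng)];
  const trainer = makeTrainer(agents, 0.1);
  let totalReward = 0;
  for (let episode = 0; episode < numEpisodes; episode++) {
    let obs = env.reset();
    while (!env.done()) {
      const actions = agents.map((a, i) => a.policy(obs[i]));
      const { nextObs, rewards } = env.step(actions);
      trainer.update({ obs, actions, rewards, nextObs });
      totalReward += rewards[0];
      obs = nextObs;
    }
  }
  return { agents, updates: trainer.count(), totalReward };
}
```

Calling train(200, 42) runs 200 episodes of three steps each and returns the learned tables together with the update count and the accumulated team reward. Because the reward lies in [0, 1] and each update is a convex combination, every table entry stays in that range.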
Recommended trace events
- marl.episode.started
- marl.step.completed
- marl.reward.received
- marl.policy.updated
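One way to keep these event names consistent across a codebase is a discriminated union plus a small emitter. In the sketch below the four event names come from the list above, while the payload fields (episode, step, rewards) and the helpers makeMemoryTracer and traceStep are assumptions made for illustration.

```typescript
// Event names are taken from the list above; payload fields are assumptions.
type MarlTraceEvent =
  | { type: "marl.episode.started"; episode: number }
  | { type: "marl.step.completed"; episode: number; step: number }
  | { type: "marl.reward.received"; episode: number; step: number; rewards: number[] }
  | { type: "marl.policy.updated"; episode: number; step: number };

interface Tracer {
  emit(event: MarlTraceEvent): void;
}

// In-memory tracer, handy for tests; a real one might forward to a logging backend.
function makeMemoryTracer(): Tracer & { events: MarlTraceEvent[] } {
  const events: MarlTraceEvent[] = [];
  return {
    events,
    emit: (event) => {
      events.push(event);
    },
  };
}

// Emits the three per-step events in the order they occur in the training loop.
function traceStep(tracer: Tracer, episode: number, step: number, rewards: number[]): void {
  tracer.emit({ type: "marl.step.completed", episode, step });
  tracer.emit({ type: "marl.reward.received", episode, step, rewards });
  tracer.emit({ type: "marl.policy.updated", episode, step });
}
```

With the union in place, a misspelled event name or a missing payload field is rejected by the type checker instead of silently producing an unmatched trace.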
Common failure modes
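- The reward function is misspecified, so agents optimize a proxy instead of the intended behavior.
- The training environment doesn't match production, so learned policies fail to transfer.
- MARL is conflated with general LLM orchestration.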
Implementation checklist
- Trigger and exit conditions defined.
- Input/output schemas defined.
- Permission, budget, timeout, and retry policies defined.
- Trace events defined.
- Degradation or human-takeover strategies defined.
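For a training run, the exit-condition, budget, and timeout items above could be captured in a small policy object. This is a sketch under assumptions: RunLimits, RunProgress, ExitReason, and checkExit are hypothetical names, and the choice of mean reward as the success criterion is illustrative only.

```typescript
// Hypothetical limits for one training run; field names are illustrative.
interface RunLimits {
  maxEpisodes: number;      // budget
  maxWallClockMs: number;   // timeout
  targetMeanReward: number; // success exit condition
}

interface RunProgress {
  episodes: number;
  elapsedMs: number;
  meanReward: number;
}

type ExitReason = "target_reached" | "timeout" | "budget_exhausted";

// Returns why the run should stop, or null to keep training.
// Success is checked first so a run that hits its target on the last
// allowed episode is reported as a success, not as an exhausted budget.
function checkExit(limits: RunLimits, progress: RunProgress): ExitReason | null {
  if (progress.meanReward >= limits.targetMeanReward) return "target_reached";
  if (progress.elapsedMs >= limits.maxWallClockMs) return "timeout";
  if (progress.episodes >= limits.maxEpisodes) return "budget_exhausted";
  return null;
}
```

Calling checkExit once per episode gives the loop a single place to decide whether to stop, which also makes the reason easy to attach to a trace event.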