After three decades of multi-agent research at LIACC, one hard lesson stands out: multi-agent failure modes are rarely what the single-agent safety literature predicts.
Emergent failures we audit for
- Coordination collapse. Agents keep exchanging messages but never reach a decision: a livelock.
- Convention capture. A norm emerges that optimises for inter-agent compatibility at the expense of the original objective.
- Information cascades. Agents copy a wrong belief that propagates faster than any one agent can self-correct.
- Adversarial takeover. A single compromised or misconfigured agent derails the collective.
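
Coordination collapse is the easiest of these to check mechanically. The following is a minimal sketch, not our production detector: it assumes the system emits a stream of events tagged as either a message or a decision, and it flags a livelock when messages keep arriving but no decision has been recorded for a full window of steps. The `Event` type, the `detect_livelock` function, and the `window` and `min_messages` thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    step: int   # logical time step at which the event occurred
    kind: str   # "message" or "decision" (hypothetical event taxonomy)


def detect_livelock(events, window=50, min_messages=10):
    """Return the step at which a livelock is flagged, or None.

    A livelock is flagged when at least `min_messages` messages have been
    exchanged since the last decision and at least `window` steps have
    passed since that decision. Both thresholds are illustrative.
    """
    last_decision = 0
    messages_since = 0
    for ev in sorted(events, key=lambda e: e.step):
        if ev.kind == "decision":
            # Progress was made: reset the counters.
            last_decision = ev.step
            messages_since = 0
        elif ev.kind == "message":
            messages_since += 1
            stalled = ev.step - last_decision >= window
            if stalled and messages_since >= min_messages:
                return ev.step
    return None


# Agents that talk for 100 steps without ever deciding.
chatter = [Event(step=s, kind="message") for s in range(1, 101)]

# The same traffic, but with a decision every 20 steps.
healthy = chatter + [Event(step=s, kind="decision") for s in range(20, 101, 20)]

print(detect_livelock(chatter))
print(detect_livelock(healthy))
```

The first call flags the stalled run once the window has elapsed; the second returns `None` because decisions keep resetting the counters. A real detector would also need to decide what counts as a "decision" for the system at hand, which this sketch leaves open.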
The audit
- Can we replay any episode step-by-step with deterministic seeds?
- Does the system have a kill switch at the coordinator level, not per-agent?
- Is there an observable consensus metric, so we can see when agreement plateaus or diverges?
- Can we inject an adversarial agent and measure the degradation?
- Have we tested with out-of-distribution demand / load / topology?
- Is there a drift monitor on inter-agent message distributions?
- Do we log counterfactual decisions for a 5% sample?
- Does each agent expose a typed contract, not a free-form interface?
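
Three of these questions (deterministic replay, a consensus metric, and adversarial injection) can be illustrated together on a toy model. The sketch below is an assumption-laden example rather than a description of any real system: agents hold a scalar belief and repeatedly move halfway toward the group mean, the consensus metric is the population standard deviation of beliefs, and the adversary is a single stubborn agent that always reports a wrong value. The function `run_episode` and all of its parameters are hypothetical.

```python
import random
import statistics


def run_episode(seed, n_agents=10, steps=30, adversary=False, truth=1.0):
    """Run one toy episode and return (trace, final_error).

    trace       -- per-step spread of beliefs (the consensus metric)
    final_error -- distance of the honest agents' mean belief from `truth`
    """
    # A local RNG makes the episode replayable regardless of global state.
    rng = random.Random(seed)
    beliefs = [truth + rng.gauss(0.0, 0.5) for _ in range(n_agents)]
    trace = []
    for _ in range(steps):
        if adversary:
            # Hypothetical stubborn agent: resets to a wrong value every step.
            beliefs[0] = truth + 5.0
        mean = statistics.fmean(beliefs)
        # Each agent moves halfway toward the current group mean.
        beliefs = [b + 0.5 * (mean - b) for b in beliefs]
        trace.append(statistics.pstdev(beliefs))
    honest = beliefs[1:] if adversary else beliefs
    final_error = abs(statistics.fmean(honest) - truth)
    return trace, final_error


# Replay check: the same seed must reproduce the episode exactly.
first = run_episode(seed=7)
replay = run_episode(seed=7)
print("replay identical:", first == replay)

# Adversarial injection: compare error with and without the stubborn agent.
base_trace, base_error = run_episode(seed=7)
adv_trace, adv_error = run_episode(seed=7, adversary=True)
print("baseline error:", round(base_error, 3))
print("adversarial error:", round(adv_error, 3))
print("degradation:", round(adv_error - base_error, 3))
```

In the baseline run the spread shrinks at every step, which is what a healthy consensus metric looks like in this model. With the stubborn agent injected, the honest agents are dragged toward the wrong value, and the difference between the two errors gives a single degradation number to track across releases.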
Why this matters commercially
The EU AI Act's high-risk tier will swallow most deployed multi-agent systems over the next three years. Auditability is not optional; the audit we run in-house is the one we'd like a regulator to run on us.