Safety audits for multi-agent systems: a practitioner's checklist

After three decades of multi-agent research at LIACC, one hard lesson stands out: multi-agent failure modes are rarely what the single-agent safety literature predicts.

Emergent failures we audit for

Coordination collapse. Agents keep exchanging messages but never commit to a decision: a livelock.
Convention capture. A norm emerges that optimises for inter-agent compatibility at the expense of the original objective.
Information cascades. Agents copy one another's wrong belief, and it propagates faster than any single agent can correct it.
Adversarial takeover. A single compromised or misconfigured agent derails the collective.
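
Coordination collapse is the easiest of these to check mechanically. As a minimal sketch, assume each episode is logged as a flat sequence of event kinds, "message" or "decision"; that log shape, the function name first_livelock_index, and the max_silent_messages threshold are all hypothetical, not part of any particular framework.

```python
from typing import Iterable, Optional


def first_livelock_index(
    events: Iterable[str], max_silent_messages: int
) -> Optional[int]:
    """Return the index of the first event at which the run of messages
    since the last decision exceeds max_silent_messages, or None.

    Assumes a hypothetical log where each event is the string
    "message" or "decision"; any other kind is ignored.
    """
    run = 0  # messages seen since the most recent decision
    for index, kind in enumerate(events):
        if kind == "decision":
            run = 0
        elif kind == "message":
            run += 1
            if run > max_silent_messages:
                return index
    return None


if __name__ == "__main__":
    episode = ["message", "decision"] + ["message"] * 4
    print(first_livelock_index(episode, max_silent_messages=3))
```

A sensible threshold depends on the protocol: a negotiation that legitimately takes dozens of rounds needs a looser bound than a simple voting scheme, so in practice it would be calibrated against healthy episodes.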

The audit

Can we replay any episode step-by-step with deterministic seeds?
Does the system have a kill switch at the coordinator level, not per-agent?
Is there an observable consensus metric, so we can see when agreement plateaus or diverges?
Can we inject an adversarial agent and measure the degradation?
Have we tested with out-of-distribution demand, load, and topology?
Is there a drift monitor on inter-agent message distributions?
Do we log counterfactual decisions for a 5% sample?
Does each agent expose a typed contract, not a free-form interface?
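
To make the drift-monitor item concrete, one option is to compare a baseline sample of message types against a recent window. The sketch below uses Jensen-Shannon divergence, which is one reasonable choice for illustration rather than something the checklist mandates; the function names, the message-type labels, and the 0.1 threshold are all assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def js_divergence(baseline: Iterable[str], recent: Iterable[str]) -> float:
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between the
    empirical distributions of two samples of message-type labels."""
    p_counts, q_counts = Counter(baseline), Counter(recent)
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    if p_total == 0 or q_total == 0:
        raise ValueError("both samples must be non-empty")

    divergence = 0.0
    for label in set(p_counts) | set(q_counts):
        p = p_counts[label] / p_total
        q = q_counts[label] / q_total
        m = 0.5 * (p + q)  # mixture distribution; positive for every label here
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def has_drifted(
    baseline: Iterable[str], recent: Iterable[str], threshold: float = 0.1
) -> bool:
    """Flag drift when the divergence exceeds an (assumed) threshold."""
    return js_divergence(baseline, recent) > threshold


if __name__ == "__main__":
    # Hypothetical message-type labels, for illustration only.
    baseline = ["propose"] * 50 + ["accept"] * 40 + ["reject"] * 10
    recent = ["propose"] * 20 + ["accept"] * 5 + ["reject"] * 75
    print(has_drifted(baseline, recent))
```

Jensen-Shannon divergence is symmetric and stays finite when a message type appears in only one of the two samples, which is convenient when agents start emitting a type the baseline never saw. It only tracks the mix of message types, so drift in message content would need a separate monitor.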

Why this matters commercially

The EU AI Act's high-risk tier will swallow most deployed multi-agent systems over the next three years. Auditability is not optional; the audit we run in-house is the one we'd like a regulator to run on us.
