After three decades of multi-agent research at LIACC, one hard lesson stands out: multi-agent failure modes are rarely what the single-agent safety literature predicts.

Emergent failures we audit for

  • Coordination collapse. Agents keep exchanging messages but never commit to a decision: a livelock.
  • Convention capture. A norm emerges that optimises for inter-agent compatibility at the expense of the original objective.
  • Information cascades. Agents copy a wrong belief that propagates faster than any one agent can self-correct.
  • Adversarial takeover. A single compromised or misconfigured agent derails the collective.
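
As an illustration of the first failure mode, coordination collapse can be flagged mechanically: traffic continues while no decision is committed. The sketch below is a minimal, hypothetical detector and not a description of our production tooling. The `Event` record, the two event kinds ("message" and "decision"), and the `window` and `min_messages` parameters are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """One entry in a (hypothetical) coordinator event log."""
    step: int
    kind: str  # assumed to be either "message" or "decision"


def is_livelocked(events: Iterable[Event], window: int, min_messages: int) -> bool:
    """Return True if the last `window` steps show message traffic
    (at least `min_messages` messages) but not a single decision."""
    events = list(events)
    if not events:
        return False
    last_step = max(e.step for e in events)
    # Keep only events from the most recent `window` steps.
    recent = [e for e in events if e.step > last_step - window]
    messages = sum(1 for e in recent if e.kind == "message")
    decisions = sum(1 for e in recent if e.kind == "decision")
    return decisions == 0 and messages >= min_messages


# Usage: ten steps of chatter, with the only decision far in the past.
log = [Event(step=s, kind="message") for s in range(10)]
log.append(Event(step=2, kind="decision"))
print(is_livelocked(log, window=5, min_messages=5))
```

Thresholds of this kind are workload-specific; a slow negotiation protocol may legitimately exchange many messages per decision, so `window` and `min_messages` would need calibrating against known-good episodes.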

The audit

  1. Can we replay any episode step-by-step with deterministic seeds?
  2. Does the system have a kill switch at the coordinator level, not per-agent?
  3. Is there an observable consensus metric, and do we alert when it plateaus or diverges?
  4. Can we inject an adversarial agent and measure the degradation?
  5. Have we tested with out-of-distribution demand / load / topology?
  6. Is there a drift monitor on inter-agent message distributions?
  7. Do we log counterfactual decisions for a 5% sample?
  8. Does each agent expose a typed contract, not a free-form interface?
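
Two of the checks above, deterministic replay (item 1) and the drift monitor on message distributions (item 6), lend themselves to a short sketch. The code below is illustrative only: `run_episode` is a toy stand-in for a real episode runner, the message types and weights are invented, and Jensen-Shannon divergence is one reasonable choice of drift score rather than the one any particular system must use.

```python
import math
import random
from collections import Counter
from typing import List, Sequence

# Hypothetical message vocabulary, chosen for the example only.
MESSAGE_TYPES = ("propose", "accept", "reject", "inform")


def run_episode(seed: int, steps: int = 200,
                weights: Sequence[float] = (4, 3, 2, 1)) -> List[str]:
    """Toy episode: a seeded stream of message types.
    A private Random instance keeps the replay independent of global state."""
    rng = random.Random(seed)
    return rng.choices(MESSAGE_TYPES, weights=weights, k=steps)


def js_divergence(baseline: Sequence[str], live: Sequence[str]) -> float:
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between the
    empirical message-type distributions of two logs."""
    if not baseline or not live:
        raise ValueError("both logs must be non-empty")
    cb, cl = Counter(baseline), Counter(live)
    keys = sorted(set(cb) | set(cl))
    p = [cb[k] / len(baseline) for k in keys]
    q = [cl[k] / len(live) for k in keys]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x: List[float], y: List[float]) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Item 1: the same seed must reproduce the same episode exactly.
assert run_episode(seed=42) == run_episode(seed=42)

# Item 6: compare a baseline log against a log with shifted message mix.
baseline = run_episode(seed=1)
drifted = run_episode(seed=1, weights=(1, 1, 1, 10))
print(f"drift score: {js_divergence(baseline, drifted):.3f}")
```

In a real system the replay check only holds if every source of nondeterminism (agent sampling, scheduling order, external calls) is seeded or recorded, and the alert threshold on the drift score would have to be set empirically from known-good traffic.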

Why this matters commercially

The EU AI Act's high-risk tier will swallow most deployed multi-agent systems over the next three years. Auditability is not optional; the audit we run in-house is the one we'd like a regulator to run on us.