We can train a manipulation policy for four billion simulated steps. On the real arm we get four million. That thousand-to-one ratio defines the engineering problem.

The three real-world budgets

  • Data budget. The robot runs at 10–50 Hz, subject to safety limits and mechanical wear. Collecting a million real transitions takes roughly a week of supervised operation.
  • Latency budget. The policy must produce an action in under 15 ms on every control step. That rules out most large transformer stacks.
  • Safety budget. Some exploratory actions damage hardware, so the policy cannot explore as freely on the real arm as it does in the simulator.
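
The data and latency budgets above reduce to simple arithmetic. Here is a rough sketch in Python. The function names and the duty_cycle parameter (the fraction of supervised time the arm actually spends logging transitions) are illustrative assumptions, not measurements from a real setup.

```python
def collection_hours(transitions: int, control_hz: float, duty_cycle: float = 1.0) -> float:
    """Wall-clock hours needed to log `transitions` steps at `control_hz`.

    duty_cycle is a hypothetical knob: the fraction of supervised time the
    arm is actually moving (the rest goes to resets, safety stops, etc.).
    """
    if control_hz <= 0.0:
        raise ValueError("control_hz must be positive")
    if not 0.0 < duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be in (0, 1]")
    seconds = transitions / (control_hz * duty_cycle)
    return seconds / 3600.0


def control_period_ms(control_hz: float) -> float:
    """Length of one control step in milliseconds."""
    return 1000.0 / control_hz


def fits_latency_budget(inference_ms: float, budget_ms: float = 15.0) -> bool:
    """True if a single forward pass finishes strictly inside the budget."""
    return inference_ms < budget_ms


if __name__ == "__main__":
    for hz in (10.0, 50.0):
        hours = collection_hours(1_000_000, hz)
        print(f"{hz:.0f} Hz: {hours:.1f} h of pure motion, "
              f"{control_period_ms(hz):.0f} ms per control step")
```

At a duty cycle of 1.0, a million transitions take about 5.6 hours of pure motion at 50 Hz and about 27.8 hours at 10 Hz. Any realistic duty cycle below 1.0 pushes the wall-clock figure up from there. At 50 Hz the control step is 20 ms, so a 15 ms decision leaves 5 ms for sensing and actuation.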

What's actually working in 2026

  1. Distilled small policies. Train a big transformer in sim, then distil it into a 10M-parameter actor that runs within the real-time budget.
  2. Contact-aware domain randomisation. Randomising friction, stiffness, and contact latency in sim closes more of the gap than randomising textures.
  3. Residual policies. Keep a hand-tuned controller as the backbone; learn only a residual correction on the real robot.
  4. Imitation from teleoperation. A few thousand high-quality human demonstrations remain one of the best grounding sources.
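
Items 2 and 3 are easy to sketch in a few lines. The following is a minimal illustration, not a reference implementation. The names ContactParams, sample_contact_params, and residual_action are hypothetical, and the sampling ranges are placeholders rather than tuned values.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ContactParams:
    friction: float            # Coulomb friction coefficient (unitless)
    stiffness: float           # contact stiffness in N/m
    contact_latency_ms: float  # delay before a contact is sensed


def sample_contact_params(rng: random.Random) -> ContactParams:
    """Draw one set of contact parameters for a simulated episode.

    All ranges below are illustrative assumptions.
    """
    return ContactParams(
        friction=rng.uniform(0.3, 1.2),
        stiffness=10.0 ** rng.uniform(3.0, 5.0),  # log-uniform over 1e3..1e5
        contact_latency_ms=rng.uniform(0.0, 20.0),
    )


def _clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def residual_action(
    base: Sequence[float],
    residual: Sequence[float],
    residual_limit: float,
    action_limit: float,
) -> List[float]:
    """Combine a hand-tuned controller's output with a learned correction.

    The residual is clipped first, so the learned part can only nudge the
    base command; the sum is then clipped to the actuator range.
    """
    if len(base) != len(residual):
        raise ValueError("base and residual must have the same length")
    combined = []
    for b, r in zip(base, residual):
        bounded = _clip(r, -residual_limit, residual_limit)
        combined.append(_clip(b + bounded, -action_limit, action_limit))
    return combined


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_contact_params(rng))
    print(residual_action([0.5, -0.2], [0.3, -0.05], 0.1, 1.0))
```

Bounding the residual is one way to respect the safety budget: even a badly trained correction cannot move the command far from what the hand-tuned controller would have issued.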

The one hot take

The frontier of embodied AI is not bigger models. It's better data pipelines from real robots to simulators and back, built around latency budgets that ML researchers still underestimate.