Sim-to-real at the edge: the robot-learning bottleneck no one talks about

We can train a manipulation policy for four billion simulated steps. On the real arm we get four million. That thousand-to-one ratio defines the engineering problem.

The three real-world budgets

Data budget. The robot runs at 10–50 Hz, subject to safety limits and mechanical wear. Collecting a million real transitions takes about a week of supervised operation.
Latency budget. Our policy must produce each action in under 15 ms to keep up with the control rate. That rules out most large transformer stacks.
Safety budget. Some exploration actions damage hardware. We can't afford the freedom the simulator offers.
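
The data and latency budgets above come down to simple arithmetic, which the sketch below makes explicit. It is a back-of-envelope helper, not a measurement: the function names and the duty_cycle parameter (the fraction of wall-clock time the arm is actually recording, after resets and safety stops) are assumptions introduced here for illustration.

```python
def collection_hours(transitions: int, control_hz: float, duty_cycle: float = 1.0) -> float:
    """Wall-clock hours needed to record `transitions` steps at `control_hz`.

    duty_cycle is a hypothetical knob: the fraction of time the robot is
    actually collecting data rather than being reset or inspected.
    """
    if control_hz <= 0 or not 0 < duty_cycle <= 1:
        raise ValueError("control_hz must be > 0 and duty_cycle in (0, 1]")
    return transitions / (control_hz * duty_cycle) / 3600.0


def control_period_ms(control_hz: float) -> float:
    """Time between two consecutive control ticks, in milliseconds."""
    return 1000.0 / control_hz


def fits_latency_budget(inference_ms: float, budget_ms: float = 15.0) -> bool:
    """True if one policy forward pass stays under the decision budget."""
    return inference_ms < budget_ms


# One million transitions of uninterrupted motion at both ends of the range.
for hz in (10, 50):
    print(f"{hz} Hz: {collection_hours(1_000_000, hz):.1f} h of pure motion")

# At 50 Hz a tick lasts 20 ms, so a 15 ms decision leaves 5 ms for I/O.
print(control_period_ms(50) - 15.0)
```

Even at full duty cycle the pure motion time is 5.6 to 27.8 hours; lowering duty_cycle shows how quickly resets and supervision stretch that towards the week quoted above.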

What's actually working in 2026

Distilled small policies. Train a big transformer in sim, then distil it into a 10M-parameter actor that fits the real-time budget.
Contact-aware domain randomisation. Randomising friction, stiffness, and contact latency in sim closes more of the gap than randomising textures.
Residual policies. Keep a hand-tuned controller as the backbone; learn only a residual correction on the real robot.
Imitation from teleoperation. A few thousand high-quality human demonstrations remain one of the best grounding sources.
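
Two of the ideas above, contact-aware domain randomisation and residual policies, are small enough to sketch in a few lines. The following is a minimal illustration under stated assumptions: the parameter ranges, the ContactParams fields, and the function names are placeholders chosen here, not values from any particular simulator or robot.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactParams:
    friction: float            # Coulomb friction coefficient (unitless)
    stiffness: float           # contact stiffness in N/m
    contact_latency_ms: float  # delay before contact forces are observed


def sample_contact_params(rng: random.Random) -> ContactParams:
    """Draw one contact configuration per simulated episode.

    All ranges are illustrative assumptions; real ranges would come from
    measurements on the target hardware.
    """
    return ContactParams(
        friction=rng.uniform(0.3, 1.2),
        stiffness=10 ** rng.uniform(3.0, 5.0),  # log-uniform over 1e3..1e5
        contact_latency_ms=rng.uniform(0.0, 20.0),
    )


def residual_action(base, residual, max_residual, low, high):
    """Add a bounded learned correction to the hand-tuned controller output.

    The residual is clipped to +/- max_residual first, so the learned part
    can only nudge the base action; the sum is then clipped to the
    actuator limits [low, high].
    """
    combined = []
    for b, r in zip(base, residual):
        r = max(-max_residual, min(max_residual, r))
        combined.append(max(low, min(high, b + r)))
    return combined


rng = random.Random(0)
print(sample_contact_params(rng))
print(residual_action([0.5, -0.95], [0.3, -0.5], 0.1, -1.0, 1.0))
```

Clipping the residual is one simple way to respect the safety budget: however wrong the learned correction is early in training, the commanded action stays within max_residual of what the hand-tuned controller would have done.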

The one hot take

The frontier of embodied AI is not bigger models. It's better data pipelines from real robots to simulators and back, with latency budgets that ML researchers still underestimate.
