May 11, 2026 · By Shrey Kothari
Simulating the Physical World
Language models had access to cheap, already-existing data on the internet. As larger models were trained on more text, general capabilities started to emerge.
Robotics has a much harder data problem.
Heraclitus wrote, “No man ever steps in the same river twice, for it’s not the same river and he’s not the same man.” This is especially true for robotics. We are trying to deploy machines into a world that changes every time they act in it.
A robot does not acquire physical intelligence by passively reading about the world. It has to act toward a goal, observe the consequences, recover from its mistakes, and adapt to situations that rarely repeat cleanly.
A useful robot trajectory captures what it perceived, what action it took, and how the world responded to it. This is expensive and slow to collect from the real world; it requires hardware, teleoperators, manual resets, safety checks, and a lot of time. Every unsuccessful attempt incurs a cost, and broken hardware is the least of it.
Simulation changes the shape of the problem.
If robotics is bottlenecked by the cost of physical experience, then 3D simulation is one of the most important ways to turn that into a software and compute problem. Simulated worlds do not need to be perfect to be useful. They need to make embodied experiences cheap enough to scale.
The current sim stack
Early simulators were engineering tools designed to reduce risk and speed up iteration time. Platforms like Webots, Gazebo, and later MuJoCo (now maintained by Google DeepMind) helped researchers test robot policies without needing to run every experiment on expensive physical hardware.
Over time, simulation engines became more powerful, and the role of sims in the physical AI stack evolved. Today’s ecosystem spans several layers of the robotics workflow: MuJoCo remains a standard for fast physics and contact-rich control; Gazebo is useful for ROS-based engineering workflows; Isaac Sim and Isaac Lab are pushing simulation toward digital twins, GPU-parallel training, and industrial deployment; and tools like SAPIEN and ManiSkill focus on manipulation and high-throughput data collection.
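As a concrete illustration of the “fast physics” end of this stack, here is a minimal sketch using MuJoCo’s official Python bindings to load a toy scene and step its contact dynamics. The scene itself is a hand-written example, not one of our environments.

```python
import mujoco

# A toy scene: a box free-falling onto a plane, defined inline as MJCF.
TOY_SCENE = """
<mujoco>
  <worldbody>
    <geom name="floor" type="plane" size="1 1 0.1"/>
    <body name="box" pos="0 0 0.5">
      <freejoint/>
      <geom name="box_geom" type="box" size="0.05 0.05 0.05"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(TOY_SCENE)
data = mujoco.MjData(model)

# Step the physics for one simulated second, then read back the box height.
steps = int(1.0 / model.opt.timestep)
for _ in range(steps):
    mujoco.mj_step(model, data)

print("box height after 1s:", data.qpos[2])
```

Loops like this are cheap enough to run thousands of times in parallel, which is what makes simulation attractive as a data source in the first place.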
The bottom line is that simulation has become an integral part of the data pipeline. Sims can generate many variations of a task, run policies in parallel, and benchmark a robot before that robot ever makes contact with the real world.
Why sims to begin with?
Sims are useful because they let robotics teams move parts of training, data generation, and evaluation out of the physical world. A robot can learn broad behaviors in simulation before being fine-tuned on hardware. A perception system can train on synthetic labels that would be expensive to collect manually. A robotics team can run repeatable test suites every time a model, planner, or software update changes.
We have seen this firsthand at Antim Labs, where we have deployed simulation environments into robotics CI/CD pipelines and watched teams use them to catch failures before they hit hardware. Without simulation, every robot update has to be tested slowly and expensively in the real world; with sims, teams catch regressions before deployment.
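The wiring differs from team to team, but the shape of such a regression gate is simple. The sketch below is illustrative only: `load_policy`, `make_scenario`, and `run_episode` are hypothetical stand-ins for whatever policy loader, environment builder, and rollout harness a team already has.

```python
# Hypothetical sim-in-CI regression gate: roll out the candidate policy
# across a fixed set of seeded scenario variations and fail the build
# if the success rate drops below a threshold.

SCENARIO_SEEDS = range(50)      # fixed seeds -> repeatable scenarios
MIN_SUCCESS_RATE = 0.90         # gate tuned from past hardware results


def test_policy_regression():
    policy = load_policy("candidate.ckpt")               # hypothetical loader
    successes = 0
    for seed in SCENARIO_SEEDS:
        env = make_scenario("warehouse_pick", seed=seed)  # hypothetical builder
        result = run_episode(env, policy)                 # hypothetical rollout
        successes += int(result.success)
    success_rate = successes / len(SCENARIO_SEEDS)
    assert success_rate >= MIN_SUCCESS_RATE, (
        f"success rate {success_rate:.2f} below gate {MIN_SUCCESS_RATE}"
    )
```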
Across all of these use cases, the limiting factor is environment scale. A team needs enough variation to train policies, enough diversity to generate useful perception data, and enough repeatable scenarios to catch regressions.
A robotics team does not just need one high-fidelity simulated warehouse or one carefully modeled apartment. They need hundreds or thousands of variations across the conditions that change how the robot perceives and acts.
That is where the next bottleneck appears: creating worlds worth simulating.
The missing layer
The simulation stack is getting extremely powerful. Physics is getting faster, rendering is improving, and RL frameworks can run thousands of parallel environments. We can generate logs, evaluate regressions, and train end-to-end policies before ever touching hardware.
If the engines are becoming good enough, the next question is: where do all the worlds come from?
The bottleneck has moved upward.
For LLMs, internet-scale data already existed. Nobody had to manually author billions of documents for pretraining. For robotics, readily usable data does not exist.
The real world exists, obviously. The simulator-ready version of the real world does not. A simulator needs structured geometry, physical properties, materials, affordances, articulations, semantic labels, collision shapes, object hierarchies, and task definitions. If you want a robot to learn in a warehouse, the simulator needs to understand the warehouse as more than pixels and meshes.
A text-to-3D model that makes a pretty chair is useful for content. Robotics needs a chair with the correct dimensions, collision geometry, mass and inertia, and relationship to the rest of the scene. A dishwasher needs doors, racks, interior volume, articulation constraints, and reachable surfaces. A cabinet needs drawers with handles, hinges, and storage space.
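To make “simulator-ready” concrete, here is a toy MJCF fragment for a cabinet with a single hinged door: the geometry is just boxes, but it carries the collision shapes, joint axis, and travel limits a physics engine actually needs. This is a hand-written illustration, not Gizmo output.

```python
import mujoco

# Toy "SimReady" cabinet: box geoms give it collision shapes and inertia,
# and a hinge joint with travel limits makes the door articulable.
CABINET_MJCF = """
<mujoco>
  <worldbody>
    <body name="cabinet" pos="0 0 0.4">
      <geom name="shell" type="box" size="0.3 0.25 0.4"/>
      <body name="door" pos="0.3 -0.25 0">
        <joint name="door_hinge" type="hinge" axis="0 0 1"
               range="0 1.9" damping="0.5"/>
        <geom name="door_panel" type="box" pos="0.01 0.25 0"
              size="0.01 0.25 0.4"/>
      </body>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(CABINET_MJCF)
print("articulated joints:", model.njnt)        # 1: the door hinge
print("door range (rad):", model.jnt_range[0])  # [0, 1.9]
```

A mesh without this structure renders fine but cannot be opened, grasped, or collided with in a way a policy can learn from.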
Enter Gizmo, our world creation layer for robotics simulation. It automatically creates SimReady worlds with dimensioned primitives, articulation, and collision data that engines like Isaac Sim and MuJoCo can simulate.
Until now, teams could spend weeks creating one useful robotics scene. Gizmo compresses that into minutes. This matters because the robotics data problem will not be solved by manually building each environment. It requires millions of taskable environments across homes, warehouses, factories, and the long tail of edge cases inside them.
Sim-to-real is a technical frontier
A trained policy can look strong in simulation and still fail on hardware because contact behaves differently, sensors are noisier, or the model encounters edge cases in the real world that it never saw during training. This is the sim-to-real gap, and it is one of the main reasons robotics data has historically been hard to scale through software alone.
It is a mistake to treat this gap as a permanent wall. The more useful framing is that sim-to-real is a tractable technical frontier, especially in domains where simulation captures the structure of the task well enough for learned behavior to transfer.
Frontier robotics and autonomy companies are already organizing their training and evaluation loops around this idea. In autonomous driving, Waymo has been using its World Model as a controllable generative simulator to generate rare events and stress-test its systems before deploying them on public roads.1 The same pattern is showing up in embodied robotics. Figure’s Helix 02 S0 controller was trained entirely in simulation across more than 200,000 parallel environments with extensive domain randomization, enabling direct transfer to real robots and generalization across the fleet.2
On the manipulation side, Genesis AI’s recent GENE-26.5 launch emphasizes closed-loop simulation evaluations across a large set of task variations. In one reported setup, each evaluation point represents 200 simulated setups and more than 150 hours of robot execution time; running the full evaluation in the real world would require 2700 human-robot hours.3 Eka Robotics’ launch coverage points in the same direction for dexterity, describing self-directed RL in high-fidelity simulation with realistic joints, motors, and physics.4
The lesson is that sim-to-real becomes more tractable when the simulator preserves the structure of the problem. A robot trained in one clean factory learns that factory. A robot trained across many task-relevant versions of that factory has a better chance of learning the underlying behavior: how to move through the space, react to uncertainty, avoid obstacles, and continue making progress when the environment differs from what it expected.
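“Many task-relevant versions” of a scene usually means sampling the parameters that actually change how a policy must behave. Here is a minimal sketch of that idea; the parameter names and ranges are chosen purely for illustration, not taken from any of the systems above.

```python
import random

def sample_factory_variant(seed: int) -> dict:
    """Sample one randomized variant of a factory scene.

    The parameters and ranges are illustrative; a real setup randomizes
    whatever actually shifts the robot's perception and contact dynamics.
    """
    rng = random.Random(seed)
    return {
        "floor_friction": rng.uniform(0.4, 1.0),      # wheel/foot slip
        "light_intensity": rng.uniform(0.3, 1.5),     # perception shift
        "pallet_count": rng.randint(2, 12),           # clutter, occlusion
        "pallet_jitter_m": rng.uniform(0.0, 0.3),     # placement noise
        "conveyor_speed_mps": rng.uniform(0.2, 0.8),  # dynamic obstacles
    }

# One scene per seed: policies are trained and evaluated across the set,
# not on a single hand-tuned layout.
variants = [sample_factory_variant(seed) for seed in range(1000)]
```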
This does not mean simulation is solved. Dexterous manipulation is still difficult because tiny contact details matter. Deformable materials, fluids, and soft objects are hard to model. Human environments are messier than controlled industrial settings. Real-world data remains the anchor; simulation expands that data into many more training and evaluation scenarios.
The next phase of sim-to-real will be driven less by humans manually guessing what to randomize and more by real deployments feeding back into simulation. When a robot fails in the physical world, that failure point should become the seed for new simulation scenarios. The system should recreate the conditions around the failure, generate useful variants, test whether an updated policy improves, and preserve those scenarios as part of the regression suite.
Automated simulation authoring is key here: the feedback loop stays slow if every failure case has to be rebuilt in sim by hand. This is where a system like Gizmo becomes important. It helps turn real-world failures into sims that can be replayed, varied, and added to the evaluation loop. Over time, the simulator becomes a distribution shaped by the real world rather than a static approximation of it.
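In pseudocode, the loop we have in mind looks something like the sketch below. Every function here is a hypothetical placeholder rather than a real Gizmo API; the point is the shape of the loop, not the specific calls.

```python
# Sketch of the deployment-to-sim feedback loop described above.
# reconstruct_scene, generate_variants, evaluate, and regression_suite
# are hypothetical placeholders, not a real Gizmo API.

def on_robot_failure(failure_log, policy, regression_suite):
    # 1. Rebuild a simulated scene around the conditions of the failure.
    seed_scene = reconstruct_scene(failure_log)

    # 2. Expand it into nearby variants (poses, lighting, clutter, timing).
    variants = generate_variants(seed_scene, n=100)

    # 3. Check whether an updated policy actually fixes the failure mode.
    results = [evaluate(policy, scene) for scene in variants]
    fixed_rate = sum(r.success for r in results) / len(results)

    # 4. Keep the scenarios so the failure cannot silently return.
    regression_suite.add(variants, tag=failure_log.incident_id)
    return fixed_rate
```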
The endgame
The real promise of 3D simulation is scalable physical experience, which requires scalable scene creation. The internet gave language models a way to absorb human language at scale. Robots need a way to absorb physical interaction at scale.
Gizmo is our attempt to build that world creation layer, so robotics teams can iterate at software speed. With Gizmo, we want robotics teams to go from an image, floorplan, failure case, or task description to a sim with structure and articulation.
Once robots can learn from millions of worlds instead of thousands of manually collected trajectories, robotics starts to look much less like a hardware problem and much more like a data and compute problem.
