CabiNet: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation

Robotic Rearrangement in the Real World

Our CabiNet model demonstrates Sim2Real transfer, without training on any real data!

Abstract

We address the important problem of generalizing robotic rearrangement to clutter without any explicit object models. We first generate over 650K cluttered scenes, orders of magnitude more than prior work, in diverse everyday environments such as cabinets and shelves. We render synthetic partial point clouds from this data and use them to train our CabiNet model architecture. CabiNet is a collision model that accepts object and scene point clouds, captured from a single-view depth observation, and predicts collisions for SE(3) object poses in the scene. Our representation has a fast inference speed of 7 μs per query with nearly 20% higher performance than baseline approaches in challenging environments. We use this collision model in conjunction with a Model Predictive Path Integral (MPPI) planner to generate collision-free trajectories for picking and placing in clutter. CabiNet also predicts waypoints, computed from the scene’s signed distance field (SDF), that allow the robot to navigate tight spaces during rearrangement. This improves rearrangement performance by nearly 35% compared to baselines. We systematically evaluate our approach, procedurally generate simulated experiments, and demonstrate that our approach directly transfers to the real world, despite training exclusively in simulation. Robot experiments with completely unknown scenes and objects are shown in the supplementary video.
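To make the role of the collision model inside an MPPI planner more concrete, here is a minimal sketch of a single MPPI update. It is an illustration under simplifying assumptions, not our implementation: the state is a 3D position rather than a full SE(3) pose, `collision_score` is a hypothetical analytic stand-in (a sphere obstacle) for the learned model, and all parameter names and values are illustrative.

```python
import numpy as np

def collision_score(points):
    """Hypothetical stand-in for a learned collision model: returns 1.0 for
    positions inside a sphere obstacle and 0.0 otherwise."""
    center, radius = np.array([0.5, 0.0, 0.0]), 0.2
    return (np.linalg.norm(points - center, axis=-1) < radius).astype(float)

def mppi_step(start, goal, nominal, rng, num_samples=256, sigma=0.05,
              temperature=0.05, collision_weight=100.0):
    """One MPPI update of a nominal sequence of position displacements.

    nominal: (horizon, 3) array of per-step displacements.
    Returns the updated nominal sequence with the same shape.
    """
    horizon = nominal.shape[0]
    noise = rng.normal(0.0, sigma, size=(num_samples, horizon, 3))
    controls = nominal[None] + noise
    # Roll out: positions are the cumulative sum of displacements.
    positions = start + np.cumsum(controls, axis=1)  # (N, H, 3)
    goal_cost = np.linalg.norm(positions[:, -1] - goal, axis=-1)
    collision_cost = collision_score(positions).sum(axis=1)
    cost = goal_cost + collision_weight * collision_cost
    # Exponential weighting; subtracting the minimum keeps exp() stable.
    weights = np.exp(-(cost - cost.min()) / temperature)
    weights /= weights.sum()
    # Update the nominal controls with the cost-weighted average noise.
    return nominal + np.einsum("n,nhc->hc", weights, noise)
```

Calling `mppi_step` repeatedly with the returned sequence as the next `nominal` refines the trajectory; rollouts that pass through the obstacle receive a large cost and therefore contribute almost nothing to the update.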

Procedural Synthetic Data

We train our CabiNet network with only synthetic data. We generate over 650K cluttered scenes, which is six orders of magnitude more scene data than prior work on learning rearrangement in simulation. We sample 2.5 million synthetic point clouds.
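As a rough illustration of what procedural clutter generation can look like, the sketch below places non-overlapping rectangular footprints on a single shelf by rejection sampling. It is a toy example under stated assumptions (axis-aligned boxes, a flat 2D shelf, hypothetical function and parameter names) and does not describe our actual generation pipeline.

```python
import numpy as np

def sample_clutter(rng, shelf_size=(1.0, 0.4), num_objects=8,
                   size_range=(0.05, 0.15), max_tries=200):
    """Place axis-aligned rectangular footprints on a shelf without overlap.

    Returns an array of (x_min, y_min, x_max, y_max) rows; it may contain
    fewer than num_objects rows if max_tries runs out first.
    """
    placed = []
    for _ in range(max_tries):
        if len(placed) == num_objects:
            break
        # Sample a footprint size, then a position that keeps it on the shelf.
        w, d = rng.uniform(*size_range, size=2)
        x = rng.uniform(0.0, shelf_size[0] - w)
        y = rng.uniform(0.0, shelf_size[1] - d)
        box = (x, y, x + w, y + d)
        # Reject the candidate if it overlaps any already placed footprint.
        overlaps = any(box[0] < b[2] and b[0] < box[2] and
                       box[1] < b[3] and b[1] < box[3] for b in placed)
        if not overlaps:
            placed.append(box)
    return np.array(placed).reshape(-1, 4)
```

Running the sampler with different random seeds yields different arrangements, which is the basic idea behind generating many distinct scenes from a small set of rules.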

Synthetic dataset of shelves, cubbies, drawers, tabletops, etc.

Scene Collision Data

We scale up neural collision checking by 30X relative to prior work [2], with over 60 billion scene-object collision queries. Red indicates that the object is colliding with the scene; green indicates that it is collision-free.
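A single scene-object collision query can be thought of as: transform the object points by a candidate SE(3) pose and test them against the scene. The sketch below labels a batch of poses this way. It assumes, purely for illustration, that the scene is a single axis-aligned box with an analytic signed distance function; `box_sdf` and `label_collisions` are hypothetical helpers, not part of our codebase.

```python
import numpy as np

def box_sdf(points, half_extents):
    """Signed distance to an axis-aligned box centered at the origin."""
    q = np.abs(points) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)
    inside = np.minimum(q.max(axis=-1), 0.0)
    return outside + inside

def label_collisions(object_points, poses, half_extents):
    """Label each SE(3) pose as colliding (True) or collision-free (False).

    object_points: (P, 3) points in the object frame.
    poses: (N, 4, 4) homogeneous transforms from object to scene frame.
    """
    rotations, translations = poses[:, :3, :3], poses[:, :3, 3]
    # Apply every pose to every point: the result has shape (N, P, 3).
    moved = np.einsum("nij,pj->npi", rotations, object_points) + translations[:, None]
    # A pose collides if any transformed point lies inside the box.
    return (box_sdf(moved, half_extents) < 0.0).any(axis=1)
```

In the visualization convention used here, a `True` label corresponds to red (colliding) and a `False` label to green (collision-free).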

Supplementary Video

BibTeX

@InProceedings{murali2023cabinet,
      Title                    = {{CabiNet}: Scaling Neural Collision Detection for Object Rearrangement with Procedural Scene Generation},
      Author                   = {Murali, Adithyavairavan and Mousavian, Arsalan and Eppner, Clemens and Fishman, Adam and Fox, Dieter},
      Booktitle                = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
      Url                      = {https://arxiv.org/abs/2304.09302},
      Year                     = {2023},
      Month                    = {May}
}