The video for the paper "Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks."

The Fall 2020 semester was an especially busy one, since I was involved in multiple paper submissions with my outstanding collaborators. Five preprints are now available, and this post summarizes each of them, along with some of the backstories behind the papers. In all cases, arXiv should have the most recent version of the paper.

The bulk of this work was actually done in Spring 2020, but we’ve made significant improvements in the latest version on arXiv by expanding the experiments and improving the writing. The main idea in this paper is to use dense object descriptors (see my blog post here) in simulation to get correspondences between two different images of the same object, which in our case is fabric. Given two images of the same fabric in possibly different configurations (e.g., folded versus flat), we want to know which pixels in image 1 correspond to which pixels in image 2, in the sense that corresponding pixels show the same physical point on the fabric. We can use the learned correspondences to design robot policies that smooth and fold real fabric, and with the aid of domain randomization we can do this in real environments.
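To make the correspondence idea concrete, here is a minimal sketch, assuming a descriptor network has already produced a per-pixel descriptor map for each image (the random arrays below are hypothetical stand-ins for real network outputs): a query pixel in image 1 is matched to the pixel in image 2 whose descriptor is closest in L2 distance.

```python
import numpy as np

def find_correspondence(descriptors_1, descriptors_2, pixel_1):
    """Given per-pixel descriptor maps of shape (H, W, D) for two images and a
    query pixel (row, col) in image 1, return the pixel in image 2 whose
    descriptor is closest in L2 distance, along with that distance."""
    query = descriptors_1[pixel_1[0], pixel_1[1]]            # shape (D,)
    dists = np.linalg.norm(descriptors_2 - query, axis=-1)   # (H, W) distance map
    best = np.unravel_index(np.argmin(dists), dists.shape)   # (row, col) of best match
    return best, dists[best]

# Random arrays standing in for the outputs of a trained descriptor network.
H, W, D = 64, 64, 16
descriptors_1 = np.random.randn(H, W, D).astype(np.float32)
descriptors_2 = np.random.randn(H, W, D).astype(np.float32)
match, dist = find_correspondence(descriptors_1, descriptors_2, (10, 20))
print("pixel (10, 20) in image 1 matches", match, "in image 2 (distance %.3f)" % dist)
```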

I was originally hoping to include this paper in our May 2020 BAIR Blog post on fabric manipulation, but the blog authors and I decided against this, since this paper doesn’t neatly fit into the “model-free” vs “model-based” categorization.

This paper proposes Intermittent Visual Servoing (IVS), a framework that uses a coarse controller in free space but employs imitation learning to learn precise actions in the regions with the highest accuracy requirements. Intuitively, many tasks are characterized by a few “bottleneck points,” such as tightening a screw, and we’d like to focus the learned portion on those regions.
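Here is a rough sketch of the switching logic; the `coarse_controller`, `learned_policy`, and `near_bottleneck` callables are hypothetical stand-ins for the real components, so treat this as an illustration of the idea rather than the paper’s implementation.

```python
def intermittent_visual_servoing(robot, coarse_controller, learned_policy,
                                 near_bottleneck, max_steps=1000):
    """Sketch of the IVS idea: rely on a coarse (e.g., calibrated, model-based)
    controller in free space, and hand control to a visuomotor policy trained
    via imitation only in the high-precision "bottleneck" regions of the task."""
    for _ in range(max_steps):
        obs = robot.get_observation()
        if near_bottleneck(obs):
            # High-precision region: defer to the learned policy.
            action = learned_policy(obs)
        else:
            # Free space: the coarse controller is accurate enough here.
            action = coarse_controller(obs)
        robot.apply_action(action)
        if robot.task_done():
            break
```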

To benchmark IVS, we test on a surgical robot and train it to autonomously perform surgical peg transfer. For some context: peg transfer is a task commonly used as part of a curriculum to train human surgeons for robot surgery. Robots are commonly used in surgery today, but in all cases a human manipulates control tools, and those motions cause the surgical robot to move in known directions. This process is referred to as “teleoperation.”

For our automated surgical robot on peg transfer, we show high success rates and transferability of the learned model across multiple surgical arms. The latter is a known challenge, since tools on different surgical arms have different mechanical properties, so it was not clear to us whether off-the-shelf IVS could work, but it did!

This paper is an extension of our ISMR 2020 and IEEE RA-Letters 2020 papers, which also experiment with surgical peg transfer. It therefore relates to the prior paper on Intermittent Visual Servoing, though I would not call it an extension of that paper, since we don’t actually apply IVS here, nor do we test transferability across different surgical robot arms.

In this work, we use depth sensing, recurrent neural networks, and a new trajectory optimizer (thanks to Jeff Ichnowski) to get an automated surgical robot to outperform a human surgical resident on the peg transfer task. In this and our ISMR 2020 paper, Danyal Fer acted as the human surgical resident. For our ISMR 2020 paper, we couldn’t get the surgical robot to be as good as him on peg transfer, prompting this frequent internal comment among us: Danyal, how are you so good??

Well, with the combination of these new techniques, plus terrific engineering work from postdoc Minho Hwang, we finally obtained accuracy and timing results that match or exceed those Danyal Fer obtained. I am looking forward to seeing how far we can push surgical robotics in 2021.

This paper shows a cool application: using a UR5 arm to perform high-speed dynamic rope manipulation tasks. Check out the video of the paper (on the project website), which comes complete with some Indiana Jones-style robot whipping demonstrations. We also name the proposed learning algorithm in the paper with the INDY acronym, for obvious reasons.

The first question I would have when thinking about robots whipping rope is: how do we define an action? We decided on an approach that is simple yet flexible enough to cover whipping, vaulting, and weaving tasks: a parabolic motion parameterized by a prediction of its single apex point. The main inspiration for this came from the “TossingBot” paper from Andy Zeng, which used a similar idea for parameterizing a tossing action. That brings us to the fifth and final paper featured in this blog post …
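To illustrate the parameterization, here is a simplified 2D sketch (not the paper’s exact action space): given a start position and a predicted apex in a vertical plane, we can recover a parabolic arc that begins at the start point and peaks at the apex.

```python
import numpy as np

def parabolic_trajectory(start, apex, num_waypoints=50):
    """Given a start position and a predicted apex, return waypoints along a
    parabola that passes through the start point and peaks at the apex.
    `start` and `apex` are (x, z) pairs in the vertical plane of the motion;
    this is a simplified illustration of parameterizing a motion by its apex."""
    x0, z0 = start
    xa, za = apex
    # Vertex form z(x) = za - k * (x - xa)^2; solve k from the start point.
    k = (za - z0) / (xa - x0) ** 2
    # Sweep symmetrically through the apex to the mirrored end point.
    xs = np.linspace(x0, 2 * xa - x0, num_waypoints)
    zs = za - k * (xs - xa) ** 2
    return np.stack([xs, zs], axis=1)

waypoints = parabolic_trajectory(start=(0.0, 0.2), apex=(0.4, 0.8))
print(waypoints[:3])  # first few (x, z) waypoints of the arc
```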

Here, we finally have the one paper where I’m the first author, and the one on which I spent the bulk of my research effort. (You can imagine what my work days were like last fall: working on this paper in the mornings and afternoons, followed by the other papers above in the evenings.) This paper came out of my Summer 2020 virtual internship with Google Robotics, where I was hosted by the great Andy Zeng. Before the internship began, Andy and I knew we wanted to work on deformable object manipulation, and we thought it would be nice to show a robot manipulating bags, since that would be novel. But we weren’t sure what method to use to train the robot.

Fortunately, at that time, Andy was hard at work on something called Transporter Networks, which ended up as one of the top papers presented at CoRL 2020. Andy and I hypothesized that Transporter Networks could work well on a wide range of deformable object manipulation tasks. So I designed over a dozen simulated environments in PyBullet covering the full suite of 1D (cables), 2D (fabrics), and 3D (bags) deformables. We had actually been thinking of using Blender before the internship, but at some point I realized that Blender would not be suitable. Pivoting to PyBullet, though painful initially, proved to be one of the best decisions we made.
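As a small taste of what working with deformables in PyBullet looks like, here is a minimal sketch of loading a fabric-like soft body and letting it settle; the `cloth_z_up.obj` mesh and the keyword arguments mirror the deformable examples that ship with `pybullet_data`, and this is far simpler than the actual task environments in the paper.

```python
import pybullet as p
import pybullet_data

# Headless physics server; soft bodies need the deformable world reset flag.
p.connect(p.DIRECT)
p.resetSimulation(p.RESET_USE_DEFORMABLE_WORLD)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.8)

plane_id = p.loadURDF("plane.urdf")

# Load a square of fabric as a mass-spring soft body (mesh ships with pybullet_data).
cloth_id = p.loadSoftBody(
    "cloth_z_up.obj",
    basePosition=[0, 0, 0.5],
    scale=0.5,
    mass=1.0,
    useMassSpring=1,
    useBendingSprings=1,
    springElasticStiffness=40,
    springDampingStiffness=0.1,
    useSelfCollision=1,
    frictionCoeff=0.5,
)

# Step the simulation so the fabric falls and settles onto the plane.
for _ in range(200):
    p.stepSimulation()

num_vertices, vertices = p.getMeshData(cloth_id)
print("cloth has", num_vertices, "mesh vertices after settling")
p.disconnect()
```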

While working on the project, Andy and I wanted to increase the flexibility of Transporter Networks to different task specifications. That’s where the “goal-conditioned” version came from. There are multiple ways of specifying goals; here, we decided to specify the goal as an image of the desired rearrangement configuration.
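As one illustrative way of doing this (not necessarily the exact architecture in the paper), the goal image can simply be stacked with the current observation along the channel axis before the result is fed to the downstream fully convolutional modules; the shapes below are hypothetical.

```python
import numpy as np

def goal_conditioned_input(current_obs, goal_obs):
    """One simple way to condition a pick-and-place policy on a goal image:
    concatenate the current observation and the goal observation along the
    channel axis so the downstream network sees both at once.  Both inputs
    are (H, W, C) arrays with matching shapes."""
    assert current_obs.shape == goal_obs.shape
    return np.concatenate([current_obs, goal_obs], axis=-1)  # (H, W, 2C)

current_obs = np.zeros((320, 160, 6), dtype=np.float32)  # e.g., color + height-map channels
goal_obs = np.ones((320, 160, 6), dtype=np.float32)      # image of the desired configuration
stacked = goal_conditioned_input(current_obs, goal_obs)
print(stacked.shape)  # (320, 160, 12)
```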

Once we had the architectures and the simulated tasks set up, it was a matter of finding the necessary compute to run the experiments and iterating on the design and the tasks.

I am very pleased with how the paper turned out, and I hope to release a more detailed blog post about it, both here and on the BAIR and Google AI blogs. I also really enjoyed working with this team; I have not met any of the Google-affiliated authors in person, so I look forward to the day when the pandemic subsides.

I hope you find these papers interesting! If you have questions or would like to discuss topics in these papers further, feel free to reach out.