Yikes! It has been a while since I was last active on this blog. The reason for the delay is, as usual, research deadlines. As I comment here, I still have a blogging addiction, but I force myself to prioritize research when appropriate. To keep my monthly blogging streak alive, here are three updates. First, I recently wrapped up and made public two research projects. Second, I have a rough agenda for what I hope to accomplish this year. Third, there are several new BAIR Blog posts worth reading.

The two research projects are on bed-making and fabric smoothing.

The bed-making paper will be at ISRR 2019, October 6 to 10, so it is happening very soon! It will be in Hanoi, Vietnam, which is exciting because I have never been there. The only Asian country I have visited before is Japan.

We recently submitted the other project, on fabric smoothing, to arXiv. Unfortunately, we got hit with the dreaded "on hold" flag, so it may be a few more days before it is officially released. (This sometimes happens to arXiv submissions, and we are not told why.)

I spent much of 2018 and early 2019 on the bed-making project, and then the first nine months of 2019 on fabric smoothing. These projects took an enormous amount of my time, and I learned several lessons, two of which are:

• Having good experimental code practices is a must. The practices in my linked blog post have helped me constantly throughout my research, which is why I keep them on record there for future reference. I'm amazed that I rarely used them (except perhaps version control) before coming to Berkeley.

• Don't start with deep reinforcement learning if imitation learning has not been tried. In the second project, on fabric smoothing, I sank about three months of research time into trying to get deep reinforcement learning to work. Then, after lackluster results, I switched to DAgger, and voilà, that turned out to be good enough for the project!

You can find details on DAgger in the official AISTATS 2011 paper, though much of that paper is devoted to a theoretical analysis bounding regret. The actual algorithm is dead simple. Using the notation from the Berkeley DeepRL course, we can define DAgger as a four-step cycle that is repeated until convergence:

• Train $\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)$ from demonstrator data $\mathcal{D} = \{\mathbf{o}_1, \mathbf{a}_1, \ldots, \mathbf{o}_N, \mathbf{a}_N\}$.
• Run $\pi_\theta(\mathbf{a}_t \mid \mathbf{o}_t)$ to get an on-policy dataset $\mathcal{D}_\pi = \{\mathbf{o}_1, \ldots, \mathbf{o}_M\}$.
• Ask the demonstrator to label $\mathcal{D}_\pi$ with actions $\mathbf{a}_t$.
• Aggregate $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_\pi$ and train again.

The DeepRL class uses a human as the demonstrator, but we use a simulated one, so we avoid DAgger's main drawback: needing a human to repeatedly label the observations that the learned policy visits.

That’s it! DAgger is far easier to use and debug compared to reinforcement learning. As a general rule of thumb, imitation learning is easier than reinforcement learning, though it does require a demonstrator.
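To make the four steps concrete, here is a minimal, self-contained toy sketch in Python. It is not the code from the fabric smoothing project: the 1D track environment, the nearest-neighbor "policy", and helper names such as `expert_action` and `dagger` are all hypothetical stand-ins. As in our setup, the demonstrator is simulated and reads the true state directly; in this toy, observations and states coincide.

```python
def expert_action(s):
    """Hypothetical simulated demonstrator: reads the true state and steps toward 0."""
    if s > 0:
        return -1
    if s < 0:
        return 1
    return 0


def step(s, a):
    """Toy dynamics: move by the action, clipped to the track [-10, 10]."""
    return max(-10, min(10, s + a))


class NearestNeighborPolicy:
    """Stand-in for pi_theta(a_t | o_t): copies the label of the closest seen observation."""

    def __init__(self):
        self.data = []

    def fit(self, dataset):
        # "Training" here just memorizes the (observation, action) pairs.
        self.data = list(dataset)

    def act(self, o):
        # min() returns the first minimal pair, so ties go to earlier data.
        _, a = min(self.data, key=lambda pair: abs(pair[0] - o))
        return a


def rollout(policy_fn, s0, horizon):
    """Run a policy from s0 and return the observations it visits."""
    obs, s = [], s0
    for _ in range(horizon):
        obs.append(s)
        s = step(s, policy_fn(s))
    return obs


def final_state(policy_fn, s0, horizon=15):
    """Return where the policy ends up after `horizon` steps."""
    s = s0
    for _ in range(horizon):
        s = step(s, policy_fn(s))
    return s


def dagger(n_iters=5, horizon=15, start=-10):
    # Initial demonstrations only start from positive states.
    dataset = []
    for s0 in range(1, 11):
        dataset += [(o, expert_action(o)) for o in rollout(expert_action, s0, horizon)]
    policy = NearestNeighborPolicy()
    for _ in range(n_iters):
        policy.fit(dataset)                                 # 1. train on D
        visited = rollout(policy.act, start, horizon)       # 2. run pi_theta to get D_pi
        labeled = [(o, expert_action(o)) for o in visited]  # 3. demonstrator labels D_pi
        dataset += labeled                                  # 4. aggregate D <- D U D_pi
    policy.fit(dataset)
    return policy, dataset


if __name__ == "__main__":
    bc_policy, _ = dagger(n_iters=0)  # zero iterations = plain behavior cloning
    dagger_policy, _ = dagger(n_iters=5)
    print(final_state(bc_policy.act, -10), final_state(dagger_policy.act, -10))  # prints: -10 0
```

Starting from the far negative end of the track, the behavior-cloned policy never moves, because the demonstrations only covered positive states. After five DAgger iterations, the aggregated data covers the states the learner actually visits, and it reaches the goal at 0. This is, in miniature, the distribution-shift problem that DAgger addresses.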

For the 2019-2020 academic year, I have many research goals, most of which build upon the prior two works or my other ongoing (not yet published) projects. I hope to at least know more about the following:

• Simulator Quality and Structured Domain Randomization. I think simulation-to-real transfer is one of the most exciting topics in robotics. There are two "sub-topics" within it that I want to investigate. First, given the inevitable mismatch between any simulator and the real world, how do we choose the "right" simulator for sim-to-real? During the fabric smoothing project, one person suggested I use ARCSim instead of our in-house simulator. We tried ARCSim briefly, but implementing grasping in it was too difficult. If we use lower-quality simulators, then I also want to know whether there are ways to improve them in a data-driven way.

The second sub-topic I want to know more about is the kind of specific, or "structured", domain randomization that should be applied for a given task. In the fabric smoothing project, I randomized camera pose, colors, and brightness, but I chose these in an entirely heuristic manner. I wonder if there are principled ways to decide which randomizations to use given a computational budget. If we had enough computational power, then of course we could just try everything.
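As a rough illustration of what heuristic randomization looks like, here is a minimal Python sketch that draws fresh camera-pose, color, and brightness perturbations at the start of each episode. The class and function names, ranges, and units are hypothetical placeholders, not the values from the fabric smoothing project; a real pipeline would pass these values to the simulator's renderer.

```python
import random
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical, hand-picked ranges: choosing these by intuition is exactly
# the "heuristic" part that a more principled method would replace.
CAMERA_POS_JITTER = 0.05       # max per-axis camera position offset (assumed meters)
CAMERA_ROT_JITTER = 5.0        # max per-axis camera rotation offset (degrees)
BRIGHTNESS_RANGE = (0.7, 1.3)  # multiplicative brightness scale


@dataclass
class EpisodeRandomization:
    camera_offset: Vec3
    camera_rotation: Vec3
    fabric_rgb: Vec3
    background_rgb: Vec3
    brightness: float


def sample_randomization(rng: random.Random) -> EpisodeRandomization:
    """Draw one independent set of visual perturbations for a new episode."""

    def jitter(scale: float) -> Vec3:
        return (rng.uniform(-scale, scale), rng.uniform(-scale, scale), rng.uniform(-scale, scale))

    def color() -> Vec3:
        # RGB channels in [0, 1).
        return (rng.random(), rng.random(), rng.random())

    return EpisodeRandomization(
        camera_offset=jitter(CAMERA_POS_JITTER),
        camera_rotation=jitter(CAMERA_ROT_JITTER),
        fabric_rgb=color(),
        background_rgb=color(),
        brightness=rng.uniform(*BRIGHTNESS_RANGE),
    )


def apply_brightness(pixel: Vec3, scale: float) -> Vec3:
    """Scale an RGB pixel with channels in [0, 1], clipping back into range."""
    r, g, b = (min(1.0, max(0.0, c * scale)) for c in pixel)
    return (r, g, b)
```

Every constant here was picked by hand. The open question is how to choose such ranges, and which perturbations to include at all, in a principled way under a fixed compute budget.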

• Combining Imitation Learning (IL) and Reinforcement Learning (RL). From prior blog posts, it is hopefully clear that I enjoy combining these two fields. I want to better understand how to combine IL and RL to accelerate the training of new agents and to reduce exploration requirements. For applications of these algorithms, I have gravitated towards fabric manipulation. It fits both research projects described earlier, and it may be my niche.

For 2019-2020, I also aim to be more actively involved in advising undergraduate research. This is a new experience for me; thus far, my interaction with undergraduate researchers has been limited to the fabric smoothing paper, where they helped me implement chunks of our code base. But now there are so many ideas I want to try with simulators, IL, and RL that I do not have time to do everything. It makes more sense to have undergraduates take a lead role on some of the projects.

Finally, there wasn't much of a post-deadline reprieve because I needed to release a few BAIR Blog posts, which requires considerable administrative work. We have released several posts in close succession over the last two weeks. The posts had been ready for a long time (minus the formatting needed to get them on the actual website), but I was so consumed with the projects, working 14-15 hours a day, that I had to ask the authors to postpone. My apologies!

Here are some recent posts that are worth reading:

• A Deep Learning Approach to Data Compression by Friso Kingma. I don’t know much about the technical details, unfortunately, but data compression is an important application.

• rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch by Adam Stooke. I am really interested in trying this new code base. By default, I use OpenAI baselines for reinforcement learning. While I have high praise for that project overall, baselines has disappointed me several times; see my obscenely detailed issue reports here and here for why. The new code base, rlpyt, (a) uses the more debugging-friendly PyTorch, (b) also supports parallel environments, (c) supports more algorithms than baselines, and (d) may be faster (though I will need to benchmark it).

• Sample Efficient Evolutionary Algorithm for Analog Circuit Design by Kourosh Hakhamaneshi. Circuit design is unfortunately not in my area, but it is amazing to see how Deep Learning and evolutionary algorithms can be used in so many fields. If there is any remaining low-hanging fruit in Deep Learning research, it is probably in applications to areas that are, on the surface, far removed from machine learning.

As a sneak preview, there are at least two more BAIR blog posts that we will be releasing next week.

Hopefully this year will be a fruitful one for research and advising. Meanwhile, if you are attending ISRR 2019 soon and want to chat, please contact me.