About a week and a half ago, I carefully read the Policy Distillation paper from DeepMind. The algorithm is easy to understand yet surprisingly effective. The basic idea is to have student and teacher agents (typically parameterized as neural networks) acting on an environment, such as the Atari 2600 games. The teacher is already skilled at the game, but the student isn’t, and need to learn somehow. Rather than run standard deep reinforcement learning, DeepMind showed that simply running supervised learning where the student trains its network to match a (tempered) softmax of the Q-values of the teacher is sufficient to learn how to play an Atari 2600 game. It’s surprising that this works; for one, Q-values are not even a probability distribution, so it’s not straightforward to conclude that a student trained to match the softmaxes would be able to learn a sequential decision-making task.
It was published in ICLR 2016, and one of the papers that cited this was Born Again Neural Networks (to appear in ICML 2018), a paper which I blogged about recently. The algorithms in these two papers are similar, and they apply in the reinforcement learning (PD) and supervised learning (BANN) domains.
After reading both papers, I developed the urge to understand all the Policy Distillation follow-up work. Thus, I turned to Google Scholar, one of the greatest research conveniences of modern times; as of this writing, the Policy Distillation paper has 68 citations. (Google Scholar sometimes has a delay in registering certain citations, and it also lists PhD theses and textbooks, so the previous sentence isn’t entirely accurate, but it’s close enough.)
I resolved to understand the main idea of every paper that cited Policy Distillation, especially with how relevant the paper is to the algorithm. I wanted to understand if papers directly extended the algorithm, or if they simply cited it as related work to try and boost up the citation count for DeepMind.
I have never done this before to a paper with more than 15 Google Scholar citations, so this was new to me. After spending a week and a half on this, I think I managed to get the gist of Policy Distillation’s “follow-up space.” You can see my notes in this shareable PDF which I’ve hosted on Dropbox. Feel free to send me recommendations about other papers I should read!