# My PhD Qualifying Exam (Transcript)

To start off my 2020 blogging, here is the much-delayed transcript of my PhD qualifying exam. The qualifying exam is a Berkeley-wide requirement for PhD students, and varies according to the department. You can find EECS-specific details of the exam here, but to summarize, the qualifying exam (or “quals” for short) consists of a 50-60 minute talk to four faculty members who serve on a “quals committee.” They must approve of a student’s quals talk to enable the student to progress to “candidacy.” That’s the point when, contingent on completion of academic requirements, the student can graduate with approval from the PhD advisor. The quals is the second major oral exam milestone in the Berkeley EECS PhD program, the first of which is the prelims. You can find the transcript of my prelims here.

The professors on my qualifying exam committee were John Canny, Ken Goldberg, Sergey Levine, and Masayoshi Tomizuka.

I wrote this transcript right after I took the exam in April 2018. Even so, I cannot, of course, guarantee the exact accuracy of the words uttered.

## Scheduling and Preparation

During a meeting with Professor Canny in late 2017, when we were discussing my research progress over the past semester, I brought up the topic of the qualifying exam. Professor Canny quickly said: “this needs to happen soon.” I promised him it would happen by the end of the spring 2018 semester.

Then, I talked with Professor Goldberg. While seated by our surgical robot, and soon after our ICRA 2018 paper was accepted, I brought up the topic of the quals and inquired if he would be on my committee. “It would be weird if I wasn’t on the committee,” he smiled, giving approval.1 “Will it be on this stuff?” he asked, pointing at the surgical robot. I said no, since I was hoping for my talk to be a bit broader than that, but as it turned out, I would spend about 30 percent of my talk on surgical robotics.

Next, I needed to find two more professors to serve on the quals committee. I decided to ask Professor Sergey Levine if he would serve as a member of the committee.

Since Berkeley faculty can be overwhelmed with email, I was advised by other students to meet professors during office hours to ask about quals. I gambled and emailed Professor Levine instead. I introduced myself in a few sentences, described the sketch of my quals talk, and then politely asked if he would serve on the committee.

I got an extremely quick response from Professor Levine, who said he already knew who I was, and that he would be happy to be on the committee. He additionally said it was the “least he could do” because I am the main curator for the BAIR blog, and he was the one who originally wanted the BAIR Blog up and running.

A ha! There’s a lesson here: if you want external faculty to serve on a committee, make sure you help curate a blog they like.

Now came the really hard part: the fourth committee member. To make matters worse, there is (in my opinion) an unnecessary rule that states that one has to have a committee member outside of EECS. At the time of my exam, I barely knew any non-EECS professors with the expertise to comment on my research area.

I scrolled through a list of faculty, and decided to try asking Professor Masayoshi Tomizuka from the Mechanical Engineering department. In part, I chose him because I wanted to emphasize that I was moving in a robotics direction for my PhD thesis work. Before most of my current robotics research, I did a little theoretical machine learning research, which culminated in a UAI 2017 paper. It also helped that his lab is located next to Professor Goldberg’s lab, so I sometimes got a peek at what his students were doing.

I knew there was a zero percent chance that Professor Tomizuka would respond to a cold email, so I went hunting for his office hours.2 Unfortunately, the Mechanical Engineering website had outdated office hours from an earlier semester. In addition, his office door also had outdated office hours.

After several failed attempts at reaching him, I emailed one of his students, who provided me a list of times. I showed up at the first listed time, and saw his office door closed for the duration of the office hours.

This would be more difficult than I thought.

Several days later, I finally managed to see Professor Tomizuka while he was walking to his office with a cup of coffee. He politely allowed me to enter his office, which was overflowing with books and stacks of papers. I don’t know how it’s possible to sift through all of that material. In contrast, when I was at Professor Levine’s office, I saw almost nothing but empty shelves.

Professor Tomizuka had, at the time, been a professor at Berkeley for 44 years (!!!) and was still supervising a long list of PhD students. I explained my qualifying exam plan to him. He asked a few questions, including “what questions do you want me to ask in your exam?” to which I responded that I was hoping he would ask about robot kinematics. Eventually, he agreed to serve on the committee and wrote my name on a post-it note to help him remember.

Success!

Well, not really — I had to schedule the exam, and that’s challenging with busy professors. After several failed attempts at proposing times, I asked if the professors could provide a full list of their constraints. Surprisingly, both Professor Levine and Professor Tomizuka were able to state their constraints for each day of the week! I’m guessing they had that on file somewhere so they could copy and paste it easily. From there, a few more emails sufficed to schedule the exam, which I formally booked about two months in advance.

Success!

All things considered, I think my quals scheduling was on the easier side compared to most students. Many PhD students probably have difficulty finding their fourth (or even third) committee member. For example, I know one PhD student who had extreme difficulty scheduling the quals talk. For further discussion and thoughts, see the end of this post.

I then needed to prepare for the exam. I wrote up a set of slides for a talk draft and pitched them to Professor Canny. After some harsh criticism, I read more papers, did more brainstorming, and re-did my slides, to his approval. Professor Goldberg also generally approved of my slides. I emailed Professor Levine about the general plan, and he was fine with a “40-50 minute talk on prior research and what I want to do.” I emailed Professor Tomizuka, but he responded only to one message, a week before the exam, to confirm that he would show up to the talk.

I gave two full-length practice talks in lab meetings, one to Professor Goldberg’s lab, and then one to Professor Canny’s lab. The first was hideous, and the second was less hideous. In all, I went through twelve full-length practice talks to get the average below 50 minutes, which I was told is the general upper bound for which students should aim.

Then, at long last, Judgment Day came.

## The Beginning

Qualifying exam date: Tuesday April 24, 2018 at 3:00pm.

Obviously, I showed up way in advance to inspect the room that I had booked for the quals. I checked that my laptop and adapters worked with the slide system set in the room. I tucked in my dress shirt, combed my hair, cleaned my glasses for the tenth time, and stared at a wall.

Eventually, two people showed up: the sign language interpreters. One was familiar to me, since she had interpreted for me many times in the past. The other was brand new to me. This was somewhat undesirable: given the technical nature of the topic, I had explicitly asked Berkeley’s Disabled Students’ Program to book only interpreters who had worked with me in the past. I provided a list of names more than two weeks in advance of the exam, but it was hard for them to find a second person. It seems that, just as with my prelims, it is difficult to properly schedule sign language interpreting services.

Professor Levine was the first faculty member to show up in the qualifying exam room. He carried with him a folder of my academic materials, because I had designated him as the “chair” of the quals committee (the chair cannot be one’s advisor). He said hello to me, took a seat, and opened my folder. I was not brave enough to peek into the files about me, and spent the time mentally rehearsing my talk.

Professor Tomizuka was the next to show up. He did not bring any supplies with him. At nearly the same time, Professor Canny showed up with some food and drink. The three professors quickly introduced themselves and shook hands. All the professors definitely know each other, but I am not sure how well. There might be a generational gap: Professor Levine (at the time) was in his second year as a Berkeley faculty member, while Professor Tomizuka was in his 44th year. They quickly got settled in their seats.

At about 3:03pm, Professor Levine broke the painfully awkward silence: “are we on Berkeley time?”3

Professor Canny [chuckling]: “I don’t think we run those for the qualifying exam …”

Professor Levine [smiling]: “well, if any one professor is on Berkeley time then all the others have to be…”

While I pondered how professors who had served on so many qualifying exam committees in the past had not agreed on a settled rule for “Berkeley-time,” Professor Goldberg marched into the room wearing his trademark suit and tie. (He was the only one wearing a tie.)

“Hey everyone!” he smiled. Now we could start.

Professor Levine: “Well, as the chair of the committee, let’s get started. We’re going to need to talk among ourselves for a bit, so we’ll ask Daniel to step out of the room while we discuss.”

Gulp. I was already getting paranoid.

The sign language interpreters asked whether they should go out.

Professor Goldberg agreed: “Yeah, you two should probably leave as well.”

As I walked out of the room, Professor Goldberg tried to allay my concerns. “Don’t worry, this is standard procedure. Be ready in five minutes.”

I was certainly feeling worried. I stood outside, wondering what the professors were plotting. Were they discussing how they would devour me during the talk? Would one of them lead the charge, or would they each take turns doing so?

I stared at a wall while the two sign language interpreters struck up a conversation, and commented in awe about how “Professor Goldberg looks like the typical energetic Berkeley professor.” I wasn’t interested in their conversation and politely declined to join since, well, I had the qualifying exam now!!

Finally, after what seemed like ten minutes — it definitely was not five — Professor Goldberg opened the door and welcomed us back in.

It was time.

## During The Talk

The professors nodded and stared at me. Professor Goldberg was smiling, and sat the closest to me, with notebook and pen in hand.

My talk was structured as follows:

• Part I: introduction and thesis proposal
• Part II: my prior work
• Part III: review of relevant robot learning research
• Part IV: potential future projects

I gave a quick overview of the above outline in a slide, trying to speak clearly. Knowing the serious nature of the talk, I had cut down on my normal humor during my talk preparation. The qualifying exam talk was not the time to gamble on humor, especially since I was not sure how Professor Tomizuka or Professor Levine would react to my jokes.

Things were going smoothly, until I came to my slide about “robot-to-robot teaching.” I was talking in the context of how to “transfer” one robot policy to another robot, a topic that I had previously brainstormed about with both Professor Goldberg and Professor Canny.

Professor Goldberg asked the first question during the talk. “When you say robot-to-robot teaching, why can’t we just copy a program from one robot to another?” he asked.

Fortunately this was a question I had explicitly prepared myself for during my practice talks.4

“Because that’s not teaching, that’s copying a program from one to another, and I’m interested in knowing what happens when we teach. If you think of how humans teach, we can’t just copy our brains and embed them into a student, nor do we write an explicit program of how we think (that would be impossible) and tell the student to follow it. We have to convey the knowledge in a different manner somehow, indirectly.”

Professor Goldberg seemed to be satisfied, so I moved on. Whew, crisis averted.

Next, I discussed our surgical robotics work from the ICRA 2018 paper. After rehashing some prior work on calibrating surgical robots, and just as I was about to discuss the details of our procedure, Professor Tomizuka raised his hand. “Wait, can you explain why you have cheaper sensors than the prior work?”

I returned to the previous slide. “Prior work used these sophisticated sensors on the gripper, which allow for better estimates of position and orientation,” I said, pointing at an image I was now thankful to have included. I provided him with more details on the differences between prior work and ours.

Professor Tomizuka seemed about half-satisfied, but motioned for me to continue with the talk.

I went through the rest of my talk, feeling at ease and making heavy eye contact with the professors, who were equally attentive.

No further interruptions happened.

When I finished the talk, which came in right around 50 minutes, I showed my customary concluding slide of pictures of my collaborators. “I thank all my collaborators,” I said. I then specifically pointed to the two on the lower right: pictures of Professor Canny and Professor Goldberg. “Especially the two on the lower right, thank you for being very patient with me.” In retrospect, I wish I had made their pictures bigger.

“And that’s it,” I said.

The professors nodded. Professor Goldberg seemed like he was trying to applaud, then stopped mid-action. No one else moved.

## Immediately After The Talk

Professor Levine said it was time for additional questions. He started by asking: “I see you’ve talked about two kinds of interactive learning, one with an adversary, one with a teacher. I can see those going two different directions, do you plan to try and do both and then converge later?”

I was a little confused by this question, which seemed open-ended. I responded: “yes, there are indeed two ways of thinking of interactive teaching, and I hope to pursue both.” Thinking back to my efforts at implementing code, I added: “from my experience, say with Generative Adversarial Networks as an example, it can be somewhat tricky to get adversarial learning to work well, so perhaps to start I will focus on a cooperative teacher, but I do hope to try out both lines of thinking.”

I asked if Professor Levine was satisfied, since I was worried I hadn’t answered well enough, and I had assumed he was going to ask something more technical. (After all, GANs are fairly easy to implement, particularly with so many open-source implementations nowadays for reference.) Surprisingly, Professor Levine nodded in approval. “Any other questions?”

Professor Goldberg had one. “Can you go back to one of the slides about the student’s performance? The one that said the student’s performance is conveyed with $P_1$ [which may represent trajectories in an environment], and from that the teacher can determine the student’s weakest skill so that the next set of data $P_2$ from the student shows improvement …”

I flipped back briefly to the appropriate slide. “This one?”

Professor Goldberg: “yes, that one. This sounds interesting, but you can think of a problem where you teach an agent to improve upon a skill, but then that results in a deterioration of another skill. Have you thought about that?”

“Yes, I have,” I said. “There’s actually an interesting parallel in the automated curriculum papers I’ve talked about, where you sample goals further and further away so you can learn how to go from point $A$ to point $B$. The agent may end up forgetting how to go from point $A$ to a point that was sampled earlier in the sequence, so you need to keep a buffer of past goals at lower difficulty levels so that you can continually retrain on those.”

Professor Goldberg: “sounds interesting, do you plan to do that?”

“I think so, though of course this will be problem dependent,” I responded. “More generally, we just need a way to detect and diagnose these regressions by repeatedly evaluating the student on the other skills that were taught earlier, and perhaps do something in response. Again, problem dependent, but the idea of checking other skills definitely applies to these situations.”

Professor Levine asked if anyone had more questions. “John do you have a question?”

“No,” he responded, as he finished up his lunch. I was getting moderately worried.

“OK, well then …” Professor Levine said, “we’d now like Daniel to step outside the room for a second while we discuss among ourselves.”

I walked outside, and both of the interpreters followed me outside. I had two interpreters booked for the talk, but one of them (the guy who was new to me) did not need to do any interpreting at all. Overall, the professors asked substantially fewer questions than I had expected.

## The Result

After what seemed like another 10 minutes of me staring at the same wall I looked at before the talk, the door opened. The professors were smiling.

Professor Levine: “congratulations, you pass!”

All four approached me and shook my hand. Professor Canny and Professor Tomizuka immediately left the room, as I could tell they had other things they wanted to do. I quickly blurted out a “thank you” to Professor Canny for his patience, and to Professor Tomizuka for simply showing up.

Professor Goldberg and Professor Levine stayed slightly longer.

While packing up, Professor Levine commended me. “You really hit upon a lot of the relevant literature in the talk. I think perhaps the only other area we’d recommend more of is the active learning literature.”

Professor Goldberg: “This sounds really interesting, and the three year time plan that you mention for your PhD sounds about right to get a lot done. In fact think of robot origami, John mentioned that. You’ve seen it, right? I show it in all the talks. You can do robot teaching on that.”

“Um, I don’t think I’ve seen it?” I asked.

Professor Goldberg quickly opened up his laptop and showed me a cool video of a surgical robot performing origami. “That’s your PhD dissertation” he pointed.

I nodded, smiling hard. The two professors, and the sign language interpreters, then left the room, and I was there by myself.

Later that day, Professor Levine sent a follow-up email, saying that my presentation reminded him of an older paper. He made some comments about causality, and wondered if there were opportunities to explore that in my research. He concluded by praising my talk and saying it was “rather thought-provoking.”

I was most concerned about what Professor Canny thought of the talk. He was almost in stone-cold silence throughout, and I knew his opinion would matter greatly in how I could construct a research agenda with him in the coming years. I nervously approached Professor Canny when I had my next one-on-one meeting with him, two days after the quals. Did he think the talk was passable?? Did he (gulp) dislike the talk and only passed me out of pity? When I asked him about the talk …

He shrugged nonchalantly. “Oh, I thought it was very good.” And he pointed out, among other things, that I had pleasantly reminded him of another colleague’s work, and that there were many things we could do together.

Wait, seriously?? He actually LIKED the talk?!?!?!?

I don’t know how that worked out. Somehow, it did.

## Retrospective

I’m writing this post more than 1.5 years after I took the actual exam. Now that some time has passed, here are some thoughts.

My main one pertains to why we need a non-EECS faculty member. If I have any suggestion for the EECS department, it would be to remove this requirement and allow the fourth faculty member to be in EECS. Or perhaps faculty who are “cross-listed” in EECS could count as outside members. The faculty expertise in EECS is so broad that it probably is not necessary to reach out to other departments if it does not make sense for a given talk. We also need to take an honest look at how much expertise an outside member can glean from a 1.5-hour talk, and whether it makes sense to ask for 1.5 hours of a professor’s time when that professor could be doing other, more productive things for his/her own research.

I am fortunate that scheduling was not too difficult for me, and I am thankful to Professor Tomizuka for sitting in on my talk. My concern, however, is that some students may have real difficulty finding that last qualifying exam member. For example, here’s one story I want to share.

I know an EECS PhD student who had three EECS faculty commit to serving on the quals committee, and who needed to find a fourth, non-EECS faculty member. That student’s advisor suggested several names, but none of the faculty responded in the affirmative. After several months, the student searched a list of faculty in a non-EECS department.

The student found one faculty member who could be of interest, and who I knew had served as an outside member on an EECS quals before. After two weeks of effort (due to listed office hours that were inaccurate, just as I experienced), the student was able to confirm a fourth member. Unfortunately, this happened right when summer began, and the faculty on the student’s committee were traveling and never in the same place at the same time. Scheduling would have to be put off until the fall.

When summer ended and fall arrived, that student was hoping to schedule the qualifying exam, but was no longer able to contact the fourth non-EECS faculty. After several futile attempts, the student gave up and tried a second non-EECS faculty, and tentatively got confirmation. Unfortunately, once again, the student was not able to contact the faculty member again when it was time to schedule.

It took several more months before the student, with the advisor’s help, was able to find that last, elusive faculty member to serve on the committee.

In all, it took one year for that student to get a quals committee set up! That’s not counting the time that the student would then need to schedule it, which normally has to be done 1 or 2 months in advance.

Again, this is only one anecdote, and one story might not be enough to spur a change in policy, but it raises the question of why we absolutely need an “outside” faculty member. That student’s research is in a very interesting and important area of EECS, but it’s also an area that isn’t a neat fit for any other department, and it’s understandable that faculty who are not in the student’s area would not want to spend 1.5 hours listening to a talk. There are many professors within EECS who could have served as the fourth member, so I would suggest we change the policy.

Moreover, while I don’t know if this is still the current policy, I read somewhere once that students cannot file their dissertations until at least two semesters after their qualifying exam. Thus, significant delays in getting the quals done could delay graduation. Again, I am not sure if this is still the official policy, so I will ask the relevant people in charge.

Let’s move on to some other thoughts. During my quals, the professors didn’t bring a lot of academic material with them, so I am guessing they probably expected me to pass. I did my usual over-preparation, but I don’t think that’s a bad thing. I was also pitching a research direction that (at the time) I had not done research in, but it looks like that is also acceptable for a quals, provided that the talk is of sufficient quality.

I was under a ridiculous amount of stress in the months of February, March, and April (until the quals itself), and I never want to have to go through months like those again. It was an incredible relief to get the quals out of the way.

Finally, let me end with some acknowledgments by thanking the professors again. Thank you very much to the professors who served on the committee. Thank you, Professors John Canny, Ken Goldberg, Sergey Levine, and Masayoshi Tomizuka, for taking the time to listen to my talk, and for your support. I only hope I can live up to your expectations.

1. At the time, I was not formally advised by him. Now, the co-advising is formalized.

2. I felt really bad trying to contact Professor Tomizuka. I don’t understand why we have to ask professors we barely know to spend 1.5 hours of their valuable time on a qualifying exam talk.

3. Classes at UC Berkeley operate on “Berkeley time,” meaning that they start 10 minutes after their official starting time. For example, a class that lists a starting time of 2:30pm starts at 2:40pm in practice.

4. As part of my preparation for the qualifying exam, I had a list of about 50 questions that I felt the faculty would ask.

# All the Books I Read in 2019, Plus My Thoughts

There are 37 books listed here, which is similar to past years (34, 43, 35). Here is how I categorized these books:

• China (7 books)
• Popular Science (9 books)
• American History and Current Events (4 books)
• Self-Improvement (6 books)
• Dean Karnazes Books (3 books)
• Yuval Noah Harari Books (3 books)
• Miscellaneous (5 books)

For all of these I put the book’s publication date in parentheses after the title, since it’s important to know when a book was published to better understand the historical context.

This page will maintain links to all my reading list posts. In future years, I’ll try and cut down on the length of these summaries, since I know I am prone to excessive rambling. We’ll see if I am successful!

Books I especially liked have double asterisks by their name.

## Group 1: China

For a variety of reasons, I resolved that in 2019, I would learn as much as I could about China’s history, economy, political structure, and current affairs. A basic knowledge of the country is a prerequisite for being able to adequately discuss China-related issues today. I successfully read several books, which I am happy about, though I wanted to read about double the number that I did. As usual, my weakness is being interested in so many subjects that it’s impossible for me to focus on just one.

• **China’s Economy: What Everyone Needs to Know** (2016) is by Arthur R. Kroeber, a Westerner who has lived in Beijing since 2002. Describing China as “formally centralized, but in practice highly decentralized,” Kroeber drives us through a fascinating whirlwind of the world’s most populous country, discussing the Chinese Communist Party, Chinese leaders, Chinese growth relative to other Asian economies (Taiwan, South Korea, and Japan), State Owned Enterprises, the Cultural Revolution, how the political system works, how business and finance work, Chinese energy consumption, Chinese meat consumption (which, thankfully, is leveling out), demographics, the shift from rural to urban, and so forth. There’s a lot to process, and I think Kroeber admirably provides a balanced overview. Some of the economic discussion comes from Joe Studwell’s book How Asia Works, which I read last year. The book is mostly objective and data-driven, and Kroeber only occasionally injects his opinions. American nativists would disagree with some of Kroeber’s opinions. For example, Americans often criticize China for excessive government protection of Chinese businesses, but Kroeber counters that every country has incentives to protect its businesses. Conversely, the Chinese government might not fully agree with Kroeber’s criticism of the one-child policy (though perhaps less so now that the policy is no longer active), or with Kroeber’s claim that it would be difficult for technological innovation and leadership to come from a country whose government does not permit free speech and heavily censors Internet usage. The book’s appendix raises the intriguing question of whether the government manipulates economic statistics. Kroeber debunks this, and one reason is obvious: no one who has lived in or visited China’s cities can deny rapid growth and improvement. Finally, Kroeber ponders the future of China, and in particular US-China relations.
He urges us (i.e., mostly Western readers) not to view China’s rise as foreboding a repeat of Nazi Germany or Communist Soviet Union, and thinks that an “accommodation can be reached under which China enjoys increased prestige and influence […], but where the US-led system remains the core of the world’s political and economic arrangements.” That is definitely better than a different scenario where war occurs between US and China.

• **Environmental Pollution in China: What Everyone Needs to Know** (2018) is the third “What Everyone Needs to Know” book about China that I’ve read, by Daniel K. Gardner, Professor of History at Smith College. This one narrows the scope to China’s environment, which is inevitably tied to its economy and government. It is, as Gardner frequently preaches, of importance to us because China’s environment affects the world in many ways. China’s pollutants go into the atmosphere and spread to other countries. China’s purchasing power also means that if it is low on food or other resources, it may buy from other countries and push prices up, potentially adding to instability for those countries with fragile governments. Much of the discussion is about air, which makes sense due to its direct visibility (remember the “airpocalypse”?), but equally important are soil and water quality, both of which look distressing due to chemicals and other heavy metals, and of course climate change. Understanding and improving China’s environment has the potential to benefit China and others, and Gardner does a nice job educating us on the important issues and the relevant — but sometimes searing — statistics. I left the book impressed with how much content was packed in, and I am thinking of ways for cooperation between the United States and China. In particular, I was encouraged by the environmental movement gaining momentum in China,3 and by the country’s expanding nuclear power program, since nuclear emits far less carbon than coal, oil, or natural gas. Unfortunately, and rather surprisingly for a book published in 2018, I don’t think there’s any mention of Donald Trump, who isn’t exactly a fan of China or climate-related issues. I mean, for God’s sake, he tweeted the preposterous claim that global warming was a hoax invented by the Chinese. I can only hope that post-Trump, saner heads will work with China to improve its environment.

## Group 2: Popular Science

This includes books with a psychology bent, such as those from Steven Pinker.

• **The Better Angels of Our Nature: Why Violence has Declined** (2011) needs no introduction. The Bill Gates-endorsed, 700+ page magnum opus by Pinker, which I managed to read in bits and pieces over the course of two busy months, describes how humans have grown steadily less violent over the course of our several-million-year history. This is in contrast to many commentators nowadays, who like to highlight every bit of violence happening in the modern world while longing for a more “romantic” or “peaceful” past. Pinker thoroughly and embarrassingly demolishes such arguments by providing compelling quantitative and qualitative evidence that violence was much, much more prevalent before the modern era. In years past, life expectancy was lower, a far greater percentage of people died from homicide and war, and practices such as torture and unusual punishment were more common and accepted by society. This is just a fraction of what’s in the book. I recommend it to everyone I know. Since I read Pinker’s Enlightenment Now last year, which can be thought of as a successor to this book, I was already somewhat familiar with the themes here, but the book still managed to blow my mind about how much violence there was before my time. It also raises some interesting moral dilemmas: while World War II did kill a lot of people, what might matter more is the number of deaths relative to the world or country population at the time, and by that metric there are many other incidents throughout history that merit our attention. Probably the only downside of Better Angels from a reader’s perspective is that the later parts of the book can be a bit dry, since Pinker presents some of the inner workings of the brain in order to discuss the science of why current circumstances might be more favorable to reducing violence. That is a tricky subject to describe to a non-technical audience.
I view myself as technically-minded, though not in the sense that I know much about how the brain works internally,4 and even I found this section somewhat tough going. My overall takeaway, though, is that Pinker is right about humans and violence. He is also right that we must understand the causes of violence and how to encourage the trends that have been shown to reduce it. I remain optimistic.

• Artificial Intelligence: What Everyone Needs to Know (2016) is by entrepreneur Jerry Kaplan, who got his PhD in computer science (focusing on NLP) from the University of Pennsylvania in 1979. It is in the “What Everyone Needs to Know” series. Kaplan presents the history and research frontiers of AI, and then wades into AI philosophy, AI and the law, the effect of AI on jobs and society, and the risks of superintelligence. I knew most of the book’s material due to my technical background in AI and my reading of popular science books that cover such topics. Thus, I did not learn as much from this book as I did from others, but that doesn’t mean it’s bad for a general audience. I do think the discussion of free will and consciousness could be trimmed a bit in favor of extra focus on imitation and reinforcement learning, which are among the hottest research fields in AI. While this book isn’t meant to be entirely about the research frontiers, their omission is a bit surprising even considering the 2016 publication date. The book is on the shorter side at 200 pages, so perhaps a revised edition could add 10-20 more pages on the research frontiers of AI? There are also some other surprising omissions — for example, the famous AlexNet paper is not mentioned. In general, I would recommend more focus on current frontiers in AI and less on speculation about the future.

• Astrophysics for People in a Hurry (2017), by scientist and science popularizer Neil deGrasse Tyson, is a slim book5 in which each chapter covers a major theme in astrophysics. Examples include exoplanets, dark energy, dark matter, and what’s “between” planets and galaxies. I am familiar with some concepts at a high level, most of which can be attributed to Lisa Randall’s two recent books that I read, and Tyson’s book served as a helpful refresher. Tyson boasts that Astrophysics for People in a Hurry is short, so there are necessarily limitations in what he can present, but I think there is a niche audience that this book will reach. In addition, it is written with Tyson’s standard wit and humor, such as “I don’t know about you, but the planet Saturn pops into my mind with every bite of a hamburger” and “The system is called the Sagittarius Dwarf but should probably have been named Lunch”, since dwarf galaxies can get consumed by larger galaxies, i.e., “galactic cannibalism”, get it?? The main benefit is probably to pique the reader’s curiosity about learning more, which could be said of any book, really. In addition, I will give a shout-out to Tyson for mentioning in the final chapter that we must never cease our scientific curiosity, for if we do, we risk regressing to the thinking that the world revolves around us. (Please read the final chapter to fully understand.)

• ** Life 3.0: Being Human in the Age of Artificial Intelligence ** (2017), by MIT theoretical physicist — and welcome recent entrant to AI — Max Tegmark, fired on all cylinders for me. Think of it as a more accessible and mainstream version of Nick Bostrom’s Superintelligence, which itself wasn’t too shabby! The “Life 3.0” part refers to Tegmark’s classification of life into three tiers: Life 1.0 is simple life, such as bacteria, that can evolve but cannot change its hardware or software, and thus will never change its behavior beyond what evolution has endowed it with. Life 2.0 represents humans: we can change our software by changing our behavior based on past experience, but we are limited by our “hardware” of being human, beyond basic aids like the hearing aids I wear, which could arguably count as a “hardware upgrade” but are minor in the grand scheme of human design. In contrast, Life 3.0 not only can learn like humans, but can also physically upgrade its own hardware. The possibilities for Life 3.0 are endless, and Tegmark takes us on wonderful thought experiments: what kind of world do we want from a superintelligent agent? How could it use the resources of the cosmos (i.e., the entire universe)? These questions are relevant to how we design AI now, because by driving the agenda, we can increase the chances of attaining the kind of future we want. He gave a captivating keynote talk about some of this material at IJCAI 2018 in his home country of Sweden, which you can read about in my earlier blog post.
Having been a committed AI researcher for the past five years, I recognized many of the well-known names in Tegmark’s commentary and in the pictures from the two conferences he features in the book.6 I am inspired by Tegmark’s body of work, both in the traditional academic sense of research papers and in the sense of “mainstreaming” AI safety and getting top researchers together to support AI safety research.7 The book makes the reader ponder the future of life, which is fittingly the name of an organization that Tegmark co-founded. I will heed the advice from his epilogue about being optimistic for the future of life, and about how I can help drive the agenda for the better. Overall, Life 3.0 is one of my favorites, just as it is for former President Barack Obama, and may have been my favorite of the year.

Update 01/04/2020: yikes! A reader informed me of this blog post which claims that Why We Sleep is filled with scientific errors. That post has gotten a fair amount of attention. I’m … honestly not sure what to think of this. I will have to go through it in more detail. I also urge Professor Walker to respond to the claims in that blog post.

• Blueprint: How DNA Makes Us Who We Are (2018), by behavioral geneticist Robert Plomin of King’s College London, is about DNA and its ability to predict psychological traits. This is what Plomin means by “makes us who we are” in the subtitle, which he repeats throughout the book. The first part summarizes the literature and research results on how DNA can be used to predict traits, including those that seem environmental, such as educational attainment. The existence of identical twins has been a boon to genetics research, as they are the rare cases in which two people are 100 percent genetically identical. The second part discusses how “polygenic scores”9 computed from DNA samples can be used for “fortune-telling,” or predicting traits. This is not my field, and I trust Plomin when he says that the science of heritability is essentially settled. Nonetheless, this book will be controversial; right on cue, there’s a negative review of the book which brings up precisely the points I am worried about: eugenics, designer babies, and so on. To his credit, Plomin keeps emphasizing that all DNA can do is make probabilistic (not deterministic) predictions, and that there is an enormous spread of outcomes. Plomin is also right to say, near the end of the book, that “The genome genie is out of the bottle and, even if we tried, we cannot stuff it back in.” Trying to hide science that’s already been made public is virtually impossible, as the Soviets demonstrated back in the early days of the Cold War when they stole nuclear weapons technology from the United States. But I worry that Plomin still did not sufficiently assuage the concerns of readers, particularly those of (a) parents and potential parents, and (b) policy makers concerned about consequences for inequality and meritocracy.
To be clear, I am fine with these results and trust the science, and it’s also blindingly obvious that if we equalize opportunity and education across an entire population, we will increase the relative impact of genetics on final outcomes. Blueprint is a necessary read for understanding the implications of the current genomics and DNA revolution.

• The Deep Learning Revolution: Artificial Intelligence Meets Human Intelligence (2018) was an instant read for me the moment I saw the book at the MIT Press booth at ICRA 2019. It is written by Distinguished UC San Diego Professor Terrence Sejnowski, who also holds a chaired position at the Salk Institute and is President of the Neural Information Processing Systems Foundation. That’s a lot of titles! I recognized Sejnowski’s name from looking at various NIPS (now NeurIPS) conference websites and seeing him listed as the president. In a technical sense, I remember he was on the team that refined Independent Component Analysis; I have a very old blog post about the algorithm, dating back to the beginning of my Berkeley era. He also worked on neural networks at a time when they were thought not to be a fruitful path. That the 2018 Turing Award went to Hinton, Bengio, and LeCun shows how much things have changed. The book recounts Sejnowski’s experience, including times when others said they “hated his work” — I was familiar with some of the history of Deep Learning, but Sejnowski brings a uniquely personal perspective to the reader. He’s also knowledgeable about other famous scientists, and mentions the pioneers in Deep Learning, Reinforcement Learning, and hardware. He concludes by marveling at the growth of NeurIPS. The main downside is that the book can sometimes seem like a hodgepodge of topics without much connection among them, and there are some typos which hopefully will be corrected in future editions. There is, of course, the usual problem that it’s hard to fully follow a topic Sejnowski discusses without already knowing it beforehand, but every popular science book suffers from that. I would later attend NeurIPS 2019, as I wrote here, where I saw him and a few others featured in his book. I wish I could attain a fraction of Sejnowski’s academic success.

## Group 3: American History and Current Events

• ** Hidden Figures: The American Dream and the Untold Story of the Black Women Mathematicians who Helped Win the Space Race ** is the 2016 bestselling book, by Margot Lee Shetterly, that inspired the movie of the same name. I’m not a movie person — since December 2015 I have watched a total of one movie in four years — but I am a book person, so I read the book. I started right after making my Apollo 11 post, because I figured there was never going to be a better time for me to read it, and I’m glad I did. It chronicles the lives of Dorothy Vaughan, Katherine Johnson (née Coleman, and still alive at the age of 101!), Mary Jackson, Christine Darden, and a few others: female African American mathematicians working at Langley and then NASA, helping America win the Space Race over the Soviet Union in the 1960s. Hidden Figures compellingly describes what life must have been like for them in the 1960s; when reading the book, I often got distracted imagining different 1960s-era scenarios. The book discusses the career trajectories of the women, classified as “mathematicians,” and concrete episodes such as how Katherine Johnson’s work helped John Glenn orbit the Earth. If there’s one thing I was slightly disappointed about, it was that there wasn’t a whole lot about the actual Apollo 11 mission to land on the moon, except for a bit in the final chapter, but perhaps it was hard to find documentation or evidence of the women’s contributions to that project, as compared to Glenn’s orbit. I agree with Shetterly that these stories are inspiring but were not well known prior to this book, and that clearly justifies the need for Hidden Figures. I was reading this at a time when I was working heavy hours to meet some research deadlines, and one thing that helps drive me is knowing that I have plenty of opportunity here at Berkeley, and I can’t waste it.

• ** American Dialogue: The Founders and Us ** (2018), by Joseph J. Ellis, Professor Emeritus at Mount Holyoke College, considers the question: what would the founders think? The book features four of them, each with one major theme presented first in a historical context and then in a modern one. In order of discussion: Thomas Jefferson on race, John Adams on equality, James Madison on law, and George Washington on foreign policy. Ellis presents the history and circumstances of these four men in a concise yet informative and fascinating manner. My biggest takeaway is all the contradictions inherent in our founders. Thomas Jefferson opposed a biracial America and, while he wanted to free slaves, he also made it clear that the goal was to deport them to some undetermined location to keep America “pure.” At the same time, he had a biracial slave mistress and an extended family of slaves at home. Hypocritical is too kind a word. This is also relevant to the famous “all men are created equal” phrase in the Declaration of Independence … whatever happened to Native Americans or African Americans? Or, of course, women. (Hey, founders, I’m very impressed with your ability to ignore half of the population!) Meanwhile, in law, we have the whole “originalist” vs “living Constitution” debate … yet Ellis makes a convincing case that Justice Scalia’s District of Columbia v. Heller opinion was highly political whereas Justice Stevens’s dissent was originalist. (How often do we hear about the “well regulated militia” in debates about the second amendment?) As Ellis keeps reminding us, we live in an America far different from the one the founders lived in, so rather than view the founders as mythological creatures with the brilliance to write a Constitution that should last forever, we should instead view them as highly fallible men who debated and argued while drafting the Constitution, and who could not have foretold what the future would hold.
Argument, debate, and dialogue are their ultimate legacy.

## Group 4: Self-Improvement

• Infinite Possibilities: The Art of Living Your Dreams (2009) is by Michael Dooley, a former tax accountant who now spends his time writing daily “notes on the universe” and other things, as explained on his website. Dooley’s chief claim from the start is that thoughts become things. Dooley argues that we have to believe in and think about our goals before we can attain them. Infinite Possibilities is written in a motivational style, urging the reader to act, think positively, and follow their dreams. There are some good points in this book, and I appreciate Dooley revealing that even a deeply spiritual man like him struggles with some of the same things I do, like feeling guilty when relaxing and vacationing. The downside is that I disagree with the rationale for his beliefs in Infinite Possibilities. Dooley argues, for instance, that space and time operate via thoughts turning into things; but they actually operate by the laws of physics, and someone thinking about something can’t guarantee that the event will actually happen! Dooley counters that we think about so many things that not all of them can come true, but that is cherry-picking. I am a vocal advocate of rigorous, empirical, controlled experiments over high-level motivational comments. Unfortunately, this book doesn’t cite any studies or even take a cursory glance at the literature in neuroscience, cognitive science, psychology, and other fields that could bolster some of Dooley’s claims. There is certainly an audience for Dooley’s book, as evidenced by his hundreds of thousands of email subscribers, but it is not my style.

• Getting to Yes: Negotiating Agreement without Giving In (editions in 1983, 1991, and 2011 – I read the 2011 one).11 The three authors are Roger Fisher, a former Harvard law professor, William Ury, a distinguished fellow of “the Harvard Negotiation Project” (surprisingly, that’s a thing), and, in later editions, Bruce Patton (also a distinguished fellow of the Harvard Negotiation Project). Getting to Yes is a classic book on negotiation skills, which have become increasingly important with flatter hierarchies in work environments, which induce more discussions among people of equal status. The book starts off by warning us not to bargain over positions. That would be the classic “he said $X$, she said $Y$, so we’ll split the difference and do $\frac{X+Y}{2}$,” which is bad for a number of reasons. Here’s an obvious one: someone clever could just start from a more extreme position to get a desired quantity! Instead, the authors give us a four-point method: (1) separate — or more politely, disentangle — the people from the problem, (2) focus on interests, not positions, (3) invent options for mutual gain, and (4) insist on objective criteria. Then they tell us what to do with people who won’t play nice (e.g., determine your “best alternative to a negotiated agreement”), and answer common questions from readers. Their advice seems sound! I can see why it works in theory. That said, the book has several weaknesses, though some are inherent to the genre. First, I do not think the examples are fully fleshed out. Perhaps fewer examples would be better, and maybe it would be feasible to contrast them with failed negotiations? The book sounds scholarly, but it doesn’t cite much research except for some of the authors’ other books.
Also, I don’t think the book will appease those who advocate for marginalized people and argue that “the moderate stance is itself an extreme political position…” Fortunately, I think the book does a fine job in the delicate case of dealing with a more powerful negotiator.

• ** 24/6: The Power of Unplugging One Day a Week ** (2019) is a new book by famous filmmaker and Internet pioneer Tiffany Shlain, whom I know because she is married to one of my PhD advisors. Needless to say, 24/6 was an instant read for me when it was published. Fortunately, Ken Goldberg brought a copy to the lab. When I opened it, I found a hand-written note from Ms. Shlain addressed to me, saying that I was “the most prolific reader in Ken’s lab”.13 Thank you! The book resonated with me because, like Ms. Shlain, I am deeply connected to the world and rely heavily on the Internet for my day-to-day duties. I also have this long-running blog, which probably makes me even more closely attached to the Internet than other computer scientists of my generation. The book discusses how she and her family take 24 hours off each week, from Friday night to Saturday night, and unplug. This means no electronics. For calls, they use their landline phone, and for writing, it’s pen and paper. The practice is inspired by the Jewish Sabbath, but as Shlain repeatedly emphasizes, it’s not a Jewish thing but one that can apply to a variety of religions, including the church I go to (atheism). 24/6 has many examples of Shlain’s activities during her sabbaths, some of which were known to me beforehand. She also offers practical tips on making a 24/6 life happen in today’s world, with testimonials from her readers. The easiest way for me to follow this is, like her, to take a 24/6 break from Friday night to Saturday night, and use that time for, well, reading physical books instead of e-books, long-distance running, and cooking the next salad dish. I hope I can keep it up!

## Group 5: Dean Karnazes Books

All three of these books are by ultramarathoner Dean Karnazes, perhaps the ultramarathoner best known to the general public. While Karnazes is not the best ultramarathoner, he is a very good one. (This article gives some context on the “controversy” surrounding Karnazes.) I first saw the name “Dean Karnazes” in an email advertisement for a running race in the Bay Area. It showed a picture of him shirtless (no surprise) and quickly recapped some of his eye-popping achievements: that he has run in conditions ranging from 120-degree temperatures in Death Valley to freezing temperatures in Antarctica, that he once ran 350 miles continuously, and that he once ran 50 marathons in 50 days in 50 states. One Google search led to another, and I found myself reading his books.

• ** Ultramarathon Man: Confessions of an All-Night Runner ** is ultramarathoner Dean Karnazes’ 2005 memoir, and the one that catapulted him to fame. In Ultramarathon Man, Karnazes describes the epiphany he had upon turning 30 to start running for the first time since high school, to find the satisfaction and meaning that he wasn’t getting from his corporate job. The book describes four main running races: the Western States 100, Badwater, a run at the South Pole, and then a 200-mile race. The Western States 100 was his first 100-mile ultramarathon, and he describes all the setbacks, pitfalls, and dangers that he and other runners faced, such as disfigured feet, bad urine, and dehydration. But the Western States 100 probably pales in difficulty compared to Badwater, a 135-mile run in 120-degree weather in Death Valley in July. Ouch! Karnazes actually dropped out in his first attempt, but came back to finish, and eventually won the 2004 race outright. His race in Antarctica was equally dangerous, for obvious reasons: there was frostbite, and he nearly got lost. The last one was a 200-mile “relay” race that he ran solo, whereas the other teams had 12 alternating runners. Karnazes’ purpose was to raise money for a young girl’s health condition. It’s very touching that he is inspired to run “to give the gift of life,” especially considering how his sister died in a tragic car accident as a teenager. The main feeling I had after finishing this book was: inspiration. As of December 2019, I have run seven half-marathons, and I will add some marathons in the coming years. Health permitting, I will be a runner for life. If there’s any ultramarathon I’d run, it would be the San Francisco one, which gives a break of a few hours between two consecutive 26.2-mile runs. Perhaps I’ll see Karnazes there, as I think he still lives in San Francisco.

• 50/50: Secrets I Learned Running 50 Marathons in 50 Days — and How You Too Can Achieve Super Endurance! (2008), written by Dean Karnazes and Matt Fitzgerald, describes Karnazes’ well-publicized quest to run 50 marathons in 50 states in 50 days.14 This is the best reference for it; I think there was other information online at some point, but that was back in 2006. The North Face sponsored Karnazes — in part due to the publication of Ultramarathon Man — and provided him with a support team for travel to races and for monitoring his health. Karnazes’ target pace was 4 hours for each marathon, and he kept to it remarkably well: the average time of his 50 marathons was 3:53:14. Most of the 50 races were not actual “live marathons,” since those usually happen on weekends. The weekday races were simulated like a normal marathon and run on the same course, but with only minimal police protection and a smaller group of volunteer runners who signed up to run with Karnazes. There are many great stories here, such as the Japanese man who signed up on a whim to impress his new lover, and how former Arkansas Governor Mike Huckabee joined him for the races in Arkansas and in New York City. Incidentally, the last race was the live 2006 New York City marathon, which he ran in 3:00:30, a very respectable time! After the celebration, Karnazes said he felt lousy the next day. So … he went for a run. He says he spent forty days almost entirely outside, running from New York City back to the starting line of the Lewis and Clark marathon in Missouri?!? How is that possible? Sorry, I don’t believe this one iota. Finally, the book is sprinkled with running tips from Karnazes, though most are generic “marathon advice” that can easily be found outside of this book. Three pieces of advice I remember are: (a) tips on how to avoid getting sick during a race, (b) stop heel-striking, and (c) don’t drink water in the last hour before a race.

• Run! 26.2 Stories of Blisters and Bliss (2011) is yet another Dean Karnazes book, consisting of “26.2 chapters” of short stories from throughout Karnazes’ running career, not including those in his prior books. For example, he recalls the Badwater races he ran after his first, failed attempt (covered in Ultramarathon Man), including one where he ran naked after finding out his father needed heart surgery. Strangely, he never mentions the 2004 edition of Badwater, which is the one he actually won. He also never mentions his continuous 350-mile run done over three nights without sleep, though he does refer to a run of the same length in Australia over six days. Karnazes also recounts his two failures at Leadville, the first due to altitude, and the second due to a torn meniscus, after which he ignored his doctor’s instructions to stop running! I disagree with that choice; I like running, but I am not willing to do lasting damage to myself. Run! is a reasonably nice supplement for better understanding the highly unusual nature of Karnazes’ life. Some stories seem a bit fragmented, with only a few pages to digest them before moving on to the next. The book is on the short side, so I’m in favor of adding rather than removing content. I believe Karnazes’ first book, Ultramarathon Man, is the best, followed by 50/50, and then this one. I am fine reading all of them, but for those who aren’t running fanatics, I recommend sticking with Ultramarathon Man and leaving this one aside. The book’s cover is a picture of him shirtless, which I found a bit self-centered, though to be fair Karnazes doesn’t write like someone trying to inflate his ego — he explicitly states that he runs for personal goals, not to brag to others.

## Group 6: Yuval Noah Harari Books

I’m glad I finally read Yuval Noah Harari’s books. Somehow, he takes us on mind-blowing journeys across history, current events, and the future, and delivers highly thought-provoking perspectives. All of his books are about 400 pages, but for “academic-style” books, they honestly don’t feel like slogs at all. His English prose is also beautiful, and reminds me of Steven Pinker’s writing style. All of this comes from someone who works less than I do and spends 1-2 hours each day meditating.

• ** Sapiens: A Brief History of Humankind ** (2011, US Edition 2015) is a lovely book that somehow covers the entire history of humanity, from our hunter-gatherer ancestors (and Neanderthal cousins) to modern-day humans. Thus, Sapiens must necessarily sacrifice depth in favor of breadth. That’s fine with me, as I can pick other books from my reading list that go into more depth on a subset of topics. Harari does a great job describing our ancestors in vivid and sometimes quirky language. I especially enjoyed his descriptions of what life was like as a forager, when wild, “natural” food was available — provided you could find it — and infectious diseases were nonexistent. Consider the contrast, Harari argues, with agriculture, which forced us to settle into fixed communities with animals. Not only did disease spread, but domesticated animals themselves became an evolutionary tragedy: they are technically “successful” at reproducing, but they live such miserable lives. (Harari also discusses our treatment of animals in his other books, and due to his research, he now strives to avoid anything to do with the meat industry.) I was also delighted to see that Sapiens covers happiness and the decline of violence, themes also present in Steven Pinker’s Better Angels and Enlightenment Now. The Hebrew edition of Sapiens was published in 2011, the same year Better Angels came out, so perhaps Harari and Pinker independently synthesized the research literature on the decline of violence? They seem to have a fair amount of common interests (and common readers, like me), so perhaps they collaborate in their academic lives? Collaboration, after all, is an example of the human communication and cooperation that Harari identifies as perhaps the definitive advantage of our species over others.

• 21 Lessons for the 21st Century (2018) is the third book by Yuval Noah Harari, and once again, Harari somehow manages to distill complex concepts and “how did I not realize that earlier?” ideas into wonderfully simple language. Harari divides this book into 21 chapters, each with a particular “lesson” or “theme” for us to ponder. This one is about the present, whereas his prior books cover the past and the future, though it has quite a bit of overlap with Homo Deus, such as the “fly and the bull” metaphor about terrorism. Nonetheless, there is certainly enough new material to be worthy of its own book. Chapters include ones on terrorism, as suggested earlier, along with war (never underestimate human stupidity!), liberty, equality, work, ignorance, education, and so forth. Harari concludes with two interesting chapters: (a) how to find meaning in life, which includes discussions on suffering and has persuaded me that meaning can be found in reducing suffering, and (b) his own solution to facing information overload in the 21st century: meditation. Perhaps I should get around to practicing meditation, since it would be good for me to learn how to keep my mind concentrated on one topic (or no topic!), rather than my present state where my mind repeatedly jumps from subject to subject. Now for the bad news: it seems, at least if the Wikipedia page is right, that Harari authorized the removal of some passages critical of the Russian government from the Russian translation. I will call it out for what it is: hypocrisy. I don’t know why he did that; if I were in his position, I would get all the Russian experts I know to confirm that the Russian translation actually contains the criticism of Russia, and I would refuse to authorize the translation if it removed them. Putin is exactly the kind of person who would be happy to create the heavy surveillance state that Harari criticizes in the book when discussing the loss of liberty.
To sum it up: an excellent book, and one which will probably persuade me to try meditating, but one marred by that hypocrisy.

## Group 7: Miscellaneous

I put a few books here that didn’t fit nicely in any of the earlier categories.

• It’s Not Yet Dark: A Memoir (2017) is a short and sweet memoir by Irish filmmaker Simon Fitzmaurice about his life as a filmmaker living with Amyotrophic Lateral Sclerosis (i.e., Lou Gehrig’s disease). He was diagnosed in 2008 and given four years to live. Despite this, he made it to late 2017 before passing away, and in that time he and his wife had more children, for a total of five. In addition, he wrote the film My Name is Emily using eye-gaze technology. It’s Not Yet Dark poignantly describes how Fitzmaurice’s muscles and body motions progressively broke down, and how he came to need a ventilator to breathe. There was some pushback, he recalls, from some people in his Irish hospital about whether it made sense to “ventilate” someone with ALS, but Fitzmaurice convinced them that he wanted to live. The book describes in succinct yet surprising detail what it’s like to live with ALS, and also how to appreciate life. I’m regularly terrified that I’ll be in good health until I turn, say, 35, and then be suddenly stricken with ALS, which is why I will always try to cherish the present.

• ** Educated: A Memoir ** is a lovely, best-selling 2018 memoir by Tara Westover. The Bill Gates-endorsed book describes how Tara, born to “survivalists” (her wording) in Idaho, grew up without going to school. While technically she was “home schooled,” her family was ultra-religious and avoided activities most of us in the modern era do without much questioning, such as going to the doctor and buying insurance. After some inspiration from an older brother, Westover studied hard for the ACT to get into Brigham Young. Despite being Mormon15 herself, she could not fit in with other students, who viewed her as strange and too devout. In class, Westover didn’t know what the word “Holocaust” meant, and asked the question aloud, to bewildered reactions. (“That’s not a joke,” she was told.) I’m amazed she managed to get decent grades. In fact, she won a Gates Cambridge Scholarship and would go on to get a PhD in history from Cambridge. The journey was not easy. Whenever she came back home, she faced a violent brother who would attack and cut her, and her parents would take her brother’s side. Her parents also tried to get her out of the PhD program, insulting those “socialists.” Eventually, Westover started to be open with her friends and collaborators about her background. At the end of the book, she reveals that she could not abide what her parents were asking her to do, and her family split in two, with the PhDs (including her) on one side, and the others (including her parents) on the other. They are not on speaking terms, and I think that’s fine; I would never want to socialize with people like her parents. I did some Googling and found that a lawyer defending her parents said “42% of the children have PhDs.” While that may be true, it is in spite of her parents starving their children of education, not because they were “better” at preparing their children for PhDs!
Educated is the epitome of the memoir I like reading: one which appreciates the power of education and gives me a perspective on someone who has lived a vastly different life than I would ever want to live.

• India in the 21st Century: What Everyone Needs to Know (2018) by Mira Kamdar is another “What Everyone Needs to Know” book, structured as a list of question-and-answer sections. Kamdar was a member of the Editorial Board of the New York Times from 2013-2017, and is currently an author who provides expert commentary on India. The book reviews the history of the Indian territory, its early religions and ethnic groups, and the British control that lasted until India’s independence in 1947. While some of the history felt a bit dry, it still seems valuable to know, particularly when Kamdar describes famous and powerful people of India, such as Prime Ministers Jawaharlal Nehru and Indira Gandhi, and the famous Mahatma Gandhi. I’m embarrassed to say this, but before reading Kamdar’s book, I thought Indira was related to Mahatma. Oops! Indira was actually the daughter of Nehru and married someone with the last name “Gandhi.” Anyway, the most interesting portions of the book to me were those that listed the challenges that India faces today. India will soon be the most populous country in the world,16 which will strain its water, food, and energy needs. Unlike China, which has a rapidly aging population, India has a far larger group of younger people, which means it doesn’t need to provide as much elderly care, but it does need to find jobs, jobs, and jobs. If the government fails to do so, it may face protests and anarchy. In addition, India (despite once having a female Prime Minister) still has quite retrograde views on women. I want India to be known as a great place for women to visit, rather than a place where women get gang-raped when they board buses. To make matters worse, a preference for sons has resulted in more young men than women, just as in China. The current leader, Narendra Modi, faces these and other challenges, such as dealing with a rapidly-growing China and a hostile Pakistan.
I am not a fan of Modi’s “Hindu nationalism”17 that Kamdar mentions; I think unchecked nationalism is one of the biggest dangers to world peace. Kamdar’s last question is a bit strange: Will India’s Bengal tiger become extinct? But I see her reasoning: India was able to make progress in rescuing the tiger from the brink of extinction. This gives hope that India will rise to the occasion for the bigger challenges of this century. I sure hope so.

Whew, 2019 was a good year for reading. Now, onto 2020 and a new decade!

1. Or more accurately, The Great Leap Backwards. The Great Leap Forward was one of the biggest tragedies in the history of the human race.

2. We should be clear on what the “leader of China” means. There have been five major “eras” of leadership in Chinese history since the founding of the People’s Republic in 1949: the Mao Zedong era (1949 to 1976), the Deng Xiaoping era (1978 to 1992), the Jiang Zemin era (1992 to 2002), the Hu Jintao era (2002 to 2012), and finally the Xi Jinping era (2012 to present). The years that I’ve put here are only approximations, because there are three main positions one must hold to be considered the “ultimate” (my informal term, for lack of a better option) leader of China, and these men sometimes did not control all three simultaneously. In addition, they can often play a huge role after their formal retirement. Incidentally, the three positions are: General Secretary of the Communist Party, Chairman of the Central Military Commission (which controls the army), and State President (which controls the government). In practice, the first two are more important than the third for the purpose of ruling power. As of this writing in late 2019, Xi Jinping holds all three positions.

3. In China, it is safer to protest about environmental-related issues because protestors can align their objectives with the Chinese Communist Party and frame it as improving the country. It is far different from protesting over more politically sensitive issues, such as asking for democracy in China. Yeah, don’t do that!

4. No, understanding neural networks does not mean we understand how the human brain works.

5. Hence the “People in a Hurry” in the title. My hardcover copy is a little over 200 pages, but the margins are super-thin, so it’s probably equivalent to a “120-page book.” It’s definitely the second-shortest book that I have read this year, with the book It’s Not Yet Dark having the honor of the shortest of them all. Pinker’s Better Angels is, of course, the longest in this list, followed by (I think) Henry Kissinger’s book about China.

6. Thankfully, Tegmark put the names of the conference attendees in the picture caption. It’s definitely a veritable who’s who in Artificial Intelligence! I only wish I could join them one day.

7. Probably the chief downside of Life 3.0, and one which might be a target of criticism from AI researchers, is that the heavy discussion of what a superintelligent agent could do is vastly premature; it’s basically the same argument leveled against Nick Bostrom’s work. Still, I argue that there are so many pressing AI safety issues right now that “AI safety” must be a current research agenda.

8. I probably should have expected this, but at the beginning of Why We Sleep, there is a disclaimer which states that the book is not meant to be used for professional medical advice.

9. When reading the book, I was struck by similarities between polygenic scores and Deep Learning. Polygenic scores rely on large-scale studies and the results can only be interpreted by the end outcome from the human’s experience. That is, to my knowledge, we can’t look at a gene and interpret its actual effects on the bloodstream, muscle movements, brain cells, and other body parts of humans. We can only look at a person’s years of education or height to see which set of genes can explain the variance in these qualities. Thus, it’s not as interpretable as we would like. Interpretability is a huge issue in Deep Learning, which has (as we all know) also benefited from the Big Data era.

10. Cohen mentions Anne Gorsuch, who was the Environmental Protection Agency administrator during Reagan’s presidency. I recognized her name instantly, because in 2017 her son, Neil Gorsuch, was successfully nominated to the United States Supreme Court. Remember, Cohen’s book was published in 1995.

11. The first edition of the book had some “sexist language” according to the authors. Uh oh. I suspect the “sexist language” has to do with the negotiations about divorce settlements. Earlier editions might have assumed that the (former) wife was relying on the (former) husband for income. Or more generally, the book may have assumed that the men were always the breadwinners of the family.

12. With one exception: I have not read his book on how to be a high school superstar.

13. If you are a member of Ken Goldberg’s lab and would like to dispute this “most read” label, send me your reading list. I don’t mean to say this in a competitive manner; I am legitimately curious to see what books you read so that I can jump start my 2020 book reading list.

14. I’m a bit confused why the title isn’t 50/50/50, as that would be more accurate, and the fact that Karnazes ran in 50 states matters since all the travel eats up potential recovery and sleep time.

15. At the start of the book, Westover mentions that this is not a book about Mormonism and she “disputes connections” between Mormonism and the actions of people in this book. My guess is that she did not want to offend Mormons who are far less extreme than her parents. But we could run an experiment to see whether there’s a connection between the religion and the activities of certain people: take a random sample of Mormons and a random sample of non-Mormons, and measure whatever we are considering (I know this is not easy, but science isn’t easy). I don’t know what the outcome of such a study would be, if one exists, but the point is we can’t unilaterally dispute connections without rigorous, scientific testing. It is disappointing to see this phrase at the beginning of the book.

16. Kamdar explicitly says in the book that sometime in 2017, India surpassed China to be the world’s most populous country. Most online sources, however, seem to still have China slightly ahead. Either way, India is clearly going to be the most populous country for much of the 21st century.

17. Since the book was published, Modi has presided over power and Internet outages in Kashmir, and a controversial Indian citizenship law that arguably discriminates against Muslims. The prospects of peace between India and Pakistan, and within India as well among those of different religions, appears, sadly, remote.

18. Yes, that’s another CFR fellow! I read a lot of their books, and no, it’s not on purpose; I usually don’t find out until I buy the book and then read the author biography. It’s probably that the genre of books I read includes those requiring specialized expertise in areas related to foreign affairs.

19. I read this book on the return flight from the ISRR 2019 conference. In one of my blog posts on the conference, I stated that “I will never tire of telling people how much I disapprove of Kim Jong Un.”

20. If I were President of the United States, one of my first foreign policy priorities would be to turn South Korea and Japan into strong allies, while also reassuring both countries that they are under our nuclear umbrella.

# Thoughts After Attending the Neural Information Processing Systems (NeurIPS) 2019

At long last. It took forever, but for the first time, I attended the largest and most prestigious machine learning conference, Neural Information Processing Systems (NeurIPS), held in Vancouver, Canada, from December 8-14. According to the opening video, last year in Montreal — the same place that hosted ICRA 2019 — NeurIPS had over 10,000 attendees. Tickets for NeurIPS 2018 sold out in 12 minutes, so for this year, NeurIPS actually used a lottery system for people who wanted to come. (The lottery was not for those contributing to the conference, who received a set of reserved tickets.) About 15,000 entered the lottery, and the total number of attendees was somewhere between 12,500 and 13,000.

I was only there from December 11 through 14, because the first few days were for industry-only events or tutorial talks. While those might be interesting, I also had to finish up a paper submission for a medical robotics conference. I finally submitted our paper on the night of December 10, and then the next morning, I had an early flight from San Francisco to Vancouver. My FitBit reported just 3 hours and 32 minutes of sleep, admonishing me to “Put Sleep First.” I know, I apologize. In addition, I did not have a full conference paper at NeurIPS, alas; if I did, I probably would have attended more of the conference. I had a workshop paper, which is the main reason why I attended. I am still trying to get my first full NeurIPS conference paper … believe me, it is very difficult, despite what some may say. It’s additionally tricky because my work is usually better suited for robotics conferences like ICRA.

The flight from San Francisco to Vancouver is only about 2.5 hours, and Vancouver has a halfway-decent public transportation system (BART, are you paying attention?). Thus, I was able to get to the conference convention center while it was still morning. The conference also had a luggage check, which meant I didn’t have to keep dragging my suitcase with me. Thank you!

NeurIPS 2019 was organized so that December 10-12 were the “real” (for lack of a better word) conference, with presentations and poster sessions from researchers with full, accepted conference papers. The last two days, December 13 and 14, were for the workshops, which also have papers, though those do not go through as intensive a peer-review process.

By the time I was ready to explore NeurIPS, the first of two poster sessions was happening that day. The poster sessions were, well, crowded. I don’t know if it was just me, but I was constantly bumping into people and kept having to mutter “sorry” and “excuse me.” In fact, at some point, the poster sessions had to be closed to new entrants, prompting attendees to post pictures of the “Closed” sign on Twitter, musing things like “Oh baby, only at NeurIPS would this happen…“.

For the 1-1.5 hours that I was at each poster session (formally two hours each, but in practice lasting about three), I was probably able to talk to only 4-5 people per session. Am I the only one who struggles to talk to researchers during poster sessions?

Given the difficulty of talking to presenters at the poster session, I decided to spend some time at the industry booths. It was slightly less crowded, but not that much. Here’s a picture:

The industry and sponsors session, happening in parallel with the poster session, on December 11.

You can’t see it in the above photo, but the National Security Agency (!!) had a booth in that room. I have a little connection with the NSA: they are funding my fellowship, and I used to work there. I later would meet a former collaborator of mine from the NSA, who I hadn’t seen in many years but instantly recognized when I saw that collaborator roaming around. However, I have had no connection with the NSA for a long time and know pretty much nothing about what they are doing now, so please don’t ask me for details. While I was there I also spoke with researchers from DeepMind and a few other companies. At least for DeepMind, I have a better idea of what they are doing.

I had a pre-planned lunch with a group, and then we attended Bengio’s keynote. Yes, that Bengio who also spoke at ICRA 2019. He is constantly asked to give talks. Needless to say, the large room was packed. Bengio gave a talk about “System 1 and System 2” in Deep Learning. Once again, I felt fortunate to have digested Thinking, Fast and Slow earlier, as you can see in my 2017 book reading list. You can find the SlidesLive recording of his talk online. There was another poster session after the talk (yes, more bumping into people and apologizing) and then I got some food at a cocktail-style dinner event that evening.

The second day was similar to the first, but with two notable differences. First, I attended a town hall meeting, where NeurIPS attendees were able to voice their concerns to the conference organizers. Second, in the evening, there was a Disability in AI event, which is a newer affinity group like the Queer in AI and Black in AI groups. At those two events, I met some of the people whom I had been emailing earlier to ask about and arrange closed captioning on videos and sign language interpreting services. The Disability in AI panel talked about how to make the conference more accessible to those with disabilities. The panel members spoke about their experiences with disabilities, either personal or from a friend or relative, some of which were more severe than others. There’s some delicacy needed when describing one’s disability, such as avoiding insulting others who might have a more severe form of it, and avoiding revealing disabilities that are hidden (if that matters to the person; for me it’s the opposite), but I think things proceeded OK.

I used a mix of captioning and sign language interpreting services at NeurIPS. You can find videos of NeurIPS talks on SlidesLive, complete with (some) closed captioning, but it’s not the best. The interface for the captions seems pretty unusable — it strangely was better during live recordings, when the captioning was automated. Scrolling through the myriad of workshop and conference videos on SlidesLive is also annoying. This week, I plan to write some feedback to SlidesLive and the NeurIPS conference organizers offering some advice.

I requested the interpreting for specific events where I would be walking around a lot, such as in the poster sessions, and it worked pretty well considering the stifling crowds. There was also another student at the conference who brought a team of two interpreters, so on occasion we shared the services if we were in the same events or talks. The panel discussed the idea of having a permanent sign language interpreting service from NeurIPS, which would certainly make some of my conference preparation easier! One person at the Disability in AI panel noted that “this conference is so large that we actually have two people using sign language interpreters” which is pretty much unheard of for an academic conference that doesn’t specialize in access technology or HCI more broadly.

It was nice to talk with some of the organizers, such as NeurIPS treasurer Marian Stewart Bartlett of Apple, who knew me before I had introduced myself. I also knew a little about Bartlett since she was featured in NeurIPS President Terrence Sejnowski’s Deep Learning book. Sejnowski was also briefly at the Disability in AI reception.

For the last two days of NeurIPS (December 13 and 14), we had workshops. The workshops might be the best part of NeurIPS; there are so many of them covering a wide variety of topics. This is in contrast to some other conferences I’ve attended, where workshops have been some of the least interesting or sparsely-attended portions of the conference. I don’t mean to say this negatively; it’s just my experience at various conferences. You can find the full list of workshops on the conference website, and here are the ones that seemed most interesting to me:

• Learning with Rich Experience
• Retrospectives: A Venue for Self-Reflection in ML Research
• Machine Learning for Autonomous Driving
• Bayesian Deep Learning
• Robot Learning: Control and Interaction in the Real World
• Tackling Climate Change with Machine Learning
• Fair ML in Health Care
• Deep Reinforcement Learning

I attended portions of two workshops on December 13: “Learning with Rich Experience” and “Retrospectives.” The former featured talks by Raia Hadsell of DeepMind and Pieter Abbeel of UC Berkeley. By “rich experience,” I think the workshop focuses on learning not just from images, but also videos and language. Indeed, that seems to have been featured in Hadsell and Abbeel’s talks. I would also add that John Canny has a few ongoing projects that incorporate language in the context of explainable AI for autonomous driving.

The retrospectives workshop was quite a thrill. I was there for three main reasons: (a) to understand the perspective of leaders in the ML community, (b) because many of the presenters are famous and highly accomplished, and (c) the automated captioning system would likely work better for these talks than those with more dense, technical terms. Some of the talks were by:

• Emily Denton, a research scientist at Google, who has done a lot of ground-breaking work in Generative Adversarial Networks (GANs). Her talk was largely a wake-up call to the machine learning community in that we can’t ignore the societal effects of our research. For example, she called out a full conference paper at NeurIPS 2019 which performed facial reconstruction (not recognition, reconstruction) from voice.
• Zachary Lipton, a professor at CMU and well-known among the “debunking AI hype” community. I’m embarrassed that my only interaction with him is commenting on his book reading list here. I’m probably the only person in the world who engages in that kind of conversation.
• David Duvenaud, a professor at the University of Toronto whose paper on Neural Ordinary Differential Equations (ODEs) won the best paper award at NeurIPS 2018 and has racked up over 200 citations as of today. Naturally, his talk was on all the terrible things people have said about his work, including by himself but also by some journalists. Seriously, did a journalist really say that Duvenaud invented the concept of an ODE?!?!? They date back to the 1600s, if not earlier.

Jürgen Schmidhuber also gave a talk in this workshop.

Jürgen Schmidhuber giving a talk about Predictability Minimization and Generative Adversarial Networks at the "Retrospectives in Machine Learning" workshop. Sorry for the terrible quality of the photo above. I tried to do a panorama which failed badly, and I don't have another photo.

I don’t know why this workshop was assigned such a small room; I’m sitting in the back row in that photo. I think those who got actual chairs to sit on were in the minority. A few minutes after I took the photo above, Yoshua Bengio came and sat in front of me at the table, next to my iPad which was spitting out the SlidesLive captions. If Bengio was fuming when Schmidhuber dismissed GANs as a “simple application” of his 90s-era idea, he didn’t show it, and politely applauded with the rest of us after Schmidhuber’s talk.

In case you are new to this history, please see this NYTimes article and this Quora post for some context on the “Schmidhuber vs Hinton/LeCun/Bengio/Goodfellow” situation regarding GANs and other machine learning concepts, particularly because GANs are mentioned as one of Bengio’s technical contributions in his Turing Award citation.

Sometime in the middle of the workshop, there was a panel where Bengio, along with a few other researchers, talked about steps that could be taken to improve the overall process of how research and science gets done today. Some of the topics that came up were removing best paper awards, eliminating paper reviews (!!), and understanding how to reduce stress for younger researchers. It was refreshing to see Bengio discuss the pressure graduate students face, and he also acknowledged that paper citations can be problematic. To put this in perspective, Bengio had the most Google Scholar citations in all of 2018 among all computer scientists, and I’m sure he was also the most cited across any field. As of today (December 22, 2019) Google Scholar shows that Bengio has 62,293 citations in 2018 and then 73,947 in 2019. Within 10 years, I would not be surprised if he becomes the most cited person of all time. There are a few online rankings of the most cited scholars, but most are a few years old and need updating. Joelle Pineau of McGill University made the good point that while we may have high stress in our field, we are still far more fortunate than many other groups of people today, prompting applause.

Finally, on the last day of the conference, the Deep Reinforcement Learning (DeepRL) workshop happened. This was one of the most, if not the most, popular NeurIPS workshops. It featured more than 100 papers, and unlike most workshop papers, which are 2-4 pages, the DeepRL papers were full 8-page papers, like normal conference papers. The workshop has a program committee size rivaling that of many full conferences! The highlights of the DeepRL workshop included, of course, AlphaStar from DeepMind and Dota2 from OpenAI. For the latter, OpenAI finally released their monstrous 66-page paper describing the system. Additionally, OpenAI gave a presentation about their Rubik’s cube robot.

NeurIPS 2019 concluded with a closing reception. The food and drinks were great, and amounted to a full dinner. During the closing reception, while music was playing nearby, Andrew Ng in his famous blue shirt attire was politely taking pictures with people who were lining up to meet him. I was tempted to take a picture of him with my phone but decided against it — I don’t want to be that kind of person who takes pictures of famous people. For his sake, I hope Ng wasn’t standing there for the entire four-hour reception!

Overall, after my four-day NeurIPS experience, here are my thoughts about networking:

• I think I was better than usual at it. NeurIPS is so large, and Berkeley is so well-represented, that there’s a good chance I’ll see someone I know when roaming around somewhere. I usually try to approach these people if I see them alone. I spoke with people whom I had not seen in many years (sometimes as long as six years!), most of whom were at Berkeley at some point.
• In a handful of cases, I made an appointment to see someone “at this coffee break” or “at this poster session”. Those require lots of preparation, and are subject to last-minute cancellations. I probably could have done a better job setting pre-arranged meetings, but the paper deadline I had just before coming meant I was preoccupied with other things.
• I tried to talk to anyone who was willing to talk with me, but the quality of my conversations depended on the person. I was approached by someone who is doing an online master’s program at a different university. While we had a nice conversation, there is simply no way that I would ever be collaborating with that person in the future. In contrast, it is much easier for me to talk at length with robotics PhD students from Stanford, CMU, or MIT.

On the morning of December 15, I explored Vancouver. Given my limited time, I decided to go for a run. (Yes, what a big surprise.) I hope I can come back here next year, and do more extensive running in Stanley Park. NeurIPS 2020 will return to the exact same venue. My guess is that by booking two years in a row, NeurIPS could save money.

A morning run in Stanley Park, in chilly Vancouver weather.

NeurIPS 2019 did not have any extracurricular highlights like the visits to Skansen or City Hall that we had at IJCAI 2019, or like the dinner reception at ICRA 2018, but the real advantage of NeurIPS is that I think the caliber of science is higher compared to other conferences.

The convention center seemed fine. However, I didn’t see a lot of extra space, so I don’t know how much more NeurIPS can absorb when it returns to Vancouver in 2020.

Remember how I wanted to come back to Sydney? NeurIPS 2021 is going to be held there, so perhaps I can return to Sydney. Additionally, according to some discussion at the town hall meeting mentioned earlier, NeurIPS will be held in New Orleans in 2022 and 2023, and then it will be in San Diego in 2024. I am wondering if anyone knows how to find statistics on the sizes and capacities of convention centers? A cursory search online didn’t yield easily digestible numbers.

In terms of “trends,” there are too many to list. I’m not going to go through a detailed list of trends, or summaries of the most interesting papers that I have seen, because I will do that in future blog posts. Here are higher-level trends and observations:

• Deep reinforcement learning remains hugely popular, though still highly concentrated within institutions such as Google, DeepMind, OpenAI, Stanford, and Berkeley.
• Meta-learning remains popular and is fast-growing.
• Fairness and privacy are fast-growing and becoming extremely popular, especially with (a) reducing societal biases of machine learning systems, and (b) health care in all aspects. In addition, it is no longer an excuse to say “we are just scientists” or “we were not aware of machine learning’s unintended consequences”. This must be part of the conversation from the beginning.
• Climate change is another fast-growing topic, though here I don’t know what the trend is like, since I don’t read papers about climate change and machine learning. I didn’t attend the climate change workshop since it conflicted with the DeepRL workshop, but I hope there was at least some work that combines machine learning with nuclear energy. Nuclear energy is one of the most critical and readily usable “carbon-free” technologies we have available.
• Industry investment in machine learning continues to be strong. No signs of an “AI Winter” to me … yet.
• Diversity and inclusion, transparency, and fairness are critical. To get some insights, I encourage you to read the NeurIPS medium blog posts.

It’s great to see all this activity. I’m also enjoying reading other people’s perspectives on NeurIPS 2019, such as those from Chip Huyen. Let me know if I’m missing any interesting blog posts!

You can find some of the pictures I took at NeurIPS in my NeurIPS 2019 Flickr album. They are arranged in roughly chronological order, but there’s some randomness to the ordering. Sorry about that. In the meantime, there are still several other NeurIPS-related topics that I hope to discuss. Please stay tuned for some follow-up posts.

# Dense Object Nets and Descriptors for Robotic Manipulation

Machine learning for robotic manipulation is a popular research area, driven by the combination of larger datasets for robot grasping and the ability of deep neural networks to learn grasping policies from complex, image-based input, as I described in an earlier blog post. In this post, I review two papers from the same set of authors at MIT’s Robot Locomotion Group that deal with robotic manipulation. These papers use a concept that I was not originally familiar with: dense object descriptors. I’m glad I read these papers, because the application of dense object descriptors for robotic manipulation seems promising, and I suspect we will see a myriad of follow-up works in the coming years.

# Paper 1: Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation (CoRL 2018)

This paper, by Florence, Manuelli, and Tedrake, introduced the use of dense descriptors in objects for robotic manipulation. It was honored with the best paper award at CoRL 2018 and got some popular, high-level press coverage.

The authors start the paper by wondering about the “right object representation for manipulation.” What does that mean? I view “representation” as the way that we encode data which we then pass as input to a machine learning (which means deep learning) algorithm. In addition, it would be ideal if this representation could be learned or formed in a “self supervised” manner. Self supervision is ideal for scaling up datasets, since it means manual labeling of the data is unnecessary. I’m a huge fan of self supervision, as evidenced by my earlier post on “self supervision” in machine learning and robotics.

The paper uses a dense object net to map a raw, full-resolution RGB image to a “descriptor image.” (Alternatively, we can call this network a dense descriptor mapping.) Concretely, say that function $f(\cdot)$ is the learned dense descriptor mapping. For an RGB image $I$, we have:

$$f(I) \in \mathbb{R}^{H \times W \times D} \quad \text{for } I \in \mathbb{R}^{H \times W \times 3},$$

for some dimension $D$, which in this paper is usually $D=3$, but they test with some larger values, and occasionally with $D=2$.

I originally thought the definition of $f(I)$ must have had a typo. If we are trying to map a full resolution RGB image $I$ to some other “space” for machine learning, then surely we would want to decrease the size of the data, right? Ah, but after reading the paper carefully, I now understand that they need to keep the same height and width of the image to get pixel correspondences.

The function $f(I)$ maps each pixel in the original, three-channel image $I$, to a $D$-dimensional vector. The authors generally use $D=3$, which, compared to larger values of $D$, has the advantage that descriptors can be visualized easily: the descriptor image is effectively another $H\times W\times 3$-dimensional image, so upon some normalization (such as converting values into $[0,255]$) it can be visualized as a normal color image. This is explained in the accompanying video, and I will show a figure later from the paper.
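To make that visualization trick concrete, here is a minimal sketch. The per-channel min-max normalization and all variable names are my assumptions for illustration, not necessarily what the paper’s code does:

```python
import numpy as np

def descriptor_to_rgb(desc):
    """Normalize an (H, W, 3) descriptor image into a viewable uint8 RGB image.

    Each of the 3 descriptor channels is min-max scaled independently to
    [0, 255]. This is one plausible normalization scheme; the paper may
    use a different one.
    """
    desc = desc.astype(np.float64)
    lo = desc.min(axis=(0, 1), keepdims=True)   # per-channel minimum
    hi = desc.max(axis=(0, 1), keepdims=True)   # per-channel maximum
    scaled = (desc - lo) / np.maximum(hi - lo, 1e-8)
    return (scaled * 255.0).round().astype(np.uint8)

# A random array stands in for a real descriptor image f(I) with D=3.
rng = np.random.default_rng(0)
rgb = descriptor_to_rgb(rng.normal(size=(4, 6, 3)))
print(rgb.shape)  # -> (4, 6, 3)
```

Since the output has the same height, width, and channel count as an ordinary color image, it can be saved or displayed with any standard image library.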

What does the data and then the loss formulation look like for training $f$? The data consists of tuples with four elements: two images $I_a$ and $I_b$, then two pixels on the images, $u_a$ and $u_b$, respectively. Each pixel is therefore a 2-D vector in $\mathbb{R}^2$; in practice, each value in $u_a$ or $u_b$ can be rounded to the nearest pixel integer. We write $f(I_a)(u_a)$ for the channel values at pixel location $u_a$ in descriptor image $f(I_a)$. For example, if $f$ is the identity function and $I_a$ a pure white image, then $f(I_a)(u_a) = [255,255,255]$ for all possible values of $u_a$, because a white pixel value corresponds to 255 in all three channels.

There are two loss functions that add up to one loss function for the given image pair:

$$\mathcal{L}_{\rm matches}(I_a, I_b) = \frac{1}{N_{\rm matches}} \sum_{\rm matches} \big\| f(I_a)(u_a) - f(I_b)(u_b) \big\|_2^2$$

and

$$\mathcal{L}_{\rm non\text{-}matches}(I_a, I_b) = \frac{1}{N_{\rm non\text{-}matches}} \sum_{\rm non\text{-}matches} \max\Big(0,\; M - \big\| f(I_a)(u_a) - f(I_b)(u_b) \big\|_2\Big)^2$$

The final loss for $(I_a,I_b)$ is simply the sum of the two above:

$$\mathcal{L}(I_a, I_b) = \mathcal{L}_{\rm matches}(I_a, I_b) + \mathcal{L}_{\rm non\text{-}matches}(I_a, I_b)$$

Let’s deconstruct this loss function. Minimizing it encourages $f$ to map pixels so that matches are close in descriptor space with respect to Euclidean distance, while non-matches are pushed apart by at least some target margin $M$. (We will discuss what we mean by matches and non-matches shortly.) The target margin was likely borrowed from the famous hinge loss (or “max margin” loss) that is used for training Support Vector Machine classifiers.
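To make the loss concrete, here is a hedged sketch of the pixelwise contrastive loss in NumPy. This is my own illustration, not the authors’ code; the function name and the margin default are assumptions:

```python
import numpy as np

def pixel_contrastive_loss(desc_a, desc_b, matches, non_matches, M=0.5):
    """Match/non-match loss over two descriptor images.

    desc_a, desc_b: (H, W, D) arrays, i.e., f(I_a) and f(I_b).
    matches, non_matches: lists of ((row_a, col_a), (row_b, col_b)) pairs.
    M: target margin for non-matches.
    """
    # Matches: pull corresponding descriptors together (squared distance).
    l_match = np.mean([
        np.sum((desc_a[ua] - desc_b[ub]) ** 2) for ua, ub in matches
    ])
    # Non-matches: push descriptors at least M apart (hinge on the distance).
    l_non = np.mean([
        max(0.0, M - np.linalg.norm(desc_a[ua] - desc_b[ub])) ** 2
        for ua, ub in non_matches
    ])
    return l_match + l_non
```

Identical descriptors at a match contribute zero loss, and non-matches already separated by more than $M$ also contribute zero, so a perfectly trained $f$ drives this loss to zero.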

Here are two immediate, related thoughts:

• This is only for one image pair $(I_a,I_b)$. Surely we want more data, so while it isn’t explicitly stated in the paper, there must be an extra loop that samples for the image pair, and then samples the pixels in them.

• But how many pixels should we sample for a pair of images? The authors say they generate about one million pixel pairs! So, if we want to split our matches and non-matches roughly evenly, this means $N_{\rm matches} \approx 500,000$ and $N_{\rm non-matches} \approx 500,000$. Thus, any two images provide an enormous amount of training data, since each training example includes a pair of pixels, and we can literally draw those pixels at random from the two images.

To be clear on the credit assignment, the above math is not due to their paper, but actually from prior work (Schmidt et al., ICRA 2017). The authors of the CoRL 2018 paper apply this formalism to robotic manipulation, and provide some protocols that accelerate training, to which we now turn.

The image above concisely represents several of the paper’s contributions with respect to improving the training process of descriptors, and particularly in the realm of robotic manipulation. Here, we are concerned with grasping various objects, so we want descriptors to be consistent among objects.

A match for images $I_a$ and $I_b$ at pixels $u_a$ and $u_b$ therefore means that the pixels located at $u_a$ and $u_b$ point to the same part of the object. A non-match is, well, basically everything else. In the image above, matching pairs of pixels are in green, and non-matching pairs are in red.

Mandatory “rant-like” side comment: I really wish the colors were different. Seriously, almost ANY other color pairing is better than red-green. I wish conference organizers could ban pairings of red-green in papers and presentations.

There are several problems with simply randomly drawing pixels $u_a$ and $u_b$. First, in all likelihood we will get a non-match (unless we have a really weird pair of images), and thus the training data is heavily skewed. Second, how do we ensure that matches are actually matches? We can’t have humans label manually, as that would be horrendously difficult and time-consuming.

Some of the related points and contributions they made were:

• By using prior work on 3D reconstruction and 3D change detection, the authors are able to isolate the pixels that correspond to the actual object. These pixels, whether or not they are matches (and it’s important to sample both matches and non-matches!), are usually more interesting than background pixels.

• It is beneficial to use domain randomization, but it should be done on the background so that the learned descriptors are not dependent on background to figure out locations and characteristics of objects. Note how the previous point about masking the object in the image enables background domain randomization.

• There are several strategies to enforce that the same function $f$ can apply to different object classes. An easy one is if images $I_a$ and $I_b$ have only one object each, and those objects are of different classes. Thus, every pair of sampled pixels among those two images is a non-match (as I believe all background pixels are considered non-matches).
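To make the masked-sampling idea concrete, here is a small sketch: given a binary object mask (which the authors obtain from 3D reconstruction and change detection), we can sample object or background pixels uniformly. The helper `sample_pixels` is hypothetical, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pixels(mask, n, on_object=True):
    """Sample n (row, col) pixel coordinates on or off a binary object mask."""
    rows, cols = np.nonzero(mask if on_object else ~mask)
    idx = rng.integers(0, len(rows), size=n)  # sample with replacement
    return np.stack([rows[idx], cols[idx]], axis=1)

mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True  # a 2x2 "object" in the middle of the image
obj_pix = sample_pixels(mask, 5, on_object=True)   # candidate match pixels
bg_pix = sample_pixels(mask, 5, on_object=False)   # background pixels
```

With the mask in hand, background pixels can also be replaced for domain randomization without disturbing the object pixels used for matches.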

There are a variety of additional contributions they make to the training process. I encourage you to read the paper to check out the details.

The majority of the experiments in the paper are for validating that the resulting descriptors make sense. By that, I mean that the descriptors are consistent across objects. For example, the same shoe, when seen from different camera perspectives, should have descriptors that are able to match the different components of the shoe.

The above image is illuminating. They use descriptors with $D=3$ and are able to visualize the descriptor images, shown in the second and fourth rows. Note that the colors in the descriptor images should not be interpreted in any way other than the fact that they indicate correspondence. That is, it would be equally appealing and satisfying to see the same descriptor images above, except with all the yellows replaced with greens, all the purples replaced with blue, and so on. What matters is that, among different images of the same object, we see the same color pattern for the objects (and ideally the background).

In addition, other ablation experiments show that their proposed improvements to the training process actually help. This is great stuff!

Their last experiment shows a real-world robot grasping objects. They are not learning a policy; given a target to grasp, they execute an open-loop trajectory. What’s interesting from their experiment is that they can use descriptors to grasp the same part of an object (e.g., a shoe) even if the shoe is seen at different camera angles or from different positions. It even works when they use different shoes, since those still have the same general structure of a “shoe class,” and thus descriptors can be consistent even across different instances of the class.

# Paper 2: Self-Supervised Correspondence in Visuomotor Policy Learning (arXiv 2019)

This paper can be viewed as a follow-up to the CoRL 2018 paper; unsurprisingly, it is by the same set of authors. Here, the focus is on using dense descriptors for training a visuomotor policy. (By “visuomotor” we mean a robot which sets “motor torques” based on image-based data.) The CoRL 2018 paper, in contrast, focused on simply getting accurate correspondences set up among objects in different images. You can find the arXiv version here and the accompanying project website here.

I immediately found something I liked in the paper. In the figure above, to the left, you see the most common way of designing a visuomotor policy. It involves passing the image through a CNN, and then getting a feature vector $\mathbf{z} \in \mathbb{R}^Z$. Then, it is concatenated with other non-image based information, such as end-effector information and relevant object poses. I believe this convention started with the paper by (Levine, Finn, et al., JMLR 2016), and indeed, it is very commonly used. For example, the Sim-to-Real cloth manipulation paper (Matas et al., CoRL 2018) used this convention. It’s nice when researchers think outside of the box to find a viable alternative.

Concretely, we get the action from the policy and the past set of observations via $\mathbf{a}_t = \pi_\theta (\mathbf{o}_{0:t})$, and we have

$$\mathbf{o}_t = \big(\mathbf{o}_{{\rm image},t},\; \mathbf{o}_{{\rm robot},t}\big)$$

representing the observation space. The usual factorization is:

$$\pi_\theta(\mathbf{o}_t) = \pi\big(\mathbf{z}_t,\, \mathbf{o}_{{\rm robot},t}\big), \qquad \mathbf{z}_t = f_{\rm image}(\mathbf{o}_{{\rm image},t}) \in \mathbb{R}^Z$$

where $Z$ is of much smaller dimensionality than the size of the full image $\mathbf{o}_{\rm image}$ (height times width times channels). This is a logical factorization that has become standard in the Deep Learning and Robotics literature.
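The factorization can be sketched with stand-in components; here the CNN is replaced by a toy pooling-plus-linear encoder just to show the shapes. All dimensions and names below are placeholders of mine, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-in for the learned image encoder: global average pooling per
# channel followed by a linear map down to z in R^16.
W_enc = rng.normal(size=(3, 16))
# Policy head: (z, 7-D robot state) -> 6-D action.
W_pi = rng.normal(size=(16 + 7, 6))

def policy(o_image, o_robot):
    pooled = o_image.mean(axis=(0, 1))            # (3,) pooled channels
    z = pooled @ W_enc                            # low-dimensional feature
    return np.concatenate([z, o_robot]) @ W_pi    # action

a = policy(rng.normal(size=(64, 64, 3)), rng.normal(size=7))
```

The point is only the wiring: the image is compressed to a small $\mathbf{z}$, which is concatenated with non-image state before producing the action.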

Now, what is the main drawback of this approach? (There better be one, otherwise there would be no need to modify the architecture!) Florence and Manuelli argue that we should try and use correspondence information when training policies. Right now, doing end-to-end learning is popular, as are autoencoding methods, but why not explicitly enforce correspondence information? One can do this by enforcing $\mathbf{z}$ to encode pose information via setting an appropriate loss function with a target vector that has actual poses.

I was initially worried. Why not automatically learn $\mathbf{z}$ end-to-end? It seems risky to try and force $\mathbf{z}$ to have some representation. Poses, to be sure, are intuitively ideal, but if there’s anything machine learning has taught us over the past decade, it is probably that we should favor letting the data automatically determine latent features. The argument in the paper seems to be that learning intermediate representations (i.e., the descriptors) with surrogate objectives is better with less data, and that’s a fair point.

Prior work has not done this because:

• Prior work generally focuses on rigid objects, and pose estimation does not apply to deformable objects. I think “pose estimation” relies on assuming rigid objects. Knowing the 6 DoF pose of any point on the object means we know the full object configuration, assuming its shape is known beforehand.

• While other prior work interprets $\mathbf{z}$ as encoding spatial information, it is not trained directly for correspondence.

The authors propose directly using dense correspondence models in the learning process. They suggest four options, showing that a lot is up to the discretion of the designer (but I don’t see any extensive comparisons among their four methods). Let there be a dense descriptor pre-trained model $f_{\theta_v}^{\rm dense}(\cdot)$ that was trained as in their CoRL 2018 paper. We have:

$$\mathbf{z} = f^C\big(f_{\theta_v}^{\rm dense}(\mathbf{o}_{\rm image});\; \{d_1, \ldots, d_P\}\big)$$

which provides the predicted location of descriptors and is used in three of their four proposed ways of incorporating correspondence with descriptors. We have $\mathbf{z} \in \mathbb{R}^{P \times D}$ where $P$ is the number of descriptors and $D$ is the descriptor dimension, usually two or three. Descriptors can be directly interpreted as 2D pixels or 3D coordinates, making $\mathbf{z}$ highly interpretable, which is a good thing, since the lack of interpretability of feature vectors is a common frustration in Deep Learning.

This raises an interesting question: how do we actually get $\{d_1, \ldots, d_P\}$? We can get a fixed reference image, say of the same object we’re considering, except in a different pose (that’s the whole point of using correspondences). Descriptors can also be optimized by backpropagation. Given the number of descriptors, which is a hyperparameter, the descriptors are combined with the image input to get $\mathbf{z}$. This “combination” is done with a “spatial softmax” operation, which, like the normal softmax, has no parameters but is differentiable. Hence, gradients from the overall, outer loss function (behavior cloning, as the authors later describe) flow back through the spatial softmax to the descriptors, which are adjusted directly. The spatial softmax operation is denoted $f^C$, and the reference for it is attributed to (Levine, Finn, et al., JMLR 2016).
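One plausible way to implement the spatial-softmax expectation for a single descriptor $d$ is sketched below. The use of negative squared distance as the similarity score is my assumption; the key properties are that the operation has no parameters and is differentiable:

```python
import numpy as np

def expected_pixel(desc_img, d, temperature=1.0):
    """Soft-argmax of where descriptor d appears in an (H, W, D) image."""
    H, W, _ = desc_img.shape
    # Similarity heatmap: higher where the descriptor image is close to d.
    logits = -np.sum((desc_img - d) ** 2, axis=-1) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Expected (row, col) under the softmax distribution over pixels.
    rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return np.array([np.sum(probs * rows), np.sum(probs * cols)])

# Example: a descriptor image where only pixel (2, 3) matches d = 0.
img = np.full((5, 6, 3), 10.0)
img[2, 3] = 0.0
loc = expected_pixel(img, np.zeros(3))
```

Stacking this expectation over $P$ descriptors yields the $P \times 2$ block of pixel predictions that feeds into $\mathbf{z}$.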

They combine correspondence with imitation learning, by using behavior cloning with a weighted average of $L_1$ and $L_2$ losses — pretty standard stuff. Remember again that for merging their work with descriptors, they don’t need to use behavior cloning, or imitation learning for that matter. It was probably just easiest for them to get interesting robotics results that way.
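The behavior-cloning objective, a weighted sum of $L_1$ and $L_2$ terms on the predicted action, is simple to write down. The weights below are placeholders, not the paper’s values:

```python
import numpy as np

def bc_loss(a_pred, a_target, w1=1.0, w2=1.0):
    """Weighted L1 + L2 behavior-cloning loss on predicted actions."""
    diff = a_pred - a_target
    return w1 * np.mean(np.abs(diff)) + w2 * np.mean(diff ** 2)
```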

Their action space is

$$\mathbf{a}_t \in \mathcal{A}$$

where $\mathcal{A} = SE(3) \times \mathbb{R}^+$. For more details, see the paper.

Some of their other contributions have to do with the training process, such as proposing a novel data augmentation technique to prevent cascading errors, and a new technique for multi-camera time synchronized dense spatial correspondence learning. The latter is used to help train in dynamic environments, whereas the CoRL 2018 paper was limited to static environments.

They perform a set of simulated and then real experiments:

• Simulated Experiments: these involve using the DRAKE simulator. I haven’t used it before, but I want to learn about it. If it is not proprietary like MuJoCo, then perhaps the research community can migrate to it? They benchmark a variety of methods. (Strangely, some numbers are missing from Table I. I can understand why some are not tested, but not all of them.) They have many methods, with the differences arising from how each acquires $\mathbf{z}$. That’s the point of their experiments! Due to the simulated environments, they can encode ground truth positions and poses in $\mathbf{z}$ as an upper-bound baseline.

The experiments show that their methods are better than prior work, and are nearly as good as the ones with ground truth in $\mathbf{z}$. There is also some nice analysis involving the convex hull of the training data (which is applicable because of the 2D nature of the table). If data is outside of that convex hull, then effectively we see an “out of distribution” data point, and hence policies have to generalize. Policies with 3D information seem to be better able to extrapolate outside the training distribution than those with only 2D information.

• Real-World Experiments: for these, they use a Kuka IIWA LBR robot with a parallel jaw gripper. As shown in the images below, they are able to get highly accurate descriptors. Essentially, one point on one object should be consistently labeled as the corresponding point on the object if it is in a different location, or if we use similar objects in the same class, such as using a different shoe type for descriptors trained on shoe-like objects.

They argue their method is better because they use correspondence — fair enough. For the experiment setup, their method is already near the limit of what can be achieved, since results are close to those of baselines with ground truth information in $\mathbf{z}$.

# Closing Thoughts

Some thoughts and takeaways I have from reading these two papers above:

• Correspondence is a fundamental concept for computer vision. Because we want robots to learn things from raw images, it therefore seems logical that correspondence is also important for robotic manipulation. Correspondence will help us figure out how to manipulate objects in a similar way when they are oriented at different poses and perspectives.

• Self supervision is more scalable for large datasets than asking humans to manually label. Figuring out ways to automate labeling must be an important component of any proposed descriptor-based technique.

• I am still confused about how exactly we can get pixel correspondences via depth images, camera poses, and camera intrinsics, as described in the paper. It makes sense to me with some vague intuition, but I need to code and experience the full pipeline myself to actually understand.

# International Symposium on Robotics Research (ISRR) 2019, Day 5 of 5

On October 10, the last official day of ISRR 2019, we had a day-long excursion to Halong Bay. I did not request remote captioning for this day because I do not know how it could possibly work for an outdoor drive and cruise with no WiFi, and I would rather be taking pictures with my phone than reading my iPad in detail.

We had a two-hour bus ride from the hotel in Hanoi to Halong Bay. I sat near the front and was able to understand the words our tour guide was saying. He was an amusing and engaging local who spoke fluent English. He gave a 10-minute history of Vietnam and commented on the wars with France (1946 to 1954) and America (1964 to 1975).

After he finished his historical account, he said we were free to ask him questions. I immediately asked him how Vietnamese think of United States President Donald Trump.

He replied with a mix of both amusement and puzzlement: “Donald Trump is very … uh … strange. He’s like … uh … an actor. He’s … very different from other leaders.”

That is certainly accurate. He said that when Trump and Kim visited the hotel we were at for their nuclear “summit,” local Vietnamese were all clamoring to get a view of the two leaders. He then concluded his answer to my question by saying that Vietnamese are not very political. Uh oh, I thought, though I did not press him on the issue.

After the bus ride, and a stop by a jewelry store (some conference attendees bought jewelry for their spouses), we finally arrived at Halong Bay. The area we went to seemed like a tourist destination, with lots of tall and nice-looking buildings compared to downtown Hanoi. I also noticed, however, that while the outsides of the buildings looked great, the insides looked run down or under construction. I am not sure what the plan is with Halong Bay, but I hope these buildings are under construction (rather than abandoned).

The tour guides split us into several groups, and each group went on a small cruise ship. On the ship, we ate a Vietnamese lunch, which included some dishes similar to those we had earlier at the conference, such as prawns and squid. Those two dishes are really popular in Vietnam! It is a lot different from my seafood diet in America, which I associate with “Salmon” or “Halibut.” We took a 30-minute tour of a cave, and then we went back on our boats to return to the buses, which brought us back to Hanoi.

At Hanoi, I was persuaded by a few other students to join them for dinner at the same place Barack Obama famously ate when he visited Vietnam. Unsurprisingly, the restaurant is filled with pictures of Obama and even has a menu item named “Combo Obama,” representing what he ate.

On the following day, October 11, I performed some final sight-seeing of Hanoi, and finally got to try out their famous (and delicious) coconut coffee, which blends coconut and black coffee. I also toured the Vietnamese Museum of National History. Most of the exhibits concerned Vietnam’s fights against foreign invaders, most notably the French and then (obviously) the Americans. After I spent an hour walking through the museum, I thought in awe about Vietnam’s transformation from war-torn territory to a rapidly developing country. Given all the diplomatic difficulties the United States has with countries such as Russia, China, North Korea, Iran, and Syria, the improved US-Vietnam relations give me hope that one day we can consider these countries allies, rather than adversaries.

On my trip back, I had a long layover at Incheon, so I first napped for a few hours in the “nap area” and then went to the Skydeck lounge to catch up on email, administrative work, and (obviously) writing these blog posts. It cost me 48 USD to stay in the Skydeck Lounge for six hours, but I think it was mostly worth the price, and essentially anyone with a boarding pass (even economy passengers like me) can access it. It is not as good as the Asiana Business Class lounge, but it is good enough for me.

Once the time came, I boarded my flight back to San Francisco, to return to normal life.

# International Symposium on Robotics Research (ISRR) 2019, Day 4 of 5

The third full conference day was much easier on me, because I did not have to think about rehearsing my talk.

For today, I also did something I wish I had done earlier: taking pictures of students giving talks, and then emailing them the pictures. I sent all emails by the end of the day, and eventually heard back from all the recipients with appreciation. I hope they post them on their websites. I am not sure why I did not do this for all the student presenters, because this seems like an obviously easy way to “network” with them. I might be seeing these students in future conferences or employment settings.

The captioners struggled to understand some of the faculty speakers. They also told me about two new issues: there was an echo from the room, and every time I typed something into my iPad (e.g., when switching tabs), they heard it and it overrode the microphone’s sound. I am at a loss as to why there was an echo in the room, and I was wondering why I did not know about the “iPad typing issues” beforehand. Once again, having some kind of checklist where I can go through common issues would be great.

Fortunately, the captioners were able to understand Peter Corke’s talk today, and his was among the most relevant to my research area. (Incidentally, Peter Corke was the chair for ICRA 2018 in Brisbane, which I wrote about in several blog posts here.) Hence, I enjoyed Corke’s talk; he contrasted the computer vision and robotics fields by describing the style of papers in each field, and proposed several “assertions” about how the robotics community can make more research progress, similar to how the computer vision community made substantial progress with ImageNet competitions.

Before the talks concluded, Oussama Khatib made a few announcements. He presented a few slides about the history of ISRR and the closely related conference on experimental robotics, ISER. He then made the grand reveal for where ISRR 2021 would be located. (Remember, this conference only happens once every two years.)

And … drum roll please: ISRR 2021 will be located in Zurich, Switzerland, from July 19 to 23! It will also be co-located with a few other robotics conferences at that time, along with a “Joint Robotics Congress” which I hope means we can talk with some policy makers from certain countries. I hope I can submit to, and attend, ISRR 2021!

We wrapped up the day with the farewell reception, which was a full dinner at the conference hotel (the Sofitel Legend Metropole). This was a fixed set menu of Vietnamese food, and included:

• Crab soup, with the usual broth that’s standard in Vietnamese cuisine. Again, I suspect it is some kind of fish sauce.

• Chicken salad with onions, sprouts, and herbs.

• Fried prawns with passion fruit sauce and vegetable fried rice. These prawns were huge!

• Sticky rice and lotus desserts.

• Unlimited refills for beer and wine.

The seating situation was ideal for me, because I was sitting at a table in the corner, and only had one person, another student, next to me. A second person next to me would hypothetically increase the sound nearby by nearly a factor of two. The student was nice and I was able to communicate reasonably well. During the dinner, the captioners did a great job recording the conversations happening at my table. I applaud them for their performance that night. Discussions ranged from food in Vietnam to aspects of various robotics conferences, how to get into PhD programs, how to read research papers, details about Berkeley itself, and a bunch of other things I can’t remember.

After these great meals, I conclude that ISRR, though it may be a small conference, leaves a strong impression with its high-quality food.

# International Symposium on Robotics Research (ISRR) 2019, Day 3 of 5

Video of my talk at ISRR 2019. The YouTube version is here. Courtesy of Masayuki Inaba.

The second conference day proceeded in a similar manner as the first day, with a set of alternating faculty and then paper talks. Some issues that came up were self-driving cars (e.g., in Henrik Christensen’s talk), climate change (with the obligatory criticism of Donald Trump) and faculty taking leaves to work in industry (which has hurt academia). I also enjoyed Frank Park’s talk on model-based physics, which cited some of the domain randomization work that is essential to what I am doing lately.

In a shameless plug, the highlight of the day was me. OK, only joking, only joking.

Like the other paper presenters, I gave a rapid 5-minute presentation on my ISRR 2019 paper about robot bed-making. It’s really hard to discuss anything substantive in 5 minutes, but I hope I did a reasonable job. I cut down my humor compared to my UAI 2017 talk, so there was not as much laughing from the audience. In my talk slides, I referenced a few papers by other conference attendees, which hopefully they appreciated. I will keep this strategy in mind for future talks, in case I know who is attending the talk.

The good news is that I have my talk on video, and you can see it at the top of this post. The video is courtesy of Professor Masayuki Inaba of the University of Tokyo. He was sitting next to me in the front row, and I saw that he was recording all the presentations with his phone. He graciously gave me the video of my talk. It is dark, but that’s due to the lighting situation in the room; it was also wreaking havoc on my attempts to get high quality pictures of presenters.

In the rare cases when I have a video of one of my talks, I always get nervous when watching it, because I see countless things that make me feel embarrassed. Fortunately, from looking at the video above, I don’t think I made a fool out of myself. What I like are the following:

• My talk was five minutes flat, exactly the time limit. No, I am not good enough to normally hit the allotted time limit exactly. (I know the video is 5:04 but if you get rid of the 0.5 seconds at the start and the 3.5 seconds at the end, that’s the span of my actual talk.) Before giving this talk, I performed an estimated 20 practice talks total, about 8 of which involved this exact talk after several improvements and iterations, and my average time was 5:10.

• I did a reasonably good job looking at the audience, and I looked in a variety of different directions (and not just at one person, for example). Spending an entire talk looking at one’s laptop is a sure way to make a talk boring and disengaging.

• My speaking volume seems to be at roughly the right level. It is tricky because I was also wearing a microphone, but I don’t think people in the audience missed stuff I was saying.

• I did not project many “uhm”s or other sounds that are indicative of not knowing what to say.

Here are some things I think I could do better:

• I am not sure if my body movement is ideal. I normally have so much energy when I’m giving a talk that I can’t help but move around a lot. (I get slightly nervous in the minutes before my talk begins, but I think this is natural.) I think I did a reasonable job not moving side-to-side too much, which is a huge bad habit of mine. But I feel a bit embarrassed by my hand movement, since it seems like I perform an endless sequence of “stop sign like” movements.

• Finally, I am not sure if this is just the way I talk or due to the microphone or video recording issues, but the automated captions did not perform as well as I would have hoped. True, it was correct in some areas, but I think if I had not given this talk, I would have a hard time understanding what I was saying!

I think that’s how I would assess myself. I will keep this for future reference when giving talks in the future.

Before coming to ISRR, I did not know each paper talk would have an additional minute left over for questions; hopefully this can be clarified at future ISRRs. We had questions after my talk, and one thing bears comment. I had told the person managing the “robot learning” session (the one my paper was in) that I am deaf, and asked him to come next to me to repeat any questions from the audience. When the first person asked a question, I asked him to repeat it to me. But before he could do that, Ken instead came bursting forward, effectively took his spot, and repeated the question. He would do that for the other two questions. I appreciate Ken’s prompt response. Audience questions are a vanishingly small fraction of my conference experience, but they present the greatest difficulty when there is not an extra person around for assistance.

Later in my session, there was also another paper from Ken Goldberg’s lab about cloud robotics, with Nan Tian as the lead author.

We then had the interactive sessions, and here Ken stuck around by our station, helping to communicate with some of the other people. The first person who came to our station immediately rebuked me and vigorously pointed at my video. He said: That is not a bed! That is a table! That is not a bed! That is a TABLE! True, our “bed” is from a table, so I guess he was technically right?

After the interactive session, we had the banquet. This was in a reasonably nice looking building, with air conditioning machines rather than the fans that are ubiquitous in street restaurants of Hanoi. The conference chair asked that faculty and students try to sit next to each other, rather than split off into faculty-only or student-only groups.

I courageously tried most of the fixed set menu even if the food was not visually appealing to me. The food appeared to be, in order, crab soup (with fish sauce?), Vietnamese pomelo, squids with celery, prawns, and some chicken soup. I was struck by how much more “experienced” some of the other conference attendees were at eating the food. For example, I don’t eat prawns very much, so I was intently watching how others took apart the prawns and removed the meat with their utensils.

The restaurant was near the top of a building with different restaurants on each row, so I was able to take some nice pictures of Hanoi’s evening scene and all the pedestrians and motorcycles moving around. It was beautiful.

# International Symposium on Robotics Research (ISRR) 2019, Day 2 of 5

Before going to the conference room, I ate an amazing breakfast at the hotel’s buffet, which was on par with the breakfast from the Sydney hotel I was at for UAI 2017. I always face a dilemma for these cases as to when I should make yet another trip to get a new serving of fresh food. I voraciously ate the exotic fruits, such as dragonfruit and the super ripe, Vietnam-style mangoes, which are different from the mangoes I eat in Berkeley, California. Berries are the main fruits that I eat on a regular basis, but I put that on hold while I was here. I also picked up copies of an English-language newspaper about Vietnam, and would read those every morning during my stay.

After breakfast, I went to the main ISRR conference room at the hotel. I was 30 minutes early and among the first in the room, but that was because (a) I wanted to get a seating spot at the front, and (b) I needed to test my remote captioning system. I wanted to test the system with a person from Berkeley, where it was evening at the time. For this, I put a microphone at the table where the speakers would present, and set up my iPad to wirelessly connect to it. I next logged into a “meeting group” via an app on my iPad, and the captions would appear on a separate website URL on my iPad. After a few minutes, we agreed that it was ready.

Oussama Khatib, of Stanford University, started off the conference with a 30-minute talk about his research. I am aware of some of his work and was able to follow the slides reasonably well. The captioners immediately told me they had trouble with his accent. I was curious where Khatib was from, so I looked him up. He was raised in Aleppo, Syria, the city made famous by its recent destruction and warfare.

I see. A Stanford Professor was able to emerge from Aleppo in the 1950s and 1960s. I don’t know how this could happen today, and it’s sad when the government of Syria ruins opportunities for its own citizens to become renowned world leaders. It is completely unacceptable that Bashar al-Assad is still in power. I know the phrase I’m about to say has become politically unpalatable in some circles, but regime change must happen in Syria.

Some of the subsequent talks were easier for the captioners to understand. Unfortunately we ran into a few more technical issues (not counting the “accent” one), such as:

• WiFi that sometimes disconnected.

• Audio that the captioners described as inaudible, with lots of “coughs” and “people nearby,” even though at the time they told me this, the current speaker was a foot away from the microphone I had placed on the table at the front, and no one was within 10 feet of it (or coughing).

• Audio that seemed to have lots of feedback, before the captioners realized that they had to do something on their end to mute a microphone.

Technical difficulties are the main downside of remote captioning systems, and have happened every time I use remote captioning. I am not sure why there isn’t some kind of checklist for addressing common cases.

Anyway, ISRR 2019 has three main conference days, each of which consists of a series of 30-minute faculty talks, plus two sets of 10 talks corresponding to accepted research papers. (Each paper talk is just 5 minutes.) After each set of 10 talks, we had “interactive sessions,” which are similar to poster sessions. There were six of these sessions, and hence 6 times 10 means there were 60 papers total at ISRR 2019. It’s a lot smaller than ICRA!

ISRR also has a notable “bimodal” age distribution of its attendees. Most of the paper presenters were young graduate students, and most of the faculty were senior. There was a notable lack of younger faculty. Also, of the 100-150 attendees that were there, my guess is that the gender distribution was roughly 15% female, 85% male. The racial composition was probably 50% White, 40% Asian, and 10% “Other”.

I couldn’t get the remote captioning working during the interactive sessions — there was a “pin” I was supposed to use, but it would not turn on no matter what I tried — so I mostly walked around and observed the posters. I also ate a lot of the great food at the interactive sessions, including more dragon fruit. The lunch after that was similarly scrumptious. Naturally, it was a buffet. ISRR definitely doesn’t shy away from providing high quality food!

For talks, the highlight of the day was, as expected, Prof. Ken Goldberg’s keynote. Ken’s talk had a slightly different style from the other faculty talks; it wove together his interests in art, philosophy, agriculture, robotics, and AI ethics.

Our lab also presented a paper that day on area contact models for grasping; Michael Danielczuk presented this work. I don’t know too much about the technical details, unfortunately. It seems like the kind of paper that Ken Goldberg and John Canny might have collaborated on if they were graduate students.

The conference did not provide dinner that night, but fortunately, a group of about 24 students gathered at the hotel lobby, and someone found a Vietnamese restaurant that was able to accommodate all of us. Truth be told, I was too full from all the food the conference provided, so I just ordered a small pork spring roll dish. It was piping hot that night, and the restaurant did not have adequate air conditioning, so I was feeling the heat. After we ate, I went and wandered around the lake near the hotel, snapping pictures with my phone. I wanted to make the most of my experience here.

# International Symposium on Robotics Research (ISRR) 2019, Travel and Day 1 of 5

A random 25-second video I took with my iPhone of the traffic in Hanoi, Vietnam (sound included).

I just attended the 2019 International Symposium on Robotics Research (ISRR) conference in Vietnam. It was a thrilling and eye-opening experience. I was there to present the robot bed-making paper, but I also wanted to make sure I got a taste of what Vietnam is like, given the once-in-a-lifetime opportunity. I will provide a series of blog posts which describe my experience at ISRR 2019, in a similar manner as I did for UAI 2017 and ICRA 2018.

There are no direct flights from San Francisco to Vietnam; most routes stop at one of the following cities: Seoul, Hong Kong, Taipei, or Singapore. I chose the Seoul route (technically, this means stopping at Incheon International Airport) due to cost and ideal timing. I was fortunate not to pick Hong Kong, given the current protests.

I arrived in Incheon at 4:00AM and it was nearly deserted. After roaming around a bit to explore the airport, which is regarded as one of the best in the world, I found a food court to eat, and ordered a beef stew dish. When I got it, there was a small side dish that looked like noodles, but had a weird taste. I asked the waitress about the food. She excused herself to bring a phone, which showed the English translation: squid.

Aha! I guess this is how I will start eating food that I would ordinarily not be brave enough to eat.

I used my Google Translate Pro app to tell her “Thank You”. I had already downloaded Google Translate and signed up for the 7 day free trial. That way, I could use the offline translation from English to Korean or English to Vietnamese.

I next realized that I could actually shower at Incheon for free, even as a lowly economy passenger. I showered, and then explored the “resting area” in the international terminal. This is an entire floor with a nap area, lots of desks and charging stations, some small museum-like exhibits, and a “SkyDeck” lounge that anyone (even in economy class) can attend. I should also note that passengers do not need to go through immigration at Incheon if connecting to another international flight. I remember having to go through immigration in Vancouver even though I was only stopping there to go to Brisbane. Keep that in mind in case you are using Incheon airport. It’s a true international hub.

I flew on Asiana Airlines, which is one of the two main airlines from South Korea, with the other being Korean Air. According to some Koreans I know, they are roughly equal in quality, but Korean Air is perhaps slightly better. All the flight attendants I spoke to were fluent in English, as that seems to be a requirement for the job.

As I began to board my flight to Hanoi, I looked through the vast windows of the terminal to see mountains and clouds. The scene looked peaceful. It’s hard to believe that just a few miles north lies North Korea, led by the person who I consider to be the worst modern leader today, Kim Jong Un.

I will never tire of telling people how much I disapprove of Kim Jong Un.

I finally arrived in Hanoi, Vietnam on Saturday October 5. I withdrew some Vietnamese Dong from an ATM, and spoke (in English) with a travel agent to book a taxi to my hotel. We were able to arrange the details for a full round trip. It cost 38 USD, which is a bargain compared to how much a similar driving distance would cost in the United States.

The first thing I noticed after starting the taxi ride was: Vietnam’s traffic!! There were motorcycles galore, brushing up just a few centimeters away from the taxi and other cars on the road. Both car drivers and motorcyclists seemed unfazed at driving so close to each other.

I asked the taxi driver how many years he had been driving. He initially appeared confused by my question, but then responded with: two.

Well, two is better than zero, right?

The taxi driver resumed driving to the hotel, whisking out his smart phone to make a few calls along the way. I also saw a few nearby motorcyclists looking at their smartphones. Uh oh.

And then there is the honking. Wow. By my own estimation, I have been on about 150 total Uber or Lyft rides in my life, and in that single taxi ride to the hotel in Hanoi, I experienced more honks than all those Uber or Lyft rides combined.

I thought, in an only half-joking sense, that if I were in Nguyễn Phú Trọng’s position, the first thing I would do is to strictly enforce traffic laws.

We survived the ride and arrived at the hotel: the Sofitel Legend Metropole, a 5-star luxury hotel with French roots. I was quickly greeted by a wonderful hostess who led me to my room. She spoke flawless English. Along the way, I asked her where Kim Jong Un and Donald Trump had met during their second (and unsuccessful) nuclear summit.

She pointed to the room that we had just walked by, saying that they met there and ate dinner.

I didn’t have much to do that day, as it was approaching late afternoon and I was tired from my travel, so I turned in early. I generally prefer going to sleep early on the first day, since it’s easy to sleep a few extra hours to adjust to a new time zone.

The following day, Sunday October 6, was officially the first day of the conference, but the only event was a welcome reception in the evening (at the hotel). Thus, I explored Hanoi for most of the day. And, apparently I lucked out: despite the stifling heat, there was a parade and celebration happening in the streets. It may have had to do with October 10, 2019 marking the 65th anniversary of Vietnam’s liberation from French rule.

On the streets, only one local talked to me that day; a boy who looked about twelve years old asked “Do you speak English?” I said yes, but unfortunately the parade in the background meant it was too noisy for me to understand most of the words he was saying, so I politely declined to continue the conversation, and the boy scurried around to find a person nearby who did not look Vietnamese. And there were a lot of us that day. Incidentally, walking across the streets was much easier than usual, because the police had blocked off the roads from traffic. Otherwise, we would have had a nightmare trying to navigate through a stream of incoming motorcyclists, most of whom do not slow down when they see a pedestrian in front of them.

After enough time in the heat, I cooled down by exploring an air-conditioned museum: the Vietnamese Women’s Museum. The museum described the traditional ways of family life in Vietnam, with the obligatory (historical) marriage and family rituals. It also honored Vietnamese women who served in the American War. We, of course, call this the Vietnam War.

I finally attended the Welcome Reception that evening. It was cocktail style, with mostly meat dishes. (Being a vegetarian in Asia — with the exception of India — is insanely difficult.) I spoke with the conference organizers that day, who seemed to already know me. Perhaps it was because Ken Goldberg had mentioned me, or perhaps because I had asked them about some conference details so that I could effectively use a remote captioning system that Berkeley would provide me, as I will discuss in the posts to come.

# Two Projects, The Year's Plan, and BAIR Blog Posts

Yikes! It has been a while since I was last active on this blog. The reason for my posting delay is, as usual, research deadlines. As I comment here, I still have a blogging addiction, but I force myself to prioritize research when appropriate. To keep my monthly blogging streak alive, here are three relevant updates. First, I recently wrapped up and made public two research projects. Second, I have, hopefully, a rough agenda for what I aim to accomplish this year. Third, there are several new BAIR Blog posts worth reading.

The two research projects are:

The bed-making paper will be at ISRR 2019, October 6 to 10. In other words, it is happening very soon! It will be in Hanoi, Vietnam, which is exciting as I have never been there. The only Asian country I have visited before is Japan.

We recently submitted the other project, on fabric smoothing, to arXiv. Unfortunately, we got hit with the dreaded “on hold” flag, so it may be a few more days before it gets officially released. (This sometimes happens for arXiv submissions, and we are not told why.)

I spent much of 2018 and early 2019 on the bed-making project, and then the first nine months of 2019 on fabric smoothing. These projects took an enormous amount of my time, and I learned several lessons, two of which are:

• Having good experimental code practices is a must. The stuff in my linked blog post has helped me constantly throughout my research, which is why I have it on record here for future reference. I’m amazed that I rarely employed them (except perhaps version control) before coming to Berkeley.

• Don’t start with deep reinforcement learning if imitation learning has not been tried. In the second project on fabric smoothing, I sank about three months of research time attempting to get deep reinforcement learning to work. Then, with lackluster results, I switched to using DAgger, and voila, that turned out to be good enough for the project!

You can find details on DAgger in the official AISTATS 2011 paper, though much of the paper is devoted to theoretical analysis bounding regret. The actual algorithm is dead simple. Using the notation from the Berkeley DeepRL course, we can define DAgger as a four-step cycle that gets repeated until convergence:

• Train $\pi_\theta(\mathbf{a}_t \mid \mathbf{s}_t)$ from demonstrator data $\mathcal{D} = \{\mathbf{o}_1, \mathbf{a}_1, \ldots, \mathbf{o}_N, \mathbf{a}_N\}$.
• Run $\pi_\theta(\mathbf{a}_t \mid \mathbf{s}_t)$ to get an on-policy dataset $\mathcal{D}_\pi = \{\mathbf{o}_1, \ldots, \mathbf{o}_M\}$.
• Ask a demonstrator to label $\mathcal{D}_\pi$ with actions $\mathbf{a}_t$.
• Aggregate $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_{\pi}$ and train again.
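The four steps above can be sketched in a few lines of code. Below is a minimal, self-contained sketch with a toy “expert” that labels observations by their sign; the `train`, `rollout`, and `expert` helpers here are stand-ins I made up, not code from the paper or the course:

```python
def dagger(train, rollout, demonstrator, demo_data, n_iters=5):
    """Minimal DAgger sketch: train on D, roll out the policy,
    have the demonstrator label what the policy saw, aggregate, repeat."""
    dataset = list(demo_data)            # D = {(o_1, a_1), ...}
    policy = train(dataset)              # step 1: supervised learning on D
    for _ in range(n_iters):
        observations = rollout(policy)                           # step 2: collect D_pi
        labeled = [(o, demonstrator(o)) for o in observations]   # step 3: label D_pi
        dataset += labeled                                       # step 4: aggregate
        policy = train(dataset)                                  # retrain on D union D_pi
    return policy

# Toy instantiation: observations are integers, the expert outputs their sign.
expert = lambda o: 1 if o >= 0 else -1

def train(data):
    table = dict(data)                   # "policy" memorizes labeled pairs
    return lambda o: table.get(o, 1)     # default action for unseen inputs

def rollout(policy):
    return list(range(-5, 6))            # pretend these states were visited

policy = dagger(train, rollout, expert, demo_data=[(0, 1), (-1, -1)])
print(policy(-3))  # -3 was visited and relabeled by the expert: prints -1
```

The point of the sketch is that DAgger is just supervised learning plus data aggregation; there is no reward signal or exploration machinery to debug.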

The DeepRL class uses a human as the demonstrator, but we use a simulated one, and hence we nicely avoid the main drawback of DAgger: repeatedly querying a demonstrator (usually a human) for labels.

That’s it! DAgger is far easier to use and debug compared to reinforcement learning. As a general rule of thumb, imitation learning is easier than reinforcement learning, though it does require a demonstrator.

For the 2019-2020 academic year, I have many research goals, most of which build upon the prior two works or my other ongoing (not yet published) projects. I hope to at least know more about the following:

• Simulator Quality and Structured Domain Randomization. I think simulation-to-real transfer is one of the most exciting topics in robotics. There are two “sub-topics” within this that I want to investigate. First, given the inevitable mismatch between simulator quality and the real world, how do we properly choose the “right” simulator for sim-to-real? During the fabric smoothing project, one person suggested I use ARCSim instead of our in-house simulator. We tried ARCSim briefly, but it was too difficult to implement grasping. If we use lower quality simulators, then I also want to know if there are ways to improve the simulator in a data-driven way.

The second sub-topic I want to know more about is the kind of specific, or “structured”, domain randomization that should be applied for tasks. In the fabric smoothing project, I randomized camera pose, colors, and brightness, but this was done in an entirely heuristic manner. I wonder if there are principled ways to decide on what randomization to use given a computational budget. If we had enough computational power, then of course, we can just try everything.

• Combining Imitation Learning (IL) and Reinforcement Learning (RL). From prior blog posts, it is hopefully clear that I enjoy combining these two fields. I want to better understand how to optimize this combination of IL and RL to accelerate training of new agents and to reduce exploration requirements. For applications of these algorithms, I have gravitated towards fabric manipulation. It fits both of the two research projects described earlier, and it may be my niche.

For 2019-2020, I also aim to be more actively involved in advising undergraduate research. This is a new experience for me; thus far, my interaction with undergraduate researchers has been with the fabric smoothing paper where they helped me implement chunks of our code base. But now, there are so many ideas I want to try with simulators, IL, and RL, and I do not have time to do everything. It makes more sense to have undergraduates take on a lead role for some of the projects.

Finally, there wasn’t much of a post-deadline reprieve because I needed to release a few BAIR Blog posts, which requires considerable administration. We have had several posts released in quick succession over the last two weeks. The posts were ready for a long time (minus the formatting needed to get them on the actual website), but I was so consumed with working on the projects, to the tune of 14-15 hours a day, that I had to ask the blog post authors to postpone. My apologies!

Here are some recent posts that are worth reading:

• A Deep Learning Approach to Data Compression by Friso Kingma. I don’t know much about the technical details, unfortunately, but data compression is an important application.

• rlpyt: A Research Code Base for Deep Reinforcement Learning in PyTorch by Adam Stooke. I am really interested in trying this new code base. By default, I use OpenAI baselines for reinforcement learning. While I have high praise for the project overall, baselines has disappointed me several times. You can see my obscenely detailed issue reports here and here to see why. The new code base, rlpyt, (a) uses the more debugging-friendly PyTorch, (b) also has parallel environment support, (c) supports more algorithms than baselines, and (d) may be more optimized in terms of speed (though I will need to benchmark).

• Sample Efficient Evolutionary Algorithm for Analog Circuit Design by Kourosh Hakhamaneshi. Circuit design is unfortunately not in my area, but it is amazing to see how Deep Learning and evolutionary algorithms can be used in many fields. If there are any remaining low-hanging fruits in Deep Learning research, it is probably in applications to areas that are, on the surface, far removed from machine learning.

As a sneak preview, there are at least two more BAIR blog posts that we will be releasing next week.

Hopefully this year will be a fruitful one for research and advising. Meanwhile, if you are attending ISRR 2019 soon and want to chat, please contact me.

# Sutton and Barto's Reinforcement Learning Textbook

It has been a pleasure reading through the second edition of the reinforcement learning (RL) textbook by Sutton and Barto, freely available online. From my day-to-day work, I am familiar with the vast majority of the textbook’s material, but there are still a few concepts that I have not fully internalized, or “grokked” if you prefer that terminology. Those concepts sometimes appear in the research literature that I read, and while I have intuition, a stronger understanding would be preferable.

Another motivating factor for me to read the textbook is that I work with function approximation and deep learning nearly every day, so I rarely get the chance to practice, or even review, the exact, tabular versions of the algorithms I’m using. I also don’t get to review the theory on those algorithms, because I work in neural network space. I always fear I will forget the fundamentals. Thus, during some of my evenings, weekends, and travels, I have been reviewing Sutton and Barto, along with other foundational textbooks in similar fields. (I should probably update my old blog post about “friendly” textbooks!)

Sutton and Barto’s book is the standard textbook in reinforcement learning, and for good reason. It is relatively easy to read, and provides sufficient justification and background for the algorithms and concepts presented. The organization is solid. Finally, it has thankfully been updated in 2018 to reflect more recent developments. To be clear: it is not a deep reinforcement learning textbook, but knowing basic reinforcement learning is a prerequisite before applying deep neural networks, so it is better to have one textbook devoted to foundations.

Thus far, I’ve read most of the first half of the book, which covers bandit problems, the Markov Decision Process (MDP) formulation, and methods for solving (tabular) MDPs via dynamic programming, Monte Carlo, and temporal difference learning.

I appreciated the review of bandit problems. I knew about the $k$-armed bandit problem from reading papers such as RL-squared, which is the one that Professor Abbeel usually presents at the start of his meta-RL talks, but it was nice to see it in a textbook. Bandit problems are probably as far from my research as an RL concept can get, even though I suspect they are more widely used in industry than “true” RL problems, but nonetheless I’ll briefly discuss them here because why not?

Suppose we have an agent which is taking actions in an environment. There are two cases:

• The agent’s action will not affect the distribution of the subsequent situation it sees. This is a bandit problem. (I use “situation” to refer to both states and the reward distribution in $k$-armed bandit problems.) These can further be split up as nonassociative or associative. In the former, there is only one situation in the environment. In the latter, there are multiple situations, and this is often referred to as contextual bandits. A simple example would be if an environment has several $k$-armed bandits, and at each time, one of them is drawn at random. Despite the seeming simplicity of the bandit problem, there is already a rich exploration-exploitation problem because the agent has to figure out which of $k$ actions (“arms”) to pull. Exploitation is optimal if we have one time step left, but what if we have 1000 left? Fortunately, this simple setting allows for theory and extensive numerical simulations.

• The agent’s action will affect the distribution of subsequent situations. This is a reinforcement learning problem.

If the second case above is not true for a given task, then do not use RL. A lot of problems can be formulated as RL — I’ve seen cases ranging from protein folding to circuit design to compiler optimization — but that is different from saying that all problems make sense in a reinforcement learning context.
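As a concrete illustration of the nonassociative case, here is a minimal $\epsilon$-greedy agent for a $k$-armed bandit with Gaussian arm rewards, using the incremental sample-average update from the book’s Chapter 2. The specific means, step counts, and $\epsilon$ below are arbitrary choices of mine:

```python
import random

def run_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy on a k-armed (nonassociative) bandit with
    Gaussian rewards and incremental sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    Q = [0.0] * k  # estimated action values
    N = [0] * k    # number of pulls per arm
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: Q[i])  # exploit: greedy arm
        r = rng.gauss(true_means[a], 1.0)          # sample a reward
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]                  # incremental average update
    return Q, N

Q, N = run_bandit([0.0, 0.5, 1.0])
# The best arm (index 2) should end up pulled the most.
print(N.index(max(N)))
```

Note there is no notion of state transitions anywhere: each pull is independent of the last, which is exactly what separates this from the RL case below.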

We now turn to reinforcement learning. I list some of the relevant notation and equations. As usual, when reading the book, it’s good practice to try and work out the definitions of equations before they are actually presented.

• They use $R_{t+1}$ to indicate the reward due to the action at time $t$, i.e., $A_t$. Unfortunately, as they say, both conventions are used in the literature. I prefer $R_t$ as the reward at time $t$, partially because I think it’s the convention at Berkeley. Maybe people don’t want to write another “+1” in LaTeX.

• The lowercase “$r$” is used to represent functions, and it can be a function of state-action pairs $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$, or state-action-state triples $r : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$, where the second state is the successor state. I glossed over this in my August 2015 post on MDPs, where I said: “In general, I will utilize the second formulation [the $r(s,a,s’)$ case], but the formulations are not fundamentally different.” Actually, what I probably should have said is that either formulation is valid and the difference likely comes down to whether it “makes sense” for a reward to directly depend on the successor state.

In OpenAI gym-style implementations, it can go either way, because we usually call something like: new_obs, rew, done, info = env.step(action), so the new observation new_obs and reward rew are returned simultaneously. The environment code therefore decides whether it wants to make use of the successor state or not in the reward computation.
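To make that concrete, here is a toy gym-style environment (my own invention, not from any real code base) whose reward is computed from the successor state, so it effectively implements an $r(s,a,s')$-style reward inside `step`:

```python
class ToyChainEnv:
    """Toy gym-style environment: states 0..4, actions -1 or +1.
    The reward is computed from the *successor* state, i.e. r(s, a, s')."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        assert action in (-1, 1)
        new_state = min(max(self.state + action, 0), 4)
        # Reward depends on the successor state: +1 only upon reaching the goal.
        reward = 1.0 if new_state == 4 else 0.0
        done = new_state == 4
        self.state = new_state
        return new_state, reward, done, {}

env = ToyChainEnv()
obs = env.reset()
total, done = 0.0, False
while not done:
    obs, rew, done, info = env.step(1)   # always move right
    total += rew
print(obs, total)  # prints: 4 1.0
```

An $r(s,a)$-style environment would look identical from the caller’s side; only the internals of `step` would change.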

In Sutton and Barto’s notation, the reward function for the first case is

$$r(s,a) \doteq \mathbb{E}[R_t \mid S_{t-1}=s, A_{t-1}=a] = \sum_{r} r\, p(r \mid s,a) = \sum_{r} r \sum_{s'} p(s',r \mid s,a),$$

where we first directly apply the definition of a conditional expectation, and then do an extra marginalization over the $s’$ term because Sutton and Barto define the dynamics in terms of the function $p(s’,r | s,a)$ rather than the $p(s’|s,a)$ that I’m accustomed to using. Thus, I used the function $p$ to represent (in a slight abuse of notation) the probability mass function of a reward, or reward and successor state combination.

Similarly, the second case can be written as:

$$r(s,a,s') \doteq \mathbb{E}[R_t \mid S_{t-1}=s, A_{t-1}=a, S_t=s'] = \sum_{r} r\, p(r \mid s,a,s') = \sum_{r} r\, \frac{p(s',r \mid s,a)}{p(s' \mid s,a)},$$

where now we have the $s’$ given to us, so there’s no need to sum over it. If we are summing over possible reward values, we will also use $r$.

• The return, whose expectation the agent wants to maximize, is $G_t$. I haven’t seen this notation used very often, and I think I only remember it because it appeared in the Rainbow DQN paper. Most papers just write something similar to $\mathbb{E}[\sum_{t=0}^{\infty} \gamma^tR_t]$, where the sum starts at 0 because that’s where the agent starts.

Formally, we have:

$$G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=t+1}^{T} \gamma^{k-t-1} R_k,$$

where we might have $T=\infty$, or $\gamma=1$, but both cannot be true. This is their notation for combining episodic and infinite-horizon tasks.

Suppose that $T=\infty$. Then we can write the return $G_t$ in a recursive fashion:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = R_{t+1} + \gamma\left(R_{t+2} + \gamma R_{t+3} + \cdots\right) = R_{t+1} + \gamma G_{t+1}.$$

From skimming various proofs in RL papers, recursion frequently appears, so it’s probably a useful skill to master. The geometric series is also worth remembering, particularly when the reward is a fixed number at each time step, since then there is a sum and a “common ratio” of $\gamma$ between successive terms.
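As a quick worked example (mine, not the book’s): if the reward is a fixed $r$ at every time step, $T = \infty$, and $0 \le \gamma < 1$, the geometric series gives a closed form for the return:

$$G_t = r + \gamma r + \gamma^2 r + \cdots = \sum_{k=0}^{\infty} \gamma^k r = \frac{r}{1-\gamma}.$$

So with $r = 1$ and $\gamma = 0.99$, the return is $1/(1-0.99) = 100$.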

• Finally, we have the all-important value functions in reinforcement learning. These are usually state values or state-action values, but others are possible, such as advantage functions. The book’s notation is to use lowercase letters, i.e., $v_\pi(s)$ and $q_\pi(s,a)$ for the state-value and action-value functions. Sadly, the literature often uses $V_\pi(s)$ and $Q_\pi(s,a)$ instead, but as long as we know what we’re talking about, the notation gets abstracted away. These functions are:

$$v_\pi(s) \doteq \mathbb{E}_\pi[G_t \mid S_t = s]$$

for all states $s$, and

$$q_\pi(s,a) \doteq \mathbb{E}_\pi[G_t \mid S_t = s, A_t = a]$$

for all state-action pairs $(s,a)$. Note the need to have $\pi$ under the expectation!

That’s all I will bring up for now. I encourage you to check the book for a more complete treatment of notation.

A critical concept to understand in reinforcement learning is the Bellman equation. This is a recursive equation that defines a value function in terms of itself, effectively providing a “self-consistency” condition. We can write the Bellman equation for the value function of the most interesting policy, the optimal one $\pi_*$, as

$$\begin{aligned}
v_*(s) = \mathbb{E}_{\pi_*}[G_t \mid S_t = s] &\overset{(i)}{=} \mathbb{E}_{\pi_*}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
&\overset{(ii)}{=} \sum_{a,r,s'} p(a,r,s' \mid s)\Big(r + \gamma\, \mathbb{E}[G_{t+1} \mid S_{t+1} = s']\Big) \\
&\overset{(iii)}{=} \sum_{a} \pi_*(a \mid s) \sum_{r,s'} p(r,s' \mid s,a)\Big(r + \gamma\, \mathbb{E}[G_{t+1} \mid S_{t+1} = s']\Big) \\
&\overset{(iv)}{=} \max_{a} \sum_{r,s'} p(r,s' \mid s,a)\Big(r + \gamma\, \mathbb{E}[G_{t+1} \mid S_{t+1} = s']\Big) \\
&\overset{(v)}{=} \max_{a} \sum_{r,s'} p(r,s' \mid s,a)\Big(r + \gamma\, v_*(s')\Big)
\end{aligned}$$
where

• in (i), we apply the recurrence on $G_t$ as described earlier.
• in (ii), we convert the expectation into its definition in the form of a sum over all possible values of the probability mass function $p(a,r,s’|s)$ and the subsequent value being taken under the expectation. The $r$ is now isolated and we condition on $s’$ instead of $s$ since we’re dealing with the next return $G_{t+1}$.
• in (iii) we use the chain rule of probability to split the density $p$ into the policy $\pi_*$ and the “rest of” $p$ in an abuse of notation (sorry), and then push the sums as far to the right as possible.
• in (iv) we use the fact that the optimal policy will take only the action that maximizes the value of the subsequent expression, i.e., the expected value of the reward plus the discounted value after that.
• finally, in (v) we convert the $G_{t+1}$ into the equivalent $v_*(s’)$ expression.

In the above, I use $\sum_{x,y}$ as shorthand for $\sum_x\sum_y$.

When trying to derive these equations, I think the tricky part comes when figuring out when it’s valid to turn a random variable (in capital letters) into one of its possible instantiations (a lowercase letter). Here, we’re dealing with policies that determine an action given a state. The environment subsequently generates a return and a successor state, so these are the values we can sum over (since we assume a discrete MDP). The expected return $G_t$ cannot be summed over and must remain inside an expectation, or converted to an equivalent definition.

In the following chapter on dynamic programming techniques, the book presents the policy improvement theorem. It’s one of the few theorems with a proof in the book, and relies on similar “recursive” techniques as shown in the Bellman equation above.

Suppose that $\pi$ and $\pi’$ are any pair of deterministic policies such that, for all states $s \in \mathcal{S}$, we have $q_\pi(s,\pi’(s)) \ge v_\pi(s)$. Then the policy $\pi’$ is as good as (or better than) $\pi$, which equivalently means $v_{\pi’}(s) \ge v_\pi(s)$ for all states. Be careful about noticing which policy is under the value function.

The proof starts from the given and ends with the claim. For any $s$, we get:

$$\begin{aligned}
v_\pi(s) \le q_\pi(s, \pi'(s)) &\overset{(i)}{=} \mathbb{E}\big[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s, A_t = \pi'(s)\big] \\
&\overset{(ii)}{=} \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma v_\pi(S_{t+1}) \mid S_t = s\big] \\
&\overset{(iii)}{\le} \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma q_\pi(S_{t+1}, \pi'(S_{t+1})) \mid S_t = s\big] \\
&\overset{(iv)}{=} \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma\, \mathbb{E}\big[R_{t+2} + \gamma v_\pi(S_{t+2}) \mid S_{t+1}, A_{t+1} = \pi'(S_{t+1})\big] \mid S_t = s\big] \\
&\overset{(v)}{=} \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma R_{t+2} + \gamma^2 v_\pi(S_{t+2}) \mid S_t = s\big] \\
&\overset{(vi)}{\le} \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \gamma^3 v_\pi(S_{t+3}) \mid S_t = s\big] \\
&\;\;\vdots \\
&\le \mathbb{E}_{\pi'}\big[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots \mid S_t = s\big] = v_{\pi'}(s),
\end{aligned}$$
where

• in (i) we expand the left hand side by definition, and in particular, the action we condition on for the Q-values are from $\pi’(s)$. I’m not doing an expectation w.r.t. a given policy because we have the action already given to us, hence the “density” here is from the environment dynamics.
• in (ii) we remove the conditioning on the action in the expectation, and make the expectation w.r.t. the policy $\pi’$ now. Intuitively, this is valid because by taking an expectation w.r.t. the (deterministic) $\pi’$, given that the state is already conditioned upon, the policy will deterministically provide the same action $A_t=\pi’(s)$ as in the previous line. If this is confusing, think of the expectation under $\pi’$ as creating an outer sum $\sum_{a}\pi’(a|s)$ before the rest of the expectation. However, since $\pi’$ is deterministic, it will be equal to one only under one of the actions, the “$\pi’(s)$” we’ve been writing.
• in (iii) we apply the theorem’s assumption.
• in (iv) we do a similar thing as (i) by expanding $q_\pi$, and conditioning on random variables rather than a fixed instantiation $s$ since we are not given one.
• in (v) we apply a similar trick as earlier, by moving the conditioning on the action under the expectation, so that the inner expectation turns into “$\mathbb{E}_{\pi’}$”. To simplify, we move the inner expectation out to merge with the outermost expectation.
• in (vi) we recursively expand based on the inequality of (ii) vs (v).
• then finally, after repeated application, we get to the claim.

One obvious implication of the proof above is that if we have two policies that are exactly the same except for one state where $\pi’(s) \ne \pi(s)$, and the inequality is strict at that state, i.e., $q_\pi(s, \pi’(s)) > v_\pi(s)$, then $\pi’$ is a strictly better policy.

The generalized policy iteration subsection in the same chapter is worth reading. It describes, in one page, the general idea of learning policies via interaction between policy evaluation and policy improvement.
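To sketch how evaluation and improvement interact, here is classic tabular policy iteration on a tiny deterministic MDP of my own invention (this is an illustration of the idea, not an example from the book):

```python
# Tiny deterministic MDP: states 0 and 1 (2 is terminal), actions 0 ("stay"), 1 ("right").
# Moving right from state 1 reaches the terminal state and earns +1.
P = {  # (state, action) -> (next_state, reward)
    (0, 0): (0, 0.0), (0, 1): (1, 0.0),
    (1, 0): (1, 0.0), (1, 1): (2, 1.0),
}
states, actions, gamma, terminal = [0, 1], [0, 1], 0.9, 2

def policy_iteration():
    V = {0: 0.0, 1: 0.0, terminal: 0.0}
    pi = {0: 0, 1: 0}                      # start from an arbitrary policy
    while True:
        # Policy evaluation: sweep until the values stop changing.
        while True:
            delta = 0.0
            for s in states:
                s2, r = P[(s, pi[s])]
                v_new = r + gamma * V[s2]
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < 1e-8:
                break
        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: P[(s, a)][1] + gamma * V[P[(s, a)][0]])
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return pi, V

pi, V = policy_iteration()
print(pi)  # both states learn to move right: {0: 1, 1: 1}
```

Each improvement step is exactly an application of the policy improvement theorem above: the greedy policy satisfies $q_\pi(s, \pi'(s)) \ge v_\pi(s)$, so it can only get better until it is stable, i.e., optimal.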

I often wished the book had more proofs of its claims, but then I realized it wouldn’t be suitable as an introduction to reinforcement learning. For the theory, I’m going through Chapter 6 of Dynamic Programming and Optimal Control by Dimitri P. Bertsekas.

It’s a pleasure to review Sutton and Barto’s book and compare how much more I know now than I did when first studying reinforcement learning in a clumsy on-and-off way from 2013 to 2016. Coming up next will be, I promise, discussion of the more technical and challenging concepts in the textbook.

# Domain Randomization Tips

Domain randomization has been a hot topic in robotics and computer vision since 2016-2017, when the first set of papers about it were released (Sadeghi et al., 2016, Tobin et al., 2017). The second one was featured in OpenAI’s subsequent blog post and video. They would later follow up with some impressive work on training a robot hand to manipulate blocks. Domain randomization has thus quickly become a standard tool in our toolkit. In retrospect, the technique seems obviously useful. The idea, as I’ve seen Professor Abbeel state in so many of his talks, is to effectively randomize aspects of the training data (e.g., images a robot might see) in simulation, so that the real world looks just like another variation. Lilian Weng, who was part of OpenAI’s block-manipulating robot project, has a good overview of domain randomization if you want a little more detail, but I highly recommend reading the actual papers as well, since most are relatively quick reads by research paper standards. My goal in this post is not to simply rehash the definition of domain randomization, but to go over concepts and examples that perhaps might not be obvious at first thought.

My main focus is on OpenAI’s robotic hand, or Dactyl as they call it, and I lean heavily on their preprint. Make sure you cite that with OpenAI as the first author! I will also briefly reference other papers that use domain randomization.

• In Dactyl there is a vision network and a control policy network. The vision network takes Unity-rendered images as input, and outputs the estimated object pose (i.e., a quaternion). The pose then gets fed into the control policy, which also takes as input the robot fingertip data. This is important: they are NOT training their policy directly from images to actions, but from fingertips and object pose to action. Training PPO — their RL algorithm of choice — directly on images would be horrendous. Domain randomization is applied in both the vision and control portions.

I assume they used Unity due to the ease of programmatically altering images. They might have been able to do this in MuJoCo, which comes with rendering support, but I’m guessing it is harder. The lesson: whatever rendering software one uses, make sure it is easy to programmatically change images.

• When performing domain randomization for some physical parameter, the mean of the range should correspond to reasonable physical values. If one thinks that friction is really 0.7 (whatever that means), then one should code the domain randomization using something like `friction = np.random.uniform(0.7 - eps, 0.7 + eps)`, where `eps` is a tuneable parameter. Real-world calibration and/or testing may be needed to find this “mean” value. OpenAI did this by running trajectories and minimizing the mean squared error. I think they had to do this for at least the 264 MuJoCo parameters.
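
As a concrete sketch of this calibrate-then-randomize pattern (the mean and the `eps` half-width below are made-up illustration values, not anyone's actual calibration results):

```python
import numpy as np

rng = np.random.default_rng()

def sample_friction(mean=0.7, eps=0.1):
    """Sample friction uniformly around a calibrated mean value.

    The mean should come from real-world calibration (OpenAI minimized
    mean squared error over trajectories); eps is a tuneable half-width.
    Both numbers here are made up for illustration.
    """
    return rng.uniform(mean - eps, mean + eps)

# Re-sample at the start of each simulated episode.
friction = sample_friction()
assert 0.6 <= friction <= 0.8
```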

• It may help to add correlated noise to observations (i.e., pose and fingertip data) and physical parameters (e.g., block sizes and friction) that gets sampled at the beginning of each episode, but is kept fixed for the episode. This may lead to better consistency in the way noise is applied. Intuitively, if we consider the real world, the distribution of various parameters may vary from that in simulation, but it’s not going to vary during a real-world episode. For example, the size of a block is going to stay the same throughout the robotic hand’s manipulation. An interesting result from their paper was that an LSTM memory-augmented policy could learn the kind of randomization that was applied.
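
Here is a minimal sketch of that per-episode scheme; the observation dimension and noise scales are illustrative values, not OpenAI’s:

```python
import numpy as np

rng = np.random.default_rng()

class EpisodeNoise:
    """Observation noise sampled once per episode and held fixed.

    A per-episode bias models the real world, where calibration error
    (e.g., a slightly mis-measured block size) is constant within an
    episode. The dimensions and scales here are illustrative only.
    """

    def __init__(self, obs_dim, bias_scale=0.01, iid_scale=0.001):
        self.obs_dim = obs_dim
        self.bias_scale = bias_scale
        self.iid_scale = iid_scale
        self.reset()

    def reset(self):
        # Called at the start of each episode: draw one fixed offset.
        self.bias = rng.normal(0.0, self.bias_scale, self.obs_dim)

    def apply(self, obs):
        # Fixed per-episode bias, plus small per-step i.i.d. noise.
        return obs + self.bias + rng.normal(0.0, self.iid_scale, self.obs_dim)

noise = EpisodeNoise(obs_dim=7)   # e.g., an object pose (position + quaternion)
obs_noisy = noise.apply(np.zeros(7))
```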

• Actor-Critic methods use an actor and a critic. The actor is the policy, and the critic estimates a value function. A key insight is that only data passed to the actor needs to be randomized during training. Why? The critic’s job is to accurately assess the value of a state so that it can assist the actor. During deployment, only the trained actor is needed, which gets real-world data as input. Adding noise to the critic’s input will make its job harder.

This reminds me of another OpenAI product, Asymmetric Actor-Critic (AAC), where the critic gets different input than the actor. In AAC, the critic gets a lower-dimensional state representation instead of images, which makes it easier to accurately assess the value of a state, and it’s fine for training because, again, only the policy network gets deployed. Oh, and surprise surprise, the Asymmetric Actor-Critic paper also used domain randomization, and mentioned that randomizing colors should be applied independently (or separately) for each object. I agree.

• When applying randomization to images, adding uniform, Gaussian, and/or “salt and pepper noise” is not sufficient. In our robot bed-making paper, I used these forms of noise to augment the data, but data augmentation is not the same as domain randomization, which is applied to cases when we train in simulation and transfer to the real world. In our paper, I was using the same real-world images that the robot saw. With domain randomization, we want images that look dramatically different from each other, but which are also realistic and similar from a human’s perspective. We can’t do this with Gaussian noise, but we can do this by randomizing hue, saturation, value, and colors, along with lighting and glossiness. OpenAI only applied per-pixel Gaussian noise at the end of this process.
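
A sketch of what this image-level randomization might look like, assuming the image is already in HSV space with all channels in $[0,1]$ (the jitter ranges are my own made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize_hsv(img_hsv, hue_shift=0.1, sat_scale=0.3, val_scale=0.3):
    """Jitter an HSV image (H in [0,1), S and V in [0,1]).

    One hue/saturation/value draw is applied to the whole image, so the
    scene stays coherent while looking dramatically different across draws.
    """
    h, s, v = img_hsv[..., 0], img_hsv[..., 1], img_hsv[..., 2]
    h = (h + rng.uniform(-hue_shift, hue_shift)) % 1.0
    s = np.clip(s * rng.uniform(1 - sat_scale, 1 + sat_scale), 0.0, 1.0)
    v = np.clip(v * rng.uniform(1 - val_scale, 1 + val_scale), 0.0, 1.0)
    out = np.stack([h, s, v], axis=-1)
    # Small per-pixel Gaussian noise only at the very end, as OpenAI did.
    out[..., 2] = np.clip(out[..., 2] + rng.normal(0.0, 0.01, v.shape), 0.0, 1.0)
    return out
```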

Another option, which produces some cooler-looking images, is to use procedural generation of image textures. This is the approach taken in these two papers from Imperial College London (ICL), which use “Perlin noise” to randomize images. I encourage you to check out the papers, particularly the first one, to see the rendered images.

• Don’t forget camera randomization. OpenAI randomized the positions and orientations with small uniform noise. (They actually used three images simultaneously, so they had to adjust all of them.) Both of the ICL papers said camera randomization was essential. Unfortunately, the sim-to-real cloth paper did not precisely explain its camera randomization parameters, but I’m assuming they match the group’s prior work. Camera randomization is also used in the Dexterity Network project. From communicating with the authors (since they are in our lab), I think they used randomization before it was called “domain randomization.”
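
A minimal sketch of camera pose randomization; the noise half-widths below are hypothetical, not the values any of these papers used:

```python
import numpy as np

rng = np.random.default_rng()

def randomize_camera(pos, euler, pos_eps=0.01, rot_eps=0.02):
    """Perturb a camera pose with small uniform noise.

    `pos` is an xyz position (meters) and `euler` a roll/pitch/yaw triple
    (radians). Dactyl used three cameras simultaneously, so this would be
    applied to each of them at the start of an episode.
    """
    new_pos = pos + rng.uniform(-pos_eps, pos_eps, size=3)
    new_euler = euler + rng.uniform(-rot_eps, rot_eps, size=3)
    return new_pos, new_euler

cam_pos = np.array([0.5, 0.0, 0.4])
cam_euler = np.array([0.0, -0.7, 0.0])
p, e = randomize_camera(cam_pos, cam_euler)
```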

I will keep these and other tricks in mind when applying domain randomization. I agree with OpenAI that it is important for deploying machine learning based robotics in the real world. I know there’s a “Public Relations” aspect to everything they promote, but I still think that the technique matters a lot, and will continue to be popular in the near future.

# Apollo 11, Then and Now

Today, we celebrate the 50th anniversary of the first humans walking on the moon from the Apollo 11 mission on July 20, 1969. It was perhaps the most complicated and remarkable technological accomplishment the world had ever seen at that time. I can’t imagine the complexity of doing this with the knowledge we had back in 1969, and without the Internet.

I wasn’t alive back then, but I have studied some of the history and am hoping to read more about it. More importantly, though, I also want to understand how we can encourage a similar kind of “grand mission” for the 21st century, but this one hopefully cooperative among several nations rather than viewed in the lens of “us versus them.” I know this is not easy. Competition is an essential ingredient for accelerating technological advances. Had the Soviet Union not launched Sputnik in 1957, perhaps we would not have had a Space Race at all, and NASA might not exist.

I also understand the contradictions of the 1960s. My impression from reading various books and news articles is that trust and confidence in government was generally higher back then than it is today, which seems justified in the sense that America was able to somehow muster the political will and pull together so many resources for the Space Race. But it also seems strange to me, since that era also saw the Civil Rights protests and the assassination of Martin Luther King Jr in 1968, and then the Stonewall Inn riots of 1969. Foreign policy was rapidly turning into a disaster with the Vietnam War, leading Lyndon Johnson to decline to seek re-election in 1968. Richard Nixon would be the president who called Neil Armstrong and Buzz Aldrin when they landed on the moon and fulfilled John F. Kennedy’s vision from earlier in the decade — and we all know what happened to Nixon in 1974.

I have no hope that our government can replicate a feat like Apollo 11. I don’t mean to phrase this as an entirely negative statement; on the contrary, that our government largely provides insurance instead of engaging in expensive, bombastic missions has helped to stabilize or improve the lives of many. Other factors that affect my thinking here, though, are less desirable: it’s unlikely that the government will be able to accomplish what it did 50 years ago due to high costs, soaring debt, low trust, and little sense of national unity.

Investment and expertise in math, science, and education show some disconcerting trends. The Space Race created heavy interest and investment in science, and was one of the key factors that helped motivate, for example, RPI President Shirley Ann Jackson to study physics. Yet, as UC Berkeley economist Enrico Moretti describes in The New Geography of Jobs (the most recent book I’ve read), young American students are average in math and science compared to other advanced countries. Fortunately, the United States in the age of Trump continues to have an unprecedented ability to recruit high-skilled immigrants from other countries. This is a key advantage we have over China, and we cannot relinquish it, but neither does it give us a pass for the poor state of math and science education in many parts of the country.

What has improved in the last 50 years is the strength and technological leadership of the private sector. Among the American computer science students here at Berkeley, there is virtually no interest in working for the public sector. For PhDs, with the exception of those who pursue careers in academia, almost all work for a big tech company or a start-up. It makes sense, because the most exciting advancements, particularly in my fields of AI and robotics, have come from companies like Google (AI agents for Go and Starcraft), and “capped-profits” like OpenAI (Dota2). Google (via Waymo), Tesla, and many other companies are accelerating the development of self-driving cars. Other companies perform great work in computer vision, such as Facebook, and in natural language processing, with Google and Microsoft among the leaders. Those of us advocating to break up these companies should remember that they are the ones pioneering the technologies of this century.

NASA wants to send humans to the moon again by 2024. That would be inspiring, but I argue that we need to focus on two key technologies of our time: AI and clean energy. Recent AI advances are extraordinarily energy-hungry, and we have a responsibility not to consume too much energy, or at the very least to utilize cleaner energy sources more often. I don’t necessarily mean “green” energy, because I am a strong proponent of nuclear energy, but hopefully my point is clear. Perhaps the Apollo 11 of this century could use AI for better management of energy of all sorts, and could be pursued by various company alliances spanning multiple countries. For example, think of Google and Baidu aligning with energy companies in their home countries to extract more value from wind energy. Such achievements have the potential to help people all across the world.

AI will probably be great, and let’s ensure we use it wisely to create a better future for all.

# Understanding Prioritized Experience Replay

Prioritized Experience Replay (PER) is one of the most important and conceptually straightforward improvements for the vanilla Deep Q-Network (DQN) algorithm. It is built on top of experience replay buffers, which allow a reinforcement learning (RL) agent to store experiences in the form of transition tuples, usually denoted as $(s_t,a_t,r_{t},s_{t+1})$ with states, actions, rewards, and successor states at some time index $t$. In contrast to consuming samples online and discarding them thereafter, sampling from the stored experiences means they are less heavily “correlated” and can be re-used for learning.

Uniform sampling from a replay buffer is a good default strategy, and probably the first one to attempt. But prioritized sampling, as the name implies, will weigh the samples so that “important” ones are drawn more frequently for training. In this post, I review Prioritized Experience Replay, with an emphasis on relevant ideas or concepts that are often hidden under the hood or implicitly assumed.

I assume that PER is applied with the DQN framework because that is what the original paper used, but PER can, in theory, be applied to any algorithm which samples from a database of items. As most Artificial Intelligence students and practitioners probably know, the DQN algorithm attempts to find a policy $\pi$ which maps a given state $s_t$ to an action $a_t$ such that it maximizes the expected discounted reward of the agent $\mathbb{E}_{\pi}\Big[ \sum_{t=0}^\infty \gamma^t r_t \Big]$ from some starting state $s_0$. DQN obtains $\pi$ implicitly by calculating a state-action value function $Q_\theta(s,a)$ parameterized by $\theta$, which measures the goodness of the given state-action pair with respect to some behavioral policy. (This is a critical point that’s often missed: state-action values, or state-values for that matter, don’t make sense unless they are also attached to some policy.)

To find an appropriate $\theta$, which then determines the final policy $\pi$, DQN performs the following optimization:

$$\min_\theta \; \mathbb{E}_{(s_t,a_t,r_t,s_{t+1}) \sim U(D)}\left[ \left( r_t + \gamma \max_{a \in \mathcal{A}} Q_{\theta^-}(s_{t+1}, a) - Q_\theta(s_t,a_t) \right)^2 \right]$$

where $(s_t,a_t,r_t,s_{t+1})$ are batches of samples from the replay buffer $D$, which is designed to store the past $N$ samples (usually $N=1,000,000$ for Atari 2600 benchmarks). In addition, $\mathcal{A}$ represents the set of discrete actions, $\theta$ is the current or online network, and $\theta^-$ represents the target network. Both networks use the same architecture, and we use $Q_\theta(s,a)$ or $Q_{\theta^-}(s,a)$ to denote which of the two is being applied to evaluate $(s,a)$.

The target network starts off matched to the current network, but remains frozen (usually for thousands of steps) before getting updated again to match the current network. The process repeats throughout training, with the goal of increasing the stability of the targets $r_t + \gamma \max_{a \in \mathcal{A}} Q_{\theta^-}(s_{t+1},a)$.

I have an older blog post here if you would like an intuitive perspective on DQN. For more background on reinforcement learning, I refer you to the standard textbook in the field by Sutton and Barto. It is freely available (God bless the authors) and updated to the second edition for 2018. Woo hoo! Expect future blog posts here about the more technical concepts from the book.

Now, let us get started on PER. The intuition of the algorithm is clear, and the Prioritized Experience Replay paper (presented at ICLR 2016) is surprisingly readable. They say:

In particular, we propose to more frequently replay transitions with high expected learning progress, as measured by the magnitude of their temporal-difference (TD) error. This prioritization can lead to a loss of diversity, which we alleviate with stochastic prioritization, and introduce bias, which we correct with importance sampling. Our resulting algorithms are robust and scalable, which we demonstrate on the Atari 2600 benchmark suite, where we obtain faster learning and state-of-the-art performance.

The paper was written in 2015 and submitted to ICLR 2016, so straight-up PER with DQN is definitely no longer state-of-the-art performance. For example, the Rainbow DQN algorithm is superior. Everything else is correct, though. The PER idea reminds me of “hard negative mining” in the supervised learning setting. The squared TD error is what DQN minimizes in its loss. Hence, pick the samples with the largest error so that our neural network can minimize it!

To clarify a somewhat implied point (for those who did not read the paper), and to play some devil’s advocate, why do we prioritize by the magnitude of the TD error? Ideally we would sample with respect to some mysterious function $f( (s_t,a_t,r_t,s_{t+1}) )$ that exactly tells us the “usefulness” of sample $(s_t,a_t,r_t,s_{t+1})$ for the fastest learning to get maximum reward. But since this magical function $f$ is unknown, we use the absolute TD error because it appears to be a reasonable approximation. There are other options, and I encourage you to read the discussion in Appendix A. I am not sure how many alternatives to the TD error magnitude have been implemented in the literature. Since I have not seen any (besides a KL-based one in Rainbow DQN), it suggests that DeepMind’s choice of absolute TD error was the right one. The TD error for vanilla DQN is:

$$\delta_i = r_t + \gamma \max_{a \in \mathcal{A}} Q_{\theta^-}(s_{t+1}, a) - Q_\theta(s_t, a_t)$$

and for Double DQN, it would be:

$$\delta_i = r_t + \gamma Q_{\theta^-}\Big(s_{t+1}, \arg\max_{a \in \mathcal{A}} Q_\theta(s_{t+1}, a)\Big) - Q_\theta(s_t, a_t)$$

and either way, we use $| \delta_i |$ as the magnitude of the TD error. Negative versus positive TD errors are combined into one case here, but in principle we could consider them as separate cases and add a bonus to whichever one we feel is more important to address.
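
To make the two cases concrete, here is a tiny numerical sketch of both TD errors (the Q-values and reward are made up):

```python
import numpy as np

# Toy Q-values over |A| = 4 discrete actions at the successor state s_{t+1}.
# q_online / q_target stand in for Q_theta and Q_{theta^-}.
q_online = np.array([1.0, 3.0, 2.0, 0.5])
q_target = np.array([0.8, 2.5, 2.9, 0.4])
q_sa = 1.5          # Q_theta(s_t, a_t)
r, gamma = 1.0, 0.99

# Vanilla DQN: max over the target network's own values.
delta_dqn = r + gamma * q_target.max() - q_sa

# Double DQN: the online network picks the action, the target evaluates it.
a_star = q_online.argmax()
delta_ddqn = r + gamma * q_target[a_star] - q_sa

# Either way, the priority signal is the magnitude |delta_i|.
priority_dqn, priority_ddqn = abs(delta_dqn), abs(delta_ddqn)
```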

This provides the absolute TD error, but how do we incorporate this into an RL algorithm?

First, we can immediately try to assign the priorities ($| \delta_i |$) as components to add to the samples. That means our replay buffer samples are now $(s_{t},a_{t},r_{t},s_{t+1}, | \delta_t |)$. (Strictly speaking, they should also have a “done” flag $d_t$ which tells us if we should use the bootstrapped estimate of our target, but we often omit this notation since it is implicitly assumed. This is yet another minor detail that is not clear until one implements DQN.)

But then here’s a problem: how do we keep the magnitudes of all these TD errors up to date? Replay buffers might have a million elements in them. Each time we update the neural network, do we really need to update each and every $\delta_i$ term, which would involve a forward pass through $Q_\theta$ (and possibly $Q_{\theta^-}$ if it was changed) for each item in the buffer? DeepMind proposes a far more computationally efficient alternative: only update the $\delta_i$ terms for items that are actually sampled during the minibatch gradient updates. Since we have to compute $\delta_i$ anyway to get the loss, we might as well use those to change the priorities. For a minibatch size of 32, each gradient update will change the priorities of 32 samples in the replay buffer, but leave the (many) remaining items alone.
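
A minimal sketch of that bookkeeping, with illustrative sizes (a 1000-element buffer, minibatches of 32); real implementations store the transitions alongside the priorities:

```python
import numpy as np

rng = np.random.default_rng(0)

class PriorityStore:
    """Priorities live alongside transitions; only the entries sampled in
    a minibatch get their |delta_i| refreshed after the gradient step."""

    def __init__(self, n):
        self.priorities = np.ones(n)   # placeholder initial priorities

    def sample(self, batch_size):
        probs = self.priorities / self.priorities.sum()
        return rng.choice(len(self.priorities), size=batch_size,
                          replace=False, p=probs)

    def update(self, indices, td_errors, eps=1e-6):
        # Called after the gradient step: we already computed delta_i
        # for these samples, so refresh just their priorities.
        self.priorities[indices] = np.abs(td_errors) + eps

store = PriorityStore(1000)
idx = store.sample(32)                     # one minibatch draw
store.update(idx, rng.normal(size=32))     # stand-in TD errors
```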

That makes sense. Next, given the absolute TD terms, how do we get a probability distribution for sampling? DeepMind proposes two ways of getting priorities, denoted as $p_i$:

• A rank based method: $p_i = 1 / {\rm rank}(i)$ which sorts the items according to $| \delta_i |$ to get the rank.

• A proportional variant: $p_i = | \delta_i | + \epsilon$, where $\epsilon$ is a small constant ensuring that the sample has some non-zero probability of being drawn.

During exploration, the $p_i$ terms are not known for brand-new samples because those have not yet been evaluated with the networks to get a TD error term. To get around this, PER initializes each new $p_i$ to the maximum priority seen thus far, thus favoring those samples during later sampling.

From either of these, we can easily get a probability distribution:

$$P(i) = \frac{p_i^\alpha}{\sum_k p_k^\alpha}$$

where $\alpha$ determines the level of prioritization. If $\alpha \to 0$, then there is no prioritization, because every $p_i^\alpha = 1$. If $\alpha \to 1$, then we get, in some sense, “full” prioritization, where sampling data points is more heavily dependent on the actual $\delta_i$ values. Now that I think about it, we could increase $\alpha$ above one, but that would likely cause dramatic problems with over-fitting, as the distribution could become heavily “pointy” with low entropy.

We finally have our actual probability $P(i)$ of sampling the $i$-th data point for a given minibatch, which would be (again) $(s_t,a_t,r_t,s_{t+1},| \delta_t |)$. During training, we can draw these simply by weighting all samples in the $N$-sized replay buffer by $P(i)$.
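
Putting the two priority variants and the $\alpha$ exponent together, a sketch with toy TD errors (the values and $\alpha=0.6$ are illustrative) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

td = np.array([0.5, -2.0, 0.1, 1.0])          # toy TD errors delta_i
abs_td = np.abs(td)

# Proportional variant: p_i = |delta_i| + eps.
p_prop = abs_td + 1e-6

# Rank-based variant: p_i = 1 / rank(i), where rank 1 = largest |delta_i|.
ranks = np.empty_like(abs_td)
ranks[np.argsort(-abs_td)] = np.arange(1, len(abs_td) + 1)
p_rank = 1.0 / ranks

def to_distribution(p, alpha=0.6):
    """P(i) = p_i^alpha / sum_k p_k^alpha; alpha = 0 recovers uniform."""
    scaled = p ** alpha
    return scaled / scaled.sum()

P = to_distribution(p_prop)
batch = rng.choice(len(P), size=2, p=P)   # a prioritized minibatch draw
```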

Since the buffer size $N$ can be quite large (e.g., one million), DeepMind uses special data structures to reduce the time complexity of certain operations. For the proportional-based variant, which is what OpenAI implements, a sum-tree data structure is used to make both updating and sampling $O(\log N)$ operations.
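
For the curious, here is a minimal sum-tree sketch (my own illustration, with capacity assumed to be a power of two for simplicity; OpenAI’s baselines implementation is similar in spirit but more general):

```python
import numpy as np

class SumTree:
    """Leaves hold priorities; internal nodes hold sums of their children.

    Updating one priority and sampling by cumulative mass both walk one
    root-to-leaf path, so each is O(log N).
    """

    def __init__(self, capacity):
        self.capacity = capacity                  # assume a power of two
        self.tree = np.zeros(2 * capacity)        # 1-indexed heap layout

    def update(self, idx, priority):
        i = idx + self.capacity
        self.tree[i] = priority
        i //= 2
        while i >= 1:                             # propagate sums upward
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]

    def sample(self, mass):
        """Find the leaf whose cumulative priority range contains `mass`."""
        i = 1
        while i < self.capacity:
            left = 2 * i
            if mass <= self.tree[left]:
                i = left
            else:
                mass -= self.tree[left]
                i = left + 1
        return i - self.capacity

tree = SumTree(8)
for idx, p in enumerate([1.0, 3.0, 0.5, 2.0]):
    tree.update(idx, p)
```

To draw a prioritized sample, one picks `mass` uniformly in `[0, tree.total()]` and calls `tree.sample(mass)`.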

Is that it? Well, not quite. There are a few technical details to resolve, but probably the most important one (pun intended) is an importance sampling correction. DeepMind describes why:

The estimation of the expected value with stochastic updates relies on those updates corresponding to the same distribution as its expectation. Prioritized replay introduces bias because it changes this distribution in an uncontrolled fashion, and therefore changes the solution that the estimates will converge to (even if the policy and state distribution are fixed). We can correct this bias by using importance-sampling (IS) weights.

This makes sense. Here is my intuition, which I hope is useful. I think the distribution DeepMind is talking about (“same distribution as its expectation”) above is the distribution of samples that are obtained when sampling uniformly at random from the replay buffer. Recall the expectation I wrote above, which I repeat again for convenience:

$$\min_\theta \; \mathbb{E}_{(s_t,a_t,r_t,s_{t+1}) \sim U(D)}\left[ \left( r_t + \gamma \max_{a \in \mathcal{A}} Q_{\theta^-}(s_{t+1}, a) - Q_\theta(s_t,a_t) \right)^2 \right]$$

Here, the “true distribution” for the expectation is indicated with this notation under the expectation:

$$(s_t,a_t,r_t,s_{t+1}) \sim U(D)$$

which means we uniformly sample from the replay buffer. Since prioritization means we are not doing that, then the distribution of samples we get is different from the “true” distribution using uniform sampling. In particular, PER over-samples those with high priority, so the importance sampling correction should down-weight the impact of the sampled term, which it does by scaling the gradient term so that the gradient has “less impact” on the parameters.

To add yet more confusion, I don’t even think the uniform sampling is the “true” distribution we want, in the sense that it is the distribution under the expectation for the Q-learning loss. What I think we want is the actual set of samples that are induced by the agent’s current policy, so that we really use:

$$\min_\theta \; \mathbb{E}_{(s_t,a_t,r_t,s_{t+1}) \sim \pi}\left[ \left( r_t + \gamma \max_{a \in \mathcal{A}} Q_{\theta^-}(s_{t+1}, a) - Q_\theta(s_t,a_t) \right)^2 \right]$$

where $\pi$ is a policy induced from the agent’s current Q-values. Perhaps it is greedy for simplicity. So what effectively happens is that, due to uniform sampling, there is extra bias and over-sampling towards the older samples in the replay buffer. Despite this, we should be OK because Q-learning is off-policy, so it shouldn’t matter in theory where the samples come from. Thus it’s unclear what a “true distribution of samples” should be like, if any exists. Incidentally, the off-policy aspect of Q-learning and why it does not take expectations “over the policy” appears to be the reason why importance sampling is not needed in vanilla DQN. (When we add an ingredient like importance sampling to PER, it is worth thinking about why we had to use it in this case, and not in others.) Things might change when we talk about $n$-step returns, but that raises the complexity to a new level … or we might just ignore importance sampling corrections, as this StackExchange answer suggests.

This all makes sense intuitively, but there has to be a nice, rigorous way to formalize it. The “TL;DR” is that the importance sampling in PER is to correct the over-sampling with respect to the uniform distribution.

Hopefully this is clear. Feel free to refer back to an earlier blog post about importance sampling more generally; I was hoping to follow it up right away with this current post, but my blogging plans never go according to plan.

How do we apply importance sampling? We use the following weights:

$$w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^\beta$$

and then further scaled in each minibatch so that $\max_i w_i = 1$ for stability reasons; generally, we don’t want weights to be wildly large.

Let’s dissect this term. The $1/N$ part is because of the current experience replay size. To clarify: this is NOT the same as the capacity of the buffer; it only becomes equivalent once we hit the capacity and have to start overwriting samples. The $P(i)$ represents the probability of sampling data point $i$ according to priorities. It is this key term that scales the weights inversely with the sampling probability. As $P(i) \to 1$ (which really should never happen) the weight gets smaller, with an extreme down-weighting of the sample’s impact. As $P(i) \to 0$, the weight gets larger. If $P(i) = 1/N$ for all $i$, then we recover uniform sampling, with the $1/P(i) = N$ factor canceling the $1/N$ term.

Don’t forget the $\beta$ term in the exponent, which controls how much prioritization to apply. They argue that training is highly unstable at the beginning, and that importance sampling corrections matter more near the end of training. Thus, $\beta$ starts small (values of 0.4 to 0.6 are commonly used) and anneals towards one.
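
Putting the correction together (the sampling probabilities below are toy values for illustration):

```python
import numpy as np

def is_weights(P_sampled, N, beta):
    """Importance-sampling weights w_i = (1/N * 1/P(i))^beta, normalized by
    the max weight in the minibatch, as in the paper."""
    w = (1.0 / (N * P_sampled)) ** beta
    return w / w.max()

# With uniform probabilities the correction vanishes (all weights are 1).
N = 4
uniform = np.full(3, 1.0 / N)
assert np.allclose(is_weights(uniform, N, beta=0.4), 1.0)

# Over-sampled points (large P(i)) get down-weighted relative to the rest.
w = is_weights(np.array([0.5, 0.25, 0.05]), N, beta=0.5)
```

In practice `beta` would be annealed from its starting value (say 0.4) toward 1 over the course of training.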

We finally “fold” this weight together with the $\delta_i$ TD error term during training, with $w_i \delta_i$, because the $\delta_i$ is multiplied with the gradient $\nabla_\theta Q_\theta(s_t,a_t)$ following the chain rule.

The PER paper shows that PER+(D)DQN outperforms uniform sampling on 41 out of 49 Atari 2600 games, though exactly which 8 games did not improve is unclear. From looking at Figure 3 (which uses Double DQN, not DQN), perhaps Robotank, Defender, Tutankham, Boxing, Bowling, BankHeist, Centipede, and Yars’ Revenge? I wouldn’t get too bogged down with the details; the benefits of PER are abundantly clear.

As a testament to the importance of prioritization, the Rainbow DQN paper showed that prioritization was perhaps the most essential extension for obtaining high scores on Atari games. Granted, their prioritization was based not on absolute TD error but based on a Kullback-Leibler loss because of their use of distributional DQNs, but the main logic might still apply to TD error.

Prioritization can be applied to other applications of experience replay. For example, suppose we wanted to add extra samples to the buffer from some “demonstrator” as in Deep Q-Learning from Demonstrations (blog post here). We can keep the same replay buffer code as earlier, but allocate the first $k$ items in the list to demonstrator samples. Then our indexing for overwriting older samples from the current agent must skip over the first $k$ items. It might be simplest to record this by adding a flag $f_t$ to the sample indicating whether it is a demonstrator or current agent sample. You can probably see why researchers prefer to write $(s_t,a_t,r_t,s_{t+1})$ without all the annoying flags and extra terms! To apply prioritization, one can adjust the raw values $p_i$ to increase those from the demonstrator.
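
A sketch of just that indexing trick; the class name and sizes are made up, and a real buffer would store the transitions and flags too:

```python
class DemoAwareIndex:
    """The first k slots hold demonstrator samples and are never
    overwritten; agent samples cycle through the remaining slots."""

    def __init__(self, capacity, k):
        self.capacity, self.k = capacity, k
        self.next_agent_slot = k

    def slot_for_new_agent_sample(self):
        slot = self.next_agent_slot
        self.next_agent_slot += 1
        if self.next_agent_slot >= self.capacity:
            self.next_agent_slot = self.k   # wrap around, skipping demo slots
        return slot

ix = DemoAwareIndex(capacity=10, k=3)
slots = [ix.slot_for_new_agent_sample() for _ in range(9)]
# Slots 0-2 (the demonstrator samples) are never returned.
```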

I hope this was an illuminating overview of prioritized experience replay. For details, I refer you (again) to the paper, and to an open-source implementation from OpenAI. Happy readings!

# My Second Graduate Student Instructor Experience for CS 182/282A (Previously 194/294-129)

In Spring 2019, I was the Graduate Student Instructor (i.e., Teaching Assistant) for CS 182/282A, Designing, Visualizing, and Understanding Deep Neural Networks, taught by Professor John Canny. The class was formerly numbered CS 194/294-129, and recently got “upgraded” to have its own three-digit numbers of 182 and 282A for the undergraduate and graduate versions, respectively. The convention for Berkeley EECS courses is that new ones are numbered 194/294-xyz where xyz is a unique set of three digits, and once the course has proven that it is worthy of being treated as a regular course, it gets upgraded to a unique number without the “194/294” prefix.

Judging from my conversations with other Berkeley students who are aware of my blog, my course reviews seem to be a fairly popular category of posts. You can find the full set in the archives and in this other page I have. While most of these course reviews are for classes that I have taken, one of the “reviews” is actually my GSI experience from Fall 2016, when I was the GSI for the first edition of Berkeley’s Deep Learning course. Given that the class will now be taught on a regular basis, and that I just wrapped up my second GSI experience for it, I thought it would be nice to once again dust off my blogging skills and discuss my experience as a course staff member.

Unlike last time, when I was an “emergency” 10-hour GSI, I was a 20-hour GSI for CS 182/282A from the start. At Berkeley, EECS PhD students must GSI for a total of “30 hours.” The “hours” designation means that students are expected to work for that many hours per week in a semester as a GSI, and the sum of all the hours across all semesters must be at least 30. Furthermore, at least one of the courses must be an undergraduate course.1 As a 20-hour GSI for CS 182/282A and a 10-hour GSI for the Fall 2016 edition, I have now achieved my teaching requirements for the UC Berkeley EECS PhD program.

That being said, let us turn to the course itself.

## Course Overview and Logistics

CS 182/282A can be thought of as a mix between 231n and 224n from Stanford. Indeed, I frequently watched lectures or reviewed the notes from those courses to brush up on the material here. Berkeley’s on a semester system, whereas Stanford has quarters, so we are able to cover slightly more material than 231n or 224n alone. We cover some deep reinforcement learning in 182/282A, but that is also a small part of 231n.

In terms of course logistics, the bad news was obvious when the schedule came out. CS 182/282A had lectures on Mondays and Wednesdays, at 8:00am. Ouch. That’s hard on the students; I wish we had a later time, but I think the course was added late to the catalog so we were assigned to the least desirable time slot. My perspective on lecture times is that as a student, I would enjoy an early time because I am a morning person, and thus earlier times fit right into my schedule. In contrast, as a course staff member who hopes to see as many students attend lectures as possible, I prefer early afternoon slots when it’s more likely that we get closer to full attendance.

Throughout the semester, there were only three mornings when the lecture room was crowded: on day one, and on the two in-class midterms. That’s it! Lecture attendance for 182/282A was abysmal. I attended nearly all the lectures, and by the end of the semester, I observed we were only getting about 20 students per lecture, out of a class size of (judging by the number listed on the course evaluations) perhaps 235 students!

Incidentally, I think the reason everyone shows up on day one is that students on the waiting list must attend the first lecture to have a chance of getting in the class. The course staff got a lot of requests from students asking if they could get off the waiting list. Unfortunately, I don’t think I or anyone else on the course staff had control over this, so I was unable to help. I really wish the EECS department had a better way to state unequivocally whether a student can get in a class or not, and I am somewhat confused as to why students constantly ask this question. Do other classes have course staff members deal with the waiting list?

One logistical detail that is unique to me are sign language interpreting services. Normally, Berkeley’s Disabled Students’ Program (DSP) pays for sign language services for courses and lab meetings, since this is part of my academic experience. Since I was getting paid by the EECS department, however, DSP told me that the EECS department had to do the payment. Fortunately, this detail was quickly resolved by the excellent administrators at DSP and EECS, and the funding details abstracted away from me.

## Discussion Sections

Part of our GSI duties for 182/282A is that we need to host discussions or sections; I use the terms interchangeably, and sometimes together, and another term is “recitations” as in this blog post 4.5 years ago. Once a week, each GSI was in charge of two discussions, which are each a 50-minute lecture we give to a smaller audience of students. This allows for a more intimate learning environment, where students may feel more comfortable asking questions as compared to the normal lectures (with a terrible time slot).

The discussions did not start well. They were scheduled only on Mondays, with some overlapping time slots, which seemed like a waste of resources. We first polled the students to see the best times for them, and then requested changes from the scheduling administrators in the department. After several rounds of ambiguous email exchanges, we got a stern and final response from one of them, who said the students “were not six year olds” and were responsible for knowing the discussion schedule since it was posted ahead of time.

To the students who had scheduling conflicts with the sections, I apologize, but we tried.

We also got off to a rocky start with the material we chose to present. The first discussion was based on Michael I. Jordan’s probabilistic graphical models notes2 to describe the connection between Naive Bayes and Logistic Regression. Upon seeing our discussion material, John chastised us for relying on graduate-level notes, and told us to present his simpler ideas instead. Sadly, I did not receive his email until after I had already given my two discussion sections, since he sent it while I was presenting.

I wish we had started off a bit easier to help some students gradually get acclimated to the mathematics. Hopefully after the first week, the discussion material was at a more appropriate difficulty level. I hope the students enjoyed the sections. I certainly did! It was fun to lecture and to throw the occasional (OK, frequent) joke.

Preparing for the sections meant that I needed to know the material and anticipate the questions students might ask. I dedicated my entire Sundays (from morning to 11:00pm) to preparing for the sections by reviewing the relevant concepts. Each week, one GSI took the lead in forming the notes, which dramatically helped to simplify the workload.

At the end of the course, John praised us (the GSIs) for our hard work on the notes, and said he would reuse them in future iterations of the course.

## Piazza and Office Hours

I had ZERO ZERO ZERO people show up to office hours throughout the ENTIRE semester in Fall 2016. I don’t even know how that is humanly possible.

I did not want to repeat that “accomplishment” this year. I had high hopes that in Spring 2019, with the class growing larger, more students would come to office hours. Right? RIGHT?!?

It did not start off well. Not a single student showed up to my office hours after the first discussion. I was fed up, so at the beginning of my section the next week, I wrote down on the board: “number of people who showed up to my office hours.” “Anyone want to guess?” I asked the students. When no one answered, I wrote a big fat zero on the board, eliciting a few chuckles.

Fortunately, once the homework assignments started, a nonzero number of students showed up to my office hours, so I no longer had to complain.

Students were reasonably active on Piazza, which is expected for a course this large with many undergraduate students. One thing that was also expected — this one not so good — was that many students ran into technical difficulties when doing the homework assignments, and posted incomplete reports on Piazza. Their reports were written in a way that made it hard for the course staff to adequately address them.

This has happened in previous iterations of the course, so John wrote this page on the course website which has a brief and beautiful high-level explanation of how to properly file an issue report. I’m copying some of his words here because they are just so devastatingly effective:

If you have a technical issue with Python, EC2 etc., please follow these guidelines when you report an issue in Piazza. Most issues are relatively easy to resolve when a good report is given. And the process of creating a good Issue Report will often help you fix the problem without getting help - i.e. when you write down or copy/paste the exact actions you took, you will usually discover if you made a slip somewhere.

Unfortunately many of the issue reports we get are incomplete. The effect of this is that a simple problem becomes a major issue to resolve, and staff and students go back-and-forth trying to extract more information.

[…]

Well said! Whenever I felt stressed throughout the semester due to teaching or other reasons, I would often go back and read those words on that course webpage, which brought me a dose of sanity and relief. Ahhhhh.

The above is precisely why I have very few questions on StackOverflow and other similar “discussion forums.” The following has happened to me so frequently: I draft a StackOverflow post and structure it by saying that I tried this and that and … oh, I just realized I solved what I wanted to ask!

For an example of how I file (borderline excessive) issue reports, please see this one that I wrote for OpenAI baselines about how their DDPG algorithm does not work. (But what does “does not work” mean?? Read the issue report to find out!)

I think I was probably spending too much time on Piazza this semester. The problem is that I get this uncontrollable urge to respond to student questions.3 I had the same problem when I was a student, since I was constantly trying to answer Piazza questions that other students had. I am proud to have accumulated a long list of “An instructor […] endorsed this answer” marks.

The advantage of my heavy Piazza scrutiny was that I could somewhat gauge which students should get a slight participation bonus for helping others on Piazza. Officially, participation was 10% of the grade, but in practice, none of us knew exactly what that entailed. Students constantly asked the course staff how their “participation grade” would be computed, and I was never able to get a firm answer from the other course staff members. I hope this is clarified better in future iterations of 182/282A.

Near the end of the grading period, we finally decided that part of participation would consist of slight bonuses to the top few students who were most helpful on Piazza. It took me four hours to scan through Piazza and to send John a list of the students who got the bonus. This was a binary bonus: students could get either nothing or the bonus. Obviously, we didn’t announce this to the students, because we would get endless complaints from those who felt that they were near the cutoff for getting credit.

## Homework Assignments

We had four challenging homework assignments for 182/282A, all of which were bundled with Python and Jupyter notebooks:

• The first two came straight from the 231n class at Stanford — but we actually took their second and third assignments, and skipped their first one. Last I checked, the first assignment for 231n is mostly an introduction to machine learning and taking gradients, the second is largely about convolutional neural networks, and the third is about recurrent neural networks with a pinch of Generative Adversarial Networks (GANs). Since we skipped the first homework assignment from 231n, our course may have been relatively harder, but fortunately for the students, we did not ask them to do the GANs part of Stanford’s assignment.

• The third homework was on NLP and the Transformer architecture (see my blog post here). One of the other GSIs designed this from the ground up, so it was unique for the class. We provided a lot of starter code for the students, and asked them to implement several modules for the Transformer. Given that this was the first iteration of the assignment, we got a lot of Piazza questions about code usage and correctness. I hope this was educational to the students! Doing the homework myself (to stress test it beforehand) was certainly helpful for me.

• The fourth homework was on deep reinforcement learning, and I designed this one. It took a surprisingly long time, even though I borrowed lots of the code from elsewhere. My original plan was actually to get the students to implement Deep Q-Learning from Demonstrations (blog post here) because that’s an algorithm that nicely combines imitation and reinforcement learning, and I have an implementation (actually, two) in a private repository which I could adapt for the assignment. But John encouraged me to keep it simple, so we stuck with the usual “Intro to DeepRL” combination of Vanilla Policy Gradients and Deep Q-learning.

The fourth homework assignment may have been tough on the students since it was due just a few days after the second midterm (sorry!). Hopefully the lectures were helpful for the assignment. Incidentally, I gave one of the lectures for the course, on Deep Q-learning methods. That was fun; it was exciting to see students raise their hands with questions.
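Since the homework covered Vanilla Policy Gradients, here is a minimal sketch of the core idea on a toy two-armed bandit. This is my own illustrative code, not the assignment’s (the actual homework used full MDP environments):

```python
import numpy as np

# Minimal REINFORCE-style sketch on a 2-armed bandit, where arm 1 pays
# more on average. Illustrative only; not the actual homework code.
rng = np.random.default_rng(0)
theta = np.zeros(2)   # logits for a softmax policy over the two arms
lr = 0.1

def softmax(z):
    z = z - z.max()   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    # Rewards: arm 0 ~ N(0, 1), arm 1 ~ N(1, 1).
    r = rng.normal(loc=float(a), scale=1.0)
    # Score function for a softmax policy: grad log pi(a) = onehot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi   # gradient ascent on expected reward

probs = softmax(theta)
print(probs)  # the policy should come to prefer arm 1
```

The key line is the score-function update: scaling the gradient of the log-probability of the sampled action by the sampled reward gives an unbiased estimate of the gradient of expected reward, which is the essence of vanilla policy gradients (usually improved with baselines and reward-to-go in the full assignment setting).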

## Midterms

We had two midterms for 182/282A.4 The midterms consisted of short answer questions. We had to print the midterms and walk a fair distance to some of the exam rooms. I was proctoring one of them with John, and since it was awkward not talking to someone, particularly when that someone is your PhD advisor, I decided to strike up a conversation while we were lugging around the exams: how did you decide to come to Berkeley? Ha ha! I learned some interesting factoids about why John accepted the Berkeley faculty offer.5

Anyway, as is usual at Berkeley, we graded exams using Gradescope. We split the midterms so that each of the GSIs graded 25% of the points allocated to the exam.6 I followed these steps for grading my questions:

• I only grade one question at a time.

• I check to make sure that I understand the question and its possible solutions. Some questions are based on concepts from research papers, so this process sometimes takes a long time.

• I get a group of the student answers on one screen, and scroll through them to get a general feel for what the answers are like. Then I develop rough categories. I use Gradescope’s “grouping” feature to create groups, such as “Correct - said X,Y,Z”, “Half Credit - Missed X”, etc.

• Then I read through the answers and assign them to the pre-created groups.

• At the end, I go through the groups and check for borderline cases. I look at the best and worst answers in each group, and re-assign answers to different categories if necessary.

• Finally, I assign point values for the groups, and grade in batch mode. Fortunately, the entire process is done (mostly) anonymously, and I try not to see the identity of the students for fairness. Unfortunately, some students have distinctive handwriting, so it was not entirely anonymous, but it’s close enough. Grading using Gradescope is FAR better than the alternative of going through physical copies of exams. Bleh!

• Actually, there’s one more step: regrade requests. Gradescope includes a convenient way to manage regrade requests, and we allowed a week for students to submit regrade requests. There were, in fact, a handful of instances when we had to give students more points, due to slight errors with our grading. (This is unavoidable in a class with more than 200 students, and with short-answer questions that have many possible answers.)
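The grouping steps above boil down to two small maps: answers to categories, and categories to points. A toy sketch in Python (the answers, category names, and point values are all made up for illustration; this is not Gradescope’s actual interface):

```python
# Toy model of the "group, then grade in batch" workflow described above.
# All names and point values here are hypothetical.
answers = {
    "alice": "mentioned X, Y, and Z",
    "bob": "mentioned X and Y only",
    "carol": "mentioned X, Y, and Z",
    "dave": "blank",
}

def categorize(answer):
    """Assign each answer to a rough category (the step done by hand on Gradescope)."""
    if "X, Y, and Z" in answer:
        return "Correct - said X, Y, Z"
    if "X" in answer:
        return "Half Credit - missed Z"
    return "No Credit"

# Point values are decided once per category, then applied in batch.
points = {
    "Correct - said X, Y, Z": 4.0,
    "Half Credit - missed Z": 2.0,
    "No Credit": 0.0,
}

grades = {student: points[categorize(ans)] for student, ans in answers.items()}
print(grades)
```

The benefit of this structure is consistency: a point-value decision is made once per category rather than once per student, so equivalent answers automatically receive equal credit.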

We hosted review sessions before each midterm, which were hopefully helpful to the students.

In retrospect, I think we could have done a better job with the clarity of some midterm questions. Some students gave us constructive feedback after the first midterm by identifying which short-answer questions were ambiguous, and I hope we did a better job designing the second midterm.

We received another comment about potentially making the exam multiple choice. I am a strong opponent of this, because short answer questions far more accurately reflect the real world, where people must explain concepts and are not normally given a clean list of choices. Furthermore, multiple choice questions can also be highly ambiguous, and they are sometimes easy to “game” if they are poorly designed (e.g., avoid any answer that says a concept is “always true,” etc.).

Overall, I hope the exams did a good job measuring students’ retention of the material. Yes, there are limits to how well timed exams correlate with actual knowledge, but they are one of the best tools we have given constraints on time, efficiency, and fairness.

## Final Projects

Thankfully, we did not have a final exam for the class. Instead, we had final projects, which were to be done in groups of 2-4 students, though some sneaky students managed to work individually. (“Surely I don’t have to explain why a team of one isn’t a team?” John replied on Piazza to a student who asked if he/she could work on a project alone.) The process of working on the final projects involved two pass/fail “check-ins” with GSIs. At the end of the semester, we had the poster session, and then each team submitted final project reports.

The four GSIs split up the final project grading so that each of us was the primary grader for 25% of the teams, and we then graded a subset of the other GSIs’ teams to recalibrate grades if needed. I enforced an ordering over my teams: a project graded $x$ should be higher quality than any project graded below $x$, lower quality than any project graded above $x$, and roughly equivalent to projects with the same grade. After a final scan of the grades, I was pretty confident in my ordering, and I (like the other GSIs) prepared a set of written comments to send to each team.

One other aspect of the project deserves extra comment: the credit assignment problem. We required the teams to list the contribution percentages of each team member in their report. This is a sensitive topic, and I encourage you to read a thoughtful blog post by Chris Olah on this topic.

We simply cannot assign equal credit if people contribute unequally to projects. It is not ethical to do so, and we have to avoid people free-riding on the work of others. Thus, we re-weighted grades based on project contribution: each student got a team grade and an individual grade. This is the right thing to do, and I am passionate about ensuring that credit is allocated as fairly as is reasonably possible.
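To make the re-weighting concrete, here is one simple way such a scheme could be implemented. This is my own illustrative formula, not necessarily the exact one we used in 182/282A:

```python
def individual_grade(team_grade, contribution, n_members, weight=0.5):
    """Blend a team grade with a contribution-based adjustment.

    contribution: this member's stated share (fractions sum to 1 per team).
    weight: how strongly the individual share influences the final grade.
    Hypothetical formula for illustration only.
    """
    equal_share = 1.0 / n_members
    # Scale relative to an equal split, capped so that over-contributors
    # cannot exceed full credit.
    adjustment = min(contribution / equal_share, 1.0)
    return team_grade * ((1 - weight) + weight * adjustment)

# A 3-person team with a 90/100 project grade and unequal contributions:
for share in (0.5, 1/3, 1/6):
    print(round(individual_grade(90.0, share, 3), 1))
```

Under this toy scheme, anyone contributing at least an equal share keeps the full team grade, while a member who did half of an equal share loses part of it; the `weight` parameter controls how harsh that penalty is.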

## The Course Evaluation

After the course was over, the course staff received reviews from the students. My reviews were split into those from the 182 students and those from the 282A students. I’m not sure why this split is needed, as it only makes it harder for me to aggregate the results. Anyway, here are the number of responses we received:

• 182 students: 145 responses out of 170 total.
• 282 students: 53 responses out of 65 total.

I don’t know if the “total” here reflects students who dropped the course.

Here are the detailed results. I present my overall numerical ratings followed by the open-ended responses. The totals don’t match the numbers above, I think because some students only filled in a subset of the course evaluation.

My first thought was: ouch! The reviews said that the EECS department average was 4.34. Combining the undergrad and graduate ratings meant that I was below average.

Well, I was hoping to at least be remotely in contention for the Outstanding GSI award from the department. Unfortunately, I guess that will not happen. Nonetheless, I will still strive to be as effective a teacher as I can possibly be in the future. I follow the “growth mindset” from Carol Dweck,7 so I must use this opportunity to take some constructive criticism.

From looking at the comments, one student said I was “kinda rude” and another said I was “often condescending and sometimes threatening” (?!?!?). I have a few quick reactions to this.

• First, if I displayed any signs of rudeness, condescension, or threatening behavior (!!) in the course, it was entirely unintentional! I would be terrified if I were a student and a course staff member threatened me, and I would never want to impose that feeling on a student.

• Regarding the criticism of “condescension,” I have tried long and hard to remain humble in that I do not know everything about every concept, and that I should not (unreasonably) criticize others for this. When I was in elementary school, one of my most painful nightmares was when a speech teacher8 called me out for an arrogant comment; I had told her that “skiing on black diamond trails is easy.” That shut me up, and taught me to watch my words in the future. With respect to Deep Learning, I try to make it clear if I do not know enough about a concept to help a student. For example, I did not know much about the Transformer architecture before taking this class, and I had to learn it along with the students. Maybe some of the critical comments above could have been due to vague answers about the Transformer architecture? I don’t use it in my research, unlike the three other GSIs who do, which is why I recommended that students with specific Transformer-related questions ask them.

• One possibility is that those negative comments came from students who posted incomplete issue reports on Piazza and got my prompt response linking to John Canny’s “filing an issue report” page (discussed earlier). Admittedly, I was probably jumping the gun by posting those messages so quickly. Maybe I should not have done that, but the reality is that we simply cannot provide reasonable help to students if they do not post enough information for us to understand and reproduce their errors, and I figured that students would want a response sooner rather than later.

I want to be a better teacher. If there are students who have specific comments about how I could improve my teaching, then I would like to know. I would be particularly interested in hearing from students who gave me low ratings. To be clear, I have absolutely no idea who wrote which comments above; students have the right to provide negative feedback anonymously to avoid potential retaliation, though I would never retaliate.

To any students who are genuinely worried that I will be angry at them if I get non-anonymous negative feedback from them, then in your email or message to me, be sure to paste a screenshot of this blog post which shows that I am asking for this feedback and will not get angry. Unlike Mitch McConnell, I have no interest in being a hypocrite, so I will have to take the feedback to heart. At the very least, a possible message could be structured like the following:

Dear Daniel

I was a student in CS 182/282A this past semester. I think you are a terrible GSI and I rated you 1 out of 5 on the course evaluation (and I would have given a 0 had that been an option). I am emailing to explain why you are an atrocious teacher, and before you get into a hissy fit, here’s a screenshot of your blog showing that we have permission to give this kind of feedback without retaliation:

[insert screenshot and other supporting documentation here]

Anyway, here are the reasons:

Reason 1: [insert constructive feedback here]

Reason 2: [insert constructive feedback here]

I hope you find this helpful!

Sincerely, […]

And there you have it! Without knowing what I can do to be a better teacher, I won’t be able to improve.

On the positive side, at least I got lots of praise for the Piazza comments! That counts for something. I hope students appreciated it, as I enjoy responding and chatting with students.

Obviously, if you graded me 5 stars, then thanks! I am happy to meet with you and chat over tea. I will pay for it.

Finally, without a doubt, the most badass comment above was “I’m inspired by his blog” (I’m removing the swear word here, see the footnote for why).9 Ha ha! To whoever said that, if you have not subscribed, here is the link.

## Closing Thoughts

Whew, now that the summer is well underway, I am catching up on research and other activities now that I am no longer teaching. I hope this reflection gives an interesting perspective of a GSI for a course on a cutting-edge, rapidly changing subject. It is certainly a privilege to have this opportunity!

Throughout the semester, I recorded the approximate hours that I worked each week on this class. I’m pleased to report that my average was roughly 25 hours a week. I do not count breakfast, lunch, or dinner if I ate them in the middle of my work schedule. I do count meetings, and usually time spent on emails. It’s hard to say whether I worked more or less than other GSIs.

Since I no longer have to check Piazza, my addiction to Piazza has been cured. Thus, my main remaining addictions to confront are reading books, blogging, eating salads, and long-distance running. Unfortunately, despite my best efforts, I think I am failing to adequately reduce the incidence of all four of these.

That is a wrap. I am still super impressed by how quickly John Canny is able to pick up different fields. Despite becoming department chair in two days, he continues to test and implement his own version of the Transformer model. He will be teaching CS 182/282A next spring, and I am told that John will try to get a better time than 8:00am; given that he’s the department chair, he must somehow get priority on his course times. Right?

Stay tuned for the next iteration of the course, and happy Deep Learning!

I thank David Chan and Forrest Huang for feedback on earlier drafts of this post.

1. It is confusing, but courses like CS 182/282A, which have both undergraduate and graduate students, should count for the “undergraduate” course. If it doesn’t, then I will have to petition the department.

2. These are the notes that form the basis of the AI prelim at Berkeley. You can read about my experience with that here.

3. This was the reason that almost stopped me from being a GSI for this course in Fall 2016. John was concerned that I would spend too much time on Piazza and not enough on research.

4. I made a cruel joke in one of my office hours by commenting on the possibility of there being a third midterm. It took some substantial effort on my part to convince the students there that I was joking.

5. I’m sure John had his choice of faculty offers, since he won the 1987 ACM Doctoral Dissertation Award, for having the best computer science PhD in the world. From reading the award letter in his dissertation, it says John’s dissertation contained about “two awards’ worth” (!!) of material. And amazingly, I don’t think his PhD thesis includes much about his paper on edge detection, the one for which he is best known, with over 31,000 citations. As in, he could omit his groundbreaking edge detector, and his thesis would still have won the dissertation award. You can find the winners of the ACM Doctoral Dissertation Award here. Incidentally, it seems like the last five Berkeley winners or honorable mentions (Chelsea Finn, Aviad Rubinstein, Peter Bailis, Matei Zaharia, and John Duchi) are all currently at Stanford, with Grey Ballard breaking the trend by going back to his alma mater of Wake Forest.

6. One of my regrets is that I did not know that some other GSIs were not 20-hour GSIs like me, and worked less. Since that was the case, I should have taken more of the duty in grading the exam questions.

7. You can probably guess at least one of the books that’s going to appear on my eventual blog post about the “Books I Read in 2019.” You can find past blog posts about my reading lists here (2016), here (2017), and here (2018).

8. Most students in our Deaf and Hard of Hearing program at my school district took speech lessons throughout elementary and middle school, since it is harder for us to know if we are pronouncing words in the way that most hearing people do. Even today, I don’t think I can pronounce “s” in a fully satisfactory manner.

9. A weird and crazy factoid about me is that — as a conservative estimate — it has been ten years since I last uttered a swear word or used a swear word in writing. This includes “censoring” swear words with asterisks.