# Gradient Descent Converges to Minimizers

Their main contribution is conveniently outlined in a single obvious paragraph (thank you for clear writing!!):

If $f: \mathbb{R}^d \to \mathbb{R}$ is twice continuously differentiable and satisfies the strict saddle property, then gradient descent with a random initialization and sufficiently small constant step size converges to a local minimizer or negative infinity almost surely.

Let’s make it clear what this contribution means:

• We’re dealing with the gradient method, $x_{k+1} = x_k - \alpha\nabla f(x_k)$. It’s nothing too fancy, and the constant step size makes the analysis easier.

• The sufficiently small step size means we want $\alpha < 1/L$, where $L$ is the Lipschitz constant of the gradient. In other words, $\nabla f$ satisfies the well-known inequality $\|\nabla f(x) - \nabla f(y)\|_2 \le L \|x-y\|_2$ for all $x$ and $y$. I have used this inequality a lot in EE 227C.

• The strict saddle property restricts $f$ so that every critical point (i.e., those points $x$ such that $\nabla f(x) = 0$) is either (a) a local minimizer, or (b) has $\lambda_{\min}(\nabla^2 f(x)) < 0$, i.e., the Hessian has at least one strictly negative eigenvalue. It serves to restrict $f$ because other functions could have critical points where all the eigenvalues are zero. Note that since the Hessian is a symmetric matrix, all the eigenvalues are real numbers. In addition, if the eigenvalues of $\nabla^2f(x)$ at a critical point $x$ are all strictly positive, then $x$ is a local minimizer.

• They claim that the gradient method will go to a local minimizer. But where else could it go to? There are two other options: saddle points, and local maxima. Gradient descent, however, cannot go to local maxima because it is by definition a descent procedure, unless (I think) for some reason we’ve initialized $x_0$ as a point that is already a local maximum, so $\nabla f(x_0) = 0$ and we get nowhere. So the only things we worry about are saddle points. Thus, if “saddle points are not a problem” as suggested in the paper, then gradient descent converges to local minimizers, as desired.
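To make the update rule and the step-size condition concrete, here is a minimal Python sketch of the gradient method with a constant step size. The test function $f(x) = \frac{1}{2}(x_1^2 + 4x_2^2)$, the starting point, and the helper name `gradient_descent` are my own assumptions for illustration; none of them come from the paper. The gradient of this $f$ is Lipschitz with $L = 4$, so any $\alpha < 1/4$ qualifies as "sufficiently small."

```python
def gradient_descent(grad, x0, alpha, num_steps):
    """Run x_{k+1} = x_k - alpha * grad(x_k) and return the final iterate."""
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


def grad_f(x):
    # Hypothetical example: f(x) = 0.5 * (x_1^2 + 4 * x_2^2),
    # so grad f(x) = (x_1, 4 * x_2) and the Lipschitz constant is L = 4.
    return [x[0], 4.0 * x[1]]


L = 4.0
alpha = 0.5 / L  # assumption: any constant step size below 1/L would do

x_final = gradient_descent(grad_f, [3.0, -2.0], alpha, 200)
print(x_final)  # both coordinates shrink toward the minimizer at the origin
```

This function has no saddle points at all, so it only illustrates the mechanics of the iteration, not the main claim of the paper.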

It’s worth discussing saddle points in more detail. The paper “Identifying and Attacking…” uses the following diagrams to provide intuition:

Image (a) is a saddle point of a 1-D (i.e., scalar) function. Images (b) and (c) represent saddle points in higher dimensions. They are characterized by the eigenvalues of the Hessian at those critical points. If all eigenvalues are non-zero and some are strictly positive while others are strictly negative, then we get the shape of (b) with a min-max structure. If there exists a zero eigenvalue, then we get (c) with a degenerate Hessian. (Recall that a matrix is invertible if and only if all its eigenvalues are non-zero.) Image (d) is a weird “gutter shape” which also results from at least one zero eigenvalue. I’m not completely sure I buy their explanation – I’d need a little more explanation for why this happens. But I suppose the point is that the authors of “Gradient Descent Converges to Minimizers” don’t want to consider degenerate cases with zero eigenvalues. It must make the analysis easier.

Section 3 of “Gradient Descent Converges to Minimizers” provides two examples for intuition. The first example is $f(x) = \frac{1}{2}x^THx$, where $H={\rm diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ and has no zero components (and hence no zero eigenvalues) but it must have at least one positive and at least one negative component. Otherwise, we wouldn’t have any saddle points! By the way, the only critical point for this function is $x=0$, as $\nabla f(x) = Hx=0$ if and only if $x=0$.

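The gradient update is $x_{k+1} = (I-\alpha H)x_k$. Applying this recursively, we get $x_{k+1} = (I-\alpha H)^{k+1}x_0$. More specifically, since $H$ is diagonal, the iterates take on the following form, where $e_i$ is the $i$-th standard basis vector:

$$x_{k+1} = \sum_{i=1}^n (1-\alpha\lambda_i)^{k+1} \langle e_i, x_0 \rangle e_i$$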

Indeed, an analysis of gradient descent with $\alpha < 1/L$ shows that gradient descent will only converge to $x=0$ if the initial point $x_0$ is in the span of $\{e_1, \ldots, e_k\}$ where $k$ represents the number of strictly positive eigenvalues (so $k < n$). Remember: we don’t actually want to converge to that point, since it is a saddle point! But fortunately, as $k < n$, if we randomly initialize $x_0$ appropriately, the only way our iterates converge to the zero vector is if all components from $k+1$ to $n$ were exactly zero, and the probability of that happening is zero. Great! We don’t converge to the (bad) critical point! We converge to … a better point, I hope. (The paper uses the term “diverge” but I get uneasy reading that.)
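Here is a small sketch of that argument, assuming the simplest possible case $H = {\rm diag}(1, -1)$ (my choice, not the paper's), so that $L = 1$ and $\alpha = 0.5$ satisfies the step-size condition. The function names are hypothetical. It runs the update directly and also evaluates the componentwise closed form $(1-\alpha\lambda_i)^{k}$ times the $i$-th component of $x_0$.

```python
import math


def gd_quadratic(lambdas, x0, alpha, num_steps):
    """Run x_{k+1} = (I - alpha * H) x_k for H = diag(lambdas)."""
    x = list(x0)
    for _ in range(num_steps):
        # The gradient of 0.5 * x^T H x is H x, which is elementwise here.
        x = [xi - alpha * lam * xi for lam, xi in zip(lambdas, x)]
    return x


def closed_form(lambdas, x0, alpha, num_steps):
    """Component i after num_steps updates: (1 - alpha*lambda_i)^num_steps * x0_i."""
    return [(1.0 - alpha * lam) ** num_steps * xi for lam, xi in zip(lambdas, x0)]


lambdas = [1.0, -1.0]  # assumed example: one positive, one negative eigenvalue
alpha = 0.5            # below 1/L, where L = max |lambda_i| = 1
steps = 100

# Start exactly in span{e_1}: the iterates shrink to the saddle point at 0.
on_stable = gd_quadratic(lambdas, [1.0, 0.0], alpha, steps)

# Perturb the second component slightly: that component grows like 1.5^k.
off_stable = gd_quadratic(lambdas, [1.0, 1e-8], alpha, steps)

print(on_stable, off_stable)
print(closed_form(lambdas, [1.0, 1e-8], alpha, steps))
```

Even a perturbation of $10^{-8}$ in the second component is enough to escape the saddle, which is the sense in which the bad starting points have probability zero. For this quadratic the escaping iterates grow without bound, which is presumably why the paper says "diverge."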

The second example is $f(x,y) = \frac{1}{2}x^2 + \frac{1}{4}y^4 - \frac{1}{2}y^2$. Finding the explicit gradient update is straightforward, and is provided in the paper. They also explicitly state the three critical points of $f$. Their argument is similar to the previous example in that they can reduce the cases of converging to an undesirable saddle point to a case which would require initializing a certain component of the starting 2-D point $(x_0,y_0)$ to zero, which cannot happen with random initialization (well, the technical way to say that is “zero measure” …).
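To convince myself, here is a rough numerical sketch of this example. The gradient is $\nabla f(x,y) = (x, y^3 - y)$, so the critical points are $(0,0)$, $(0,1)$, and $(0,-1)$, and the Hessian is ${\rm diag}(1, 3y^2-1)$. The step size $\alpha = 0.1$, the initialization box $[-1,1]^2$, the random seed, and all function names are my own assumptions, not taken from the paper.

```python
import random


def grad_f(x, y):
    # f(x, y) = x^2/2 + y^4/4 - y^2/2
    return x, y ** 3 - y


def hessian_eigenvalues(x, y):
    # The Hessian is diag(1, 3y^2 - 1), so its eigenvalues are the diagonal entries.
    return (1.0, 3.0 * y ** 2 - 1.0)


def classify(x, y):
    """Label a critical point by the smallest Hessian eigenvalue."""
    smallest = min(hessian_eigenvalues(x, y))
    if smallest > 0:
        return "local minimizer"
    if smallest < 0:
        return "strict saddle"
    return "degenerate"


def run_gd(x0, y0, alpha=0.1, num_steps=2000):
    """Constant-step gradient descent on f, returning the final iterate."""
    x, y = x0, y0
    for _ in range(num_steps):
        gx, gy = grad_f(x, y)
        x, y = x - alpha * gx, y - alpha * gy
    return x, y


for point in [(0.0, 0.0), (0.0, 1.0), (0.0, -1.0)]:
    print(point, classify(*point))

# Random starts (assumed uniform on [-1, 1]^2) end up at one of the two minimizers.
rng = random.Random(0)
finals = [run_gd(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
print(finals)

# Starting with y_0 exactly 0 keeps y at 0 forever, so we land on the saddle.
saddle_run = run_gd(0.5, 0.0)
print(saddle_run)
```

The last run is the zero-measure case: only starting points with $y_0 = 0$ exactly get stuck at the saddle $(0,0)$.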

I still have a few burning questions on these (plus some of the other stuff mentioned in Section 3) but I’ll hold off on writing about those until I have time to get to the meat of this paper, which comes after Section 3. In the meantime, it will be interesting to see what kind of work gets built off of this one.

# Requesting Accommodations Takes Time

One of the things that I’ve been a little frustrated about lately is the time it takes to arrange and obtain academic accommodations, such as sign language interpreting or captioning services. I can’t just show up to a lecture or an event and expect a sign language interpreter to be there. I have to explicitly request the service, and there are many reasons why this process might get delayed.

• Before agencies or institutions provide the service, I have to prove that I need the service. This means, at minimum, I need to provide them my audiogram, and they might need some additional background information about my education. Sometimes an interview is required; I had a remote interview with a Berkeley DSP employee before I had arrived for my first classes.

• After the initial registration hurdle, I can start formally requesting accommodations. To schedule an accommodation for a campus-related event, I have to fill out an online form with information about the time, the location, and other stuff. Berkeley’s gotten better with the forms, as they’ve implemented extra features that help to counter my earlier criticism. On the other hand, there can still be a noticeable delay between when I submit the form and when I get responses, and I have to keep reminding myself that weekends and vacations do not count as “real days” when counting how many days in advance to submit a request.

• In some cases, it can be extremely annoying to schedule accommodations for one-time events. If it is the first time that I am participating in an event, then I usually don’t have much information on the setting or environment, and it is not always clear if there will be one speaker (which is easier for an interpreter) or a debate with people shouting simultaneously. In addition, I often need to have a detailed schedule of the event, and it’s common to have people wait until the last minute to finalize schedules. I’ve had to send lots of emails to remind others that I need a detailed schedule ASAP, and people hate to see “ASAP.”

• Finally, it’s not clear how much accommodations can help in practice. I’m not counting cases when there’s some kind of mistake in the scheduling (they do happen sometimes, as in my prelims). I’m considering cases when they work normally, but they simply do not produce any benefit. For instance, when I took CS 288, I had captioning services. In general, they worked as intended (well, not always) but it was extremely hard for me to follow and understand concepts based on real-time, imperfect captions.

I should note, of course, that I’m not the only one who has mentioned this. In fact, I was actually inspired to write this short piece after reading a longer essay by Teresa Blankmeyer Burke, an Associate Professor of Philosophy at Gallaudet University who is deaf. Her blog post covers some of the themes regarding the time it takes to schedule accommodations. I think her experience is similar to mine: lots and lots of emails to write and forms to fill.

Nonetheless, despite the annoyance of scheduling accommodations, it is important for me to look at the big picture. First, I usually get the accommodations, which is something that not every deaf person in the world can say. In addition, even when accommodations do not work that well, I know that the people providing them are trying their best to help me, and I appreciate that.

# Looking Back: Some of my Lifelong Regrets

I often think about some of my lifelong regrets, probably because I’m in a stressful period of my life.

What could I have done differently? Would I be a much better person today if I had done this instead of that? Why didn’t I think about doing such obvious acts earlier?

Hopefully if I list them here, I can look back at this blog post periodically and ask myself if I’m making progress towards mitigating my constant guilt over these regrets.

Here are ten of my major lifelong regrets:

• (1) I did not do enough math, statistics, and computer programming, both during college and (especially) before college. To be clear, I have been a good student my entire life, getting mostly top grades in the hardest courses available to me. But gradually, it became clear to me that I was just an “average good student”, and at Berkeley, there are a lot of “better than good students” who boast ridiculously long lists of math/programming accomplishments, and long lists of graduate-level courses taken.

I am constantly thinking about how I have to study a certain concept many times or take an extra class because I need to “catch up” to far more experienced students (in my year). Looking back, I wish I had taken all of my high school classes two years earlier than when I actually took them, which would have given me a bigger head-start in college. And programming? Upon graduating from high school, I couldn’t make a simple “Hello World” program, whereas other Berkeley students (and this is especially common among international students) were busy winning programming competitions in high school.

In Outliers (more on that later) Malcolm Gladwell describes how hockey players who were born near January 1, and thus had the size edge during youth leagues, are more likely to reach the highest level of the sport than other guys born at different times of the year. This is because being good early leads to snowballing advantages. This “snowballing” is what I wish could be an advantage for me, not a disadvantage. It’s also partly why I don’t think that just taking courses makes it easy to catch up, as (to take an example) professors would rather work with students who have already taken graduate courses in their research area over “riskier” students who have to take those courses and who may not like them or may not do well in them. It’s really hard to catch up.

• (2) I often did not make it clear to others that I was deaf, in part because I was embarrassed by it. I discussed my uneasiness in telling people that I was deaf in a blog post a few months ago, but here, my focus is on my pre-college life. Starting from middle school, which is when I first became conscious of my dreadful social hierarchy position, I constantly tried to hide my deafness by not signing in public and by focusing on my teachers instead of my sign language interpreters during classes. In high school, I expressed little enthusiasm in discussing “deafness” with anyone. In my senior year, it was awkward for me to write my college essays, since my parents were adamant that I should write about being deaf. (My difficulties in expressing my thoughts probably explain why I didn’t get into many colleges: lackluster essays plus lack of impressive extracurricular activities.) Fortunately, by the time I got to college, I had learned to watch the interpreters more often, but I still don’t generally tell people I’m deaf when we meet for the first time. It’s still a little awkward.

• (3) I spent too much of my life emphasizing sports, either playing sports or following sports-related news. I have spent many hours doing organized soccer, baseball, basketball, skiing, and to a lesser extent, ultimate frisbee and track & field. In addition, during down-time, I would often read popular sports websites such as ESPN and NBA.com. But I think I could have put that time to better use, because sports haven’t exactly been the greatest thing for me. Some people join sports to get to know other people, but I don’t think I made a single friend out of being on a sports team. In addition, sports were often a source of stress in my life. I was usually not among the top players on my sports teams, and I constantly worried about screwing up and embarrassing myself. Finally, and probably most importantly, I’m not sure I genuinely enjoyed sports. When my high school soccer teams won important games (or scored game-winning goals), I was one of the least enthusiastic players on the team during the celebrations. While other players might hoot and holler and pile up upon the player who scored a winning goal, I would quietly do a few token jumps.

• (4) On a related regret, I did not do enough to improve my physical fitness. This is not the same as playing a sport; it’s about the work of weight lifting to get stronger and running to improve stamina. Speaking as someone who’s played a lot of sports, I can definitely vouch for the importance of physical fitness and conditioning. Consider this: if someone doesn’t have the foot skills to handle a soccer ball well, but has incredible speed and strength, that player could be a solid defender on a good soccer team.

I have this regret mainly because I was never among the most athletic players on my high school teams. (I know that a lot of this is genetics, but genetics doesn’t explain everything.) When I was on the high school soccer team, for instance, I was regularly among the slowest long-distance runners when we ran laps and probably the slowest sprinter on the team. And, while I had tried going to the weight room often, I was unable to really notice any strength difference. That changed once I had read Starting Strength and Stronglifts in college and got to see noticeable gains in my weight lifting and overall strength, but that raises the question: why didn’t I know about those resources before college? Fortunately, I’ve gotten a little better at working on my strength, but I’ve also been lagging behind on my running.

• (5) I did not have a good diet until I was around 21. The biggest reason why I consider my diet to have been so bad is that I ate a lot of refined carbohydrates: lots of pizza, plain bagels, white rice, and (sometimes) white pasta and white cereals. Furthermore, even if I had always gone whole wheat for these, having a diet that is 90 percent based on whole wheat does not count as a good diet. I used to eat from Subway a lot, which has heavily processed meat. I also would drink a lot of diet soda, which is almost as bad as regular, sugary soda. What I should have done was emphasize lots of fruits and vegetables, lots of (properly-prepared) meat, and lots of eggs. Of course, all of these have to be cooked and prepared properly, especially in the case of meat. For this regret, I’m happy to say that I’ve made a lot of progress in overcoming my guilt. When I was 21, I forced myself to overhaul my diet, and it’s now far more rich and nutritious than it was a few years ago.

• (6) I spent too much time playing video games and computer games. I’ve played a variety of games in my lifetime: sports games, real time strategy games, turn-based games, shooter games, building/tycoon games, and others. The two that I have probably played the most are Age of Empires II and Civilization IV. In middle school and high school, I spent far more time playing them than is healthy, sometimes spending ten hours a day when I didn’t have school. I guess one reason why I liked these games so much was that they were strategy games designed to test my mind, that they were related to designing and building empires, and that they were just a whole lot of fun. In addition, these games do not require me to understand any dialogue that happens in them. There are lots of in-game sounds, but that’s what they are: sounds, not words, which are harder for me to discriminate. Fortunately, while I still play some of these games once every few weeks, I no longer have the immediate urge to play a game whenever I have free time. I think I grew out of those during my college years.

Fortunately, I now have gotten a lot better at reading more books. I have read fifteen books this year (so far) and plan to write up a summary of each book I’ve read in a giant blog post at the end of this year. Most of the books I read are well-regarded non-fiction books that relate to real-world subjects of interest: foreign policy, history, technology, psychology, and other areas. But I still feel like I am reading all these books partly to make up for lost time.

• (8) I spent too much time browsing random websites and message boards. In part, this was due to my obsession with playing games. For instance, I have almost 6,000 posts on the Civilization Fanatics Forum and was known as one of the top single-player Civilization IV players on the forum. (Yeeeaah … I was really obsessed with that game!) I also posted on other message boards in addition to game-related ones. Sadly, College Confidential was one of them [1]. In part because I don’t play games that much anymore, I have been a lot better at avoiding message boards. In addition, because I have so much on my plate now in terms of research and coursework, I spend far less time aimlessly browsing the Internet.

Nowadays, there are only a handful of websites I check on a regular basis, and if they are blogs or news-related, I try not to check them until the evening. I deliberately have only a few websites bookmarked on Google Chrome, and I don’t spend as much time reading other people’s blogs as I used to. Oh, and what about Facebook? Don’t worry – Facebook was actually one of the earliest sites that I was able to resist checking.

• (9) In college, I was not aggressive enough in reaching out to other students to work on homework together. I think part of the reason for this is that, for some time, I actually wanted to do homework by myself. To be clear, I was not ignoring requests to work together; I was simply not active in reaching out to other students. I thought that if I worked on my own, I would avoid distractions and learn faster. That worked for a few courses, but as the material became more advanced, I needed to talk to more students, and it was hard for me because I lacked a social base. I relied almost entirely on TAs and professors for assistance with coursework. Fortunately, I’ve now completely changed my stubborn “work alone on homework” strategy and have found other students to work with during classes in recent semesters. As a bonus, my homeworks have generally improved.

• (10) This is the most recent regret I have, focusing on my experience during the past three years. For some reason, I’ve (hopefully temporarily) lost the capability to ignore my isolation. I have let it adversely affect my mood and productivity and I worry about how others view me. It’s true that being able to do better in my courses and, especially, getting some research papers would help me combat my constant obsession about isolation, but at the moment I need to figure out how to ignore these thoughts. I think part of it has to do with growing up and getting older; I have higher expectations for myself, both socially and academically, and I want to aim high.

Hopefully in five more years I can look back at some of the progress I’ve made. As covered earlier, I have made some progress on overcoming some of the constant guilt I feel about myself. I just want to be a better person and not feel like I am constantly in “catch-up” mode with regard to my life.

1. College Confidential is one of the most depressing places on the Internet. Please don’t go there.

# My Mood -- Not Skill -- is the Limiting Factor

When I’m doing some homework, research, or trying to learn a concept, I want skill to be my limiting factor. What do I mean by this? I mean that if I’m having difficulty doing some work, the reason should be straightforward: the work is challenging, and I need more skill (or more accurately, that plus lots of focus and effort) to be able to accomplish my objectives.

Unfortunately, in recent times it has not been skill that is my limiting factor, but how well I feel.

I’m writing this post after a week when I was stressed over feeling isolated, both in terms of research and in terms of social settings. The former is because I haven’t made much research progress, and I feel like I’m cut off from the research community. The latter is, well, kind of obvious.

When I tried to do work this past week, the limiting factor was how well I could bring myself to focus and keep my thoughts about isolation at bay.

Almost all of my negative experiences, almost all of my sources of stress and depression, can be traced back to some sort of isolation.

This explains my frustration, which is another suitable word to describe my Berkeley experience so far. I feel like I’m capable of so much more, except I keep getting distracted. The result is that I feel like I’m in a deep hole and I can’t climb out.

I can’t believe that most people want to feel this way. As far as I understand, humans are social creatures, and people want to feel like they belong. I view myself as a social person, even if I sometimes cannot convey that message clearly enough. I think about social settings all the time; it’s a common theme that appears when my mind wanders. Unfortunately, reality usually strikes a few moments later, in the sense that there are many people who I want to talk with, but I don’t feel like I can talk with them. I might think they’re out of my league socially, or that communication would be difficult for some reason (e.g., accents).

Fortunately, whenever I do get an extensive conversation with someone, it’s enough to keep me refreshed and deplete my internal “isolation meter” for a few days.

The bar for me being happy is set really low.

In the meantime, I have to learn how to stay positive, and I’ll continue searching for people I can work with, hoping that … at … some … point … I can find that true collaborator to answer my dreams.

# The Four Classes That I Have Self-Studied

While I have access to many advanced and high-quality classes at Berkeley, sometimes I need or want to review foundational topics to make sure I really get the material. I skipped (or did not do well in) some early prerequisites for upper-level computer science and math courses, so I constantly feel like I have to make up for that in my own time.

In this post, I describe four classes that I have self-studied to a substantial extent. Two are from MIT’s Open Course Ware (OCW), and two are from UC Berkeley. All four of my self-studying pursuits were enormously beneficial to me.

Here are the courses listed according to the time I self-studied the material (“My Time”):

18.440, Probability and Random Variables

• Institution: Massachusetts Institute of Technology
• Professor: Scott Sheffield
• Course Offering: Spring 2011
• My Time: Summer 2012

To prepare for my probability class at Williams, I decided to go through MIT’s version of the same class. (In fact, I even made a blog post about this.) Unfortunately, the version on OCW doesn’t offer any lecture videos, so I instead went through all the lecture slides and made sure to understand basically everything covered there. I also took both practice midterms and the practice final, which were surprisingly easy.

To increase my understanding as I was studying the OCW materials, I also read a draft of The Probability Lifesaver, written by Steven Miller (a professor at Williams College).

CS 61B, Data Structures

• Institution: University of California, Berkeley
• Professor: Jonathan Shewchuk
• Course Offering: Spring 2014
• My Time: Summer 2014

I went through all of the class lecture notes from the course website (accessible through Shewchuk’s homepage) and made sure to study them well. I went through all of the homeworks and labs, and made some progress on all three of the major projects. I didn’t complete them – I just got to a stage where I knew I had made a lot of progress and felt that I understood the purpose of the project. One thing I wish I had time for was to do more practice exams, especially those from Paul Hilfinger’s versions of CS 61B (a.k.a. the harder versions).

This class was super helpful for me because I never had a strong data structures education, so I reviewed a lot of concepts that I had implicitly assumed were true but didn’t know why. Reviewing CS 61B made it possible for me to do the tough Java programming assignments in CS 288.

18.06, Linear Algebra

• Institution: Massachusetts Institute of Technology
• Professor: Gilbert Strang
• Course Offering: Spring 2010
• My Time: December 2014 and January 2015

This is a fairly popular MIT OCW course, in part because of the reasonably high quality video lectures. Gilbert Strang lectures at a relatively slow pace, which is fine with me because I prefer slow-paced lectures (and tough assignments). I went through all of the video lectures and made sure to understand them as much as I could, which alone was enough for me to feel like I learned a lot. I briefly read some other related class handouts, but most of the time, my supplemental learning resource was … Wikipedia.

In the future, I should do some of the practice exams.

CS 188: Introduction to Artificial Intelligence

• Institution: University of California, Berkeley
• Professors: Pieter Abbeel and Daniel Klein
• Course Offering: Spring 2012, 2013, and 2014 (varies)
• My Time: Summer 2015

This is a popular undergraduate CS course, both within CS and outside of CS for students who want to “try out computer science.” Fortunately, the class has a lot of material online. I went through all 24 video lectures, but I used different years depending on which YouTube videos had auto-captions and/or which ones had louder sound. Each lecture came with detailed (and humorous) slides, so I read all of those as well.

In addition, I think I took about 15 practice exams (yes, that is not a typo) and rigorously checked my answers with the solutions. It was probably overkill for me, but I really wanted to know this material well. To my disappointment, I noticed that certain questions were recycled (in some form) year after year. Thus, I can’t wait to be a GSI for this class later so I can ramp up the difficulty of the exams by not using those kinds of questions.

The Future

I plan to continue my self-studying pursuits during the upcoming summer. Here are the courses on my “self-study radar”:

• At UC Berkeley, CS 61C, Machine Structures. This is in progress … but barely. I’ve personally resolved that I won’t do any other self-studying of a computer science course until I finish this one. Knowing this material down cold is simply too important for me. After this, I can branch off to other, more advanced areas such as self-studying operating systems.

• At MIT, 8.01, Introduction to Physics. I’ve never taken a physics course before and I have not started doing this. The downside is that it might be tough to find practice material since MIT had to pull down the material due to Walter Lewin’s inappropriate actions. The videos are online, but I may have to do some searching for the assignments and exams.

I don’t have anything on my radar for math and statistics, in part because probability and linear algebra are so ridiculously important, that if I really wanted to do any self-studying, it would be better for me to actually re-re-study those two courses! In fact, I should probably be doing that this summer anyway.

# Are Deaf People Immune to Certain Things?

Yesterday, I finished reading Atul Gawande’s fascinating 2002 book Complications: A Surgeon’s Notes on an Imperfect Science. This is the first of his four books – all of them bestsellers – and I read his other three books earlier this year. I’m staying on track to read far more books than I had planned to as part of my New Year’s Resolution, so that’s nice. Also, unlike in December 2015, when I only discussed the top three books I read, in December 2016, I plan to cover all of the books I’ve read that year. I’ll do it in one blog post, with one paragraph for each book, plus some additional commentary. Complications is already the fourteenth book I’ve read in 2016, so that blog post will be super-long. (It’s currently in a draft state behind the scenes so that I don’t have to write it all at once.) Stay tuned until December 31, 2016, everyone!

But anyway, I wanted to comment on a particularly interesting portion of the book. In a chapter describing a pregnant woman’s intense nausea which mystified her doctors (hence the title of the book), the following text came up:

In 1882, the Harvard psychologist William James observed that certain deaf people were immune to seasickness, and since then a great deal of attention has been focused on the role of the vestibular system—the inner ear components that enable us to track our position in space. Scientists came to believe that vigorous motion overstimulates this system, producing signals in the brain that trigger nausea and vomiting.

My first reaction was: hey, this is pretty cool! And this was done in 1882? Really?

Just to give my personal experience: while I’ve occasionally had mild cases of nausea, I can’t recall ever feeling seasickness [1], or any kind of motion sickness. I’m a huge fan of roller coasters, for instance, and I can ride them often without feeling sick. To make the point clear: when I was in eighth grade, I rode the Boomerang Roller Coaster at The Great Escape twenty-five times – in one day. (Ahh … those memories of empty lines and being able to quickly exit the coaster and race to get back on.)

It’s pretty cool to think about being immune to something. My mind wandered to thoughts about whether deaf people might be immune to things such as deadly diseases. I remembered that passage at the start of The Death Cure when the Rat-Man told Thomas [2]: “The Flare virus lives in every part of your body, yet it has no effect on you, nor will it ever. You’re a member of an extremely rare group of people. You’re immune to the Flare.”

I thought about some of these things as I read through the rest of Complications, so after I finished the book, I decided to briefly investigate further. Here is what I found.

• First, it’s clear that it’s not deafness that causes the so-called “immunity” but, as Gawande points out, the condition of the vestibular system. I’m not sure as to whether a weakened vestibular system is the cause for my deafness. I was deaf since birth and as far as I know, there’s no explanation for it besides the randomness of genetics.

• One of the most commonly cited sources for this fact (or “myth” as some might call it) is an old 1968 paper called Symptomatology under storm Conditions in the North Atlantic in Control Subjects and in Persons with Bilateral Labyrinthine Defects. Yeah, the title is pretty bad but the paper showed that in an experiment, a few deaf people did not experience seasickness.

• The original source, William James’ 1882 “study” is called “the sense of dizziness in deaf-mutes”, but I can’t figure out a way to access it; it’s trapped behind several websites that restrict access (ugh). I can’t even use my Berkeley credentials. All of my knowledge about that paper therefore comes from third-party sources.

• Almost all other sources about this topic are from really random and ancient research or (worse) newspaper articles. Here’s one example from a 1986 article on the SunSentinel, and it’s pretty lame. Also, I tend not to trust articles that appear on ad-heavy websites.

So yeah, there isn’t that much focus on deafness as far as the vestibular system is concerned. Shucks.

As I was reading through these ancient sources (well, the ones I could access), I also wondered about the evolutionary benefit of being deaf. I can’t think of any, unless deafness somehow came with another benefit to counteract its negative effects. I hope it’s some secret immunity.

I mean, think about how bad life would have been during the years humans have lived. If I had been born in 1800, for instance, I wouldn’t have had access to the high-quality hearing aids I’m wearing as I type this blog post. In fact, the best kind of “hearing aid” I could have used would be those terrifying (and ineffective) ear trumpets. Ugh.

Going further, consider the prototypical “cavemen”. For them, having good hearing would be more important than it is for humans today; there was no sort of disability law and little to no visible communication mediums (e.g. writing) to compensate.

This line of reasoning could also extend to other disabilities. Why do they keep appearing in our population? A quick Google search of “evolutionary benefit of disabilities” resulted in one random, small news article after another, which is hardly convincing evidence. Another, non-disability-related example might be homosexuality; indeed, that was one of the choices of text that Google suggested for me when I was typing “evolutionary benefit of”. It seems to be fairly accepted that homosexuality is not a choice, but then this raises the question: what is its evolutionary benefit? And what about, er, this kind of stuff? All right, that’s enough of this thinking for today.

1. Admittedly, I don’t spend a lot of my time on boats.

2. The Death Cure is the third book in The Maze Runner series, and the Flare is a deadly disease that infected and killed most of the world’s population.

# Why I (Reluctantly) Don't Show up to Class

At the end of the first CS 267 (Applications of Parallel Computing) lecture, I was looking forward to the rest of the class.

Well, after three more lectures, I’m probably done attending them for the semester.

No, don’t worry, I’m still taking the class [1], but I negotiated an unusual accommodation with Berkeley’s Disabled Students’ Program (DSP). All CS 267 lectures are recorded and available on YouTube to accommodate the large number of students who take it as Berkeley students and as non-Berkeley students. The course is also offered almost every year, so students can watch lectures and study the slides from previous iterations of the class.

So what did I suggest to DSP? I told them that it was probably best for me not to attend classes, but to watch the lectures on YouTube, so long as DSP could caption those videos.

Why did I do this? Because CS 267 has three factors that are essentially the death-knell for my sign language interpreting accommodations:

• The material is highly technical.

• The lecturer (Professor Jim Demmel) goes through the material quickly.

• I am not familiar with the foundational topics of this course.

The last one was the real deal-breaker for me. Even in classes that completely stressed me out due to the pace of the lectures and lack of suitable accommodations (CS 288 anyone?), I still had the foundational math and machine learning background to help me get through the readings.

But for a class about lower-level computing details? I have to check Wikipedia and Stack Overflow for even the most basic topics, and I could not understand what was being discussed in lecture.

Thus, I will watch lectures on YouTube, with captions. Unfortunately, DSP said that they required at least a 72-hour turnaround time to get the captions ready [2], and I’m also not sure who will make them. I think it would be hard for the typical captioner to caption this material. I suggested that using YouTube’s auto-captions could be a useful starting point to build a transcript, but I don’t know how feasible it is to do this.

I suppose I could fight and demand a shorter turnaround time, but honestly, YouTube’s auto-captions are remarkably helpful with these videos, since I can usually fill in for the caption’s mistakes. Also, the limiting factor in my progress in this course isn’t my understanding of the lecture material – it’s my C programming ability. Finally, I have other issues to worry about, and I’d rather not get into tense negotiations with DSP. For instance, I still regularly feel resentment at the EECS department for what I perceive as their failure to help me get acclimated into a research group. I am only now starting to do research where I am not the lead and can work with more experienced researchers, but it took so long to get this and I’m still wondering about how anyone manages to get research done. My research — and overall mood — has been a little better this semester, but not that much, compared to last semester. I don’t want the same feelings to be present for my opinion of DSP.

Hopefully this new accommodation system for CS 267 will go well.

1. I have never dropped a class before.

2. I have not received the captions for any of the lectures so far.

# The First Day of Class is the Most Awkward

I have now completed the first class sessions for all the courses I’m taking this semester.

And I’m relieved.

I’ve always found the first session to be the most awkward of all class sessions. The reason is that, due to my class accommodations, there are typically two sign language interpreters (or sometimes in the past, captioners/CART-providers) who show up to the class with me. They sit near the lecturer, so they’re impossible to miss.

Consequently, when it’s the first day of class, I sometimes get paranoid and wonder if students are constantly thinking about the extra people in the room. Or, worse, what if they’re repeatedly looking at me? After all, other students might be curious about who on earth might actually need such accommodations. When I think about this, my face feels a bit hotter and I sometimes wish I could hide and blend in like a “normal” student, for once.

That’s not to say I never want people to think about me. For instance, if I knew students were thinking something similar to: wow, that guy over there who needs sign language accommodations must be reasonably good at this material or possess the ability to work extremely hard, given his inherent disadvantages, well then perhaps I shouldn’t feel so awkward.

Of course, the point is that I don’t know what other students think of me, so I default to a more pessimistic view.

The worst part about these first sessions is when the interpreting integration does not go seamlessly. When this happens, it’s usually because someone arrived late to class. One of the most awkward first class sessions for me occurred back in my sophomore year of college. I was taking intermediate microeconomics with about 50 students in it. The school administration gave my interpreters the wrong room number, and I had failed to notify them after only recently finding out myself.

This meant that the interpreters showed up five minutes late to the first class, after everyone got seated. They caused a brief interruption, with one interpreter telling me what happened, and the other one introducing themselves to the professor.

Yes, that was pretty awkward. My face was a little red and I kept my eyes firmly focused on the board, hoping that the other 50 students wouldn’t look at me for more than a few seconds.

Don’t misunderstand what I’m saying – there are times when I really like the attention. For instance, as I’ve stated a few times in this blog, I enjoy giving talks (e.g., project presentations), so I like the attention in those cases.

I just don’t like being highly visible when it’s the first day and a bunch of students who don’t know me have to suddenly get used to the interpreting services in the class.

In addition to bearing the initial awkwardness over the accommodations, I have a few other first-day concerns. One is that I know I have to arrive early to classes to make sure I can get a seat in the front row of the class, preferably at one of its “ends” since that results in the optimal positioning for me and the sign language interpreters (and probably for the other students; I don’t want to know how annoyed they’d be if the interpreters sat in the center of the room).

Due to the enrollment surge in graduate-level EECS courses, if I don’t manage to quickly secure one of those coveted front-row seats, then I probably have to sit or stand near the front corner. For me, it’s better to stand near the front than sit in the back, but fortunately I’ve never had to weigh that tradeoff. In all my classes this semester – in fact, in every class I’ve had in recent memory – I’ve always been able to secure a front-row seat, but it’s still a concern for me.

Fortunately, with the first class sessions behind me, things should improve. From past experience, after about four weeks, everyone seems to get used to the interpreters, and a few wonderful students and professors start socializing with them (and me!).

Furthermore, after the first class, it becomes clearer to me and the interpreters how to best position ourselves for maximum benefit. I’ve had to suggest changing our seats a few times.

All right, I guess what I really want to say is that I’m looking forward to my next few classes.

# IPython, Jupyter Notebooks, and matplotlib

Two and a half years ago, I wrote a post about programming in Python. One of my tips was to use the Python shell, so that one can quickly test simple commands before integrating them in a more complicated project.

Fast forward until now, and my Python habits have changed substantially. One notable change I have made is to use IPython instead of the Python shell. For my usage purposes, the IPython shell has been a strictly superior version of the standard one due to the following:

• It includes TAB completion for functions. For instance, suppose I’m importing the numpy library, and I want to create an array variable, which means I need the array function. I start the IPython shell (by typing ipython on the command line), import the numpy library, and when I press the TAB key after a = np.arr, I get the output:

IPython is smart enough to tell me which methods I might be interested in using! It’s a really nice feature, and I’ve found that it also works when one tries to autocomplete function parameters. In the standard Python shell, typing TAB just means … creating extra TABs.

• It makes it easier to fix for loops, which is handy because it’s really easy to make a mistake with loops. Consider the trivial example below:

In IPython, to fix the loop, I just need to press the UP key and it will load both lines of the for loop. In the standard shell, the UP key would only return print "hi", forcing the user to essentially retype the loop.

• It remembers commands from previous sessions, so I can exit an IPython session, do other stuff, then restart IPython, press the UP key, and it will give me the commands I used in my last session.

These three are the extra IPython features that have been most useful for my work.
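To make the TAB-completion idea a bit more concrete, here is a rough sketch of the kind of prefix lookup involved. This is not how IPython actually implements completion; the helper name `candidates` is hypothetical, and the standard-library `json` module stands in for numpy so that the snippet runs without any extra installs.

```python
import importlib


def candidates(module_name, prefix):
    """Return the attribute names of a module that start with `prefix`.

    Hypothetical helper for illustration only; IPython's real completer
    is far more sophisticated (it also completes parameters, paths, etc.).
    """
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if name.startswith(prefix))


# Stand-in for typing `json.du` and pressing TAB.
print(candidates("json", "du"))  # prints ['dump', 'dumps']
```

With numpy installed, calling `candidates("numpy", "arr")` would return a list that includes `array`, which is the kind of menu the shell narrows down to after typing np.arr and pressing TAB.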

I frequently use Python for work because it is a simple language that has lots of robust math, machine learning, and data analysis libraries. My favorite Python library is matplotlib, which is used for forming high-quality plots.

A few months ago, my workflow for using matplotlib was to write a script that first gets the data into a matplotlib plot, and then saves it (using the savefig(...) function). When I need to make lots of figures, however, it gets cumbersome to manage them, and I often have to keep multiple images open so I can spot-check their changes when I re-run my script (e.g., if I modified the font size of the text).

Fortunately, I discovered Jupyter Notebooks. These are browser-based platforms that make managing matplotlib-based images far easier by keeping information unified in one screen.

To start a notebook session, I type in ipython notebook on the command line, which opens up a web browser (for me, it’s Firefox). I then click New -> Python 2 to start the session. For a basic plot, I can start by importing the library: import matplotlib.pyplot as plt, but then — crucially — I use the %matplotlib inline command. The reason for using that is so that when I write code to plot, and then execute it with a simple SHIFT-ENTER, the image will appear directly under that code cell. Here’s a simple example:
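The original cell is not reproduced here, so what follows is only a minimal sketch of what such a first cell might contain. The data is made up for illustration, and the `%matplotlib inline` magic appears as a comment because it is IPython syntax rather than plain Python.

```python
# In a notebook, run the magic `%matplotlib inline` in a cell first so
# that figures render directly below the code that creates them.
import matplotlib.pyplot as plt

xs = list(range(10))           # made-up x values
ys = [x ** 2 for x in xs]      # made-up y values: a simple quadratic

fig, ax = plt.subplots()
line, = ax.plot(xs, ys)        # one curve on one set of axes
```

Executing a cell like this with SHIFT-ENTER should place the figure right underneath it.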

This is nice, but what if I want to change some plot setting? If these images are going to be in an academic paper, they better have labels and legends, among other things. With these notebooks, one can modify the text in a cell and regenerate the image; here’s an example with some common commands I use for my plots:
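Again as a sketch rather than the original cell, here is one way a plot might be dressed up with a title, axis labels, and a legend. The curves, label text, and font sizes are all arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

xs = list(range(10))  # made-up data

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(xs, [x ** 2 for x in xs], label="quadratic")
ax.plot(xs, [x ** 3 for x in xs], label="cubic")

# Settings that tend to get tweaked and re-run many times.
ax.set_title("Made-up curves", fontsize=20)
ax.set_xlabel("x", fontsize=16)
ax.set_ylabel("f(x)", fontsize=16)
ax.set_xlim(0, 9)
ax.legend(loc="upper left")
fig.tight_layout()
```

Changing, say, a `fontsize` value and re-running the cell regenerates the image in place, which is the edit-and-check loop that felt cumbersome with saved image files.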

This example doesn’t quite show the benefit enough, but once projects get more complicated, notebooks are a valuable tool to keep data organized. Moreover, one can save a notebook session so that the next time it gets opened again, its plots remain visible on the webpage.

For those of you who use Python, I encourage you to check out IPython and Jupyter. They add on to what is already an awesome general-purpose programming language.

# The Enrollment Surge in Graduate Courses

This link on Berkeley “By the Numbers” states that 73 percent of undergraduate classes have fewer than 30 students.

That statistic is (painfully) amusing for me to think about, because I’ve only taken graduate courses here, and none has had fewer than 30 students. In my class reviews, I frequently discuss enrollment, so let’s recap:

• CS 280, Computer Vision was overenrolled and had people sitting on the floor in Soda 306. The course staff had to force undergraduates to drop the course.

• CS 281A, Statistical Learning Theory had one of the largest (if not the largest) rooms in Cory Hall, and we still had people sitting on the floor during the first few lectures. This is despite the fact that CS 281A had also been offered the semester before I took it. Most graduate courses never get offered in consecutive semesters.

• CS 287, Advanced Robotics. This is the only class where I can get a precise picture of enrollment in previous years, since the CS 287 course websites list the final project presentation schedule and I can count the students. (The Fall 2015 edition is on Piazza, not the official website.) The Fall 2009, 2011, 2012, 2013, and 2015 classes had the following respective number of students give project presentations: 19, 36, 15, 48, and 58.

• CS 288, Natural Language Processing was overenrolled at the start; the professor said in his introductory email that “Since there are 80+ of you interested in what is normally a 20-person class, I wanted to be clear about how we’re planning to handle enrollment […].” Even with students eventually dropping, I am almost positive we had well over 30 students, possibly over 40 remaining at the end of the semester.

• CS 294: Deep Reinforcement Learning was overenrolled and the staff moved the room and offered two lecture times. In theory, deep reinforcement learning is just one “small sub-research area” of Artificial Intelligence, but in reality, it’s probably the most popular of those areas.

• EE 227BT: Convex Optimization was also quite crowded. I don’t know if enrollment was that much greater than in previous years, but I don’t think having 50-60 students should be the norm in a graduate-level course.

It should be clear that this is due to the growing popularity of computer science as a major and a graduate degree (this page provides some hard statistics on Berkeley’s CS enrollment). The result is that Berkeley and similar schools have had to drastically expand the size of faculty and lecturers, but I worry about what will happen long-term if enrollment abruptly declines, say in five years. I wasn’t old enough to understand the dot-com bust, but I think I may need to go and read some of the literature on that era to have a better idea if history is repeating itself.

# Thoughts on Isolation: Why I Hated the Fall 2013 and Fall 2015 Semesters

## Fall 2013

As an alumnus of Williams College, I regularly get emails from my class officers requesting donations to the college. These emails try to convince us to give money by including variations of: “Don’t forget the awesome memories you had at Williams. Please donate to support the experience of current students!” On the Williams website, it’s not hard to find testimonials of students saying that they have made a lot of friends and love the college. Many students have also told me this directly.

I wish I could agree.

It has now been a year and a half since I graduated from Williams. During our commencement, since the class size was small enough, the graduating students lined up to walk across the stage to get their diplomas. For some students, as their name was called, the audience roared with the sounds of their friends cheering and hollering, and Dean Sarah Bolton would have to smile and wait for the applause to die down before calling the next student.

When she called me, a blanket fell over the crowd. It was uncomfortably quiet. As I approached President Adam Falk to receive my diploma, I heard a faint scream out in the audience. I didn’t look there; I just took the diploma and went directly back to my seat, feeling a little sullen.

Later, my Mom asked me if I had heard her as I was walking on the stage.

To be fair, graduation wasn’t a complete embarrassment, even though it sometimes felt that way. Every now and then, I was able to find and talk to a few graduating students. I waved a bit, asked students about their post-graduation plans, and engaged in other polite conversations. I even managed to get in a few photos.

But deep down, I knew that I had failed on one of my two major goals before entering college. The first goal, which I achieved, was to do well academically and get into a good graduate school in the sciences. I did that, and while I never thought my field would be computer science, somehow I made it. It doesn’t hurt that computer science is a pretty hot field now.

My second goal was to make close friends.

Not acquaintances. Not one-and-done homework buddies. Not people with whom our communication would derive primarily from exchanging Facebook posts.

Real, close friends, people who I could count on for the highest-priority social events, people who I could comfortably hang out with outside of the college realm, people who I could really trust.

I was concerned about making friends before entering Williams, since I had been unable to do that in high school. (Most people from the high school who I stay in touch with nowadays are those who I would have known even if I had not gone to my high school.) To be fair, I was reasonably friendly with students from the Deaf and Hard of Hearing classroom in my high school, but my attempts to extend this to hearing students did not succeed. Given that I was the only deaf student at Williams, I was concerned.

Despite a lack of social skills, my first semester at Williams actually exceeded my expectations. For one of the few times in my life, I was surrounded by brilliant, talented students my age who were also extremely eager to get to know each other. During the first few weeks, I couldn’t believe how many times people would come up to me, unprompted, to say hello, relieving me of the burden to start an awkward conversation. My goodness, where was this my entire life?

Unfortunately, as the months, semesters, and years went on at Williams, I gradually realized that I was missing out on close friendships. I would occasionally find homework collaborators, gym partners, and irregular eating groups.

But when it came to the “real” social events, I was out.

Like in most colleges, Friday and Saturday nights are prime social hours at Williams, the times when students stick with their closest friends to go out to eat, have a party, or to just hang out (hopefully doing nothing illegal, but never mind). I usually spent Friday and Saturday nights in the computer science laboratory or in my dorm room. It’s not that I was turning down party invitations – I didn’t get them.

When I wandered around campus during these times, I regularly walked by large groups of hollering students, some of them drunk. I’m not going to lie – I really, really wished I could have been part of some of those groups, enjoying myself in the company of friends (but without the drunk part). I dreamed about this, replaying hypothetical social situations in my mind and pretending that I was the popular person in the center of the crowd, leading the group to their destination.

Unfortunately, the reality was that during the few times I was lucky enough to be with a group of students late at night, I generally did not enjoy those experiences. The reason is obvious. When the other students talked, I was unable to understand what they were saying. If I were really popular, it might be possible to have students who act as personal translators, but that was not the case.

It didn’t help that I had what I would call a “friendship ranking” problem. I could form a ranking of the top ten Williams students with whom I was friendliest. But I don’t think any of them would have me at a comparable rank on their lists; I would probably be around ten spots lower. Thus, during the prime social hours, those high-ranking students would socialize with people on top of their hypothetical friendship list. It’s what a rational human being would do. And, admittedly, I didn’t have much courage to ask people to do things together. I worried that I wouldn’t be able to understand what they said, or that I would inconvenience them.

During my second year at Williams, I had a series of stressful and unpleasant experiences in groups and parties. I consistently ran into the problem of being unable to hear what students said in group situations. Starting in my junior year, I resolved to never attend parties again. I was sick of showing up to these events by myself, watching people roar and laugh at something mysterious, and then walking back to my dorm room by myself.

For a while, simply ignoring these events worked. I sometimes had nagging thoughts that I really was missing out on lots of fun and friendship, but I could hold thoughts about isolation and friendship at bay.

The Fall 2013 semester was when my mental barrier broke. My isolation truly began to hit home, and to make matters worse, it came during an incredibly stressful time of my life, when I had to write graduate school applications and work on research. During the start of that semester, my isolation consumed me. I constantly thought about it when I was completing homework, sitting in class lectures, eating by myself, and doing other activities. I was unable to focus in class and had trouble sleeping. I soon had enough of it, and left the campus for a weekend to recharge at home and to talk with an external counselor.

During the winter break, since my family lives within driving distance from the college, I remained at home, with the occasional foray to campus if I had a thesis meeting. I soon faced the reality that, while I was home, I didn’t get texts or messages from other students, asking where I was. I felt that students didn’t care about me.

I was disconnected from them.

Fortunately, the Fall 2013 (and winter 2014) debacle had a not-disastrous ending. Being away from campus helped me mentally recover (but didn’t help me make close friends). My grades were fine, and I caught up on research in the following semester, Spring 2014. I also felt better once I had gotten into more graduate schools than I expected, since I could look forward to starting a new social life at my next school, forever thinking about how to upgrade from “acquaintance” to “close friend.”

Despite feeling better in the spring, I still skipped all the major senior social dances, parties, and events. No one asked me to go, and I did not know anyone who I could confidently ask to go with.

Ultimately, I have mixed feelings about my Williams experience: generally positive for academics, generally negative for social. I have not donated any money and don’t plan to donate, though I might change my mind later. After graduating, I knew for sure that I wanted the Fall 2013 semester to remain the worst semester of my life. I had no desire to relive my constant concerns over isolation.

But then, the Fall 2015 semester happened.

## Fall 2015

I’m going to refrain from a final statement as to which of these two semesters was worse. Hopefully, after some more time passes, I can relax and judge the Fall 2015 semester with a clearer mind, like how American presidents are often evaluated more favorably far beyond their presidency, as compared to immediately after their last term.

The Fall 2015 semester, however, currently holds the edge in the title of “worst semester ever”. The culprit, if you haven’t figured it out already: isolation.

Almost all of my negative experiences, almost all of my sources of stress and depression, just like at Williams, can be traced back to that one single, simple concept.

My “isolation thoughts” reappeared in the summer, an ominous sign of things to come. During that summer, I was alone in my lab room, which has six desks but (at that time) had only three students, including me, and the other two had internships at Google and Microsoft. I can remember three times when I was not, strictly speaking, alone there: when one of the two students took a break from his internship to give me a much-needed “hello” while we had lunch (that day was great), when a random Master’s student came to install his computers in the room (but he never showed up again and I saw someone else move his computers later), and when two students from another research group installed a research computer in the lab (but their real office is in a different building).

Aside from those three cases, I can’t think of another time when I spoke with anyone else near my desk that summer. It should say a lot that I vividly remember these minor interactions (and what we talked about), because deep, memorable interactions are hard to get.

As I mentioned in another post with the prefix “Thoughts on Isolation”, the isolation I was experiencing in the summer gradually consumed me and hindered my ability to do work and to study. During the weeks before, during, and after my prelims (i.e., late August), I went through several days that I would call “lost days.” Here’s the definition: a “lost day” is one when I show up to my office at the usual time, stay there for eight to ten hours, but do not make any progress at all on work, because my mind is consumed with thoughts on isolation.

These feedback loops were devastating, robbing me of any hope of making progress during those lost days. I tried desperately to escape the loop: calling my parents, walking around campus, going to cafes, lying down on the couch in the lab room, you name it. But none of these were able to completely get rid of the feedback loop.

If only I could make it to the prelims, I thought, then things would get better. Passing the prelims would give me confidence that I needed to regain my research productivity. The start of the semester meant that there would be more people around. Things would go better.

So much for that.

Despite an impressive performance on the prelims, the Fall 2015 semester was a disaster. If anything, I felt more isolated compared to how I felt in the summer. I was bombarded with signs that students were less isolated than me. I saw students in the same research group stick with each other, working together or hanging out. The fall also brought a new wave of accepted research papers, many of them involving groups of two or more graduate students and postdocs. It was hard to avoid knowing about these papers, as the information is readily available. Sometimes these papers are on graduate student homepages, but I try not to look at those anymore.

Looking at these groups of students, either together socially or together in a publication, made me feel frustrated. I longed to be part of those groups. I wanted to break out of my cycle of isolation. I wanted to feel happy looking at other people, not disappointed.

My mood did not recover from the summer. I would feel upset while sitting in class lectures, knowing that I was different from the other students. I repeatedly got angry at myself during (and after) lectures when I was unable to follow the sign language well enough to sufficiently understand what lecturers were saying. I tried to reassure myself, knowing that I would spend nights and weekends reading webpages and textbooks to catch up on the lecture material, but somehow that didn’t make me feel better.

There’s something else that happened this semester. Something I’ve been trying not to think about lately, without much success.

I would feel isolated and experience a slight twinge of resentment, whenever I heard, read, or thought about “diversity in computer science.” I kept thinking that “diversity” in the context of computer science means getting more women and racial minorities involved (well, not all racial minorities …).

When I search online about “being black in computer science” or other similar queries, I see articles such as this recent one from Stanford. One of the sections in that article says: A feeling of isolation, and it describes isolation from being a racial minority.

A feeling of isolation.

Oh, wow. You know, that might just describe how I feel on a daily basis.

I kept thinking throughout the semester that, whenever the topic of diversity in computer science comes up, it’s assumed that Caucasian and Asian males, such as myself, have few issues getting along with others and feeling included.

That is probably true for most of us, but I can state from personal experience that all the attention towards making women and minorities feel more included in computer science makes me a little frustrated. OK, sometimes more than “a little.”

To be clear, I’m not saying that I don’t have advantages from being a Caucasian and Asian male. I have never been racially insulted, or sexually assaulted. If I had a different body type, those aspects about my life might be different.

But on the other hand, suppose I were black and hearing. Then, wouldn’t it be possible for me to sit through a lecture and finally piece together a few consecutive sentences from the lecturer? Wouldn’t it be possible for me to follow the conversation in a rapidly-scheduled research meeting with five people?

Wouldn’t it be possible for me to enjoy being in a group?

I face challenges that are different from those of women and minorities, some of which will lead to similar conclusions (i.e., isolation). Unfortunately, I don’t feel like I have an outlet, some kind of real support group of students who might help me. And people won’t line up to hear my opinion.

I’m not saying that my first year at Berkeley was that great – it wasn’t – but I never regularly thought about how much I was detesting it here.

Eventually, as the semester progressed with more thoughts on isolation and a few more “lost days,” I finally tried to tell people explicitly that I needed help to combat isolation. That this semester was just taking too much of a toll on me. Earlier, I had told others that my graduate experience wasn’t that great, but I now had to downgrade it from “so-so” to “awful” to make things clear.

I don’t want to place the blame on anyone in particular. I don’t think there is anyone to blame, except the “system” as a whole. I believe this because one thing that hurt me was failing to make it obvious when I first arrived in Berkeley that (a) I was deaf, and (b) I needed help finding real collaborators.

While I do feel like things can move at such a glacial pace, at least there are people here trying to help me out. I’m extremely grateful to the ones who have not completely disregarded me, and have given me the opportunity to – as of today – have much more collaboration than I have had in my life. A new era begins now. I can’t waste this opportunity.

So will my story have a happy ending? (Sigh) I don’t know.

## Conclusion

By now, it should be clear that 2015 was not the greatest year for me. It started off somewhat, kind of, reasonably well, but fell off a deep cliff during the summer and remained buried under a Mount Everest-sized pile of stress during the Fall 2015 semester.

I really hope 2016 will go much better.

I’ll keep this conclusion short. To everyone, my goodness, Happy New Year.

# My Three Favorite Books I Read in 2015

As the year 2015 wraps up, I’ve been reviewing my New Year’s Resolution document. Yes, I do keep one; it’s on my laptop’s home screen so I see it every time I start my computer. No, I unfortunately did not manage to accomplish anything remotely close to my original goals.

I did, however, read more books this year than I did in previous years. I was a committed gamer back in high school and college and I’m trying to transition from playing games to reading books in my free time (in addition to blogging, of course).

In this post, I would like to briefly share some thoughts on three of my favorite books I read this year: Guns, Germs, and Steel, The Ideas that Conquered the World, and (yes, sorry) The God Delusion.

## Guns, Germs, and Steel

Guns, Germs, and Steel: The Fates of Human Societies, by Jared Diamond, is a 1998 Pulitzer Prize-winning (General Nonfiction) book about, essentially, how human societies came to be the way they are today. It aims to answer the question: Why did Eurasians conquer, displace, or decimate Native Americans, Australians, and Africans, instead of the reverse?

The white supremacists, of course, would say it’s because Caucasians are superior to other races, but Diamond completely eviscerates that kind of thinking by presenting strong geographic and environmental factors that led to Eurasia’s early dominance. By the age of exploration, it was Europe which contained the most technologically advanced and most powerful countries in the world. (Interestingly enough, this was not always the case in the world; Australia and China had their turns as being the most advanced countries in the world.) That European explorers had guns was not the main reason why they conquered the Americas, though: it was because they were immune to diseases such as smallpox that decimated the native populations.

I learned a lot from this book. Seriously, a lot. The book was full of seemingly unimportant factors that turned out to have a major impact on the world today, such as the north-south shape of the Americas versus the east-west nature of Eurasia. While I was reading the book, I kept repeating to myself: wow, that argument should have been obvious in hindsight, an indication that the book was effectively supporting its hypotheses. I felt a little uncomfortable when Diamond had to add several disclaimers in the book that it was not going to be “a racist treatise” but unfortunately that text is probably still necessary in today’s world.

A negative effect of reading this book was that, since it deals with the growth of human civilizations, it made me want to play some Civilization IV, but never mind. This was a great book.

## The Ideas that Conquered the World

The Ideas That Conquered The World: Peace, Democracy, and Free Markets in the Twenty-first Century, by Michael Mandelbaum, is a 2002 book that reviews the state of Western values at the start of the 21st century. If one compares life today to what it was like during the Cold War and earlier, some of the most remarkable trends are that countries heavily prefer peace as the basis of foreign policy, democracy as the basis of political life, and free markets as the basis of economic growth. Mandelbaum explains how these trends occurred by providing an overview of how countries conducted internal and foreign affairs from 1800 to the present. He particularly analyzes the impact of World War I, World War II, and the Cold War on liberal values.

There are many interesting themes repeated in this book. One is that Germany and Japan serve as the ultimate examples of how previously backward countries can catch up to the world leaders by adopting liberal policies. Another is that there are three “dangerous” regions in the world that could threaten peace, democracy, and free markets: the Middle East, Russia, and China, since those countries wield considerable power but have not completely adopted liberal principles. (In 2015, with all the terrorism, oil, and migrant crises in the Middle East, along with America’s diplomatic tensions with Russia and China, I can say that Mandelbaum’s assessment was really spot on!) A third theme is that much of the world has actually become less peaceful after the Cold War, a consequence of how the core countries now have fewer incentives to protect those countries on the periphery.

Of the three books here, this one is probably the least well-known, but I still tremendously enjoyed reading it. I now have a better understanding about why there is so much debate over government size in American politics. The role of the government in a free market society should be to let the market function normally, except that it should provide a social safety net and other services to protect against the worst effects of the market. How much and to what extent those services should be provided is at the heart of the liberal versus conservative debate. As a side note: I find it really interesting how “liberal” is related to the free market, yet the stereotype in today’s politics is that conservatives, not “liberals” as in “Democrats”, are the biggest free market supporters. That’s vastly oversimplifying, but it’s interesting how this terminology came to be.

Oh, I should mention that this book also made me want to play Civilization IV. Perhaps I should stop reading foreign policy books? That brings me to the third book …

## The God Delusion

The God Delusion, by Richard Dawkins, is a 2006 book arguing that it is exceedingly unlikely for there to be a God, and that there are many inconsistencies, problems, and harmful effects of religion. This is easily the most controversial of the three books I’ve listed here, for obvious reasons; a reviewer said: “Bible-thumpers doubtless will declare they’ve found their Satan incarnate”.

Dawkins is a well-known evolutionary biologist but is even better known for being the world’s most prominent atheist. In The God Delusion, Dawkins presents a spectrum of seven different levels of beliefs in God, starting from: (1) Strong theist. 100 per cent probability of God. In the words of C.G. Jung, ‘I do not believe, I know’ to (7) Strong atheist. ‘I know there is no God, with the same conviction as Jung “knows” there is one.’

Both Dawkins and I classify ourselves as “6” on his scale: Very low probability, but short of zero. De facto atheist. ‘I cannot know for certain but I think God is very improbable, and I live my life on the assumption that he is not there’. I also agree with him that, due to the nature of how atheists think, it would be difficult to find people who honestly identify as falling in category 7, despite how it’s the polar opposite of “1” in his scale, which is very populated.

This book goes over the common arguments that people claim for the existence of God, with Dawkins systematically pointing out numerous fallacies. He also argues that much of what people claim about God (e.g., “how can anyone but God produce all these species today?”) can really be attributed to a one-time event, plus the cumulative nature of evolution. In addition, Dawkins discusses the many perils of religion, about how it leads to war, terrorism, discrimination, and other destructive practices. For an obvious example, look at how many Catholics have a negative and inflexible view of homosexuals and homosexuality. Or for something even worse, look at ISIS.

The God Delusion ended up mostly reinforcing what I had already known, and expresses arguments in a cleaner way than I could have ever managed. This brings up the question: why did I already identify as being in category 6 on Dawkins’ scale? The reason is simple: I have never personally experienced any event in my life that would remotely indicate the presence of God. If the day were to come when I do see a God, then … I’ll start believing in God, with the defense that, earlier, I was simply thinking critically and making a conclusion based on sound evidence. After all, I’m a “6”, not a “7”.

Dawkins, thank you for writing this book.

# For Final Projects, Class Presentations are Better than Poster Sessions

In computer science graduate-level courses at Berkeley, it is typical to have final projects instead of final exams. There are two ways in which these projects are disseminated among the students:

• Class Presentations. These are when students prepare a five to ten minute talk to the class, using slides and other demos to state the project’s main accomplishments. Due to explosions in class enrollment (see my class reviews here for examples), time limits are strictly enforced, so presentations must be precisely timed and polished.

• Poster Sessions. These are when students bring a poster describing their work. Usually, students create posters by stuffing lots of images and text in a PowerPoint slide (or other software). Then they print using their lab’s poster printer.

I’ve experienced both scenarios at Berkeley, and based on those I would strongly state the following to instructors: class presentations are better than poster sessions, and should be the method of choice for dissemination of final projects.

First, a class presentation means students practice a useful skill, one that they will likely need for their future careers. This is especially true for academic careers, and students taking graduate-level courses are far more likely to want academic careers than the average undergrad. For me, presentations are also a way that I can channel my humor, which isn’t immediately apparent to other students. A second, less important reason, is that in an age of exploding enrollment in graduate courses, it’s nice to be able to finally learn people’s names when they give class presentations.

One can, of course, learn names and project accomplishments in poster sessions, but this requires more effort and is challenging for people like me. I have lots of difficulty navigating my way through loud, noisy poster sessions filled with accents. I either resort to reading people’s posters (and not understanding much of it anyway due to time constraints) or going through the awkwardness of having a sign language interpreter with me (and having that interpreter struggle through accents and technical terms).

Poster sessions have other downsides that apply broadly, and not just to deaf students. For instance, poster sessions allow students to hide. What happens if students don’t manage to do much for their final projects? As I’ve seen happen in my classes, these students go to the corner of the room to avoid the spotlight. Presentations avoid this issue, unless students are willing to go as far as to even skip their presentation time. Some students who are nervous about public speaking might also want to hide. To most of them, I would respond: good luck convincing your future bosses to have you not do any presenting.

If class presentations force students to produce something that is worth presenting and force them to encounter their fears, then that’s probably sufficient reason alone to use them!

There are other downsides to having poster sessions. They cost more, creating a chasm between students who have access to fancy poster printers and those who don’t; the latter may have to resort to printing out ten pages of work and pasting them together in a poster. Furthermore, the posters that get printed are unlikely to be used again in the same form. True, many conferences have poster sessions due to scalability issues, but class projects are not generally up to par with research projects, so students would have to re-print posters anyway. And that’s assuming that students are using class projects as the basis for future research, which isn’t always the case.

Class presentations are also superior to poster sessions in that they require less physical room. The presentations can be delivered in the same lecture room, while poster sessions force the course staff to go through the trouble of finding and reserving a large room (or hallway, as is the case for Berkeley).

Furthermore, the one “benefit” of poster sessions, scalability, does not stand up to a rigorous analysis. (If there are other benefits, please let me know because I can’t think of any.)

First, if the class size is so large that it approaches the enrollment of a popular academic conference, then would the course staff really have time to read the final reports? Remember, neither presentations nor poster sessions enable people to fully understand a project; for this, one has to read papers.

Second, with five minutes per presentation, the process goes by quickly, and it is also easier for the course staff to track progress. Also, with a large class, it is likely that students would be encouraged to form groups, drastically reducing the quantity of presentations. If there are too many presentations for one class, the course staff should divide the class into groups.

Finally, scheduling presentations is not generally a problem even with many groups. Here’s a simple procedure: have a random draw to see who goes next. If the class requires a fixed schedule, then busy instructors should have their TAs form the order of presentations.

Unfortunately, the classes I’m taking next semester have historically used poster sessions rather than verbal presentations, but perhaps I could convince them to change their minds?

# Review of Convex Optimization (EE 227BT) at Berkeley

The third class I took this semester was Convex Optimization (EE 227BT), which was also my first time wading into electrical engineering. There are three convex optimization courses at Berkeley: EE 227A, EE 227B, and EE 227C. (Note: I say 227BT in this title because the course had a “T” for “Temporary,” but that should go away soon.) I did not take the first course, EE 227A, and I think that may have been a reason for my struggles in this class.

To do well in EE 227B, I think one needs to be highly skilled in the following two areas: linear algebra and problem solving. If a student lacks one or both of these skills, he or she is in serious trouble. For a linear algebra concept, consider this problem: $\max_{\{x : \|x\|_2=1\}} x^TAx$ for symmetric $A$. We encountered this at the start of the semester and would see it over and over again. The professor, Laurent El Ghaoui, said: “If you didn’t immediately know that the answer to this was the maximum eigenvalue of $A$, or $\lambda_{\max}(A)$, then run away to EE 227A. This is all linear algebra.” I did know that, in fact, but the class material was nonetheless very difficult for me to understand.
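For anyone who wants to sanity-check that linear algebra fact numerically, here is a minimal sketch in Python with NumPy. To be clear, this is my own illustration and not something from the course: the matrix size, the random seed, and all the variable names are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Build a random symmetric matrix A (the size 5 is an arbitrary choice).
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
eigvals, eigvecs = np.linalg.eigh(A)
lam_max = eigvals[-1]
v_max = eigvecs[:, -1]  # unit-norm eigenvector for the largest eigenvalue

# The top eigenvector attains the maximum of x^T A x over the unit sphere.
attained = v_max @ A @ v_max

# Random unit vectors should never do better than lambda_max.
X = rng.standard_normal((1000, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)
rayleigh = np.einsum("ij,jk,ik->i", X, A, X)  # x_i^T A x_i for each row x_i
best_random = rayleigh.max()

print(lam_max, attained, best_random)
```

The value attained by the top eigenvector should match $\lambda_{\max}(A)$ up to floating point error, and the best of the random unit vectors should come in at or below it.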

We had five problem sets, and I think they were among the hardest ones I’ve ever had, and also more challenging than those from CS 281A. After spending 30 to 40 hours on the first few homeworks, I realized I needed to seriously start reaching out to other students to get more than two-thirds of the homework done correctly, and I did do that this semester.

Each problem set contained three to five questions, each of which had some number of sub-problems. Their difficulty varied considerably, with some parts following directly from the Cauchy-Schwarz inequality, $x^Ty \le \|x\|_2\|y\|_2$ (not Cauchy-Schwartz … I don’t know why people keep misspelling that), and others requiring some ridiculously complicated insights. The hardest one was to prove Theorem 4 and Corollary 3 from Laurent’s paper Sparse Learning via Boolean Relaxations. Yes, we had to do that, and no, we were not given this paper reference and had to start some of that from scratch. I found out about this paper from another student. Also, the paper was published in 2015, so it must have been difficult since no one else did this until now. Setting the boolean relaxation problem aside, the homework questions were challenging but doable with some problem solving insights (one might need help for these, though), and they were brutally educational.

In terms of homework logistics, we had a paid grader who graded the homeworks, which is different from the previous iteration of the course (Fall 2014) when students had to self-grade their submissions. Note that Laurent’s EE 227BT website is (currently) incorrect; I think he recycles the same links for his classes, so some of it is out of date for the Fall 2015 edition. Our grader was surprisingly generous with points but did not offer detailed feedback and also took three or four weeks before providing grades. In part, this was because of the large class size. We had perhaps eighty students at the start before settling to fifty or sixty.

One of the “less-awesome” aspects of this class, in my opinion, was that we barely followed the projected outline. We were supposed to get five homework assignments, released every other Thursday, which meant we would get two weeks to do each assignment. However, because the lectures quickly fell behind the outline, Laurent delayed the second homework by a week, which caused a few more subsequent delays for other assignments. This meant that homeworks eventually spilled over into time that was originally designated for us to do final project work. I think it would be best to design homeworks conservatively so that even if the lectures get delayed, there’s no need to put off the homework due dates.

We had a midterm, but that was also delayed, by a week. It was in-class for 80 minutes, open note (but not open laptop or Internet). It had three questions, each with multiple parts, and was out of 40 points total. Judging from the distribution of scores, I think most students got somewhere between 15 and 30 points. It was definitely a challenging midterm, but in retrospect, I thought it was fair, and was of higher quality compared to the CS 280 midterm.

The third part of our grade was based on the final project. We started final project discussions really early, in September! Almost from the beginning, Laurent designed lectures so that we would cover standard concepts (e.g., Lagrange duality) for 75 minutes, and then the last 5 minutes would be an open discussion of final project ideas. Despite the early focus on final projects in the lectures, in reality we didn’t have that much time to work on them due to the homeworks and midterm getting delayed and cutting into project time. I think the course staff should address this in future iterations of the course.

I worked in a group of four on my final project, where we investigated various properties of neural networks. We read a lot of research papers (the “literature review” that Laurent kept mentioning in lecture) and ran experiments using Caffe and CVX. We wrote this up in a forty-page final project report. Going through and editing that at the end was a lot of work! A quick warning to future students: the project report date was set before RRR week, which I think is unusual for most graduate courses, which allow students to work on reports through mid-December.

In addition to a report, we had project presentations, which I was happy about since it’s fun to give talks. Not all students would agree with me. During the presentations, my sign language interpreters would comment on some of the students who appeared to be really nervous. To make matters worse, Laurent brought a hand-held microphone to the class, and about half of the students actually held the microphone when they were talking. No, I’m serious! And it’s not like we were on stage at Broadway — we were in a normal-sized classroom! I don’t like holding a microphone because it would make it completely obvious to the rest of the class that I was nervous about public speaking! I think Laurent had good intentions about bringing the microphone, but to future students, please don’t use microphones when talking.

When it was my turn to present, I put the microphone away after someone handed it to me (sorry, not using it!) and immediately started off with a planned joke. I told the class to pretend that Laurent and I were “trapped in a world that represents the loss function of the neural network.” (Don’t ask why!) I continued the story: I led Laurent to a local minimum, but he got angry and wanted the global minimum. I calmly responded that local minima are just as good as the global minimum in neural networks. I added a little acting and tried to cleverly alter my tone of voice. The class roared in laughter, and I think that was probably the most successful joke I have ever pulled off in a class presentation.

To wrap up my thoughts on EE 227B, I think it is similar to most classes I’ve taken in the sense that it is challenging, but very educational. I now feel like I have a much better understanding of concepts in linear algebra, especially those about norms, eigenvectors, and matrix decomposition. Many students who take this course do research in Artificial Intelligence fields, and EE 227B enables students to read AI research papers without getting bogged down by the notation and definitions. This was a huge problem for me when I first started to read machine learning papers a few years ago. I couldn’t even consistently remember what $\|x\|$ meant! Thanks to EE 227B, and some of my own independent linear algebra studying, I’ve cleared a lot of that initial “notation hurdle”.

Finally, to future students who are considering this class, the best advice I have is to make sure that your linear algebra skills are sharp. In particular, be sure you know about matrix norms, eigenvectors, and other forms of matrix decomposition (e.g., Singular Value Decomposition).

If you’re weak in those areas, then in the words of Laurent, “run away to EE 227A.”

# Review of Advanced Robotics (CS 287) at Berkeley

I took Advanced Robotics (CS 287) last semester, which is the graduate level class that Pieter Abbeel teaches at Berkeley. You can view the course website here. Robotics is a vast, highly interdisciplinary field, so to restrict the focus, CS 287 is about the math and algorithms of robot systems. No, we didn’t see giant, science-fiction style robots battle each other, but we did observe a research robot tie knots (alas, through videos, not in real time).

Before the class even began, I could tell we would have some logistics issues. Like almost every course I have taken at Berkeley, CS 287 was substantially over-enrolled at the start; we had perhaps eighty students before settling down to about sixty at the end. According to the CS 287 websites from previous years, it looks like the Fall 2009 and Fall 2012 courses had nineteen and fifteen students, respectively. Yeah, welcome to the new normal.

Due to the class size, Pieter actually provided two different lecture times, one in the morning and one in the afternoon, and I suspect he also convinced John to do the same thing for CS 294-112. Pieter did this to get to know the students better. During some of the class breaks, he would ask a handful of students to introduce themselves to everyone. Since I sat in the front corner of the room for optimal use of sign language interpreting services, I was called on first. From these introductions, I learned a few things about the class composition:

• There were a lot of mechanical engineering graduate students. So many, to the point where I was complaining (er, joking) about this with my interpreters midway through a long sequence of mechanical engineers introducing themselves. It’s a good thing that no one else in the class (I think…) can understand sign language. (PS: to mechanical engineers reading this, I was joking so please don’t get angry.)

• A lot of the students do not speak clearly! Many are quiet, have heavy foreign accents, or exhibit both qualities. The most egregious case resulted in my interpreter not understanding a single word a student said, which I mentioned earlier here.

• A lot of the students did robotics research of some form, whether it was in computer science, mechanical engineering, electrical engineering, or a related field. That leaves me confused: is it just this year that robotics suddenly became popular? Or is it because CS 287 wasn’t offered last year and this is the “overflow” year?

In terms of course material, CS 287 combined lectures on standard topics in artificial intelligence (e.g., optimization and probability) and on more obscure, robotics research subjects. The course lectures could be divided as follows: Markov Decision Processes, optimization, probability, and research. Overall, I felt that the lectures were polished and of high quality. Pieter seemed like he really knew the material and was able to offer many doses of intuition for some of the more technical material.

I discuss this in my other reviews, so I’ll continue the trend: how did the lectures mesh with sign language interpreting services? Pieter lectured at a fast pace, which was problematic for my two interpreters, who were often exhausted when their 20-minute shifts were up. On the positive side, Pieter spoke loud and clear, to the point where I actually think he’s one of the easiest people for me to understand. Consequently, relative to other classes, I did not have much difficulty in terms of identifying the exact words he uttered. It’s also somewhat ironic that he would be the one to mention to me about an ideal future where people had “virtual captions” projected out of their mouths, which displays the text they say in real-time. Yes, I would like for that to happen.

As an added benefit, the course slides contained a lot of information. In many cases I could understand a concept or a homework sub-problem just by reading the appropriate slides, which is really handy for a text-heavy person like me. Incidentally, while Pieter wrote a lot of math on a white board, in almost all cases it was math directly from the slides, and he was writing it out for intuition. Thus, taking hand-written notes is probably unnecessary for this class.

No course is without its hiccups, however, and I’d like to bring up a few points that may (or may not) matter to future students:

• The difficulty of lectures varied considerably, which one can probably tell by browsing some of the slides. I thought the easiest class was the one on introductory probability. Since the material is quite rudimentary, I think that lecture needs to be eliminated in future iterations of the course. Basic probability is an ironclad requirement for understanding the math of robot systems. Other lectures were more complicated. The convex optimization and Kalman Filtering lectures would have been hard for me to follow had I not already had substantial exposure to those concepts.

• Towards the end of the semester, we had a “project speed-dating” lecture, which is when we gathered in small groups and shared our progress on the final project. Ideally, students could get feedback and learn what others were doing. In reality, most students skipped this class, and I’m not sure how beneficial it was to those students who did attend (I didn’t benefit). Furthermore, we eventually had final project presentations. Thus, I think project speed-dating should be replaced with a “standard” robotics lecture.

• We had three class sessions where guests from industry lectured about their companies. I’m neutral towards these, and would suggest that these only happen when Pieter (or another future instructor, if applicable) is traveling and unable to lecture.

CS 287 had four problem sets which involved math and MATLAB programming. I thought they were, on average, less challenging compared to problem sets in other classes. The math did not require incredible problem-solving skills, and I think they were designed to accommodate people from other fields (mechanical engineers …). For instance, the fourth homework asked to prove that covariance matrices are positive semidefinite, which is something that a lot of machine learning students can answer in thirty seconds. For the coding, we had to fill in MATLAB code in the designated “YOUR CODE HERE” sections. We got a lot of starter code for these assignments, so it’s relatively easy to understand how the code works in the overall pipeline.
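For the curious, the thirty-second argument for that covariance question goes roughly like this: for any vector $v$, we have $v^T\Sigma v = \mathbb{E}[(v^T(X-\mu))^2] \ge 0$, since the quadratic form is just the variance of the projection $v^TX$. Below is a small numerical sketch of that idea. It is in Python with NumPy rather than the MATLAB we used in class, and the data, dimensions, and variable names are all made up for illustration; it is not taken from the homework.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# 200 samples of a correlated 4-dimensional random vector (rows are samples).
samples = rng.standard_normal((200, 4)) @ rng.standard_normal((4, 4))

# Sample covariance matrix; rowvar=False means each column is a variable.
Sigma = np.cov(samples, rowvar=False)

# v^T Sigma v equals the sample variance of the projection samples @ v,
# and a variance can never be negative.
v = rng.standard_normal(4)
quad_form = v @ Sigma @ v
proj_var = np.var(samples @ v, ddof=1)  # ddof=1 matches np.cov's default

# Equivalent check: every eigenvalue of Sigma is nonnegative (up to round-off).
min_eig = np.linalg.eigvalsh(Sigma).min()

print(quad_form, proj_var, min_eig)
```

The two printed variances should agree up to floating point error, and the smallest eigenvalue should be nonnegative, which is exactly what positive semidefinite means.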

To turn in homeworks, we used Gradescope, a company Pieter co-founded with Berkeley students. We only had to turn in PDFs of our answers, and the course staff could grade code-based assignments by spot-checking our plots. (Part of the reason why we had lots of starter code is that some of it was used to generate plots, which means that they were standardized across all student submissions.) We had page limits for our solutions, so be sure you know how to cram lots of figures together in LaTeX, such as by using minipages or subpages. Oh, I should mention: there are no solution sets to these assignments. I agree with Pieter in that there would be too much temptation for students to search for old solutions. Well, I wouldn’t search, but I’m not sure about others.

In addition to regular homeworks, we had four (!) optional extra credits, plus the final project. I only did one of the extra credit assignments, so I don’t have much to comment on those.

For my final project, I worked on a deep learning project about Atari game play, but my project ended up relating more to human learning since I analyzed data from humans playing Atari games on Amazon Mechanical Turk, and I ran out of time to integrate my findings with a Q-Learning agent. Pieter was the one who suggested this project. In fact, back in October, he and the two GSIs actually met with every project group in the class for five minutes to discuss the final project. Then, a day later, I assume Pieter sent out personalized emails to every group with project suggestions. That must have been a lot of work!

Just like in CS 280, we had project presentations, not a project poster session. That is a good thing. Single-student groups presented for 5.5 minutes. I tried to be funny by sprinkling in four jokes in my talk, and went so far as to put in a picture of Bernie Sanders in one of my slides. Unfortunately, I think my Sanders-related joke backfired since a lot of the students were internationals or were not fluent in American politics, whereas I have very strong political beliefs.

We then had to write the usual report to wrap up the project. I will warn future students: the grading for the final project is somewhat stricter than the grading for homeworks, though admittedly I think it was hard to get a really low grade on the project. Thus, to get an A, try to get at least 90 percent of the homework points, and make up for lost points with the four extra credit assignments. Pieter really makes it clear how our grades are computed, which makes the process less stressful for students who care about grades. This is in contrast with some other professors, who might not even return grades for final projects.

In conclusion, I enjoyed CS 287 and would highly recommend it to future students. Again, if possible consider taking this class concurrently with Deep Reinforcement Learning or a similar two-credit class as they would reinforce each other.