Review of Natural Language Processing (CS 288) at Berkeley

This is the much-delayed review of the other class I took last semester. I wrote a little bit about Statistical Learning Theory a few ~~weeks~~ months ago, and now, I’ll discuss Natural Language Processing (NLP). Part of my delay is due to the fact that the semester’s well underway now, and I have real work to do. But another reason could be because this class was so incredibly stressful, more so than any other class I have ever taken, and I needed some amount of time to pass before writing this.

Before I get to that, let’s discuss what the class is about. Natural Language Processing (CS 288) is about the study of natural languages as it pertains to computers. It applies knowledge from linguistics and machine learning to develop algorithms that computers can run to perform a variety of language-related applications, such as automatic speech recognition, parsing, and machine translation. My class, being in the computer science department, was focused on the statistical portion of NLP, where we focus on the efficiencies of algorithms and justify them probabilistically.

At Berkeley, NLP seems to be offered every other year to train future NLP researchers. Currently we only have one major NLP researcher, Dan Klein, who teaches it (Berkeley’s hiring this year so maybe that number will turn into two). There are a few other faculty that have done work in NLP, most notably Michael Jordan and his groundbreaking Latent Dirichlet Allocation algorithm (over 10,000 Google Scholar citations!), but none are “pure” NLP like Dan.

CS 288 was a typical lecture class, and the grading was based exclusively on five programming projects. They were not exactly easy. Look at the following slide that Dan put up on the first day of class:

cs288

I come into every upper-level computer science expecting to be worked to oblivion, so this slide didn’t intimidate me, but seeing that text there gave me an initial extra “edge” to make sure I was focused, doing work early, and engaging in other good habits.

Let’s talk about the fun part: the projects! There were five of them:

Language Modeling. This was heavy on data structures and efficiency. We had to implement Kneser-Ney Smoothing, a fairly challenging algorithm that introduced me to the world of “where the theory breaks down.” Part of the difficulty in the project comes from how we had to meet strict performance criteria, so naive implementations would not suffice.
Automatic Speech Recognition. This was my favorite project of the class. We implemented automatic speech recognition based on Hidden Markov Models (HMMs), which provided the first major breakthrough in performance. The second major breakthrough came from convolutional neural networks, but HMMs are surprisingly a good architecture on their own.
Parsing. This was probably the most difficult project, where we had to implement the CYK parsing algorithm. I remember doing a lot of debugging and checking indices of matrices to make sure they were aligned. There’s also the problem of dealing with unary expressions, since that’s a special case that’s not commonly described in most textbook descriptions of the CKY parsing algorithm (actually, the concept of “special cases not described by textbook descriptions” could be applied to most projects we did…).
Discriminative Re-ranking. This was a fairly relaxing project because a lot of the code structure was built for us and the objective is intuitively obvious. Given a candidate set of parses, the goal was to find the highest ranking one. The CYK parsing algorithm can do this, but it’s better if that algorithm gives us a set of (say) 100 parses, and we run more extensive algorithms on those top parses to pick the best of those, hence the name “re-ranking.”
Word Alignment. This was one that I had some high-level experience with before the class. Given two sentences of different languages, but which mean the same thing, the goal is to train a computer to determine the word alignment. So for an English-French sentence pair, the first English word might be aligned to the third French word, the second English word might be aligned to *no *French word, etc.

I enjoyed most of my time thinking about and programming these projects. They trained me to stretch my mind and to understand when the theory would break down for an algorithm in practice. They also forced me to brush up my non-existent debugging skills.

Now, that having been said, while the programming projects were somewhat stressful (though nothing unexpected given the standards of a graduate level class), and the grading was surprisingly lax (we got As just for completing project requirements) there was another part of the class that really stressed me out, far beyond what I thought was even possible. Yes, it was attending the lectures themselves.

A few months ago, in the middle of the semester, I wrote a little bit about the frustration I was having with remote CART, a new academic accommodation for me. Unfortunately, things didn’t get any better after I had written that post, and I think they actually worsened. My CART continued to be plagued by technical issues, slow typing, and the rapid pace of lecture. There was also construction going on near the lecture room. I remember at least one lecture that was filled with drilling sound while the professor was lecturing. (Background noise is a killer for me.)

I talked to Dan a few weeks into the course about the communication issues I was having in the class. He understood and thanked me for informing him, though we both agreed that slowing down the lecture rate might reduce the amount of material we could cover (for the rest of the students, of course, not for me).

Nonetheless, the remaining classes were still insanely difficult for me to learn from, and during most lectures, I found myself completely lost within ten minutes! What was also distressing was knowing that I would never be able to follow the question/answer discussions that students had with the professor in class. When a student asks a question, remote CART typically puts in an “inaudible” text due to lack of reception and the relatively quiet voice of the students. By my own estimate, this happened 75 percent of the time, and that doesn’t mean the remaining 25 percent produced perfect captions! CS 288 had about 40-50 students, but we were in a small room so everyone except me could understand what students were asking. By the way, I should add that while I do have hearing from hearing aids and can sometimes understand the professor unaided, that hearing ability virtually vanishes when other students are asking questions or engaging in a discussion.

This meant that I didn’t have much confidence in asking questions, since I probably would have embarrassed myself by repeating an earlier question. I like to participate in class, but I probably spoke up in lecture perhaps twice the entire semester. It also didn’t help that I was usually in a state of confusion, and asking questions isn’t always the ticket towards enlightenment. In retrospect, I was definitely suffering from a severe form of imposter syndrome. I would often wonder why I was showing up to lecture when I understood almost nothing while other students were able to extract great benefits from them.

Overall verdict: I was fascinated with the material itself, and reasonably liked the programming projects, and the course staff was great. But the fact that the class made it so hard for me to sit comfortably in lecture caused way more stress than I needed. (I considered it a victory if I learned anything non-trivial from a lecture.) At the start of the semester, I was hoping to leave a solid impression on Dan and the other students, but I think I failed massively at that goal, and I probably asked way too many questions on the class Piazza forum than I should have. It also adversely affected my CS 281a performance, since that lecture was right after CS 288, which meant I entered CS 281a lectures in a bad mood as a result of CS 288.

Wow, I’m happy the class is done. Oh, and I am also officially done with all forms of CART.