My Blog Posts, in Reverse Chronological Order


Grad School Applications, Stage 3: The Online Applications

Nov 30, 2013

It seems like every prospective graduate student is using the Thanksgiving break to catch up on applications. That’s definitely been my situation; I’ve delayed things far too long (which is quite unlike me), but hopefully I have made up for it these past few days by submitting several fellowship/scholarship applications and creating final drafts of my statement of purpose essays. With ten schools and a variety of fellowships/scholarships to apply to, I can’t afford to leave everything to the last week before the schools’ deadlines, especially when that also happens to correspond to my final exam week!

To budget my time, I first submitted all the fellowship and scholarship applications that had deadlines earlier than that of any of my ten graduate schools. Then, I went to work on creating draft after draft of one school’s statement of purpose essay. Fortunately, most universities have similar essay questions, so I can just modify a paragraph at the end that is school-specific.

Once I had done sufficient work for one essay, I put that aside and then did all the “administrative” tasks by filling in the easy stuff of the online applications. This includes writing information about recommenders, writing your address and contact information, and so on.

Some thoughts as I was doing these:

  1. I did them in bulk fashion (i.e., one right after another) and did everything except upload the statement of purpose essays. I felt like that was the most efficient way to do things. Now, when I head back to school, I only have to worry about the essays.
  2. Most applications were part of a university-wide graduate school application form, so I frequently read information that was not relevant to computer scientists but would be relevant to other subject areas. This makes it a little harder on the applicant (since we have to fill in more information than is probably necessary) but it’s easier on the school (only one application website/form needed for the entire graduate school) so I can understand why schools like that.
  3. Some places want me to paste my essay into a “text box,” while others want uploaded PDF documents. I vastly prefer the latter, since there is some LaTeX stuff that I’ve included in my statement of purpose to make it look better, but maybe schools don’t want faculty to be influenced by the aesthetics of the text.
  4. Some schools weren’t specific about whether they wanted me to upload an unofficial transcript or a scanned official transcript. (One school even had contradictory information in two places of their application.) In fact, for two schools, I didn’t realize this until I had actually reached the point in the application where they asked me to upload scans. Fortunately, the registrar emailed me a PDF scan of my official transcript and that solved everything. The lesson is that it’s best to just get an official scan to not leave anything to chance.

The Problem with Seminars

Nov 10, 2013


I like the concept of seminar courses. These are typically small classes that feature active discussion and debate each session. At Williams, these courses — where enrollment is limited to 19 students to comply with U.S. News & World Report standards for “small class sizes” — make up a substantial portion of the humanities curriculum. While there are certainly many wonderful things I can say about seminars, one of my personal gripes is that they pose additional burdens to deaf students.

Here’s the problem: if students are interested and motivated by the course material, they’ll be active participants. That means class discussion will be moving at a quick pace from one person to another as people raise their hands immediately after others finish talking.

But as a deaf person who cannot really understand what my classmates say until I get the feedback from a sign language interpreter, there’s an added delay until I can get the same information. Inevitably, once I can understand what my classmates have said, another one immediately jumps into the discussion by saying something similar to: “building off of [his/her] previous point, I think that […]”.

The end result is that I’ve often felt lost in some of these discussions. Many times, I’ve wanted to say something, only to see someone else claim credit for that concept by being quicker than me at raising his or her hand. It’s not a problem that’s easily solved. There’s always going to be some sort of delay with sign language, CART, and other accommodations, but it can pose difficulty to deaf students. Since seminars tend to incorporate class participation as a large fraction of students’ grades, that factor can be a deterrent to deaf people for taking these courses.

As I think about seminar courses, I’m reminded of a particularly painful high school AP US History class. The class was divided into groups of three, and we had to debate over a topic that I’ve long since forgotten. (Each group had to defend a unique perspective.) But the main thing that I remember was that the teacher required each student to make three substantial comments in the debate in order for him or her to receive full credit.

The debate ended up being chaotic, with students shouting out their comments all over the place, often interrupting each other without restraint. (My teacher actually had to stop the class once, so we could relax and start fresh.) Predictably, I was completely lost among the commotion and didn’t see any way I could participate without sounding awkward. Eventually, towards the end of the debate, I finally made my sole comment of the day. And that was only because one of my group members (out of sympathy?) actually told me what to say! He mentioned his thoughts to me, raised his hand and then let me talk once the focus was on our group. In other words, I was just echoing his idea.

Fortunately, my teacher recognized the challenges I faced and didn’t penalize me for failing to participate in that embarrassing debate.

So how should a deaf person approach seminars? I’m not interested in asking professors to lower their grading standards (I’d be offended otherwise), though it might be wise to mention to them the delay in reception due to ASL or other accommodations, just so they’re aware. Another thing one could ask is that the professor slow down the pace of discussion. That is, if one student finishes talking, ask the professor to wait a few extra seconds before picking the next person to talk.

With respect to how class discussion proceeds, my best advice is that one should aim to be the first to comment on a class topic. That means when the professor reviews something based on homework readings and then says: “Any thoughts on this?” to the class, that’s the best time for someone like me to participate.

This situation typically happens at the start of class, so there isn’t a need to make your contribution to the class debate relate to previous comments (a huge plus!). Furthermore, professors often articulate better than students, making it easier for me to rely more on my own hearing. Finally, while this might be entirely anecdotal evidence, I’ve observed that professors are often more willing to wait a longer time when they open up a discussion than when they’re in the middle of one.

Allocating Time for the Undergraduate Thesis

Oct 28, 2013

As someone who is currently working on a computer science thesis this year, one of the things that’s really hit home lately is how the undergraduate thesis serves as a healthy “medium” between the undergraduate student and the beginning graduate student mentalities.

For one’s first few years as an undergraduate, it is expected that he or she focus primarily on courses. Research is an excellent “extracurricular” activity and should be taken seriously, but unless one does an extraordinary job — by that, I mean first-rate conference or journal publications — it is likely that students still need to perform well in courses in order to get accepted into a Ph.D. program.

Meanwhile, the beginning graduate student at a Ph.D. program suddenly needs to break away from the undergraduate mentality in order to succeed.

I’m at the point where my grades are still important, but my research is starting to become a bigger part of my studies. Consequently, I have to find sufficient time away from my “normal” classes to focus on research. It’s tempting to let thesis work slide in favor of another hour or two spent on perfecting a problem set to get that “A” grade, but those lost hours add up. As a time-management technique, I suggest having a schedule for one’s thesis work outlined in a “problem set” format so that it mirrors what a typical science class would be like.

PS: Yes, I know I haven’t been blogging too much lately. I’m sorry.

Grad School Applications, Stage 2: Preparing Information for Reference Writers

Oct 7, 2013

Back in July, I published a post proclaiming the start of my fall 2013 graduate school applications process.

Now that it’s the start of October, I can safely say that I’m at a new stage: the point where I need to provide information to all my reference writers about my applications. Don’t neglect this non-trivial step! Letters of recommendation are probably the third most important aspect of one’s application after one’s research experience and grades (in that order), and they become extremely useful in picking out the best of the best.

Here’s what I included in my “packet” of information to my recommenders:

  1. A copy of my updated curriculum vitae. This should be something everyone does.

  2. A document that clearly outlines all of the programs to which I’m applying, as well as any other fellowships and/or scholarships. For me, this is ten Ph.D. programs, four fellowships, and three outside scholarships, and my document ended up being six pages. This includes a LaTeX-generated table of contents and a separate page devoted to an Introduction. For each application, I also included a web link, just in case my reference writers wanted some extra information. Finally, for each school, I also indicated the labs and professors that caught my interest.

I think the last point is something that, sadly, often gets glossed over when sending information to recommenders. It’s not enough just to say that one wants to study at a school; one also should have a general idea of the different research groups at an institution and which ones would be the best fit.

One thing that I had hoped to include in the packet was an updated statement of purpose. Unfortunately, I haven’t found the time to get a sensible essay ready, so that’s the next thing on my agenda to send to my recommenders.

It was a relief to finally send information to my three recommenders, so now I can focus on getting my actual essays and applications done. No, I’m not as far as I hoped to be in the plan I posted in July, but I’m getting there. I still have a couple of weeks before the first deadlines arrive…

How to Accommodate Technical Colloquium Talks?

Sep 27, 2013

In my last blog entry, I talked about giving a math colloquium talk. In this one, I’ll talk about attending a talk. This academic year, I have sat in three computer science talks and four other math talks.

And so far, I’ve been somewhat disappointed.

My accommodation for these talks is to use the SmartLink+ FM system owned by Williams College. During each talk, I arrive about five to ten minutes early to meet the speaker and hand over the device so that he or she can wear it. (It’s designed like a lanyard.) However, even with this as an aid, I feel like I don’t get much benefit out of these talks. I think I run into two problems: (1) getting distracted by the speaker wearing the FM system, and (2) getting distracted by the static.

Problem (1) is a mental issue. Sometimes I feel like a burden to the speaker when asking him or her to wear a device that, while not clunky or huge, is still noticeable, can swing around as he or she is moving, and isn’t designed for pure comfort. I also wonder what the other audience members think of the device. Is it distracting to them as well? Do they know that the FM is for me? When I have these thoughts, I also ponder alternative scenarios for accommodations.

Problem (2) is a technical issue. It’s well known that FM systems are great at amplifying sound, and I’m happy to benefit from that. The amplification, though, seems to result in a lot of static as an unfortunate side effect. When a speaker wears the FM system, it can rub against a shirt as he or she moves, and I hear a lot of rustling and static when that happens. In fact, at one point last spring in a machine learning tutorial meeting, I had to completely turn off my right hearing aid since the static had become unbearable. Complicating matters here is that the hearing aids I use that have the FM receivers (i.e. what I need to connect to the system) are not the same as my best pair of hearing aids, which are better at discriminating sound. Do I want to lose out on intensity to retain clarity and precision?

So these two things together seem to hinder my ability to benefit from colloquium talks. In fact, my lackluster experience during today’s computer science colloquium talk inspired me to write this entry. As I allude to in the title, I’m thinking about how to accommodate a technical talk for a deaf person.

Here are some hypothetical scenarios I have in mind:

  1. Use the SmartLink+ FM System (i.e. maintain what I’m doing). Advantages: continuity, don’t have to make petitions or write more letters. Disadvantages: covered earlier in this entry.
  2. Use an alternative FM System (e.g., the Contego R900). Advantages: possibly experience less static but retain amplification. Disadvantages: would have to get used to an entirely new system.
  3. Use ASL interpreting services (as I do in my courses). Advantages: familiarity/continuity. Disadvantages: difficult to interpret technical talks.
  4. Use a captioning service (e.g., CART). Advantages: can read word-for-word on a screen. Disadvantages: would take several weeks to get set up, and would probably run into technical difficulties.
  5. Use a combination of the above accommodations. Advantages: can combine the benefits together. Disadvantages: costly, would experience diminishing returns for each addition.
  6. Use no accommodations. Advantages: easiest for me, allowing me to show up at the same time as other students. Disadvantages: will have the least amount of hearing assistance.

As one can see, there’s no substitute for having normal hearing. Different forms of accommodations have their pluses and minuses, and it’s up to the individual and his or her institution to come up with a reasonable plan. I’m still not entirely sure what’s best for me, but hopefully I can come to a firm decision soon.

My Williams College Mathematics Colloquium Talk

Sep 17, 2013

Williams College requires that all senior math majors give an acceptable 30-minute colloquium talk on the topic of their choice. Virtually all seniors who give their talk “pass” as long as their topic is relatively new, interesting, and nontrivial. The senior majors have to attend 20 of these student talks, not including their own, so the typical audience consists of other students as well as most of the faculty.

Back in May, I volunteered to give my colloquium talk early, and I was fortunate that the colloquium chair assigned me to be in the first student slot. (It’s always nice to set the trend!) For my colloquium, which I just delivered today, I chose to talk about probabilistic graphical models (PGMs). It wasn’t a difficult decision for me to pick this topic. This past summer, as part of my “moral duty” as a machine learning student, I skimmed a wide variety of recent articles published at the highly prestigious International Conference on Machine Learning. Many of the articles I read incorporated PGMs, and there was one article in particular that caught my eye: using PGMs in crowd-sourcing to grade a test without knowing the answers.

That got me a little interested in PGMs, so I read a little more and learned that these are often considered part of the intersection between computer science and statistics. Effectively, these are graphs that describe probability distributions (incorporating statistics): the nodes represent random variables, and the edges encode the dependencies between them. By exploiting graph theoretic algorithms (incorporating computer science), it’s possible to efficiently model a scenario that might otherwise be intractable to analyze directly, e.g. in medical analysis when we’re dealing with thousands of random variables. Needless to say, I figured I should explore PGMs in depth, both for my computer science senior thesis and for my colloquium.
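
To make the idea of a graph describing a probability distribution a bit more concrete, here is a minimal sketch of a toy three-node model (Rain -> WetGrass <- Sprinkler). It is not taken from my talk or from the crowd-sourcing paper: the class name, the variable names, and every probability in it are made up for illustration. It also answers its one query by brute-force enumeration, which only works because the model is tiny; the graph-theoretic algorithms mentioned above are what make larger models manageable.

```java
// A hypothetical three-variable model: Rain -> WetGrass <- Sprinkler.
// All probabilities below are made-up illustration values.
final class TinyBayesNet {
    static final double P_RAIN = 0.2;       // P(Rain = true)
    static final double P_SPRINKLER = 0.1;  // P(Sprinkler = true)

    // Conditional table for P(WetGrass = true | Rain, Sprinkler).
    static double pWetGiven(boolean rain, boolean sprinkler) {
        if (rain && sprinkler) return 0.99;
        if (rain) return 0.90;
        if (sprinkler) return 0.80;
        return 0.05;
    }

    // The joint distribution factorizes along the graph:
    // P(r, s, w) = P(r) * P(s) * P(w | r, s).
    static double joint(boolean rain, boolean sprinkler, boolean wet) {
        double pr = rain ? P_RAIN : 1 - P_RAIN;
        double ps = sprinkler ? P_SPRINKLER : 1 - P_SPRINKLER;
        double pWet = pWetGiven(rain, sprinkler);
        double pw = wet ? pWet : 1 - pWet;
        return pr * ps * pw;
    }

    // P(Rain = true | WetGrass = true), computed by summing the joint
    // over every assignment that is consistent with wet grass.
    static double rainGivenWet() {
        double rainAndWet = 0.0;
        double wetTotal = 0.0;
        boolean[] values = {true, false};
        for (boolean r : values) {
            for (boolean s : values) {
                double p = joint(r, s, true);
                wetTotal += p;
                if (r) {
                    rainAndWet += p;
                }
            }
        }
        return rainAndWet / wetTotal;
    }

    public static void main(String[] args) {
        System.out.printf("P(rain | wet grass) = %.3f%n", rainGivenWet());
    }
}
```

The point of the factorization is that the model only needs two single-variable probabilities and one four-row table, rather than a full table over all eight joint assignments. With thousands of variables, that saving is what keeps the model tractable.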

Thus, my colloquium talk first gave an introduction to PGMs, and then described the application in crowd-sourcing as described in the paper I linked to earlier. If you’re interested in learning more about these, feel free to check out the slides I used for my talk. You can view them here. (Side note: for something as important as this, always have at least one backup of your slides … try using Dropbox.) Have fun with PGMs!

Clarifications to Mor Harchol-Balter’s Ph.D. Advice

Sep 1, 2013

Professor Mor Harchol-Balter (hereafter, Professor H-B), a faculty member in the School of Computer Science at Carnegie Mellon University, has written a well-cited report about applying to Ph.D. programs. While I agree with almost everything she writes, I figure it’s worth clarifying or modifying a few portions of the report; I’ve listed seven of my suggested clarifications/modifications in this post.

As a disclaimer, I obviously have never been part of a graduate admissions committee before, while Professor H-B presumably takes part in this every winter. What I write in this post is simply my own opinion based on my countless hours of reading about graduate school.

Clarification/Modification #1: Funding the Ph.D.

Professor H-B writes:

Important note 2: There are many companies and government organizations which offer Graduate Fellowships for Ph.D. students. If you are lucky enough to get one of these, they will cover your full way through graduate school, and you will never have to worry about whether your advisor has funding or not. Details about graduate fellowships will be discussed in Section 4.

From what I’ve read, most fellowships (e.g. the NSF) last for at most three years. Thus, it seems like the typical scenario for a fellowship recipient is that he or she uses the money for some number of years, then after it expires, he or she must find some other source of funding. It is extremely rare for students to complete Ph.D.s within three years; the few examples of fast Ph.D.s that I know of (e.g. Frank Morgan) were in the field of mathematics, where it can be enough to discover a convoluted proof to a question.

Clarification/Modification #2: “Top” Ph.D. Programs

Professor H-B writes:

Since my view is that of the top-ranked CS programs, my description below will follow the perspective of those schools. By a top-ranked program, I’m typically talking about a Ph.D. program ranked in the top 10 or so.


As I’ve said earlier, to get into a top graduate school you need prior research experience. This is not necessarily true for schools below the top 10, or maybe even the top 5.

Her article is written for and directed at students who aspire to study at the top programs. I personally would extend the "top" part (in both blockquotes above) to be perhaps "top 25-ish" or so, because I’m pretty sure the schools ranked ~10 to ~25 value research experience about as much as the very top schools do. Also, there are some subfields of computer science that lower ranked schools might specialize in, which may not be reflected in their overall ranking. One example might be the University of Pennsylvania (currently ranked #17 overall) and their stellar programming languages group.

Clarification/Modification #3: Computer Science GRE

Professor H-B writes:

The subject exam – If applying to a CS Ph.D. program, you should probably take your subject exam in CS, Math, or Engineering. Check with the school you’re applying to.

As I mentioned earlier, there is no such thing as a computer science GRE subject test now. Her article was written in 2011, while the CS exam was terminated in 2013, so she’ll almost certainly remove this part during her next update.

Clarification/Modification #4: Getting Research Experience

Professor H-B writes:

As an undergraduate, you can apply for a summer internship at a research lab or another school. I did this. Type “summer internships for undergraduates” into Google and you’ll be amazed how many opportunities there are.

I personally would be more specific and provide the link to this page, which has a listing of most (if not all) current NSF-sponsored computer science REUs. Alternatively, you can try searching within your school if you’re at a research university.

Clarification/Modification #5: Asking for Recommendations

Professor H-B writes:

Asking for a letter of recommendation won’t be a problem if you have been doing research with this person, but that won’t be possible in every case. Here’s a guideline which will maximize the contents of your letter. This works on the theory that professors have very little time and little memory (both of which are good assumptions):

She then recommends preparing a packet of materials for the professor, including materials such as a statement of purpose, a photo of you, etc. Most of the advice is straightforward and is what one should definitely do (e.g. the statement of purpose). If, however, a recommender needs a photo of a student to write an effective recommendation, then it’s likely that he or she doesn’t know the student well enough to write a solid letter anyway.

Clarification/Modification #6: Why to Apply for Fellowships

Professor H-B writes:

Even before you decide which schools you want to apply to, you should pick out which outside fellowships you are eligible for and apply to all of these. I myself applied to 5 outside fellowships. Many outside fellowships require a U.S. citizenship, so not everyone is eligible. There are at least 4 reasons to apply for a fellowship:

Her four reasons are (1) prestige, (2) funding for graduate school, (3) becoming a more appealing applicant, and (4) avoiding being a fool. Her argument for (3) is based on schools accepting you after you receive a fellowship. By that time, however, it’s usually April or May, and this doesn’t give you enough time to visit (or even think about) the school. My point here is that Ph.D. programs typically tell students if they are accepted in February, and students have an April 15 deadline to select their school. Suppose a student applied to school X and doesn’t hear back, which typically means a rejection. But on April 10, he receives a prestigious fellowship, and school X accepts him on April 11 upon hearing the news. That leaves just a handful of days for the student to consider the offer, and doesn’t allow a visit, etc. Quickly accepting an offer from them could be a risky decision.

It’s worth mentioning that Professor Philip Guo also has written advice on why to apply for fellowships. His additional reasons include (1) practicing writing, and (2) your research advisor will make you apply anyway.

Clarification/Modification #7: Ranking of the Department

Professor H-B writes:

Consider the overall ranking of department. This is important only because it determines the average quality of your peers (the other graduate students). Your peers are the people who will teach you the most in graduate school.

While I definitely agree with this (and others do, see e.g. Jeff Erickson’s post), and am also pretty sure that Professor H-B wasn’t being too serious in this writing, I can’t believe that the ranking of a department is only important due to the quality of the students. For instance, the average professor at a top school will have more grant money and productive research projects than the average professor at a mid-tier school. I remember reading a blog post by a former Ph.D. student at Berkeley who recalled that his advisor had a “seemingly endless supply of money.”

Is There a Useful Compendium of Advice for Deaf Students?

Aug 25, 2013

As I was just getting the urge to start writing more about machine learning and theory of computation, I had this nagging thought:

Is there a useful compendium of advice for deaf students that discusses how to navigate through their undergraduate, and potentially graduate, experiences?

Here’s why I’m curious. I read advice aimed at computer science Ph.D. students all the time. One only has to browse websites of computer science professors and Ph.D. students who have blogs to find short advice articles such as how to manage your advisor. Also, guidance obviously isn’t limited to blogs. Computer science professors Michael Ernst (Washington) and Tao Xie (Illinois) have compiled quite a bunch of writings by themselves or others that may be of interest to computer science Ph.D. students.

So is there something similar for deaf students? By that, I’m referring mostly to American college and graduate students. Wouldn’t that be a great resource for younger students, so that they might read and understand how older students have survived (or failed) the journey?

Unfortunately, I don’t think there’s much advice out there, but there’s a chance I’m wrong. One of the problems when trying to search for this is that if you type in “Advice for Deaf Students” in a search engine, most of the pages that show up are actually aimed at teachers, and provide tips and suggestions for working with and accommodating such students.

Keep in mind that I’m trying to look for sources similar to the ones that Professor Ernst or Professor Xie have on their homepages. Short essays by the random deaf person here and there are fine, but has anyone actually done a search for this and compiled a list together? If not, then I would be interested in starting one. I know I can’t do it all myself, since I start too many projects each year that I fail to finish, and general-purpose advice should rarely be written by just one person. It would be interesting to see if this idea could become a reality.

Programming with Java, Swing, and Squint

Aug 22, 2013

Since I’m going to build a complete compiler for a subset of the Java programming language this fall in my Compiler Design course [1], I thought it would be prudent to at least look through an introductory Java book over the summer. I want to make sure I didn’t forget much in the one-year period from last summer to this one, where I didn’t touch Java at all. Upon some thinking and exploration, I decided to do a speed read of the 337-page book Programming with Java, Swing, and Squint. (You can find the entire book online, in the link that lists all the chapters.) Having just finished the book, I can offer some of my comments.

This book was written by a Williams College computer science professor and is intended to be supplementary reading material for our introductory programming class. Since I never took the first (or second!) courses in the typical sequence for the CS major, this might help me fill in some holes in my knowledge as compared to other Williams College students. According to what I’ve read and heard, our first CS class is a straightforward intro-to-Java course, with a special emphasis on the understanding of networks and digital communication.

Programming with Java, Swing and Squint therefore does not assume any prior Java knowledge and is relatively easy to read. For those who are wondering, Swing is a widely-used application programming interface that one can import into Java code (put the line import javax.swing.*; at the top of your code) to provide graphical user interfaces (GUIs) for user programs, and Squint is a library specifically designed for the Williams intro CS course.
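
For readers who have never seen Swing, here is a minimal sketch of what a tiny GUI program can look like once that import is in place. This is my own illustration rather than an example from the book: it uses plain Swing only (no Squint), and the class name ClickCounter and the method buildPanel are hypothetical names I picked for this post.

```java
import javax.swing.*;

// A hypothetical example using plain Swing only (Squint is not used here).
final class ClickCounter {
    // Builds a panel holding a label and a button that counts its clicks.
    static JPanel buildPanel() {
        JPanel panel = new JPanel();
        JLabel label = new JLabel("Clicks: 0");
        JButton button = new JButton("Click me");
        // A one-element array lets the listener below update the count.
        int[] clicks = {0};
        button.addActionListener(event -> {
            clicks[0]++;
            label.setText("Clicks: " + clicks[0]);
        });
        panel.add(label);
        panel.add(button);
        return panel;
    }

    public static void main(String[] args) {
        // Swing components should be created and shown on the event dispatch thread.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Click counter");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(buildPanel());
            frame.pack();
            frame.setVisible(true);
        });
    }
}
```

Running the program should open a small window with a label and a button, and each click updates the label. Keeping the panel construction in its own method, separate from the window setup, also makes that part easier to reuse or check on its own.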

The book doesn’t waste any time in introducing the reader to small programs that construct simple GUIs. At first, the author makes the necessary claim that one has to accept certain incantations of Java as “magic words” that must be included in code in order for it to run, e.g. the “public class XXX” text, but he explains this stuff in the appropriate (and humorously named) subsequent chapters. Even the other examples in the book (email interfaces, building a calculator, etc.) are simple yet comprehensively presented, and they introduce the programmer to a variety of concepts, including (but not limited to) primitives, objects, classes, methods, control structures, loops, recursion, and arrays. While there is at least one substantial program for each chapter, the book doesn’t include any programming exercises or solutions. Thus, readers may find that they need to search elsewhere for additional programming tasks.

My final opinion: this can be useful as a first book for someone who has no programming experience but is interested in writing Java programs immediately. Even a more advanced programmer can probably use this book for the purpose of figuring out how to explain a programming concept to a complete newbie. There were many facts about programming and Java that I subconsciously knew but couldn’t explain clearly to a layman before today. Thus, I’m happy to have read this book even though I already knew almost all of the material.

  1. Side note: I only have three classes this semester: Compiler Design, Artificial Intelligence, and Complex Analysis. My fourth “class” is actually my senior thesis, and my fifth “extra class” consists of the graduate school applications. 

My Pre-College Education as a Deaf Mainstreamed Student

Aug 8, 2013


I probably have an unusual pre-college education compared to most Williams College students, so I thought it would be interesting to share my experience.

Pre-School and Elementary School

I know I participated in some sort of pre-school education, but obviously my memory is quite fuzzy here. I was in a program where I’d attend a few sessions a week with other deaf and hard-of-hearing (DHH) students and had teachers who knew sign language. I’m not sure how much “learning” went on, since it’s pre-school.

Then came elementary school. But before I discuss that, I just want to briefly point out the concept of mainstreamed education. As I was a mainstreamed student, I can provide my own definition in the context of deaf and hard of hearing education: this means we take part in regular education with most of the other (hearing) students in our grade, but will occasionally sit in “special education” sessions that specifically cater to our needs. Typically, we’ll have some sort of accommodation in the regular classes, such as a microphone/FM system, while the special education classes have no need for them since the teachers are trained to teach such students.

Anyway, it was in elementary school that I believe I first experienced the distinction between the two styles of classes. I was assigned to be in regular courses for all my “core” classes (English, Math, Science, and Social Studies) along with a few other DHH students in my grade, but I also took part in sessions designed specifically for DHH students. For instance, I took speech and social work sections, which teach skills that are harder (on average) for deaf students to acquire as compared to hearing students. A quick note: some DHH students take their core courses in the special education classes, so they essentially receive all their education there. It all depends on the student’s education plan.

My elementary school was unique in that it actually had these kinds of special education classes. Most schools don’t, which means many DHH students are forced to take long bus rides to an appropriate school. I was one of them during elementary school, but as far as I can tell I was better off than some of the students, especially those who had to undergo two-hour rides each day to and from school (four hours of being on the bus a day!).

Middle School

My split between taking part in regular and special education classes continued in middle school, but with the balance skewed further toward the regular classes. This is necessary, after all; while special education classes are useful, they can almost never provide as much material as a regular class.

At the time I was a student, my middle school had nine forty-minute periods in a day. One of these was a daily “tutorial” period where students don’t have a class and can focus on their work (or play games). My tutorial room was located in the same room where most of the other DHH students took classes. Unfortunately, the tutorial period wasn’t standardized for students; in other words, my school essentially divided the students into nine groups, each with their own specific tutorial period. This limited the time I could interact with other DHH students, since it was rare that we would have the chance to meet in the tutorial room at the same period. They could also be in the middle of a class even if they were there, further restricting socialization.

There were a few DHH students in my grade, and in an effort to make efficient use of interpreters (and other resources), my school auto-assigned us to be in the same classes, so at least I wasn’t completely alone in those classes.

I continued to take speech and social work sessions throughout middle school, in the same classroom as the tutorial room for all DHH students, but I never took any academic classes there. As I mentioned earlier, I did have about two or three other DHH students in my core courses as well as a few secondary ones, such as Health, Music, and Physical Education.

The process of taking important examinations was also unique for me. If there was a normal test made by a teacher, I would take it in class with the other students. But for state-administered exams, I would take them in a separate, private room with an interpreter by my side in case I needed to listen to instructions. Obviously, the interpreter wasn’t allowed to actually take the exam for me. It was pretty convenient for me, since I didn’t have to worry about the distraction of other students.

High School

At the time I was a student, my high school was designed so that there were four kinds of days (A, B, C, and D), and we would cycle through them during the academic year. On “B” and “D” days, during the second of four designated, 85-minute “blocks” of the day, there would be an “advisory” period, which is basically like study hall — students are assigned to a class, but there’s not going to be a lecture, so we can work on whatever we want.

Naturally, being a DHH student, I was assigned to be in the same advisory classroom as the other DHH students. This was much better than the situation in middle school, where my tutorial period wouldn’t coincide with the tutorial periods of other DHH students. While my advisories were often filled with work — I was regularly juggling several Advanced Placement classes at a time — I occasionally found time to play several rounds of chess and other games with other DHH students. The advisory period was also useful for organizing activities among the DHH students, since we were all together in one period. We would sometimes have special days that included an annual picnic, a trip to an amusement park (e.g., this one), and food provided during holiday seasons.

For a while, I still took some speech and social work classes, and I took most of my midterms, final exams, and state-administered exams in a separate room. But after an agreement with my teachers, I no longer had to take speech and social work classes. I had about ten years of those classes, and we all agreed that further improvement due to these sessions would be negligible.

Finally, in high school, we had more freedom to pick our own schedules. Thus, I was no longer guaranteed to have other DHH students in my classes. In fact, I think the only true high school class I had that included other DHH students was physical education.


I was fortunate to live near a school district that was able to effectively provide me with what I needed in order to perform well in school. I’m now a student at Williams College, where there are no other deaf students, so at this point, I’m basically “on my own.” The transition from a mainstreamed pre-college education to a mainstreamed/hearing college is now complete.

No More Computer Science GRE Subject Test Exam

Aug 1, 2013

A while back, I said I was planning on taking the computer science subject exam for graduate school. I knew it wasn’t going to be too much of a big deal for my application, but it would at least give me an extra data point.

Of course, I didn’t realize that I was actually behind the times. The computer science GRE subject test is no longer offered; the last time it was administered was in April 2013. The following rationale is from the Educational Testing Service (ETS) website.

Over the last several years, the number of individuals taking the Computer Science Test has declined significantly. Test volume reached a point where ETS could no longer support the test psychometrically. As a result, the GRE Program discontinued the Computer Science Test after the April 2013 test administration. Scores will continue to be reportable for five years.

All I can say is that I’m relieved, since the test wouldn’t have helped me that much and it saves me the studying time. Furthermore, these subject tests tend to be more helpful for those applicants who either (1) don’t come from a top school, or more importantly, (2) didn’t major in computer science. Since neither describes my scenario, I didn’t need to depend on the subject test at all.

There are others who are perfectly fine with seeing the test discontinued. Such viewpoints are present in, for instance, this blog post.

Of course, I’m just as guilty of bias as anyone else. Someone who didn’t major in computer science will probably disagree with me. Also, I’ve heard that foreign students made up much of the high scores on the exam, so this may hurt them a little. (But my knowledge here is sketchy.)

Regardless, though, all this really means is that we can get back to our research.

Recap of the 2013 Algorithmic Combinatorics on Words REU

Jul 28, 2013


Yesterday, I arrived back home after spending the previous eight weeks at the 2013 Algorithmic Combinatorics on Words REU. It was a great experience overall, so I thought I’d share a bit about what happened.

The Experience

I arrived in Greensboro, NC, on June 2, and was greeted by one of the research assistants (RAs) and a few other student participants who had arrived at roughly the same time. The RAs had generously offered to drop us off at our apartments, and they also assisted us in getting settled during the first day by providing keys, taking us out to dinner, etc. The following morning, we met our REU coordinator, Professor Francine Blanchet-Sadri and went through a typical orientation process. (She gave me permission to address her on a first-name basis, so hopefully it’s okay if I use “Francine” in the rest of this post.)

One of the unique things about this REU is that there’s only one faculty advisor here (Francine) who advises all the student research groups. From what I know, most REUs are structured such that several faculty members offer their own projects, and students have to apply to the REU while ranking them according to preference. The faculty members also typically advise only one team of students. At UNCG, all students and the RAs (who conduct their own research here as well) essentially work in the field of algorithmic combinatorics on words in teams of two, though some individuals paired together may agree to split and work by themselves.

We spent the first few days going over background material in Algorithmic Combinatorics on Words and listening to Francine (or the RAs) give seven talks about different subfields in which we could perform research. After we were through with the background material, the fourteen of us (eleven student participants and three research assistants) ranked the seven projects and attempted to match up the groups as fairly as possible. After some unlawful coercion (er, gentle negotiation), we eventually settled on an alignment of two students to each of the seven possible topics. Note that Francine demanded that all seven topics be used, so we had to make sure that there were people working on the less popular topics. I was assigned to work with another student on the topic entitled Abelian and Subword Complexity.

As it turned out, we only seriously investigated abelian complexity. I won’t get into too much depth here, but I figure it can’t hurt to at least give an extremely basic introduction. If we define a word \(w\) to be a sequence of characters over some alphabet, then the abelian complexity of that word with respect to \(n\), denoted \(\rho_{w}^{ab} (n)\), is simply the number of abelian equivalence classes of subwords of length \(n\) in \(w\). Two words are abelian equivalent (and thus in the same equivalence class) if and only if each letter in a given alphabet shows up the same number of times in the two words; for instance, words \(x_1 = 010\) and \(x_2 = 100\) are abelian equivalent since both have two 0s and one 1, but \(y_1 = 011\) and \(y_2 = 001\) are not abelian equivalent. With this in mind, suppose we have the word \(w_0 = 010110\) over the binary alphabet \(A_k = \{0,1\}\). We have \(\rho_{w_0}^{ab} (4) = 2\) because every length-4 subword of \(w_0\) uses either two 0s and two 1s (e.g. 0101) or one 0 and three 1s (e.g. 1011). To make things more interesting, I investigated infinite words, but that’s a topic for another day.
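To make the definition concrete, here is a minimal Python sketch for finite words. It is not one of the scripts from the REU; the function names are made up for illustration, and it assumes that “subword” means a contiguous block of letters, which matches the examples above.

```python
from collections import Counter


def are_abelian_equivalent(x, y):
    """Two words are abelian equivalent when every letter occurs equally often in both."""
    return Counter(x) == Counter(y)


def abelian_class(word):
    """Hashable summary of the letter counts of `word` (hypothetical helper)."""
    return frozenset(Counter(word).items())


def abelian_complexity(word, n):
    """Count the abelian equivalence classes among the length-n subwords of a finite word.

    Assumption: subwords are contiguous, e.g. 0101 inside 010110.
    """
    classes = {abelian_class(word[i:i + n]) for i in range(len(word) - n + 1)}
    return len(classes)


print(are_abelian_equivalent("010", "100"))  # True: two 0s and one 1 in each
print(are_abelian_equivalent("011", "001"))  # False: the letter counts differ
print(abelian_complexity("010110", 4))       # 2, matching the example above
```

This brute-force version only works for finite words; for infinite words one would need structural arguments (or at least long prefixes) rather than direct enumeration.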

The first few weeks were primarily devoted to background reading provided with the set of notes Francine compiled for our particular research topic. Even though we mostly read papers with the dreaded “in progress” label (in other words, they’re littered with typos and confusing English), the reading wasn’t too bad, and I began brainstorming a bunch of ideas and possible avenues for research.

There was a major open conjecture posed in one of the longer papers I read, and during the third week, I felt like I began seeing ways to prove it. Thus, I spent days putting my ideas into writing and verifying them with a variety of my own Python scripts.

The problem, of course, was that there was always at least one case/example that wouldn’t work.

I suspect I’m not the only one who got roadblocked this way. I came up with idea after idea, but my programs came up with counterexample after counterexample, and eventually, I had to choose between (a) splitting my already numerous cases into smaller cases with little hope that I could cover all of them, or (b) abandoning the conjecture for now and moving on to a different topic.

Fortunately, my research partner actually knew what he was doing, and during the time I had spent trying to solve the open question, he had found some interesting patterns regarding abelian complexity in a certain class of words. For instance, with the help of Mathematica, he showed me graphs of abelian complexities for infinite words that resembled fractal patterns. We soon made his findings our primary research focus and dedicated ourselves to explaining why these graphs showed up the way they did, and whether there was an efficient algorithm to actually compute abelian complexities.

Over the next few weeks, we would prove a variety of lemmas, come up with additional conjectures, create almost a hundred Python scripts, and write up our results in what would eventually become a 25-ish page paper that we gave to Francine just before the conclusion of the program. Thus, what started out as a bunch of pictures eventually turned into algorithms and mathematically rigorous theorems. I never did prove the conjecture I first worked on. After I contacted the person who actually formed the conjecture (he was in the REU last year), it seemed that he had tried a proof similar to mine, albeit on a smaller scale, and had failed as well.

Overall, I believe this REU really gives students a sense of research beyond the stereotypical advisor-student relationship. Previously, my research — even at another REU — mostly consisted of the following cycle:

Faculty Advisor: “Do task X”
Me: “Yes, sir/ma’am”

Instead, it was more like we really got to look at what we wanted to explore. Heck, one student somehow made heavy use of complex analysis in his research. I still haven’t been able to figure out the connection.

My learning obviously wasn’t just limited to algorithmic combinatorics on words. For instance, I found out that I’m as clear a type one personality as it gets (though I was close to testing as a type six), that my diet is incredibly strange, that machine learning is more important to the national government than theory or algebraic geometry, and a whole host of other things. (Actually, I suspected these were true prior to coming, but my experience there all but confirmed them. Also, I don’t know anything about algebraic geometry.) I do hope, though, that I was able to teach the other students as much as they taught me.

Other Thoughts

One of the defining features of this REU is that it’s heavily structured. The work week runs six days, from nine to five each day, with Sundays off. (Note to future/prospective REU students: If you plan on setting aside a full weekend for your own non-research related activities in the summer, then this program isn’t for you.) Students are expected to be in our designated classroom by 9:00 AM each morning. At that time, Francine comes into the room and briefly meets with each group to discuss their progress and to provide feedback.

The classroom itself is where many students do research, which reminds me of Google-style facilities where there are no individual offices or labs. There are tradeoffs to this kind of “open plan” work environment; it allows for greater interaction among groups and quick feedback, but it can get distracting at times. Fortunately, there’s a computer science lab nearby where students can work if they want a more serene environment. I sometimes worked there even though I could have simply turned off my hearing aids in the classroom to avoid getting distracted. (Turning off my hearing aids presents a whole host of problems.) Even if one wants complete isolation, there are so many accessible classrooms in the science building that this objective is not difficult to accomplish.

Saturdays are unique work days, with REU-sponsored pizza for lunch and typically some sort of event, such as a presentation on LaTeX. At the end of the fourth and eighth weeks of the REU, we all convened to present our research. This consisted of about seven fifty-minute talks, so it does consume the full work day. There is a coffee machine in the classroom, so unless you’re like me and hate coffee, you should get enough caffeine to stay focused.

Surprisingly, this REU isn’t all about work. At least from my experience, the RAs and students formed social activities such as hiking (see the image at the top of this post), card games, movie nights, and dinners. We had a Facebook group to help in this regard, which was especially useful since the fourteen of us were divided among five different houses, which were not next to each other. Incidentally, housing quality will obviously depend on whichever house you’re assigned to live in. I was probably assigned to the worst one, but I still had a decent-sized bedroom and a functioning bathroom/kitchen, so I survived.

The campus and nearby area are decent enough. There are plenty of restaurants and eating places near the working area, which means it’s possible to go out for lunch at Subway, Jimmy John’s, Thai food, etc. There are also a variety of fields, basketball courts, and a gym where people can participate in sports and other activities outside of work. The weight room, located in the student recreation center, isn’t that bad, since it actually has a power rack. Yes, it has its share of bozos who spend their entire gym sessions doing curls and who can’t squat correctly, but I did meet two other guys there who actually knew a thing about barbell training. And I was even able to convince another REU student to join me in the gym. (Here, a one out of thirteen ratio is impressive.)

I’m probably forgetting a whole host of other things I wanted to write about, but I think the above summarizes some of the interesting things about the REU.

Good luck to everyone who went and I wish you all the best.

Machine Learning (Part 1 of 4): Introduction

Jul 24, 2013


This summer, I’ve spent a good amount of time analyzing the content of my blog. By looking at the composition of my computer science entries, I realized that I don’t talk about the subject material in my classes a whole lot. Most of my posts in that category are related to programming, research, and other areas. I do have four course-related posts thus far in a “theory of computation” series, which you can access by looking at the recently-added directory of blog entries [1], but other than that, there’s honestly not much.

I’m hoping to change that as the summer turns into fall. One way is to revive my theory of computation series, which is motivated in part because I’m going to be a theory teaching assistant this fall. Entries are currently being drafted behind the scenes.

And another way is to introduce a new series of posts relating to one of my favorite classes at Williams, machine learning. This is also the subject area of my senior research thesis, so I’ll definitely be committed to writing about the subject. This will be a four-post series, with this one being the first.

This post will give an introduction to the field and, along with the second post, will discuss the variety of learning algorithms (I’ll explain what these are later) that are commonly studied in machine learning. The third post will analyze the advantages and disadvantages of the learning algorithms and discuss scenarios where some may be preferable over others. The fourth and final post will discuss some of my possible future research in the field.

Introduction to Machine Learning

So what is machine learning anyway? First, let’s go over the corresponding Williams College course description:

Machine Learning is an area within Artificial Intelligence that has as its aim the development and analysis of algorithms that are meant to automatically improve a system’s performance. Automatic improvement might include: (1) learning to perform a new task; (2) learning to perform a task more efficiently or effectively; or (3) learning and organizing new facts that can be used by a system that relies upon such knowledge.

At the heart of machine learning, then, is dealing with the question of how to learn from data. After all, our goal in this field is to figure out how to train a computer to adequately perform some task, and that almost always involves some sort of data manipulation. Possibly the most ubiquitous such “task” in machine learning is classifying data. The canonical example of this is separating spam email from non-spam email. Somehow, some way, we must use our vast repositories of spam and non-spam email to train an email client how to detect spam email with high precision and recall. That way, we can be reasonably confident when deploying it in the real world.

Needless to say, this is an important but inherently complicated task. Sure, there are some emails that are obviously spam, such as ones that are filled with nothing but dangerous URLs and non-English text. But what about those kinds of emails where someone’s writing to ask you about money? Most would consider those as spam, but what if a relative was actually serious about asking for money, but without knowing it, wrote in a style that was similar to those guys from unknown countries? (Perhaps the relative doesn’t use email much?) Furthermore, we can also run into the problem of ambiguity. If there exist perplexing emails such that even knowledgeable human readers can’t come to a consensus on spam vs non-spam, how can the computer figure out something like this?

Fortunately, with email, we won’t usually have such confusion. Spam tends to be fairly straightforward for the human eye to detect — but can the same be said for a computer? The key is to take advantage of existing data that consists of both spam and non-spam emails. The more recent the emails (to take into account possible changes over time) and the more diverse the emails (to take into account the many different writing styles of people and spam engines) the better. We can take a large subset of the data and “train” our email client. We assume that each email will have a label stating whether it is “spam” or “not spam” (if we relax this assumption, then things get harder — more on that later) and we must use some kind of algorithm to teach the client to recognize the common characteristics of emails in both categories. Then, we can take a “test” set, which might consist of all the remaining data that we didn’t use for training, and see how well the email client performs.

The advantage of this approach is that, since we assumed the data are labeled, we can judge and analyze the results, taking into account not just basic factors (such as the percentage of emails classified correctly) but also any trends or patterns that might give us insight as to when our learning algorithm works and doesn’t work. We can continue to modify our learning algorithm and its parameters until we feel satisfied with its performance on the testing set. Only then do we “deploy” it into the real world and watch it in action, where it has to deal with unlabeled email.
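Because the test emails are labeled, scoring a classifier is just a matter of counting. The sketch below is only an illustration with made-up labels: `evaluate` is a hypothetical helper, and the formulas are the standard definitions of accuracy, precision, and recall with “spam” treated as the positive class.

```python
def evaluate(true_labels, predicted_labels, positive="spam"):
    """Return (accuracy, precision, recall) for one labeled test set."""
    pairs = list(zip(true_labels, predicted_labels))
    if not pairs:
        raise ValueError("need at least one labeled example")

    # Count the outcomes that matter for the positive ("spam") class.
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Precision: of the emails flagged as spam, how many really were spam?
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: of the emails that really were spam, how many did we flag?
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, precision, recall


# Toy test set (invented for illustration only).
truth = ["spam", "spam", "not spam", "not spam", "spam"]
guess = ["spam", "not spam", "not spam", "spam", "spam"]
accuracy, precision, recall = evaluate(truth, guess)
print(accuracy, precision, recall)
```

Looking at precision and recall separately is one simple way to spot the kind of patterns mentioned above: a filter can be accurate overall while still flagging too many legitimate emails, or missing too much spam.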

In fact, a good analogy of machine learning in the context of humans seems to be sports referees. These people have to undergo a period of education and training before they can get tested on some “practice” games. They will then get feedback before moving on to the more serious competitions. Current NBA referees, for instance, might have been trained via this simple algorithm: “Read Book W, Pass Written Exam X_1 and Physical Exam X_2, Referee Summer League Game Y, and if performance is satisfactory, Referee Actual NBA Game Z.”

Hopefully this makes sense. As the previous example and general concepts imply, machine learning can make an impact in many fields other than computer science. Statistics, psychology, biology, chemistry, and many other areas have benefitted from machine learning tactics. In fact, such learning algorithms are even used in fraud detection.

Now let’s move on to some more formal definitions.

The Problem Setting

We have a computer capable of performing classification, which is the process of assigning a given category to each element in the data. The specific categories may or may not be known to the learner, but in general, knowing the categories ahead of time makes for far easier machine learning. A learning algorithm is something that can be used to help a machine (i.e. a computer) better perform a task when given data. “Better perform” can obviously mean different things depending on the circumstances or evaluation methodology used, but for the sake of simplicity, let’s suppose we’re only focusing on accuracy, or correctness.

To carry out the machine learning and evaluate performance, we’ll need some data in the form of feature vectors, which store the relevant attributes of our samples and usually include their class labels. For instance, with the email example earlier, the vector might store the number of characters at index zero, the number of words at index one, the email domain at index two, and so on. Attributes can be real-valued or categorical. One element in the feature vector (possibly the last one) might be reserved for the true classification of SPAM vs NON-SPAM. The machine will then use these feature vectors with a learning algorithm to build a learning model.
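As a rough sketch of what such a vector could look like in code, the hypothetical function below turns one email into a Python list using exactly the layout just described. The sender address and message are invented, and real systems would use far more attributes.

```python
def email_to_feature_vector(sender, body, label):
    """Build a toy feature vector for one email.

    Layout (an assumption that mirrors the text):
      index 0: number of characters (real-valued)
      index 1: number of words (real-valued)
      index 2: sender's email domain (categorical)
      index 3: true class label, "SPAM" or "NON-SPAM"
    """
    domain = sender.split("@")[-1]
    return [len(body), len(body.split()), domain, label]


vector = email_to_feature_vector("prince@example.com", "Send money now", "SPAM")
print(vector)  # [14, 3, 'example.com', 'SPAM']
```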

There are multiple ways of performing this learning. Three common methods are supervised learning, unsupervised learning, and reinforcement learning. Supervised learning involves the use of labeled training data to build a clear model for output, while unsupervised learning has unlabeled training data and generally performs tasks such as clustering (i.e. identifying similar elements). Reinforcement learning is when a grade is given to some output. This allows the learner to know what’s going right and wrong. A good analogy is when a young child touches a radiator and gets burned. He will typically learn from his error and avoid touching radiators in the near future, even if they are not actually hot.

My machine learning class did not discuss reinforcement learning, so for now we can focus on supervised and unsupervised learning.

To allow machine learning to happen in supervised learning, it is common to divide our data into training, validation, and testing sets.

  1. The training set’s primary purpose is to build the learning model that the machine can utilize to classify future examples. The ideal training set is large, diverse, and accurately labeled, which might involve humans hand-labeling the data.

  2. Validation sets are used to check how well a model has performed before we move on to testing. We may have multiple approaches and might use our validation set to pick the top candidates or slightly modify some parameters.

  3. Testing sets tend to be used to officially evaluate the performance of our proposed learning model. The learner is generally not going to have access to these elements to build the model, since that would defeat the point of testing.

There are different ways to partition data into those sets. It is common, in my experience, to simply combine the validation and testing sets, but the validation set is used enough to make it worth mentioning. If we have very little data, then we might consider omitting the validation set, or perhaps even treating the entire data set as both training and testing as a last resort. This is not desirable because we want to train a machine to perform well on the entire distribution of relevant data, not just our own samples, so there’s a danger of overfitting. In other words, we build the model so tightly towards our present data that it fails to generalize to the larger population.
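One simple way to produce the three sets is to shuffle the labeled examples and slice the result. The sketch below is only one possibility; the function name, the 60/20/20 proportions, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_dataset(examples, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle the examples and cut them into training, validation, and testing sets.

    Whatever is left after the training and validation slices becomes the test set.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable

    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


emails = [f"email_{i}" for i in range(10)]  # stand-ins for labeled feature vectors
train, validation, test = split_dataset(emails)
print(len(train), len(validation), len(test))  # 6 2 2
```

Setting `validation_frac=0` gives the plain training/testing split for cases where there is too little data to afford a separate validation set.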

On the other hand, unsupervised learning deals with clustering. The goal here is to find groups of examples that are similar to each other but distinct from other groups of examples. We’ll get to this more when I discuss clustering algorithms.

Learning Algorithms

I believe the easiest learning algorithm to discuss is the decision stump, since it has just one clear component. We pick an attribute and associate a rule to it. If it’s categorical, then we can have one group for each of the possible values of that attribute, and assign elements accordingly. If it’s real-valued, we often associate a threshold to it and divide elements based on that rule. For instance, if we have real-valued data such as the number of words in an email message, we might set a threshold of 500 words. All emails with fewer than that quantity are spam, and all emails with at least 500 words are not spam.

That’s it! Obviously, in our particular example, this is a terrible classification. The simplicity of decision stumps is one of their major drawbacks, since we have to rely on one single attribute to make our choice of classification; many times, it is unreasonable to expect this to result in an acceptable classification. On the other hand, the fact that it’s so simple means we can easily explain this model to a group of non-technical people. Don’t neglect this important fact! Scientists and mathematicians must know how to communicate with people from a variety of fields.
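The 500-word stump described above fits in a couple of lines. This is just a sketch of that one hand-picked rule; the function name is made up, and a real stump would normally choose its attribute and threshold from the training data rather than having them hard-coded.

```python
def word_count_stump(word_count, threshold=500):
    """Classify an email using a single real-valued attribute and one threshold."""
    # Fewer words than the threshold means spam; at least the threshold means not spam.
    return "spam" if word_count < threshold else "not spam"


for count in (12, 499, 500, 2000):
    print(count, word_count_stump(count))
```

The whole model is one comparison, which is exactly why it is both easy to explain and easy to fool.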

In the next post, I’ll discuss an obvious extension of this idea: decision trees, which are not restricted to classifying after just one decision.

  1. This was removed when the blog migrated to Jekyll in May 2015. 

Grad School Applications, Stage 1: The Quiet before the Storm

Jul 20, 2013

I’m about to enter my senior year at Williams College, and my goal is to pursue a Ph.D. in computer science directly after graduation. Thus, I have to write some graduate school applications.

Since this seems to be a topic that interests many college students across the country, I thought it would be interesting to show readers how I progress through this crucial stage of my life. Perhaps this will be informative to the random student who happens to come across this blog.

Also, since I haven’t actually started the applications, it makes sense to write now so that it’s ultra-clear what I was thinking, planning, etc. Hence, this post is called “Stage 1.”

Now, in computer science, zero is the new one, so we tend to start numbering from zero. But it doesn’t make sense to do that for this blog entry. In my opinion, “Stage 0” consists of everything one does before the application season: doing well in computer science courses, getting solid research experience, reading about and understanding graduate school life, GREs, etc. Obviously, anyone who hasn’t done most of these and wants to pursue a C.S. Ph.D. now is pretty much screwed.

For me, though, I’m almost through with Stage 0. I think I’ve done fairly well at Williams so far, and I have some research experience. I’ve also read some writings that I found extremely helpful to me; two of the best are Professor Philip Guo’s PhD Grind, and Professor Mor Harchol-Balter’s PhD advice. Finally, I took the general GRE back in April, so I’m good to go with that. I have not taken the subject test, though … and I’ll probably take it anyway, even if some schools don’t require it (more on that later).

Thus, I’ll define Stage 1 (i.e. right now) as the process of determining where to apply and setting a schedule for completing the application materials.

First and foremost, I hope to attend a well regarded computer science department. Sure, I can take the C.S. rankings straight off of the U.S. News & World Report, but I need to be careful not to pick a school because of its overall prestige, only to realize that it’s not as strong in my projected research areas as it is in other fields. (Even worse is a school that has great overall prestige, but has a virtually nonexistent computer science department.)

Context matters. As an example, I once knew a guy who turned down an offer from a top four school to go to one that was ranked well outside the top ten. I thought he was crazy — until I realized that the school he went to was extremely strong in his research area.

So what are the benefits of attending a prestigious graduate school other than the prestige? Professor Jeff Erickson suggests that one reason is the average quality of the graduate students. It makes sense that the better the graduate students, the more they can help and motivate each other to advance the field of computer science. (Of course, it also helps if they’re not enormously cut-throat!) The professors at the top schools will also be leaders in their fields, but I need to be careful again here because a famous professor does not imply an excellent advisor. Is it possible to gain knowledge on an advisor’s effectiveness by investigating the career paths of his or her Ph.D. students?

The strength of the department is clearly going to be my primary factor in choosing a graduate school. But there are a few other factors to consider. One is the location; I’m probably going to be happy in a place that’s neither too rural nor right in the middle of a city. If I had to choose one of the extremes, I’d opt for the urban environment, and one of the reasons is that in a larger city, it’s probably easier for me to secure accommodations. In fact, I’d suggest this is true for anyone with a documented disability. A city also makes it easier to have direct flights instead of time-consuming layovers, and would give me a break from mountains and forests after spending four years in Williamstown.

Anyway, that’s enough wishful thinking and non-application stuff. Right now I really need to obey the following schedule:

  1. Finish first drafts of applications by the end of August
  2. Study for the computer science subject test from mid-August to mid-October, and take the exam sometime then or shortly after
  3. Secure letters of recommendation by the start of September, and give the recommenders all the relevant information about me by some scheduled date
  4. Finish second drafts of applications by the end of September

I figure it can’t hurt to at least take the computer science GRE subject test. If a school requires it, then I’ll have done it. If not, then I can still see what areas I need to study in further detail.

It’s Time to Ditch PowerPoint and Word in Favor of LaTeX

Jul 12, 2013


The Big Idea

I’m surprised I didn’t do this earlier, but since I was planning to do so anyway, now seems like a good time. To put it simply …

I will not voluntarily use Microsoft PowerPoint, Microsoft Word, or any other word processing or slideshow software (e.g. Google Docs).

Instead, as the title of this blog post indicates, I will be using LaTeX to fulfill all of my needs.

A Brief History

Don Knuth, Professor Emeritus of Computer Science at Stanford, created TeX (which would later influence the creation of LaTeX) in the late 1970s in order to easily create publication-quality mathematics papers. LaTeX is built on top of TeX and produces the same kind of output, except it’s easier to use (e.g. fewer esoteric commands required). The way LaTeX works is that we take a text editor of our choice, write down a bunch of stuff in LaTeX syntax, and then compile the text to form a PDF document as output. My primary LaTeX text editor is TeXShop, which you can see in the top image of this post, but I’ve also been using emacs lately.

It’s not “what you see is what you get” (WYSIWYG), which for some people is understandably a major drawback. Nonetheless, LaTeX has become so popular that it is standard knowledge among serious mathematicians and scientists, so in hindsight, Knuth’s creation was an enormous success. In fact, WordPress even allows LaTeX directly in its posts, such as the following (random) integral: \(\int_0^\infty (x^5 - 3x) dx\), which was generated with the following text: \int_0^\infty (x^5 - 3x) dx, surrounded by appropriate tags, which are usually dollar signs.
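To make the write-then-compile workflow concrete, here is a minimal sketch of a complete LaTeX document containing that same integral. The file name minimal.tex is just a placeholder, and the thin space (\,) before dx is a small typographic nicety I added; it is not in the snippet above.

```latex
% minimal.tex (hypothetical file name)
\documentclass{article}
\begin{document}

Here is a (random) integral written inline: $\int_0^\infty (x^5 - 3x)\,dx$.

The same expression as a displayed equation:
\[
  \int_0^\infty (x^5 - 3x)\,dx
\]

\end{document}
```

Assuming a standard TeX distribution is installed, running pdflatex minimal.tex from a terminal (or pressing the typeset button in an editor such as TeXShop) should produce minimal.pdf.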


If you’ve never heard of LaTeX before this post, my proclamation might seem like a pretty big deal. Why avoid using two popular and crucial pieces of software in favor of something that seems complicated and only oriented toward mathematicians? In my opinion, there are several strong reasons, and I’ll focus first on the use of LaTeX versus Word (or similar word processing software).

The first and most important reason is that in terms of formatting math, LaTeX is far superior to what Word can offer. Sure, one can try to be a master at using Word’s equation editor to circumvent this drawback. (I had a statistics professor who claimed that LaTeX was worthless to him because he could live by using equation editors.) But there are many problems with that stance, and I’ll list some of them.

  1. LaTeX — when written correctly — still produces cleaner and crisper math than the equation editor.

  2. LaTeX can be formatted in many ways depending on the kind of document (e.g. class notes versus a conference publication).

  3. Using an equation editor or other tools often requires clicking on a bunch of buttons and pages to search for fraction layouts, Greek symbols, and other non-standard document elements. In LaTeX, we can do all this from our keyboard in an easy and intuitive way. Suppose we want to insert the Greek symbol alpha in the document. In Word, I have to look up either the keyboard shortcuts or a large database of symbols. In LaTeX, I simply type in $\alpha$ to get \(\alpha\). (Special names in LaTeX have the backslash \ preceding them.)

In a sense, what I’m really trying to say is that a LaTeX expert can use his or her experience, knowledge, and online documentation to produce quality mathematical expressions quickly.
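As a rough illustration of typing math entirely from the keyboard, here is a small sketch with Greek letters, a fraction, and an aligned derivation. The particular expressions are my own examples, and the amsmath package is assumed to be available (it ships with standard TeX distributions).

```latex
\documentclass{article}
\usepackage{amsmath} % provides the align* environment
\begin{document}

Greek letters are typed by name: $\alpha$, $\beta$, $\Gamma$.

A fraction and a sum, displayed on their own line:
\[
  \frac{1}{n} \sum_{i=1}^{n} x_i
\]

Aligned steps of a short derivation:
\begin{align*}
  (a + b)^2 &= (a + b)(a + b) \\
            &= a^2 + 2ab + b^2
\end{align*}

\end{document}
```

The ampersand marks the alignment point in each row, so the equals signs line up without any manual spacing.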

A second reason to favor LaTeX over Word is that (I believe) LaTeX performs faster. Just today, I opened up a six-page Microsoft Word document and was amazed at how long it took from the moment I pressed the blue “W” on my screen to when I could actually modify the document. There is also a delay between when the document’s contents become visible and when you can actually modify the text without lag. In that same time, I can open up a 50-page LaTeX document and edit it seamlessly, since it’s just plain text. If I want to compile it to view the PDF output, it can take a while during the first compilation (but it’s definitely not unreasonable) and after that, compiling tends to be faster. In addition, a competent LaTeX user shouldn’t be compiling his or her document every ten seconds.

A third reason is that LaTeX can format the endings and beginnings of pages better than the standard “widow and orphan control” of Microsoft Word. If I’m writing a document in Word and I start a new paragraph at the very last line of the page, Word will automatically put that line on the following page once I’ve written enough of that new paragraph. Sometimes I want this, and sometimes this is annoying because I know that I’m wasting space and that the text on different pages might look weird if one page ends on an earlier line than another. LaTeX solves this problem automatically by cleverly “squishing” the text together or forcing it to be on a new page, whichever looks better.

This even works when there are figures involved (e.g. graphs, pictures), which is a huge plus. If there’s not enough space for a figure to appear at the bottom of some page, or if there are too many to fit on one page, LaTeX will reassign them to other pages accordingly (in the final PDF output) and fill up the remaining spots on the page with text. It’s also possible to “assign” a figure so that it will always be at the top (or bottom) of whatever page it ends up on in the PDF output, a handy feature. Users have the option to resize and center figures, assign captions, and assign labels for referencing in text (e.g. “Figure 3 shows that …”). In fact, we can label almost anything that gets numbered by using the \label{} command. Then, elsewhere in the document, we can use \ref{label_name} to refer to something we’ve labeled. The reason why labels are useful is that LaTeX keeps the number consistent no matter how many other figures we add or modify. For instance, if we add in an earlier figure at the start of the document, the “Figure 3 shows that …” text will automatically convert to “Figure 4 shows that …”, reflecting the added image. Needless to say, labels are extremely useful when writing academic papers filled with theorems, lemmas, propositions, etc.
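Here is a minimal sketch of a labeled, captioned figure with a reference to it. The label name fig:results and the caption are made up for illustration, and example-image is a placeholder picture that is available only if the mwe package is installed in your TeX distribution; swap in your own image file otherwise.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics
\begin{document}

Figure~\ref{fig:results} shows that \ldots

\begin{figure}[t] % [t] asks LaTeX to place the figure at the top of a page
  \centering
  \includegraphics[width=0.6\textwidth]{example-image} % placeholder image
  \caption{A placeholder caption.}
  \label{fig:results} % goes after \caption so it picks up the figure number
\end{figure}

\end{document}
```

Note that the document typically needs to be compiled twice before the number appears: the first pass records the label, and the second pass fills in the reference.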

There are other advantages, too. For one, the default settings for LaTeX are superior to those of Microsoft Word (e.g. justified versus non-justified text, and page numbers on versus off, respectively). Others have discussed these advantages, too; see this post for a start. Also, LaTeX is free. It’s open source, relatively bug-free (after all, it’s been around for decades), and definitely not going away anytime soon.

But what if I need to make slides to give a presentation?

Don’t worry, LaTeX has that covered as well! The key is to use the Beamer class. The following image shows the “cover slide” of a presentation I gave using LaTeX Beamer in my machine learning class last semester, based on this ICML 2012 paper. And yes, that paper, like virtually all computer science papers, was formatted using LaTeX.


With Beamer, we use \begin{frame} and \end{frame} and put text between those two commands to get what we want on one “slide.” The advantage of using Beamer is that it’s a LaTeX class, so we can seamlessly incorporate LaTeX code into our slides. Beamer will also output documents in PDF and can show a nice, clickable “table of contents” at the top of each slide, depending on the theme one uses. The PDF output is important, since while PowerPoint is used on many computers throughout the world, PDF viewers are virtually standard on modern computers. There are many more computers with a PDF viewer but without PowerPoint than there are computers with PowerPoint but without a PDF viewer.
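For the curious, here is a minimal sketch of a two-slide Beamer presentation. The title, author, section name, and slide contents are placeholders rather than anything from my actual talk, and Berlin is just one of the stock themes that puts section navigation at the top of each slide.

```latex
\documentclass{beamer}
\usetheme{Berlin} % a stock theme with section navigation at the top

\title{A Placeholder Title}
\author{A. Presenter}
\date{\today}

\begin{document}

% Cover slide
\begin{frame}
  \titlepage
\end{frame}

\section{Background}

% A content slide; the optional argument is the slide title
\begin{frame}{A Slide with Math}
  \begin{itemize}
    \item Regular LaTeX works inside a frame.
    \item For example: $\int_0^1 x^2 \, dx = \frac{1}{3}$.
  \end{itemize}
\end{frame}

\end{document}
```

Compiling this the same way as any other LaTeX file should give a PDF with one page per frame.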


Now, I understand that I will be unable to completely avoid Word and PowerPoint, so I won’t uninstall them from my laptop. Why might I need to use them?

  1. The biggest reason is probably if I’m collaborating with a group of non-LaTeX users. No one is going to want to learn LaTeX in one night just to please me, especially if there’s no math involved, so I’ll have to suck it up and go with what they’re using.

  2. I also want to keep Word and PowerPoint just in case there are some important documents I need to open from someone (or from a webpage). I don’t want to ask people to send me PDFs as an alternative, and if I’m trying to open up stuff written by someone on his website years ago, I likely won’t even be able to ask in the first place.

Bottom Line

I’m looking forward to life largely bereft of PowerPoint and Word. Admittedly, the benefit of LaTeX decreases when one moves from writing technical documents to writing generic documents, but there are still times when LaTeX’s beauty can make it clearly the superior choice of typesetting software. For instance, LaTeX is great for writing resumes and CVs. Needless to say, my current resume/CV was made using LaTeX, and I recently won second place in a competitive resume contest.