My Blog Posts, in Reverse Chronological Order
I might have mentioned this before, but to reiterate, I’m a huge fan of the sandbox computer game Minecraft. (That game and Sid Meier’s Civilization IV: Beyond the Sword currently take up about 95% of my gaming time.)
Today, I want to focus on one aspect of Minecraft: map making. Using a variety of third-party world-editing utilities, such as World Edit, players can generate gigantic, epic-looking buildings, landscapes, and other structures that would otherwise be too burdensome or impossible to build by hand. And when they’re satisfied with the quality of their map, players can upload their creation online for others to play.
And there’s one mapmaker who has turned into a superstar.
That person would be Vechs. His Super Hostile series has become the most popular custom-made map series of all time, with well over a million total downloads and countless videos of his maps on YouTube, and it has spawned a new generation of Minecraft map makers who aspire to create maps like his. I’ve put up a screenshot of the forums from August 16, which shows that his topic has over three times as many replies as the one with the Skyblock map. Click for a larger view.
Whenever he updates or uploads a new map, hundreds of players download it the same day. He has become so good that Minecraft fans couldn’t ignore him. In fact, even a few developers of the game tried out his maps. (They didn’t succeed.) The source of Vechs’ map-making success is that he created essentially a new genre of Minecraft maps, called Complete the Monument (CTM) maps. Vechs started his series by making a variety of adventure maps that played like ordinary Minecraft maps.
But his series didn’t start to take off until he added its salient feature: the Victory Monument. Players now have to gather 19 different blocks scattered around his maps to successfully “complete the monument” and beat the map. The catch is that those blocks tend to be heavily guarded by enemies and ingenious traps, hence the name “Super Hostile.” To put things in perspective, beating the most difficult maps in his series is orders of magnitude harder than playing through a normally generated Minecraft map on hard difficulty.
So why am I discussing him now? And why is this entry filed under computer science1?
I think I see a deep connection between what he’s done and what I hope to achieve in my computer science research. Substitute “computer science” for “Minecraft” and any field of computer science for “map making,” and my point becomes clearer.
Vechs took a field of Minecraft — mapmaking — and specialized in one thing, CTM maps. His experience has unequivocally made him the current world expert on creating those kinds of maps. There are other “fields” of Minecraft in which people can become famous. Examples include creating “Let’s Play” YouTube videos, popular modpacks, creative-mode structure work … the list goes on. But stripping away most of his other options let Vechs maximize his focus on the one aspect that made him famous in the Minecraft world. His latest maps show significant improvement over the first few, indicating that even the best have room for improvement. But he needed time to practice his map-making skills before he got really good at them.
So what should I do?
Right now, simply based on my current research experience, my favorite field of computer science is machine learning. So over the next year, unless I suddenly experience an epiphany and fall in love with complexity theory, I should aim to learn as much machine learning as possible. Consequently, that will be the topic I hope to explore in my next REU project. And since I’m actually taking a course in machine learning in the spring 2013 semester, I know which of my four spring classes I should expend the most effort towards.
That is my mindset. I want to eventually get to the cusp of current knowledge in one field (not two, not three) of computer science — machine learning. Then, I hope to make an impact by making a leap in my field. As others have noticed, that’s how many new discoveries are made. Vechs created a “leap” in the map making community by inspiring a new genre of maps. Hopefully, I can help create generations of future research projects.
Well, it was filed under computer science, under the old blog system (before May 15, 2015). ↩
You’re in luck. There will be a quadruple-header of blog entries (four in four days). This is the opening post.
Almost a year ago, I posted a blog entry discussing some of my short-term goals. So this is the second post in a possible series of entries that delineate objectives I set for myself. In the previously linked entry, I mentioned that the goals were for the fall 2011 semester only, but since I didn’t make another set of goals for the spring, I’ll just extend those 18 goals to cover the spring 2012 semester and the following summer. In that respect, I completed goals number 1, 3, 4, 5, 7, 8, 16, 17, and 18.
But now, I want to be more serious. What are some of the five most important objectives I hope to achieve from now until the end of the summer of 2013? I’ve already mentioned blogging more, so I won’t say it again. But I did come up with …
1. Maintain a better sleeping schedule
My sophomore year’s sleeping schedule was shambolic. One night, I’d go to sleep at midnight. Then it would be 4:00 AM the following night, then back to midnight, and the madness continued. Too often, I let my work dictate how much I slept. This even caused me to uncharacteristically miss several classes by oversleeping. So not only do I want my schedule to be stable, I also want to shift it back a few hours so I can comfortably wake up at 6:00 or 7:00 AM consistently. Since my earliest class on any day of the week is at 9:55 AM, I should have ample time to study, work, or blog in the morning.
2. Study for the Graduate Record Examinations
This is fairly straightforward. I’ll take the general GRE towards the end of my junior year, and I’ll probably take the computer science GRE subject test in the fall 2013 semester. With the exception of midterms and finals periods, I want to dedicate at least 4 hours per week to this task. The aim is to get a high enough GRE score so that my application is not immediately thrown out the window for the top computer science graduate programs. I’ve actually been studying a little this past summer by reviewing some GRE and even SAT vocabulary. Does anyone have additional recommendations for me?
3. Maintain a 3-day workout week
Since the start of November 2011, I’ve consistently been weight training about three days a week, with rare exceptions (they’re justified, trust me on this). Exercises include the squat, hang clean, bench press, pull-ups, push-ups, core circuits, leg press, calf press, and deadlifts. My strength has notably improved, and I’ve kept my body weight at around 150 pounds while avoiding any of the “Freshman 15.” Why couldn’t I have maintained this level of dedication in high school?
4. Pick better situations to socialize
I’ve already discussed at length why I do not socialize well in certain situations. I need to avoid those situations without giving the impression that I’m too reclusive. Related to that, I want to find ways to create more one-on-one conversations in quiet environments. I don’t think I’ve done this as well as I could have in the past two years — but I have ideas to rectify that.
5. Engage in more computer science research
The final goal here relates both to my ongoing project of text simplification and what I hope to engage in the following summer. (I don’t think I can realistically take part in an entirely new research project during the academic year, with classes and my work-study teaching assistant duties.) My Bard College research team is still running some experiments, and I’ll be in touch with them to hopefully write up our results. I also want to focus on REU applications for the summer of 2013. I could work solo with a Williams faculty member over that summer, but collaborating on a project with a group of three or four researchers is much more compelling to me. I’ll be in touch with the Williams faculty nonetheless. The preliminary goal is to find a project in the area of artificial intelligence.
That’s all for now. It’s time for me to do them….
UPDATE May 13, 2015: I only managed to do half of what I wanted for this series, but at least I did something. As of now, I’m not going to go back to working on this because my current academic and research interests have shifted.
The fall 2012 semester is approaching. It’s not as fast as those winter waves in Waimea Bay, but close enough. (Yes, the above photo I took is of the same beach — click for a larger view.)
Here are all the courses I’m taking:
- Applied Abstract Algebra
- Computer Graphics
- Theory of Computation
All are lectures, with Computer Graphics being the only course that includes a lab component. Classes 1 and 3 satisfy my math major requirements, while 2 and 4 are for computer science. For the first semester ever, I won’t have a single class that falls outside my two majors. So on the one hand, this means I’ll maximize my dedication to these classes, and will probably get high marks (famous last words).
But on the other hand, it means I won’t have as many options if I get a little burned out on computer science and math. I tend to spend long hours studying for exams and working on homework, so I’m going to try something that will hopefully alleviate some of the workload. This is purely an experiment, and one that I plan to continue if it brings solid results this semester.
I’m going to make a series of blog posts on my Theory of Computation class (henceforth, CS 361). For reference, here’s the course description from the Williams Course Catalog that delineates the fun stuff coming up for me:
This course introduces a formal framework for investigating both the computability and complexity of problems. We study several models of computation including finite automata, regular languages, context-free grammars, and Turing machines. These models provide a mathematical basis for the study of computability theory–the examination of what problems can be solved and what problems cannot be solved–and the study of complexity theory–the examination of how efficiently problems can be solved. Topics include the halting problem and the P versus NP problem.
After every few classes, I hope to record on Seita’s Place what I learned and any relevant information going above and beyond the classroom discussion. By the time I take the midterm and final, I’ll have a nice repository of information online to help do quick review. I will strive to start these entries as soon as possible in draft form, and will add information to them a few hours after each CS 361 class.
There will be a consistent format for each of the posts. Each entry will be titled “CS Theory Part X: Y” where X is some natural number, and Y is a phrase that relates with the material I’ve learned and will cover in the blog entry. I want this to be like a personal Wikipedia that makes heavy use of rigorous proofs and outside sources.
So why do I want to do this? The most important benefit will be that it deepens my knowledge of theoretical computer science in a way that avoids long study hours and memorization sessions. Again, since I plan to update these entries soon after my classes end, I will minimize the amount of material I forget over time. Furthermore, by writing these entries in my own words, I force myself to understand the material well, a prerequisite for explaining a subject in depth. (There’s a whole host of information online that backs up that claim.) Since I don’t want to write a book on theory, I have to pick the right spots to focus on, which requires me to effectively judge the importance of all the concepts hurled at me in the class. Also, using the Internet over paper to express these posts makes it easier to link concepts together in a web, as explained by Scott Young’s holistic learning method.
But this raises the question: why this class, and not one of the other three?
My long-term goal is to pursue a Ph.D in computer science. As part of the process, I’ll be taking the computer science GRE and Ph.D qualifying exams. As you might expect from the course description, the material in CS 361 is more closely related to what’s going to be on the test than the material in the other three classes. Taken from the Educational Testing Service website, we see that 40 percent of the material is theory!
III. THEORY AND MATHEMATICAL BACKGROUND — 40%
A. Algorithms and complexity
Exact and asymptotic analysis of specific algorithms
Algorithmic design techniques (e.g., greedy, dynamic programming, divide and conquer)
Upper and lower bounds on the complexity of specific problems
Computational complexity, including NP-completeness
B. Automata and language theory
Models of computation (finite automata, Turing machines)
Formal languages and grammars (regular and context-free)
C. Discrete structures
Elementary combinatorics and graph theory
Discrete probability, recurrence relations and number theory
I suspect the amount of theory material on Ph.D qualifying exams is similar, though these vary among institutions, so there’s no standard.
Computer graphics, while no doubt an interesting subject, isn’t as important in terms of the subject test material.
IV. OTHER TOPICS — 5%
Example areas include numerical analysis, artificial intelligence, computer graphics, cryptography, security and social issues.
It would also be more difficult for me to post graphics-related concepts online, as I’m certain that would involve an excessive number of figures and photos. I do have a limit on how many images I can upload here, and I’m not really keen on copying a whole lot from my graphics class’ webpage; I prefer to create the images here myself.
I also chose CS 361 over my two math classes. If I’m planning to pursue doctoral studies in computer science, it makes sense to focus on CS 361 more than the math classes. I was seriously considering doing some Probability review here, but the possibly vast number of diagrams and figures I’d have to include (as in graphics) is a deterrent.
Finally, another benefit of writing the series will be the increased attention I give to Seita’s Place. I hope to boost my writing output and my research into deafness and deaf culture. Even though it’s relatively minor, this blog has become more important to me over the past year, and I want to make sure it flourishes.
I’ll keep you posted.
As I mentioned earlier, I’m in Honolulu, Hawaii. I’ve been taking many pictures, which gives me the chance to finally add a picturesque post to my text-based blog. And since I just turned 20, I can continue to post on my birthday, just as I did last year.
One of the highlights of my trip was visiting Kauai, the fourth Hawaiian island I’ve set foot on. I took a picture of the same falls that was on the cover page of the tour guide. It’s gorgeous.
Here’s a view of Waimea Canyon, or the Grand Canyon of the Pacific. I’ve never been to the real Grand Canyon, but it felt like I was looking at a similar structure of nature. The view was so nice that I decided to add a panorama of this to replace the ugly default image of this WordPress Blog template.
Back in Oahu, I was at Waimea Bay. (What’s up with the repeated name?) This is one of the most beautiful beaches I’ve set foot on. I’m glad this is the summer — I’ve heard that real waves appear in the winter.
There were also some nice Japanese-style temples on the way back from Waimea.
Afterwards, I made my way to the famous Nuuanu Pali Lookout, whose strong winds are (sadly) the only thing that can make my hair stick up. Looking at this reminded me that I should have increased the resolution of my images on my phone.
Finally, I’ve also had my share of mischievous moments.
All right, that’s enough for now. It’s back to more traditional blog entries later. I have a lot I want to talk about in the next month.
Daniel, how much do your hearing aids help you hear?
I’m surprised I don’t get asked this question often by my classmates, colleagues, professors, and others. Perhaps it’s because my speaking ability causes hearing people to believe that I can hear as well as they do. Or maybe they believe that hearing aids can cure hearing loss?
Unfortunately, that’s not what they do. They amplify sound and allow me to be aware of its existence. If someone’s talking, then I know that the person is talking, regardless of whether I’m looking at him or her. The difficulty for me, and for other people who wear hearing aids, is understanding the fine details.
Here’s an analogy. Suppose you’re on your computer and are looking at a tiny image that’s 50×50 pixels in size. From what you can see, it’s interestingly complex and alluring. It’s bright and colorful. There are wondrous curves that weave together and form some figure you can’t make out. You want to better understand what the image conveys, so you do the logical thing and try to resize it by pasting it into Photoshop and then dragging its edges to fit your entire monitor. (I’m assuming you’re a little naive — no offense intended.)
But there’s a problem.
When you do that, the image doesn’t become clearer. Your computer has to invent new pixels to fill in around the original ones. As a result, the “large” image is a badly distorted version of the smaller one, and you still can’t fully understand what the image means. But perhaps it does convey more information than the really small image did, so despite its flaws, you stick with this method of understanding tiny pictures.
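To make the analogy concrete, here’s a tiny sketch (purely illustrative, not anything from the original post) of the simplest upscaling method, nearest-neighbor interpolation. Every “new” pixel is just a copy of the closest original pixel, so the enlarged image contains no new information — only bigger blocks:

```python
# Nearest-neighbor upscaling of a tiny "image" (a 2-D grid of pixel values).
# Each new pixel is a copy of the nearest original pixel, so enlarging the
# image adds size but no detail.

def upscale_nearest(image, factor):
    """Enlarge a 2-D list of pixel values by an integer factor."""
    return [
        [image[y // factor][x // factor]
         for x in range(len(image[0]) * factor)]
        for y in range(len(image) * factor)
    ]

tiny = [[10, 20],
        [30, 40]]

big = upscale_nearest(tiny, 2)
for row in big:
    print(row)
# Each original pixel becomes a 2x2 block of identical values.
```

Fancier interpolation schemes blend neighboring pixels instead of copying them, but they still can’t recover detail that was never captured — which is exactly the hearing aid’s predicament.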
As I mentioned earlier, hearing aids let me understand sound that I would otherwise be unable to understand with the naked ear. A lot of sounds. In fact, I can hear no sounds with an intensity of 90 decibels or lower with my natural hearing; it’s the epitome of being “totally deaf.” For me to hear and understand a person talking in a normal tone without my hearing aids, my ears must be inches away from his or her mouth. Obviously, that’s not happening during most of my conversations, so I wear hearing aids almost all the time when I’m not sleeping.
A Typical Situation
I want to discuss a challenge that occurs often in my life, which is socializing in groups. Most of its difficulty is due to mechanical limitations of hearing aids, but there are also other forces in play. A representative situation comes from my 2012 Bard College REU, where I would often amble with one or several other students for ten minutes at a time, which was the walking distance from our dorm to the heart of campus.
In the company of just one student, I can easily start and maintain a satisfying conversation. I still have to ask my companion to repeat every fifth or sixth sentence he says, but at least I get the general direction of what we’re talking about.
But things get exponentially worse with three, four, and more people. And this is where the hearing aid’s lack of ability to make sound distinctive hurts me. What typically happens is the other people get involved in a conversation that I can’t follow. There are multiple factors that hinder my hearing in this kind of situation.
The first is what I explained before: the hearing aid’s difficulty in clarifying amplified sound. Incidentally, I won’t go into depth on the technical reasons, since there is a whole host of articles, such as this one, that talk about hair cells. A second factor is that in a group conversation, people often don’t look directly at me when talking, thus rendering my lip-reading skills useless. (Lip-reading can account for about 25 percent of my comprehension.) Moreover, my hearing aids are designed to best amplify sound when it’s coming directly at me, towards my ears, which isn’t always the case. It’s even less common if all of us are moving (i.e., walking), which adds another roadblock to my comprehension.
A third source of hindrance is simultaneous talking and ambient noise. As you can imagine, when my companions keep interrupting each other or laugh as a group, it adds another level of complexity. In order for me to understand as well as a hearing person could in that situation, my hearing aids would have to somehow partition the various sources of sound into “sound coming from Person A,” “sound coming from Person B,” and so on. Things get worse when there happen to be, say, thirty young campers yelling in a group next to us, which is why I prefer quiet cafeterias and restaurants.
A fourth problem — yes, there’s more! — is the “Deaf Nod,” which doesn’t so much relate to hearing aids but is a consequence of being deaf. This occurs when a deaf person who doesn’t understand what’s being said in a conversation gives a weak nod to create the impression that he or she understands what’s going on, when it’s the exact opposite! I’m so guilty of the Deaf Nod that I feel ashamed. Part of it stems from frustration at the lack of understanding; another motivation is not wanting to seem like a hassle to the people I’m conversing with. The ultimate result is that I just play along with a conversation I’m not following, which has a chain effect, since I then don’t fully understand what’s discussed next if it builds on previous dialogue. It’s more common for me to do the Deaf Nod when there are at least five people involved, or if I’m not really familiar with my conversation partner.
Finally, my hearing aid batteries could die at any moment. Fortunately, I usually receive some form of notification, such as several awkward beeps. But sometimes there’s no warning, and my batteries die suddenly. If it’s my right hearing aid that stops working, it’s not much of a problem, because I get most of my hearing from my left ear. But every extra bit that I can hear helps. And it won’t always be possible for me to have easy access to my batteries. I usually keep them in a small pouch in my backpack, so I have to dig in, pull out a fresh battery, remove the old one, and insert the new one. Even though the entire process takes only a few seconds, simply doing it detracts from my focus on the ongoing conversation, making it even harder to get back in the mix. And I’m not even going to discuss the case where my left hearing aid is the one that dies — I can barely understand anything if that happens!
Battery failure is the most common hearing aid technical difficulty, but there are others. Even though my hearing aids are generally reliable, I have experienced many cases where technical difficulties ruined current and potential conversations.
To recap, here’s a list of the barriers I experience:
- Difficulty in clarifying amplified sound
- Lack of eye-contact
- Simultaneous talking
- Frustration and the “Deaf Nod”
- Hearing aid technical difficulties
So while the benefits of hearing aids are enormous, the (non-exhaustive) challenges listed above make it impossible for me to experience what life is really like for hearing people, especially in group situations. There are other cases where the hearing aid falls short of optimality, such as when I’m watching television, but I can write an entire rant blog entry about that later.
Now for the Good Part
I don’t want to give the impression that hearing aids don’t help me at all. I was merely highlighting a salient downside. But the reality is that without them, I would never be aware of many noises that exist in today’s world. I don’t think I would have done as well in school without my hearing aids, and I certainly wouldn’t be able to do well at an academically rigorous school like Williams College (second in Forbes’s 2012 college rankings!), since I rely heavily on communication with my professors.
And I also wouldn’t be as eager to go to computer science graduate school.
Look at the home pages of the computer science faculty at your college or university, and check their non-dissertation publications. How many papers have just a single author? I checked as many Williams computer science faculty publications as I could, and I would guess that only two or three percent of them (textbooks and informal papers excluded) had fewer than two authors.
I’m sure that most of the communication involved was email, but from my own experience, I’m convinced that it’s so much easier to conduct group research face-to-face. And that’s the baseline of what I need from my hearing aids. I need to hear just enough to make working on a research project feasible, which means communication should not be a research roadblock.
I love hearing aids. I put them in my ears as soon as I wake up every morning. I take care of my hearing aids by cleaning and drying them regularly. I store all pairs in soft containers and use my portable hearing aid dryer to ensure they don’t break down due to moisture. Even though it’s really tempting to do so, I never take them for granted, and constantly remind myself of how dependent I am on them. (Writing this entry was one way to do that, for instance.)
Hearing aids have offered me the chance to experience the euphony of the world and to be capable of socializing with the vast majority of people I meet. But they perform poorly in many situations, falling short of normal hearing, and they are incapable of truly curing significant hearing loss.
Java was the first programming language I felt comfortable enough with to write lengthy programs, ones that, for instance, could be used to advance the goals of a research project. So for the first program I needed for my summer research at the Bard College REU, I used Java, writing code for a random sentence generator. That was during my second week of the REU, at which point I had written exactly one significant Python script in my life, for a bonus question on my Algorithm Design & Analysis homework.
Let’s fast forward a bit. By the time the REU ended, I had written over 20 significant programs … in Python. So what happened?
At the start of the REU, I knew much more about Java than Python. But after some prodding from my advisors, and given that everyone else on my research project was using Python, I switched languages. I soon found, as they said I would, that Python was much easier to write than Java. In particular, file input and output is stunningly simple yet incredibly useful, a must for all the scripts we wrote that manipulated files. My project, after all, was about text simplification, and all the relevant corpora were stored in files.
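To illustrate the kind of file handling I mean (a generic sketch, not code from the actual project), a typical corpus-processing script in Python boils down to a few lines:

```python
# A typical corpus-processing pattern: read a file line by line, transform
# each line, and write the results back out.  The transformation here is a
# stand-in (strip whitespace and lowercase); the real project did far more.

def clean_lines(lines):
    """Transform an iterable of raw lines into cleaned strings."""
    return [line.strip().lower() for line in lines]

def process_file(in_path, out_path):
    """Read in_path, clean every line, and write the result to out_path."""
    with open(in_path) as src:
        cleaned = clean_lines(src)
    with open(out_path, "w") as dst:
        dst.write("\n".join(cleaned) + "\n")
```

The `with open(...)` idiom handles opening and closing files automatically, which is a big part of why these scripts stay so short compared to the equivalent Java boilerplate.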
I also found the official Python documentation easier to understand than Java’s, so looking things up was less of a challenge. Like the guy who wrote this (heavily biased) article about Python versus Java, I have to look up a lot of things for Java, whereas that’s much less true for Python.
My experience confirmed what I’d heard: learning new languages is easier once one becomes proficient at another language. Well, with the possible exception of Malbolge.
I started this blog on August 1, 2011, so I get to celebrate that I’ve been blogging for about a year. I now want to take a step back and see what I have achieved and what goals I should set for the future.
This is my 36th entry, so if you do the math, that’s 3 posts every month, a pace I hope to at least maintain in the next few years. There were definitely times when I didn’t feel interested in blogging, but as I explained in my “Blog Productivity” entry, I anticipate that will change. Of course, this assumes I maintain my normal blog entry size of about 500 words.
Of my 36 entries, 12, 9, and 15 (including this one) are filed in the categories Computer Science, Deafness, and Everything Else, respectively. I’ve found that it’s a lot easier to come up with topics to write about in the “Everything Else” category than in the other two. But I’m trying to carve out a niche for myself in the blogging world. Anyone can write a generic blog about any topic, but I want to devote more attention to computer science and deafness, two subjects that encapsulate much of my life.
I’ve also started the habit of writing multiple drafts before publishing posts. In the past, I would start blog entries from scratch and post them as soon as I was tired of writing. (And any future edits would require me to fix something that was already published.) But now, I have multiple drafts of posts saved in my WordPress Dashboard. I think my past urge to publish as soon as possible was because I wanted to get as much content up on my blog, even if it represented less than my full effort. But now that I’ve already got a good amount of material, I think it’s been easier for me to sit back, sift through my drafts, make (possibly major) revisions, and then publish only if I feel that I’ve really poured in enough effort in that entry.
Finally, I’m trying to increase the traffic to Seita’s Place. I’ve been searching and following other blogs if I deem that they are significantly related to the topics I post here. Hopefully this will lead to some reciprocal action. However, since I enjoy writing, I will still be blogging even if I don’t get many visitors.
All right, let’s switch gears. Now that my Bard College REU is over, the next chapter of summer 2012 starts — in Honolulu, Hawaii! I’ll be there for two weeks and I hope that I’ll have a great time with my family. And true to my new habits, I have a few interesting posts backed up (no less than five), which will likely be published upon my return to Albany, New York. Stay tuned.
UPDATE May 13, 2015: With the move of this blog to Jekyll, some of these points might not be relevant anymore.
With the way the blog has been growing recently, I’ve decided to add a new page that contains my ten favorite entries here with rationales, called “Top Ten Entries.” You can find it at the very top of the blog, next to the currently-existing “About” page. In a way, my new page serves as self-motivation, since whenever I’m in the process of writing a yet-to-be-published entry, I’ll be wondering how I can tweak it so it can crack my top ten list.
Furthermore, the page can be used by readers who want to see a sampling of my work or who do not have the time to read all entries. I also suggest that people visiting this blog for the first time should take a look at that page. In the future, I may make a separate page, “Start Here,” just for that purpose, but I think the page I just set up now will be well-suited as a starting point. Any constructive feedback on the structure of “Seita’s Place,” of course, is more than welcome.
I’ve written my final report to the National Science Foundation for my summer work at the 2012 Bard College Mathematics & Computation Research Experience for Undergraduates. My project focused on text simplification, which can be broadly defined as the process of making (English) text easier to read while maintaining as much underlying content as possible. I would consider it to be a subfield of machine learning, which is a branch of artificial intelligence.
The primary objective of my work was to improve on existing results in text simplification. There is no standard way to measure the quality of simplification, so my research team — consisting of me, another undergraduate, and two Bard College professors — decided to use BLEU scores. The advantage of those scores is that, in the summer of 2011, another research group used them to measure their level of simplification. Our goal was to beat their BLEU scores.
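For readers unfamiliar with BLEU, the idea is to measure n-gram overlap between a candidate sentence and a reference sentence. Here is a deliberately stripped-down sketch (unigram and bigram precision with a brevity penalty; the standard metric uses up to 4-grams and multiple references, and our project used an established implementation, not this toy):

```python
import math
from collections import Counter

# Toy BLEU: geometric mean of unigram and bigram "modified precision",
# scaled by a brevity penalty that punishes candidates shorter than the
# reference.  Real BLEU uses 1- through 4-grams and supports multiple
# reference translations.

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    # Clip each candidate n-gram's count by its count in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

def toy_bleu(candidate, reference):
    p1 = modified_precision(candidate, reference, 1)
    p2 = modified_precision(candidate, reference, 2)
    if p1 == 0 or p2 == 0:
        return 0.0
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp((math.log(p1) + math.log(p2)) / 2)

ref = "the cat sat on the mat".split()
print(toy_bleu("the cat sat on the mat".split(), ref))  # identical: 1.0
print(toy_bleu("a dog ran in a park".split(), ref))     # no overlap: 0.0
```

A perfect match scores 1.0 and a sentence sharing no words with the reference scores 0.0; simplified output lands somewhere in between, which is what made BLEU usable as a comparison metric.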
To carry out the translation process, we used the open-source Moses software. To train the system to simplify text, we used aligned, parallel data from Wikipedia and Simple English Wikipedia. Moses is designed to be able to translate across different languages, but we considered “Simple English” as its own language. Thus, we viewed the project as encompassing an English-to-English translation problem, where the “foreign” language is derived from Wikipedia, and the “simple” language is derived from Simple English Wikipedia. Our hope was that Moses would be able to understand the steps involved in making English text simpler and act as an effective means of translation.
We soon discovered, though, that our parallel data was of low quality. We therefore used the LIBSVM and LIBLINEAR classification libraries to improve the data by switching or deleting certain pairs of lines from the parallel corpus. For instance, if a short sentence in the Wikipedia data was aligned to an obviously more complex sentence in the Simple English Wikipedia data, it made sense to swap the two sentences so that each was in the more appropriate data set. After all, the data was written by random people contributing to the two Wikipedias, so there were bound to be a few bad samples here and there.
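Our actual classifiers were LIBSVM and LIBLINEAR models with richer features, but the flavor of the task can be sketched with two crude complexity features and a simple perceptron. Everything here (the features, the toy sentences) is invented for illustration:

```python
def features(sentence):
    """Two crude complexity cues: token count and mean word length,
    plus a constant bias term."""
    words = sentence.split()
    return [1.0, float(len(words)),
            sum(len(word) for word in words) / len(words)]

# Invented toy examples: +1 = "complex" (Wikipedia-like), -1 = "simple".
data = [
    ("the incumbent administration promulgated comprehensive regulatory reforms", 1),
    ("constitutional jurisprudence necessitates meticulous interpretive methodology", 1),
    ("the cat sat on the mat", -1),
    ("he likes to run", -1),
]

# Plain perceptron training: nudge the weights on every misclassified example.
w = [0.0, 0.0, 0.0]
for _ in range(1000):
    for sentence, label in data:
        x = features(sentence)
        if label * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
            w = [wi + label * xi for wi, xi in zip(w, x)]

def classify(sentence):
    """Return +1 for 'complex', -1 for 'simple'."""
    score = sum(wi * xi for wi, xi in zip(w, features(sentence)))
    return 1 if score > 0 else -1
```

On real Wikipedia and Simple English Wikipedia data the classes overlap heavily, of course, which is why a proper SVM with better features was needed.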
Our group successfully classified a random sample of sentences as belonging to the Wikipedia set or the Simple English Wikipedia set with a higher degree of accuracy than previous researchers did. My professors are still performing some experiments, so it remains to be seen if we can get higher BLEU scores.
Overall, I’m glad I had the chance to participate in the Bard REU, and I’m optimistic that we will produce a strong research paper. I thank the professors there for accepting me into their program, and the National Science Foundation, which has now sponsored my second straight summer program.
A few months ago, I made a Facebook status update announcing how happy I was to have new hearing aids. Well, I’m still glad to have them, but I have one main gripe about their touch screen. The slightest drop of sweat that contacts the touch screen will cause the hearing aid to act unpredictably.
My hearing aid model can be found here. As you can guess, instead of the old-fashioned hearing aid switch that you click to adjust the volume, there’s a touch screen. Pushing your fingertips up the back of the hearing aid increases the volume; pushing them down decreases it. And tapping the hearing aid changes the mode (e.g. to the T-coil, or telecoil).
Actually, that’s almost as annoying to me as my main gripe, which I mentioned earlier. Suppose I have an itch near the back of my ear. If I inadvertently touch the back of the hearing aid while relieving the itch, I’ll change the mode when I don’t want to. This forces me to make a few more touches to get it back to the old mode. And switching modes temporarily blocks sound from entering my ear.
But to me, the bigger problem is that if I engage in any sort of physical activity for just a few minutes, the sweat that reaches the hearing aid will cause it to behave abnormally. When my hearing aids first acted weirdly, making a lot of beep-beep-beep sounds after I had been sweating, I thought they were breaking down. But then I realized that the sweat was causing the hearing aid mode to change! The sweat seemingly perturbs the touch screen, registering as touches the screen makes on itself. So I end up tapping the hearing aid a few more times to get it back to the right mode. But then it changes modes again! And again! The cycle continues.
So, for instance, when I go to the weight room, I make sure I have my backup hearing aids on, which do not have a touch screen. Alternatively, I’ll just take off the hearing aids. (This presents a multitude of additional risks, so I wouldn’t recommend it if you’re not experienced with lifting weights, or if the gym is especially crowded.)
I’ve learned my lesson. I’m still happy to wear these hearing aids, but when I get new pairs in a few years, I’ll be sure to avoid the ones with touch screens.
A few weeks ago, I solved a Project Euler question that particularly intrigued me. I saw similar problems back in middle school, but I don’t recall ever solving them successfully. For reference, the problem was Project Euler 85, counting the number of rectangles in a rectangular grid.
But when I attempted this particular problem? It took me fewer than thirty minutes to think of a solution and code it. So what changed?
I believe it’s been my new algorithmic approach to solving problems.
I fully admit that I’m a brute-force, “try-them-all” kind of person. Whenever I see a problem of the form “How many X satisfy the property Y,” my first instinct is to go through all possible candidates of X and store a count of how many satisfy Y. I first approached this problem by remembering what I did years ago — count rectangles by hand. The questions I experienced back then were much easier in that it was feasible, though time consuming, to actually count all the possible rectangles. Of course, it’s so easy to miss or double-count a few rectangles here and there that I was bound to be off in my final answer.
So I realized I needed a formulaic, or algorithmic, approach to this problem. There had to be some formula that could take as input just the dimensions of the largest rectangle and use that information to compute the total number of rectangles contained within it.
There was. I started by counting rectangles by hand for small cases (e.g. a 2×3 rectangle), and then testing my conclusions on other small rectangles. This was my algorithm, for an n-by-m rectangle:

total = 0
for i in range(1, n + 1):        # candidate sub-rectangle heights
    for j in range(1, m + 1):    # candidate sub-rectangle widths
        total += (n - i + 1) * (m - j + 1)
Basically, I consider every possible dimension of a smaller rectangle within the larger, n-by-m rectangle. (For instance, I could consider all 1×1 rectangles.) For each dimension, the (n-i+1)*(m-j+1) formula I derived counts every position where such a rectangle can be placed, which covers all the possible rectangles! It was very pleasing to solve this problem, and I now know I could tackle any question about counting rectangles within rectangles with an algorithms-based approach.
For reference, here was my actual code. It’s written in Python. (I used Java to solve the first 50 or so problems, but since then, it’s been Python all the way.) The code gave me the correct answer in fewer than five seconds.
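That code isn’t reproduced here, but a sketch along the same lines, wrapping the counting loop in a function and then searching for the grid whose rectangle count is nearest the problem’s target of two million, might look like this (the function names are my own):

```python
def count_rectangles(n, m):
    """Count all axis-aligned rectangles inside an n-by-m grid."""
    total = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            total += (n - i + 1) * (m - j + 1)  # placements of an i-by-j rectangle
    return total

def closest_grid(target=2_000_000, limit=100):
    """Return (area, count) of the grid whose rectangle count is nearest target."""
    best = None
    for n in range(1, limit + 1):
        for m in range(1, limit + 1):
            count = count_rectangles(n, m)
            if best is None or abs(count - target) < abs(best[1] - target):
                best = (n * m, count)
    return best
```

Incidentally, the double loop collapses to the closed form n(n+1)/2 × m(m+1)/2, since choosing a rectangle amounts to choosing two horizontal and two vertical grid lines.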
I have to thank Project Euler for not just giving me a medium through which to practice programming, but also for lending me a new perspective on problem solving approaches. I’m just two solutions away from level 3 at this time….
This post is part of a series of posts related to my solutions to selected Project Euler questions.
After following a 9:30 AM to 6:00 PM schedule for most of the summer, I’ve occasionally experienced “That 2:30 Feeling,” courtesy of the 5-Hour Energy commercials. (Life as a college student, when I can nap as often as I wish, just doesn’t provide that experience.) For those who haven’t seen the commercials, “That 2:30 Feeling” refers to the near-complete exhaustion many people feel after lunch, which is then mysteriously followed by a period of renewed energy toward evening as the workday comes to a close. I clearly don’t want the mid-afternoon lull to attenuate my productivity, though. So what should I do?
Idea 1: Caffeine
Since I can’t stand coffee or tea, caffeine for me usually comes in the form of energy drinks. But is this really the right solution? A quick burst of caffeine might make the difference between making significant progress on a project and having another regrettable day of doldrums.
It comes at a cost, though. Caffeine’s side effects are well documented, and a particularly infamous one is that it can ruin a sleeping schedule. In fact, from my experience with consuming energy drinks, the immediate impact seems negligible. My focus during the hours after drinking Red Bull rarely seems to be on a higher level than my focus throughout days when I abstain from consuming energy drinks. The lasting impact is far more noticeable. On days when I would ordinarily sleep at around midnight, having a Red Bull before lunch makes it highly likely that I’ll stay up until 4:00 AM before managing to finally wrestle myself to sleep. Then the next day’s sleeping schedule is ruined … and it’s a constant struggle for me to return to my desired “midnight-to-8” routine. So what is a better solution?
After a day at work when I was just about to feel “That 2:30 Feeling,” something happened that just screamed “crisis averted!” I found something to do that regained my focus and allowed me to stay alert for the entire day. It brings me to my next idea.
Idea 2: Code.
On that day I referred to earlier, I set a goal to write a Java program that could generate artificial sentences out of a bank of words. Suddenly, I felt eager. After a moment of deliberation, my fingers began tapping on my keyboard as fast as lightning. My brain was completely funneled in on the current task. My focus was optimal; my productivity was at its peak. And by the end of the day, I had a working program that generated humorous random sentences. (Those who have written random sentence generators will know what I am talking about.) In subsequent days, I tried to plan out my work by setting the period of time after lunch as a coding period.
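My program was in Java, but the gist fits in a few lines; here is a Python sketch, with invented word banks standing in for my actual vocabulary:

```python
import random

# Invented word banks; the real program's vocabulary was much larger.
WORDS = {
    "ADJ": ["purple", "sleepy", "enormous", "suspicious"],
    "NOUN": ["professor", "algorithm", "marmot", "teapot"],
    "VERB": ["admires", "debugs", "startles", "overthrows"],
}

TEMPLATE = ["The", "ADJ", "NOUN", "VERB", "the", "ADJ", "NOUN"]

def random_sentence(rng=random):
    """Fill each template slot with a random word from its bank."""
    words = [rng.choice(WORDS.get(slot, [slot]))  # literal words pass through
             for slot in TEMPLATE]
    return " ".join(words) + "."

print(random_sentence())  # prints one random template-filled sentence
```

The humor comes from the grammatically valid but semantically absurd combinations the template produces.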
So what is my point? Not everyone can or wants to code, and not all jobs are conducive to programming. I was lucky: my interest in programming, combined with work that requires me to write several scripts in support of my overall goal of text simplification, made it easy to identify a necessary task that could be solved with some programming.
In the grand scheme of things, what you need to fight “That 2:30 Feeling” is something that grabs your attention and demands focus, isn’t too dull, and is related to your job. Everyone may have something different that works for him or her. For me, the solution is to code.
I made seven blog entries this past June, which is as many as in the rest of 2012 combined, so clearly I’m doing something different in the summer than during the spring semester.
The obvious answer is that I’m not faced with the overwhelming academic demands of Williams College. While I am working full time at Bard, I’m definitely laboring for fewer hours as compared to spring semester. And often, my work consists of twiddling my thumbs and rereading academic papers while I wait for a lengthy experiment to conclude. At the time of this writing, for instance, I have two processes running on my computer that I expect to take about ten hours to complete. Sadly, I haven’t shut down my lab computer in about a week.
But the deeper answer is that I have gotten more motivated about writing and have consequently been diligent in setting aside serious writing time. Even though I was inundated with coursework and applications last semester, there were still periods of time, like the entire Winter Study period, that were just ripe for writing. Yet I didn’t blog much, preferring to follow and play basketball, play Minecraft, and do other activities. But writing a blog, even if it doesn’t receive much attention, has a certain appeal that games lack. The lasting impression of written work I produce, by its very nature, affords me an opportunity to look back and see how my interests have developed and vacillated throughout my undergraduate years. And this is also practice for an activity that I will be performing often in life, whether it be for scholarship applications, formal emails, manuscripts, or other documents. By writing often, I feel like I experience hidden reservoirs of enjoyment. Those are especially prevalent during days when I don’t have much to do for the night, and want to make use of precious time.
I hope to maintain this rate of posting during my junior year of college. Since I won’t have a roommate, it’s easier for me to have my own sleeping schedule. My preliminary plan is to wake up much, much earlier than the 9:30 to 11:00 AM time that was the norm during my sophomore year, and use the early morning as study time. By doing more work before classes, I hope to easily set aside blogging time several nights a week.
On a general note, my recent surge in posting just shows that the more you like an activity or subject, the more productive you will be. That is why I need to make sure that I am deeply passionate about whatever computer science field I intend to study in the future.
As a prospective computer science graduate student, I know I will likely attend, and speak at, conferences. My worry is that accommodations will be either lacking or unsuitable for the task. Just recently, I read a few startling messages on an email list for people with disabilities who are interested in science, technology, engineering, and mathematics. A deaf student was in a tough situation regarding accommodations. Here is the message that started it all, with the author, location, and relevant names protected:
I’ve registered for two […] conferences this summer. […] I sent emails to the coordinators asking for interpreter support, but they have not responded. Since it’s illegal under the ADA for organizations to refuse to provide reasonable disability accommodations, what would the best approach be here? I don’t want to come off too strongly and alienate them.
Ouch. The student can’t get the full experience of the conference without accommodations. But if he points out the requirements of the ADA (the Americans with Disabilities Act), will the coordinators view him as a nuisance and possibly block him from coming? The student’s position didn’t improve with his update, in which he told us how the coordinator responded:
“We do not have the capability to provide an interpreter, but it will absolutely be no problem to accommodate if you obtain one yourself.
Let me know how else I can help.”
At least the coordinator responded, but not in a fair way! I don’t expect everyone to be an expert on the ADA, but from what I can see, this message was poorly constructed and displays a lack of research about accommodations on the coordinator’s part. I would hope that the writer could offer a better excuse1 than the current bland response. Fortunately, the deaf student received support from people on the email list. Here are some segments of the most scintillating response:
It is amazing, but not uncommon for anyone in any organization not to know the relevant rules or laws governing disabilities in general now. […] I do know for a fact, that there are several students in the country that have disabilities that earn their PHD’s with very little accommodations, because they don’t want to be seen as the troublemaker in their respective departments. They may win the grievance or lawsuit in the end, but don’t get the recommendations of their department heads when they start looking for faculty positions after that. This is an unwritten game that plays out each and every quarter and semester at a university in the country. I currently hear from people that want a solution that does not require them to file a grievance or lawsuit. Unfortunately, it is not limited to schools and businesses, it is also extremely prevalent in governmental agencies as well. I have seen it at the local and State levels, but it is still very common among the myriad of Federal agencies.
Fortunately, there were a lot of people in the disability rights movement that came along before me to help pave the way that has allowed me to be successful in life. I feel that it is my responsibility to continue to break down barriers that will allow even more people to benefit from the lives they want to lead. If that requires me to educate some people, then I gladly accept the role. If it requires me to kick the door down, then I can also achieve this as well.
This is well said. I admit that I feel like a burden when asking for accommodations, but I know that in the end I need them in order to perform well in my studies and work. My aim is to become the best computer scientist I can, and if others view me as a troublemaker, so be it. I will just have to take advantage of my accommodations and do the best work possible to show to others that I deserve to work wherever I please.
Another person wrote a short email that sums up all of our sentiments:
It shouldn’t be about how much the event costs. You have the right to get the same benefit from it as someone who is able to hear.
There has been no update from the original deaf student since his last message which thanked the respondents for their support. I hope it has worked out for him.
My idea of an “excuse” would be just what the coordinator said — a lack of capability to provide interpreters — but there needs to be justification and evidence that the conference did as much as it could to provide accommodations. If the conference was in the middle of nowhere with no interpreters available within a 2-hour radius, then I can understand. But who would organize a conference in the middle of nowhere? ↩
Midway through my Bard College summer REU, it is becoming clearer to me how I have been spoiled. At Williams College, all ASL interpreters who work for me are required to possess RID (Registry of Interpreters for the Deaf) certification. To earn that title, interpreters must demonstrate a national standard of excellence in sign language fluency, knowledge, and skill, and they have to pass tests and meet other requirements detailed on the linked website. The situation was the same at the University of Washington’s Summer Academy. During “Academy Base,” the 9:00 to 10:30 AM time slot when all the students would gather in a room and listen to several lecturers, the disability coordinator there once said verbatim: “Have you noticed that all the interpreters here are really good?”
They were outstanding, and I wish the same quality of services existed at the Bard College REU. I am grateful that Bard has generously provided me with interpreting services for all talks, even those on topics so abstruse that I would never be able to understand them. The interpreters here, unfortunately, are not in the same class as those at Williams College or the University of Washington. They remind me of my interpreters from high school. That is the effect of being spoiled; you are gift-wrapped something outstanding and do not want to release it and accept a lesser version the next day. Even though the law entitles someone like me to interpreting services, institutions can provide accommodations of varying quality.
So for all RID-certified interpreters out there, thank you for taking the extra step to ensure that you are delivering high quality interpreting services. I can only hope that my own signing will be up to your standards one day.
Tonight, I made the executive decision1 to rename one of the three categories of my blog, “academia,” to “computer science.”
Why did I do this? Lately, as a result of taking more high-level computer science courses and independent research, I have found more topics related to computer science that I feel comfortable enough writing about. I do not have a comparable level of knowledge of academia, as I obviously have never experienced what the life of a professor is like. (Heck, I don’t even have a Bachelor’s yet!) Furthermore, I feel like any future post I make filed under “academia” would be stale and likely a rewriting of another person’s opinion. Hence, I wish to avoid bifurcating “academia” into “computer science academia” and “generic academia” categories.
My category swap is also a reflection of my own plans for the future. I’d say there’s a 90% chance that I end up pursuing a Ph.D. in computer science, so that goal hasn’t changed. I do, though, want to keep my options open after graduate school. Computer science is a unique field in that there is significant industry demand for Ph.D.s, so many graduates find their way into prominent companies such as Google or Microsoft. There is an old letter (1999) here that always pops into my head when I think about graduate school. Obviously, I don’t worship that letter like it’s the Bible, and I’m aware that the writer, a professor of physics, makes a few spurious generalizations. But the general consensus I have heard and seen over the past few years is that a computer science Ph.D. is more flexible than Ph.D.s in other fields. I have seen and met Ph.D.s at companies such as Google, and could easily see myself following that career path.
I am not going to neglect academia entirely, of course. I will keep it in mind, but I can’t see myself only considering a professorship as my future career. Incidentally, job flexibility might be a reason to side with systems over theory in computer science research. Hopefully I will have a greater understanding of the job market in computer science once I graduate from Williams College.
I have updated all older entries to be filed under the category that suits them best. I have only one category assigned to each blog entry. Including this one, I have made 27 posts on Seita’s Place. Six are categorized as “Deafness,” nine as the newly minted “Computer Science” category, and the remaining twelve encompass the “Everything Else” on my blog. I’d like to keep a balance among those three topics, so I think I know what I want to write about next.
Yes, I know I’m the only person writing this blog, but it feels good to say something like “executive decision,” would you agree? ↩
Eight months ago (wow, has it really been that long?) I made the first of what I hoped would be a series of posts related to American Sign Language (ASL) guidelines. You can view that blog entry here. I hope to expand on Axiom IX:
Axiom IX: The simplest way to manage personal pronouns is to point.
Axiom IX Footnote: To sign the general word “he,” point your finger in the air.
For the purposes of brevity and clarity, I will focus on the personal pronouns listed in the corresponding Wikipedia entry. And by looking at that table, I realized that my axiom was slightly incorrect. Not all personal pronouns are indicated with the index finger. If one is signing a possessive pronoun, e.g. my, yours, his, and her, it’s best to use the entire hand with the palm facing towards the correct entity. More specifically, the hand should be flat and look as if it is the sign for the letter “b” but without the thumb curling towards the center of the palm.
Example: You are signing the equivalent English sentence of “That book is yours” to a friend. A correct ASL depiction would begin with pointing to the “book entity” — pointing to the actual book if it is visible to both of you, or pointing to any non-previously indexed location if it is a “virtual” book, followed by the sign for book. Then, the “your” sign would follow, with your flat palm facing towards your friend. Add emphasis by pushing your hand forward slightly.
The words his, her, and their have similar signs, except the hand will be pointing towards wherever he, she, or they are located (indexed), respectively. And clearly, “my” or “mine” will be the reverse of “your.” Your flat hand should be pointing towards your chest, possibly touching it.
So when is the finger (I mean … using a finger) appropriate? Right now, I think it’s the exclusive sign for he, she, and it. That’s fewer examples than I thought, so the axiom definitely needs to be reworded. And things get even more complex when including the reflexive personal pronouns: myself, yourself, herself, himself, itself. For those signs, you would use a “thumbs-up” on the dominant hand. Direction still needs to be respected; “myself,” for instance, is signed by tapping the “thumbs-up” hand slightly on your chest.
Given that there are multiple ways to express personal pronouns, and that all of them deal with respecting the orientation of the targeted entity, I think the axiom should be reworded as:
Axiom IX: To manage personal pronouns, indicate the targeted entity by pointing your hand in the appropriate location. In general, use one of the index finger, a flat palm, or a thumbs-up.
The related footnote would accentuate the distinctions between the dominant hand using its index finger, a flat palm, or a thumbs-up.
Facebook Chief Operating Officer (COO) Sheryl Sandberg leaves work at 5:30 every day. That doesn’t surprise me at all. I’m also familiar with the stigma the article mentions toward people who leave work early, which the article’s author defines as before 8:00 PM. That stigma exists because it’s common sense to assume that if two people hold the same job, and person A leaves at 6:00 PM while person B leaves at 10:00 PM, then person B is the harder-working and better employee. And person B should be getting the promotions … the recommendations … the list goes on. But does it really have to be this way?
During the past few years, I keep reminding myself of “intense focus.” I consider my studying good when it is productive; that is, when I have a high rate of material retention and understanding per hour of studying. I hate, hate spending hours reading, thinking, or writing, and feeling like I haven’t made progress on whatever task I’m doing. In almost all of those “wasteful” scenarios, a lack of focus is the issue. That, then, is why I am unsurprised and pleased about Sandberg’s schedule. My hypothesis is that, when faced with a time constraint, people will be increasingly pressed to be productive and efficient during their work.
I certainly share this experience, and I didn’t have to think hard for an easy example. In my Real Analysis class last semester, we had three exams. The first was a 4-hour take-home exam, and the other two were 24-hour take-home exams. Surprisingly, the three exams were roughly the same length. The first had seven questions, the other two had eight, and the problems were relatively even in terms of the time needed to solve them. Many students preferred the longer exams, since they gave them more time to think about and revise their answers with less fear of a time constraint.
But I argue that a shorter time constraint is beneficial because it forces me to stay alert. Knowing I had plenty of time on the later two exams, I felt myself uncontrollably browsing ESPN, my email, and other websites in between solved questions. But on the first exam, I “marathon-ed” the questions, refusing to spend my time on such trivial tasks. I only took deliberate breaks. Those are the ones that I put on my schedule before taking the exam, to make sure I do not suffer from burnout. Even though I got As on all three exams, the feeling of fruitfulness I had while taking the first exam was vastly different than what I experienced during the other two. This is why I generally advocate for time constraints on work. It’s okay if they are self-imposed. What matters is being efficient and not using the “I have all the time in the world” excuse when you’re taking unnecessary breaks.
I particularly wonder about work habits in academia. What happens to those professors who tend to leave work early1 as compared to those who stay up past their students pulling all-nighters? I’d be interested in collecting data to see whether those who spend more time in their offices may actually be doing themselves a disservice. But the problem is that schedules can be wildly unpredictable. A professor could leave work at 5:00 PM one day, leave at 3:30 AM the next, and not show up to his or her office at all on the third day. And of course, people vary. Some can sustain incredible focus for long periods of time, while others may require more frequent breaks. Finally, there may be certain deadlines that cause people to work longer.
But what about me? What can I do to ensure that I take advantage of intense focus? As I mentioned before, I am working at Bard College this summer. This past week was my first on the job, and I was in the lab (during weekdays) from 9:00 AM to 6:00 PM. The 6:00 PM departure is an excellent time; it allows me to comfortably work out in the weight room before the 8:00 PM closing time, and I can also eat dinner in the 7:00 to 9:00 PM range, which is when I start getting hungry. And my weekends look like they will be free, allowing me to pursue other hobbies such as programming and running (and blogging, of course).
The lazy days between the end of my sophomore year and the start of my research internship are past me. It’s time to set laziness aside and … focus!
By early, I arbitrarily mean before 6:00 PM. ↩
So it begins.
My research career will officially commence, on the day after the Williams College Class of 2012 commencement, on Monday, June 4. (That’s tomorrow.) I was accepted into the Bard College Mathematics & Computation Research Experience for Undergraduates, funded by the National Science Foundation. I am working with two Bard College professors on a project called “Using Machine Learning to Simplify Text.” My understanding of the project is that we are going to be exploring ways that computers can take a piece of text and simplify its wording and vocabulary to make it comprehensible to a wider audience of readers. Hopefully there will be some serious study of algorithms in this project — I learned so much from Algorithm Design & Data Analysis at Williams College. And all these Project Euler questions I’ve been doing have reinforced my interest in programming.
I’ll be working from June 4 to July 27. That’s a week shorter than the summer program I attended last year.
After the summer ends, I hope to recap on what I’ve done. Stay tuned!
UPDATE May 13, 2015: Migrated the code syntax to match Jekyll’s syntax.
Project Euler is an interesting website that offers about 400 different mathematics and computer programming questions. They range from easy (finding the sum of some set of big numbers) to nearly impossible (navigating Rudin-Shapiro sequences). Just recently, I solved the 179th question with the help of some Java code. While my program gave me the correct answer, the execution time on my MacBook Pro laptop was 80 seconds, and there is an informal “60 seconds” rule implying that code should be able to solve a problem in under a minute. So I wanted to determine in what ways I could optimize my code.
Here was the question: Find the number of integers 1 < n < 10^7, for which n and n + 1 have the same number of positive divisors. For example, 14 has the positive divisors 1, 2, 7, 14 while 15 has 1, 3, 5, 15.
This wasn’t too bad for me. I already had a method that could compute the sum of the divisors of a number based on problem 23, so I revised it to add up the number of divisors, rather than the sum. Then I just iterated through each number from 1 to 10 million. Here was the first version of my code:
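The listing itself isn’t shown here, but a Python rendering of that first approach, trial division up to the square root for every number in the range, goes roughly like this (the original was Java):

```python
import math

def num_of_divisors(n):
    """Count the divisors of n by trial division up to sqrt(n)."""
    count = 0
    for d in range(1, math.isqrt(n) + 1):
        if n % d == 0:
            count += 2              # d and n // d are both divisors
            if d * d == n:
                count -= 1          # a perfect square's root was counted twice
    return count

def solve(limit=10**7):
    """Count n with 1 < n < limit where d(n) == d(n + 1)."""
    return sum(1 for n in range(2, limit)
               if num_of_divisors(n) == num_of_divisors(n + 1))
```

With the full 10^7 limit, this makes the 10 million calls to the divisor-counting method that turn out to be the bottleneck.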
I’m not going to say what the answer was, but as mentioned before, the execution time (endTime – startTime) was about 80 seconds. Looking at the code, the limiting factor is the 10 million calls I make to the method numOfDivisors(). So how can I improve this? In other words, how can I avoid making all those calls to my static method here?
To start, I initialized an array of 10,000,001 elements, called divs, where divs[x] refers to the number of divisors of x. Then I used two nested for loops to make sure that each entry divs[x] really did hold the number of divisors of x. The outer loop went from int i = 1 to 10,000,000, and the inner loop went from int j = 1 up to the largest j such that i*j <= 10,000,000. This guarantees that every divisor of every number is counted! For instance, the number 2, which has the divisors 1 and 2, has its entry divs[2] incremented exactly twice: once when i=1 and j=2, and once when i=2 and j=1.
Here, I avoid all the testing of “is one number a factor of another?” that I do in my old code, because whenever I consider a product i*j = n, I already know that n has at least those two factors!
The updated code is as follows:
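A Python rendering of the sieve just described (the original was Java):

```python
def solve(limit=10**7):
    """Count n with 1 < n < limit where d(n) == d(n + 1),
    precomputing all divisor counts with a sieve."""
    divs = [0] * (limit + 1)
    for i in range(1, limit + 1):
        # Every multiple i*j of i gains i as one of its divisors.
        for j in range(i, limit + 1, i):
            divs[j] += 1
    return sum(1 for n in range(2, limit)
               if divs[n] == divs[n + 1])
```

Pure Python won’t match the Java version’s 1.9 seconds, but the asymptotic win is the same: roughly N log N total sieve work instead of N√N trial divisions.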
It gave me the right answer, and the runtime was an amazingly quick 1.9 seconds — much, much better! I don’t claim full credit for this second version, as I read the discussion forum for that problem after I solved it the first time, but it’s still nice to know how to optimize a program.