Seita's PlaceThis is my blog, where I have written over 275 articles on a variety of topics, most of which are about one of two major themes. The first is computer science, which is my area of specialty as a Ph.D. student at UC Berkeley. The second can be broadly categorized as "deafness," which relates to my experience and knowledge of being deaf.
https://danieltakeshi.github.io/
International Conference on Robotics and Automation (ICRA) 2018, Day 1 of 5

<p>Due to my clever sleep schedule, I was able to wake up at 5:00am and feel
refreshed. <a href="https://danieltakeshi.github.io/2018/05/22/icra-day0/">As I complained in my previous blog post</a>, the hotel I was at
lacked a fitness center, forcing me to run outside, since I cannot for the
life of me go a few days without doing <em>something</em> to improve my physical
fitness.</p>
<p>I killed some time by reviewing details of the conference, then ran outside once
the sun was rising. I ran on the bridge that crosses the river and was able to
reach the conference location. It is close to a park, which has a splendid
display of “BRISBANE” as shown here:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/sunrise.JPG" />
<i>
A nice view of Brisbane's sunrise.
</i>
</p>
<p>After running, I prepared for the conference. I walked over and saw this where
we were supposed to register:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/registration.JPG" />
<i>
The line for registration on Monday morning.
</i>
</p>
<p>This picture doesn’t do justice to ICRA’s popularity. There were a <em>lot</em>
of attendees here.</p>
<p>But first, I had to meet my sign language interpreters! A few comments:</p>
<ul>
<li>I decided to use sign language interpreting services rather than captioning. I
have no idea if this will be better but it’s honestly hard to think about how
it can be <em>worse</em> than the captioning from UAI 2017.</li>
<li>Even though Australia and America are both English-speaking countries, the
sign language used there (“Auslan”) is <em>not</em> the same as the one used in
America.</li>
<li>Thankfully, Berkeley’s DSP found an international interpreting agency which
could try and find interpreters familiar with American signing. They hired one
who specialized in ASL and who has lived in both Australia and the US, making
him an ideal choice. The other interpreters specialized in different sign
languages or <a href="https://en.wikipedia.org/wiki/International_Sign">International Sign</a>.</li>
<li>There was a “Deaf Interpreter/Hearing Interpreter” team. Essentially, this
means having a deaf person who knows multiple sign languages (e.g., ASL and
Auslan). That deaf interpreter is the one who signs for me, but he/she
actually looks at a <em>hearing</em> interpreter who can interpret in one of the sign
languages that both of them know (e.g., Auslan) but which <em>I</em> don’t. Thus, the
translation would be: spoken English, to Auslan, to American signing. The
reason for this is obvious, since in Australia, most interpreters know Auslan,
but not ASL. I wouldn’t see this team until the second day of the conference,
but the experience turned out to be highly unwieldy and wasn’t beneficial, so
I asked to discontinue the service.</li>
<li>All of the interpreting services were obtained after a <em>four month process</em> of
Berkeley’s DSP searching for an international interpreting agency and then
booking them for this conference. Despite the long notice, some of the
schedule was <em>still</em> in flux and incomplete at the conference start date, so
it goes to show that even four months might not be enough to get things booked
perfectly. To be fair, it’s more like two or three months, since conference
schedules aren’t normally released until a month or so <em>after</em> paper decisions
come out. That’s one of my complaints about academic conferences, but I’ll
save my ranting for a future blog post.</li>
</ul>
<p>I met them and after customary introductions, it was soon 9:00am, when the first
set of tutorials and workshops began. It seemed to be structured much like UAI
2017, in that there are workshops and tutorials on the first and last days,
while the “main conference” lies in between. (For us, this meant Monday and
Friday were for the workshops/tutorials.)</p>
<p>There were several full-day and half-day sessions offered on a variety of
topics. I chose to attend the “Deep Learning for Robotics Perception” tutorial,
because it had “Deep Learning” in the title.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/tutorial_morning.JPG" />
<i>
The morning tutorial on deep learning for robotics perception.
</i>
</p>
<p>For this conference, I decided <em>not</em> to take detailed notes on every
talk. I did that for UAI 2017, and it turned out to be of no help whatsoever: I
never once looked at my Google Doc notes after the conference ended. Instead, my
strategy now is to take pictures of any interesting slides, and then scan my
photos after the conference to see if there’s anything worth following up on.</p>
<p>The Deep Learning tutorial was largely on computer vision techniques that we
might use for robotics. Much of the first half was basic knowledge to me. In the
second half, it was a pleasure to see them mention the AUTOLAB’s work on
<a href="https://berkeleyautomation.github.io/gqcnn/">Grasp-Quality Convolutional Neural Networks</a>.</p>
<p>The interpreters had a massively challenging task with this tutorial. The one
who knew ASL well was fine, but another one — who mentioned ASL was her
fourth-best sign language — had to quit after just a few seconds (she
apologized profusely) and be replaced. The third, who was also somewhat rusty
with ASL, lasted his full time set, though admittedly his signing was awkward.</p>
<p>Fortunately, the one who had to quit early was able to recover and for her next
20-minute set, she was able to complete it, albeit with some unusual signs that
I could tell were international or Australian. Even with the American
interpreter, though, it was still <em>tremendously challenging</em> for me to even
follow the tutorial, so I felt a bit frustrated.</p>
<p>After some lunch, we had the afternoon tutorials in a similar format. I attended
the tutorial on visual servoing, featuring four 45-minute talks. The third was
from UC Berkeley Professor <a href="https://people.eecs.berkeley.edu/~pabbeel/">Pieter Abbeel</a>, who before beginning the talk
found me in the crowd<sup id="fnref:notchallenging"><a href="#fn:notchallenging" class="footnote">1</a></sup> and congratulated me for <a href="http://bair.berkeley.edu/blog/">making the BAIR
blog a success</a>.</p>
<p>You can imagine what his actual talk must have looked like: a packed, full room
of amazed attendees trying to absorb as much of Pieter’s rapid-fire presentation
as possible. I felt sorry for the person who had to present after Pieter, since
about 80% of the people in the room left after Pieter finished his talk.</p>
<p>The sign language situation in the afternoon tutorials wasn’t much better than
that of the morning tutorials, unfortunately. The presentation I understood the
most in the afternoon was, surprise surprise, Pieter’s, but that’s because I had
<em>already</em> read almost all of the corresponding research papers.</p>
<p>Later in the evening, we all gathered in the Great Hall, the largest room in the
exhibition, for some opening remarks from the conference organizer. Before that,
we had one of the more interesting conference events: a performance by
indigenous Australians. To make a long story short, in Brisbane it’s common
(according to my sign language interpreter) to begin large events by inviting
indigenous Australians to perform. This is a sign of respect
for how these people have inhabited Australia for many thousands of years.</p>
<p>For the show, several shirtless men with painted bodies played music and danced.
They rubbed wood against another implement to create smoke and fire. Perhaps
they got this cleared through the building’s security? I hope so. I took a photo
of their performance, which you can see below. Unfortunately it’s not the one with
the smoke and fire.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/evening_natives.JPG" />
<i>
Native Australians giving us a show.
</i>
</p>
<p>I don’t know if anyone else felt this way, but does it feel awkward seeing
“natives” (whether Australian or American) wearing almost nothing while
“privileged Asians and Whites” like me sit in the audience wearing business
attire with our mandatory iPhones and Macbook Pro laptops in hand? Please don’t
get me wrong: I fully respect and applaud the conference organizers and the city
of Brisbane as a whole for encouraging this type of respect; I just wonder if
there are perhaps better ways to do this. It’s an open question, I think, and
no, <em>ignoring</em> history in America-style fashion is not a desirable alternative.</p>
<p>After the natives gave their show, to which they received rousing applause, the
conference chair (Professor Peter Corke) provided some welcoming remarks and
then conference statistics such as the ones shown in the following photo:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/evening_welcome.JPG" />
<i>
Some statistics from conference chair Peter Corke.
</i>
</p>
<p>There are <em>lots</em> of papers at ICRA! Here are a few relevant statistics from this
and other slides (not shown in this blog post):</p>
<ul>
<li>The acceptance rate was 40%, resulting in <em>1052 papers accepted</em>. That’s … a
lot! Remember, at least one author of each paper is supposed to attend the
conference, but in reality several often attend, along with industry sponsors
and so forth, so the number of attendees is surely much higher than 1052, even
accounting for researchers who first-author multiple ICRA papers.</li>
<li>Papers with authors from Venezuela had a 100% paper acceptance rate. I guess
it’s good to find the positives in Venezuela, given the country’s recent
free-fall, which <a href="https://www.wsj.com/articles/venezuelas-sham-election-1526841249">won’t be mitigated by their sham election</a>.</li>
<li>The 2018 edition of ICRA broke the record for number of paper submissions,
with 2586. The previous high was from 2016, which had around 2350 paper
submissions.</li>
<li>The United States had the highest number of papers submitted by country,
besting the next set of countries which were China, Germany, and France. When
you scale it by a <em>country’s population</em>, Singapore comes first (obviously!),
followed by Switzerland, Australia (woo hoo, home team!!), Denmark, and
Sweden. It’s unclear how these statistics handled papers whose authors were
based in different countries.</li>
</ul>
<p>After this, we all went over to the welcome reception.</p>
<p>The first thing I noticed: <em>wow</em>, this is going to be noisy and crowded. At
least I would have a sign language interpreter who would tag along with me,
which despite being awkward from a social perspective is probably the best I can
hope for.</p>
<p>The second thing I noticed: <em>wow</em>, there are <em>lots</em> of booths that provide wine
and beer. Here’s one of many:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/evening_drinks.JPG" />
<i>
One of many drinking booths in the welcome reception on Monday night.
</i>
</p>
<p>To satisfy our need for food, several convention employees walked around
carrying finger food on large plates. I learned from one of the sign
language interpreters who was tagging along with me that one of the food
samplings offered was a <em>kangaroo</em> dish. Apparently, kangaroo is a popular meat
item in Australia.</p>
<p>It is also quite tasty.</p>
<p>There were a large number of booths for ICRA sponsors, various robotics
competitions, or other demonstrations. For instance, here’s one of the many
robotics demonstrations, this time for “field robotics,” I suppose:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/evening_dirt.JPG" />
<i>
One of many robotics booths set up in the welcome reception.
</i>
</p>
<p>And it wouldn’t be a (well-funded) Australian conference if we didn’t get to pet
some animals. There were snakes and wombats (see below image) for us to touch:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/evening_wombat.JPG" />
<i>
We could pet a wombat in the welcome reception (plus snakes and other animals).
</i>
</p>
<p>I’ll tell you this: ICRA does <em>not</em> skimp on putting on a show. There was a
<em>lot</em> to process, and unusually for me, I fell asleep extremely quickly once I
got back to my hotel room.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:notchallenging">
<p>Not that it was a challenging task, since I was sitting in
the front row and he’s seen the sign language interpreters at Berkeley
<em>many</em> times. <a href="#fnref:notchallenging" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Wed, 23 May 2018 03:00:00 -0700
https://danieltakeshi.github.io/2018/05/23/icra-day1/

Prelude to ICRA 2018, Day 0 of 5

<p>On Friday, May 18, I bade farewell to my apartment and my work office to go to
San Francisco International Airport (SFO). Why? I was en route to <a href="https://icra2018.org/">ICRA
2018</a>, the premier conference on robotics and — I believe — its largest
in terms of number of papers, conference attendees, and the sheer content
offered in the form of various tutorials, workshops, and sponsor events.</p>
<p>For travel, I booked an Air Canada round trip from San Francisco to Vancouver to
Brisbane. Yes, I had to go <em>north</em> before going south … there unfortunately
weren’t any direct flights from San Francisco to Brisbane during my time frame
(San Francisco to Sydney is a more popular route). But I didn’t mind, as I
could finally stop using United Airlines.</p>
<p>As usual, I got to the airport early, and then hiked over to the International
Terminal. At SFO, I’m most familiar with Terminal 3 (United) and the
International Terminal, and in the latter, my favorite place to eat is Napa
Farms Market, which embodies the essence of San Francisco cuisine. I had some
excellent pork which was cut in-house, and cauliflower rice (yeah, see what I
said about SF?).</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/napa_farms.JPG" />
<i>
The SFO Napa Farms Market.
</i>
</p>
<p>Incidentally, for Terminal 3 dining, I highly recommend <em>Yankee Pier</em> and their
fish dishes.</p>
<p>My original plan was to pass security at around 2:00pm (which I did), then get a
nice lunch and relax at the gate before my scheduled 4:20pm departure time.
Unfortunately, while I was eating my Napa Farms Market dish, the waterfall of
delays would begin. After three separate delays, I soon learned that my flight
to Vancouver wouldn’t depart until after 7:10pm. At least, assuming there
weren’t any <em>more</em> delays after that.</p>
<p>Ouch, apparently United isn’t the only airline that’s struggling to keep things
on time. Maybe it’s a San Francisco issue; is there too much traffic? Or could
it be due to the airport’s awkward location? It’s a bit oddly situated in the
bay; it borders the <em>inside</em> of the bay, rather than the great Pacific Ocean.</p>
<p>The good news was twofold, though:</p>
<ul>
<li>I had a spare United Club pass that I could use to enter a club. Fortunately,
it turns out that <em>even if you are not flying United</em>, you can still get
access to the clubs with a same-day boarding pass and a club pass. I don’t
know if it helped that Air Canada is part of Star Alliance, but I’m guessing
this would be OK even if I was flying Air Whatchamacallit.</li>
<li>I had arranged for a <em>five hour layover</em> at Vancouver. My flight to Brisbane
was scheduled to leave past 11:00pm.</li>
</ul>
<p>This is why I aim for layovers of 5+ hours. It gives me an enormous buffer zone
in case of any delays (and at SFO, I expect them) and I really don’t want to be
late for academic conferences. Furthermore, the delays let me relax at airports
for a longer time (assuming lounge access!), which means I can
string together a longer time block reading papers, reading books, and blogging,
all while “fine dining” from a graduate student’s perspective.</p>
<p>I went to the United Club at Terminal 3 since that’s the largest one, and the
lounge I’m most familiar with due to all my domestic United travels. I found a
nice place to work (more accurately, read a research paper) and enjoyed some of
the free food. I had a cappuccino, some cheese, crackers, fruit, and then
enjoyed the standard complimentary house wine.</p>
<p>The good news was that the flight wasn’t delayed too much longer, but I’m not
sure how well a 3+ hour delay reflects on Air Canada. Hopefully it’s a rare
event. I boarded the flight and was soon in Vancouver.</p>
<p>The flight itself was uneventful and I don’t remember what I did. I entered
Vancouver, and saw some impressive artwork when I arrived.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/canada_1.JPG" />
<i>
Vancouver artwork.
</i>
</p>
<p>I couldn’t waste too much time, though. I had to pass through immigration.
Remember, United States =/= Canada. Just be warned: if you are <em>arriving</em> at a
country from a different country, <em>even if it is en route to yet another
country</em>, you still have to go through immigration and customs <em>and then</em> the
whole security pipeline after that. Fortunately, even though the line looks long
in the photo here, I got through it quickly. Note also the sign language
interpreter video.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/canada_3.JPG" />
<i>
Immigration at Vancouver.
</i>
</p>
<p>I quickly hurried over to my gate and managed to get some decent snacks at one
of the few places that was open at 10:30pm. The good news is that I had informed
my credit card company that I was traveling in Canada and Australia, so I could
use my credit card. Pro tip for anyone traveling! You don’t want your card to
be declined in the most awkward of moments.</p>
<p>And then … I boarded the <em>fourteen-and-a-half hour</em> flight to Brisbane. Ouch!
A few things about it:</p>
<ul>
<li>There was free wine, so apparently the Air Canada policy is the same as United
for long-haul international flights. I wonder, though, why they offered the
wine an hour after we departed (which was past midnight in Vancouver and SFO
time) when it seemed like most passengers wanted to sleep. I thought they
would have held off providing wine until perhaps the 10-hour mark of the
flight, but I guess not. I drank some red wine and it was OK to me, though
another conference attendee (who I met once I arrived in Brisbane) told me she
hated it.</li>
<li>The flight offered <em>three</em> meals (!!), whereas I thought only one (or at most
two) would be provided. That’s nice of them. I had requested a “fruit plate”
dish upon paying for the tickets a few months ago, because I do not trust meat
in airline food. To my surprise, the airline respected the request and gave me
two fruit plates. I would have had a third, but I declined it in favor of
trying the egg dish. Really, thank you Air Canada for honoring my food request!
I’ll remember this for the future.</li>
<li>The guy who was in my set of three seats (sitting by the window) had an Nvidia
backpack, so I asked him about that, and it turned out he’s another conference
attendee. I also saw a few Berkeley students who I recognized on the plane.
I’m also pretty sure that other conference attendees could tell I was going
because I was awkwardly dragging a bulky poster tube.</li>
<li>I sat in the aisle seat. This is what you <em>want</em> for a 14.5-hour flight,
because it makes it so much easier to leave the seat and walk around, which I
had to do several times. <em>Always get aisle seats for long flights!!</em>
Amazingly, as far as I could tell, the Nvidia guy sitting at the window <em>never
left his seat for the entire flight</em>. I don’t think my body could handle that.</li>
<li>The flight offered the usual fare of electronic entertainment (movies, card
games, etc.) but I mostly slept (itself an accomplishment for me!) and read
<em>Enlightenment Now</em> — more on the book later!</li>
</ul>
<p>I arrived in Brisbane, then went through immigration <em>again</em>. But before that, I
passed through the following duty-free store which was rather judiciously placed
so that passengers had to pass it before getting to immigration and customs:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/brisbane_wine.JPG" />
<i>
The duty-free area we passed through upon arriving to Brisbane.
</i>
</p>
<p>There is a LOT of alcohol here! I enjoy some drinks now and then, but I wonder
if society worldwide is putting way too much emphasis on booze.</p>
<p>Going through immigration was no problem at all. Getting my taxi driver to my
hotel was a different matter. We had a testy conversation: I kept asking him to
repeat his question since I couldn’t understand what he was saying, but once I
explicitly showed him the address, which I had printed on paper beforehand, he
was fine. Yeah, lesson learned — just give drivers a written address and
they’ll be fine. Fortunately, I made it to the hotel at 9:00am, but check-in
wasn’t until 2:00pm (as expected), so I left my luggage there and decided to
explore the surrounding Brisbane area.</p>
<p>ICRA this year is located at the <a href="https://www.bcec.com.au/">Brisbane Convention Centre</a>, which is in the
<a href="https://en.wikipedia.org/wiki/South_Bank_Parklands">South Bank Parklands</a>. I could tell that there’s a <em>LOT</em> to do. One of John
Canny’s students told me a few days ago that “there isn’t much to do in
Brisbane” so I had to send a friendly rebuke to him via text message.</p>
<p>I wandered around before finding a place to eat breakfast (Australian time),
which was more accurately lunch for me. I had some spinach, tomatoes, poached
eggs, zucchini bread, and chai tea:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/lunch_brisbane.JPG" />
<i>
My lunch in the South Bank, Brisbane.
</i>
</p>
<p>I’m not much of a poached egg person but it was great! And adding milk to the
standard hot water and tea seems like an interesting combination that I’ll try
out more frequently in the future.</p>
<p>I took a few more pictures of the South Bank. Here’s one which shows an
<em>artificial beach</em> within the park on the left, with the real river on the
right.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/artificial_beach.JPG" />
<i>
The artificial beach (left) and South Bank river (right).
</i>
</p>
<p>Here’s a better view of the “beach”:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/artificial_beach_2.JPG" />
<i>
A view of the "beach."
</i>
</p>
<p>There’s tons of great stuff here: playgrounds, more restaurants, more extensive
beaches, some architectural and artistic pieces, and so forth. I can’t take
pictures of everything, unfortunately, so please check out <a href="https://www.visitbrisbane.com.au/south-bank?sc_lang=en-au">the South Bank
website</a> which should make you want to start booking some flights.</p>
<p>Just make sure you schedule your mandatory five-hour layover if SFO is part of
your itinerary.</p>
<p>I wandered around a bit and found a nice place to relax:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/view3.JPG" />
<i>
One of many places to relax in South Bank.
</i>
</p>
<p>I sat on a bench and read <em>Enlightenment Now: The Case for Reason, Science,
Humanism, and Progress</em>. <a href="https://www.gatesnotes.com/Books/Enlightenment-Now">This is Bill Gates’ favorite book</a>, and is close to
being one of <em>my</em> favorites as well. I’ll blog about it later, but the main
theme is that, despite what we may gather from the media and what politicians
say, life <em>has</em> continued to get much better for humanity as a whole, and there
has never been a better time to be alive than today. I think my trip to
Brisbane epitomizes this:</p>
<ul>
<li>I can board a 14.5-hour flight from Canada to Australia and trust that
the safety record will result in a safe landing.</li>
<li>I can study robotics and AI, instead of engaging in backbreaking agricultural
labor.</li>
<li>I have the freedom to walk around safely and read books as I please, rather
than constantly worry about warfare or repercussions for reading
non-government sanctioned books.</li>
<li>I can easily book a hotel well in advance, knowing that I
will have a roof over my head.</li>
<li>I can blog and share the news about this to friends and family around the
world.</li>
</ul>
<p>Pinker doesn’t ignore the obvious hardships that many people face nowadays, but
he makes a strong case that we are not focusing enough on the positive trends
(e.g., decline in worldwide extreme poverty) and, <em>more importantly</em>, what we
can <em>learn</em> from them so that they <em>continue</em> rather than slide back.</p>
<p>I devoured <em>Enlightenment Now</em> for about an hour or two and took a break —
it’s a 500-page book filled with dense endnotes — and toured more of the South
Bank. Overall, the place is extremely impressive and great for tourists of all
shapes and sizes. Here are some (undoubtedly imprecise and biased) tradeoffs
between this and <a href="https://danieltakeshi.github.io/2017/08/14/uai-2017-day-three-of-five/">the Darling Harbor area that I went to for UAI 2017</a>:</p>
<ul>
<li><strong>Darling Harbor advantages</strong>: more high-end restaurants, better cruises,
feels cleaner and wealthier</li>
<li><strong>South Bank advantages</strong>: a larger variety of events (many family-friendly),
perhaps cheaper food, better running routes</li>
</ul>
<p>The bottom line is that both areas are great, and if I had to pick one to
visit, it would probably be whichever I’ve gone the longest without visiting.</p>
<p>I then went back to my hotel at around 3pm, desperate to relax and shower. The
hotel I was in, the <em>Ibis Brisbane</em>, is one of the cheaper ones here, and it shows
in what it provides. The WiFi is sketchy, the electrical outlets are located in
awkward configurations, there is no moisturizing cream, only two large towels
are offered, and there is no fitness center (really?!).</p>
<p>It’s not as good as the hotel I stayed at in Sydney, but at least it’s a
functioning hotel and I can stay here for six nights without issues.</p>
<p>I showered and went out in search of food. I found a
burger place and was going to eat quickly and head back to the hotel … when,
as luck would have it, <em>six other Berkeley students decided to come to the place
just as I was about to leave</em>. They generously allowed me to join their
group. It’s a good thing I was wearing a “Berkeley Computer Science” jacket!</p>
<p>I followed them to their hotel, which was head and shoulders better than mine.
Their place was a full studio with a balcony, and the view was great:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/ICRA/night_view.JPG" />
<i>
The view of the South Bank at night from high up.
</i>
</p>
<p>I stayed for a while, then walked back to my hotel. I slept early, reaping the
benefits of judiciously drinking coffee and timing meals so that I can wake up
early <em>and</em> feel refreshed for tomorrow.</p>
Tue, 22 May 2018 03:00:00 -0700
https://danieltakeshi.github.io/2018/05/22/icra-day0/

Practicing ROS Programming

<p>I am currently engaging in a self-directed, badly-needed crash course on <em>ROS
Programming</em>. ROS (Robot Operating System) is commonly used for robotics
programming and research, and is robot-agnostic, so knowledge of ROS should
generalize across robot types. Yet even after <a href="https://danieltakeshi.github.io/about.html">publishing a robotics
paper</a>, I still didn’t feel like I understood how my ROS code was working
under the hood, since other students had done much of the lower-level work
earlier. This blog post summarizes what I did to try and absorb ROS as fast
as possible.</p>
<p>To start learning about ROS, it’s a good idea (indeed, perhaps mandatory) to
take a look at the excellent <a href="http://wiki.ros.org/">ROS Wiki</a>. ROS is summarized as:</p>
<blockquote>
<p>ROS (Robot Operating System) provides libraries and tools to help software
developers create robot applications. It provides hardware abstraction, device
drivers, libraries, visualizers, message-passing, package management, and
more. ROS is licensed under an open source, BSD license.</p>
</blockquote>
<p>The ROS wiki is impressively rich and detailed. If you scroll down and click
“Tutorials”, you will see (as of this writing) <em>twenty</em> for beginners, and
<em>eight</em> for more advanced users. In addition, the Wiki offers a cornucopia of
articles related to ROS libraries, guidelines, and so on.</p>
<p>It’s impossible to read all of this at once, so <em>don’t</em>! Stick with the beginner
tutorials for now, and try to remember as much as you can. I recorded my notes
in <a href="https://github.com/DanielTakeshi/Self_Study_Courses/tree/master/Robots_and_Robotic_Manip">my GitHub repository for my “self-studying” here</a>. (Incidentally, that
repository is something I’m planning to <em>greatly</em> expand this summer with
robotics and machine learning concepts.)</p>
<p>As always, it is faster to learn by <em>doing</em> and reading, rather than reading
alone, so it is critical to run the code in the ROS tutorials. Unfortunately,
the code they use involves manipulating a “Turtle Sim” robot. This is perhaps my
biggest disappointment with the tutorials: the turtle is artificial and hard to
relate to real robots. Of course, this is somewhat unavoidable if the Wiki (and
ROS as a whole) wants to avoid showing favoritism to particular robots, so
perhaps it’s not a fair criticism, but I thought I’d bring it up anyway.</p>
<p>To alleviate the disconnect between a turtle and what I view as a “real robot,”
it is critical to start running code on a real robot. But since real robots cost
on the order of <a href="https://danieltakeshi.github.io/2017/07/29/before-robots-can-take-over-the-world-we-have-to-deal-with-calibration/">many thousands of dollars</a> and exhibit all the vagaries
you would expect from complex, physical systems (wear and tear, battery
drainage, breakdowns, etc.), I <em>highly recommend</em> that you start by using a
simulator.</p>
<p><a href="http://autolab.berkeley.edu/">In the AUTOLAB</a>, I have access to a Fetch and a Toyota HSR, both of which
provide a built-in simulator using <em><a href="http://gazebosim.org/">Gazebo</a></em>. This simulator is designed to
create a testing environment where I can move and adjust the robot in a variety
of ways, without having to deal with physical robots. The advantage of investing
time in the simulator is that the code should
<em>directly</em> translate to the real, physical robot without any changes, apart from
adjusting the <code class="highlighter-rouge">ROS_MASTER_URI</code> environment variable.</p>
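<p>As an illustration of that last point, switching between simulator and robot
is (in my experience) mostly a matter of re-pointing this one variable; the
robot hostname below is a hypothetical example:</p>

```shell
# Point ROS tools and nodes at the Gazebo simulator's master,
# which runs locally on ROS's default port, 11311.
export ROS_MASTER_URI=http://localhost:11311

# To run the same code on the physical robot, re-point at the robot's
# onboard master instead ("fetch59" is a hypothetical hostname):
# export ROS_MASTER_URI=http://fetch59:11311

echo "$ROS_MASTER_URI"
```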
<p>Details on the simulator should be provided in the manuals that you get for the
robots. Once the simulator is installed (usually via <code class="highlighter-rouge">sudo apt-get install</code>)
and working, the next step is to figure out how to code. One way to do this is
to borrow someone’s existing code base and tweak it as desired.</p>
<p>For the Fetch, my favorite code base is the <a href="https://github.com/cse481wi18/cse481wi18">one used in the University of
Washington’s robotics course</a>. It is a highly readable, modular code base
which provides a full-blown Python Fetch API with much of the stuff I need: arm
movement, base movement, head movement, etc. On top of that, there’s a whole set
of GitHub wiki pages which provides high-level descriptions of how ROS and other
things work. When I was reading these — which was <em>after</em> I had done a bit of
ROS programming — I was smiling and nodding frequently, as the tutorials had
confirmed some of what I had assumed was happening.</p>
<p>The primary author of the code base and Wiki is Justin Huang, a PhD student with
Professor Maya Cakmak. Justin, you are awesome!</p>
<p>I ended up taking bits and pieces from Justin’s code, and added a script for
dealing with camera images. <a href="https://github.com/DanielTakeshi/ros-simple-examples">My GitHub code repository here</a> contains the
resulting code, and this is the main thing I used to learn ROS programming. I
documented my progress in various README files in that repository, so if you’re
just getting started with ROS, you might find it helpful.</p>
<p>Playing around with the Gazebo simulator, I was able to move the Fetch torso to
its highest position and then assign the joint angles so that its gripper
actually <em>coincides</em> with the base. Oops, heh, I suppose that’s a flaw with the
simulator?</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/robo_manip/fetch_gazebo.png" />
<br />
<i>
The amusing result when you command the Fetch's arm to point directly downwards.
The gripper "cuts" through the base, which can't happen on the physical robot.
</i>
</p>
<p>Weird physics notwithstanding, the Gazebo simulator has been a <em>lifesaver</em> for
me in understanding ROS, since I can now <em>see</em> the outcome on a simulated
version of the real robot. I hope to continue making progress in learning ROS
this summer, and to use other tools (such as rviz and MoveIt) that could help
accelerate my understanding.</p>
<p>I’m currently en route to the <a href="https://icra2018.org/">International Conference on Robotics and
Automation (ICRA) 2018</a>, which should provide me with another environment
for massive learning on anything robotics. If you’re going to ICRA and would
like to chat, <a href="https://danieltakeshi.github.io/about.html">please drop me a line</a>.</p>
Fri, 18 May 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/05/18/practicing-ros-programming/
https://danieltakeshi.github.io/2018/05/18/practicing-ros-programming/Interpretable and Pedagogical Examples<p><a href="https://danieltakeshi.github.io/2018/04/29/algorithmic-teaching/">In my last post</a>, I discussed a paper on algorithmic teaching. I mentioned
in the last paragraph that there was a related paper, <em><a href="https://arxiv.org/abs/1711.00694">Interpretable and
Pedagogical Examples</a></em>, that I’d be interested in reading in detail. I was
able to do that sooner than expected, so naturally, I decided to blog about it.
A few months ago, <a href="https://blog.openai.com/interpretable-machine-learning-through-teaching/">OpenAI had a blog post discussing the contribution and
ramifications of the paper</a>, so I’m hoping to focus on material they didn’t
cover, as a complement to their post.</p>
<p>This paper is currently “only” on arXiv as it <a href="https://openreview.net/forum?id=H1wt9x-RW">was rejected from ICLR 2018</a>
— not due to lack of merit, it seems, but because the authors had their names
on the manuscript, violating the double-blind nature of ICLR. I find it quite
novel, though, and hope it finds a home at a conference eventually.</p>
<p>There are several contributions of this over prior work in machine teaching and
the like. First, they use deep recurrent neural networks for both the student
and the teacher. Second and more importantly, they show that with <em>iterative</em>
— not <em>joint</em> — training, the teacher will teach using an <strong>interpretable</strong>
strategy that matches human intuition, and which furthermore is efficient in
conveying concepts with the fewest possible samples (hence, “pedagogical”). This
paper focuses on <em>teaching by example</em>, but there are other ways to teach, such as
using pairwise comparisons as <a href="https://blog.openai.com/deep-reinforcement-learning-from-human-preferences/">in this other OpenAI paper</a>.</p>
<p>How does this work? We consider a two-agent environment with a student
<script type="math/tex">\mathbf{S}</script> and a teacher <script type="math/tex">\mathbf{T}</script>, both of which are parameterized by
deep recurrent neural networks <script type="math/tex">\theta_{\mathbf{S}}</script> and
<script type="math/tex">\theta_{\mathbf{T}}</script>, respectively. The setting also involves a set of
<em>concepts</em> <script type="math/tex">\mathcal{C}</script> (e.g., different animals) and <em>examples</em>
<script type="math/tex">\mathcal{E}</script> (e.g., images of those animals).</p>
<p>The student needs to map a series of <script type="math/tex">K</script> examples to concepts. At each time
step <script type="math/tex">t</script>, it guesses the concept <script type="math/tex">\hat{c}</script> that the teacher is trying to
convey. The teacher, at each time step, takes in <script type="math/tex">\hat{c}</script> along with the
concept it is trying to convey, and must output an example that (ideally) will
make <script type="math/tex">\hat{c}</script> “closer” to <script type="math/tex">c</script>. Examples may be continuous or discrete.</p>
<p>As usual, to train <script type="math/tex">\mathbf{S}</script> and <script type="math/tex">\mathbf{T}</script>, it is necessary to devise
an appropriate <em>loss function</em> <script type="math/tex">\mathcal{L}</script>. In this paper, the authors
chose to have <script type="math/tex">\mathcal{L}</script> be a function from <script type="math/tex">\mathcal{C}\times \mathcal{C}
\to \mathbb{R}</script> whose inputs are the true concept and the student’s concept
after the <script type="math/tex">K</script> examples. This is applied to <em>both</em> the student and teacher;
they use the <em>same</em> loss function and are updated via gradient descent.
Intuitively, this makes sense: both the student and teacher want the student to
know the teacher’s concept. The loss is usually the <script type="math/tex">L_2</script> (continuous) or the
cross-entropy (discrete).</p>
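<p>Concretely, the two losses can be sketched in a few lines of plain Python (a toy version on raw concept vectors; the paper applies them to network outputs):</p>

```python
import math

def l2_loss(c, c_hat):
    """Continuous concepts: squared L2 distance between concept vectors."""
    return sum((a - b) ** 2 for a, b in zip(c, c_hat))

def cross_entropy_loss(c, c_hat):
    """Discrete concepts: c is a one-hot target, c_hat a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(c, c_hat) if t > 0)
```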
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/quals/interpretable_examples.png" />
<br />
<i>
A collection of important aspects from the paper "Interpretable and Pedagogical
Examples." Top left: a visualization of the training process. Bottom left: joint
training baseline which should train the student but not create interpretable
teaching strategies. Right: iterative training procedure which should create
interpretable teaching strategies.
</i>
</p>
<p>The figure above includes a visualization of the training process. It also
includes both the joint and iterative training procedures. The student’s
function is written as <script type="math/tex">\mathbf{S}(e_k | \theta_{\mathbf{S}})</script>, and this is
what is used to produce the next concept. The authors don’t explicitly pass in
the previous examples or the student’s previously predicted concepts (the latter
of which would make this an “autoregressive” model) because, presumably, the
recurrence means the hidden layers implicitly encode the essence of this prior
information. A similar thing is seen with how one writes the teacher’s function:
<script type="math/tex">\mathbf{T}(c_i, \hat{c}_{i,k-1} | \theta_{\mathbf{T}})</script>.</p>
<p>The authors argue that joint training means the teacher and student will
“collude” and produce un-interpretable teaching, while iterative training lets
them obtain interpretable teaching strategies. Why? They claim:</p>
<blockquote>
<p>The intuition behind separating the optimization into two steps is that if
<script type="math/tex">\mathbf{S}</script> learns an interpretable learning strategy in Step 1, then
<script type="math/tex">\mathbf{T}</script> will be forced to learn an interpretable teaching strategy in
Step 2. The reason we expect <script type="math/tex">\mathbf{S}</script> to learn an “interpretable”
strategy in Step 1 is that it allows <script type="math/tex">\mathbf{S}</script> to learn a strategy that
exploits the natural mapping between concepts and examples.</p>
</blockquote>
<p>I think the above reason boils down to the fact that the teacher “knows” the
true concepts <script type="math/tex">c_1,\ldots,c_n</script> in the minibatch of concepts above, and those
are fixed throughout the student’s training portion. Of course, this would
certainly be easier to understand after implementing it in code!</p>
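<p>In that spirit, here is a toy scalar version of the two-step scheme. This is my own simplification, not the paper's setup: linear "student" and "teacher" functions instead of recurrent networks, and alternating single gradient steps instead of full Step 1/Step 2 training phases:</p>

```python
import random

random.seed(0)
alpha = 0.05         # learning rate
w_S, w_T = 0.5, 0.5  # scalar "student" and "teacher" parameters

def student(e):      # student maps an example to a guessed concept
    return w_S * e

def teacher(c):      # teacher maps a concept to an example
    return w_T * c

for step in range(2000):
    c = random.uniform(0.5, 1.5)          # concept to convey
    c_hat = student(teacher(c))           # student's guess after one example
    err = c_hat - c                       # gradient of (c_hat - c)^2 uses this
    if step % 2 == 0:
        w_S -= alpha * 2 * err * w_T * c  # Step 1: update student, teacher fixed
    else:
        w_T -= alpha * 2 * err * w_S * c  # Step 2: update teacher, student fixed

# After training, w_S * w_T should be close to 1, i.e. the student recovers
# the concept from the teacher's example.
```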
<p>The experimental results are impressive and cover a wide range of scenarios:</p>
<ul>
<li>
<p><strong>Rule-Based</strong>: this is the “rectangle game” from cognitive science, where
teachers provide points within a rectangle, and the student must guess the
boundary. The intuitive teaching strategy would be to provide two points at
opposite corners.</p>
</li>
<li>
<p><strong>Probabilistic</strong>: the teacher must teach a bimodal mixture of Gaussians
distribution, and the intuitive strategy is to provide points at the two
modes (I assume, based on the relative weights of the two Gaussians).</p>
</li>
<li>
<p><strong>Boolean</strong>: how does the teacher teach an object property, when objects may
have multiple properties? The intuitive strategy is to provide two examples
whose only shared property, among all the properties in the dataset, is the
one the teacher is teaching.</p>
</li>
<li>
<p><strong>Hierarchical</strong>: how does a teacher teach a hierarchy of concepts? The
teacher learns the intuitive strategy of picking two examples whose lowest
common ancestor is the concept node. Here, the authors use images from a
“subtree” of ImageNet and use a pre-trained ResNet to embed all
images as vectors in <script type="math/tex">\mathbb{R}^{2048}</script>.</p>
</li>
</ul>
<p>For the first three above, the loss is <script type="math/tex">\mathcal{L}(c,\hat{c}) =
\|c-\hat{c}\|_2^2</script>, whereas the fourth problem setting uses the cross entropy.</p>
<p>There is also evaluation that involves human subjects, which is the second
definition of “interpretability” the authors invoke: <em>how effective is
<script type="math/tex">\mathbf{T}</script>’s strategy at teaching humans</em>? They do this using the
probabilistic and rule-based experiments.</p>
<p>Overall, this paper is enjoyable to read, and the criticism that I have is
likely beyond the scope that any one paper can cover. One possible exception:
understanding the neural network architecture and training. The architecture,
for instance, is not specified <em>anywhere</em>. Furthermore, some of the training
seemed excessively hand-tuned. For example, the authors tend to train using
<script type="math/tex">X</script> examples for <script type="math/tex">K</script> iterations, but I wonder whether these values had to be carefully tuned.</p>
<p>I think I would like to try implementing this algorithm (using PyTorch to
boot!), since it’s been a while since I’ve seriously tried replicating a prior
result.</p>
Mon, 30 Apr 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/04/30/i-and-p-examples/
https://danieltakeshi.github.io/2018/04/30/i-and-p-examples/Algorithmic and Human Teaching of Sequential Decision Tasks<p>I spent much of the last few months preparing for the UC Berkeley EECS PhD
qualifying exams, as you might have been able to tell by the style of my recent
blogging (mostly paper notes) and my lack of blogging for the last few weeks.
The good news is that I passed the qualifying exam. <a href="https://danieltakeshi.github.io/2015-09-01-my-prelims/">Like I did for my
prelims</a>, I wrote a “transcript” of the event. I will make it public at a
future date. In this post, I discuss an interesting paper that I skimmed for my
quals but didn’t have time to read in detail until after the fact: <em><a href="https://dl.acm.org/citation.cfm?id=2900946">Algorithmic
and Human Teaching of Sequential Decision Tasks</a></em>, a 2012 AAAI paper by Maya
Cakmak and Manuel Lopes.</p>
<p>This paper is interesting because it offers a different perspective on how to do
imitation learning. Normally, in imitation learning, there is a fixed set of
expert demonstrations <script type="math/tex">D_{\rm expert} = \{\tau_1, \ldots, \tau_K \}</script> where
each demonstration <script type="math/tex">\tau_i = (s_0,a_0,s_1\ldots,a_{N-1},s_N)</script> is a sequence of
states and actions. Then, a learner has to run some algorithm (classically,
either behavior cloning or inverse reinforcement learning) to train a policy
<script type="math/tex">\pi</script> that, when executed in the same environment, is as good as the expert.</p>
<p>In many cases, however, it makes sense that the teacher can select the <em>most
informative</em> demonstrations for the student to learn a task. This paper thus falls
under the realm of <em>Active Teaching</em>. This is not to be confused with <em>Active
Learning</em>, as they clarify here:</p>
<blockquote>
<p>A closely related area for the work presented in this paper is Active Learning
(AL) (Angluin 1988; Settles 2010). The goal of AL, like in AT, is to reduce
the number of demonstrations needed to train an agent. AL gives the learner
control of what examples it is going to learn from, thereby steering the
teacher’s input towards useful examples. In many cases, a teacher that chooses
examples optimally will teach a concept significantly faster than an active
learner choosing its own examples (Goldman and Kearns 1995).</p>
</blockquote>
<p>This paper sets up the student to internally run inverse reinforcement learning
(IRL), and follows prior work in assuming that the value function can be written
as:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
V^{\pi}(s) \;&{\overset{(i)}=}\; \mathbb{E}_{\pi, s}\left[ \sum_{t=0}^\infty \gamma^t R(s_t)\right] \\
&{\overset{(ii)}=}\; \mathbb{E}_{\pi, s}\left[ \sum_{t=0}^\infty \gamma^t \sum_{i=1}^k w_i f_i(s_t) \right] \\
&{\overset{(iii)}=}\; \sum_{i=1}^k w_i \cdot \mathbb{E}_{\pi, s}\left[ \sum_{t=0}^\infty \gamma^t f_i(s_t)\right] \\
&{\overset{(iv)}=}\; \bar{w}^T \bar{\mu}_{\pi,s}
\end{align*} %]]></script>
<p>where in (i) I applied the definition of a value function when following policy
<script type="math/tex">\pi</script> (for notational simplicity, when I write a state under the expectation,
like <script type="math/tex">\mathbb{E}_s</script>, that means the expectation assumes we start at state
<script type="math/tex">s</script>), in (ii) I substituted the reward function by assuming it is a linear
combination of <script type="math/tex">k</script> features, in (iii) I re-arranged, and finally in (iv) I
simplified in vector form using new notation.</p>
<p>We can augment the <script type="math/tex">\bar{\mu}</script> notation to also have the initial <em>action</em> that
was chosen, as in <script type="math/tex">s_a</script>. Then, using the fact that the IRL agent assumes
that: “if the teacher chooses action <script type="math/tex">a</script> in state <script type="math/tex">s</script>, then <script type="math/tex">a</script> must be at
least as good as all the other available actions in <script type="math/tex">s</script>”, we have the
following set of constraints from the demonstration data <script type="math/tex">D</script> consisting of all
trajectories:</p>
<script type="math/tex; mode=display">\forall (s,a) \in D, \forall b, \quad \bar{w}^T(\bar{\mu}_{\pi,s_a}-\bar{\mu}_{\pi,s_b}) \ge 0</script>
<p>The paper’s main technical contribution is as follows. They argue that the above
set of (half-space) constraints results in a subspace <script type="math/tex">c(D)</script> that contains the
true weight vector, which is equivalent to obtaining the true reward function
assuming we know the features. The weights are assumed to be bounded into some
hypercube, <script type="math/tex">% <![CDATA[
-M_w < \bar{w} < M_w %]]></script>. By sampling <script type="math/tex">N</script> different weight vectors
<script type="math/tex">\bar{w}_i</script> within that hypercube, they can check the percentage of sampled
weights that lie within that true subspace with this (indirect) metric:</p>
<script type="math/tex; mode=display">G(D) = -\frac{1}{N}\sum_{i=1}^N \mathbb{1}\{\bar{w}_i \in c(D)\}</script>
<p>Mathematically, their problem is to find the set of demonstrations <script type="math/tex">D</script> that
maximizes <script type="math/tex">G(D)</script>. Note the minus sign: a larger <script type="math/tex">G(D)</script> means <em>fewer</em> sampled
weights remain consistent with the constraints, so the feasible region has
shrunk more tightly around the true reward function, leaving the learner with
less uncertainty about it.</p>
<p>Note carefully: we’re allowed to change <script type="math/tex">D</script>, the demonstration set, but we
can’t change the way the weights are sampled: they have to be sampled from a
fixed hypercube.</p>
<p>Their algorithm is simple: do a greedy approximation. First, select a starting
state. Then, select the demonstration <script type="math/tex">\tau_j</script> that increases the current <script type="math/tex">G(D
\cup \{\tau_j\})</script> value the most. Repeat until <script type="math/tex">G(D)</script> is high enough.</p>
<p>For experiments, the paper relies on two sets of Grid-World mazes, shown below:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/quals/cakmak.png" width="450" />
<br />
<i>
The two grid-worlds used in the paper.
</i>
</p>
<p>Each of these domains has three features, and furthermore, only one is “active”
at a given square in the map, so the vectors are all one-hot. Both domains have
two tasks (hence, there are four tasks total), each of which is specified by a
particular value of the feature weights. This is the same as specifying a reward
function, so the optimal path for an agent may vary.</p>
<p>The paper argues that their algorithm results in the most informative
demonstration in the teacher’s set. For the first maze, only one demonstration
is necessary to convey each of the two tasks offered: for the second, only two
are needed for the two tasks.</p>
<blockquote>
<p>From observing example outcomes of the optimal teaching algorithm we get a
better intuition about what constitutes an informative demonstration for the
learner. A good teacher must show the range of important decision points that
are relevant for the task. The most informative trajectories are the ones
where the demonstrator makes rational choices among different alternatives, as
opposed to those where all possible choices would result in the same behavior.</p>
</blockquote>
<p>That the paper’s experiments involve these hand-designed mazes is probably one
of the main weaknesses. There’s no way this could extend to high dimensions,
where sampling from a hypercube (even if it’s “targeted” in some way, and not
sampled naively) would never result in a weight vector that satisfies all the
IRL constraints.</p>
<p>To conclude, this AAAI paper, though short and limited in some ways, provided me
with a new way of thinking about imitation learning with an active teacher.</p>
<p>Out of curiosity about follow-up, I looked at the Google Scholar of papers that
have cited this. Some interesting ones include:</p>
<ul>
<li>Cooperative Inverse Reinforcement Learning, NIPS 2016</li>
<li>Showing versus Doing: Teaching by Demonstration, NIPS 2016</li>
<li>Enabling Robots to Communicate Their Objectives, RSS 2017</li>
</ul>
<p>I’m surprised, though, that one of my recent favorite papers, <em><a href="https://arxiv.org/abs/1711.00694">Interpretable
and Pedagogical Examples</a></em>, didn’t cite this one. That one is somewhat
similar to this work except it uses more sophisticated Deep Neural Networks
within an iterative training procedure, and has far more impressive experimental
results. I hope to talk about that paper in a future blog post and to
re-implement it in code.</p>
Sun, 29 Apr 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/04/29/algorithmic-teaching/
https://danieltakeshi.github.io/2018/04/29/algorithmic-teaching/One-Shot Visual Imitation Learning via Meta-Learning<p>A follow-up paper to the one <a href="https://danieltakeshi.github.io/2018/04/01/maml/">I discussed in my previous post</a> is <a href="https://sites.google.com/view/one-shot-imitation">One-Shot
Visual Imitation Learning via Meta-Learning</a>. The idea is, again, to train
neural network parameters <script type="math/tex">\theta</script> on a distribution of tasks such that the
parameters are easy to fine-tune to new tasks sampled from the distribution. In
this paper, the focus is on imitation learning from raw pixels and showing the
effectiveness of a one-shot imitator on a physical PR2 robot.</p>
<p>Recall that the original MAML paper showed the algorithm applied to supervised
regression (for sinusoids), supervised classification (for images), and
reinforcement learning (for MuJoCo). This paper shows how to use MAML for
imitation learning, and the extension is straightforward. First, each imitation
task <script type="math/tex">\mathcal{T}_i \sim p(\mathcal{T})</script> contains the following information:</p>
<ul>
<li>
<p>A trajectory <script type="math/tex">\tau = \{o_1,a_1,\ldots,o_T,a_T\} \sim \pi_i^*</script> consists of a
sequence of states and actions from an <em>expert policy</em> <script type="math/tex">\pi_i^*</script>. Remember,
this is imitation learning, so we can assume an expert. Also, note that the
expert policy is <em>task-specific</em>.</p>
</li>
<li>
<p>A loss function <script type="math/tex">\mathcal{L}(a_{1:T},\hat{a}_{1:T}) \to \mathbb{R}</script>
providing feedback on how closely our actions match those of the expert’s.</p>
</li>
</ul>
<p>Since the focus of the paper is on “one-shot” learning, we assume we only have
one trajectory available for the “inner” gradient update portion of
meta-training for each task <script type="math/tex">\mathcal{T}_i</script>. However, if you recall from MAML,
we actually need at least one more trajectory for the “outer” gradient portion
of meta-training, as we need to compute a “validation error” for each sampled
task. This is <em>not</em> the overall meta-test time evaluation, which relies on an
entirely new <em>task</em> sampled from the distribution (and which only needs one
trajectory, not two or more). Yes, the terminology can be confusing. When I
refer to “test time evaluation” I always refer to when we have trained
<script type="math/tex">\theta</script> and we are doing few-shot (or one-shot) learning on a new task that
was not seen during training.</p>
<p>All the tasks in this paper use continuous control, so the loss function for
optimizing our neural network policy <script type="math/tex">f_\theta</script> can be described as:</p>
<script type="math/tex; mode=display">\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{\tau^{(j)} \sim p(\mathcal{T}_i)}
\sum_{t=1}^T \| f_\theta(o_t^{(j)}) - a_t^{(j)} \|_2^2</script>
<p>where the first sum normally has one trajectory only, hence the “one-shot
learning” terminology, but we can easily extend it to several sampled
trajectories if our task distribution is very challenging. The overall objective
is now:</p>
<script type="math/tex; mode=display">{\rm minimize}_\theta \sum_{\mathcal{T}_i\sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}
(f_{\theta_i'}) = \sum_{\mathcal{T}_i\sim p(\mathcal{T})}
\mathcal{L}_{\mathcal{T}_i} \Big(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\Big)</script>
<p>and one can simply run Adam to update <script type="math/tex">\theta</script>.</p>
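<p>The behavior-cloning loss inside this objective is straightforward to write down. Here is a toy scalar version (one observation and one action per time step, a single demonstration trajectory; a real policy would map images to torques):</p>

```python
def imitation_loss(policy, trajectory):
    """Sum over time of squared action error against the expert,
    i.e. sum_t (f_theta(o_t) - a_t)^2 for one demonstration."""
    return sum((policy(o) - a) ** 2 for o, a in trajectory)

# A policy that doubles its observation, evaluated against an expert
# trajectory of (observation, expert_action) pairs.
policy = lambda o: 2.0 * o
demo = [(1.0, 2.0), (2.0, 5.0), (3.0, 6.0)]
loss = imitation_loss(policy, demo)  # (2-2)^2 + (4-5)^2 + (6-6)^2 = 1.0
```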
<p>This paper uses two new techniques for better performance: a two-headed
architecture, and a bias transformation.</p>
<ul>
<li>
<p><strong>Two-Headed Architecture</strong>. Let <script type="math/tex">y_t^{(j)}</script> be the vector of
post-activation values just before the last fully connected layer which maps
to motor torques. The last layer has parameters <script type="math/tex">W</script> and <script type="math/tex">b</script>, so the inner
loss function <script type="math/tex">\mathcal{L}_{\mathcal{T}_i}(f_\theta)</script> can be re-written as:</p>
<script type="math/tex; mode=display">\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{\tau^{(j)} \sim p(\mathcal{T}_i)}
\sum_{t=1}^T \| Wy_t^{(j)} + b- a_t^{(j)} \|_2^2</script>
<p>where, I suppose, we should write <script type="math/tex">\phi = (\theta, W, b)</script> and re-define
<script type="math/tex">\theta</script> to be all the parameters used to compute <script type="math/tex">y_t^{(j)}</script>.</p>
<p>In this paper, the test-time single demonstration of the new task is normally
provided as a sequence of observations (images) and actions. However, they
also experiment with the more challenging case of removing the provided
actions for that single test-time demonstration. They simply remove the
action and use this inner loss function:</p>
<script type="math/tex; mode=display">\mathcal{L}_{\mathcal{T}_i}(f_\theta) = \sum_{\tau^{(j)} \sim p(\mathcal{T}_i)}
\sum_{t=1}^T \| Wy_t^{(j)} + b\|_2^2</script>
<p>This is still a bit confusing to me. I’m not sure why this loss function leads
to the desired outcome. It’s also a bit unclear how the two-headed
architecture training works. After another read, maybe only the <script type="math/tex">W</script> and
<script type="math/tex">b</script> are updated in the inner portion?</p>
<p>The two-headed architecture seems to be beneficial on the simulated pushing
task, with performance improving by about 5-6 percentage points. That may not
sound like a lot, but this was in simulation and they were able to test with
444 total trials.</p>
<p>The other confusing part is that if we assume we’re allowed to have access to
expert actions, then the real-world experiment actually used the single-headed
architecture, and not the two-headed one. So there wasn’t a benefit to the
two-headed one <em>assuming</em> we have actions. Without actions, of course, the
two-headed one is our only option.</p>
</li>
<li>
<p><strong>Bias Transformation</strong>. After a certain neural network layer (which in this
paper is after the 2D spatial softmax applied after the convolutions to
process the images), they concatenate this vector of parameters. They claim
that</p>
<blockquote>
<p>[…] the bias transformation increases the representational power of the
gradient, without affecting the representation power of the network itself.
In our experiments, we found this simple addition to the network made
gradient-based meta-learning significantly more stable and effective.</p>
</blockquote>
<p>However, the paper doesn’t seem to show too much benefit to using the bias
transformation. A comparison is reported in the simulated reaching task, with
a dimension of 10, but it could be argued that performance is similar without
the bias transformation. For the two other experimental domains, I don’t think
they reported with and without the bias transformation.</p>
<p>Furthermore, neural networks already have biases. So is there some particular
advantage to having more biases packed in one layer, and furthermore, with
that layer being the same spot where the robot configuration is concatenated
with the processed image (<a href="https://danieltakeshi.github.io/2018/03/30/self-supervision-part-2/">like what people do with self-supervision</a>)? I
wish I understood. The math that they use to justify the gradient
representation claim makes sense; I’m just missing a tiny step to figure out
its practical significance.</p>
</li>
</ul>
<p>They ran their setups on three experimental domains: simulated reaching,
simulated pushing, and (drum roll please) real robotic tasks. For these domains,
they seem to have tested up to 5.5K demonstrations for reaching and 8.5K for
pushing. For the real robot, they used 1.3K demonstrations (ouch, I wonder how
long that took!). The results certainly seem impressive, and I agree that this
paper is a step towards generalist robots.</p>
Wed, 04 Apr 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/04/04/one-shot-vi-meta-learning/
https://danieltakeshi.github.io/2018/04/04/one-shot-vi-meta-learning/Model-Agnostic Meta-Learning<p>One of the recent landmark papers in the area of meta-learning is <a href="https://arxiv.org/abs/1703.03400">MAML:
Model-Agnostic Meta-Learning</a>. The idea is simple yet surprisingly effective:
train neural network parameters <script type="math/tex">\theta</script> on a distribution of tasks so that,
when faced with a <em>new</em> task, they can be rapidly adjusted through just a few
gradient steps. In this post, I’ll briefly go over the notation and problem
formulation for MAML, and meta-learning more generally.</p>
<p>Here’s the notation and setup, mostly following the paper:</p>
<ul>
<li>
<p>The overall model <script type="math/tex">f_\theta</script> is what MAML is optimizing, with parameters
<script type="math/tex">\theta</script>. We denote <script type="math/tex">\theta_i'</script> as weights that have been adapted to the
<script type="math/tex">i</script>-th task through one or more gradient steps. Since MAML can be applied to
classification, regression, reinforcement learning, and imitation learning
(plus even more stuff!) we generically refer to <script type="math/tex">f_\theta</script> as mapping from
inputs <script type="math/tex">x_t</script> to outputs <script type="math/tex">a_t</script>.</p>
</li>
<li>
<p>A <strong>task</strong> <script type="math/tex">\mathcal{T}_i</script> is defined as a tuple <script type="math/tex">(T_i, q_i, \mathcal{L}_{\mathcal{T}_i})</script>, where:</p>
<ul>
<li>
<p><script type="math/tex">T_i</script> is the time horizon. For (IID) supervised learning problems like
classification, <script type="math/tex">T_i=1</script>. For reinforcement learning and imitation
learning, it’s whatever the environment dictates.</p>
</li>
<li>
<p><script type="math/tex">q_i</script> is the transition distribution, defining a prior over initial
observations <script type="math/tex">q_i(x_1)</script> and the transitions <script type="math/tex">q_i(x_{t+1}\mid
x_{t},a_t)</script>. Again, we can generally ignore this for simple supervised
learning. Also, for imitation learning, this reduces to the distribution
over expert trajectories.</p>
</li>
<li>
<p><script type="math/tex">\mathcal{L}_{\mathcal{T}_i}</script> is a loss function that maps the sequence of
network inputs <script type="math/tex">x_{1:T}</script> and outputs <script type="math/tex">a_{1:T}</script> to a scalar value
indicating the quality of the model. For supervised learning tasks, this is
almost always the cross entropy or squared error loss.</p>
</li>
</ul>
</li>
<li>
<p>Tasks are drawn from some <em>distribution</em> <script type="math/tex">p(\mathcal{T})</script>. For example, we
can have a distribution over the abstract concept of doing well at “block
stacking tasks”. One task could be about stacking blue blocks. Another could
be about stacking red blocks. Yet another could be stacking blocks that are
numbered and need to be ordered consecutively. Clearly, the performance of
meta-learning (or any alternative algorithm, for that matter) on optimizing
<script type="math/tex">f_\theta</script> depends on <script type="math/tex">p(\mathcal{T})</script>. The more diverse the
distribution’s tasks, the harder it is for <script type="math/tex">f_\theta</script> to quickly learn new
tasks.</p>
</li>
</ul>
<p>The MAML algorithm specifically finds a set of weights <script type="math/tex">\theta</script> that are
easily fine-tuned to new, held-out tasks (for testing) by optimizing the
following:</p>
<script type="math/tex; mode=display">{\rm minimize}_\theta \sum_{\mathcal{T}_i\sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}
(f_{\theta_i'}) = \sum_{\mathcal{T}_i\sim p(\mathcal{T})}
\mathcal{L}_{\mathcal{T}_i} \Big(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\Big)</script>
<p>This assumes that <script type="math/tex">\theta_i' = \theta - \alpha \nabla_\theta
\mathcal{L}_{\mathcal{T}_i}(f_\theta)</script>. It is also possible to do multiple
gradient steps, not just one. Thus, if we do <script type="math/tex">K</script>-shot learning, then
<script type="math/tex">\theta_i'</script> is obtained via <script type="math/tex">K</script> gradient updates based on the task.
However, “one shot” is cooler than “few shot” and also easier to write, so we’ll
stick with that.</p>
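<p>To make the inner-loop adaptation concrete, here is a minimal sketch on a toy one-dimensional task where the loss gradient is analytic. Everything here (the quadratic loss, the adapt function) is an illustrative assumption, not from the paper:</p>

```python
import numpy as np

# Hypothetical 1D "task": loss L_i(theta) = 0.5 * (theta - target_i)^2,
# whose gradient is simply (theta - target_i). This is only a toy stand-in
# for the task loss L_{T_i}(f_theta) in the post.
def grad_loss(theta, target):
    return theta - target

def adapt(theta, target, alpha=0.1, K=1):
    """K-shot inner loop: theta_i' = theta - alpha * grad L_i(f_theta),
    applied K times (K=1 recovers the one-step update in the post)."""
    theta_i = theta
    for _ in range(K):
        theta_i = theta_i - alpha * grad_loss(theta_i, target)
    return theta_i

theta = 0.0
print(adapt(theta, target=1.0, alpha=0.5, K=1))  # 0.5
print(adapt(theta, target=1.0, alpha=0.5, K=3))  # 0.875
```

Note that with K=1 this is exactly the one-step update written out in the objective above; larger K just iterates it.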
<p>Let’s look at the loss function above. We are optimizing over a sum of loss
functions across several tasks. But we are evaluating the (outer-most) loss
functions <strong>while assuming we made gradient updates to our weights</strong> <script type="math/tex">\theta</script>.
What if the loss function were like this:</p>
<script type="math/tex; mode=display">{\rm minimize}_\theta \sum_{\mathcal{T}_i\sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i} (f_{\theta})</script>
<p>This means <script type="math/tex">f_\theta</script> would be capable of learning how to perform well across
all these tasks. But there’s no guarantee that this will work on <strong>held-out
tasks</strong>, and generally speaking, unless the tasks are very closely related, it
shouldn’t work. (I’ve tried doing some similar stuff in the past with the Atari
2600 benchmark where a “task” was “doing well on game X”, and got networks to
optimize across several games, but generalization was not possible without
fine-tuning.) Also, even if we were allowed to fine-tune, it’s very unlikely
that one or few gradient steps would lead to solid performance. MAML should do
better <em>precisely</em> because it optimizes <script type="math/tex">\theta</script> so that it can adapt to new
tasks with just a few gradient steps.</p>
<p>MAML is an effective algorithm for meta-learning, and one of its advantages over
other algorithms such as <script type="math/tex">{\rm RL}^2</script> is that it is parameter-efficient.
The gradient updates above do not introduce extra parameters. Furthermore, the
actual optimization over the full model <script type="math/tex">\theta</script> is <em>also</em> done via SGD:</p>
<script type="math/tex; mode=display">\theta = \theta - \beta \left( \nabla_\theta \sum_{\mathcal{T}_i\sim p(\mathcal{T})}
\mathcal{L}_{\mathcal{T}_i} \Big(f_{\theta - \alpha \nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}\Big) \right)</script>
<p>again introducing no new parameters. (The update is actually Adam if we’re doing
supervised learning, and TRPO if doing RL, but SGD is the foundation of those
and it’s easier for me to write the math. Also, even though the updates may be
complex, I believe the <em>inner</em> part, where we have <script type="math/tex">f_{\theta - \alpha
\nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta)}</script>, is always
vanilla SGD, but I could be wrong.)</p>
<p>I’d like to emphasize a key point: the above update mandates <em>two</em> instances of
<script type="math/tex">\mathcal{L}_{\mathcal{T}_i}</script>. One of these (the one in the subscript to
get <script type="math/tex">\theta_i'</script>) should involve the <script type="math/tex">K</script> training instances from the task
<script type="math/tex">\mathcal{T}_i</script> (or more specifically, <script type="math/tex">q_i</script>). The outer-most loss function
should be computed on <em>testing</em> instances, also from task <script type="math/tex">\mathcal{T}_i</script>.
This is important because we want our ultimate evaluation to be done on testing
instances.</p>
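<p>As a sketch of how this support/query split plays out in the update, here is a toy one-dimensional regression example using the first-order approximation (the FOMAML simplification, which drops the second-order terms of the true MAML gradient). The task setup and function names are illustrative assumptions, not the paper's exact method:</p>

```python
import numpy as np

# Toy setup: each task is 1D linear regression y = w_i * x with squared loss.
# Each task supplies a *support* set (the K training instances used in the
# inner update) and a *query* set (the testing instances used for the
# outer-most loss). First-order MAML is used here only to keep the sketch
# short; it drops the second-order terms of the true meta-gradient.
def loss_grad(theta, x, y):
    # d/dtheta of mean 0.5 * (theta * x - y)^2
    return np.mean((theta * x - y) * x)

def fomaml_step(theta, tasks, alpha=0.01, beta=0.1):
    meta_grad = 0.0
    for (x_s, y_s, x_q, y_q) in tasks:
        theta_i = theta - alpha * loss_grad(theta, x_s, y_s)  # inner: support
        meta_grad += loss_grad(theta_i, x_q, y_q)             # outer: query
    return theta - beta * meta_grad / len(tasks)

xs = np.array([1.0, 2.0])
tasks = [(xs, 2.0 * xs, xs, 2.0 * xs)]    # one task with target slope w = 2
theta_new = fomaml_step(0.0, tasks)
print(theta_new)  # 0.4875 (moves toward w = 2)
```

The key structural point survives even in this simplification: the inner gradient uses support data, while the meta-gradient is computed on query data.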
<p>Another important point is that we do <em>not</em> use those “testing instances” for
evaluating meta-learning algorithms, as that would be cheating. For testing, one
takes an <em>entirely held-out set of test tasks</em>, adjusts <script type="math/tex">\theta</script> for however
many steps are allowed (one in the case of one-shot learning, etc.) and then
evaluates according to whatever metric is appropriate for the task distribution.</p>
<p>In a subsequent post, I will further investigate several MAML extensions.</p>
Sun, 01 Apr 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/04/01/maml/
https://danieltakeshi.github.io/2018/04/01/maml/Zero-Shot Visual Imitation<p>In this post, I will further investigate one of the papers <a href="https://danieltakeshi.github.io/2018/03/23/self-supervision-part-1/">I discussed in an
earlier blog post</a>: <a href="https://openreview.net/forum?id=BkisuzWRW">Zero-Shot Visual Imitation</a> (Pathak et al., 2018).</p>
<p>For notation, I denote states and actions at some time step <script type="math/tex">t</script> as <script type="math/tex">s_t</script> and
<script type="math/tex">a_t</script>, respectively, <em>if</em> they were obtained through the agent exploring in
the environment. A hat symbol, <script type="math/tex">\hat{s}_t</script> or <script type="math/tex">\hat{a}_t</script>, refers to a
<em>prediction</em> made from some machine learning model.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/quals/forward_and_inverse.JPG" width="700" />
<br />
<i>
Basic forward (left) and inverse (right) model designs.
</i>
</p>
<p>Recall the basic forward and inverse model structure (figure above). A <strong>forward
model</strong> takes in a state-action pair and predicts the subsequent state
<script type="math/tex">\hat{s}_{t+1}</script>. An <strong>inverse model</strong> takes in a current state <script type="math/tex">s_t</script> and
some goal state <script type="math/tex">s_g</script>, and must predict the action that will enable the agent
to go from <script type="math/tex">s_t</script> to <script type="math/tex">s_g</script>.</p>
<ul>
<li>
<p>It’s easiest to view the goal input to the inverse model as either the very
next state <script type="math/tex">s_{t+1}</script>, or the final desired goal of the trajectory, but some
papers also use <script type="math/tex">s_g</script> as an arbitrary checkpoint (Agrawal et al., 2016, Nair
et al., 2017, Pathak et al., 2018). For the simplest model, it probably makes
most sense to have <script type="math/tex">s_g = s_{t+1}</script> but I will use <script type="math/tex">s_g</script> to maintain
generality. It’s true that <script type="math/tex">s_g</script> may be “far” from <script type="math/tex">s_t</script>, but the inverse
model can predict a <em>sequence</em> of actions if needed.</p>
</li>
<li>
<p>If the states are images, these models tend to use convolutions to get a lower
dimensional featurized state representation. For instance, inverse models
often process the two input images through tied (i.e., shared) convolutional
weights to obtain <script type="math/tex">\phi(s_t)</script> and <script type="math/tex">\phi(s_{t+1})</script>, upon which they’re
concatenated and then processed through some fully connected layers.</p>
</li>
</ul>
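<p>The data flow just described can be sketched as follows, with random linear maps standing in for the convolutional encoder and the fully connected layers. The dimensions and weight shapes are assumptions for illustration, not the paper's architecture:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Shapes only -- a sketch of the data flow, not the actual architecture
# (the papers use convolutional encoders; a random linear map stands in
# for the tied encoder phi here).
STATE_DIM, FEAT_DIM, N_ACTIONS = 64, 16, 4
W_phi = rng.normal(size=(FEAT_DIM, STATE_DIM)) * 0.1   # tied encoder weights

def phi(s):
    return np.tanh(W_phi @ s)

W_inv = rng.normal(size=(N_ACTIONS, 2 * FEAT_DIM)) * 0.1
def inverse_model(s_t, s_g):
    """Predict action logits from phi(s_t) and phi(s_g) -- same tied phi."""
    z = np.concatenate([phi(s_t), phi(s_g)])
    return W_inv @ z                                   # logits over actions

W_fwd = rng.normal(size=(STATE_DIM, FEAT_DIM + N_ACTIONS)) * 0.1
def forward_model(s_t, a_onehot):
    """Predict the next state from phi(s_t) and the action taken."""
    z = np.concatenate([phi(s_t), a_onehot])
    return W_fwd @ z

s_t, s_g = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
logits = inverse_model(s_t, s_g)
a_hat = np.eye(N_ACTIONS)[np.argmax(logits)]           # predicted action
s_hat = forward_model(s_t, a_hat)
print(logits.shape, s_hat.shape)  # (4,) (64,)
```

Note how the two input states share the same encoder weights, mirroring the tied convolutional weights mentioned above.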
<p><a href="https://danieltakeshi.github.io/2018/03/03/learning-to-poke-by-poking">As I discussed earlier</a>, there are a number of issues related to this basic
forward/inverse model design, most notably about (a) the high dimensionality of
the states, and (b) the multi-modality of the action space. To be clear on (b),
there may be many (or no) action(s) that let the agent go from <script type="math/tex">s_t</script> to
<script type="math/tex">s_g</script>, and the number of possibilities increases with a longer time horizon,
if <script type="math/tex">s_g</script> is many states in the future.</p>
<p>Let’s understand how the model proposed in Zero-Shot Visual Imitation mitigates
(b). Their inverse model takes in <script type="math/tex">s_g</script> as an arbitrary checkpoint/goal state
and must output a sequence of actions that allows the agent to arrive at
<script type="math/tex">s_g</script>. To simplify the discussion, let’s suppose we’re only interested in
predicting one step in the future, so <script type="math/tex">s_g = s_{t+1}</script>. Their predictive
physics design is shown below.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/quals/zsvi_one_step.JPG" width="700" />
<br />
<i>
The basic one-step model, assuming that our inverse model just needs to predict
one action. The convolutional layers for the inverse model use the same tied
network convolutional weights. The action loss is the cross-entropy loss
(assuming discrete actions), and is not written in detail due to cumbersome
notation.
</i>
</p>
<p>The main novelty here is that our predicted action <script type="math/tex">\hat{a}_t</script> from the
inverse model is provided as input to the forward model, along with the current
state <script type="math/tex">s_t</script>. We then try and obtain <script type="math/tex">s_{t+1}</script>, the actual state that was
encountered during the agent’s exploration. This loss <script type="math/tex">\mathcal{L}(s_{t+1},
\hat{s}_{t+1})</script> is the standard Euclidean distance and is added with the action
prediction loss <script type="math/tex">\mathcal{L}(a_t,\hat{a}_t)</script> which is the usual cross-entropy
(for discrete actions).</p>
<p>Why is this extra loss function from the successor states used? It’s because we
mostly don’t care which action we took, <em>so long as it leads to the desired next
state</em>. Thus, we really want <script type="math/tex">\hat{s}_{t+1} \approx s_{t+1}</script>.</p>
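<p>A minimal sketch of this combined objective, assuming discrete actions and a simple additive weighting between the two terms (the exact weighting is an assumption, not taken from the paper):</p>

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical combined objective: cross-entropy on the predicted action plus
# squared Euclidean distance between predicted and actual next state, with an
# assumed weighting term lam.
def combined_loss(action_logits, a_true, s_next_pred, s_next, lam=1.0):
    ce = -np.log(softmax(action_logits)[a_true])      # L(a_t, a_hat_t)
    state = np.sum((s_next_pred - s_next) ** 2)       # L(s_{t+1}, s_hat_{t+1})
    return ce + lam * state

# A confident, correct action prediction with a matching next-state
# prediction gives a small loss; a wrong one gives a large loss.
loss_good = combined_loss(np.array([10., 0., 0., 0.]), 0, np.zeros(3), np.zeros(3))
loss_bad = combined_loss(np.array([0., 10., 0., 0.]), 0, np.ones(3), np.zeros(3))
print(loss_good < loss_bad)  # True
```

The state term is what pushes the model to care about <em>consequences</em> of actions rather than the particular action label.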
<p>Two extra long-ended comments:</p>
<ul>
<li>
<p>There’s some subtlety with making this work. The state loss
<script type="math/tex">\mathcal{L}(s_{t+1}, \hat{s}_{t+1})</script> treats <script type="math/tex">s_{t+1}</script> as ground truth,
but that <em>assumes</em> we took action <script type="math/tex">a_t</script> from state <script type="math/tex">s_t</script>. If we instead
took <script type="math/tex">\hat{a}_t</script> from <script type="math/tex">s_t</script>, and <script type="math/tex">\hat{a}_t \ne a_t</script>, then it seems like
the ground-truth should no longer be <script type="math/tex">s_{t+1}</script>?</p>
<p>Assuming we’ve trained long enough, then I understand why this will work,
because the inverse model will predict <script type="math/tex">\hat{a}_t = a_t</script> most of the time,
and hence the forward model loss makes sense. But one has to <em>get</em> to that
point first. In short, the forward model training must assume that the given
action will actually result in a transition from <script type="math/tex">s_t</script> to <script type="math/tex">s_{t+1}</script>.</p>
<p>The authors appear to mitigate this with pre-training the inverse and forward
models separately. Given ground truth data <script type="math/tex">\mathcal{D} =
\{s_1,a_1,s_2,\ldots,s_N\}</script>, we can pre-train the forward model with this
collected data (no action predictions) so that it is effective at
understanding the effect of actions.</p>
<p>This would also enable better training of the inverse model, which (as the
authors point out) depends on an accurate forward model to be able to check
that the predicted action <script type="math/tex">\hat{a}_t</script> has the desired effect in state-space.
The inverse model itself can also be pre-trained entirely on the ground-truth
data while <em>ignoring</em> <script type="math/tex">\mathcal{L}(s_{t+1}, \hat{s}_{t+1})</script> from the
training objective.</p>
<p>I think this is what the authors did, though I wish there were a few more
details.</p>
</li>
<li>
<p>A surprising aspect of the forward model is that it appears to predict the
<em>raw</em> states <script type="math/tex">s_{t+1}</script>, which could be very high-dimensional. I’m surprised
that this works, given that (Agrawal et al., 2016) explicitly avoided this by
predicting lower-dimensional features. Perhaps it works, but I wish the
network architecture was clear. My guess is that the forward model processes
<script type="math/tex">s_t</script> to be a lower dimensional vector <script type="math/tex">\psi(s_t)</script>, concatenates it with
<script type="math/tex">\hat{a}_t</script> from the inverse model, and then up-samples it to get the
original image. <a href="http://bamos.github.io/2016/08/09/deep-completion/#ml-heavy-generative-adversarial-net-gan-building-blocks">Brandon Amos describes up-sampling in his excellent blog
post</a>. (Note: don’t call it “deconvolution.”)</p>
</li>
</ul>
<p>Now how do we extend this for <em>multi-step</em> trajectories? The solution is simple:
make the inverse model a recurrent neural network. That’s it. The model still
predicts <script type="math/tex">\hat{a}_t</script> and we use the same loss function (summing across time
steps) and the same forward model. For the RNN, the convolutional layers
<script type="math/tex">\phi</script> take in the current state, and they always take in <script type="math/tex">s_g</script>, the goal
state, as well. They also take in <script type="math/tex">h_{i-1}</script> and <script type="math/tex">a_{i-1}</script>, the previous hidden unit
and the previous action (not the <em>predicted action</em>, that would be a bit silly
when we have ground truth).</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/quals/inverse_recurrent.JPG" width="700" />
<br />
<i>
The multi-step trajectory case, visualizing several steps out of many.
</i>
</p>
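<p>One step of such a recurrent inverse model might look like the following sketch, using a vanilla tanh cell. The cell type, dimensions, and weight shapes are all assumptions, since the paper does not spell out the exact architecture:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT, HID, N_ACTIONS = 8, 16, 4
W_h = rng.normal(size=(HID, 2 * FEAT + N_ACTIONS + HID)) * 0.1
W_a = rng.normal(size=(N_ACTIONS, HID)) * 0.1

def rnn_step(phi_s, phi_g, a_prev, h_prev):
    """One recurrent inverse-model step: the cell sees the current state
    features, the (fixed) goal features, the previous action, and the
    previous hidden state; it emits action logits and a new hidden state."""
    z = np.concatenate([phi_s, phi_g, a_prev, h_prev])
    h = np.tanh(W_h @ z)
    return W_a @ h, h

h = np.zeros(HID)
a_prev = np.zeros(N_ACTIONS)
phi_g = rng.normal(size=FEAT)          # goal features stay fixed across steps
for t in range(3):                     # unroll a few steps
    phi_s = rng.normal(size=FEAT)      # random stand-in for encoded state
    logits, h = rnn_step(phi_s, phi_g, a_prev, h)
    a_prev = np.eye(N_ACTIONS)[np.argmax(logits)]
print(logits.shape, h.shape)  # (4,) (16,)
```

During training, a_prev would be the ground-truth previous action, per the point above.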
<p>Thoughts:</p>
<ul>
<li>
<p>Why not make the <em>forward</em> model recurrent?</p>
</li>
<li>
<p>Should we weigh shorter-term actions highly instead of summing everything
equally as they appear to be doing?</p>
</li>
<li>
<p>How do we actually decide the length of the action vector to predict? Or said
in a better way, when do we decide that we’ve attained <script type="math/tex">s_g</script>?</p>
</li>
</ul>
<p>Fortunately, the authors answer that last thought by training a deep neural
network that can learn a stopping criterion. They say:</p>
<blockquote>
<p>We sample states at random, and for every sampled state make positives of its
temporal neighbors, and make negatives of the remaining states more distant
than a certain margin. We optimize our goal classifier by cross-entropy loss.</p>
</blockquote>
<p>So, states “close” to each other are positive samples, whereas “farther” samples
are negative. Sure, that makes sense. By distance I assume simple Euclidean
distance on raw pixels? I’m generally skeptical of Euclidean distance but it
might be necessary if the forward model also optimizes the same objective. I
also assume this is applied after each time step, testing whether <script type="math/tex">s_i</script> at
time <script type="math/tex">i</script> has reached <script type="math/tex">s_g</script>. Thus, it is not known ahead of time how many
actions the RNN must be able to predict before the goal is reset.</p>
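<p>The labeling scheme from the quote can be sketched as follows; the specific neighbor and margin thresholds are assumptions, since the paper only says “temporal neighbors” and “more distant than a certain margin”:</p>

```python
# Sketch of the self-supervised labeling for the goal classifier: for a
# randomly sampled anchor state in a trajectory, states within `neighbor`
# steps are positives and states more than `margin` steps away are negatives
# (in-between states are skipped). Thresholds here are illustrative.
def goal_classifier_labels(traj_len, anchor, neighbor=2, margin=5):
    pos, neg = [], []
    for i in range(traj_len):
        d = abs(i - anchor)
        if d == 0:
            continue
        if d <= neighbor:
            pos.append(i)
        elif d > margin:
            neg.append(i)
    return pos, neg

pos, neg = goal_classifier_labels(traj_len=12, anchor=4)
print(pos)  # [2, 3, 5, 6]
print(neg)  # [10, 11]
```

A binary classifier trained on these pairs with the cross-entropy loss then serves as the stopping criterion.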
<p>An alternative is mentioned about treating stopping as an action. There’s some
resemblance to this and <a href="https://danieltakeshi.github.io/2017/11/24/ddo/">DDO’s option termination criterion</a>.</p>
<p>Additionally, we have <a href="https://openreview.net/forum?id=BkisuzWRW">this relevant comment on OpenReview</a>:</p>
<blockquote>
<p>The independent goal recognition network does not require any extra work
concerning data or supervision. The data used to train the goal recognition
network is the same as the data used to train the PSF. The only prior we are
assuming is that nearby states to the randomly selected states are positive
and far away are negative which is not domain specific. This prior provides
supervision for obtaining positive and negative data points for training the
goal classifier. Note that, no human supervision or any particular form of
data is required in this self-supervised process.</p>
</blockquote>
<p>Yes, this makes sense.</p>
<p>Now let’s discuss the experiments. The authors test several ablations of their
model:</p>
<ul>
<li>
<p>An inverse model with no forward model at all (Nair et al., 2017). This is
different from their earlier paper which used a forward model for
regularization purposes (Agrawal et al., 2016). The model in (Nair et al.,
2017) just used the inverse model for predicting an action given current image
<script type="math/tex">I_t</script> and (critically!) a goal image <script type="math/tex">I_{t+1}'</script> specified by a human.</p>
</li>
<li>
<p>A more sophisticated inverse model with an RNN, but no forward model. Think of
my most recent hand-drawn figure above, except without the forward portion.
Furthermore, this baseline also does not use the action <script type="math/tex">a_i</script> as input to
the RNN structure.</p>
</li>
<li>
<p>An even more sophisticated model where the action history is now input to the
RNN. Otherwise, it is the same as the one I just described above.</p>
</li>
</ul>
<p>Thus, all three of their ablations do not use the forward consistency model and
are solely trained by minimizing <script type="math/tex">\mathcal{L}(a_t,\hat{a}_t)</script>. I suppose this
is reasonable, and to be fair, testing these out in physical trials takes a
while. (Training should be less cumbersome because data collection is the
bottleneck. Once they have data, they can train all of their ablations quickly.)
Finally, note that all these inverse models take <script type="math/tex">(s_t,s_g)</script> as input, and
<script type="math/tex">s_g</script> is not necessarily <script type="math/tex">s_{t+1}</script>. This, I remember from the greedy planner
in (Agrawal et al., 2016).</p>
<p>The experiments are: navigating a short mobile robot throughout rooms and
performing rope manipulation with the same setup from (Nair et al., 2017).</p>
<ul>
<li>
<p><strong>Indoor navigation</strong>. They show the model an image of the target goal, and
check if the robot can use it to arrive there. This obviously works best when
few actions are needed; otherwise, waypoints are necessary. However, for
results to be interesting enough, the target image should not have any overlap
with the starting image.</p>
<p>The actions are: (1) forward 10cm, (2) turn left, (3) turn right, and (4)
standing still. They use several “tricks” such as using action repeats,
applying a reset maneuver, etc. A ResNet acts as the image processing
pipeline, and then (I assume) the ResNet output is fed into the RNN along with
the hidden layer and action vector.</p>
<p>Indeed, it seems like their navigating robot can reach goal states and is
better than the baselines! They claim their robot learns first to turn and
then to move to the target. To make results more impressive, they tested all
this on a different floor from where the training data was collected. Nice!
The main downside is that they conducted only eight trials for each method,
which might not be enough to be entirely convincing.</p>
<p>Another set of experiments tests imitation learning, where the goal images are
far away from the robot, thus mandating a series of checkpoint images
specified by a human. Every fifth image in a human demonstration was provided
as a waypoint. (Note: this doesn’t mean the robot will take exactly five steps
for each waypoint even if it was well trained, because it may take four or six
or some other number of actions before it deems itself close enough to the
target.) Unfortunately, I have a similar complaint as earlier: I wish there
were more than just three trials.</p>
</li>
<li>
<p><strong>Rope manipulation</strong>. They claim almost a 2x performance boost over (Nair et
al., 2017) while using the same training data of 60K-70K interaction pairs.
That’s the benefit of building upon prior work. They surprisingly never say
how many trials they have, and their table reports only a “bootstrapped
standard deviation”. Looking at (Nair et al., 2017), I cannot find where the
35.8% figure comes from (I see 38% in that paper but that’s not 35.8%…).</p>
<p><a href="https://openreview.net/forum?id=BkisuzWRW">According to OpenReview comments</a> they also trained the model from
(Agrawal et al., 2016) and claim 44% accuracy. This needs to be in the final
version of the paper. The difference from (Nair et al., 2017) is that (Agrawal
et al., 2016) jointly train a forward model (but not to enforce dynamics but
just as a regularizer), while (Nair et al., 2017) do not have any forward
model.</p>
</li>
</ul>
<p>Despite the lack of detail in some areas of the paper (where’s the appendix?!?),
I certainly enjoyed reading it and would like to try out some of this stuff.</p>
Fri, 30 Mar 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/03/30/self-supervision-part-2/
https://danieltakeshi.github.io/2018/03/30/self-supervision-part-2/A Critical Comparison of Three Half Marathons I Have Run<p>I have now run in three half marathons: the Berkeley Half Marathon (November
2017), the Kaiser Permanente San Francisco Half Marathon (February 2018), and
the Oakland Half Marathon (March 2018).</p>
<p>To be clear, the Kaiser Permanente San Francisco half marathon is <em>not</em> the same
as a <a href="http://www.thesfmarathon.com/">separate set of San Francisco races in the summers</a>. The Oakland Half
Marathon is also technically the “Kaiser Permanente […]” but since there’s
only one main set of Oakland races a year — known as the “Running Festival”
— we can be more lenient in our naming convention.</p>
<p>All these races are popular, and the routes are relatively flat and therefore
great for setting PRs. I would be happy to run any of these again. In fact, I
probably will, for <em>all</em> three!</p>
<p>In this post, I’ll provide some brief comments on each of the races. Note that:</p>
<ul>
<li>
<p>When I list registration fees, it’s not always a clear-cut comparison since
prices jack up closer to race day. I think I managed to get an “early bird”
deal for all these races, so hopefully the prices are somewhat comparable.
Also, I <em>include</em> taxes in the fee I list.</p>
</li>
<li>
<p>By “packet pickup” I refer to when runners pick up whatever racing material is
needed (typically a timing chip, bib, sometimes gear as well) a day or two
before the actual race. These pickup events also involve some deals for food
and running equipment from race sponsors. Below is a picture that I took of
the Oakland packet pickup:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/running/packet_pickup_oakland.JPG" />
<br />
</p>
</li>
<li>
<p>While I list “pros” and “cons” of the races, most are minor in the grand
scheme of things, and this review is for those who might be picky. I
reiterate that I will probably run in all of these again the next time around.</p>
</li>
</ul>
<p>OK, let’s get started!</p>
<h2 id="berkeley-half-marathon">Berkeley Half Marathon</h2>
<ul>
<li>Website: <a href="http://berkeleyhalfmarathon.com/">here</a>.</li>
<li>Price I paid: about $100, including a $10 bib shipping fee.</li>
</ul>
<p>Pros:</p>
<ul>
<li>
<p>The race has a great “local feel” to it, with lots of Berkeley students and
residents both running in the race or cheering us as spectators. I saw a
number of people that I knew, mostly other student runners, and it was nice to
say hi to them. There was also a cool drumming band which played while we were
entering the portion of the race close to the San Francisco Bay.</p>
</li>
<li>
<p>The course is mostly flat, and enters a few Berkeley neighborhoods (again, a
great local feel to it). There’s also a relatively straight section, roughly in
the 8-11 mile range by the San Francisco Bay, which lets you see the
runners ahead of you as you enter it (for extra motivation).
<a href="https://danieltakeshi.github.io/2016-04-23-a-nice-running-route-through-the-berkeley-marina-and-cesar-chavez-park/">As I discussed two years ago</a>, I regularly run by this area so I was used
to the view, but I can see it being attractive for those who don’t use the
same routes.</p>
</li>
<li>
<p>There are lots of pacers, for half-marathon finish times of 1:27, 1:35 (2x),
1:45, 1:55, etc.</p>
</li>
<li>
<p>The post-race food sampling selection was fantastic! There were the obligatory
water bottles and bananas, but I also had tasty <a href="https://powercrunch.com/">Power Crunch protein
bars</a>, Muscle Milk (this is clearly bad for you, but never mind), pretzels,
cookies, coffee, etc. There was also beer, but I didn’t have any.</p>
</li>
<li>
<p>Post-race deals are excellent. I used them to order some Power Crunch bars at
a discount.</p>
</li>
<li>
<p>The packet pickup had some decent free food samples. The race shirt is
interesting — it’s a different style from prior years and feels somewhat odd
but I surprisingly like it, and I’ll be wearing it both to school and for when
I run in my own time.</p>
</li>
</ul>
<p>Cons:</p>
<ul>
<li>
<p>There’s a $10 bib mailing fee, and I realize now that it’s pointless to pay
for it because we <em>also</em> have to pick up a timing chip during packet pickup,
and that’s when we could have gotten the bibs. Thus, there seems to be no
advantage to paying for the bib to be mailed. Furthermore, I wish the timing
chip were attached to the bib; we had to tie it within our shoelaces. I think
it’s far easier to stick it on the bib.</p>
</li>
<li>
<p>The starting location is a bit awkwardly placed in the center of the city,
though to be fair, I’m not sure of a better spot. Certainly it’s less
convenient for drop-offs and Uber rides compared to, say, Golden Gate Park.</p>
</li>
<li>
<p>There were seven water stops, one of which had electrolytes and GU energy
chews. (Unfortunately, when running, I actually dropped two out of the four GU
chews I was given … please use the longer, thinner packages that the Oakland
race uses!!) The other two races offered richer goodies at the aid stations so
next time, I’ll bring my own energy stuff.</p>
</li>
<li>
<p>It was the most expensive of the races I’ve run in, though the difference
isn’t that much, especially if you avoid making the mistake of getting your
bib mailed to you.</p>
</li>
<li>
<p>The photography selection after the race is excellent, but it’s expensive and
most of it is concentrated near the end of the race when it’s crowded, so most
pictures weren’t that interesting.</p>
</li>
</ul>
<h2 id="kaiser-permanente-san-francisco-half-marathon">Kaiser Permanente San Francisco Half Marathon</h2>
<ul>
<li>Website: <a href="https://getfitkpsf.com/">here</a>.</li>
<li>Price I paid: about $80.</li>
</ul>
<p>Upsides:</p>
<ul>
<li>
<p>The race route is great! I enjoyed running through Golden Gate Park and seeing
the Japanese Tea Garden, the California Academy of Sciences, and so on.
There’s also a very long, straight section in the second half of the race
(longer than Berkeley’s!) by the ocean where you can again see the runners
ahead of you on their way back.</p>
</li>
<li>
<p>There’s a great selection of post-race sampling, arguably on par with Berkeley
though there’s no beer. There were water bottles and bananas, along with CLIF
Whey protein bars, Ocho candy, some coffee/caffeine-base drinks, etc.</p>
</li>
<li>
<p>The price is the cheapest of the three, which is surprising since I figured
things in San Francisco would be more expensive. I suspect it has to do with
much of the race being in Golden Gate Park, and the course is set so that
there isn’t a need to close many roads. On a related note, it’s also easy to
drop off and pick up racers.</p>
</li>
<li>
<p>You have to finish the race to get your shirt. Of course this is minor, but I
believe it’s not a good idea to wear the official race shirt on race day.
Incidentally, there’s no package pickup, which means we don’t get free samples
or deals, but it’s probably better for me since I would have had to Uber a
long distance to and back. You get the bib and timing chip mailed in advance,
and the timing chip is (thankfully) attached to the bib.</p>
</li>
</ul>
<p>Downsides:</p>
<ul>
<li>
<p>No pacers. I don’t normally try to stick to a pacer during my races, but I
think they’re useful.</p>
</li>
<li>
<p>While there was a great selection of post-race food sampling, there was no
beer offered, in contrast to the Berkeley and Oakland races.</p>
</li>
<li>
<p>With regards to post-race photographs, my comments on this are basically
identical to those of the Berkeley race.</p>
</li>
<li>
<p>All the aid stations had electrolytes (I think Nuun) in addition to water. It
was a bit unclear to me which cups corresponded to what beverage, though in
retrospect I should have realized that the “blank” cups had water and cups
with a lightning sign on them had the electrolytes. The drinks situation
is better than the Berkeley race, but the downside is that there were no GU
energy chews, so perhaps it’s a wash with respect to the aid stations?</p>
</li>
<li>
<p>It felt like there were fewer people cheering us on when we raced,
particularly compared to the Berkeley race.</p>
</li>
<li>
<p>I don’t think there were as many post-race discount deals. I was hoping that
there were some deals for the CLIF whey protein bars, which would have been
the analogue of the Power Crunch discount for the Berkeley race. The discount
deals also lasted only a week, compared to <em>two months</em> for Berkeley’s
post-race stuff.</p>
</li>
</ul>
<h2 id="oakland-running-festival-half-marathon">Oakland Running Festival Half Marathon</h2>
<ul>
<li>Website: <a href="http://oaklandmarathon.com/">here</a>.</li>
<li>Price I paid: about $90.</li>
</ul>
<p>Upsides:</p>
<ul>
<li>
<p>The race started at 9:45am, whereas the Berkeley and San Francisco races each
started at about 8:10am. While I consider myself a morning person, that’s for
<em>work</em>. If I want to set a half marathon PR, a 9:45am starting time is far
better.</p>
</li>
<li>
<p>The Oakland race easily has the best aid stations compared to the other two
races. Not only were there electrolytes at each station, but some also had
bananas, GU gels, and GU chews (yes, GU has a lot of products!). Throughout
the race I consumed two half-bananas (easy to eat since you can squeeze them),
one GU gel, and one GU chew package, which contained about eight chews. This
was very helpful!</p>
</li>
<li>
<p>There were lots of spectators and locals cheering us on, possibly as many as
the Berkeley race had.</p>
</li>
<li>
<p>The view of Lake Merritt is excellent, and it’s probably the main visual
attraction. Other than that, the race enters the city of Oakland throughout
mostly the business sector. Also this was the only one of the three races
where a marathon was simultaneously offered, so there were a few marathoners
mixed in with us.</p>
</li>
<li>
<p>There’s a great packet pickup (which I showed a photo of earlier), which
probably had as many deals as the Berkeley packet pickup. We had to show up
to the pickup to get the bib and the timing chip (attached to the bib). While
I was there, I bought several GU products that I’ll use for my future
long-distance training sessions.</p>
</li>
<li>
<p>Each runner got tickets for <em>two</em> free Lagunitas Beer cups. We had this
offering after the race, but one was enough for me. I’m not sure how people
can down two servings quickly.</p>
</li>
<li>
<p>There were pacers for various distances.</p>
</li>
<li>
<p>Race photos are <em>free</em>, which is definitely refreshing compared to the other
two races. <em>Disclaimer</em>: I’m writing this post one day after the race
occurred, and I won’t be able to download the photos for a few days, so the
quality may be worse on a per-photo basis.</p>
</li>
<li>
<p>Unfortunately, I don’t think there are <em>any</em> post-race deals. Hopefully
something will show up in my inbox soon so I can turn this into an “upside.”
<em>Update 03/27/2018</em>: heh, a day later, I get an email in my inbox showing that
there <em>are</em> some race deals. Excellent! The deals seem to be just as good as
the other races, so I’ll put it as an upside.</p>
</li>
</ul>
<p>Downsides:</p>
<ul>
<li>
<p>The race scenery is probably less appealing than the Berkeley or San Francisco
races. The route mostly weaves throughout the city roads, and there aren’t
clear views of the Bay. Also, the turn near the end of the race when we see
Lake Merritt again is narrow and awkwardly placed, and it’s also hilly, which
is <em>not</em> what I want to see at the 12th and 13th mile checkpoints.</p>
</li>
<li>
<p>The post-race food sampling was probably weaker compared to the other two,
though it’s debatable. There were water bottles, as you can see in my photo
below, along with bananas and some peanut butter bars and energy drinks. I
think the other races had more, and I was disappointed when the Oakland
website said that racers would “receive bagels” because I didn’t see any! On
the positive side, I got a free package of <a href="http://www.thesfmarathon.com/">GU stroopwafel</a>, so again, it’s
debatable.</p>
</li>
<li>
<p>The race isn’t as good at storing your sweats. At Berkeley, we could leave our
sweats in the Berkeley high school gym, and it was easy for us to retrieve our
bags after the race. At Oakland, the bags were stored in a small tent, and we had
to stand in line for a while before a volunteer could find our stuff.</p>
</li>
</ul>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/running/oakland_end_of_race.JPG" />
<br />
<i>
The finish line of the Oakland races (including the half marathon).
</i>
</p>
<h2 id="conclusion">Conclusion</h2>
<p>I’m really happy that I started running half marathons. I’m signed up to run the
<a href="http://www.thesfmarathon.com/">San Francisco Second Half-Marathon</a> in July. If you’re interested in
training with me, <a href="https://danieltakeshi.github.io/about.html">let me know</a>.</p>
Mon, 26 Mar 2018 15:00:00 -0700
https://danieltakeshi.github.io/2018/03/26/three-half-marathons/
https://danieltakeshi.github.io/2018/03/26/three-half-marathons/Self Supervision and Building Visual Predictive Models<p>I enjoy reading robotics and deep reinforcement learning papers that cleverly
apply self-supervision to learn some task. There’s something oddly appealing
about an agent “semi-randomly” acting in a world and learning something useful
out of the data it collects. Some papers, for instance, build <em>visual
predictive models</em>, which are those that enable the agent to anticipate the
future states of the world, which may be raw images (or more commonly, a latent
feature representation of them). Said another way, the agent learns an internal
physics model. The agent can then use it to plan because it knows the effect of
its actions, so it can run internal simulations and pick the action that results
in the most desirable outcome.</p>
<p>In this blog post, I’ll discuss a few papers about self-supervision and visual
predictive models by providing a brief description of their contributions. A
subsequent blog post will discuss the papers’ relationships to each other in
further detail.</p>
<h2 id="paper-1-learning-visual-predictive-models-of-physics-for-playing-billiards-iclr-2016">Paper 1: Learning Visual Predictive Models of Physics for Playing Billiards (ICLR 2016)</h2>
<p>“Billiards” in this paper refers to a generic, 2-D simulated environment of
balls that move and bounce around walls according to the laws of physics. As the
authors correctly point out, this is an environment that easily enables
extensive experiments: altering the number of balls, changing their sizes or
colors, and so forth.</p>
<p>While the agent “sees” a 2-D image of the environment, that is not the direct
input to the neural network nor is it what the neural network predicts.</p>
<ul>
<li>
<p>The <em>input</em> consists of the past four “glimpses” of the object, and the
applied forces (which we assume known and tracked). The glimpses should be the
128x128 RGB image of the environment, but perhaps “blacking out” everything
except the object. (I’m not sure about the technical details, but the idea is
intuitive.) Thus, the same network is used for <em>each</em> of the balls in the
environment, which the authors call an “object-centric” model. As one would
expect, the input image is passed through a series of convolutional layers and
then the forces are concatenated with that feature representation.</p>
</li>
<li>
<p>The <em>output</em> is the object’s predicted velocity for the current and subsequent
(up to <script type="math/tex">h</script>) time steps. It is <em>not</em> the standard latent feature representation
that other visual predictive models normally apply, because in billiards, they
assume it is enough to know the displacements of the balls to track them.</p>
</li>
</ul>
<p>The model is trained by minimizing</p>
<script type="math/tex; mode=display">\sum_{k=1}^h w_k\|\tilde{u}_{t+k} - u_{t+k}\|_2^2</script>
<p>where <script type="math/tex">w_k</script> is a weighing factor that is larger for shorter-term (smaller
<script type="math/tex">k</script>) time steps. Good, this makes sense.</p>
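<p>The weighted multi-step loss above can be sketched in a few lines. This is my
own minimal sketch: the exponential-decay schedule for <script type="math/tex">w_k</script> is an assumption
(the paper only requires that shorter-term predictions get larger weights), and
the tensor shapes are illustrative.</p>

```python
import torch

def weighted_velocity_loss(pred_u, true_u, horizon, decay=0.6):
    """Weighted multi-step L2 loss on predicted velocities.

    pred_u, true_u: tensors of shape (horizon, 2) holding the predicted and
    ground-truth 2-D velocities for time steps t+1 .. t+h.
    decay: assumed weighting schedule -- w_k shrinks for longer horizons.
    """
    loss = 0.0
    for k in range(horizon):
        w_k = decay ** k  # larger weight for shorter-term predictions
        loss = loss + w_k * torch.sum((pred_u[k] - true_u[k]) ** 2)
    return loss
```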
<p>The authors show that they are able to predict the trajectories of balls, and
that this can be generalized and also used for planning.</p>
<h2 id="paper-2-learning-to-poke-by-poking-experiental-learning-of-intuitive-physics-nips-2016">Paper 2: Learning to Poke by Poking: Experiential Learning of Intuitive Physics (NIPS 2016)</h2>
<p>I discussed this paper in a <a href="https://danieltakeshi.github.io/2018/03/03/learning-to-poke-by-poking">previous blog post</a>. Heh, you can tell that I’m
interested in this stuff.</p>
<h2 id="paper-3-learning-hand-eye-coordination-for-robotic-grasping-with-deep-learning-and-large-scale-data-collection-ijrr-2017">Paper 3: Learning Hand-Eye Coordination for Robotic Grasping with Deep Learning and Large-Scale Data Collection (IJRR 2017)</h2>
<p>This is the famous (or infamous?) “arm-farm” paper from Google. The dataset
here is <em>MASSIVE</em> — I don’t know of a self-supervision paper with real robots
that contains this much data. The authors collected 800,000 (semi-)random grasp
attempts over two months by running up to 14 robots in parallel. In
fact, even this somewhat understates the total amount of data: <em>each grasp</em>
consists of <script type="math/tex">T</script> training data points of the form <script type="math/tex">(I_t^i, p_T^i - p_t^i,
\ell_i)</script> which contains the current camera image, the vector from the current
pose to the one that is eventually reached, and the success of the grasp.</p>
<p>The data then enables the robot to effectively learn hand-eye coordination by
continuous visual servoing, without the need for camera calibration. Given a
camera image of the workspace, and independently of the calibration or robot
pose, the trained CNN predicts the probability that the motion of the gripper
results in a successful grasp.</p>
<p>During data collection, the labels (either a successful grasp or not) must be
automatically supplied. The authors do this with (a) checking if the gripper
closed or not, and (b) an image subtraction test, testing the image before and
after the object was grasped. This makes sense to me. The first test is used,
and then the second serves as a backup check for small objects. I can see how it
might fail, though, such as if the robot grasped the wrong object or pushed
the target object to the side rather than picking it up, either of which would
result in a different image than the starting one.</p>
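<p>The two-test labeling scheme is simple enough to sketch. The thresholds below
are my own guesses, not values from the paper, and the pixel-difference test is a
deliberately crude stand-in for whatever image comparison the authors actually
use.</p>

```python
import numpy as np

def grasp_success(gripper_closed_fully, img_before, img_after,
                  pixel_thresh=30, frac_thresh=0.01):
    """Heuristic auto-labeling sketch for grasp outcomes.

    Test (a): if the gripper closed all the way, it is holding nothing.
    Test (b): image subtraction -- if enough pixels changed between the
    before/after images, a (possibly small) object was likely removed.
    Thresholds are assumptions for illustration only.
    """
    if gripper_closed_fully:
        return False  # nothing between the fingers
    diff = np.abs(img_before.astype(np.int16) - img_after.astype(np.int16))
    changed_fraction = (diff > pixel_thresh).mean()
    return changed_fraction > frac_thresh
```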
<p>The use of robots running in parallel means that each can collect a diverse
dataset on its own, in part due to different actions and in part due to
different material properties of each gripper. This is an application of the A3C
concept from Deep Reinforcement Learning for real, physical robotics.</p>
<p>There are a lot of things that I like from this paper, but one that really seems
intriguing for future AI applications is that the data enabled the robots to
learn different grasping strategies for different types of objects, such as the
soft vs hard difference the authors observed.</p>
<h2 id="paper-4-learning-to-act-by-predicting-the-future-iclr-2017">Paper 4: Learning to Act by Predicting the Future (ICLR 2017)</h2>
<p>I discussed this paper in a <a href="https://danieltakeshi.github.io/2017/10/10/learning-to-act-by-predicting-the-future/">previous blog post</a>.</p>
<h2 id="paper-5-combining-self-supervised-learning-and-imitation-for-vision-based-rope-manipulation-icra-2017">Paper 5: Combining Self-Supervised Learning and Imitation for Vision-Based Rope Manipulation (ICRA 2017)</h2>
<p>The same architectural idea from the “Learning to Poke” paper is used in this
one to jointly learn forward and inverse dynamics models. Instead of poking, the
robot learns rope manipulation, a complicated task to model with hard-coded
physics.</p>
<p>In my opinion, one of the weaknesses in the “Learning to Poke” paper was the
greedy planner. The planner saw the current and <em>goal</em> images, and had to infer
the intermediate actions. This prevented the robot from learning longer-horizon
tasks, because the goal image could be quite different from the current one. In
this paper, the authors allow for longer-horizon learning by providing one human
demonstration of the task. The demonstration consists of a sequence of images,
each of which is fed into the neural network model at each time
step. Thus, the goal image should be the one that corresponds to the next time
step, which appears to be more tractable.</p>
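<p>The demo-as-goal loop is easy to sketch. Everything here is hypothetical
pseudo-API: <code>inverse_model</code> stands in for the learned inverse dynamics model
(current image, goal image → action), and <code>env.step</code> for executing that action
on the robot.</p>

```python
def follow_demonstration(demo_images, current_obs, inverse_model, env):
    """Sketch: each demo frame serves as the goal image for one step.

    inverse_model(current, goal) -> action predicted to reach goal from current.
    env.step(action) -> the next observation after executing the action.
    """
    obs = current_obs
    actions = []
    for goal in demo_images[1:]:      # demo frame t+1 is the goal at step t
        a = inverse_model(obs, goal)  # inverse dynamics between the two images
        obs = env.step(a)
        actions.append(a)
    return actions
```

With a one-step goal at a time, each inverse-dynamics query stays close to the
training distribution, which is the tractability argument made above.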
<p>They ran their Baxter robot autonomously for 500 hours, collecting 60,000
training data points.</p>
<h2 id="paper-6-curiosity-driven-exploration-by-self-supervised-prediction-icml-2017">Paper 6: Curiosity-Driven Exploration by Self-Supervised Prediction (ICML 2017)</h2>
<p>They build on top of an existing RL algorithm, A3C, by modifying the reward
function so that at each time step <script type="math/tex">t</script>, the reward is <script type="math/tex">r_t^{i}+r_t^{e}</script>
instead of just <script type="math/tex">r_t^{e}</script>, where <script type="math/tex">r_t^{i}</script> is the <em>curiosity reward</em> and
<script type="math/tex">r_t^{e}</script> is the reward from the environment.</p>
<p>In sparse-reward settings, such as the Doom environment from OpenAI that they
use (and, I might add, the recent robotics environments, also from OpenAI), the
environment reward is zero almost everywhere, except for 1 at the goal. This makes it
effectively an intractable problem for off-the-shelf RL algorithms. Hence, by
building a predictive model, given current and subsequent states <script type="math/tex">s_t</script> and
<script type="math/tex">s_{t+1}</script> they can assign the curiosity reward to be</p>
<script type="math/tex; mode=display">r_t^i = \frac{\eta}{2}\|\hat{\phi}(s_{t+1}) - \phi(s_{t+1})\|_2^2</script>
<p>which measures the error between the predicted and actual <em>latent features</em> of
the successor state. The inverse dynamics model takes in <script type="math/tex">(s_t,s_{t+1})</script>
during training and predicts <script type="math/tex">a_t</script>. The forward dynamics model predicts the
latent successor state <script type="math/tex">\hat{\phi}(s_{t+1})</script> shown above.</p>
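<p>Putting the pieces together, the curiosity reward can be sketched as below.
This is a minimal sketch, not the paper’s architecture: the encoder, inverse
model, and forward model are single linear layers with made-up dimensions, where
the real model uses convolutional and deeper networks.</p>

```python
import torch
import torch.nn as nn

class ICMSketch(nn.Module):
    """Minimal sketch of the intrinsic-curiosity reward (illustrative sizes)."""
    def __init__(self, obs_dim=16, feat_dim=8, n_actions=4, eta=1.0):
        super().__init__()
        self.eta = eta
        self.phi = nn.Linear(obs_dim, feat_dim)            # feature encoder phi
        self.inverse = nn.Linear(2 * feat_dim, n_actions)  # (phi_t, phi_{t+1}) -> a_t
        self.forward_model = nn.Linear(feat_dim + n_actions, feat_dim)  # -> phi_{t+1}

    def curiosity_reward(self, s_t, s_next, a_onehot):
        phi_t, phi_next = self.phi(s_t), self.phi(s_next)
        phi_next_hat = self.forward_model(torch.cat([phi_t, a_onehot], dim=-1))
        # r_t^i = (eta / 2) * || phi_hat(s_{t+1}) - phi(s_{t+1}) ||^2
        return 0.5 * self.eta * torch.sum((phi_next_hat - phi_next) ** 2, dim=-1)
```

A familiar action–state pair gives a small forward-model error and hence a small
reward, which matches the intuition that repeated behavior stops being “curious.”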
<p>They argue that their form of curiosity has three benefits: solving tasks with
sparse rewards, exploring the environment, and learning skills that can be
reused and applied in different scenarios. One interesting conjecture from the
third claim is that if the agent simply does the same thing over and over again,
the curiosity reward will go down to zero because the agent is stuck in the same
latent space. Only by “learning” new actions that substantially change the
latent space will the agent then be able to obtain new rewards.</p>
<p>The results on Doom and Mario environments are impressive.</p>
<h2 id="paper-7-zero-shot-visual-imitation-iclr-2018">Paper 7: Zero-Shot Visual Imitation (ICLR 2018)</h2>
<p>Wait, zero-shot visual imitation (learning)? How is this possible?</p>
<p>First, let’s be clear on their technical definition: “zero-shot” means that they
are still allowed to observe a demonstration of the task, but it has to be only
the state space (i.e., images), so actions are <em>not</em> included. The second part
of the definition means that expert demonstrations (regardless of states or
actions) are not allowed during training.</p>
<p>OK, that makes sense. So … the robot just sees the images of the demo at
inference time, and must imitate it. That’s a high bar. The key must be to
develop a sufficient prior — but how? By having the agent move (semi-)randomly
to learn physics, of course!</p>
<p>In terms of the visual predictive model, the paper does a nice job describing
four different models, starting from the ICRA 2017 rope manipulation paper and
moving towards the one they use for their experiments. Their final model
conditions on the final goal and uses recurrent neural networks, and is
augmented with a separate neural network that predicts whether the goal has been
attained or not.</p>
<p>The paper presents two sets of experiments. One is a navigation task using a
mobile robot, and the other is a rope manipulation task using the Baxter robot.
With zero-shot visual imitation, the Baxter robot <em>doubles</em> the performance of
rope manipulation compared to the results from ICRA 2017. Thus, if I’m thinking
about rope manipulation benchmarks, I’d better check out this paper and not the
ICRA 2017 one. I also assume that zero-shot visual imitation would result in
better poking performance than “Learning to Poke” if the poking requires
long-term planning.</p>
<p>Results for the navigation agent are also impressive.</p>
<p>This is not a deep reinforcement learning paper, though one could argue for the
use of Deep RL as an alternative to self-supervision. Indeed, that was a point
raised by one of the reviewers.</p>
<h2 id="additional-references">Additional References</h2>
<p>Here are a few additional papers that are somewhat related to the above, and
which I don’t have time to write about in detail … yet.</p>
<ul>
<li>
<p><a href="https://arxiv.org/abs/1605.07157">Unsupervised Learning for Physical Interaction through Video Prediction</a>
is another interesting paper on imagining the future based on predicting pixel
motion.</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1709.04905">One-Shot Visual Imitation Learning via Meta-Learning</a> allows robots to
learn how to perform tasks with a single demonstration. It’s somewhat related
to the “Zero-Shot Visual Imitation” paper, except the two papers use very
different solutions for different problems. I’d like to compare them in more
detail later.</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1611.05397">Reinforcement Learning with Unsupervised Auxiliary Tasks</a> works by having
a reinforcement learning agent optimize a series of “pseudo” loss functions
as auxiliary terms in its objective function.</p>
</li>
<li>
<p><a href="https://arxiv.org/abs/1802.06070">Diversity is All You Need</a>, which argues that by using entropy correctly,
an agent can automatically learn useful skills in an environment. It’s related
to the “Curiosity” paper in discovering new skills.</p>
</li>
</ul>
Fri, 23 Mar 2018 16:00:00 -0700
https://danieltakeshi.github.io/2018/03/23/self-supervision-part-1/
https://danieltakeshi.github.io/2018/03/23/self-supervision-part-1/