Seita's Place

This is my blog, where I have written over 250 articles on a variety of topics, most of which are about one of two major themes. The first is computer science, which is my area of specialty as a Ph.D. student at UC Berkeley. The second can be broadly categorized as "deafness," which relates to my experience and knowledge of being deaf.

https://danieltakeshi.github.io/

Thu, 09 Nov 2017 21:26:31 -0800 | Jekyll v3.6.2

Understanding and Categorizing Scalable MCMC and MH Papers at a High Level

<p>When reading academic papers about a certain subfield, I often find it difficult
to clearly understand how they <em>connect</em> with each other. For example, what
algorithms are based on other algorithms? Can the contributions of two papers be
combined? Would combining them result in notable improvements or just
on-the-margin, negligible changes? (The answer to that last question is usually
“the latter” but it’s something we should at least <em>consider</em>.)</p>
<p>This post is an attempt to unify my understanding of papers related to scalable
Markov Chain Monte Carlo and scalable Metropolis-Hastings. By “scalable,” I
refer to the usual meaning of using these algorithms in the large data regime.</p>
<p>These are the papers I’m trying to understand:</p>
<ul>
<li><em>MCMC Using Hamiltonian Dynamics</em>, Handbook of MCMC 2010</li>
<li><em>Bayesian Learning via Stochastic Gradient Langevin Dynamics</em>, ICML 2011</li>
<li><em>Stochastic Gradient Hamiltonian Monte Carlo</em>, ICML 2014</li>
<li><em>Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget</em>, ICML 2014</li>
<li><em>Towards Scaling up Markov Chain Monte Carlo: An Adaptive Subsampling Approach</em>, ICML 2014</li>
<li><em>Firefly Monte Carlo: Exact MCMC with Subsets of Data</em>, UAI 2014</li>
<li><em>On Markov Chain Monte Carlo Methods For Tall Data</em>, JMLR 2017</li>
</ul>
<p>(All of them are freely available online.)</p>
<p>First, I’ll briefly discuss why we care about the problem of scalability with
MCMC and MH. Then, I’ll group these papers into categories and explain how they
are connected to each other. This will then motivate <a href="https://arxiv.org/abs/1610.06848">our UAI 2017 paper</a>,
<em>An Efficient Minibatch Acceptance Test for Metropolis-Hastings</em>.</p>
<h1 id="why-markov-chain-monte-carlo">Why Markov Chain Monte Carlo?</h1>
<p>I’m not going to review MCMC here as you can find many other references, both
online and in textbooks. It may help to look at <a href="https://danieltakeshi.github.io/2016-06-19-some-recent-results-on-minibatch-markov-chain-monte-carlo-methods/">my blog post from June 2016</a>
where I describe the general problem setting. <a href="http://bair.berkeley.edu/blog/2017/08/02/minibatch-metropolis-hastings/">My more recent BAIR Blog post</a>
also contains some potentially useful background material.</p>
<p>But why use MCMC at all? Here’s one reason: if we use it to sample some model’s
parameter <script type="math/tex">\theta</script>, then the chain of samples <script type="math/tex">\{\theta_1, \ldots,
\theta_T\}</script> should let us quantify useful statistics about properties of
interest. Two common ones are the expectation and variance, which we
might compute for the parameter itself. We can estimate (for example) the
expectation by taking a sequence of the <script type="math/tex">K</script> most recent samples (or a
subsampled sequence) from our chain <script type="math/tex">\{\theta_{T-K+1}, \ldots, \theta_T\}</script> and
then taking the sample vector-valued expectation. More generally, letting <script type="math/tex">f</script>
be a function of the parameters, we can compute the expectation
<script type="math/tex">\mathbb{E}[f(\theta)]</script> using the expectation of the sampled values
<script type="math/tex">\{f(\theta_{T-K+1}),\ldots, f(\theta_T)\}</script>.</p>
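<p>As a concrete sketch (in NumPy, with function names of my own choosing — not from any of the papers), estimating these quantities from a stored chain is just a matter of averaging over the last <script type="math/tex">K</script> samples:</p>

```python
import numpy as np

def posterior_estimates(chain, K):
    """Estimate E[theta] and Var[theta] from the last K samples of a chain.

    chain: array of shape (T, d), one row per vector-valued sample theta_t.
    """
    recent = chain[-K:]             # the K most recent samples
    return recent.mean(axis=0), recent.var(axis=0)

def expectation_of(f, chain, K):
    """Monte Carlo estimate of E[f(theta)] for a general function f."""
    return np.mean([f(theta) for theta in chain[-K:]], axis=0)
```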
<p>We can’t do this if we take stochastic gradient steps, because the samples from
SGD <em>are not from the posterior distribution</em> of the parameter. SGD is designed
to converge around a <em>single point</em> in the space of possible <script type="math/tex">\theta \in
\Theta</script> values, unlike MCMC methods which are supposed to approximate a
<em>distribution</em>, which can then be used for sample estimates of expectations and
variances.</p>
<p>My perspective is supported in papers such as the SGLD paper from 2011 (one of
the papers I listed above); the authors (Welling & Teh) claim that:</p>
<blockquote>
<p>Bayesian methods are appealing in their ability to capture uncertainty in
learned parameters and avoid overfitting. Arguably with large datasets there
will be little overfitting. Alternatively, as we have access to larger
datasets and more computational resources, we become interested in building
more complex models, so that there will always be a need to quantify the
amount of parameter uncertainty.</p>
</blockquote>
<p>So … that’s why we like the Bayesian perspective. These authors are
rock-stars, by the way, so I generally trust their conclusions.</p>
<p>I’ll be honest, though: I can’t think of something nontrivial I’ve done in which
the Bayesian perspective was <em>that</em> useful to me. In Deep Learning, Deep
Imitation Learning, and Deep Reinforcement Learning, I’ve never used priors and
posteriors; RMSProp or Adam is good enough, and it seems like this goes for the
rest of the community. Maybe it’s just not that necessary in these domains? I
have two papers on my reading list, <a href="https://papers.nips.cc/paper/6501-deep-exploration-via-bootstrapped-dqn">Bootstrapped DQNs</a> and <a href="https://papers.nips.cc/paper/6117-bayesian-optimization-with-robust-bayesian-neural-networks">Robust Bayesian
Neural Networks</a>, which might clarify some of my questions regarding how much
of a Bayesian perspective is needed in Deep Learning. I should also definitely
check out the <a href="http://bayesiandeeplearning.org/">Bayesian Deep Learning NIPS workshop</a>.</p>
<h1 id="langevin-dynamics-and-hamiltonian-dynamics">Langevin Dynamics and Hamiltonian Dynamics</h1>
<p>This section concerns the following three papers:</p>
<ul>
<li>MCMC Using Hamiltonian Dynamics, Handbook of MCMC 2010</li>
<li>Bayesian Learning via Stochastic Gradient Langevin Dynamics, ICML 2011</li>
<li>Stochastic Gradient Hamiltonian Monte Carlo, ICML 2014</li>
</ul>
<p>I gave a brief introduction to Langevin Dynamics <a href="https://danieltakeshi.github.io/2016-06-19-some-recent-results-on-minibatch-markov-chain-monte-carlo-methods/">in my earlier blog post</a>,
so just to summarize for this one, Langevin Dynamics injects an appropriate
amount of noise so that (in our context) a gradient-based algorithm will
converge to a <em>distribution</em> over the posterior of <script type="math/tex">\theta</script>. The <em>Stochastic
Gradient</em> Langevin Dynamics algorithm combines the computational efficiencies of
SGD by using a minibatch gradient, but uses the Langevin noise to appropriately
cover the posterior:</p>
<blockquote>
<p>[…] Langevin dynamics which injects noise into the parameter updates in such
a way that the trajectory of the parameters will converge to the full
posterior distribution rather than just the maximum a posteriori mode.</p>
</blockquote>
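<p>Concretely, the SGLD update is an SGD-style step on the log posterior plus Gaussian noise whose variance equals the step size. Here is a minimal sketch (the function names and signatures are mine, not Welling and Teh's):</p>

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik, minibatch, N, eps, rng):
    """One SGLD update: a stochastic gradient step plus Langevin noise.

    grad_log_lik(theta, x) is the gradient of log p(x | theta); the
    minibatch sum is rescaled by N / n to estimate the full-data sum.
    """
    n = len(minibatch)
    grad = grad_log_prior(theta)
    grad = grad + (N / n) * sum(grad_log_lik(theta, x) for x in minibatch)
    noise = rng.normal(0.0, np.sqrt(eps), size=theta.shape)  # variance = eps
    return theta + 0.5 * eps * grad + noise
```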
<p>As a follow-up, the Stochastic Gradient <em>Hamiltonian Monte Carlo</em> (SGHMC)
algorithm is similar to SGLD in that it uses a minibatch gradient along with
“exploration noise.” This time, the noise is from Hamiltonian Monte Carlo, which
is more sophisticated than Langevin Dynamics since HMC introduces extra momentum
variables, which allow for larger jumps.</p>
<p>Radford Neal’s excellent 2010 book chapter goes over HMC in great detail, so I
won’t go through the details here (though I’d like to write a blog post solely
about HMC — so stay tuned!). Just to give a quick overview, though, our
problem context is similar, where we have a target posterior:</p>
<script type="math/tex; mode=display">p(\theta \mid x_1, \ldots, x_N) \propto \exp(-U(\theta))</script>
<p>with <em>potential energy</em> function</p>
<script type="math/tex; mode=display">U(\theta) = -\log p(\theta) - \sum_{i=1}^{N} \log p(x_i \mid \theta).</script>
<p>(Don’t worry too much about the “potential energy” terminology; HMC was
originally developed from a physics background. We’re still in the same problem
setting.)</p>
<p>HMC generates samples from a <em>joint distribution</em> that involves extra <em>momentum</em>
variables:</p>
<script type="math/tex; mode=display">\pi (\theta, r) \propto \exp\left(-U(\theta) - \frac{1}{2}r^TMr\right)</script>
<p>where <script type="math/tex">r</script> are the momentum variables and <script type="math/tex">M</script> is a mass matrix. The update
rules are:</p>
<ul>
<li><script type="math/tex">\theta = \theta + \tau \cdot M^{-1}r</script>.</li>
<li><script type="math/tex">r = r - \tau \cdot \nabla U(\theta)</script>.</li>
</ul>
<p>where <script type="math/tex">\tau</script> is some step size. If this doesn’t make sense, read Neal’s 2010
book chapter.</p>
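<p>In practice these two updates are interleaved using the "leapfrog" scheme from Neal's chapter: half-steps on the momentum sandwich full steps on the position, which keeps the discretized dynamics nearly energy-preserving. A sketch, with the inverse mass matrix passed in directly:</p>

```python
import numpy as np

def leapfrog(theta, r, grad_U, tau, n_steps, M_inv):
    """Leapfrog integration of the theta / r updates above.

    grad_U(theta) is the gradient of the potential energy U(theta);
    M_inv is the inverse mass matrix M^{-1}.
    """
    r = r - 0.5 * tau * grad_U(theta)          # initial half step on momentum
    for step in range(n_steps):
        theta = theta + tau * (M_inv @ r)      # full step on position
        if step < n_steps - 1:
            r = r - tau * grad_U(theta)        # full step on momentum
    r = r - 0.5 * tau * grad_U(theta)          # final half step on momentum
    return theta, r
```

<p>For a standard Gaussian target, <script type="math/tex">U(\theta) = \frac{1}{2}\theta^T\theta</script>, the Hamiltonian <script type="math/tex">U(\theta) + \frac{1}{2}r^TMr</script> stays nearly constant along the simulated trajectory, which is why the MH rejection rate can be kept small.</p>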
<p>The result from HMC is a set of samples <script type="math/tex">\{(\theta_i,r_i)\}_{i=1}^T</script>. But
we’re only interested in the <script type="math/tex">\theta_i</script>s, so … we simply drop the <script type="math/tex">r_i</script>
terms to get our samples for <script type="math/tex">\theta</script>. Amazingly, <script type="math/tex">\theta</script> is sampled from
the correct target distribution, which one can show via some “reversibility”
analysis.</p>
<p>SGHMC needs a little massaging to actually get it to sample the target
distribution, since simply taking a subset of the data to compute an
approximation to <script type="math/tex">\nabla U(\theta)</script> will lose the “Hamiltonian Dynamics”
property; the authors resolve this by using second-order Langevin Dynamics to
counteract the effect of too much gradient noise in estimating <script type="math/tex">\nabla
U(\theta)</script>, and the result is a similar algorithm to SGLD except with a
different noise term.</p>
<p>Just to be clear, both SGLD and SGHMC are minibatch, gradient-based algorithms
that are also considered “Bayesian methods.” Neither are pure random walks,
i.e., neither use Gaussian <em>proposals</em> because the proposals are based on the
<em>stochastic gradient</em> value, <em>plus</em> some additive noise term. For SGLD, that
extra noise is actually a random walk, but not for SGHMC.</p>
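<p>For reference, here is a sketch of the resulting SGHMC update with an identity mass matrix, where a friction term <code>C</code> counteracts the stochastic gradient noise. (I drop the paper's estimated noise term <script type="math/tex">\hat{B}</script> for simplicity, and the function names are my own.)</p>

```python
import numpy as np

def sghmc_step(theta, r, stoch_grad_U, eps, C, rng):
    """One SGHMC update with M = I and B-hat = 0 (Chen et al., 2014).

    stoch_grad_U(theta) is a minibatch estimate of grad U(theta);
    the friction term -eps * C * r counteracts its noise.
    """
    theta = theta + eps * r
    noise = rng.normal(0.0, np.sqrt(2.0 * C * eps), size=r.shape)
    r = r - eps * stoch_grad_U(theta) - eps * C * r + noise
    return theta, r
```

<p>With <code>C = 0</code> and an exact gradient, this reduces to the plain (noise-free) Hamiltonian update above.</p>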
<p>For both SGLD and SGHMC, we have to apply the Metropolis-Hastings test for
computer implementations due to discretization error, even though <em>in theory</em> we
shouldn’t have to since energy is preserved. In both papers, the authors
decrease step sizes to zero so that the MH rejection rate goes to zero.
Intuitively, smaller step sizes mean samples concentrate in higher-probability
regions of the posterior, and the gradient keeps the chain moving in the direction of
greatest increase in posterior probability. In addition, decreasing step
sizes also means discretization error decreases, which yet again further reduces
the need for MH tests. While this is great, because the MH test requires
full-batch computation, perhaps we are missing out somehow by keeping our step
sizes small.<sup id="fnref:notsure"><a href="#fn:notsure" class="footnote">1</a></sup></p>
<h1 id="metropolis-hastings">Metropolis-Hastings</h1>
<p>In this section, I discuss the remaining papers listed at the introduction of
this post. They are related in some form to the Metropolis-Hastings algorithm,
which is commonly used in MCMC techniques to act as a correction to ensure that
samples do not deviate too frequently away from the target posterior
distribution.</p>
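<p>To make the cost concrete, here is the standard MH test with a symmetric proposal, so the <script type="math/tex">q</script> terms cancel (a sketch with my own function names). The key point is that each call to the log posterior sums log-likelihoods over <em>all</em> <script type="math/tex">N</script> data points:</p>

```python
import numpy as np

def mh_test(log_posterior, theta, theta_prime, rng):
    """Standard Metropolis test (symmetric proposal, so q terms cancel).

    Each call to log_posterior requires a full pass over the dataset --
    the O(N) per-iteration cost that the papers below try to reduce.
    """
    log_alpha = log_posterior(theta_prime) - log_posterior(theta)
    if np.log(rng.uniform()) < log_alpha:
        return theta_prime   # accept the proposal
    return theta             # reject; keep the current sample
```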
<p>As I mentioned in both of my <a href="http://bair.berkeley.edu/blog/2017/08/02/minibatch-metropolis-hastings/">earlier</a> <a href="https://danieltakeshi.github.io/2016-06-19-some-recent-results-on-minibatch-markov-chain-monte-carlo-methods/">blog</a> posts, conventional MH tests
require a full pass over the entire dataset. This makes them extremely costly,
and is one of the reasons why both SGLD and SGHMC emphasized how decreasing step
sizes results in lower discretization error, so that they could omit the MH
tests.</p>
<p>Their computational cost has raised the question of whether using subsamples
of the data for the MH test computation is feasible. It’s not as straightforward
as taking a fixed-sized subset (i.e., minibatch) of the dataset because that
results in a non-trivial target distribution which is not the desired
posterior.</p>
<p>The following two papers propose subsampling-based algorithms that attempt to
tackle the high cost of full-batch MH tests:</p>
<ul>
<li>Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget, ICML 2014</li>
<li>Towards Scaling up Markov Chain Monte Carlo: An Adaptive Subsampling Approach,
ICML 2014</li>
</ul>
<p>I discussed the first one in <a href="https://danieltakeshi.github.io/2016-06-19-some-recent-results-on-minibatch-markov-chain-monte-carlo-methods/">an earlier blog post</a>. The second one follows a
similar procedure as the first one, except that it uses a slightly different way
of interpreting when to stop data collection. The downside, as I painfully
realized when I tried to implement this, is that due to its concentration
bounds, it requires a real-valued parameter which depends on the <em>entire</em>
collection of <script type="math/tex">\{\log p(x_i\mid \theta), \log p(x_i\mid \theta')\}_{i=1}^N</script>
values <em>each iteration</em>, which defeats the point of using a subset of the data.
(Here, I use <script type="math/tex">\theta'</script> to denote the proposed parameter.)</p>
<p>The authors of the Adaptive Subsampling paper have a follow-up JMLR 2017 paper
(it was under review for a <em>long</em> time) which expands upon this discussion. I
found it quite useful, particularly because of their proof (in Section 6.1)
about how naive subsampling for the MH test results in a nontrivial and
hard-to-interpret target distribution. In Section 6.3, they introduce a novel
contribution where they rely on <em>subsampling noise for exploration</em>; that is,
use the minibatch-induced noise (which is approximately Gaussian by the Central Limit Theorem)
to explore the posterior. However, they showed that this approach still seems to
require <script type="math/tex">O(n)</script> data points each iteration. On the other hand, they didn’t
investigate this method in too much detail, so it’s hard to comment on its
usefulness.</p>
<p>The last related work was the Firefly paper, which won the Best Paper Award at
UAI 2014. It can perform exact MCMC, but the main drawback is (emphasis mine):</p>
<blockquote>
<p>FlyMC is compatible with a wide variety of modern MCMC algorithms, and <strong>only
requires a lower bound on the per-datum likelihood factors</strong>.</p>
</blockquote>
<p>To be clear on what this means, they require the existence of functions
<script type="math/tex">B_i(\theta)</script> satisfying <script type="math/tex">0 \le B_i(\theta) \le p(x_i \mid \theta)</script> for
<em>all</em> <script type="math/tex">i</script>. How realistic is that? I have no idea, honestly, but it seems like
something that is difficult to achieve in practice, especially because it’s
conditioning on <script type="math/tex">\theta</script> and <script type="math/tex">\theta'</script>, which will vary considerably
throughout sampling. There is some interesting discussion about this <a href="https://xianblog.wordpress.com/2014/04/02/firefly-monte-carlo/">at
Christian Robert’s excellent blog</a>, with Ryan Adams (the professor co-author)
commenting.</p>
<p>This prior work then motivated our own, where we avoided needing these
assumptions and showed that we could cut the MH test cost down to roughly
that of SGD, without loss of performance. There’s no free lunch,
though; our algorithm has applicability constraints but those are hopefully not
that restrictive. <a href="http://bair.berkeley.edu/blog/2017/08/02/minibatch-metropolis-hastings/">Check out our BAIR blog post</a> for more information.</p>
<h1 id="conclusion">Conclusion</h1>
<p>I’ve discussed this set of papers and tried grouping them together to find a
coherent theme in all of this. Hopefully this makes it clearer what these papers
are trying to do.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:notsure">
<p>I’m actually not sure if we can even use the Metropolis-<em>Hastings</em>
test (and not just the “Metropolis Algorithm”) with SGHMC. The
authors of the SGHMC paper claim that MH tests are impossible for both SGLD
and SGHMC since the reverse proposal probability <script type="math/tex">q(\theta \mid \theta')</script>
cannot be computed. It seems to me, however, that one can compute the SGLD
reverse probability because that’s a Gaussian centered at the gradient term
with some known variance. What am I missing here? At the very least,
applying the MH test to regular HMC should be OK, since we can omit the
proposal probabilities. And that’s what both the SGHMC authors (judging
from <a href="https://github.com/tqchen/ML-SGHMC">Tianqi Chen’s source code</a>) and Radford Neal do in their experiments. <a href="#fnref:notsure" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Thu, 09 Nov 2017 02:00:00 -0800
https://danieltakeshi.github.io/2017/11/09/understanding-and-categorizing-scalable-mcmc-and-mh-papers-at-a-high-level/

Don't Focus on Writing Ability; Focus on Technical Skills

<p>In the process of applying to graduate school, and then visiting schools that
admitted me, I was told that PhD students needed to possess solid writing
ability in addition to technical skills. One UT Austin professor told me he
believed liberal arts students (like me) were better prepared than those from
large research universities, presumably because of our increased exposure to
writing courses. One Cornell professor emphasized the importance of writing by
telling me that he spent at least 50 percent of his professional life writing.
A Berkeley professor who I frequently collaborate with has a private Google Doc
that he gives to students with instructions on writing papers, particularly
about how to structure an introduction, what phrases to use, and so on.</p>
<p>The ability to write well is an important skill for academics, and I don’t mean
to dismiss this outright. However, I think that we need to be very clear that
technical skills matter <em>far, far more</em> for the typical graduate student, at
least for computer science students focusing in artificial intelligence like me.
I would additionally argue that factors such as research advisors and graduate
student collaborators matter more than writing ability.</p>
<p>Perhaps the emphasis on writing skills is aimed at two groups of people:
international students, and the very best graduate students for whom technical
skills are relatively less of a research bottleneck. I won’t comment too much on
the former group, besides saying that I absolutely respect their commitment to
learning the English language and that I know I’m incredibly lucky to be a
native English user.</p>
<p>I bring up the second group because much of the advice I get is from faculty at
top institutions who were stellar graduate students. Perhaps most of their
academic life is dominated by the time it takes to convert research
contributions to a paper, instead of the time it takes to <em>actually come up with
the contribution itself</em>. For instance, this is what UT Austin professor Scott
Aaronson had to say in an <a href="https://www.scottaaronson.com/blog/?p=478">old 2005 (!!) blog post</a>, back when he was a
postdoc (emphasis mine):</p>
<blockquote>
<p>I’ll estimate that <strong>I spend at least two months on writing for every week on
research.</strong> I write, and rewrite, and rewrite. Then I compress to 10 pages for
the STOC/FOCS/CCC abstract. Then I revise again for the camera-ready version.
Then I decompress the paper for the journal version. Then I improve the
results, and end up rewriting the entire paper to incorporate the improvements
(which takes much more time than it would to just write up the improved
results from scratch). Then, after several years, I get back the referee
reports, which (for sound and justifiable reasons, of course) tell me to
change all my notation, and redo the proofs of Theorems 6 through 12, and
identify exactly which result I’m invoking from [GGLZ94], and make everything
more detailed and rigorous. But by this point I’ve forgotten the results and
have to re-learn them. And all this for a paper that maybe five people will
ever read.</p>
</blockquote>
<p>Two months of writing for every week of research? I have no idea how that is
humanly possible.</p>
<p>For me, the <em>reverse</em> holds: I probably spend two months of <em>research</em> for every
week of actual writing. What dominates my academic life is the time it takes (a)
to process the details from academic papers so that I understand how their ideas
work, and (b) to build upon those results with my own novel contribution.
Getting intuition on novel artificial intelligence concepts takes a considerable
amount of mathematical thinking, and getting them to work in practice requires
programming skills. Both math and programming fall under the realm of “technical
skills.”</p>
<p>Obviously, once I HAVE a research contribution, <em>then</em> I have to “worry” about
writing it, but I enjoy writing so it is no big deal.</p>
<p>But again, <em>the research contribution itself must first exist</em>. That’s what
frustrates me about much of the academic advice that I see. Yes, it’s easier to
tell someone how to write (use this phrase, don’t use this phrase, active
instead of passive, blah blah blah), but it would be better to explain the
thought process on how to come up with an original research contribution.</p>
<p>I conclude:</p>
<blockquote>
<p>I would happily trade away some of my writing ability for a commensurate
increase in technical skill.</p>
</blockquote>
<p>Again, I am not disregarding writing ability, since it is incredibly valuable for
many reasons (such as for blogging!!) and more broadly applicable in life than
technical skills. However, I believe that the biggest priority for computer science
doctoral students should be to focus on technical skills.</p>
Sat, 04 Nov 2017 03:00:00 -0700
https://danieltakeshi.github.io/2017/11/04/dont-focus-on-writing-ability-focus-on-technical-skills/

Random Thoughts on Recent News

<h2 id="nobel-peace-prize">Nobel Peace Prize</h2>
<p>The winner of the 2017 Nobel Peace Prize is the International Campaign to
Abolish Nuclear Weapons (ICAN). <a href="https://www.nobelprize.org/nobel_prizes/peace/laureates/2017/press.html">The award statement is</a>:</p>
<blockquote>
<p>The organization is receiving the award for its work to draw attention to the
catastrophic humanitarian consequences of any use of nuclear weapons and for
its ground-breaking efforts to achieve a treaty-based prohibition of such
weapons.</p>
</blockquote>
<p>The first part of this statement — that nuclear weapons have “catastrophic
humanitarian consequences” — is obvious and should be universally agreed upon.</p>
<p>Unfortunately, while the second part about creating a treaty-based prohibition
sounds great in theory, it would not work in practice. So long as rogue states
such as North Korea continue to develop their own nuclear programs, the United
States needs to maintain its own stockpile of such weapons. The Wall Street
Journal editorial board got it correct when they described the award as the
<a href="https://www.wsj.com/articles/the-nobel-alternate-reality-prize-1507331044">Nobel Alternate Reality prize</a>, and I honestly think they weren’t harsh
enough on the Nobel Committee.</p>
<p>This award reminded me about when President Obama was once considering adopting
a “No Nuclear First” policy. While I generally supported President Obama, this
would have been a mistake. Fortunately, <a href="https://www.nytimes.com/2016/09/06/science/obama-unlikely-to-vow-no-first-use-of-nuclear-weapons.html">his own administration shot down the
idea</a>. Now the next step is to convince politicians such as <a href="https://www.feinstein.senate.gov/public/index.cfm/press-releases?ID=B9F9380E-A2DA-4818-BFBF-88A97ED9470F">Senator Dianne
Feinstein</a> to steer away from these ideas.</p>
<h2 id="harvey-weinstein">Harvey Weinstein</h2>
<p>As most of us know, <a href="https://www.nytimes.com/2017/10/05/us/harvey-weinstein-harassment-allegations.html">Harvey Weinstein systematically assaulted women</a>
throughout much of his famed career, and is rightfully being scorned and
disgraced. Good riddance.</p>
<p>My first reaction was, <em>wow</em>. How did Weinstein get away with his behavior over
all these years? I certainly hope that other men who have done these things face
similar consequences and avoid getting away scot-free by paying their way out.</p>
<p>By the way, the fact that Weinstein was a prominent supporter of Democratic
causes is, in my opinion, irrelevant. There are plenty of bad men in all
political parties. I don’t need to name them — we know who they are. Let’s
condemn them and, if they’ve donated money to politicians, encourage those
politicians to redirect that money to organizations that combat sexual assault.</p>
<p>Now the uncomfortable question I face is: <em>are there Harvey Weinsteins in
(academic) computer science?</em> I hope not. From my experience, I have never
noticed any man (or woman, for that matter) exhibit the kind of behavior
Weinstein did. At least that counts for something.</p>
<p>Lastly, I certainly hope <em>I</em> have never been like Weinstein. One of the things
I’ve learned from reading Weinstein-like stories is to <em>avoid touching people</em>.
I am not much of a “touching” person. One handshake when I meet new people is
enough for me! Sure, if other men or women want to hug me, fine, go ahead. I
just won’t <em>initiate</em> the hugging, sorry. I am constantly worried that I’ll
commit a micro-aggression.</p>
<h2 id="the-nba-and-nfl">The NBA and NFL</h2>
<p>NBA rookie Lonzo Ball has a huge target on him due to his outspoken dad, LaVar
Ball. LaVar has made a few comments that remind me of Weinstein, particularly with
<a href="https://www.usatoday.com/story/sports/2017/07/31/lavar-ball-female-referee-sparks-referee-group-break-adidas/524666001/">his criticism of female basketball referees</a>. To be clear, LaVar Ball
doesn’t seem <em>remotely</em> as bad as Weinstein, but at least I see the
<em>resemblance</em>, if you get what I mean. And I am, quite frankly, annoyed at all
the attention he gets.</p>
<p>The good news is that at least he’s <em>there</em> as a Dad and seems extremely
supportive of his children. I got a poignant reminder about this from <a href="http://theundefeated.com/features/nba-wizards-coach-scott-brooks-wishes-he-had-had-a-father-like-lavar-ball/">reading
this article from The Undefeated</a> about how Scott Brooks wished he had a
father. I can certainly see how someone like him would view the situation
differently.</p>
<p>In the NFL, Houston Texans owner Robert McNair made an <a href="https://www.si.com/nfl/2017/10/28/bob-mcnair-inmates-comment-houston-texans-players">ill-advised comment
about calling NFL players “inmates”</a>. Ouch. That was a mistake, and I’m happy
he showed remorse and seems to regret his actions. If NFL players protest over
these comments, well I can’t blame them. If they want to kneel for the flag, for
instance, that’s their First Amendment right to do so and I will support them.</p>
<p>By the same token, the NFL players have America’s attention. Great. Now the next
and exponentially harder step is to figure out what to do with that attention.
Kneeling for the flag won’t work forever, but I don’t know how else the players
should proceed so that their messages and goals have a high probability of
becoming reality.</p>
<h2 id="quantum-computing">Quantum Computing</h2>
<p>There was a recent article <a href="https://www.wsj.com/articles/the-computer-that-could-rule-the-world-1509143922">in the Wall Street Journal about the race to create
a quantum computer</a>. Even though I have very little intuition on how quantum
computers work, the author makes a reasonably compelling case for this to be our
next “Manhattan Project”. I would, however, like to raise two points.</p>
<p>First, I reserve the right to dismiss this race as ill-informed if <a href="https://www.scottaaronson.com/blog/">Scott
Aaronson</a> says so on his blog. (<em>Professor Aaronson, I am waiting …</em>)</p>
<p>Second, why not consider another project that is <em>also</em> crucial for the United
States (and the planet, for that matter)? That would be <strong>Energy Independence</strong>,
or the <strong>The Green Revolution</strong>, as proposed by authors such as Thomas L.
Friedman. Rather than rely on Middle Eastern countries (and indirectly, radical
Islam) for our oil, we can instead develop our own. Or even better, we can focus
on renewable energy.</p>
<p>I am under no illusions that this will be easy to achieve, both for political
reasons and because people don’t like to be pressured to do things that lower
their standard of living. For example, I still use my car regularly even though
I know that public transportation would be better for the environment.</p>
<p>The good news is that with more people living in cities, it will be easier for
the country to use less energy, and that’s why I still have hope. For additional
details, I recommend reading <a href="http://www.sierraclub.org/planet/2017/04/climate-hope">Climate of Hope</a>, which was published just a
few months ago.</p>
<h2 id="dynamic-routing-through-capsules">Dynamic Routing Through Capsules</h2>
<p>OK, this isn’t really news that will capture the minds of the general public,
but it’s certainly taking the AI-world by storm. Researchers Sara Sabour,
Nicholas Frosst, and Geoffrey Hinton finally posted a <a href="https://arxiv.org/abs/1710.09829">long-awaited preprint,
<em>Dynamic Routing Through Capsules</em></a>. There is substantial interest in this
paper because Hinton has long been brainstorming alternatives to backpropagation
for training neural networks. He has also been thinking about <a href="https://arxiv.org/abs/1701.06538">dramatically
different neural network designs</a> rather than making incremental changes to
fully connected or convolutional nets. This is despite the fact that he, perhaps more than
anyone else, has been responsible for demonstrating how powerful those networks can be,
revolutionizing the field of AI and, indeed, the world.</p>
<p>I wish I had the time to read the paper and write a detailed read-through blog
post, but alas, it came on arXiv the night before I had an interview.</p>
<p>At this point, I think any top-tier conference paper with Hinton’s name attached
to it is worth reading. I’m <a href="https://danieltakeshi.github.io/2017/04/06/sir-tim-berners-lee-wins-the-turing-award/">already on-record as saying that</a> I predict
Geoffrey Hinton to win the next Turing Award, so I expect that his papers will
have extremely high research value.</p>
<h2 id="surgical-and-home-robotics">Surgical and Home Robotics</h2>
<p>Lastly but certainly not least, consider checking out some recent research from
the <a href="http://autolab.berkeley.edu/">Berkeley AUTOLAB</a>, of which I’m a member. The Berkeley AI Research blog
recently featured back-to-back posts about imitation learning algorithms with
applications to <a href="http://bair.berkeley.edu/blog/2017/10/17/lfd-surgical-robots/">surgical robotics</a> and <a href="http://bair.berkeley.edu/blog/2017/10/26/dart/">home robotics</a>, respectively.
Recall that I <a href="https://danieltakeshi.github.io/2017/06/20/the-bair-blog-is-now-live/">serve on the BAIR Blog editorial board</a>, and in particular, I
was in charge of formatting and then pushing these posts live. I hope these two
posts are informative to the lay reader interested in AI.</p>
Sun, 29 Oct 2017 03:00:00 -0700
https://danieltakeshi.github.io/2017/10/29/random-thoughts-on-recent-news/
https://danieltakeshi.github.io/2017/10/29/random-thoughts-on-recent-news/Learning to Act by Predicting the Future<p>I first heard about the paper <em>Learning to Act by Predicting the Future</em> after
one of the authors, Vladlen Koltun, came to give a highly entertaining talk as
part of <a href="https://berkeley-deep-learning.github.io/cs294-131-f17/">Berkeley’s Deep Learning seminar course (CS 294-131)</a>.</p>
<p>In retrospect, I’m embarrassed it took me this long to find out about the work.
It’s research that feels highly insightful and should have been clear to us all
along — yet somehow we never saw it until those authors presented it to us.
To me, that’s an indicator of high-caliber research.</p>
<p>Others have agreed. <em>Learning to Act by Predicting the Future</em> was accepted as
an oral presentation at ICLR 2017, meaning that it was one of the top 15 or so
papers. You can check out the <a href="https://openreview.net/forum?id=rJLS7qKel">favorable reviews on OpenReview</a>. It was
<a href="https://blog.acolyer.org/2017/05/12/learning-to-act-by-predicting-the-future/">also featured on Adrian Colyer’s blog</a>. And of course, it was featured in my
Deep Learning class.</p>
<p>So what is the research contribution of the paper? Here’s a key passage in the
introduction which explains their framework:</p>
<blockquote>
<p>Our approach departs from the reward-based formalization commonly used in RL.
Instead of a monolithic state and a scalar reward, we consider a stream of
sensory input <script type="math/tex">\{s_t\}</script> and a stream of measurements <script type="math/tex">\{m_t\}</script>. The
sensory stream is typically high-dimensional and may include the raw visual,
auditory, and tactile input. The measurement stream has lower dimensionality
and constitutes a set of data that pertain to the agent’s current state.</p>
</blockquote>
<p>To be clear, at each time <script type="math/tex">t</script>, we get one sensory input and one set of
(scalar-valued) measurements, so our observation is <script type="math/tex">o_t = \langle s_t, m_t
\rangle</script>. Their running test platform in the paper is the first-person shooter
Doom environment, so <script type="math/tex">s_t</script> represents <em>images</em> and <script type="math/tex">m_t</script> represents
<em>attributes</em> in the game such as health and supply levels.</p>
<p>This is an intuitive difference between <script type="math/tex">s_t</script> and <script type="math/tex">m_t</script>. There are, however,
two important <em>algorithmic</em> differences:</p>
<ul>
<li>
<p>Given actions taken by the agent, they attempt to <em>predict</em> <script type="math/tex">m_t</script>, hence
“predicting the future”. It’s very hard to predict full-blown images, but
predicting (much-smaller) measurements shouldn’t be nearly as challenging.</p>
</li>
<li>
<p>The measurement vector <script type="math/tex">m_t</script> is used to shape the agent’s <em>goals</em>. They
assume the agent wants to maximize</p>
<script type="math/tex; mode=display">u(f;g) = g^\top f</script>
<p>where</p>
<script type="math/tex; mode=display">f = \langle m_{t+\tau_1}-m_t, \ldots, m_{t+\tau_n}-m_t\rangle</script>
<p>Thus, the goal is to maximize this inner product of the <em>future</em> measurements
and a parameter vector <script type="math/tex">g</script> weighing the relative importance of each term.
Note that this instantly generalizes the case with a scalar reward signal in
MDPs: we’d set the elements of <script type="math/tex">g</script> such that they are <script type="math/tex">\gamma^0, \gamma^1,
\gamma^2, \ldots</script>, i.e. corresponding to discounted rewards. (I’m assuming
that <script type="math/tex">m_t</script> is a scalar here, but this generalizes to the vector case with
<script type="math/tex">f</script> a matrix, as we could flatten <script type="math/tex">f</script> and <script type="math/tex">g</script>.)</p>
</li>
</ul>
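<p>To see why this generalizes discounted scalar rewards, here is a tiny numeric sketch (the measurement differences and discount are made-up values for illustration, not from the paper):</p>

```python
import numpy as np

# Hypothetical future measurement differences f_i = m_{t+tau_i} - m_t,
# treating the measurement as a scalar for simplicity.
f = np.array([1.0, 2.0, 4.0, 8.0])

# Setting g_i = gamma^i recovers a discounted-return-style objective.
gamma = 0.9
g = gamma ** np.arange(len(f))

# The objective the agent maximizes: u(f; g) = g^T f.
u = g @ f
print(u)  # 1.0*1 + 0.9*2 + 0.81*4 + 0.729*8 = 11.872
```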
<p>In order to predict <script type="math/tex">m_t</script>, they have to train a function to do so, which they
parameterize with (you guessed it) deep neural networks. They define the
function <script type="math/tex">F</script> as the predictor, with</p>
<script type="math/tex; mode=display">p_t^a = F(o_t, a_t, g ; \theta)</script>
<p>Thus, given the observation, action, and goal vector parameter, we can <em>predict</em>
the resulting measurements, so that during test-time applications, <script type="math/tex">p_t^a</script> is
“plugged in” for <script type="math/tex">f</script> and the action which maximizes <script type="math/tex">u</script> is chosen. To make
this work mathematically, of course, <script type="math/tex">f,g,</script> and <script type="math/tex">p_t^a</script> must all have the
same dimension. And to be clear, even though the reward (as they define it) is a
function of <script type="math/tex">g</script>, we are not “training” the <script type="math/tex">g</script> parameters but the parameters
for <script type="math/tex">F</script>.</p>
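<p>Here is a sketch of that test-time decision rule, with a toy stand-in for the learned <script type="math/tex">F</script> (the actions and predicted values are made up, not the paper's):</p>

```python
import numpy as np

def F(observation, action, g):
    """Toy stand-in for the learned predictor of future measurement
    changes. Returns a fixed, made-up prediction p_t^a for each action."""
    predictions = {
        "left":  np.array([0.1, 0.5]),
        "right": np.array([0.4, 0.2]),
    }
    return predictions[action]

g = np.array([0.5, 1.0])  # goal vector weighing each predicted component
obs = None                # placeholder for the observation <s_t, m_t>

# Plug each action's prediction in for f, then pick the action maximizing u = g^T p.
best_action = max(["left", "right"], key=lambda a: g @ F(obs, a, g))
print(best_action)  # "left": 0.5*0.1 + 1.0*0.5 = 0.55 beats "right": 0.4
```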
<p>The parameters of <script type="math/tex">F</script>, incidentally, are trained in an <em>unsupervised</em> manner,
or using “self-supervision” since the labels can be generated automatically by
having the agent wander around in the world and then repeatedly computing the
value of the function output at each of those time steps. Then, after some time
has passed, we simply minimize the <script type="math/tex">L_2</script> loss. Nice, no humans needed for
labeling! When I was reading this, I was reminded of the Q-learning update,
since the update rule automatically assumes that the “target” is the usual
“reward plus discounted max Q-value” thingy, without human intervention. To
further the connection with Q-learning, they use an <em>experience memory</em> in the
same way as the DQN algorithm used experience <em>replay</em> (<a href="https://danieltakeshi.github.io/2016/12/01/going-deeper-into-reinforcement-learning-understanding-dqn/">see my earlier blog
post about DQN</a>). Another concept that came to mind was Sergey Levine’s
excellent paper on <a href="https://sites.google.com/site/brainrobotdata/home">learning hand-eye coordination</a>, where he and his
collaborators were able to automatically generate labels. I need to figure out
how to do stuff like this more often.</p>
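<p>This is how I picture the self-supervised target construction (the measurement values and offsets here are illustrative; the paper uses offsets up to 32 steps):</p>

```python
import numpy as np

# A logged trajectory of a scalar measurement (e.g., health); made-up values.
m = np.array([10.0, 9.0, 9.5, 8.0, 12.0, 11.0, 13.0])
offsets = [1, 2, 4]  # temporal offsets tau_i

t = 1  # any time step whose offsets all land inside the trajectory
# The regression target comes for free from the logged data: future changes
# relative to m_t. No human labeling required.
target = np.array([m[t + tau] - m[t] for tau in offsets])
print(target)  # [ 0.5 -1.   2. ]

# Given the network's prediction p (also made up here), minimize the L2 loss.
p = np.array([0.3, -0.5, 1.5])
l2_loss = np.sum((p - target) ** 2)
print(l2_loss)  # 0.04 + 0.25 + 0.25 = 0.54
```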
<p>Anyway, given that <script type="math/tex">F</script> takes in three inputs, one would intuitively expect
that it has three separate input networks and concatenates them at some point.
Indeed, that’s what they do in their network, shown below.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/learning_to_act_predict_future.png" alt="network_architecture" />
</p>
<p>After concatenation, the network follows the paradigm of the Dueling DQN
architecture by having separate expectation and value (“action”) streams. It
might not be clear why this is useful, so if you’re puzzled, I recommend reading
the Dueling DQN paper for justification (I need to re-read that as well).</p>
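<p>A rough numpy sketch of that wiring: three separate input branches, concatenation, and then a dueling-style split into an expectation stream and a per-action stream. The dimensions and the random linear "encoders" are placeholders, not the paper's actual architecture:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, out_dim):
    """Stand-in for a learned sub-network: just a fixed random linear map."""
    W = rng.standard_normal((out_dim, x.size))
    return W @ x

# Three separate input branches (dimensions are illustrative).
s_feat = encoder(rng.standard_normal(16), 8)  # sensory (image) branch
m_feat = encoder(rng.standard_normal(3), 8)   # measurement branch
g_feat = encoder(rng.standard_normal(3), 8)   # goal-vector branch

joint = np.concatenate([s_feat, m_feat, g_feat])  # fuse the three streams

n_actions, pred_dim = 4, 3
# Dueling-style split: an action-independent expectation stream plus a
# per-action stream that is normalized to zero mean across actions.
expectation = encoder(joint, pred_dim)
action = encoder(joint, n_actions * pred_dim).reshape(n_actions, pred_dim)
action -= action.mean(axis=0, keepdims=True)

predictions = expectation + action  # one prediction p_t^a per action
print(predictions.shape)  # (4, 3)
```

<p>Normalizing the action stream to zero mean forces the expectation stream to carry everything that is common across actions, which is the same decomposition trick the Dueling DQN architecture uses for Q-values.</p>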
<p>They benchmark their paradigm (called DFP for “Direct Future Prediction”) on
Doom with four scenarios of increasing difficulty. The baselines are the
well-known DQN and A3C algorithms, along with a relatively obscure “DSR”
algorithm (but which used the Doom platform, facilitating comparisons). I’m not
sure why they used DQN instead of, say, Double DQN or prioritized variants since
those are assumed to be strictly better, but at least they test using A3C which
as far as I can tell is on par with the best DQN variants. It’s too bad that
<a href="https://github.com/openai/baselines">OpenAI baselines</a> wasn’t around in time for the authors to use it for this
paper.</p>
<p>They say that</p>
<blockquote>
<p>We set the temporal offsets <script type="math/tex">\tau_1, \ldots, \tau_n</script> of predicted future
measurements to 1, 2, 4, 8, 16, and 32 steps in all experiments. Only the
latest three time steps contribute to the objective function, with
coefficients <script type="math/tex">(0.5, 0.5, 1)</script>.</p>
</blockquote>
<p>I <em>think</em> they say this to mean that their goal vector <script type="math/tex">g</script> contains only three
nonzero components, corresponding to <script type="math/tex">\tau_{n-2}, \tau_{n-1}</script> and <script type="math/tex">\tau_n</script>.
But then I’m confused: why do they need to have all the other <script type="math/tex">\tau_i</script> for
<script type="math/tex">1 \le i \le n-3</script>? What’s also confusing is that for the two complicated
environments with ammo, health, and frags, their training is set to maximize a
linear combination of those three, with coefficients <script type="math/tex">(0.5, 0.5, 1.0)</script>. The
same vector is repeated here!</p>
<p>I wish they had expanded upon their discussion since this is new stuff from
their paper. Why did they choose this and that value? What is the intuition? I
know it’s easy to ask this from a reading/reviewing perspective, but that’s only
because the concept is new; for example, they do not need to justify why they
chose the dueling-style architecture because they can refer to the Dueling DQN
paper.</p>
<p>Regarding experiments, I don’t have much intuition on the vizDoom environments,
as I have never used those, but their results look impressive on the two harder
scenarios, which also provide more measurements (three instead of one). Their
method out-performs sophisticated baselines in various settings, including
those from the Visual Doom AI competition in September 2016.</p>
<p>At the end of the experimental section, after a few ablation studies (heh, my
favorite!) they convincingly claim that</p>
<blockquote>
<p>This supports the intuition that a dense flow of multivariate measurements is
a better training signal than a scalar reward.</p>
</blockquote>
<p>In my words: <em>dense signals are better than sparse signals.</em> In some cases,
sparsity is desirable (e.g. in attention models, we <em>want</em> sparsity to focus on
a few important components) but for <em>rewards</em> in reinforcement learning, we
definitely need dense signals. Note that getting such signals wouldn’t be
possible if the authors kept clinging to the usual MDP formulation. Indeed,
Koltun made it a point in his talk to emphasize how he disagreed with the
constraints imposed on us by the MDP formulation, with the usual “set of states,
actions, rewards […]”. This is one of the things I wish I was better at:
identifying certain gaps in assumptions that everyone makes, and trying to
figure out where we can improve them.</p>
<p>That’s all I have to say for this paper now. For more details, I would check
<a href="http://vladlen.info/publications/learning-act-predicting-future/">the paper website</a>. Have fun!</p>
Tue, 10 Oct 2017 03:00:00 -0700
https://danieltakeshi.github.io/2017/10/10/learning-to-act-by-predicting-the-future/
https://danieltakeshi.github.io/2017/10/10/learning-to-act-by-predicting-the-future/Thoughts on Dale Carnegie's "How to Win Friends and Influence People"<p>Last night, I finished reading Dale Carnegie’s book <em>How to Win Friends and
Influence People: The Only Book You Need to Lead You to Success</em>. This is the
31st book I’ve read in 2017, and hopefully I will exceed the <a href="https://danieltakeshi.github.io/2016/12/31/all-the-books-i-read-in-2016-plus-my-thoughts-long">38 books I read in
2016</a>.</p>
<p>Carnegie’s book is well-known. It was originally published in 1936 (!!) during
the Great Depression, but as the back cover argues, it is “equally valuable
during booming economies or hard times.” I read the 1981 edition, which updated
some of the original material to make it more applicable to the modern era. Even
though it means the book loses its 1936 perspective, it’s probably a good idea
to keep it updated to avoid confusing the reader, and Carnegie — who passed
away in 1955 — would have wanted it. You can read more about the <a href="https://en.wikipedia.org/wiki/How_to_Win_Friends_and_Influence_People">book’s
history on its Wikipedia page</a>.</p>
<p>So, is the book over-hyped, or is it actually insightful and useful? I think the
answer is yes to both, but we’ll see what happens in the coming years as I make
a deliberate effort to apply his advice. The benefit of self-help books
clearly depends on how well the reader can apply it!</p>
<p>I don’t like books that bombard the reader with hackneyed, too-good-to-be-true
advertisements. Carnegie’s book certainly suffers from this, starting from the
terrible subtitle (seriously, “The Only Book”??). Now, to be fair, I don’t know
if he wrote that subtitle or if it was added by someone later, and if it was
1936, it would have definitely been more original. Certainly in the year 2017,
there is no shortage of lousy self-help books.</p>
<p>The good news is that once you get beyond the hyped-up advertising, the actual
advice in the book is sound. My summary of it: advice that is <em>obvious</em>, but
that <em>we sometimes (often??) forget to follow</em>.</p>
<p>Indeed, Carnegie admits that</p>
<blockquote>
<p>I wrote the book, and yet frequently I find it difficult to apply everything I
advocated.</p>
</blockquote>
<p>This text appears at the beginning of the book, in a section titled “Nine Suggestions to Get the
Most Out of This Book”. I am certainly going to be following those suggestions.</p>
<p>The advice he has is split into four rough groups:</p>
<ul>
<li>Fundamental Techniques in Handling People</li>
<li>Six Ways to Make People Like You</li>
<li>How to Win People to Your Way of Thinking</li>
<li>Be a Leader: How to Change People Without Giving Offense or Arousing
Resentment</li>
</ul>
<p>Each group is split into several short chapters, ending in a quick one-phrase
summary of the advice. Examples include “Give honest and sincere
appreciation” (first group), “smile” (second group), “If you are wrong, admit it
quickly and emphatically” (third group), and “Talk about your own mistakes
before criticizing the other person” (fourth group). Chapters contain anecdotes
of people with various backgrounds. Former U.S. Presidents Washington, Lincoln,
and both Roosevelts are featured, but there are also many examples from people
leading less glamorous lives. The examples in the book seem reasonable, and I
enjoyed reading about them, but I do want to point out the caveat that some of
these stories seem way too good to be true.</p>
<p>One class of anecdotes that fits this description: when people are able to get
others to do what they want <em>without</em> actually bringing it up! For example,
suppose you run a business and want to get a stubborn customer to buy your
products. You can ask directly and he or she will probably refuse, or you can
praise the person, show appreciation, etc., and somehow magically that person
will want to buy your stuff?!? Several anecdotes in the book are variants of
this concept. I took notes (with a pencil) to highlight and comment in the book
as I was reading it, and I frequently wrote “I’m skeptical”. Fortunately, many
of the anecdotes are more realistic, and the advice itself is, as I mentioned
before, accurate and helpful.</p>
<p>I have always wondered what it must be like to have a “normal” social life. I
look at groups of friends going out to meals, parties, and so forth, and I
repeatedly wonder:</p>
<ul>
<li><em>How did they first get together?</em></li>
<li><em>What is their secret to liking each other??</em></li>
<li><em>Do I have any ounce of hope of breaking into their social circle???</em></li>
</ul>
<p>Consequently, what I most want to get out of the book is based on the second
group, <em>how to make people like me</em>.</p>
<p>Unfortunately, I suffer from the social handicap of being deaf. While talking
with one person usually isn’t a problem, I can’t follow conversations with noisy
backgrounds and/or with many people. Heck, handling a conversation with <em>two</em>
other people is often a challenge, and whenever this happens, I constantly fear
that my two other “conversationalists” will talk amongst themselves and leave me out.
And how on earth do I possibly network in noise-heavy academic conferences or
workshops??? Gaaah.</p>
<p>Fortunately, what I find inspiring about Carnegie’s advice is that it is generic
and highly applicable to the vast majority of people, regardless of
socioeconomic status, disability condition, racial or ethnic background, and so
forth. Obviously, the benefit of applying this advice will vary depending on
people’s backgrounds, but for the vast majority of people, <em>there should be some
positive, non-zero benefit</em>. That is what really counts.</p>
<p>I will keep <em>How to Win Friends and Influence People</em> on my desk as a constant
reminder for me to keep applying these principles. Hopefully a year from now, I
can look back and see if I have developed into a better, more fulfilled man.</p>
Sun, 17 Sep 2017 06:00:00 -0700
https://danieltakeshi.github.io/2017/09/17/thoughts-on-how-to-win-friends-and-influence-people/
https://danieltakeshi.github.io/2017/09/17/thoughts-on-how-to-win-friends-and-influence-people/Please, Denounce Racism and White Supremacy Immediately<p>President Trump, you should have clearly and unequivocally denounced racism and
white supremacy immediately, <em>without</em> trying to pin the blame on “both sides”
or whatever other un-related group comes to mind. Your delayed statement does
not redeem yourself.</p>
<p>The failure to call out and condemn white supremacy is perhaps <em>the epitome</em> of
political correctness. We tragically saw one person, Heather Heyer, murdered
from the events in Charlottesville. In this case, political correctness really
is deadly.</p>
<p>The Ku Klux Klan, neo-Nazis, and other white nationalist groups do not belong in
our society. We need to always condemn them and aim to eradicate their presence
so that America can become a better place.</p>
<p>America has come a long way since the days of George Washington, Abraham
Lincoln, and Martin Luther King Jr., but we still have lots of progress to go
before we can truly claim that America provides an equal playing field for its
citizens.</p>
Tue, 15 Aug 2017 16:00:00 -0700
https://danieltakeshi.github.io/2017/08/15/please-denounce-racism-and-white-supremacy-immediately/
https://danieltakeshi.github.io/2017/08/15/please-denounce-racism-and-white-supremacy-immediately/Uncertainty in Artificial Intelligence (UAI) 2017, Day 5 of 5<h1 id="day-five">Day Five</h1>
<p>Today, August 15, was the last day of UAI 2017. We had <em>workshops</em>, which you
can think of as one-day conferences with fewer people. UAI 2017 offered three
workshops, and I attended the <a href="http://bmaw2017.azurewebsites.net/"><strong>Bayesian Modeling Applications Workshop</strong></a>.
It was a small workshop with only ten of us present at the 9:00am starting time,
though a few more would trickle in during the first hour.</p>
<p>Here were some of the highlights:</p>
<ul>
<li>
<p><a href="https://www.cs.ubc.ca/~poole/">David Poole</a> from the University of British Columbia gave the opening talk
on <em>Probabilistic Reasoning with Complex Heterogeneous Observations and
Applications in Geology and Medicine</em>. This one was largely about
<em>ontologies</em>. Unfortunately, in the interest of time, he had to skip a lot of
the content.</p>
</li>
<li>
<p>The other talks were more directly related to <em>Bayesian networks</em>, which I
studied a lot in undergrad and also for my AI prelim exams.</p>
</li>
<li>
<p>There was another talk about <a href="http://www.openmarkov.org/">OpenMarkov</a>. I got mostly distracted when the
speaker emphasized the advantage that the software was open source. Maybe
this is me coming from Deep Learning, but open source should be the
expectation, not the norm. (MuJoCo is the one exception for Deep Learning, but
hopefully that will soon no longer be the case.) I was reminded of Zack
Lipton’s <a href="http://www.kdnuggets.com/2015/12/tensor-flow-terrific-deep-learning-library.html">blog post on a sober perspective of Tensorflow</a> when he wrote
that “A number of other news outlets marveled that Google made the code open
source.”.</p>
</li>
</ul>
<p>I don’t have much else to say because I didn’t take detailed notes.</p>
<p>Upon the evening of August 15, the conference officially ended. Tomorrow, I’ll
board a 15-hour direct flight from Sydney to San Francisco, and life will be
back to normal.</p>
<h1 id="closing-thoughts">Closing Thoughts</h1>
<p>What are some of my thoughts now that UAI 2017 has concluded? Here is a rough
categorization of the pros:</p>
<ul>
<li>
<p>I enjoyed <a href="https://danieltakeshi.github.io/2017/08/15/uai-day-four-of-five/">giving a talk on my research</a>. And the paper won an award!</p>
</li>
<li>
<p>I identified a few interesting papers and concepts from tutorials which
I should investigate in more detail once I have time.</p>
</li>
<li>
<p>I met (a.k.a. “networked with”) a few students and faculty, and hopefully this
will help spread my name. I should email them later.</p>
</li>
<li>
<p>The venue and location were awesome. This place is probably the best in
Australia for tourism.</p>
</li>
</ul>
<p>Here are the cons:</p>
<ul>
<li>
<p>Captioning. Gah. <a href="https://danieltakeshi.github.io/2017/08/11/uai-2017-day-one-of-five/">As you know, it wasn’t set up on the first day</a>, and even
when the service was present, I still had a hard time following talks. The
lack of mobility of captioners is also a drawback. Even so, it was better than
nothing.</p>
</li>
<li>
<p>I don’t feel like I sufficiently networked. Yes, I networked a bit (as
mentioned recently) but probably to a lesser extent compared to other
students. How again do people normally network at conferences, particularly if
they’re unpopular and unknown like me? (The rock stars, of course, don’t need
to do anything, as people flock <em>to</em> them, not the other way around.)</p>
</li>
</ul>
<p>Despite these non-trivial drawbacks, I’m extremely happy that I
attended UAI 2017. I thank the conference organizers for arranging UAI and hope
that they enjoyed it at least as much as I did.</p>
<p>I should elaborate on the venue, location, and related stuff. The hotel had
excellent service, and the breakfast buffet was awesome. I had to resist eating
so quickly! A picture of an example breakfast of mine is shown below:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day5_breakfast.JPG" alt="sydney" />
<i>
My breakfast on August 15, the last full day of UAI 2017.
</i>
</p>
<p>The coffee was great, both at the hotel and in the conference. I’ve used coffee
machines that produced utter junk lattes and cappuccinos, but the ones at ICC
Sydney made great coffee.</p>
<p>Darling Harbor, of course, is great. Here are two final views of it:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day5_harbor1.JPG" alt="sydney" />
<i>
A view of the harbor.
</i>
</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day5_harbor2.JPG" alt="sydney" />
<i>
Another view of the harbor.
</i>
</p>
<p>Yeah. Someday, I’ll be back.</p>
Tue, 15 Aug 2017 13:00:00 -0700
https://danieltakeshi.github.io/2017/08/15/uai-day-five-of-five/
https://danieltakeshi.github.io/2017/08/15/uai-day-five-of-five/Uncertainty in Artificial Intelligence (UAI) 2017, Day 4 of 5<p>For the fourth day of UAI 2017 (August 14), I skipped my 4:30am workout to get
two more full practice runs of my talk. Thus, by the time I entered the
conference venue, I was feeling confident.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day4_keynote.JPG" alt="sydney" />
<i>
Professor Terry Speed gives the keynote talk of the day, and the fifth one
overall for the conference.
</i>
</p>
<p>The day started off with our last keynote for the conference, presented by
Professor <a href="https://www.wehi.edu.au/people/terry-speed">Terry Speed</a> who currently works at the Walter and Eliza Hall
Institute of Medical Research in Australia. Interestingly enough, he used to be
a Berkeley statistics professor until his retirement in 2009.</p>
<p>His talk reminded me of Professor Heller’s talk from the other day. There’s <em>a
lot</em> of work being done at the intersection of medicine and artificial
intelligence. My impression is that Professor Speed’s work is more about RNA and
DNA sequencing, while Professor Heller’s is about modeling diseases and health
conditions. There might be overlap with the techniques they use, since both
briefly mentioned (recurrent) neural networks. Yeah, they’ve fallen for the
hype.</p>
<p>While Professor Speed is funny and skilled at presenting, I had a hard time
processing the technical content, because I kept mentally rehearsing for my
talk. This also happened during the first oral session (on causality) which
preceded the one that contained my talk. Does anyone else find it hard to focus
on talks that precede theirs?</p>
<p>Finally, at around noon, after several Disney Research people taught us about
stochastic gradient descent for imbalanced data, it was my turn.</p>
<p>Whew … I walked up and got my laptop and clicker set up, though before I could
begin, the conference chair gave a few remarks about my paper and then presented
me with the <strong>Honorable Mention for Best Student Paper Award</strong>. After some
applause, I also pointed out my coauthor <a href="https://people.eecs.berkeley.edu/~jfc/">John Canny</a> in the audience, and
got everyone to applaud for him as well. Then I began my talk.</p>
<p>I’m pleased to report that my talk went as well as I could have hoped for, with
one exception that I’ll bring up later.</p>
<p>Here’s the list of rather sloppy reminders that I made for myself and which I
reviewed beforehand:</p>
<ul>
<li>Don’t be flat-footed, don’t stand like a robot.</li>
<li>Don’t swing side to side!!</li>
<li>Must stay vigilant and alert!</li>
<li>Must not have a flat voice. Try to vary it. Lots of deliberate pauses. With
smiles!</li>
<li>Talk LOUD, since I will likely be moving away from the microphone.</li>
<li>Don’t put my hand in my pockets!</li>
<li>Thank them at the beginning for the award, and at the end thank the audience
for their attention.</li>
</ul>
<p>I also wrote talk-specific reminders to include phrases such as “I hope you
remember this” when reaching this slide, and so forth.</p>
<p>One thing that’s perhaps unique about me is my stance on giving talks. I touched
on this briefly when discussing my class review for <a href="https://danieltakeshi.github.io/2016/12/20/review-of-algorithmic-human-robot-interaction-cs-294-115-at-berkeley/">Algorithmic Human-Robot
Interaction</a>, but I’ll expand the discussion here with some <strong>bold text</strong> to
catch your attention.</p>
<p><strong>I view talking in front of a large audience as an absolute privilege that I
CANNOT waste.</strong> Thus, my talk must be polished, but in addition, <strong>I must make
it MEMORABLE and keep the viewers as ALERT as possible.</strong> This means I need to
be <strong>loud</strong>, <strong>funny</strong>, and <strong>highly active</strong>. Even if this comes at the cost of
a slight reduction in the amount of technical material that appears on my
slides.</p>
<p>For the vast majority of conference talks, while some audience members pay rigid
attention, many will also be checking their phones and laptops. Realistically,
<strong>there’s no way to keep everyone’s attention for the entire talk</strong>, especially
in conferences when there are many talks back-to-back. Even with coffee, people
can’t absorb all this information. Thus, I think it’s best to simply <strong>get the
audience interested</strong> so that they can <strong>look up the material later</strong> in their
own time.</p>
<p>One absolute <em>sure-fire</em> way to lose the already-fragile attention span of
humans is to stand frozen behind a microphone and read off text-filled slides
(or a pre-made script) with a flat voice. Sorry, but when people do that, I want
to <strong>yell</strong> at them: <strong>What are you doing?!? You’re wasting such a great
opportunity to impress your audience!!</strong> The fact that the <em>majority</em> of
conference speakers — mostly students, but sometimes faculty are guilty of
this as well — still do this is simply mind-boggling to me. It’s completely
baffling.</p>
<p>I understand that non-native English speakers might have difficulty with knowing
what phrases to emphasize and so forth. But that doesn’t mean they can’t smile
and be active when presenting, and the people who are guilty of robotic speaking
are not always non-native English speakers.</p>
<p>Of course, there are certain times when it’s best not to follow my speaking
techniques. I would obviously not apply this style at a funeral. Academia,
however, is not entirely conservative in presentation style. Sure, you can be a
boring robot reading off a script, but you can <em>also</em> be active and constantly
engage with the audience, and no one’s going to stop you.</p>
<p>Whew. Anyway, sorry for that mini-rant but this felt like something important I
should bring up. You can expect that whenever I give a polished academic talk at
a conference, I am not going to be a boring or typical speaker.</p>
<p>For my talk, I did not stand behind the lectern with the microphone; I stood in
front of it like Terry Speed did (see the picture above).</p>
<p>I also deliberately did not walk too fast when talking. The key is to walk a
little bit, stand still, point the laser pointer at the slides, make a joke or
two, <em>make eye contact with the audience</em>, and then slowly walk to the other
side of the room.</p>
<p>I think the talk was great. I followed my advice and made some comments to get
the crowd to laugh. One of them, for instance, was “please remember this figure
for the rest of your life.”</p>
<p>OK, now what was that one “exception” I referred to earlier? It happened during
the question-answer session. Professor <a href="https://danieltakeshi.github.io/2016/12/20/review-of-algorithmic-human-robot-interaction-cs-294-115-at-berkeley/">John Duchi</a> asked if I could prove
that the method “converges to the correct posterior distribution” or something
like that. I must have laid an egg because I don’t think my answer satisfied him
(though to be fair, I thought his question was too vague).</p>
<p>Then John Duchi and coauthor John Canny (who were sitting next to each other)
started discussing amongst themselves, as humorously pointed out by conference
chair Kristian Kersting. Incidentally, Kristian was standing next to me during
this Q&A to repeat questions from the audience, since I can’t hear/understand
them. He had to relay John Duchi’s question to me even though John was
<em>literally</em> five meters away from me.</p>
<p>After my talk concluded, the other conference chair, Gal Elidan, came to me and
shook my hand (I didn’t see him do that to anyone else). Throughout the rest of
the day, no less than six people came to me and said they liked my talk.</p>
<p>I certainly felt relieved after presenting. It was also our lunch break. I
wasn’t sure what to do, but fortunately, John Canny came to my rescue and said
that I should join him plus a few others for lunch. It turns out those “others”
were: Gal Elidan, Kristian Kersting, Terry Speed, and John Duchi. Gulp. I would
of course never have the courage to ask to join them for lunch myself, given
that just about the only thing I’m better at than those guys is blogging.</p>
<p>John Duchi made the choice to eat at a small lunch/bar place called <em>Social</em>. I
ate a pork burger and mostly watched the conversation, since I was unable to get
involved.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day4_lunch.JPG" alt="sydney" />
<i>
We finish lunch at Social. From left to right, we have Professors Terry Speed,
Kristian Kersting (behind Terry Speed), Gal Elidan (behind John Duchi), John
Duchi, and John Canny.
</i>
</p>
<p>After that, we had another oral session and then a poster session.</p>
Tue, 15 Aug 2017 03:00:00 -0700
https://danieltakeshi.github.io/2017/08/15/uai-day-four-of-five/
https://danieltakeshi.github.io/2017/08/15/uai-day-four-of-five/Uncertainty in Artificial Intelligence (UAI) 2017, Day 3 of 5<p>The third day of UAI 2017 (August 13) started off with Stanford Professor
<a href="https://cs.stanford.edu/people/chrismre/">Christopher Ré</a> giving the first keynote talk of the day about his group’s
project called <a href="https://hazyresearch.github.io/snorkel/">Snorkel</a>. Chris is the epitome of a “rock-star academic,” and he
has a ridiculous number of publications from the last few years. His lengthy list
of awards includes the well-known MacArthur “Genius” Fellowship.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day3_chrisre.JPG" alt="sydney" />
<i>
Stanford Professor Christopher Ré gives the first keynote talk of the day.
</i>
</p>
<p>I really enjoyed Professor Ré’s talk, both for the content and for the “style”
(i.e. at the right technical level, good visuals, etc.). He inserted some humor
now and then, as you can see in the slide above. Anyone want to win the parasite
award? Heh. I also try to include humor in my talks.</p>
<p>Anyway, as I mentioned earlier, the main technical part of the talk was about
Snorkel. It’s an important project, because it helps us deal with “dark” or
unlabeled data, which is what the vast majority of data will look like in real
life. How can we make sense of dark data, and “clean it up” to make it more
useful? This is critical because, as Professor Ré said in the talk:</p>
<blockquote>
<p>Training data is the new, new oil.</p>
</blockquote>
<p>(Yes, he said “new” twice.)</p>
<p>I was amusingly reminded of <a href="https://www.youtube.com/watch?v=21EiKfQYZXc">Andrew Ng’s famous</a> (or infamous, depending on
your opinion) phrase:</p>
<blockquote>
<p>AI is the new electricity.</p>
</blockquote>
<p>In case you are curious, I agree with both of the above quotes.</p>
<p>You can find more information about Snorkel on the project website. What’s great
is that there are also lots of blog posts. His group really likes to write blog
posts! At least I have something in common with them.</p>
<p>The first oral session was about “Representations”, a research sub-field which I
am unfortunately not familiar with, and so I had an extremely hard time
following the material. I tried to gather pieces of what I could and recorded
anything interesting in my ongoing Google Doc containing my notes from UAI 2017.
<a href="https://danieltakeshi.github.io/2017/08/13/uai-2017-day-two-of-five/">As I stated in my last blog post</a>, <strong>I do not try to follow talks in their
entirety</strong> — I couldn’t do that even if I <em>wanted</em> to — but I record <em>bits
and pieces</em> of intriguing stuff which are candidates for future investigation.</p>
<p>During breaks, I worked on outlining these blog posts; I drafted them in Google
Docs.</p>
<p>The second oral session was about … reinforcement learning! Awesome. At least
I should have more background information for this material. Of the four papers
presented, the fourth one seemed to have the most interesting material in it.
The UAI organizers must have agreed, because the authors (<a href="http://www.cs.cmu.edu/~shayand/">Shayan Doroudi</a>
along with Philip Thomas and Emma Brunskill) won the UAI 2017 Best Paper Award
for the paper “<a href="http://www.cs.cmu.edu/~shayand/papers/UAI2017.pdf">Importance Sampling for Fair Policy Selection</a>.”</p>
<p><em>Fairness</em> is becoming a recurring theme in research and academia nowadays along
with <em>safety</em> and (as you’ll see later) <em>health care</em>. The talk was excellent
since Shayan has good speaking skills. He motivated the problem with a quick
example of choosing between two policies, one of which was obviously better than
the other. Despite the apparent simplicity of choosing the policies, importance
sampling approaches can actually choose the <em>worse</em> policy more often than not.</p>
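The pitfall Shayan described is easiest to feel in a toy off-policy evaluation setting. Here is a minimal sketch of *ordinary* importance sampling for estimating a target policy's value from data collected under a different behavior policy — this is just the standard estimator, not the paper's fairness-aware selection method, and all the policies and numbers are made up for illustration:

```python
import random

random.seed(0)

# Two-armed bandit: arm 0 always pays reward 1, arm 1 always pays 0.
def reward(arm):
    return 1.0 if arm == 0 else 0.0

behavior = [0.5, 0.5]   # policy that collected the data
target   = [0.9, 0.1]   # policy whose value we want to estimate

# Collect trajectories (here, single-step episodes) under the behavior policy.
data = []
for _ in range(1000):
    arm = 0 if random.random() < behavior[0] else 1
    data.append((arm, reward(arm)))

# Ordinary importance sampling: reweight each sample by
# pi_target(a) / pi_behavior(a), then average.
estimate = sum(target[a] / behavior[a] * r for a, r in data) / len(data)

# The true value of the target policy is 0.9 * 1 + 0.1 * 0 = 0.9;
# the estimate should land near it, with variance from the reweighting.
print(round(estimate, 2))
```

With only a handful of samples the importance weights make this estimator's variance large, which is exactly the kind of behavior that can make naive importance-sampling-based policy *selection* favor the worse policy.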
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day3_best_paper.JPG" alt="sydney" />
<i>
Shayan Doroudi (to the left) being presented with the UAI 2017 Best Paper Award
by the conference chairs. Congratulations!
</i>
</p>
<p>We now have the following award-winning reinforcement learning papers:</p>
<ul>
<li>Importance Sampling for Fair Policy Selection (UAI 2017 best paper)</li>
<li>Modular Multitask Reinforcement Learning with Policy Sketches (ICML 2017
runner-up best paper)</li>
<li>Value Iteration Networks (NIPS 2016 best paper)</li>
<li>Dueling Network Architectures for Deep Reinforcement Learning (ICML 2016 best
paper)</li>
</ul>
<p>At some point, I’d like to grind through the details in these papers. I know the
high level idea of each of these but aside from perhaps the dueling networks
paper, the details elude me.</p>
<p>After a brief lunch break with another student from India — whom I found
standing alone and thus it made sense for us to go to lunch together — we had
our second keynote talk. Duke Professor <a href="http://www2.stat.duke.edu/~kheller/">Katherine Heller</a> gave a talk about
machine learning in health care. Gee, is anyone seeing a trend with machine
learning applications?</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day3_heller.JPG" alt="sydney" />
<i>
Katherine Heller gives the second keynote talk of the day.
</i>
</p>
<p>You can see the outline of her talk in my picture above. I remember that she
discussed the following topics:</p>
<ul>
<li>Health conditions: chronic kidney disease, sepsis, and multiple sclerosis.</li>
<li>Health issues: delayed diagnosis results in problems, surgery can introduce
complications; her exact figure was 15% of the time but there are obvious
simplifications with numbers like that.</li>
<li><em>Modeling</em> health issues: using graphical models with latent variables.
Basically, given health conditions (or a sequence of conditions measured at
times) what can we say about the patient’s health? I also saw a few mentions
of RNNs and LSTMs there (wow, really?) and would be interested in learning
more.</li>
</ul>
<p>Given that much of the talk was about, I believe, modeling health care, I
sometimes wonder how accurate our models are. The United States has one of the
most inefficient health care systems in the developed world, and I wish we could
use some machine learning to cut away at the inefficiency.</p>
<p>After Professor Heller’s talk, we had the usual poster session. I managed to
engage in a few interesting one-on-one conversations, which is good enough for
me!</p>
<p>We then had a special event provided by the conference: a dinner cruise along
Darling Harbor. Awesome! The buffet included a wide range of options: prawns,
Indian chicken curry, Thai fish curry, pastas, potatoes, and of course, lots of
salads.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day3_cruise1.JPG" alt="sydney" />
<i>
UAI 2017 conference attendees lining up to get dinner.
</i>
</p>
<p>I don’t know about how others feel, but <em>every time</em> there’s an event like this
where I have to pick my own seat amidst a large dinner gathering, I worry and
overthink it <em>way</em> too much. Fortunately, there was a student whom I met earlier
at the conference who told me to sit near the center (gulp) of a table filled
with other graduate students, thus saving me the stress of coming up with the
decision myself. I was happy with this because it meant I wasn’t sitting by
myself, and because it’s better for me to know other graduate students (and
potential future collaborators/colleagues) rather than, for instance, industry
sponsors.</p>
<p>Yes, it was <em>extremely</em> noisy in the ship, and I couldn’t participate in
substantive conversations, but hey, at least I was sitting with other graduate
students. And it seemed like there was some ongoing discussion regarding my
blog, judging by how several of the other students nearby kept looking at my
name tag in order to correctly spell my name in Google.</p>
<p>Throughout the cruise, we would frequently walk to the top of the ship and view
Darling Harbor.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day3_cruise2.JPG" alt="sydney" />
<i>
A view of Luna Park.
</i>
</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day3_cruise3.JPG" alt="sydney" />
<i>
A view of the Sydney Opera House.
</i>
</p>
<p>It’s times like these when I wish I had a girlfriend, so that we could go on a
vacation together and explore Darling Harbor.</p>
Mon, 14 Aug 2017 03:00:00 -0700
https://danieltakeshi.github.io/2017/08/14/uai-2017-day-three-of-five/
https://danieltakeshi.github.io/2017/08/14/uai-2017-day-three-of-five/Uncertainty in Artificial Intelligence (UAI) 2017, Day 2 of 5<p>For the second day of UAI 2017 (August 12)<sup id="fnref:note"><a href="#fn:note" class="footnote">1</a></sup>, I followed the same initial
routine from the previous day. I woke up early, had a 4:30am gym session, ate a
hearty breakfast at the hotel’s buffet, and then walked over to the conference
venue. The talks were held in the room shown in the following image:</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day2_talk_room.JPG" alt="sydney" />
<i>
Room C4.8 in the ICC Sydney building.
</i>
</p>
<p>Yeah, it’s a fairly small room. Compare that to <a href="https://twitter.com/Genomicsplc/status/894844813761171456">the room used for the ICML
keynote talks</a>, which is in the same building. Wow!</p>
<p>Fortunately, the second day of UAI started off <a href="https://danieltakeshi.github.io/2017/08/11/uai-2017-day-one-of-five/">better than the first one</a>,
since the captioner (a.k.a. “CART provider” or “stenographer”) arrived. Whew.</p>
<p>At around 8:30am, the day began with some opening remarks from one of the
chairs. After that, it was time for MIT robotics professor <a href="http://people.csail.mit.edu/lpk/">Leslie
Kaelbling</a>’s one-hour keynote talk on <em>Intelligent Robots in an Uncertain
World.</em> It was a nice, relatively high-level talk which centered on Partially
Observable Markov Decision Processes (POMDPs) and belief states, with
applications that focused on robotics.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day2_keynote1.JPG" alt="sydney" />
<i>
Professor Kaelbling gives the initial keynote talk for UAI 2017.
</i>
</p>
<p>The experiments that she showed in the slides involved the PR2 robot, which I
assume is one of the primary robots in her lab. I wish I could use the PR2 one
of these days, or at least a robot similar to it.</p>
<p>The final part of her talk contained a request for the UAI community to figure
out how to perform action selection in belief spaces. In other words, if we
don’t know everything about the environment (which is always the case in real
applications) we have to pick actions on the basis of what we <em>believe</em> about
the world.</p>
<p>Overall, it was an excellent talk. There were a few sections that were
relatively difficult for me to follow, but I’m not sure if it was because there
was too much information to process in the slides (some of them had a lot!) or
if it was because I had a hard time getting used to the captioning.</p>
<p>After the keynote talk, we had oral sessions. In these, authors of papers
accepted to the conference give 20-minute talks. Not all the papers have oral
talks, though; they’re reserved for those with the highest reviews. Also,
typically the <em>first author</em> is the one who gives the presentation.</p>
<p>Today, there were four oral sessions, each of which consisted of one broad
research topic and three research papers in each (so each session was an
hour long). The first oral session was about deep models. Yay! Ming Jin
started off the oral sessions with his excellent talk on inverse
reinforcement learning.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day2_oral1.JPG" alt="sydney" />
<i>
UC Berkeley electrical engineering PhD student
<a href="http://www.jinming.tech/">Ming Jin</a> gives a talk about his paper
<a href="https://arxiv.org/abs/1512.08065">Inverse Reinforcement Learning via Deep Gaussian Process</a>.
</i>
</p>
<p>The two other talks for this oral session were also interesting and perhaps more
related to core Deep Learning research.</p>
<p>The second oral session was on the subject of <em>machine learning</em>, which is
probably not the best name for it, but whatever. Unfortunately, the papers were
quite mathematical and the speakers were a little difficult to understand (the
captioner had <em>major</em> difficulty) so it was hard for me to get much out of the
talks beyond trying to extract every ounce of information from the slides that I
could.</p>
<p>After a break for lunch — though I was simply too full from eating earlier and
had to pass — we had our second keynote of the day, <em>Expectations in Learning
and Inference</em> by Professor <a href="http://www.cs.tau.ac.il/~gamir/">Amir Globerson</a> of Tel Aviv University.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day2_keynote2.JPG" alt="sydney" />
<i>
The second keynote talk of the day.
</i>
</p>
<p>This talk was more about math, in particular about expectations and
probabilities. What happens when we aren’t given data but are given the expected
value? What can we determine from it? (See the slide above for the related
context.) The technical contribution was probably the development of bounds not
for probabilities themselves, but for the <em>minimum</em> of probabilities over a
certain class (if that makes sense?). I unfortunately had a harder time
understanding this talk compared to the first keynote. Reading the captions, I
thought I was following slide by slide and sentence by sentence, but I couldn’t
piece together a good story. Maybe this has happened to other people?</p>
<p>In any case, for me I long ago decided that for research talks, <strong>I don’t try to
understand everything</strong> but instead, I find <strong>any interesting points and then
follow-up on these later</strong>, either by emailing the author or (more likely)
simply searching online. Google has been great for people like me.</p>
<p>We had two more oral presentations after this second keynote. In between the
two, I had an entertaining conversation with another student from the University
of Helsinki who told me that hearing aids should have some way of blocking out
background noise. I told him that, sadly, they’re <em>already</em> supposed to do that,
though he didn’t give up and said that we should use <em>machine learning</em> to make
them block out noise. Yeah, that would be great.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day2_poster.JPG" alt="sydney" />
<i>
The poster session for today.
</i>
</p>
<p>We wrapped up the day with a poster session which featured roughly one-third of
the UAI 2017 papers. (There are also poster sessions in day three and day four
of the conference.)</p>
<p>After this, I went to the nearby mall and found a quick, cheap Middle Eastern
restaurant for dinner. I ate by myself as I didn’t know anyone else, and I
couldn’t find any lonely person who I could pounce on with an invitation, but
that was OK with me. I just wanted to see what the city had to offer, and I’m
pleased to say that I was not disappointed. Darling Harbor has a <em>ridiculous
ton</em> of awesome restaurants. It’s food paradise for someone like me.</p>
<p style="text-align:center;">
<img src="https://danieltakeshi.github.io/assets/uai_2017/day2_harbor_night.JPG" alt="sydney" />
<i>
The view of the lovely harbor at night. The conference is in the large building
located at the center-left with the lights on. To its right is a giant mall
(which the photo doesn't entirely reveal) with a LOT of stores and restaurants.
Wow.
</i>
</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:note">
<p>The days when the posts are published on this blog do not necessarily
coincide with the day that the conference took place, e.g., this post was
published the day after. <a href="#fnref:note" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Sun, 13 Aug 2017 13:00:00 -0700
https://danieltakeshi.github.io/2017/08/13/uai-2017-day-two-of-five/
https://danieltakeshi.github.io/2017/08/13/uai-2017-day-two-of-five/