The lone class I am taking this semester is STAT 210B, the second course in the PhD-level theoretical statistics sequence. I took STAT 210A last semester, and I briefly wrote about the class here. I’ll have more to say about STAT 210B in late May, but in this post I’d first like to present an interesting problem that our professor, Michael I. Jordan, brought up in lecture a few weeks ago.

The problem Professor Jordan discussed was actually an old homework question, but he said that it was so important for us to know this that he was going to prove it in lecture anyway, without using any notes whatsoever. He also stated:

He then went ahead and successfully proved it, and urged us to do the same thing.

OK, if he says to do that, then I will follow his advice and write out my answer in this blog post. I’m probably the only student in class who’s going to be doing this, but I’m already a bit unusual in having a long-running blog. If any of my classmates are reading this and have their own blogs, let me know!

By the way, for all the students out there who say that they don’t have time to maintain personal blogs, why not take baby steps and start writing about stuff that accomplishes your educational objectives, such as doing practice exercises? It’s a nice way to make yourself look more productive than you actually are, since you would be doing those anyway.

Anyway, here at last is the question Professor Jordan talked about:

Let $\{X_i\}_{i=1}^n$ be a sequence of zero-mean random variables, each sub-Gaussian with parameter $\sigma$ (No independence assumptions are needed). Prove that

for all $n\ge 1$.

This problem is certainly on the easier side of the homework questions we’ve had, but it’s a good baseline and I’d like to showcase the solution here. Like Professor Jordan, I will do this problem (a.k.a. write this blog post) without any form of notes. Here goes: for $\lambda \ge 0$, we have

where:

• Step (i) follows from Jensen’s inequality. Yeah, that inequality is everywhere.
• Step (ii) follows from noting that one can pull the maximum outside of the exponential.
• Step (iii) follows from the classic union bound, which can be pretty bad but we don’t have much else to go on here. The key fact is that the exponential makes all terms in the sum positive.
• Step (iv) follows from applying the sub-Gaussian bound to all $n$ variables, and then summing them together.

Next, taking logs and rearranging, we have

Since $\lambda \in \mathbb{R}$ is isolated on the right hand side, we can differentiate it to find the tightest lower bound. Doing so, we get $\lambda^* = \frac{\sqrt{2 \log n}}{\sigma}$. Plugging this back in, we get

which proves the desired claim.

I have to reiterate that this problem is easier than the others we’ve done in STAT 210B, and I’m sure that over 90 percent of the students in the class could do this just as easily as I could. But this problem makes clear the techniques that are often used in theoretical statistics nowadays, so at minimum students should have a firm grasp of the content in this blog post.

Update April 23, 2017: In an earlier version of this post, I made an error with taking a maximum outside of an expectation. I have fixed this post. Thanks to Billy Fang for letting me know about this.