# Hypothesis Testing for a Binomial Example

In STAT 210A class, we are now discussing *hypothesis testing*, which has
brought back lots of memories from my very first statistics course (taken in my
third semester of undergrad). Think null hypotheses, $p$-values, confidence
intervals, and power levels, which are often covered in introductory statistics
courses. In STAT 210A, though, we take a far more rigorous treatment of the
material. Here’s the setting at a high level: we use data to infer which of two
competing hypotheses is correct. To formalize it a bit: we assume some model
$X \sim P_\theta$ with $\theta \in \Theta$, and we test:

- $H_0: \theta \in \Theta_0$, the null hypothesis
- $H_1: \theta \in \Theta_1$, the alternative hypothesis

While it is not strictly necessary that $\Theta_0 \cup \Theta_1 = \Theta$ and $\Theta_0 \cap \Theta_1 = \emptyset$, we often assume these are true for simplicity.

We “solve” a hypothesis testing problem by specifying some *critical function*
$\phi : \mathcal{X} \to [0,1]$, specified as

$$\phi(x) = \begin{cases} 0 & \text{fail to reject } H_0 \\ \gamma(x) & \text{reject } H_0 \text{ with probability } \gamma(x) \\ 1 & \text{reject } H_0 \end{cases}$$

In other words, $\phi(x)$ tells us the probability that we should reject $H_0$ after observing $X = x$.

The performance of the test is specified by the *power* function:

$$\beta(\theta) = \mathbb{E}_\theta[\phi(X)] = P_\theta(\text{reject } H_0)$$
A closely related quantity is the *significance level* of a test:

$$\alpha = \sup_{\theta \in \Theta_0} \beta(\theta)$$
The level $\alpha$ here therefore represents the *worst* chance (among all the
$\theta$ possibilities) of falsely rejecting $H_0$. Notice that $\theta$
is constrained to lie in the null hypothesis region $\Theta_0$! The reason why
we care about $\alpha$ is because $H_0$ often represents the status quo, and
we only want to reject it if we are absolutely sure that our evidence warrants
it. (In fact, we technically don’t even “accept” the null hypothesis; in many
courses, this is referred to as “failing to reject”.)

We may often resort to a *randomized* test, which the definition of $\phi$ above already suggests via its $\gamma(x)$ term. This is useful for technical reasons to achieve exact significance levels. Let’s look at an example to make this concrete. Suppose that $X \sim \text{Bin}(2, \theta)$, and that we are testing

- $H_0: \theta = \frac{1}{2}$.
- $H_1: \theta = \frac{3}{4}$ (so here the hypotheses do not partition $[0,1]$).

And, furthermore, that we want to develop a test with a significance level of $\alpha = 0.05$. (Note: yes, this is related to the abundant usage of 0.05 as a $p$-value threshold in many research articles.)
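As a quick sanity check of this setup (a sketch; the helper name `binom_pmf` is mine, not from the course), we can tabulate the distribution of $X$ under the null:

```python
# Sketch: tabulate P(X = x) under the null, where X ~ Binomial(n=2, theta=1/2).
from math import comb

def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

null_probs = {x: binom_pmf(x, 2, 0.5) for x in range(3)}
print(null_probs)  # {0: 0.25, 1: 0.5, 2: 0.25}
```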

To test, it is convenient to use a *likelihood ratio* test, which has nice
properties that I will not cover here. In our case, we have:

$$\Lambda(x) = \frac{p_{3/4}(x)}{p_{1/2}(x)} = \frac{\binom{2}{x}\left(\frac{3}{4}\right)^x\left(\frac{1}{4}\right)^{2-x}}{\binom{2}{x}\left(\frac{1}{2}\right)^x\left(\frac{1}{2}\right)^{2-x}} = \frac{3^x}{4},$$

where we have simply plugged in the densities for Binomial random variables and simplified. Intuitively, if this ratio is very large, then it is more likely that we should reject the null hypothesis because $H_1$ describes the data better.

We know that $\Lambda(X)$ can take on only three values (because $x \in \{0, 1, 2\}$) and that
*under the null hypothesis* (this is important!) the probabilities of $X$
taking on $0$, $1$, or $2$ happen with probability $\frac{1}{4}, \frac{1}{2}$, and
$\frac{1}{4}$, respectively.
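To double-check the simplification $\Lambda(x) = 3^x/4$, here is a small sketch (the helper name is mine) that computes the ratio directly from the two binomial densities:

```python
# Sketch: compute the likelihood ratio Lambda(x) = p_{3/4}(x) / p_{1/2}(x)
# directly, and compare against the closed form 3^x / 4.
from math import comb

def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binomial(n, theta)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

for x in range(3):
    ratio = binom_pmf(x, 2, 0.75) / binom_pmf(x, 2, 0.5)
    print(x, ratio, 3**x / 4)  # both columns give 0.25, 0.75, 2.25
```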

Knowing this, how do we design the test with the desired significance level? It must be the case that

$$\mathbb{E}_{\theta = \frac{1}{2}}[\phi(X)] = 0.05.$$

(There is only one $\theta$ possibility in $\Theta_0$ here, so we do not need a “sup” term.)

By using the fact that a likelihood ratio test must be defined by a cutoff point $k$ where if $\Lambda(x) > k$ we reject, else if $\Lambda(x) < k$ we accept (and with equality, we randomize), we see that our test must have $k = \frac{9}{4}$ and $\gamma = \frac{1}{5}$: the only way to reject is when $X = 2$, which has null probability $\frac{1}{4}$, and $\frac{1}{4} \cdot \frac{1}{5} = 0.05$. If we were to equate this with our definitions above, this would be:

$$\phi(x) = \begin{cases} 0 & \text{if } \Lambda(x) < \frac{9}{4} \\ \frac{1}{5} & \text{if } \Lambda(x) = \frac{9}{4} \\ 1 & \text{if } \Lambda(x) > \frac{9}{4} \end{cases}$$

with $\mathbb{E}_{\theta = \frac{1}{2}}[\phi(X)] = \frac{1}{4} \cdot \frac{1}{5} = 0.05$. (The third case never happens here; I just added it to be consistent with our earlier definition.)
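As a final sanity check (a sketch with my own variable names, using exact fractions to avoid floating-point noise), we can confirm that this randomized test attains level exactly $0.05$, and read off its power under the alternative along the way:

```python
# Sketch: verify the randomized test phi attains level 0.05 exactly,
# and compute its power under the alternative theta = 3/4.
from fractions import Fraction
from math import comb

def binom_pmf(x, n, theta):
    """P(X = x) for X ~ Binomial(n, theta); exact when theta is a Fraction."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

def phi(x):
    """Reject with probability 1/5 when x = 2 (i.e. Lambda(x) = 9/4), else never."""
    return Fraction(1, 5) if x == 2 else Fraction(0)

level = sum(phi(x) * binom_pmf(x, 2, Fraction(1, 2)) for x in range(3))
power = sum(phi(x) * binom_pmf(x, 2, Fraction(3, 4)) for x in range(3))
print(level, power)  # 1/20 9/80, i.e. level 0.05 and power 0.1125
```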

And *that* is why we use randomization. This also gives us a general recipe for
designing tests that achieve arbitrary significance levels.