A statistical *hypothesis* is an unproven statement which can be tested. A *hypothesis test* is used to test whether this statement is true.

- The *null hypothesis*, $H_0$, assumes that the observations are statistically independent, i.e. that there is no difference between the populations you are testing. If the null hypothesis is true, any changes witnessed in an experiment are due to random chance and not to the variables that were changed in the experiment. For example: serotonin levels have no effect on ability to cope with stress. See also Null and alternative hypotheses.

- The *alternative hypothesis*, $H_1$, is the theory that the observations are related (not independent) in some way. We only adopt the alternative hypothesis if we have **rejected** the null hypothesis. For example: serotonin levels affect a person's ability to cope with stress. You do not necessarily have to specify in what way they are related, but you can (see one and two tailed tests for more information).

- The first step of a hypothesis test is to state the **null hypothesis $H_0$** and the **alternative hypothesis $H_1$**. The null hypothesis is the statement or claim being made (which we are trying to disprove), and the alternative hypothesis is the hypothesis that we are trying to prove and which is accepted if we have sufficient evidence to reject the null hypothesis.

For example, consider a person in court who is charged with murder. The jury needs to decide whether the person is innocent (the null hypothesis) or guilty (the alternative hypothesis). As usual, we assume the person is innocent unless the jury can provide sufficient evidence that the person is guilty. Similarly, we assume that $H_0$ is true unless we can provide sufficient evidence that it is false and that $H_1$ is true, in which case we reject $H_0$ and accept $H_1$.

To decide if we have *sufficient* evidence against the null hypothesis to reject it (in favour of the alternative hypothesis), we must first decide upon a *significance level*. The significance level is the probability of rejecting the null hypothesis when the null hypothesis is true, and is denoted by $\alpha$. The $5\%$ significance level is a common choice for statistical tests.

The next step is to collect data and calculate the *test statistic* and associated *$p$-value* using the data. Assuming that the null hypothesis is true, the *$p$-value* is the probability of obtaining a sample statistic equal to or more extreme than the observed test statistic.

Next we must compare the $p$-value with the chosen significance level. If $p \lt \alpha$ then we reject $H_0$ and accept $H_1$. The lower the $p$-value, the more evidence we have *against* $H_0$ and so the more confidence we can have that $H_0$ is false. If $p \geq \alpha$ then we do not have sufficient evidence to reject $H_0$ and so must accept it.
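As a rough sketch of this comparison, here is a one-sample $t$-test using SciPy. The sample data and the hypothesised mean of $35$ are invented purely for illustration:

```python
# Hypothetical example: test H0 "the population mean is 35" against the
# two-tailed H1 "the population mean is not 35". Data is invented.
from scipy import stats

sample = [34.1, 35.4, 33.8, 34.6, 34.0, 33.5, 34.9, 34.2]
alpha = 0.05  # chosen significance level

# ttest_1samp returns the test statistic and the two-tailed p-value
t_stat, p_value = stats.ttest_1samp(sample, popmean=35)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0 and accept H1")
else:
    print(f"p = {p_value:.4f} >= {alpha}: insufficient evidence, accept H0")
```

Note that the conclusion depends on the chosen $\alpha$: the same data could lead to a different decision at the $1\%$ level.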

Alternatively, we can compare our test statistic with the appropriate critical value for the chosen significance level. We can look up critical values in distribution tables (see worked examples below). If our test statistic is:

- positive and greater than the critical value, then we have sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.
- positive and lower than or equal to the critical value, we must accept the null hypothesis.
- negative and lower than the (negative of the) critical value, then we have sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.
- negative and greater than or equal to the (negative of the) critical value, we must accept the null hypothesis.
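For a symmetric two-tailed test, the four cases above reduce to comparing the absolute value of the test statistic against the positive critical value. A minimal sketch with SciPy, where the observed statistic, degrees of freedom and $\alpha$ are invented for illustration:

```python
# Critical-value approach for a two-tailed one-sample t-test.
# The observed statistic and degrees of freedom are hypothetical.
from scipy import stats

t_stat = 2.45   # hypothetical observed test statistic
df = 9          # degrees of freedom (n - 1 for a one-sample t-test)
alpha = 0.05

# Two-tailed critical value: the point with alpha/2 in the upper tail,
# i.e. what you would look up in a t-distribution table
critical = stats.t.ppf(1 - alpha / 2, df)

if abs(t_stat) > critical:
    decision = "reject H0"
else:
    decision = "accept H0"
print(f"critical value = {critical:.3f}, decision: {decision}")
```

This is equivalent to the $p$-value method: $|t|$ exceeds the critical value exactly when $p < \alpha$.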

For either method:

Significant difference found: **Reject the null hypothesis**

No significant difference found: **Accept the null hypothesis**

Finally, we must interpret our results and come to a conclusion. Returning to the example of the person in court, if the result of our hypothesis test indicated that we should accept $H_1$ and reject $H_0$, our conclusion would be that the jury should declare the person guilty of murder.

- Specify the null and the alternative hypothesis.
- Decide upon the significance level.
- Collect data and decide whether to accept $H_0$, or reject $H_0$ and accept $H_1$, by either:
  - comparing the $p$-value to the significance level $\alpha$, or
  - comparing the test statistic to the critical value.
- Interpret your results and draw a conclusion.

The *$p$-value* is the probability of obtaining a test statistic (e.g. a *t*-value or Chi-Square value) at least as extreme as the one observed, given that the null hypothesis is true. Since it is a probability, the $p$-value is a number between $0$ and $1$.

- Typically $p \leq 0.05$ shows that there is strong evidence for $H_1$ so we can accept it and reject $H_0$. Any $p$-value less than $0.05$ is *significant* and $p$-values less than $0.01$ are *very significant*.

- Typically $p > 0.05$ shows that there is poor evidence for $H_1$ so we reject it and accept $H_0$.

- The smaller the $p$-value, the more evidence there is against the null hypothesis.

- The rule for accepting and rejecting the hypothesis is:

\begin{align} \text {Significant difference found} &= \textbf{Reject}\text{ the null hypothesis}\\ \text {No Significant difference found} &= \textbf{Accept}\text{ the null hypothesis}\\ \end{align}

**Note**: The significance level is not always $0.05$. It can differ depending on the application and is often subjective (different people will have different opinions on what values are appropriate). For example, if lives are at stake then the $p$-value must be very small for safety reasons.

- See $P$-values for further detail on this topic.

There are *parametric* and *non-parametric* hypothesis tests.

- A *parametric hypothesis* assumes that the data follows a Normal probability distribution (with equal variances if we are working with more than one set of data). A *parametric hypothesis test* is a statement about the parameters of this distribution (typically the mean). This can be seen in more detail in the Parametric Hypotheses Tests section.

- A *non-parametric test* does **not assume that the data follows any particular distribution** and usually bases its calculations on the **median**. Note that although we do not assume the data follows a particular distribution, it may do so anyway. This can be seen in more detail in the Non-Parametric Hypotheses Tests section.
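To make the contrast concrete, here is a sketch comparing a parametric test (the one-sample $t$-test, about the mean) with a non-parametric one (the Wilcoxon signed-rank test, about the median). The data and the hypothesised value of $12$ are invented for illustration:

```python
# Parametric vs non-parametric tests of the same hypothetical question.
from scipy import stats

sample = [12.1, 11.4, 13.2, 12.8, 11.9, 12.5, 13.0, 12.2, 11.7, 12.6]

# Parametric: assumes normality, tests H0 "mean = 12"
t_stat, t_p = stats.ttest_1samp(sample, popmean=12)

# Non-parametric: no normality assumption, tests H0 "median = 12".
# wilcoxon works on the differences from the hypothesised value.
diffs = [x - 12 for x in sample]
w_stat, w_p = stats.wilcoxon(diffs)

print(f"t-test p = {t_p:.3f}, Wilcoxon p = {w_p:.3f}")
```

When the normality assumption holds, the parametric test is usually more powerful; when it does not, the non-parametric test is the safer choice.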

Whether a One-tailed or a Two-tailed test is appropriate depends upon the alternative hypothesis $H_1$.

*One-tailed tests* are used when the alternative hypothesis states that the parameter of interest is either bigger or smaller than the value stated in the null hypothesis. For example, the null hypothesis might state that the average weight of chocolate bars produced by a chocolate factory in Slough is 35g (as is printed on the wrapper), while the alternative hypothesis might state that the average weight of the chocolate bars is in fact *lower* than 35g.

*Two-tailed tests* are used when the alternative hypothesis states that the parameter of interest *differs* from the value in the null hypothesis but does not specify in which direction. In the above example, a Two-tailed alternative hypothesis would be that the average weight of the chocolate bars is *not* equal to 35g.
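The chocolate-bar example can be sketched with SciPy, where the `alternative` argument selects the tail; the sample weights below are invented for illustration:

```python
# One-tailed vs two-tailed one-sample t-test on hypothetical bar weights.
from scipy import stats

weights = [34.2, 34.8, 33.9, 35.1, 34.5, 34.0, 34.7, 34.3]

# Two-tailed: H1 is "mean weight != 35"
_, p_two = stats.ttest_1samp(weights, popmean=35, alternative="two-sided")

# One-tailed: H1 is "mean weight < 35"
_, p_one = stats.ttest_1samp(weights, popmean=35, alternative="less")

print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

When the observed statistic falls in the hypothesised direction, the one-tailed $p$-value is half the two-tailed one, so a one-tailed test rejects $H_0$ more readily, but only for deviations in that direction.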

- A *Type I error* is made if we reject the null hypothesis when it is true (so it should have been accepted). Returning to the example of the person in court, a Type I error would be made if the jury declared the person guilty when they are in fact innocent. The probability of making a Type I error is equal to the significance level $\alpha$.

- A *Type II error* is made if we accept the null hypothesis when it is false, i.e. we should have rejected the null hypothesis and accepted the alternative hypothesis. This would occur if the jury declared the person innocent when they are in fact guilty.
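The claim that the Type I error rate equals $\alpha$ can be checked by simulation: repeatedly draw samples for which $H_0$ really is true and count how often the test wrongly rejects it. The sample size, number of trials and seed below are arbitrary choices for illustration:

```python
# Simulation sketch: when H0 is true, the proportion of (wrong) rejections
# should be close to the significance level alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 2000
rejections = 0

for _ in range(n_trials):
    # Draw a sample for which H0 ("mean = 0") really is true
    sample = rng.normal(loc=0, scale=1, size=30)
    _, p = stats.ttest_1samp(sample, popmean=0)
    if p < alpha:          # Type I error: rejecting a true H0
        rejections += 1

print(f"observed Type I error rate: {rejections / n_trials:.3f}")
```

The observed rate fluctuates around $0.05$; lowering $\alpha$ reduces Type I errors at the cost of more Type II errors.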

For more information about the topics covered here see hypothesis testing.