When carrying out an experiment or trial, the following steps should be followed.

- Come up with a hypothesis (see below).
- Collect data and carry out a hypothesis test.
- Decide whether you have *statistically significant* results, i.e. whether you have sufficient evidence to support your hypothesis.
- Report your findings.

A statistical *hypothesis* is an unproven statement which can be tested. A *hypothesis test* is used to test whether this statement is true.

- The first step of a hypothesis test is to state the **null hypothesis $H_0$** and the **alternative hypothesis $H_1$**. The null hypothesis is the statement or claim being made (which we are trying to disprove). The alternative hypothesis is the hypothesis that we are trying to prove, and it is accepted if we have sufficient evidence to reject the null hypothesis.

For example, consider a person in court who is charged with murder. The jury needs to decide whether the person is innocent (the null hypothesis) or guilty (the alternative hypothesis). As usual, we assume the person is innocent unless the jury is presented with sufficient evidence that the person is guilty. Similarly, we assume that $H_0$ is true unless we can provide sufficient evidence that it is false and that $H_1$ is true, in which case we reject $H_0$ and accept $H_1$.

To decide if we have *sufficient* evidence against the null hypothesis to reject it (in favour of the alternative hypothesis), we must first decide upon a *significance level*. The significance level, denoted by $\alpha$, is the probability of rejecting the null hypothesis when the null hypothesis is true. The $5\%$ significance level is a common choice for statistical tests.

The next step is to collect data and calculate the *test statistic* and associated *$p$-value* using the data. Assuming that the null hypothesis is true, the *$p$-value* is the probability of obtaining a sample statistic equal to or more extreme than the observed test statistic.

Next we must compare the $p$-value with the chosen significance level. If $p \lt \alpha$ then we reject $H_0$ and accept $H_1$. The lower $p$, the more evidence we have *against* $H_0$ and so the more confidence we can have that $H_0$ is false. If $p \geq \alpha$ then we do not have sufficient evidence to reject $H_0$, so we do not reject it (this is often loosely described as accepting $H_0$).
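The comparison of $p$ with $\alpha$ can be sketched in Python. This is only an illustration under stated assumptions: the measurements are made up, the null value `mu0` and the population standard deviation `sigma` are assumed to be known (so a simple $z$-test applies and only the standard library is needed), and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test_p_value(sample, mu0, sigma):
    """Return (z, p) for a two-tailed test of H0: population mean == mu0.

    Assumes the population standard deviation sigma is known.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Probability, assuming H0 is true, of a statistic at least as extreme as z
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    return "reject H0" if p < alpha else "do not reject H0"


# Made-up illustrative measurements (not real data)
sample = [34.1, 34.8, 33.9, 35.2, 34.4, 34.0, 34.6, 33.8, 34.9, 34.3]
z, p = z_test_p_value(sample, mu0=35.0, sigma=0.5)
print(f"z = {z:.2f}, p = {p:.5f}: {decide(p)}")
```

Note that a $p$-value exactly equal to $\alpha$ does not lead to rejection, matching the rule $p \geq \alpha$ above.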

Alternatively, we can compare our test statistic with the appropriate critical value for the chosen significance level. We can look up critical values in distribution tables (see worked examples below). If our test statistic is:

- positive and greater than the critical value, then we have sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.
- positive and lower than or equal to the critical value, we must accept the null hypothesis.
- negative and lower than the critical value, then we have sufficient evidence to reject the null hypothesis and accept the alternative hypothesis.
- negative and greater than or equal to the critical value, we must accept the null hypothesis.
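The four cases above can be sketched in Python for a one-sample $t$-statistic. This is a minimal sketch under assumptions: the data are made up, the choice of a one-sample $t$-test is ours for illustration, and the critical value $2.262$ is the standard table value for a two-tailed test at the $5\%$ level with $9$ degrees of freedom. The function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """One-sample t-statistic for H0: population mean == mu0."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


def reject_h0(t, critical):
    """Compare a test statistic with a (positive) critical value from tables.

    A positive statistic must exceed the critical value; a negative statistic
    must be lower than minus the critical value. Equality does not reject.
    """
    if t >= 0:
        return t > critical
    return t < -critical


# Made-up illustrative measurements (not real data); n = 10, so 9 degrees of freedom
sample = [34.1, 34.8, 33.9, 35.2, 34.4, 34.0, 34.6, 33.8, 34.9, 34.3]
t = t_statistic(sample, mu0=35.0)
critical = 2.262  # t-table: two-tailed, 5% level, 9 degrees of freedom
print(f"t = {t:.3f}, reject H0: {reject_h0(t, critical)}")
```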

For either method:

Significant difference found: **Reject the null hypothesis**

No significant difference found: **Accept the null hypothesis**

Finally, we must interpret our results and come to a conclusion. Returning to the example of the person in court, if the result of our hypothesis test indicated that we should accept $H_1$ and reject $H_0$, our conclusion would be that the jury should declare the person guilty of murder.

- Specify the null and the alternative hypothesis.
- Decide upon the significance level.
- Collect data and decide whether to accept $H_0$ or reject $H_0$ and accept $H_1$ by either:
  - Comparing the $p$-value to the significance level $\alpha$, or
  - Comparing the test statistic to the critical value.
- Interpret your results and draw a conclusion.

If you were writing about findings of a hypothesis test in a report/project, you would do so in the following way:

- You would state what the results mean in context of your experiment.
- Immediately after the statement, in brackets, you would include what test you used, the test statistic and the $p$-value it yielded.
- It is not just at undergraduate level that findings are reported in this way; published papers use this method too.

There are *parametric* and *non-parametric* hypothesis tests.

- A *parametric hypothesis test* assumes that the data follows a Normal probability distribution (with equal variances if we are working with more than one set of data). A *parametric hypothesis* is a statement about the parameters of this distribution (typically the mean).

- A *non-parametric test* makes **no assumption about the distribution the data follows** and usually bases its calculations on the **median**. Note that although we do not assume the data follows a particular distribution, it may do anyway. We do not cover non-parametric hypothesis tests in detail on the Animal Science area of the wiki; however, if you would like to find out more about them you can look at the Psychology section.

Whether a one-tailed or a two-tailed test is appropriate depends upon the alternative hypothesis $H_1$.

*One-tailed tests* are used when the alternative hypothesis states that the parameter of interest is either bigger or smaller than the value stated in the null hypothesis. For example, the null hypothesis might state that the average weight of chocolate bars produced by a chocolate factory in Slough is 35g (as is printed on the wrapper), while the alternative hypothesis might state that the average weight of the chocolate bars is in fact *lower* than 35g.

*Two-tailed tests* are used when the alternative hypothesis states that the parameter of interest *differs* from the value stated in the null hypothesis but does not specify in which direction. In the above example, a two-tailed alternative hypothesis would be that the average weight of the chocolate bars is *not* equal to 35g.
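The effect of the choice of tail on the $p$-value can be sketched as follows. This is an illustration only: it assumes the test statistic follows a standard Normal distribution when $H_0$ is true, the value $z = -1.8$ is hypothetical, and the function name and `tail` labels are our own.

```python
from statistics import NormalDist


def p_value(z, tail):
    """p-value for a standard Normal test statistic z.

    tail = "lower": H1 says the parameter is lower than the null value
    tail = "upper": H1 says the parameter is greater than the null value
    tail = "two":   H1 says the parameter differs from the null value
    """
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(z)
    if tail == "upper":
        return 1 - cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


z = -1.8  # hypothetical test statistic
for tail in ("lower", "two"):
    print(f"{tail}-tailed p-value: {p_value(z, tail):.4f}")
```

For a statistic in the direction stated by $H_1$, the two-tailed $p$-value is twice the one-tailed value, so the same data can be significant at the $5\%$ level under a one-tailed test but not under a two-tailed test.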

- A *Type I error* is made if we reject the null hypothesis when it is true (so should have been accepted). Returning to the example of the person in court, a Type I error would be made if the jury declared the person guilty when they are in fact innocent. The probability of making a Type I error is equal to the significance level $\alpha$.

- A *Type II error* is made if we accept the null hypothesis when it is false, i.e. we should have rejected the null hypothesis and accepted the alternative hypothesis. This would occur if the jury declared the person innocent when they are in fact guilty.
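The link between the significance level and the Type I error rate can be checked with a small simulation. This is a sketch under assumptions: the samples are simulated from a Normal distribution for which $H_0$ is true by construction, the population standard deviation is treated as known (a $z$-test), and the function name, parameter values and seed are hypothetical choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=20, trials=4000, seed=1):
    """Estimate how often a true H0 is rejected at significance level alpha."""
    rng = random.Random(seed)
    mu0, sigma = 35.0, 1.0  # data are generated with mean mu0, so H0 is true
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed p-value
        if p < alpha:
            rejections += 1  # H0 rejected although it is true: a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should be close to the chosen $\alpha$, although it will vary slightly with the seed and the number of trials.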

A confidence interval describes our uncertainty about where the population mean of a measurement lies, based on a sample. It is calculated using the standard error of the mean. We first choose the confidence level of the interval; usually we choose the level to be 95%. This would mean that if we were to repeat our experiment 100 times and compute 100 corresponding confidence intervals, approximately 95 of the confidence intervals would contain the population mean.

A confidence interval consists of an *upper* and *lower* bound, calculated using the sample mean $\bar{x}$, the sample standard deviation $s$, the sample size $n$, and a *t*-value corresponding to the chosen confidence level and the degrees of freedom in the sample ($n-1$). The quantity $s/\sqrt{n}$ is the standard error of the mean.

\begin{align} \text{Upper bound} &= \bar{x} + \left(t\times\frac{s}{\sqrt{n}}\right) \\ \text{Lower bound} &= \bar{x} - \left(t\times\frac{s}{\sqrt{n}}\right) \end{align}

Minitab or R can calculate this range for you.
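The calculation can also be sketched by hand in Python. This is an illustration under assumptions: the measurements are made up, the interval is built from the standard error of the mean (the sample standard deviation divided by $\sqrt{n}$), and the *t*-value $2.262$ is the standard table value for a 95% interval with $9$ degrees of freedom. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def confidence_interval(sample, t_value):
    """Return (lower, upper) bounds for the population mean.

    t_value must be looked up for the chosen confidence level
    and len(sample) - 1 degrees of freedom.
    """
    n = len(sample)
    x_bar = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    margin = t_value * standard_error
    return x_bar - margin, x_bar + margin


# Made-up illustrative measurements (not real data); n = 10, so 9 degrees of freedom
sample = [34.1, 34.8, 33.9, 35.2, 34.4, 34.0, 34.6, 33.8, 34.9, 34.3]
lower, upper = confidence_interval(sample, t_value=2.262)
print(f"95% confidence interval: ({lower:.3f}, {upper:.3f})")
```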

Try our Numbas test on hypothesis testing: Practising confidence intervals and hypothesis tests.

To develop these ideas further see the other sections of Hypothesis Tests (Animal Science).

For additional information on topics covered in this section see the main site's page on hypothesis testing.