FAQs

In terms of statistics what is a scale of measurement? I have heard a lot about nominal, ordinal and interval scale of measurements. What are these?

What are the most common ways of summarising a variable?

What are the most common measures of variability?

What is a normal distribution?

How do I produce a bar chart of a nominal variable?

How do I produce a histogram for a continuous variable?

I have heard a lot about Scatterplot. What is this and how can I produce it in SPSS?

How do I produce and modify a 2D pie chart in SPSS of a nominal variable?

How can I produce a clustered bar chart of two nominal variables?

How to produce a boxplot for one continuous (interval) variable by one categorical variable? That is, for each level of the categorical variable I want to see a boxplot on the same plot.

How do I produce boxplots for several variables on the same plot?

How do I produce survival curves in SPSS?

How to produce a stacked bar for several variables on the same plot?

What is an Independent-samples t test (two-sample t test) and when can it be used?

What is paired-samples t test (dependent t test) and when can it be used?

How do I perform a normality test in SPSS?

How do I perform a homogeneity of variance test in SPSS?

How do I perform an Independent-Samples T Test in SPSS?

My data is not normal. Which test should I use to find out if there are significant differences between groups?

How do I perform a paired samples t test (Dependent T Test) in SPSS?

I have paired sample data that are not normally distributed. Which test should I use in SPSS?

How do I perform a One-Sample T Test in SPSS?

What is correlation?

How do I produce the Pearson or Spearman Rho correlation coefficient in SPSS?

In terms of statistics what is a scale of measurement? I have heard a lot about nominal, ordinal and interval scale of measurements. What are these?

A scale of measurement is simply the way that you use numbers to record data for analysis. For example, in a nominal scale you assign numbers to categories, e.g. 1=male and 2=female. Rocks can likewise be classified as 1=sedimentary, 2=metamorphic or 3=igneous. The numbers are just labels and have no real meaning, and their order does not matter. In an ordinal scale the order does matter, because the values describe a ranking, e.g. 1st, 2nd, 3rd. They can also be words such as ‘bad’, ‘medium’ and ‘good’. An interval scale refers to quantitative measurements such as temperature, weight and height. It is worth understanding scales of measurement because they determine the type of analysis that you can do.

What are the most common ways of summarising a variable?

It depends on the scale of measurement of the variable. For a nominal variable, a frequency table (showing counts and percentages) and the mode (the most common value) are usually enough. For an ordinal variable, a frequency table, the mode and the median are enough. For an interval variable, most statistical measures can be used.
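As a rough illustration outside SPSS, the summaries above can be sketched with Python's standard library. The variable names and values below are made up for the example and are not taken from any SPSS data file.

```python
from collections import Counter
from statistics import mean, median, mode

# Hypothetical nominal variable: employment category for six cases.
jobcat = ["clerical", "manager", "clerical", "custodial", "clerical", "manager"]

# Frequency table: (count, percentage) for each category, plus the mode.
counts = Counter(jobcat)
freq_table = {cat: (n, 100 * n / len(jobcat)) for cat, n in counts.items()}
jobcat_mode = mode(jobcat)

# Hypothetical ordinal variable coded 1 = bad, 2 = medium, 3 = good.
rating = [1, 2, 2, 3, 3, 3, 2]
rating_median = median(rating)

# Hypothetical interval variable: salaries.
salary = [21000, 24000, 27000, 30000, 48000]
salary_mean = mean(salary)

for cat, (n, pct) in freq_table.items():
    print(f"{cat}: {n} ({pct:.1f}%)")
print("mode:", jobcat_mode)
print("median rating:", rating_median)
print("mean salary:", salary_mean)
```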

What are the most common measures of variability?

The most common measures of variability are the range, the interquartile range, the variance and the standard deviation.
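The four measures can be computed by hand or, as in this minimal sketch, with Python's standard library. The data values are hypothetical. Note that there are several conventions for computing quartiles, so the interquartile range shown here may differ slightly from the one SPSS reports.

```python
from statistics import quantiles, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Interquartile range: third quartile minus first quartile.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Sample variance (divides by n - 1) and its square root, the standard deviation.
sample_variance = variance(data)
sample_sd = stdev(data)

print("range:", data_range)
print("interquartile range:", iqr)
print("variance:", round(sample_variance, 3))
print("standard deviation:", round(sample_sd, 3))
```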

What is a normal distribution?

The normal distribution is a theoretical concept that is symbolised by the familiar bell-shaped curve. It is really a family of distributions, each defined by its mean and standard deviation. It plays an important role in statistical inference. Some statistical procedures in SPSS assume that your data is normally distributed, that is, that your data is taken from a normal population.
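To illustrate the idea of a family of distributions, the sketch below builds two normal distributions with different (hypothetical) means and standard deviations and shows that the same proportion of each lies within one, two and three standard deviations of its mean. It uses NormalDist from Python's standard library, not SPSS.

```python
from statistics import NormalDist

# Two members of the normal family (the means and SDs are illustrative).
standard = NormalDist(mu=0, sigma=1)
iq_scores = NormalDist(mu=100, sigma=15)

def share_within(dist, k):
    """Proportion of the distribution within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

for k in (1, 2, 3):
    print(k, round(share_within(standard, k), 4), round(share_within(iq_scores, k), 4))
```

Both columns should agree: roughly 68%, 95% and 99.7% of a normal distribution lie within one, two and three standard deviations of the mean.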

How do I produce a bar chart of a nominal variable?

A bar chart is a suitable graph for a nominal variable. To produce a bar chart of a nominal variable called Employment Category, from the menu bar select Graphs -> Legacy Dialogs -> Bar…, click on Simple, then click on Define. From the variables list, select Employment Category [jobcat] and click the arrow (>) to transfer it under Category Axis:. Then click OK to generate the graphic.

How do I produce a histogram for a continuous variable?

To produce a histogram of Current Salary with a normal curve, from the menu bar select Graphs -> Legacy Dialogs -> Histogram…

From the variables list, select Current Salary [salary] and click the arrow (>) to transfer it under Variable:. Select Display normal curve by a single click on the check box. Then click OK to generate the graphic.

I have heard a lot about Scatterplot. What is this and how can I produce it in SPSS?

The existence of a statistical association between two variables is most apparent in the appearance of a diagram called a scatterplot. A scatterplot is simply a cloud of points of the two variables under investigation.

To produce a scatterplot of variable1 against variable2, from the menu bar select Graphs -> Legacy Dialogs -> Scatter/Dot…, click on Simple, then click on Define. From the variables list, select variable1 and click the arrow (>) to transfer it under Y Axis:. From the variables list again, select variable2 and click the arrow (>) to transfer it under X Axis:. Then click OK to generate the graphic.

How do I produce and modify a 2D pie chart in SPSS of a nominal variable?

A pie chart is another chart that is suitable for a nominal variable. To produce a 2D pie chart for variable1 made up of 3 levels, from the menu bar select Graphs -> Legacy Dialogs -> Pie… Select Summaries for groups of cases by a single click on the radio button. Click on Define. From the variables list, select variable1 and click the arrow (>) to transfer it under Define Slices by:. Then click OK to generate the graphic. To modify the chart, double-click on the pie to make it editable (the pie will be displayed in the Chart Editor window). In this window select Elements -> Show Data Labels. Select Percent (or Count) from the displayed dialogue box. Click on the green arrow on the right. Still in the dialogue box, click Apply and then Close. You will now be back in the Chart Editor window. From its menu bar select File -> Close.

How can I produce a clustered bar chart of two nominal variables?

From the menu bar select Graphs -> Legacy Dialogs -> Bar… -> Clustered -> Define. From the variables list transfer variable1 to Category Axis:. From the variables list again transfer variable2 to Define Clusters by:. Then click OK to generate the graphic.

How to produce a boxplot for one continuous (interval) variable by one categorical variable? That is, for each level of the categorical variable I want to see a boxplot on the same plot.

  1. From the menu bar select Graphs -> Legacy Dialogs -> Boxplot…, click on Simple, click on Summaries for groups of cases, and click on Define.
  2. From the variables list, select the variable, e.g. Current Salary [salary], and click the arrow (>) to transfer it under Variable:.
  3. From the variables list again, select a categorical variable, e.g. gender, and click the arrow (>) to transfer it under Category Axis:.
  4. Then click OK to generate the graphic.

How do I produce boxplots for several variables on the same plot?

  1. From the menu bar select Graphs -> Legacy Dialogs -> Boxplot… click on Simple, click on Summaries for separate variables, and click on Define
  2. From the variables list, select the variables e.g. Current Salary [salary] click the arrow (>) to transfer it under Variable:. Select and transfer more variables as necessary.
  3. Then click OK to generate the graphic. 

How do I produce survival curves in SPSS?

If your data is already in SPSS, follow the instructions below to produce survival curves. In these instructions, "variable" stands for the relevant variable in your own data file. If you don't have a factor (grouping) variable for comparison, skip steps 7-10.

  1. To run a Kaplan-Meier Survival Analysis, from the menus choose: Analyze -> Survival -> Kaplan-Meier...
  2. Select variable as the Time variable.
  3. Select variable as the Status variable.
  4. Click Define Event.
  5. Under Value(s) Indicating Event Has Occurred type 1 in the text area next to Single value:.
  6. Click Continue.
  7. Select variable as a Factor.
  8. Click Compare Factor.
  9. Select Log rank, Breslow, and Tarone-Ware.
  10. Click Continue.
  11. Click Options in the Kaplan-Meier dialog box.
  12. Select Quartiles in the Statistics group and Survival in the Plots group.
  13. Click Continue.
  14. Click OK in the Kaplan-Meier dialog box.
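For readers curious about what the Kaplan-Meier procedure computes, here is a minimal sketch of the estimator in Python. It is not SPSS's implementation. The function name, the data and the 1 = event / 0 = censored coding are assumptions chosen to match step 5 above. At each time where an event occurs, the running survival probability is multiplied by (1 - events / number still at risk).

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] for each time with at least one event.

    times  -- follow-up time for each case
    events -- 1 if the event occurred at that time, 0 if the case was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored cases
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical data: time in months and status (1 = event, 0 = censored).
times = [2, 3, 3, 5, 8, 8, 12]
status = [1, 1, 0, 1, 0, 1, 0]

km_curve = kaplan_meier(times, status)
for t, s in km_curve:
    print(f"time {t}: estimated survival {s:.3f}")
```

Plotting these points as a step function gives the kind of survival curve that SPSS draws. Comparing groups (Log rank, Breslow, Tarone-Ware) is not covered by this sketch.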

How do I produce a stacked bar chart for several variables on the same plot?

  1. From the menu bar select Graphs -> Legacy Dialogs -> Bar… click on Stacked, click on Summaries for separate variables, and click on Define
  2. From the variables list, select a variable, e.g. var1, and click the arrow (>) to transfer it under Bars Represent:. Select and transfer more variables as necessary. The default statistic is the mean. You can change to another statistic by clicking on Change Statistic... when it is active. If it is not active, first click on any variable under Bars Represent:.
  3. Note that you also need a variable under Category Axis such as gender for example.
  4. Then click OK to generate the graphic.

What is an Independent-samples t test (two-sample t test) and when can it be used?

This is one of a group of tests in statistics known as t tests. It is used to compare the means of one variable for two groups of cases. As an example, a practical application would be to find out the effect of a new drug on blood pressure. Patients with high blood pressure would be randomly assigned to two groups, a placebo (control) group and a treatment (experimental) group. The placebo group would receive conventional treatment while the treatment group would receive a new drug that is expected to lower blood pressure. After treatment for a couple of months, the two-sample t test is used to compare the average blood pressure of the two groups. Note that each patient is measured once and belongs to one group. You use this test for normally distributed data and when the variances of the two groups are equal.
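The calculation behind the test can be sketched in Python as follows. This is an illustration under the equal-variance assumption stated above, with made-up blood pressure readings. It computes only the t statistic and its degrees of freedom. The p-value comes from the t distribution, which SPSS reports for you.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical systolic blood pressure after treatment.
placebo = [150, 148, 152, 155, 145]
treatment = [140, 138, 142, 145, 135]

t_stat, df = independent_t(placebo, treatment)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```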

What is paired-samples t test (dependent t test) and when can it be used?

This is one of a group of tests in statistics known as t tests. It is used to compare the means of two variables for a single group. The procedure computes the difference between the values of the two variables for each case and tests whether the average difference differs from zero. For example, you may be interested in evaluating the effectiveness of a mnemonic method on memory recall. Subjects are given a passage from a book to read. A few days later, they are asked to reproduce the passage and the number of words recalled is noted. Subjects are then sent to a mnemonic training session. They are then asked to read and reproduce the passage again and the number of words is noted. Thus each subject has two measures, often called before (pre) and after (post) measures.

An alternative design for which this test is used is a matched-pairs or case-control study. For example, in a blood pressure study, patients and controls might be matched by age, so that a 64-year-old patient is paired with a 64-year-old member of the control group. Each record in the data file will contain the response from the patient and the response from his or her matched control subject. You use this test for normally distributed data.
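As a minimal sketch of the calculation, using made-up recall scores for the mnemonic example, the paired t statistic is the mean of the differences divided by its standard error. Only the t statistic and degrees of freedom are computed here. SPSS also supplies the p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic and degrees of freedom for paired measurements."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1

# Hypothetical number of words recalled before and after training.
words_before = [10, 12, 9, 14, 11]
words_after = [13, 14, 10, 18, 12]

t_stat, df = paired_t(words_before, words_after)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```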

What is one-sample t test and when can it be used?

This is one of a group of tests in statistics known as t tests. It is used to compare the mean of one variable with a known or hypothesised value. In other words, the One-Sample T Test procedure tests whether the mean of a single variable differs from a specified constant. For instance, you might be interested in testing whether the average IQ of some 50 students differs from an IQ of 125, or how the average salary in Newcastle compares to the national average.
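The calculation can be sketched in a few lines of Python. The IQ scores below are hypothetical, and the sketch stops at the t statistic and degrees of freedom rather than the p-value.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(values, test_value):
    """t statistic and degrees of freedom for comparing a sample mean with a constant."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)
    return (mean(values) - test_value) / standard_error, n - 1

iq = [121, 127, 130, 124, 133]  # hypothetical IQ scores

t_stat, df = one_sample_t(iq, 125)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```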

I hear a lot about p-value. What is it and how do I interpret it?

The p stands for probability, so the p-value is a probability between zero and one. It is the probability of obtaining a result at least as extreme as the one you observed if the null hypothesis were true. It helps you to draw conclusions from the statistical tests you perform. The three common situations are:

  1. If the p-value is greater than 0.05, the null hypothesis is not rejected and the result is not significant.
  2. If the p-value is less than 0.05 but greater than 0.01, the null hypothesis is rejected and the result is significant at the 5 percent level.
  3. If the p-value is smaller than 0.01, the null hypothesis is rejected and the result is significant at the 1 percent level.

The null and alternative hypotheses are simply two opposing statements that you make concerning your research question.
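The three situations listed above can be written as a small decision rule. The function below is a hypothetical helper for illustration only. How values exactly equal to 0.05 or 0.01 are treated is an assumption of this sketch.

```python
def interpret_p_value(p_value):
    """Classify a p-value using the conventional 0.05 and 0.01 cut-offs."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "significant at the 1 percent level"
    if p_value < 0.05:
        return "significant at the 5 percent level"
    return "not significant"

for p in (0.20, 0.03, 0.004):
    print(p, "->", interpret_p_value(p))
```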

How do I perform a normality test in SPSS?

Follow these steps to perform the normality test:

  1. From the menu bar select Analyze -> Descriptive Statistics -> Explore….
  2. Transfer a continuous variable, e.g. blood pressure [bloodpres], to Dependent List:.
  3. Transfer a grouping variable, e.g. gender, to Factor List:.
  4. Under Display click on Plots. Then click on the Plots… button.
  5. Under Descriptive deselect Stem-and-leaf.
  6. Select Normality plots with tests.
  7. Click on Continue. Click on OK.

Examine the results in the Tests of Normality table. For a small sample (n≤50) use the Shapiro-Wilk statistic. For a large sample (n>50) use the Kolmogorov-Smirnov statistic. A p value (usually under the column Sig.) greater than 0.05 means there is no significant departure from normality, so you can treat your data as normally distributed.

How do I perform a homogeneity of variance test in SPSS?

Follow these steps to perform the homogeneity of variance test:

  1. Select Analyze -> Compare Means -> One-Way ANOVA….
  2. Transfer a continuous variable e.g. blood pressure [bloodpres] to Dependent List:.
  3. Transfer a grouping variable gender to Factor List:.
  4. Click on Options and select Homogeneity of variance test.
  5. Click Continue and click OK

Examine the table Test of Homogeneity of Variances. If the p value is greater than 0.05, there is no significant difference between the variances of the two groups, so you can treat them as equal. Ignore the ANOVA table, which is also produced as part of this procedure.

How do I perform an Independent-Samples T Test in SPSS?

This test is suitable only for normally distributed data and when the variances of the two groups are equal. Follow these steps to perform the test:

  1. Select Analyze -> Compare Means -> Independent-Samples T Test….
  2. Transfer the continuous variable e.g. blood pressure to Test Variable(s):.
  3. Transfer the grouping variable e.g. gender to Grouping Variable:.
  4. Click on Define Groups. Beside Group 1: type 1. Beside Group 2: type 2.
  5. Click on Continue and click on OK.

If you are not sure how to interpret the output recall the dialogue box via Analyze -> Compare Means -> Independent-Samples T Test…. Click on Help. Then click on Show me on the displayed window to go through a case study. The case study uses a different data file to explain every bit of the output.

My data is not normal. Which test should I use to find out if there are significant differences between groups?

You should use the Mann-Whitney U Test. Follow these steps:

  1. Select Analyze -> Nonparametric Tests -> 2 Independent Samples… (in recent versions of SPSS this is under Nonparametric Tests -> Legacy Dialogs).
  2. Transfer the continuous variable e.g. blood pressure to Test Variable(s):.
  3. Transfer the grouping variable e.g. gender to Grouping Variable:.
  4. Click on Define Groups. Beside Group 1: type 1. Beside Group 2: type 2.
  5. Click on Continue and click on OK.
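For background, the Mann-Whitney U statistic is based on ranks rather than the raw values. The sketch below is not SPSS's implementation. It ranks the pooled data (giving tied values the average of their ranks) and derives U from the rank sum of the first group. The data and function names are hypothetical, and the p-value is left to SPSS.

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def mann_whitney_u(group1, group2):
    """Return the smaller of the two U statistics for two independent groups."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    rank_sum1 = sum(ranks[:n1])
    u1 = rank_sum1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2)

# Hypothetical scores for two independent groups.
group_a = [3, 5, 8, 9]
group_b = [1, 2, 4, 6, 7]

u_stat = mann_whitney_u(group_a, group_b)
print("U =", u_stat)
```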

How do I perform a paired samples t test (Dependent T Test) in SPSS? 

This test is also suitable for normally distributed data. There is no need for homogeneity of variance test because we are dealing with the same group. To do the actual test, follow these steps:

  1. From the menu bar select Analyze -> Compare Means -> Paired-Samples T Test….
  2. Click on variable1 and click on the arrow.
  3. Click on variable2 and click on the arrow. Click OK.

If you are not sure how to interpret the output, recall the dialogue box via Analyze -> Compare Means -> Paired-Samples T Test…. Click on Help. Then click on Show me in the displayed window to go through a case study. The case study uses a different data file to explain every part of the output.

I have paired sample data that are not normally distributed. Which test should I use in SPSS?

You should use the Wilcoxon Signed Ranks test. Follow these steps:

  1. From the menu bar select Analyze -> Nonparametric Tests -> 2 Related Samples… (in recent versions of SPSS this is under Nonparametric Tests -> Legacy Dialogs).
  2. Click on variable1 and click on the arrow.
  3. Click on variable2 and click on the arrow. Click OK.
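As a rough sketch of what the Wilcoxon Signed Ranks test works with, the code below ranks the absolute differences between the paired values and sums the ranks of the positive and negative differences separately. It is a simplification, not SPSS's implementation: zero differences are dropped and tied absolute differences are not given averaged ranks, so it is only accurate when there are no ties. The data are hypothetical.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W+, W-): the rank sums of the positive and negative differences.

    Simplified sketch: zero differences are dropped and ties are not averaged.
    """
    diffs = [a - b for a, b in zip(after, before) if a != b]
    # Order the differences by absolute size; position in this order is the rank.
    ordered = sorted(diffs, key=abs)
    w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return w_plus, w_minus

# Hypothetical paired scores (no tied absolute differences).
before = [10, 12, 9, 14, 11, 15]
after = [13, 10, 10, 19, 15, 9]

w_plus, w_minus = wilcoxon_signed_rank(before, after)
print("W+ =", w_plus, "W- =", w_minus)
```

If the two sets of measurements do not differ systematically, the two rank sums should be similar. A large imbalance suggests a real difference.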

How do I perform a One-Sample T Test in SPSS? 

To do the test, follow these steps:

  1. From the menu bar select Analyze -> Compare Means -> One-Sample T Test….
  2. Select the continuous variable, e.g. Intelligence Quotient [iq], and click on the arrow.
  3. Type a value, e.g. 125, beside Test Value:.
  4. Click OK.

If you are not sure how to interpret the output, recall the dialogue box via Analyze -> Compare Means -> One-Sample T Test…. Click on Help. Then click on Show me in the displayed window to go through a case study. The case study uses a different data file to explain every part of the output.

What is correlation?

A correlation is a statistic that measures the strength of a supposed linear association between two variables. The most common correlation coefficient is the Pearson correlation coefficient, used to measure the linear relationship between two interval variables that are normally distributed. The correlation coefficient varies from -1 to +1. If your data is ordinal or not normally distributed, use Spearman's rho. If you have two nominal variables and want to test the association between them, use the chi-square test.
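To make the two coefficients concrete, here is a minimal sketch in Python. Pearson's r is computed from the raw values, and Spearman's rho is computed as Pearson's r on the ranks. The ranking helper assumes there are no tied values, which is a simplification of this example. The data are made up.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def simple_ranks(values):
    """Ranks from 1 to n. Assumes no tied values."""
    position = {v: r for r, v in enumerate(sorted(values), start=1)}
    return [position[v] for v in values]

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))

# Hypothetical data: y always increases with x, but not in a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print("Pearson r:", round(pearson_r(x, y), 3))
print("Spearman rho:", round(spearman_rho(x, y), 3))
```

Because y rises steadily but not linearly, Pearson's r is high but below 1, while Spearman's rho, which only looks at the ordering, is 1.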

How do I produce the Pearson or Spearman Rho correlation coefficient in SPSS?

To produce the correlation coefficient select:
Analyze -> Correlate -> Bivariate…

This will open the Bivariate Correlations dialog box. Transfer the two variables to the Variables: box. Under Correlation Coefficients, select the coefficient you want (Pearson or Spearman) by a single click on its check box, then click OK.

I hear a lot about one-tailed test and two-tailed test. I don’t know what they are and how to apply them. Any advice will be appreciated.

A one-tailed test applies where the researcher predicts in advance the direction in which the results should point. For example, when testing a new drug against a placebo, a researcher would want to know whether the new drug is better than the placebo. A one-tailed test looks at only one tail of the distribution, so it tests for a difference in one direction only, positive or negative. SPSS usually reports a two-tailed p value (labelled Sig. (2-tailed)), so for a one-tailed test you must divide that p value by 2 before you draw your conclusion, provided the result is in the predicted direction.

A two-tailed test applies where the researcher does not know the direction of the results, or is interested in both directions. The two-tailed test is more common than the one-tailed test.

You must decide before you collect your data whether you are doing a one-tailed or two-tailed test.
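The rule of dividing the p value by 2 for a one-tailed test can be sketched as below. The function name is hypothetical, and the conversion assumes a test statistic with a symmetric distribution, such as t. If the result falls in the opposite direction to the one predicted, the one-tailed p value is large rather than small.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction):
    """Convert a two-tailed p value to a one-tailed p value.

    Assumes a symmetric test statistic such as t or z.
    """
    if not 0 <= two_tailed_p <= 1:
        raise ValueError("a p value must lie between 0 and 1")
    if in_predicted_direction:
        return two_tailed_p / 2
    return 1 - two_tailed_p / 2

# A two-tailed p of 0.08 is not significant at the 5 percent level,
# but the one-tailed p is 0.04 if the effect is in the predicted direction.
print(one_tailed_p(0.08, True))
print(one_tailed_p(0.08, False))
```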

 
