Discriminant analysis is a technique used to build a predictive model of group membership based on the observed characteristics of each case. For example, children can be assigned to one of two groups, Very Clever or Just Clever, based on their performance in the three core subjects English, Mathematics, and Science. Discriminant analysis generates functions from a sample of cases for which group membership is known; the functions can then be applied to new cases that have measurements on the predictor variables but unknown group membership. That is, knowing a child's scores on the three subjects, we can use the discriminant function to determine whether the child belongs to the Very Clever group or the Just Clever group.
When there are two groups, only one discriminant function is generated. When there are more than two groups, several functions are generated: as many as the number of groups minus one or the number of predictor variables, whichever is smaller. Usually, only the first three of these functions will be useful.
There are three basic types of discriminant analysis (DA): direct, hierarchical and stepwise. In direct DA, all the variables enter at once; in hierarchical DA, the order of variable entry is determined by the researcher; and in stepwise DA, statistical criteria alone determine the order of entry. This document concentrates on stepwise DA.
Data were collected on two groups of students. One group is considered to be Very Clever while the other is considered to be Just Clever. The students' scores in English, Mathematics and Science were recorded. The maximum score for each subject is 100. Group membership is the dependent variable, while the three subject scores are the independent variables. The collected data are shown in Table 1 below. Enter the data into the SPSS Data Editor window. The data should fit 4 columns and 30 rows. Define the three independent variables. Define the coding variable, which takes two values: 1 = Very Clever, 2 = Just Clever. Once the data have been entered successfully, we are ready to perform the analysis.
Table 1: Collected Data
| English | Maths | Science | Group | English | Maths | Science | Group |
| 44 | 44 | 28 | 1 | 40 | 54 | 40 | 1 |
| 61 | 29 | 25 | 1 | 29 | 53 | 19 | 2 |
| 19 | 68 | 77 | 1 | 28 | 66 | 71 | 2 |
| 48 | 58 | 45 | 1 | 27 | 67 | 17 | 2 |
| 38 | 41 | 30 | 1 | 45 | 66 | 79 | 2 |
| 25 | 55 | 77 | 1 | 55 | 43 | 51 | 2 |
| 39 | 30 | 50 | 1 | 68 | 45 | 58 | 2 |
| 33 | 59 | 44 | 1 | 52 | 56 | 51 | 2 |
| 30 | 65 | 49 | 1 | 74 | 47 | 33 | 2 |
| 17 | 60 | 21 | 1 | 70 | 51 | 29 | 2 |
| 42 | 49 | 30 | 1 | 49 | 67 | 74 | 2 |
| 47 | 44 | 43 | 1 | 80 | 53 | 40 | 2 |
| 13 | 76 | 52 | 1 | 50 | 61 | 13 | 2 |
| 63 | 31 | 54 | 1 | 48 | 71 | 71 | 2 |
| 54 | 47 | 8 | 1 | 65 | 60 | 39 | 2 |
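
For readers who want to check the data entry outside SPSS, the following is a minimal Python sketch that holds Table 1 as a list of cases and prints the group sizes and group means. The names `DATA`, `NAMES` and `group_means` are illustrative and not part of SPSS.

```python
from statistics import mean

# Table 1 as (english, maths, science, group); 1 = Very Clever, 2 = Just Clever
DATA = [
    (44, 44, 28, 1), (61, 29, 25, 1), (19, 68, 77, 1), (48, 58, 45, 1),
    (38, 41, 30, 1), (25, 55, 77, 1), (39, 30, 50, 1), (33, 59, 44, 1),
    (30, 65, 49, 1), (17, 60, 21, 1), (42, 49, 30, 1), (47, 44, 43, 1),
    (13, 76, 52, 1), (63, 31, 54, 1), (54, 47, 8, 1), (40, 54, 40, 1),
    (29, 53, 19, 2), (28, 66, 71, 2), (27, 67, 17, 2), (45, 66, 79, 2),
    (55, 43, 51, 2), (68, 45, 58, 2), (52, 56, 51, 2), (74, 47, 33, 2),
    (70, 51, 29, 2), (49, 67, 74, 2), (80, 53, 40, 2), (50, 61, 13, 2),
    (48, 71, 71, 2), (65, 60, 39, 2),
]
NAMES = ("english", "maths", "science")


def group_means(data):
    """Return {group code: {variable name: mean score}}."""
    out = {}
    for g in sorted({row[3] for row in data}):
        rows = [row for row in data if row[3] == g]
        out[g] = {name: mean(r[i] for r in rows) for i, name in enumerate(NAMES)}
    return out


counts = {g: sum(1 for row in DATA if row[3] == g) for g in (1, 2)}
means = group_means(DATA)

print("cases:", len(DATA), "per group:", counts)
for g, m in means.items():
    print("group", g, {k: round(v, 2) for k, v in m.items()})
```

The table contains 16 cases coded 1 and 14 cases coded 2, which is worth confirming before running the analysis.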
From the menus choose:
Statistics
Classify
Discriminant... (See diagram below)
Figure: The Discriminant Analysis menu
The Discriminant Analysis dialog box will be loaded on the screen as shown below. Click to select the dependent variable, group, and click on the arrow (>) to transfer it into the Grouping Variable text box.
Figure: The Discriminant Analysis dialog box
Now click on Define Range to load the Discriminant analysis: Define Range dialog box on to the screen (see diagram below). Type 1 into the Minimum text box and 2 into the Maximum text box. Click on Continue to return to the Discriminant analysis dialog box.
Figure: The Discriminant Analysis: Define Range dialog box
Now drag the cursor over the rest of the variables (i.e. english, maths and science) to highlight them, and click on the arrow (>) to transfer them into the Independent text box. Click on Use stepwise method. The completed dialog box is as shown above.
Click on Statistics and the Discriminant analysis: Statistics dialog box will be loaded on the screen (see diagram below). Within the Descriptives box select Univariate ANOVAs. Click on Continue.
Figure: The Discriminant Analysis: Statistics dialog box
To obtain the success/failure table, click on Classify and the Discriminant analysis: Classification dialog box will be loaded on the screen (see diagram below). Within the Display box, select Summary table. Click on Continue and then on OK to run the procedure.
Figure: The Discriminant Analysis: Classification dialog box
Now let us examine the output and try to offer some interpretation.
The first two tables in the output listing give information about the data and the number of cases in each category of the dependent variable.
The table shown below was generated by selecting Univariate ANOVAs. For each independent variable, it indicates whether the means of the two groups differ to a statistically significant degree. Only English is statistically significant. Wilks' Lambda is a statistical criterion that is used to add or remove variables from the analysis. Several other criteria are available.
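
The univariate tests can be reproduced by hand. The sketch below is plain Python rather than SPSS output: for each subject it computes Wilks' Lambda as the within-groups sum of squares divided by the total sum of squares, and converts it to the one-way ANOVA F statistic. The function name `univariate_anova` is illustrative, and the significance check uses an approximate 5% critical value for F with 1 and 28 degrees of freedom instead of an exact p-value.

```python
# Table 1 as (english, maths, science, group); 1 = Very Clever, 2 = Just Clever
DATA = [
    (44, 44, 28, 1), (61, 29, 25, 1), (19, 68, 77, 1), (48, 58, 45, 1),
    (38, 41, 30, 1), (25, 55, 77, 1), (39, 30, 50, 1), (33, 59, 44, 1),
    (30, 65, 49, 1), (17, 60, 21, 1), (42, 49, 30, 1), (47, 44, 43, 1),
    (13, 76, 52, 1), (63, 31, 54, 1), (54, 47, 8, 1), (40, 54, 40, 1),
    (29, 53, 19, 2), (28, 66, 71, 2), (27, 67, 17, 2), (45, 66, 79, 2),
    (55, 43, 51, 2), (68, 45, 58, 2), (52, 56, 51, 2), (74, 47, 33, 2),
    (70, 51, 29, 2), (49, 67, 74, 2), (80, 53, 40, 2), (50, 61, 13, 2),
    (48, 71, 71, 2), (65, 60, 39, 2),
]
NAMES = ("english", "maths", "science")
F_CRIT = 4.20  # approximate 5% critical value of F with 1 and 28 df


def univariate_anova(values, groups):
    """One-way ANOVA for one predictor; returns (Wilks' Lambda, F)."""
    n = len(values)
    labels = sorted(set(groups))
    grand = sum(values) / n
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_within = 0.0
    for g in labels:
        vals = [v for v, lab in zip(values, groups) if lab == g]
        m = sum(vals) / len(vals)
        ss_within += sum((v - m) ** 2 for v in vals)
    lam = ss_within / ss_total
    df1, df2 = len(labels) - 1, n - len(labels)
    f_stat = ((1 - lam) / lam) * (df2 / df1)
    return lam, f_stat


groups = [row[3] for row in DATA]
results = {
    name: univariate_anova([row[i] for row in DATA], groups)
    for i, name in enumerate(NAMES)
}
for name, (lam, f) in results.items():
    print(f"{name:8s} lambda={lam:.3f} F={f:.2f} significant={f > F_CRIT}")
```

A Lambda close to 1 means the group means barely differ on that variable; smaller values indicate better separation.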
The table below shows which variables have entered the analysis. The variables are English and Maths: the overall Wilks' Lambda is 0.819 after English is entered and 0.513 after Maths is added. Note that, at each step, the variable that minimizes the overall Wilks' Lambda is entered. The table also gives more statistical information about the two variables that have entered the analysis. The F statistics and their significance levels are shown in the table. Note the information provided at the bottom of the table.
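
The entry rule can be illustrated with a simplified forward-selection sketch in Python. It is not the SPSS algorithm itself: it only adds variables, it omits the removal and tolerance checks, and the F-to-enter threshold of 3.84 is an assumption about the default setting. The helper names (`det`, `sscp`, `wilks_lambda`, `forward_stepwise`) are illustrative.

```python
# Table 1 as (english, maths, science, group); 1 = Very Clever, 2 = Just Clever
DATA = [
    (44, 44, 28, 1), (61, 29, 25, 1), (19, 68, 77, 1), (48, 58, 45, 1),
    (38, 41, 30, 1), (25, 55, 77, 1), (39, 30, 50, 1), (33, 59, 44, 1),
    (30, 65, 49, 1), (17, 60, 21, 1), (42, 49, 30, 1), (47, 44, 43, 1),
    (13, 76, 52, 1), (63, 31, 54, 1), (54, 47, 8, 1), (40, 54, 40, 1),
    (29, 53, 19, 2), (28, 66, 71, 2), (27, 67, 17, 2), (45, 66, 79, 2),
    (55, 43, 51, 2), (68, 45, 58, 2), (52, 56, 51, 2), (74, 47, 33, 2),
    (70, 51, 29, 2), (49, 67, 74, 2), (80, 53, 40, 2), (50, 61, 13, 2),
    (48, 71, 71, 2), (65, 60, 39, 2),
]
NAMES = ("english", "maths", "science")


def det(m):
    """Determinant by cofactor expansion (adequate for these tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def sscp(rows, cols):
    """Sums of squares and cross-products of the chosen columns about their means."""
    means = [sum(r[c] for r in rows) / len(rows) for c in cols]
    return [[sum((r[a] - ma) * (r[b] - mb) for r in rows)
             for b, mb in zip(cols, means)]
            for a, ma in zip(cols, means)]


def wilks_lambda(data, cols):
    """Overall Wilks' Lambda = det(within-groups SSCP) / det(total SSCP)."""
    within = [[0.0] * len(cols) for _ in cols]
    for g in sorted({r[3] for r in data}):
        part = sscp([r for r in data if r[3] == g], cols)
        within = [[w + p for w, p in zip(wr, pr)] for wr, pr in zip(within, part)]
    return det(within) / det(sscp(data, cols))


def forward_stepwise(data, f_to_enter=3.84):  # threshold is an assumed default
    n, g = len(data), len({r[3] for r in data})
    selected, steps, current = [], [], 1.0
    remaining = list(range(len(NAMES)))
    while remaining:
        # Candidate whose entry gives the smallest overall Wilks' Lambda
        best = min(remaining, key=lambda c: wilks_lambda(data, selected + [c]))
        new = wilks_lambda(data, selected + [best])
        # Partial F for the candidate, given the variables already entered
        f_enter = (n - g - len(selected)) / (g - 1) * (current / new - 1)
        if f_enter < f_to_enter:
            break
        selected.append(best)
        remaining.remove(best)
        steps.append((NAMES[best], new, f_enter))
        current = new
    return steps


steps = forward_stepwise(DATA)
for name, lam, f in steps:
    print(f"{name:8s} lambda={lam:.3f} F-to-enter={f:.2f}")
```

On the Table 1 data this sketch enters English first and Maths second, and leaves Science out because its partial F falls below the threshold.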
The next table gives a summary of the variables in the analysis. The step at which each was entered is also shown, along with other useful statistics.
The table below shows the variables not in the analysis at each step. Note that at step 0 none of the variables had yet been entered. At step 1, English was entered and at step 2 Maths was entered.
The next two tables give the percentage of the variance accounted for by the one discriminant function generated. The significance of the function is also shown. Because there are two groups, only one discriminant function was generated.
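
When there is only one discriminant function, the quantities in these two tables are tied directly to the final Wilks' Lambda. The Python sketch below starts from the Lambda of 0.513 reported above and derives the eigenvalue, the canonical correlation and Bartlett's chi-square approximation. It assumes two groups, two predictors in the function and 30 cases, and the function name is illustrative. The figures it prints come from the rounded Lambda, so they may differ slightly from the SPSS output.

```python
import math


def single_function_summary(wilks, n_cases, n_predictors):
    """Summary statistics for a two-group analysis (one discriminant function)."""
    n_groups = 2
    eigenvalue = (1 - wilks) / wilks       # because Lambda = 1 / (1 + eigenvalue)
    canonical_r = math.sqrt(1 - wilks)     # canonical correlation
    # Bartlett's chi-square approximation for testing Wilks' Lambda
    chi_square = -(n_cases - 1 - (n_predictors + n_groups) / 2) * math.log(wilks)
    df = n_predictors * (n_groups - 1)
    return eigenvalue, canonical_r, chi_square, df


eig, r, chi2, df = single_function_summary(0.513, n_cases=30, n_predictors=2)
# With exactly 2 degrees of freedom the chi-square tail probability is exp(-x / 2)
p_value = math.exp(-chi2 / 2)

print(f"eigenvalue={eig:.3f} canonical r={r:.3f} "
      f"chi-square={chi2:.2f} df={df} p={p_value:.5f}")
```

With a single function, that function necessarily accounts for 100% of the between-groups variance, so the chi-square test is the informative part of these tables.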
The standardised canonical discriminant function coefficients for the two variables in the analysis are shown in the table below.
The pooled within-groups correlations between the discriminating variables and the function are shown in the table below. It is clear from this output that the association between the variable Science and the discriminant function is very small.
The next table shows the group centroids, that is, the mean discriminant score for each group. The centroids are quite different for the two groups.
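
To show where the coefficients and centroids come from, here is a minimal Python sketch for the two variables retained by the stepwise run. It assumes the usual scaling in which the pooled within-groups variance of the discriminant scores equals 1 and the grand mean score is 0. The sign of a discriminant function is arbitrary, so the signs here may be the reverse of those in the SPSS tables. All names are illustrative.

```python
import math

# Table 1 as (english, maths, science, group); 1 = Very Clever, 2 = Just Clever
DATA = [
    (44, 44, 28, 1), (61, 29, 25, 1), (19, 68, 77, 1), (48, 58, 45, 1),
    (38, 41, 30, 1), (25, 55, 77, 1), (39, 30, 50, 1), (33, 59, 44, 1),
    (30, 65, 49, 1), (17, 60, 21, 1), (42, 49, 30, 1), (47, 44, 43, 1),
    (13, 76, 52, 1), (63, 31, 54, 1), (54, 47, 8, 1), (40, 54, 40, 1),
    (29, 53, 19, 2), (28, 66, 71, 2), (27, 67, 17, 2), (45, 66, 79, 2),
    (55, 43, 51, 2), (68, 45, 58, 2), (52, 56, 51, 2), (74, 47, 33, 2),
    (70, 51, 29, 2), (49, 67, 74, 2), (80, 53, 40, 2), (50, 61, 13, 2),
    (48, 71, 71, 2), (65, 60, 39, 2),
]
COLS = (0, 1)  # english and maths


def col_means(rows):
    return [sum(r[c] for r in rows) / len(rows) for c in COLS]


g1 = [r for r in DATA if r[3] == 1]
g2 = [r for r in DATA if r[3] == 2]
m1, m2, grand = col_means(g1), col_means(g2), col_means(DATA)

# Pooled within-groups covariance matrix S = W / (n - number of groups)
dfw = len(DATA) - 2
S = [[0.0, 0.0], [0.0, 0.0]]
for rows, m in ((g1, m1), (g2, m2)):
    for r in rows:
        d = [r[c] - mc for c, mc in zip(COLS, m)]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j] / dfw

# Direction proportional to S^-1 (m2 - m1), using the explicit 2x2 inverse
det_s = S[0][0] * S[1][1] - S[0][1] * S[1][0]
diff = [b - a for a, b in zip(m1, m2)]
coef = [(S[1][1] * diff[0] - S[0][1] * diff[1]) / det_s,
        (S[0][0] * diff[1] - S[1][0] * diff[0]) / det_s]

# Rescale so the pooled within-groups variance of the scores is 1
scale = math.sqrt(sum(coef[i] * S[i][j] * coef[j]
                      for i in range(2) for j in range(2)))
coef = [c / scale for c in coef]
constant = -sum(c * g for c, g in zip(coef, grand))
standardised = [coef[i] * math.sqrt(S[i][i]) for i in range(2)]


def score(row):
    """Discriminant score of one case."""
    return constant + sum(c * row[k] for c, k in zip(coef, COLS))


centroids = {1: sum(map(score, g1)) / len(g1), 2: sum(map(score, g2)) / len(g2)}

print("unstandardised:", [round(c, 4) for c in coef], "constant:", round(constant, 3))
print("standardised:  ", [round(c, 3) for c in standardised])
print("centroids:     ", {g: round(c, 3) for g, c in centroids.items()})
```

Because the scores are centred on the grand mean, the two centroids lie on opposite sides of zero, and their distance apart measures how well the function separates the groups.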
The last three tables in the output listing were generated by the optional selection of Summary table from the Classify options in the Discriminant Analysis dialog box. The last of these tables indicates how successfully the discriminant function developed from the analysis predicts membership of the categories of the grouping variable.
The last table shows that the Very Clever children are the more accurately classified, with 93.8% of the cases correct. For the Just Clever children, 71.4% of cases were correctly classified. Overall, 83.3% of the original cases were correctly classified.
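
The summary table can be approximated with a short Python sketch of the two-group linear classification rule. It uses only English and Maths, pools the within-groups sums of squares, and assumes equal prior probabilities for the two groups, so each case is assigned to the group whose centroid is nearer on the discriminant function. The names used are illustrative.

```python
# Table 1 as (english, maths, science, group); 1 = Very Clever, 2 = Just Clever
DATA = [
    (44, 44, 28, 1), (61, 29, 25, 1), (19, 68, 77, 1), (48, 58, 45, 1),
    (38, 41, 30, 1), (25, 55, 77, 1), (39, 30, 50, 1), (33, 59, 44, 1),
    (30, 65, 49, 1), (17, 60, 21, 1), (42, 49, 30, 1), (47, 44, 43, 1),
    (13, 76, 52, 1), (63, 31, 54, 1), (54, 47, 8, 1), (40, 54, 40, 1),
    (29, 53, 19, 2), (28, 66, 71, 2), (27, 67, 17, 2), (45, 66, 79, 2),
    (55, 43, 51, 2), (68, 45, 58, 2), (52, 56, 51, 2), (74, 47, 33, 2),
    (70, 51, 29, 2), (49, 67, 74, 2), (80, 53, 40, 2), (50, 61, 13, 2),
    (48, 71, 71, 2), (65, 60, 39, 2),
]
COLS = (0, 1)  # english and maths


def mean_vector(rows):
    return [sum(r[c] for r in rows) / len(rows) for c in COLS]


g1 = [r for r in DATA if r[3] == 1]
g2 = [r for r in DATA if r[3] == 2]
m1, m2 = mean_vector(g1), mean_vector(g2)

# Pooled within-groups sums of squares and cross-products (2 x 2)
W = [[0.0, 0.0], [0.0, 0.0]]
for rows, m in ((g1, m1), (g2, m2)):
    for r in rows:
        d = [r[c] - mc for c, mc in zip(COLS, m)]
        for i in range(2):
            for j in range(2):
                W[i][j] += d[i] * d[j]

# Discriminant direction w proportional to W^-1 (m2 - m1)
det_w = W[0][0] * W[1][1] - W[0][1] * W[1][0]
diff = [b - a for a, b in zip(m1, m2)]
w = [(W[1][1] * diff[0] - W[0][1] * diff[1]) / det_w,
     (W[0][0] * diff[1] - W[1][0] * diff[0]) / det_w]

# Equal priors: cut halfway between the two group means on the function
cutoff = sum(wi * (a + b) / 2 for wi, a, b in zip(w, m1, m2))


def predict(row):
    s = sum(wi * row[c] for wi, c in zip(w, COLS))
    return 2 if s > cutoff else 1


# Rows of the summary table: (actual group, predicted group) -> count
table = {(actual, pred): 0 for actual in (1, 2) for pred in (1, 2)}
for r in DATA:
    table[(r[3], predict(r))] += 1

pct_correct = {g: 100 * table[(g, g)] / (table[(g, 1)] + table[(g, 2)])
               for g in (1, 2)}
overall = 100 * (table[(1, 1)] + table[(2, 2)]) / len(DATA)

print(table)
print({g: round(p, 1) for g, p in pct_correct.items()}, round(overall, 1))
```

On the Table 1 data this rule classifies 15 of the 16 Very Clever cases and 10 of the 14 Just Clever cases correctly, in line with the percentages reported above. These are resubstitution rates, computed on the same cases used to build the function, so they are likely to overstate accuracy on new cases.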