In this course we look at four statistical tests for differences between groups. The groups are usually defined by a nominal level variable, but may also be an ordinal variable with only a small number of categories, and the dependent (outcome) variable is interval level. Both parametric and non-parametric tests are included.
These can be summarised as:

                  2 groups             3 or more groups
  Parametric      Two-sample t-test    One-way ANOVA
  Non-parametric  Mann-Whitney test    Kruskal-Wallis test
In order to use parametric tests you have to be sure that your data meet the assumptions for the test. Basically, they are:
Interval level data.
Reasonably normally distributed.
Reasonable sample size.
Also read the section on when non-parametric tests are more appropriate and the rule of thumb in the box if in any doubt.
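One way to check the normality assumption before running a parametric test is a formal test such as Shapiro-Wilk. A minimal sketch with made-up data, assuming SciPy is available (the values and variable names are illustrative, not from the course):

```python
# Sketch: checking the normality assumption before choosing a
# parametric test. The sample values below are made up.
from scipy import stats

sample = [4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.4, 4.9, 5.2, 5.0]

# Shapiro-Wilk test: a small p-value suggests the data are
# unlikely to have come from a normal distribution.
stat, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {stat:.3f}, p = {p:.3f}")
```

A visual check (histogram or Q-Q plot) is often used alongside such a test, especially for small samples where formal tests have little power.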
There is a family of t-tests but the one we are referring to here is the two-sample (or independent samples) t-test. The two-sample terminology is preferable as the "independent" can sometimes be confusing. However, SPSS uses the term "independent". If you have repeated measures or "dependent" measures then there are other tests to use that are not covered here.
Student's t-test for independent samples is used to determine whether two samples were drawn from populations with different means.
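As a sketch of how this test is run in practice, the following uses SciPy's `ttest_ind` on two made-up samples (the data and group names are illustrative only):

```python
# Sketch: two-sample (independent samples) t-test on made-up data.
from scipy import stats

group_a = [23.1, 25.4, 24.8, 26.2, 24.0, 25.9, 23.7, 24.5]
group_b = [27.3, 28.1, 26.9, 29.4, 27.8, 28.6, 27.0, 28.2]

# ttest_ind tests whether the two samples were drawn from
# populations with the same mean.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) would lead us to reject the null hypothesis that the two population means are equal.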
What can you do if the condition of normality is violated? One approach is to transform the data so the condition is satisfied. This will almost always involve a logarithmic transformation. On rare occasions, a square root, inverse, or inverse square root might be used. If no satisfactory transformation can be found, a non-parametric test should be used. The advantage of transformations is that they make it possible to use standard techniques to construct standard errors for estimating between-group differences.
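The effect of a logarithmic transformation can be seen by comparing skewness before and after. A minimal sketch with made-up right-skewed values, assuming NumPy and SciPy are available:

```python
# Sketch: log-transforming right-skewed data so the normality
# assumption becomes more plausible. Values are made up.
import numpy as np
from scipy import stats

skewed = np.array([1.2, 1.5, 2.1, 2.8, 3.9, 6.4, 11.0, 23.5])

# The raw data have a long right tail; taking logs compresses it,
# pulling the skewness measure towards zero.
print("skewness before:", stats.skew(skewed))
print("skewness after: ", stats.skew(np.log(skewed)))
```

Note that after a log transformation, any between-group difference is estimated on the log scale, so results are usually back-transformed (e.g. as ratios of geometric means) for reporting.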
Use this test for comparing means of 3 or more groups, to avoid performing multiple t-tests.
If you have 3 groups to compare (1, 2, 3) then we would need 3 separate t-tests (comparing 1 with 2, 1 with 3, and 2 with 3). If you had seven groups you would need 21 separate t-tests. This would be time-consuming but, more importantly, it would be flawed, because in each t-test we usually accept a 5% chance of our conclusion being wrong (testing for p < 0.05). So, across 21 tests you would expect about one test to give you a false result. ANOVA overcomes this problem by enabling you to detect significant differences between the treatments as a whole: you do a single test to see if there are differences between the means at your chosen probability level. The test statistic is F.
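The single F-test described above can be sketched with SciPy's `f_oneway` on three made-up groups (data are illustrative only):

```python
# Sketch: one-way ANOVA across three groups, replacing
# three separate t-tests with a single F-test.
from scipy import stats

group1 = [12.0, 13.5, 11.8, 12.9, 13.1]
group2 = [14.2, 15.0, 14.8, 13.9, 15.3]
group3 = [12.4, 12.1, 13.0, 12.7, 11.9]

# One F-test for any difference among the three group means.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A significant F only tells you that at least one group mean differs; identifying which groups differ requires a post-hoc procedure, which is beyond the scope of this section.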
Look at how the test statistic varies depending on means and variance for three groups in this interactive webpage.
Why use non-parametric tests?
The parametric tests (t-tests and one-way analysis of variance) make assumptions about the population that the sample has been drawn from, often including assumptions about the shape of the population distribution. Non-parametric tests make less restrictive assumptions. For example:
Most parametric procedures require knowledge of or a strong enough belief in a distributional form for the measured outcome in the population studied.
An interval level variable is usually required for parametric inference.
Most non-parametric methods will work with ordinal level data, and some techniques will even hold with nominal level data.
Non-parametric methods are valid for most distributions.
Non-parametric methods are often easier to compute.
Another factor that often limits the applicability of parametric tests is the size of the sample available for analysis (sample size; n). We can assume that the sampling distribution is normal even if we are not sure that the distribution of the variable in the population is normal, as long as our sample is large enough (e.g. 100 or more observations). However, if our sample is very small, parametric tests can be used only if we are sure that the variable is normally distributed, and there is no reliable way to test this assumption when the sample is small.
Despite being less ‘fussy’, non-parametric tests do have their disadvantages. They tend to be less sensitive than their parametric cousins, and may therefore fail to detect differences between groups that actually do exist.
The Mann-Whitney test is used in place of the two-sample t-test when the assumptions of that test are not met.
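A Mann-Whitney comparison can be sketched with SciPy's `mannwhitneyu` on two made-up samples (the data are illustrative only):

```python
# Sketch: Mann-Whitney U test, the non-parametric alternative
# to the two-sample t-test. The values below are made up.
from scipy import stats

group_a = [3, 4, 2, 6, 2, 5, 4, 3]
group_b = [9, 7, 5, 10, 6, 8, 7, 9]

# The test works on ranks, so it does not require normality.
u_stat, p_value = stats.mannwhitneyu(group_a, group_b)
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

Because the test uses ranks rather than raw values, it is robust to outliers and suitable for ordinal data.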
The Kruskal-Wallis test is a non-parametric test used to compare three or more samples. It is used to test the null hypothesis that all populations have identical distribution functions against the alternative hypothesis that at least two of the samples differ only with respect to location (median), if at all. It is the analogue of the F-test used in analysis of variance: while analysis of variance tests depend on the assumption that all populations under comparison are normally distributed, the Kruskal-Wallis test places no such restriction on the comparison. It is a logical extension of the Mann-Whitney test.
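A Kruskal-Wallis comparison can be sketched with SciPy's `kruskal` on three made-up groups (the data are illustrative only):

```python
# Sketch: Kruskal-Wallis H test across three groups, the
# non-parametric counterpart of one-way ANOVA. Made-up data.
from scipy import stats

group1 = [2.9, 3.0, 2.5, 2.6, 3.2]
group2 = [3.8, 2.7, 4.0, 2.4, 3.9]
group3 = [2.8, 3.4, 3.7, 2.2, 2.0]

# Like Mann-Whitney, the test ranks all observations jointly,
# so no normality assumption is needed.
h_stat, p_value = stats.kruskal(group1, group2, group3)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

As with ANOVA, a significant result indicates only that at least one group differs in location; pairwise follow-up tests are needed to say which.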