Often, we want to test hypotheses about two groups.
Does the outcome of a treatment differ from a control?
Does the United States have more income inequality than Canada?
Are there more military interventions under Democratic or Republican presidents?
March 14, 2017
When comparing two groups, you are generally conducting a bivariate analysis, i.e., an analysis that relates two variables (here, group membership and an outcome).
In a bivariate analysis, your samples can be of two types:
Dependent samples: the samples contain the same subjects, or the values for one group of subjects affect the values for the other, e.g., housework by husbands and wives, or repeated measurements of test scores on the same people.
Independent samples: observations in one sample are independent of the observations in the other sample, e.g., randomly selected subjects in Michigan and randomly selected subjects in Georgia asked about their party affiliation.
Bivariate analyses are typically conducted with independent samples, or at the very least with samples assumed to be independent.
Sometimes it's tricky to figure out whether samples are dependent or independent.
Does the outcome of a treatment differ from a control? (Independent)
Does the United States have more income inequality than Canada? (Independent, but may have dependencies)
Are there more military interventions under Democratic or Republican presidents? (?)
In the context of simple significance testing, we generally assume that samples are independent.
There are more sophisticated methods of dealing with dependent samples that we'll learn about when we get back from the break.
The main difference between significance testing with one sample and two samples is the standard error.
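With two independent samples, the standard error of the difference is \(\sqrt{se_{1}^2 + se_{2}^2}\), where \(se_{1}\) and \(se_{2}\) are the standard errors of the two sample estimates.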
Everything else is pretty much the same.
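To see why the two standard errors combine this way, here is a minimal simulation sketch. The sample sizes, standard deviations, and number of replications are made-up values chosen only for illustration; nothing here comes from real data. It repeatedly draws two independent samples, records the difference in their means, and compares the spread of those differences to \(\sqrt{se_{1}^2 + se_{2}^2}\).

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n, mu, sigma):
    """Mean of n independent draws from a normal(mu, sigma) population."""
    return sum(random.gauss(mu, sigma) for _ in range(n)) / n

# Hypothetical populations and sample sizes (illustrative assumptions).
n1, sigma1 = 50, 2.0
n2, sigma2 = 80, 3.0

# Standard error of each sample mean, then the formula for the difference.
se1 = sigma1 / n1 ** 0.5
se2 = sigma2 / n2 ** 0.5
theoretical_se = (se1 ** 2 + se2 ** 2) ** 0.5

# Empirical check: standard deviation of many simulated differences in means.
diffs = [sample_mean(n1, 0.0, sigma1) - sample_mean(n2, 0.0, sigma2)
         for _ in range(2000)]
empirical_se = statistics.stdev(diffs)

print(f"formula:    {theoretical_se:.3f}")
print(f"simulation: {empirical_se:.3f}")
```

The two printed numbers should be close but not identical, since the simulated value is itself an estimate. The agreement depends on the two samples being drawn independently.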
\(m_{democrat}\) - average number of military interventions under Democratic presidents from 1900-Present.
\(m_{republican}\) - average number of military interventions under Republican presidents from 1900-Present.
\[ \hat{m}_{democrat} = \frac{1}{Terms_{democrat}}\sum_{i=1}^{Terms_{democrat}} intervention_{i} \]
Let's say that there are 29 total terms.
\(Terms_{democrat} = 14\), \(Terms_{republican} = 15\)
\(\hat{m}_{democrat}= 1.5\), \(\hat{m}_{republican}= 1.8\)
\(se_{democrat}= 0.2\) , \(se_{republican}= 0.3\)
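The numbers above can be carried through a significance test as in the sketch below. It takes 1.5 as the Democratic estimate and 1.8 as the Republican estimate, and it assumes a two-sided test with a standard normal reference distribution. That last choice is a simplification: with only 14 and 15 terms, a t distribution would usually be preferred. The helper name `normal_two_sided_p` is illustrative.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Estimates and standard errors from the example
# (1.8 is assumed to belong to Republican presidents).
m_democrat, se_democrat = 1.5, 0.2
m_republican, se_republican = 1.8, 0.3

# H0: m_republican - m_democrat = 0 ; Ha: the difference is not 0.
difference = m_republican - m_democrat

# Standard error of the difference for two independent samples.
se_difference = math.sqrt(se_democrat ** 2 + se_republican ** 2)

# Test statistic: estimated difference divided by its standard error.
z = difference / se_difference
p_value = normal_two_sided_p(z)

print(f"difference = {difference:.2f}")
print(f"se         = {se_difference:.3f}")
print(f"z          = {z:.2f}")
print(f"p-value    = {p_value:.2f}")
```

Under these assumptions the test statistic is below 1 in absolute value, so the difference between parties is well within what sampling variability alone could produce.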
Hypothesis tests for proportions are most useful with categorical data.
Example: Does Prayer Help Coronary Surgery Patients?
Prayer | Complications | No Complications | Total |
---|---|---|---|
Yes | 315 | 289 | 604 |
No | 304 | 293 | 597 |
Start by thinking about what we are comparing.
We want to compare the proportion of people in the "Prayer" group who had complications (\(\pi_{1}\)) with the proportion of people in the "No Prayer" group who had complications (\(\pi_{2}\)).
In the sample, \(\hat{\pi}_{1} = 315/604 = 0.522\) and \(\hat{\pi}_{2} = 304/597 = 0.509\).
What are the null and alternative hypotheses?
Perform the significance test to answer the question.
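One way to work through this exercise is sketched below. It assumes a two-sided test of \(H_{0}: \pi_{1} = \pi_{2}\) against \(H_{a}: \pi_{1} \neq \pi_{2}\), and it uses the pooled proportion to compute the standard error under the null hypothesis. The pooled standard error is a common textbook choice rather than something specified above. The counts come straight from the table.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Counts from the table: complications and group totals.
complications_prayer, n_prayer = 315, 604
complications_none, n_none = 304, 597

# Sample proportions with complications in each group.
p1 = complications_prayer / n_prayer
p2 = complications_none / n_none

# Under H0 the groups share one proportion, so pool them (assumed approach).
pooled = (complications_prayer + complications_none) / (n_prayer + n_none)
se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_prayer + 1 / n_none))

# Test statistic and two-sided p-value.
z = (p1 - p2) / se_null
p_value = normal_two_sided_p(z)

print(f"p1 = {p1:.3f}, p2 = {p2:.3f}")
print(f"se under H0 = {se_null:.4f}")
print(f"z = {z:.2f}, p-value = {p_value:.2f}")
```

With a p-value this large, the data give no evidence that the complication rate differs between the two groups.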
The test for comparing two means is nearly identical to the one for comparing two proportions.
Only the standard error and the reference distribution change: with means, the test statistic is usually compared to a t distribution rather than the standard normal.
Example: Who spends more time doing housework? Men or women?
Sex | Sample Size | Mean Minutes/Day | SD |
---|---|---|---|
Men | 1219 | 23 | 32 |
Women | 733 | 37 | 16 |
What are the null and alternative hypotheses?
Perform a significance test to answer the question: who spends more time doing housework, men or women?
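A possible solution is sketched below. It assumes a two-sided test of \(H_{0}: \mu_{women} = \mu_{men}\) and computes each group's standard error as its SD divided by the square root of its sample size. Because both samples are large, the sketch uses the standard normal in place of the t distribution to get the p-value. That substitution is a simplifying assumption.

```python
import math

def normal_two_sided_p(z):
    """Two-sided p-value for a test statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))

# Values from the table: sample size, mean minutes/day, standard deviation.
n_men, mean_men, sd_men = 1219, 23, 32
n_women, mean_women, sd_women = 733, 37, 16

# Standard error of each sample mean.
se_men = sd_men / math.sqrt(n_men)
se_women = sd_women / math.sqrt(n_women)

# Standard error of the difference for two independent samples.
se_difference = math.sqrt(se_men ** 2 + se_women ** 2)

# Test statistic for H0: no difference in mean housework time.
t = (mean_women - mean_men) / se_difference

# Large samples, so the normal approximation to the t distribution is used.
p_value = normal_two_sided_p(t)

print(f"difference = {mean_women - mean_men} minutes/day")
print(f"se = {se_difference:.2f}")
print(f"t  = {t:.2f}")
print(f"p-value = {p_value:.3g}")
```

The test statistic is far out in the tail, so under these assumptions the data strongly suggest that women in this sample's population spend more time on housework than men.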