- Linear Relationships
- Least Squares Prediction Equation
- Linear Regression Model
March 28, 2017
What is the relationship between gun ownership and crime?
What is the relationship between smoking and cancer?
What is the relationship between ethnic and racial diversity and trust?
Linear relationships are relationships between two or more variables that have a specific functional form: the response changes by a constant amount for each one-unit change in an explanatory variable.
\(y =\) response variable (e.g., level of trust)
\(x =\) explanatory variable (e.g., a measure of diversity)
\[ y = \alpha + \beta x \]
y-intercept \(\alpha\)
slope \(\beta\)
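To make the roles of the two parameters concrete, here is a minimal R sketch with hypothetical values \(\alpha = 2\) and \(\beta = 0.5\) (not taken from any data). The intercept is the value of \(y\) when \(x = 0\), and the slope is the change in \(y\) for a one-unit increase in \(x\), wherever that increase starts.

```r
# Hypothetical parameter values, chosen only for illustration
alpha <- 2
beta  <- 0.5

# The linear function y = alpha + beta * x
linear_fn <- function(x) alpha + beta * x

# Intercept: the value of y at x = 0
linear_fn(0)                    # 2

# Slope: the change in y for a one-unit increase in x, at any starting point
linear_fn(11) - linear_fn(10)   # 0.5
linear_fn(4) - linear_fn(3)     # 0.5
```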
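Positive relationships: \(\beta > 0\), so \(y\) increases as \(x\) increases
Negative relationships: \(\beta < 0\), so \(y\) decreases as \(x\) increases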
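A short sketch with simulated data shows that the sign of the fitted slope recovers the direction of each relationship. The true slopes of 2 and -2, the range of \(x\), and the noise level are arbitrary assumptions for illustration.

```r
set.seed(1)  # make the simulated data reproducible
x <- runif(100, 0, 10)

# Simulate one positive and one negative linear relationship, plus random noise
y_pos <- 1 + 2 * x + rnorm(100)
y_neg <- 1 - 2 * x + rnorm(100)

# Fit each line and keep only the estimated slope
b_pos <- unname(coef(lm(y_pos ~ x))[2])
b_neg <- unname(coef(lm(y_neg ~ x))[2])

sign(c(b_pos, b_neg))  # 1 -1
```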
A linear function provides a model for the relationship between two variables.
Given data on any two variables, we can fit a linear function to their scatterplot by estimating \(\alpha\) and \(\beta\).
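# Read the tweet data and attach it so columns can be referenced by name
trumptweets <- read.csv("https://www.ocf.berkeley.edu/~janastas/trump-tweet-data.csv")
attach(trumptweets)
# First five entries of the first column (the tweet text)
trumptweets[1:5, 1]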
## [1] I have not heard any of the pundits or commentators discussing the fact that I spent FAR LESS MONEY on the win than Hillary on the loss!
## [2] I would have done even better in the election, if that is possible, if the winner was based on popular vote - but would campaign differently
## [3] Campaigning to win the Electoral College is much more difficult & sophisticated than the popular vote. Hillary focused on the wrong states!
## [4] Yes, it is true - Carlos Slim, the great businessman from Mexico, called me about getting together for a meeting. We met, HE IS A GREAT GUY!
## [5] especially how to get people, even with an unlimited budget, out to vote in the vital swing states ( and more). They focused on wrong states
## 31058 Levels: ...
plot(Retweets,Favorites,xlab = "Trump Retweets",ylab = "Trump Favorites")
How do we fit a model to this data if we are interested in using retweets to explain favorites?
Model: \[y = \alpha + \beta x\]
Prediction equation: \[\hat{y} = a + bx\] where \(a\) and \(b\) are sample estimates of \(\alpha\) and \(\beta\).
\[ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum(x-\bar{x})^2} \]
\[ a = \bar{y} - b\bar{x} \]
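The two formulas translate directly into R. This sketch applies them to a small simulated data set (made-up values, not the tweet data) and checks the result against `lm()`, which computes the same least squares estimates. It also checks the least squares property: nudging the slope away from \(b\) increases the sum of squared residuals. The helper `sse()` is hypothetical, defined here only for the check.

```r
set.seed(42)  # reproducible simulated data
x <- rnorm(50, mean = 100, sd = 20)
y <- 30 + 1.5 * x + rnorm(50, sd = 10)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
a <- mean(y) - b * mean(x)

# lm() returns the same intercept and slope
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(a, b))  # TRUE

# Hypothetical helper: sum of squared residuals for a candidate line
sse <- function(a, b) sum((y - (a + b * x))^2)
sse(a, b) < sse(a, b + 0.01)  # TRUE
sse(a, b) < sse(a, b - 0.01)  # TRUE
```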
model.1 <- lm(Favorites ~ Retweets)
summary(model.1)
## 
## Call:
## lm(formula = Favorites ~ Retweets)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max
## -253188    -445    -274    -251  118566
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.587e+02  2.649e+01   9.767   <2e-16 ***
## Retweets    2.316e+00  5.513e-03 420.209   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4515 on 31173 degrees of freedom
## Multiple R-squared:  0.8499, Adjusted R-squared:  0.8499
## F-statistic: 1.766e+05 on 1 and 31173 DF,  p-value: < 2.2e-16
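The Coefficients table in this output can also be extracted as a matrix with `coef(summary(...))`, and each t value is the estimate divided by its standard error (here, 2.316 / 0.005513 ≈ 420, up to rounding). Because this sketch should not depend on downloading the tweet file, it uses simulated data; the true intercept, slope, and noise level are arbitrary assumptions.

```r
set.seed(7)  # reproducible simulated data
x <- runif(200, 0, 50)
y <- 5 + 3 * x + rnorm(200, sd = 4)

fit <- lm(y ~ x)
tab <- coef(summary(fit))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
tab

# Each t value is the estimate divided by its standard error
all.equal(tab[, "t value"], tab[, "Estimate"] / tab[, "Std. Error"])  # TRUE
```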
\(a = 258.7\) (intercept estimate)
\(b = 2.316\) (slope estimate), so the prediction equation is \(\hat{y} = 258.7 + 2.316x\)
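Plugging these estimates into the prediction equation gives a predicted number of favorites for any number of retweets. A minimal sketch using the rounded values above; the function name `predict_favorites` and the inputs of 1,000 and 10,000 retweets are hypothetical, chosen only for illustration.

```r
# Estimates read off the summary output above (rounded)
a <- 258.7
b <- 2.316

# Hypothetical helper implementing the prediction equation y-hat = a + b * x
predict_favorites <- function(retweets) a + b * retweets

predict_favorites(c(1000, 10000))  # 2574.7 23418.7
```

The slope reads as: each additional retweet is associated with about 2.3 more favorites on average. With the fitted object available, `coef(model.1)` and `predict(model.1, newdata = data.frame(Retweets = 1000))` give the unrounded versions of these numbers.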
plot(Retweets, Favorites, xlab = "Trump Retweets", ylab = "Trump Favorites")
abline(a = 258.7, b = 2.316)