March 28, 2017

For Today

  • Linear Relationships
  • Least Squares Prediction Equation
  • Linear Regression Model

Relationships between variables

  • What is the relationship between gun ownership and crime?

  • What is the relationship between smoking and cancer?

  • What is the relationship between ethnic and racial diversity and trust?

Putnam's Diversity and Trust Study

Linear relationships

  • Linear relationships are relationships between two or more variables that follow a straight-line functional form.

  • \(y =\) response variable – level of trust

  • \(x =\) explanatory variable – measure of diversity

Linear functions

\[ y = \alpha + \beta x \]

  • y-intercept \(\alpha\)

  • slope \(\beta\)
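As a small illustration, a linear function can be written directly in R. The values of `alpha` and `beta` below are made up for demonstration and do not come from any data set; `linear_fn` is a hypothetical helper name.

```r
# Hypothetical intercept and slope, chosen only for illustration
alpha <- 2
beta <- 0.5

# A linear function: y = alpha + beta * x
linear_fn <- function(x) alpha + beta * x

x <- 0:4
y <- linear_fn(x)

# y at x = 0 equals the intercept; each one-unit step in x adds beta to y
print(y)
print(diff(y))
```

The value at \(x = 0\) is the y-intercept \(\alpha\), and the constant difference between successive values is the slope \(\beta\).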

Linear functions

  • Positive relationships

  • Negative relationships
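The direction of a linear relationship is given by the sign of the slope. The sketch below draws one line with a positive slope and one with a negative slope; the intercepts and slopes are arbitrary illustrative values.

```r
x <- 1:10

# Arbitrary illustrative coefficients
y_pos <- 1 + 2 * x      # beta = 2 > 0: y rises as x rises
y_neg <- 20 - 1.5 * x   # beta = -1.5 < 0: y falls as x rises

# Draw both lines on the same axes
plot(x, y_pos, type = "l", ylim = range(c(y_pos, y_neg)),
     xlab = "x", ylab = "y")
lines(x, y_neg, lty = 2)
legend("top", legend = c("positive (beta > 0)", "negative (beta < 0)"),
       lty = c(1, 2))
```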

Linear functions and models

  • A linear function provides a model for the relationship between two variables.

  • Given any two variables, we can estimate a linear function by choosing the values of \(\alpha\) and \(\beta\) that best fit the scatterplot.

Scatterplots and linear relationships

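# Load the tweet data set
trumptweets <- read.csv("https://www.ocf.berkeley.edu/~janastas/trump-tweet-data.csv")
# Attach the data frame so columns such as Retweets and Favorites can be referenced by name
attach(trumptweets)
# Show the first five entries of the first column (the tweet text)
trumptweets[1:5, 1]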
## [1] I have not heard any of the pundits or commentators discussing the fact that I spent FAR LESS MONEY on the win than Hillary on the loss!    
## [2] I would have done even better in the election, if that is possible, if the winner was based on popular vote - but would campaign differently
## [3] Campaigning to win the Electoral College is much more difficult & sophisticated than the popular vote. Hillary focused on the wrong states! 
## [4] Yes, it is true - Carlos Slim, the great businessman from Mexico, called me about getting together for a meeting. We met, HE IS A GREAT GUY!
## [5] especially how to get people, even with an unlimited budget, out to vote in the vital swing states ( and more). They focused on wrong states
## 31058 Levels:  ...

Scatterplots and linear relationships

plot(Retweets, Favorites, xlab = "Trump Retweets", ylab = "Trump Favorites")

Least Squares Prediction Equation

  • How do we fit a model to this data if we are interested in using retweets to explain favorites?

  • Model: \[y = \alpha + \beta x\]

  • Prediction equation: \[\hat{y} = a + bx\]

Least Squares Prediction Equation

\[ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum(x-\bar{x})^2} \]

\[ a = \bar{y} - b\bar{x} \]
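The two formulas can be checked by hand in R. The sketch below uses simulated data (not the tweet data), with an arbitrary seed and arbitrary true coefficients, and compares the hand-computed \(a\) and \(b\) with the coefficients that `lm()` reports.

```r
# Simulated data, for illustration only (not the tweet data)
set.seed(1)
x <- runif(50, min = 0, max = 10)
y <- 3 + 2 * x + rnorm(50)

# Slope: sum of cross-deviations divided by sum of squared deviations in x
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the fitted line passes through (mean(x), mean(y))
a <- mean(y) - b * mean(x)

# Compare with the coefficients estimated by lm()
fit <- lm(y ~ x)
print(c(a = a, b = b))
print(coef(fit))
```

The two printed pairs should agree up to floating-point error, since `lm()` with a single explanatory variable solves the same least squares problem.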

Example: Predicting Favorites from Retweets

# Regress Favorites on Retweets by least squares
model.1 <- lm(Favorites ~ Retweets)
summary(model.1)
## 
## Call:
## lm(formula = Favorites ~ Retweets)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -253188    -445    -274    -251  118566 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2.587e+02  2.649e+01   9.767   <2e-16 ***
## Retweets    2.316e+00  5.513e-03 420.209   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4515 on 31173 degrees of freedom
## Multiple R-squared:  0.8499, Adjusted R-squared:  0.8499 
## F-statistic: 1.766e+05 on 1 and 31173 DF,  p-value: < 2.2e-16

Example: Predicting Favorites from Retweets

  • \(a = 258.7\)

  • \(b = 2.316\)
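With these estimates, the prediction equation is \(\hat{y} = 258.7 + 2.316x\). A minimal sketch of using it follows; the retweet counts in `new_retweets` are hypothetical inputs chosen for illustration, not observations from the data.

```r
a <- 258.7   # intercept, rounded from the summary output
b <- 2.316   # slope, rounded from the summary output

# Hypothetical retweet counts to predict favorites for
new_retweets <- c(0, 1000, 5000)

# Apply the prediction equation y-hat = a + b * x
predicted_favorites <- a + b * new_retweets
print(predicted_favorites)
```

Each additional retweet is associated with about 2.3 additional predicted favorites, and a tweet with no retweets is predicted to receive about 259 favorites.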

Example: Predicting Favorites from Retweets

plot(Retweets, Favorites, xlab = "Trump Retweets", ylab = "Trump Favorites")
# Overlay the fitted least squares line using the estimated intercept and slope
abline(a = 258.7, b = 2.316)