We are often interested in constructing a numerical measure of the sentiment of a document.
Sentiment can be positive/negative: "This movie is great!" (+) "This movie is typical of the ones that Nicholas Cage is in" (-)
4/05/2017
"God, Nicholas Cage is such a bad actor!" (Disgust) "I wish I could just reach into the screen and throttle Nicholas Cage" (Anger) "So depressed that I'm watching my second Nicholas Cage movie this week" (Sadness)
Eg) "@AmbassadorRice, you should do the right thing. Hire a lawyer and surrender yourself to the @FBI. #ObamaGate"
Eg) Coverage of the Trump inauguration: "Why the paltry crowd for Trump's inaugural matters" | MSNBC www.msnbc.com/rachel-maddow…/why-the-paltry-crowd-trumps-inaugural-matters
"31 million tune in to witness Trump inauguration, Fox News most …" www.washingtontimes.com/…/31-million-tune-in-to-witness-trump-inauguration-f/
Supervised machine learning - using naive Bayes, SVM, LSTMs or CNNs (neural networks)
Natural language processing
Any supervised ML method that we have learned can be used for sentiment analysis.
For this class, we will learn how to do sentiment analysis using naive Bayes and SVMs.
We will not only learn how to classify documents by sentiment, but also how to construct more fine-grained sentiment metrics.
Supervised - Identify a source of labeled data OR label your own data.
Semi-supervised - Use labeled data from other sources together with your own unlabeled data.
It is possible under some conditions to significantly improve model accuracy without inducing overfitting.
Semi-supervised learning can oftentimes help with this.
With semi-supervised learning we make use of both labeled and unlabeled data.
There are many ways to do semi-supervised learning; this is just one.
1) Identify a source of labeled data.
2) Train a model on it.
3) Apply the model to the unlabeled data.
4) Use the predicted labels plus human judgement on a subsample of the unlabeled data, and retrain the model with the newly labeled data.
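A minimal self-training sketch of these steps in base R, on synthetic data. Here `glm` stands in for whichever supervised learner you use, and the 0.9/0.1 confidence cutoffs are illustrative (in practice a human would also verify a subsample at step 4):

```r
set.seed(1)
# Synthetic labeled data (step 1: stands in for an external labeled source)
x_lab <- rnorm(100)
y_lab <- as.integer(x_lab + rnorm(100, sd = 0.5) > 0)
x_unlab <- rnorm(500)  # our own unlabeled data

# 2) Train a model on the labeled source
model <- glm(y_lab ~ x_lab, family = binomial)

# 3) Apply the model to the unlabeled data
p_unlab <- predict(model, newdata = data.frame(x_lab = x_unlab), type = "response")

# 4) Keep only confident predictions (in practice, also verify a
#    human-judged subsample), then retrain on the enlarged data set
confident <- p_unlab > 0.9 | p_unlab < 0.1
x_new <- c(x_lab, x_unlab[confident])
y_new <- c(y_lab, as.integer(p_unlab[confident] > 0.5))
model2 <- glm(y_new ~ x_new, family = binomial)
```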
\[ P(C = 1|D) = \frac{P(w_{1} \cap w_{2} \cap \cdots \cap w_{k} | C = 1) P(C = 1)}{P(w_{1} \cap w_{2} \cap \cdots \cap w_{k})} \]
Likelihood: \[P(D|C = 1) = \prod_{i=1}^W P(w_{i}|C =1)\] Prior: \[P(C = 1)= \frac{\# D \in C_{1}}{\# D \in C_{1},C_{2}}\]
Marginal likelihood: \[ P(D) = \prod_{i=1}^W P(w_{i}) \]
If we assume that the words are independent conditional on a document class then:
\[ P(C = 1|D) = \frac{[P(w_{1}|C=1)P(w_{2}|C=1)\cdots P(w_{k}| C = 1)] P(C = 1)}{P(w_{1})P(w_{2})\cdots P(w_{k})} \]
\[P(w_{i} | C = 1) = \frac{\# w_{i} \in C_{1}}{\# \mathbf{w} \in C_{1}}\] \[P(C = 1)= \frac{\# D \in C_{1}}{\# D \in C_{1},C_{2}}\] \[P(w_{i})= \frac{\# w_{i} \in C_{1},C_{2}}{\# \mathbf{w} \in C_{1},C_{2}}\]
\[ \hat{C} = \arg\max_{k}\, P(C = k)\prod_{i=1}^W P(w_{i}|C = k) \] - For classification purposes, we can ignore the marginal likelihood and assign classes based on the likelihood and the prior.
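A toy worked example of this decision rule, with made-up word counts:

```r
# Made-up training counts over a 3-word vocabulary
counts_pos <- c(great = 8, bad = 1, movie = 6)  # word counts in positive docs
counts_neg <- c(great = 1, bad = 7, movie = 7)  # word counts in negative docs
prior <- c(pos = 0.5, neg = 0.5)                # equal numbers of docs per class

# P(w_i | C): word proportions within each class
p_w_pos <- counts_pos / sum(counts_pos)
p_w_neg <- counts_neg / sum(counts_neg)

# Score the document "great movie" for each class: P(C) * prod_i P(w_i | C)
doc <- c("great", "movie")
score_pos <- prior["pos"] * prod(p_w_pos[doc])
score_neg <- prior["neg"] * prod(p_w_neg[doc])

# Assign the class with the larger score; the marginal likelihood
# P(w_1)P(w_2)... is the same for both classes, so it drops out
c(pos = score_pos, neg = score_neg)
```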
Alternatively, assign class \(k\) when its posterior probability exceeds the uniform threshold \[ P(C = k | D) > \frac{1}{K} \] where \(K\) is the number of classes.
Assume that we have a two-class sentiment prediction problem where the classes are Positive and Negative.
Naive Bayes produces two outputs for each document:
A class label: \(k \in \{+,-\}\), chosen by \(\hat{C} = \arg\max_{k} P(C = k)\prod_{i=1}^W P(w_{i}|C = k)\)
Probabilities for each class label: \(P(C =+| D)\), \(P(C = -| D)\).
Using only classes can be useful for a number of reasons.
Use the class labels \(S = \{d_{1+}, d_{2+}, d_{3-}, \ldots\}\) as an independent or a dependent variable for inference in a statistical model:
As dependent variable:
\[ \mathrm{logit}(E[S|X]) = \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \cdots \] As independent variable:
\[ Y = \beta_{0} + \beta_{1}S + \beta_{2}x_{2} + \beta_{3}x_{3} + \cdots \]
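A minimal sketch of the independent-variable case on synthetic data; `S`, `x2`, `x3`, and the coefficient values are all made up for illustration:

```r
set.seed(42)
n  <- 200
S  <- rbinom(n, 1, 0.5)  # predicted sentiment label (1 = positive), hypothetical
x2 <- rnorm(n)
x3 <- rnorm(n)
Y  <- 1 + 2 * S + 0.5 * x2 - x3 + rnorm(n)  # true effect of sentiment is 2

# Sentiment as an independent variable in a linear model
fit <- lm(Y ~ S + x2 + x3)
coef(fit)["S"]  # estimate should land near the true value of 2
```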
We are more certain that a document is positive if:
\[ P(+|D) = 0.9 \implies + \] Than if:
\[ P(+|D) = 0.6 \implies + \] Even though both documents are labelled as positive.
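One simple way to turn these posteriors into a fine-grained score (an illustration, not a standard from the lecture) is the difference \(P(+|D) - P(-|D)\), which lives on \([-1, 1]\):

```r
# Hypothetical posterior probabilities for three documents, all labeled +
p_pos <- c(0.9, 0.6, 0.51)

# Score on [-1, 1]: P(+|D) - P(-|D); with two classes P(-|D) = 1 - P(+|D)
score <- p_pos - (1 - p_pos)
score  # the first document scores far higher than the third
```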
Eg) Does positive sentiment of Congressional speeches correlate with vote share?
# Load the training data
library(dplyr)  # for glimpse()
data <- read.csv("/Users/jason/Downloads/movie-pang02.csv", stringsAsFactors = FALSE)
glimpse(data)
## Observations: 2,000 ## Variables: 2 ## $ class <chr> "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", "Pos", ... ## $ text <chr> " films adapted from comic books have had plenty of succ...
# Clean the data
library(tm)  # for DocumentTermMatrix()
reviews <- data$text
newcorpus <- text_cleaner(reviews, rawtext = FALSE)
sentiment <- data$class

# Create a document-term matrix
dtm <- DocumentTermMatrix(newcorpus)
dtm <- removeSparseTerms(dtm, 0.99)  # Reduce sparsity
# Split sample into training and test (75/25)
train <- sample(1:length(reviews), length(reviews) * 0.75)
dtm_mat <- as.matrix(dtm)
trainX <- dtm_mat[train, ]
testX  <- dtm_mat[-train, ]
trainY <- sentiment[train]
testY  <- sentiment[-train]
The naive Bayes classifier we will use treats predictors as categorical, so we convert word counts to binary presence/absence indicators: any count greater than 0 becomes 1.
counts <- function(x) {
  y <- ifelse(x > 0, 1, 0)
  y <- factor(y, levels = c(0, 1))
  y
}
fword_train <- apply(trainX, 2, counts)
fword_test <- apply(testX, 2, counts)
We will use the function "naiveBayes" in the "e1071" package.
library(e1071)
viral_classifier <- naiveBayes(x = fword_train, y = factor(trainY))
The "viral_classifier" object is now the classifier trained on the training data.
viral_test_pred <- predict(viral_classifier, newdata=fword_test)
# Let's see how this looks
confusion <- table(testY, viral_test_pred)
confusion
## viral_test_pred ## testY Neg Pos ## Neg 205 38 ## Pos 63 194
accuracy <- (confusion[1,1] + confusion[2,2]) / sum(confusion)
accuracy
## [1] 0.798
specificity <- confusion[1,1] / sum(confusion[1,])
specificity
## [1] 0.8436214
sensitivity <- confusion[2,2] / sum(confusion[2,])
sensitivity
## [1] 0.7548638
trumptweets <- read.csv("https://www.ocf.berkeley.edu/~janastas/trump-tweet-data.csv")
trumptweets <- trumptweets[1:10, ]
trumptweets <- trumptweets$Text
cleantweets <- text_cleaner(trumptweets, rawtext = FALSE)
dtm <- DocumentTermMatrix(cleantweets)
dtm <- removeSparseTerms(dtm, 0.99)  # Reduce sparsity
# Note: strictly, the same counts() transformation applied to the
# training data should also be applied here
trump_dtm <- as.matrix(dtm)
trump_tweet_pred <- predict(viral_classifier, newdata=trump_dtm, type="raw") trump_tweet_pred
## Neg Pos ## [1,] 1.000000e+00 2.985154e-253 ## [2,] 7.163240e-177 1.000000e+00 ## [3,] 8.867135e-40 1.000000e+00 ## [4,] 1.000000e+00 7.963089e-44 ## [5,] 1.000000e+00 5.679855e-244 ## [6,] 1.000000e+00 1.319742e-265 ## [7,] 1.000000e+00 4.563247e-272 ## [8,] 4.499153e-33 1.000000e+00 ## [9,] 1.000000e+00 8.356058e-175 ## [10,] 1.000000e+00 0.000000e+00
trump_tweet_pred <- predict(viral_classifier, newdata=trump_dtm, type="class") trump_tweet_pred
## [1] Neg Pos Pos Neg Neg Neg Neg Pos Neg Neg ## Levels: Neg Pos