Shift object for calculating weighted scores of two systems of types, and the shift between them.

Usage

weighted_avg_shift(
  type2freq_1,
  type2freq_2,
  type2score_1 = NULL,
  type2score_2 = NULL,
  reference_value = NULL,
  handle_missing_scores = "error",
  stop_lens = NULL,
  stop_words = NULL,
  normalization = "variation"
)

Arguments

type2freq_1

A data.frame containing words and their frequencies.

type2freq_2

A data.frame containing words and their frequencies.

type2score_1

Optional. A lexicon containing 2 columns. The first column contains the words and the second column contains the word scores.

type2score_2

Optional. A lexicon containing 2 columns. The first column contains the words and the second column contains the word scores.

reference_value

Optional. String or numeric. The reference score to use to partition scores into two different regimes. If 'average', uses the average score according to type2freq_1 and type2score_1. If a lexicon is used for type2score, you need to use the middle point of that lexicon's scale. If no value is supplied, zero will be used as the reference point. See details for more information.

handle_missing_scores

Optional. Default value: "error". If "error", throws an error whenever a word has a score in one score dictionary but not the other. If "exclude", excludes any word that is missing a score in one score dictionary from all word shift calculations, regardless of whether it has a score in the other dictionary. If "adopt" and the score is missing in one dictionary, uses the score from the other dictionary if it is available.

stop_lens

Optional. A vector of 2 values. Denotes the interval of scores that should be excluded from word shift calculations. Types with scores in this range are excluded. See details for more information.

stop_words

Optional. A vector that contains words that should be excluded from word shift calculations.

normalization

Optional. Default value: "variation". If "variation", normalizes shift scores so that their absolute values sum to 1. If "trajectory", normalizes them so that the sum of shift scores is 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and "trajectory" is specified.
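
The two normalization options can be illustrated with a minimal base R sketch. The vector `shift_scores` and its values are hypothetical and are not output of `weighted_avg_shift()`; the arithmetic follows the description above and is not taken from the package source.

```r
# Hypothetical per-word shift scores (illustration only).
shift_scores <- c(happy = 0.30, war = -0.10, peace = 0.20)

# "variation": divide by the sum of absolute values,
# so that the absolute values sum to 1.
variation <- shift_scores / sum(abs(shift_scores))

# "trajectory": divide by the absolute total, so that the scores
# sum to 1 or -1; leave them unnormalized if the total is 0.
total <- sum(shift_scores)
trajectory <- if (total == 0) shift_scores else shift_scores / abs(total)

sum(abs(variation))
sum(trajectory)
```

Under "trajectory" the sign of the total shift is preserved, whereas "variation" only fixes the overall magnitude of the scores.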

Value

Returns a list object of class shift.

Details

reference_value: When a lexicon is used for type2score, you have to supply the middle point of the lexicon's scale. If the scale is from 1 to 9, the middle point you should use is 5. If no reference value is given, a value of 0 will be used. This might skew the results when calculating the shift scores.
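
To see why the midpoint matters, consider a small base R sketch. The words and scores are hypothetical, and the sketch only shows how a reference value partitions scores into two regimes; it does not reproduce the package's internal calculation.

```r
# Hypothetical lexicon scores on a 1-to-9 scale (illustration only).
scores <- c(terrible = 2, okay = 5, wonderful = 8)

# Relative to the scale midpoint, scores fall on both sides of the reference.
centered_mid <- scores - 5
sign(centered_mid)

# Relative to the default of 0, every score falls on the same side.
centered_zero <- scores - 0
sign(centered_zero)
```

With a reference of 0 on this scale, no word counts as scoring below the reference, which is one way the results can be skewed.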

stop_lens: stop_lens can be used to remove words whose scores fall within a given range from the shift score calculations. It should be used in combination with a lexicon. If the scale of the lexicon is from 1 to 9, you can, for example, remove the words that have a score between 4 and 6 by supplying a vector of c(4, 6).
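
The effect of such a range can be sketched in base R. The lexicon below is hypothetical, and treating both bounds as inclusive is an assumption made for illustration; the documentation above does not say how boundary values are handled.

```r
# Hypothetical lexicon in the two-column layout described for type2score_1.
lexicon <- data.frame(
  word = c("terrible", "table", "the", "wonderful"),
  score = c(2.0, 5.1, 4.8, 8.2),
  stringsAsFactors = FALSE
)
stop_lens <- c(4, 6)

# Assumption: both bounds are inclusive.
in_lens <- lexicon$score >= stop_lens[1] & lexicon$score <= stop_lens[2]

# Words that would remain in the shift calculations.
kept <- lexicon[!in_lens, ]
kept$word
```

Only the words scored outside the 4-to-6 band remain, which is the usual motivation: near-neutral words are dropped so that more strongly scored words drive the shift.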

Examples

library(shifterator)
library(quanteda)
library(quanteda.textstats)
library(dplyr)

reagan <- corpus_subset(data_corpus_inaugural, President == "Reagan") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>% # to move from classes frequency, textstat, and data.frame to data.frame
  select(feature, frequency)

bush <- corpus_subset(data_corpus_inaugural, President == "Bush" & FirstName == "George W.") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>%
  select(feature, frequency)

was <- weighted_avg_shift(reagan, bush, handle_missing_scores = "exclude")
#> There are 1461 words excluded from the calculations
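
The call above uses handle_missing_scores = "exclude". As a rough illustration of the three options described under Arguments, the following base R sketch applies them to two hypothetical score dictionaries. The data frames, column names, and merge-based logic are assumptions made for illustration, not the package's implementation.

```r
# Hypothetical score dictionaries; "freedom" is scored only in the first.
scores_1 <- data.frame(word = c("peace", "freedom"), score = c(7.5, 7.9),
                       stringsAsFactors = FALSE)
scores_2 <- data.frame(word = "peace", score = 7.1,
                       stringsAsFactors = FALSE)

# Full outer join: NA marks a word that is missing from one dictionary.
both <- merge(scores_1, scores_2, by = "word", all = TRUE,
              suffixes = c("_1", "_2"))
missing_any <- is.na(both$score_1) | is.na(both$score_2)

# "error": this is the condition under which an error would be thrown.
would_error <- any(missing_any)

# "exclude": drop every word that lacks a score in either dictionary.
excluded <- both[!missing_any, ]

# "adopt": fill each gap with the score from the other dictionary.
adopted <- both
adopted$score_1 <- ifelse(is.na(adopted$score_1), adopted$score_2, adopted$score_1)
adopted$score_2 <- ifelse(is.na(adopted$score_2), adopted$score_1, adopted$score_2)
```

In this sketch "exclude" keeps only "peace", while "adopt" keeps both words and gives "freedom" the same score in both dictionaries.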