Shift object for calculating weighted scores of two systems of types, and the shift between them.
Usage
weighted_avg_shift(
type2freq_1,
type2freq_2,
type2score_1 = NULL,
type2score_2 = NULL,
reference_value = NULL,
handle_missing_scores = "error",
stop_lens = NULL,
stop_words = NULL,
normalization = "variation"
)
Arguments
- type2freq_1
A data.frame containing words and their frequencies.
- type2freq_2
A data.frame containing words and their frequencies.
- type2score_1
Optional. A lexicon stored as a data.frame with 2 columns: the first column contains the words and the second column their scores.
- type2score_2
Optional. A lexicon stored as a data.frame with 2 columns: the first column contains the words and the second column their scores.
- reference_value
Optional. String or numeric. The reference score used to partition scores into two different regimes. If 'average', the average score according to type2freq_1 and type2score_1 is used. If a lexicon is supplied for type2score, you should use the midpoint of that lexicon's scale. If no value is supplied, zero is used as the reference point. See Details for more information.
- handle_missing_scores
Optional. Default value: "error". If "error", throws an error whenever a word has a score in one score dictionary but not the other. If "exclude", excludes any word that is missing a score in one score dictionary from all word shift calculations, regardless if it may have a score in the other dictionary. If "adopt" and the score is missing in one dictionary, then uses the score from the other dictionary if it is available
- stop_lens
Optional. A numeric vector of 2 values denoting an interval of scores that should be excluded from the word shift calculations. Types with scores in this interval are excluded. See Details for more information.
- stop_words
Optional. A character vector of words that should be excluded from the word shift calculations.
- normalization
Optional. Default value: "variation". If 'variation', normalizes shift scores so that the sum of their absolute values sums to 1. If 'trajectory', normalizes them so that the sum of shift scores is 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0, so scores are left unnormalized if the total is 0 and 'trajectory' is specified.
Details
reference_value: When a lexicon is used for type2score, you should supply the midpoint of the lexicon's scale. If the scale runs from 1 to 9, the midpoint to use is 5. If no reference value is given, a value of 0 is used, which may skew the shift scores when the lexicon's scale is not centered on 0.
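As an illustration, here is a minimal sketch using a hypothetical two-column lexicon (happiness_lexicon, scored on a 1-to-9 scale) together with the reagan and bush frequency tables built in the Examples section below; the scale's midpoint of 5 is passed as the reference value:
# happiness_lexicon is a hypothetical word/score data.frame on a 1-to-9 scale
happiness_lexicon <- data.frame(
  word = c("joy", "terrible", "fine"),
  score = c(8.2, 1.9, 5.1)
)
was <- weighted_avg_shift(
  reagan, bush,
  type2score_1 = happiness_lexicon, # same lexicon applied to both systems
  type2score_2 = happiness_lexicon,
  reference_value = 5,              # midpoint of the 1-to-9 scale
  handle_missing_scores = "exclude"
)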
stop_lens:
stop_lens can be used to exclude words whose scores fall within a given interval from the shift score calculations. It should be used in combination with a lexicon. If the lexicon's scale runs from 1 to 9, you can, for example, exclude the words that would score between 4 and 6 by supplying c(4, 6).
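Continuing the hypothetical 1-to-9 lexicon sketch from above, the neutral middle of the scale could be excluded like this:
was <- weighted_avg_shift(
  reagan, bush,
  type2score_1 = happiness_lexicon, # hypothetical 1-to-9 lexicon from above
  type2score_2 = happiness_lexicon,
  reference_value = 5,
  stop_lens = c(4, 6),              # exclude types scoring between 4 and 6
  handle_missing_scores = "exclude"
)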
See also
Other shifts: entropy_shift(), jsdivergence_shift(), kldivergence_shift(), proportion_shift()
Examples
library(shifterator)
library(quanteda)
library(quanteda.textstats)
library(dplyr)
reagan <- corpus_subset(data_corpus_inaugural, President == "Reagan") %>%
tokens(remove_punct = TRUE) %>%
dfm() %>%
textstat_frequency() %>%
as.data.frame() %>% # to move from classes frequency, textstat, and data.frame to data.frame
select(feature, frequency)
bush <- corpus_subset(data_corpus_inaugural, President == "Bush" & FirstName == "George W.") %>%
tokens(remove_punct = TRUE) %>%
dfm() %>%
textstat_frequency() %>%
as.data.frame() %>%
select(feature, frequency)
was <- weighted_avg_shift(reagan, bush, handle_missing_scores = "exclude")
#> There are 1461 words excluded from the calculations