Shift object for calculating differences in proportions of types across two systems.

Usage

proportion_shift(type2freq_1, type2freq_2)

Arguments

type2freq_1

A data.frame containing the words of the first text and their frequencies.

type2freq_2

A data.frame containing the words of the second text and their frequencies.

Value

Returns a list object of class shift.

Details

The simplest word shift graph that we can construct is a proportion shift. If \(p_i^{(1)}\) is the relative frequency of word \(i\) in the first text, and \(p_i^{(2)}\) is its relative frequency in the second text, then the proportion shift calculates their difference:

\(\delta p_i = p_i^{(2)} - p_i^{(1)}\)

If the difference is positive (\(\delta p_i > 0\)), then the word is relatively more common in the second text. If it is negative (\(\delta p_i < 0\)), then it is relatively more common in the first text. We can rank words by this difference and plot them as a word shift graph.
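
The calculation described above can be sketched by hand in base R. This is a minimal illustration under stated assumptions, not the package's internal implementation: the two toy data frames and the column names `p_1`, `p_2`, and `delta_p` are hypothetical, and words missing from one text are assumed to have a relative frequency of zero there.

```r
# Toy frequency tables (hypothetical data, for illustration only)
text_1 <- data.frame(feature = c("freedom", "government", "people"),
                     frequency = c(2, 5, 3))
text_2 <- data.frame(feature = c("freedom", "people", "liberty"),
                     frequency = c(6, 2, 2))

# Full outer join so that words appearing in only one text are kept
merged <- merge(text_1, text_2, by = "feature", all = TRUE,
                suffixes = c("_1", "_2"))

# Assumption: a word absent from a text has frequency 0 in that text
merged[is.na(merged)] <- 0

# Relative frequencies p_i^(1) and p_i^(2)
merged$p_1 <- merged$frequency_1 / sum(merged$frequency_1)
merged$p_2 <- merged$frequency_2 / sum(merged$frequency_2)

# Proportion shift: delta p_i = p_i^(2) - p_i^(1)
merged$delta_p <- merged$p_2 - merged$p_1

# Rank words by the absolute size of the difference
merged <- merged[order(abs(merged$delta_p), decreasing = TRUE), ]
print(merged[, c("feature", "p_1", "p_2", "delta_p")])
```

Because both sets of relative frequencies sum to one, the differences sum to zero: any word that gains share in the second text is offset by words that lose share.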

Examples

library(shifterator)
library(quanteda)
library(quanteda.textstats)
library(dplyr)

reagan <- corpus_subset(data_corpus_inaugural, President == "Reagan") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>% # drop the frequency and textstat classes, keeping a plain data.frame
  select(feature, frequency)

bush <- corpus_subset(data_corpus_inaugural, President == "Bush" & FirstName == "George W.") %>%
  tokens(remove_punct = TRUE) %>%
  dfm() %>%
  textstat_frequency() %>%
  as.data.frame() %>%
  select(feature, frequency)

prop <- proportion_shift(reagan, bush)