
Shift object for calculating the Jensen-Shannon divergence (JSD) between two systems.

Usage

jsdivergence_shift(
  type2freq_1,
  type2freq_2,
  weight_1 = 0.5,
  weight_2 = 0.5,
  base = 2L,
  alpha = 1,
  reference_value = 0,
  normalization = "variation"
)

Arguments

type2freq_1

A data.frame containing words and their frequencies.

type2freq_2

A data.frame containing words and their frequencies.

weight_1

Relative weight of type2freq_1 when constructing the mixed distribution. Together with weight_2, the weights should sum to 1.

weight_2

Relative weight of type2freq_2 when constructing the mixed distribution. Together with weight_1, the weights should sum to 1.

base

The base for the logarithm when computing entropy scores.

alpha

The parameter for the generalized Tsallis entropy. Setting 'alpha = 1' recovers the Shannon entropy.

reference_value

Optional. String or numeric. The reference score used to partition scores into two regimes. If 'average', the average score according to type2freq_1 and type2score_1 is used. If a lexicon is used for type2score, use the midpoint of that lexicon's scale. If no value is supplied, zero is used as the reference point. See Details for more information.

normalization

Optional. Default value: "variation". If 'variation', normalizes shift scores so that their absolute values sum to 1. If 'trajectory', normalizes them so that the sum of shift scores is 1 or -1. The trajectory normalization cannot be applied if the total shift score is 0; in that case, scores are left unnormalized when 'trajectory' is specified.
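
Equivalently, in the notation of the Details section below, 'variation' rescales each word's contribution to \(\delta JSD_i / \sum_j |\delta JSD_j|\).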

Value

Returns a list object of class shift.

Details

The Jensen-Shannon divergence (JSD) accounts for some of the pathologies of the Kullback-Leibler divergence (KLD). It does so by first creating a mixture text \(M\),

\(M = \pi_1 P^{(1)} + \pi_2 P^{(2)}\),

where \(\pi_1\) and \(\pi_2\) are weights on the mixture between the two corpora. The JSD is then calculated as the average KLD of each text from the mixture text,

\(D^{(JS)} \bigl(P^{(1)} || P^{(2)}\bigr) = \pi_1 D^{(KL)} \bigl(P^{(1)} || M \bigr) + \pi_2 D^{(KL)} \bigl(P^{(2)} || M \bigr)\)

If the probability of a word in the mixture text is \(m_i = \pi_1 p_i^{(1)} + \pi_2 p_i^{(2)}\), then an individual word's contribution to the JSD can be written as

\(\delta JSD_i = m_i \log \frac{1}{m_i} - \biggl( \pi_1 p_i^{(1)} \log \frac{1}{p_i^{(1)}} + \pi_2 p_i^{(2)} \log \frac{1}{p_i^{(2)}} \biggr)\)
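
As a rough illustration of the two equations above, the per-word contributions and total JSD can be computed directly in base R (a minimal sketch with made-up word counts, not the package's internal code):

# A minimal base-R illustration of the formulas above (made-up counts,
# not shifterator internals).
f1 <- c(the = 10, freedom = 5, nation = 3)
f2 <- c(the = 9, liberty = 6, nation = 2)
words <- union(names(f1), names(f2))

prob <- function(f) {                           # counts -> probabilities
  p <- setNames(numeric(length(words)), words)  # over the combined vocabulary
  p[names(f)] <- f / sum(f)
  p
}
p1 <- prob(f1)
p2 <- prob(f2)
m  <- 0.5 * p1 + 0.5 * p2                       # mixture distribution M

plogp <- function(p) ifelse(p > 0, -p * log2(p), 0)  # p log2(1/p), with 0 log(1/0) = 0
contrib <- plogp(m) - (0.5 * plogp(p1) + 0.5 * plogp(p2))
contrib       # per-word JSD contributions, all non-negative
sum(contrib)  # total JSD of the two toy texts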

Note

The JSD is well-defined for every word because the KLD is taken with respect to the mixture text M, which contains every word from both texts by design. Unlike the other measures, a word's JSD contribution is always positive, so we direct it in the word shift graph depending on the text in which it has the highest relative frequency. A word's contribution is zero if and only if \(p_i^{(1)} = p_i^{(2)}\).

Like the Shannon entropy, the JSD can be generalized using the Tsallis entropy and the order can be set through the parameter alpha.
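
For reference, the Tsallis entropy of order \(\alpha\) is \(H_\alpha(P) = \frac{1}{\alpha - 1} \bigl(1 - \sum_i p_i^\alpha \bigr)\), which recovers the Shannon entropy in the limit \(\alpha \to 1\).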

The JSD is often effective at pulling out distinct words from each corpus (rather than "stop words"), but it is a more complex measure, and so it is harder to interpret as a whole.

The total Jensen-Shannon divergence can be accessed through the difference column in the shift object.

Examples

library(shifterator)
library(quanteda)
library(quanteda.textstats)
library(dplyr)

reagan <- corpus_subset(data_corpus_inaugural, President == "Reagan") %>% 
  tokens(remove_punct = TRUE) %>% 
  dfm() %>% 
  textstat_frequency() %>% 
  as.data.frame() %>% # drop the frequency/textstat classes, leaving a plain data.frame
  select(feature, frequency)

bush <- corpus_subset(data_corpus_inaugural, President == "Bush" & FirstName == "George W.") %>% 
  tokens(remove_punct = TRUE) %>% 
  dfm() %>% 
  textstat_frequency() %>% 
  as.data.frame() %>% 
  select(feature, frequency)

jsd <- jsdivergence_shift(reagan, bush)
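
# Per the Note above, the total JSD can be read from the difference column of
# the shift object; the line below assumes that column is exposed directly as
# jsd$difference (an assumption, not a documented accessor).
sum(jsd$difference)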