We rely on speeches in the Italian parliament (2013-2020) taken from
ParlaMint to illustrate a possible use case of our quantities. By showing
how government and opposition parties differentially adjusted their
speeches around issues of immigration following the 2015 refugee crisis
in Europe, we illustrate how our ALC resources can be used to make
inferences about semantic differences across time and groups. Our
resources are fully integrated with the conText R package. See the
conText documentation for more information on how to get started.
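# packages used throughout this example
library(data.table)  # setDT()
library(readr)       # read_delim(), cols()
library(dplyr)       # %>%, select(), glimpse(), mutate(), arrange()
library(quanteda)    # corpus(), tokens(), dfm(), fcm()
library(conText)     # get_nns(), conText(), get_nns_ratio(), dem(), ...
library(ggplot2)     # ggplot()
library(latex2exp)   # TeX()
# transformation matrix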
local_fasttext = readRDS("../../replication/data/raw/embeddings/it/fastText/fasttext_transform_itwiki_25.rds")
dim(local_fasttext)
## [1] 300 300
# pretrained embeddings
not_all_na <- function(x) any(!is.na(x))
fasttext <- setDT(read_delim("../../replication/data/raw/embeddings/it/fastText/fasttext_vectors_itwiki.vec",
                             delim = " ",
                             quote = "",
                             skip = 1,
                             col_names = FALSE,
                             col_types = cols())) %>%
  dplyr::select(where(not_all_na)) # remove last column which is all NA
word_vectors <- as.matrix(fasttext, rownames = 1)
colnames(word_vectors) <- NULL
rm(fasttext)
dim(word_vectors)
## [1] 309561 300
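These two objects are the ingredients of an ALC embedding: the pretrained vectors of the words surrounding a target are averaged, and the average is multiplied by the transformation matrix. The base-R sketch below illustrates that arithmetic on made-up 3-dimensional vectors. `toy_vectors`, `toy_transform` and `alc_embed()` are hypothetical stand-ins for `word_vectors`, `local_fasttext` and the work conText does internally; this is not the package's implementation.

```r
# Toy stand-ins (assumptions): 5 words with 3-dimensional "pretrained" vectors
toy_vectors <- matrix(
  c(1, 0, 0,
    0, 1, 0,
    0, 0, 1,
    1, 1, 0,
    0, 1, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("confine", "paese", "lavoro", "accoglienza", "legge"), NULL)
)
# Toy transformation matrix (3 x 3); the real one loaded above is 300 x 300
toy_transform <- diag(c(2, 1, 0.5))

# ALC embedding of one target instance: transform %*% mean(context vectors)
alc_embed <- function(context, vectors, transform) {
  context <- context[context %in% rownames(vectors)]  # drop out-of-vocabulary words
  avg <- colMeans(vectors[context, , drop = FALSE])   # averaged context vector
  as.numeric(transform %*% avg)
}

alc_embed(c("confine", "accoglienza", "sconosciuta"), toy_vectors, toy_transform)
## [1] 2.0 0.5 0.0
```

The out-of-vocabulary word is dropped, the two remaining vectors average to (1, 0.5, 0), and the transformation rescales each dimension.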
We use ParlaMint data for Italian parliamentary debates; as the House column in the output below shows, these speeches come from the upper house (the Senate).
As a rule, preprocessing should stay close to what was done when the embeddings were trained:
# restricted to one chamber (the upper house, per the House column)
data_lim <- readRDS("../../replication/data/analysis/examples/ParlaMint/parlamint_it.rds")
glimpse(data_lim)
## Rows: 21,654
## Columns: 37
## $ doc_id <chr> "ParlaMint-IT_2014-01-02-LEG17-Sed-159.u2", "ParlaM…
## $ text <chr> " Signor Presidente, chiedo la votazione del proce…
## $ Title <chr> "Report of the session of the Senate of the Italian…
## $ From <date> 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-02, 20…
## $ To <date> 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-02, 20…
## $ House <chr> "Upper house", "Upper house", "Upper house", "Upper…
## $ Term <chr> "17-upper", "17-upper", "17-upper", "17-upper", "17…
## $ Session <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Meeting <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Sitting <chr> "159-upper", "159-upper", "159-upper", "159-upper",…
## $ Agenda <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ Subcorpus <chr> "Reference", "Reference", "Reference", "Reference",…
## $ Speaker_role <chr> "Regular", "Regular", "Regular", "Regular", "Regula…
## $ Speaker_type <chr> "MP", "MP", "MP", "MP", "MP", "MP", "MP", "MP", "MP…
## $ Speaker_party <chr> "M5S.1", "M5S.1", "M5S.1", "M5S.1", "PD", "LN-Aut",…
## $ Speaker_party_name <chr> "Movimento 5 Stelle", "Movimento 5 Stelle", "Movime…
## $ Party_status <chr> "Opposition", "Opposition", "Opposition", "Oppositi…
## $ Speaker_name <chr> "Ciampolillo, Lello", "Ciampolillo, Lello", "Ciampo…
## $ Speaker_gender <chr> "M", "M", "M", "M", "M", "M", "M", "F", "F", "M", "…
## $ Speaker_birth <dbl> 1972, 1972, 1972, 1962, 1953, 1955, 1960, 1964, 197…
## $ Year <chr> "2014", "2014", "2014", "2014", "2014", "2014", "20…
## $ Date <date> 2014-01-02, 2014-01-02, 2014-01-02, 2014-01-02, 20…
## $ Moy <chr> "2014-01", "2014-01", "2014-01", "2014-01", "2014-0…
## $ postimmig <chr> "beforeImmig", "beforeImmig", "beforeImmig", "befor…
## $ postimmig_num <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ populist <dbl> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, …
## $ government <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, …
## $ periodparty <fct> 1.beforeImmig, 1.beforeImmig, 1.beforeImmig, 1.befo…
## $ partyperiod <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ Moyparty <fct> "Movimento 5 Stelle.2014-01", "Movimento 5 Stelle.2…
## $ Yearparty <fct> "Movimento 5 Stelle.2014", "Movimento 5 Stelle.2014…
## $ governmentperiod <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ periodgovernment <fct> 0.beforeImmig, 0.beforeImmig, 0.beforeImmig, 0.befo…
## $ moy <fct> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ Moygovernment <fct> 0, 0, 0, 0, 10, 0, 0, 0, 0, 10, 10, 10, 0, 0, 0, 10…
## $ months <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ monthsgovernment <fct> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
table(data_lim$Moy)
##
## 2014-01 2014-02 2014-03 2014-04 2014-05 2014-06 2014-07 2014-08 2014-09 2014-10
## 801 938 416 841 402 373 821 752 376 486
## 2014-11 2014-12 2015-01 2015-02 2015-03 2015-04 2015-05 2015-06 2015-07 2015-08
## 444 253 529 482 494 658 413 478 844 125
## 2015-09 2015-10 2015-11 2015-12 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06
## 419 1128 369 313 306 294 587 350 482 404
## 2016-07 2016-08 2016-09 2016-10 2016-11 2016-12 2017-01 2017-02 2017-03 2017-04
## 506 299 281 497 512 76 324 384 375 224
## 2017-05 2017-06 2017-07 2017-08 2017-09 2017-10 2017-11 2017-12
## 421 280 536 95 321 398 113 434
corp <- quanteda::corpus(data_lim)
#########################
## Corpus prep
# remove short speeches
corpus <- corp %>%
  corpus_trim(what = "documents",
              min_ntoken = 10)
# some pre-processing
toks <- tokens(corpus, remove_punct = TRUE, remove_symbols = TRUE) %>%
  tokens_tolower()
# without stops (also works with them!)
toks_nostop <- tokens_select(toks, pattern = stopwords("it"), selection = "remove")
# only use features that appear at least 10 times in the corpus
feats <- dfm(toks_nostop, tolower = TRUE, verbose = FALSE) %>%
  dfm_trim(min_termfreq = 10) %>%
  featnames()
head(feats, n = 50)
## [1] "signor" "presidente" "chiedo" "votazione"
## [5] "processo" "verbale" "previa" "verifica"
## [9] "numero" "legale" "tratta" "altro"
## [13] "precisazione" "dell'emendamento" "c'è" "stato"
## [17] "voto" "senatrice" "siede" "posto"
## [21] "accanto" "senatore" "gasparri" "votato"
## [25] "quel" "momento" "presente" "vorrei"
## [29] "venisse" "messo" "l'ennesima" "ripetuta"
## [33] "violazione" "regolamento" "norme" "modalità"
## [37] "viene" "perpetrata" "quest'aula" "quindi"
## [41] "venga" "comunque" "prima" "ogni"
## [45] "nave" "crociera" "venezia" "inquina"
## [49] "14.000" "vecchie"
toks_nostop <- tokens_select(toks_nostop, feats, padding = TRUE)
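Passing `padding = TRUE` in the last call matters for the context windows built later: per the quanteda documentation, removed tokens are replaced by empty strings, so the remaining tokens keep their original distances. A minimal base-R sketch of the difference, using a made-up sentence and a hypothetical `keep` vocabulary:

```r
speech <- c("la", "legge", "sulla", "immigrazione", "non", "funziona")
keep   <- c("legge", "immigrazione", "funziona")  # hypothetical retained features

no_pad <- speech[speech %in% keep]               # remaining tokens collapse together
padded <- ifelse(speech %in% keep, speech, "")   # original positions are preserved

# distance between "legge" and "immigrazione" in each version
which(no_pad == "immigrazione") - which(no_pad == "legge")
## [1] 1
which(padded == "immigrazione") - which(padded == "legge")
## [1] 2
```

Without padding, words that were originally far apart can end up inside the same 5-token window.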
A good first exploratory step is to analyze the nearest neighbors of
the ALC embeddings by group, i.e. the features with the highest cosine
similarity to each group embedding, using conText::get_nns() (a wrapper
function for conText::nns()). In our example, we are interested in the
nearest neighbors of the ALC embedding of the word stem immigr* across
government and opposition parties and across time. We use the candidates
argument to limit the set of features that get_nns() considers as
candidate nearest neighbors. In our case, we limit candidates to those
features that appear in the context window around the target term
immigr* (we could also extend this set to the entire corpus or to all
features in the pretrained embeddings).
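To make the ranking concrete, the sketch below computes cosine similarities between a made-up group embedding and a restricted candidate set in base R. `cos_sim()`, `toy_nns()` and the 2-dimensional vectors are hypothetical illustrations of the idea, not the code that get_nns() runs.

```r
# cosine similarity between a vector x and every row of a matrix M
cos_sim <- function(x, M) {
  as.numeric(M %*% x) / (sqrt(rowSums(M^2)) * sqrt(sum(x^2)))
}

# hypothetical helper: top-N candidates ranked by cosine similarity
toy_nns <- function(embedding, vectors, candidates, N = 2) {
  cand <- intersect(candidates, rownames(vectors))  # candidates with a vector
  sims <- cos_sim(embedding, vectors[cand, , drop = FALSE])
  names(sims) <- cand
  head(sort(sims, decreasing = TRUE), N)
}

toy_vectors <- matrix(
  c( 1, 0,
     1, 1,
     0, 1,
    -1, 0),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("migranti", "accoglienza", "bilancio", "calcio"), NULL)
)
group_embedding <- c(2, 0.5)  # made-up ALC embedding for one group

# "calcio" is not a candidate, so it can never be returned
toy_nns(group_embedding, toy_vectors,
        candidates = c("migranti", "accoglienza", "bilancio"), N = 2)
```

Here "migranti" ranks first and "accoglienza" second, because their vectors point in nearly the same direction as the group embedding.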
target_toks <- tokens_context(x = toks_nostop, pattern = "immigr*", window = 5L)
## 378 instances of "immigrati" found.
## 19 instances of "immigrato" found.
## 280 instances of "immigrazione" found.
## 15 instances of "immigrazioni" found.
feats <- featnames(dfm(target_toks))
# nearest neighbors: features with the highest cosine-similarity with each group embedding
# ---------------------------------
# by government vs. opposition
target_nns <- get_nns(x = target_toks, N = 10,
                      groups = docvars(target_toks, 'government'),
                      candidates = feats, # restrict to candidates in context window
                      pre_trained = word_vectors,
                      transform = TRUE,
                      transform_matrix = local_fasttext,
                      bootstrap = FALSE) %>%
  lapply(., "[[", 2) %>%   # keep the feature column of each group's result
  do.call(rbind, .) %>%
  as.data.frame()
target_nns[, 1:5]
## V1 V2 V3 V4
## 1 dell'immigrazione all'immigrazione richiedenti immigrazione
## 0 dell'immigrazione richiedenti all'immigrazione immigrazione
## V5
## 1 emergenziale
## 0 l'immigrazione
# ---------------------------------
# across months
target_nns <- get_nns(x = target_toks, N = 10,
                      groups = docvars(target_toks, 'Moy'),
                      candidates = feats,
                      pre_trained = word_vectors,
                      transform = TRUE,
                      transform_matrix = local_fasttext,
                      bootstrap = FALSE) %>%
  lapply(., "[[", 2) %>%   # keep the feature column of each month's result
  do.call(rbind, .) %>%
  as.data.frame() %>%
  tibble::rownames_to_column(var = "Moy") %>%
  arrange(lubridate::ym(Moy))
target_nns[19:25, 1:5]
## Moy V1 V2 V3 V4
## 19 2015-07 richiedenti emergenziale richiedente chiediamo
## 20 2015-08 richiedenti richiedente lavoratori migranti
## 21 2015-09 richiedenti richiedente emergenziale pregiudiziale
## 22 2015-10 dell'immigrazione immigrazione all'immigrazione migranti
## 23 2015-11 ventimiglia francia invadere respingere
## 24 2015-12 incentiva previdenziali emergenziale sostenibile
## 25 2016-01 richiedenti richiedente immigrazione migranti
We evaluate the trend in semantic differences between government and
opposition parties around the 2015 refugee crisis using embedding
regression. conText::conText() uses ALC embeddings within a
regression-style framework, i.e. it allows us to examine covariate
effects on embeddings beyond discrete group variables or while
controlling for other covariates.
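For intuition: with a single binary covariate, the least-squares coefficient vector equals the difference between the two groups' average instance embeddings, and its norm summarizes how far apart the groups are. The sketch below shows this on simulated embeddings, together with a simple permutation p-value. It is a simplified illustration under our own assumptions (simulated data, plain Euclidean norm, covariate shuffling); the exact estimator conText reports, including jackknife and stratified resampling, is not reproduced here.

```r
set.seed(2021)
n <- 60; D <- 4
government <- rep(c(0, 1), each = n / 2)

# simulated instance embeddings: government instances are shifted on dimension 1
Y <- matrix(rnorm(n * D), n, D)
Y[government == 1, 1] <- Y[government == 1, 1] + 1.5

# multivariate OLS of embeddings on the covariate; one slope per dimension
beta <- coef(lm(Y ~ government))["government", ]
group_diff <- colMeans(Y[government == 1, ]) - colMeans(Y[government == 0, ])
all.equal(unname(beta), group_diff)  # slope vector = difference in group means

norm_of <- function(b) sqrt(sum(b^2))
observed <- norm_of(beta)

# permutation test: shuffle the covariate and recompute the norm
permuted <- replicate(100, {
  g <- sample(government)
  norm_of(colMeans(Y[g == 1, ]) - colMeans(Y[g == 0, ]))
})
p_value <- mean(permuted >= observed)
```

A small `p_value` means that a difference as large as the observed one rarely arises when group labels are assigned at random.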
set.seed(2021L)
models <- lapply(unique(docvars(target_toks, 'months')), function(j){
  conText(formula = . ~ government,
          data = tokens_subset(target_toks, months == j),
          pre_trained = word_vectors,
          transform = TRUE,
          transform_matrix = local_fasttext,
          stratify = TRUE,
          jackknife = TRUE,
          # bootstrap = TRUE,
          permute = TRUE,
          num_permutations = 100,
          hard_cut = FALSE,
          window = 5,
          case_insensitive = TRUE,
          verbose = TRUE)
})
## total observations included in regression: 175
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 0.8911084 0.1999593 0.4964504 1.285766 0.05
## total observations included in regression: 66
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 1.34187 0.3037269 0.7352853 1.948454 0.06
## total observations included in regression: 130
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 0.8568327 0.170718 0.5190629 1.194602 0.02
## total observations included in regression: 34
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 2.061913 0.5126719 1.018875 3.104952 0
## total observations included in regression: 77
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 1.352791 0.2647936 0.8254088 1.880173 0.01
## total observations included in regression: 44
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 1.350581 0.3374869 0.6699739 2.031188 0.05
## total observations included in regression: 103
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 1.386223 0.2107045 0.9682915 1.804154 0
## total observations included in regression: 63
## starting permutations
## done with permutations
## Note: These values are not regression coefficients. Check out the Quick Start Guide for help with interpretation:
## https://github.com/prodriguezsosa/conText/blob/master/vignettes/quickstart.md
##
## coefficient normed.estimate std.error lower.ci upper.ci p.value
## 1 government 1.378633 0.3288502 0.7212713 2.035996 0.09
plot_tibble <- lapply(models, function(i) i@normed_coefficients) %>%
  do.call(rbind, .) %>%
  mutate(period = factor(seq(1, 8), labels = c("2014-01/06", "2014-07/12",
                                               "2015-01/08", "2015-09/12",
                                               "2016-01/06", "2016-07/12",
                                               "2017-01/06", "2017-07/12")))
ggplot(data = plot_tibble,
       aes(x = period,
           y = normed.estimate)) +
  geom_point() +
  geom_errorbar(aes(ymin = lower.ci,
                    ymax = upper.ci),
                width = 0.5) +
  geom_vline(xintercept = 3.5, linetype = "dashed") +
  labs(x = "",
       title = "Norm of Difference between Government and Opposition ALC embeddings of 'immigr*'",
       y = TeX("Norm of $\\hat{\\beta}$")) +
  theme_bw()
Another exploratory exercise is to compute the cosine similarity
ratio between group embeddings and features using
conText::get_nns_ratio() (a wrapper function for conText::nns_ratio()).
Given ALC embeddings for two groups, get_nns_ratio() computes, for each
feature, its similarity to each group embedding and then takes the ratio
of these two similarities.
This ratio captures how strongly a feature discriminates between the two groups. Values larger (smaller) than 1 mean the feature is more (less) characteristic of the group in the numerator than of the group in the denominator. Use the numerator argument to define which group forms the numerator of this ratio. If N is defined, the ratio is computed for the union of the two groups' top N nearest neighbors.
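The ratio itself is simple arithmetic, sketched here in base R with made-up 2-dimensional group embeddings and feature vectors (all names and values below are hypothetical; bootstrapped standard errors and permutation p-values are left out):

```r
cos_sim <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# made-up group embeddings and feature vectors
gov <- c(1, 0.2)
opp <- c(0.2, 1)
features <- list(integrazione = c(1, 0.5),
                 migranti     = c(0.5, 1),
                 paese        = c(1, 1))

# numerator = government: ratio > 1 leans government, ratio < 1 leans opposition
ratios <- sapply(features, function(f) cos_sim(f, gov) / cos_sim(f, opp))
round(sort(ratios, decreasing = TRUE), 2)
```

In this toy setup "integrazione" has a ratio above 1, "migranti" a ratio below 1, and "paese", which is equally similar to both groups, a ratio of exactly 1.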
plotfun <- function(period){
  temp <- tokens_subset(target_toks, months == period)
  feats <- featnames(dfm(target_toks))
  docvars(temp)$Government <- ifelse(docvars(temp)$government == 1, "Government", "Opposition")
  set.seed(111)
  target_nns_ratio <- get_nns_ratio(x = temp,
                                    N = 10,
                                    groups = docvars(temp, 'Government'),
                                    numerator = "Government",
                                    candidates = feats,
                                    pre_trained = word_vectors,
                                    transform = TRUE,
                                    transform_matrix = local_fasttext,
                                    bootstrap = TRUE,
                                    num_bootstraps = 100,
                                    permute = TRUE,
                                    num_permutations = 100,
                                    verbose = TRUE)
  return(target_nns_ratio)
}
out_before <- plotfun(period = 2) # Jan - Aug 2015
## starting bootstraps
## done with bootstraps
## starting permutations
## done with permutations
## NOTE: values refer to the ratio Government/Opposition.
out_before
## # A tibble: 14 × 7
## feature value std.error lower.ci upper.ci p.value group
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 consapevoli 1.10 0.0727 0.968 1.22 0.17 Government
## 2 dell'attualità 1.04 0.0700 0.914 1.15 0.64 Government
## 3 richiedente 1.03 0.0945 0.895 1.19 0.75 Government
## 4 legalità 0.997 0.0792 0.888 1.11 0.95 Government
## 5 riteniamo 0.972 0.0771 0.816 1.09 0.78 shared
## 6 richiedenti 0.967 0.0821 0.843 1.11 0.73 shared
## 7 chiediamo 0.961 0.0874 0.790 1.10 0.61 shared
## 8 emergenziale 0.953 0.0821 0.833 1.08 0.59 shared
## 9 all'immigrazione 0.937 0.0857 0.801 1.08 0.45 shared
## 10 umanitari 0.902 0.0718 0.789 1.04 0.2 Opposition
## 11 dell'immigrazione 0.901 0.0836 0.768 1.05 0.27 shared
## 12 immigrazione 0.894 0.0899 0.756 1.04 0.34 Opposition
## 13 migranti 0.866 0.0870 0.729 1.02 0.16 Opposition
## 14 extracomunitari 0.853 0.0903 0.693 1.01 0.17 Opposition
plot_nns_ratio(x = out_before, alpha = 0.05, horizontal = F)
out_after <- plotfun(period = 3) # Sep - Dec 2015
## starting bootstraps
## done with bootstraps
## starting permutations
## done with permutations
## NOTE: values refer to the ratio Government/Opposition.
out_after
## # A tibble: 19 × 7
## feature value std.error lower.ci upper.ci p.value group
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 destabilizzazione 1.43 0.232 1.12 1.83 0.07 Government
## 2 integrazione 1.32 0.209 1.00 1.69 0.13 Government
## 3 d'integrazione 1.27 0.201 1.01 1.65 0.19 Government
## 4 regolamentazione 1.23 0.210 0.910 1.52 0.35 Government
## 5 normativa 1.22 0.202 0.916 1.53 0.39 Government
## 6 emergenziale 1.19 0.191 0.874 1.50 0.41 Government
## 7 bossi-fini 1.19 0.193 0.948 1.52 0.42 Government
## 8 previdenziali 1.11 0.165 0.846 1.41 0.7 Government
## 9 schengen 0.993 0.187 0.674 1.27 1 Government
## 10 dell'immigrazione 0.875 0.135 0.648 1.08 0.56 shared
## 11 all'immigrazione 0.857 0.114 0.665 1.02 0.42 Opposition
## 12 l'immigrazione 0.854 0.128 0.644 1.02 0.5 Opposition
## 13 immigrazione 0.842 0.126 0.638 1.04 0.45 Opposition
## 14 immigrazioni 0.751 0.112 0.590 0.951 0.27 Opposition
## 15 richiedenti 0.663 0.0788 0.533 0.792 0.03 Opposition
## 16 extracomunitari 0.660 0.0957 0.466 0.802 0.03 Opposition
## 17 migranti 0.645 0.0882 0.479 0.773 0.01 Opposition
## 18 chiediamo 0.555 0.107 0.374 0.711 0.01 Opposition
## 19 cittadini 0.529 0.0809 0.404 0.643 0 Opposition
plot_nns_ratio(x = out_after, alpha = 0.05, horizontal = F)
We now validate the performance of our pretrained ALC resources by comparing them against locally trained quantities.
# ---------------------------------
# pretrained embeddings + local A
toks_fcm <- fcm(toks_nostop, context = "window", window = 5, count = "frequency")
localA <- conText::compute_transform(x = toks_fcm, pre_trained = word_vectors, weighting = 'log')
# ---------------------------------
# local embeddings + local A
# now with both glove and A locally
# library(text2vec)
# estimate glove model using text2vec
# glovelocal <- GlobalVectors$new(rank = 300,
# x_max = 100,
# learning_rate = 0.05)
# wv_main <- glovelocal$fit_transform(toks_fcm, n_iter = 10,
# convergence_tol = 1e-3,
# n_threads = parallel::detectCores()) # set to 'parallel::detectCores()' to use all available cores
#
# wv_context <- glovelocal$components
# locallocal_glove <- wv_main + t(wv_context) # word vectors
# saveRDS(locallocal_glove, "localglove_nostops_italianparliament.rds")
# read in the locally trained GloVe embeddings and compute the corresponding local A
locallocal_glove <- readRDS("localglove_nostops_italianparliament.rds")
locallocalA <- compute_transform(x = toks_fcm, pre_trained = locallocal_glove, weighting = 'log')
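Both calls to compute_transform() above estimate a matrix that maps averaged context embeddings back onto the pretrained vectors. As a rough sketch of that idea, the base-R code below solves a weighted least-squares problem on simulated data. The simulated counts, the variable names and our reading of `weighting = 'log'` as log-count weights are assumptions; this is not the package's code.

```r
set.seed(1)
V <- 6; D <- 3
words <- paste0("w", 1:V)

# simulated "pretrained" vectors and co-occurrence counts (assumptions)
W <- matrix(rnorm(V * D), V, D, dimnames = list(words, NULL))
cooc <- matrix(rpois(V * V, 3) + 1, V, V, dimnames = list(words, words))
diag(cooc) <- 0

# average context embedding of each word: count-weighted mean of its neighbors' vectors
U <- (cooc %*% W) / rowSums(cooc)

# log weighting: frequent words get more, but not proportionally more, influence
wts <- log(rowSums(cooc))

# weighted least squares: minimize sum_w wts[w] * || W[w, ] - A %*% U[w, ] ||^2
A <- t(solve(t(U) %*% (wts * U), t(U) %*% (wts * W)))
dim(A)
## [1] 3 3
```

With only a few hundred instances of a target word and a small vocabulary, such an estimate rests on little data, which is one reason to expect the locally trained quantities to be noisy.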
#------------------------------------------------------------------------------#
# Nearest Neighbors
#------------------------------------------------------------------------------#
immig_toks <- tokens_context(x = toks_nostop, pattern = "immigrazione", window = 5L)
## 280 instances of "immigrazione" found.
feats <- featnames(dfm(immig_toks))
immig_dfm <- dfm(immig_toks)
###########################
## with local GloVe
# GloVe
nns_localglove <- find_nns(locallocal_glove['immigrazione',],
                           pre_trained = locallocal_glove,
                           N = 10,
                           candidates = feats)
nns_localglove
## [1] "immigrazione" "reato" "clandestina" "terrorismo" "parlando"
## [6] "unico" "riferimento" "l'altro" "previsto" "tortura"
# GloVe ALC
immig_dem_local <- dem(x = immig_dfm,
pre_trained = locallocal_glove,
transform = TRUE,
transform_matrix = locallocalA,
verbose = TRUE)
# take the column average to get a single "corpus-wide" embedding
immig_wv_local <- colMeans(immig_dem_local)
# find nearest neighbors for overall ALC embedding
nns_localglove_alc <- find_nns(immig_wv_local,
pre_trained = locallocal_glove,
N = 10,
candidates = feats)
nns_localglove_alc
## [1] "quindi" "fatto" "infatti" "però" "solo" "così" "proprio"
## [8] "poi" "invece" "parte"
##################################
# with our FT quantities
nns_ft <- find_nns(word_vectors['immigrazione',],
pre_trained = word_vectors,
N = 10,
candidates = feats[feats %in% rownames(word_vectors)])
nns_ft
## [1] "immigrazione" "dell'immigrazione" "emigrazione"
## [4] "all'immigrazione" "immigrati" "migranti"
## [7] "emigrati" "criminalità" "rifugiati"
## [10] "esodo"
immig_dem_local <- dem(x = immig_dfm,
pre_trained = word_vectors,
transform = TRUE,
transform_matrix = local_fasttext,
verbose = TRUE)
# take the column average to get a single "corpus-wide" embedding
immig_wv_local <- colMeans(immig_dem_local)
# find nearest neighbors for overall embedding
nns_ftalc <- nns(x = immig_wv_local,
N = 10,
candidates = feats,
pre_trained = word_vectors,
stem = F,
as_list = FALSE,
show_language = FALSE)
## Warning in nns(x = immig_wv_local, N = 10, candidates = feats, pre_trained =
## word_vectors, : the following canidates do not appear to have an embedding in
## the set of pre-trained embeddings provided: vergogniamo, soffermarmi,
## cardiello, ricordiamoci, credetemi, nell'emendamento, subemendamento, segnalo,
## assumete, piangiamo, rinviabili, esodati, recuperiamo, confrontarci,
## quest'aula, diciamolo, lasciatemelo, risolviamo, rimandiamo, serissimo, citavo,
## sottoscriviamo, quest'assemblea, concludo
nns_ftalc
## # A tibble: 10 × 4
## target feature rank value
## <lgl> <chr> <int> <dbl>
## 1 NA depenalizzazione 1 0.640
## 2 NA reato 2 0.616
## 3 NA bossi-fini 3 0.616
## 4 NA pregiudiziale 4 0.602
## 5 NA dell'immigrazione 5 0.597
## 6 NA criminalizzare 6 0.590
## 7 NA depenalizzare 7 0.581
## 8 NA terrorismo 8 0.579
## 9 NA penale 9 0.578
## 10 NA dell'illecito 10 0.575
knitr::kable(data.frame(
nns_ft,
nns_ftalc$feature,
nns_localglove,
nns_localglove_alc),
format = "simple",
booktabs = T,
linesep = "",
col.names = c("our fT", "our fT-ALC", "local GloVe", "local GloVe-ALC")
)
our fT | our fT-ALC | local GloVe | local GloVe-ALC |
---|---|---|---|
immigrazione | depenalizzazione | immigrazione | quindi |
dell’immigrazione | reato | reato | fatto |
emigrazione | bossi-fini | clandestina | infatti |
all’immigrazione | pregiudiziale | terrorismo | però |
immigrati | dell’immigrazione | parlando | solo |
migranti | criminalizzare | unico | così |
emigrati | depenalizzare | riferimento | proprio |
criminalità | terrorismo | l’altro | poi |
rifugiati | penale | previsto | invece |
esodo | dell’illecito | tortura | parte |
English translation of the nearest neighbors above:
our fT | our fT-ALC | local GloVe | local GloVe-ALC |
---|---|---|---|
immigration | decriminalization | immigration | therefore |
of immigration | crime | crime | fact |
emigration | Bossi-Fini | illegal (undocumented) | in fact |
to immigration | prejudicial | terrorism | however |
immigrants | of immigration | speaking | only |
migrants | to criminalize | unique | so |
emigrants | to decriminalize | reference | own |
crime | terrorism | the other | then |
refugees | criminal | foreseen | instead |
exodus | of the illegal | torture | part |
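Our parliamentary corpus is too small to train high-quality embeddings and the corresponding ALC transformation matrix locally: the nearest neighbors of the local GloVe-ALC embedding are generic, high-frequency words (e.g. quindi, fatto, infatti). Our pretrained quantities, by contrast, return topically relevant neighbors such as depenalizzazione, reato and bossi-fini.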