Working Paper


The past few years have been marked by an increase in online offensive strategies used by state and non-state actors to promote their political agendas, sow discord, and question the legitimacy of democratic institutions in the US and Western Europe. In 2016 the US Congress identified a list of Russian state-sponsored Twitter accounts that were used to try to divide voters on a wide range of issues. Previous research used Latent Dirichlet Allocation (LDA) to estimate latent topics in data extracted from these accounts. However, LDA has characteristics that may pose significant limitations when it is applied to social media data: the number of latent topics must be specified by the user, interpretability can be difficult to achieve, and it does not model short-term temporal dynamics. In the current paper we propose a new method for estimating latent topics in social media texts, termed Dynamic Exploratory Graph Analysis (DynEGA). We compare DynEGA and LDA in a Monte Carlo simulation in terms of their capacity to estimate the number of simulated latent topics. Finally, we apply the DynEGA method to a large dataset of Twitter posts from state-sponsored right- and left-wing trolls during the 2016 US presidential election. The results show that DynEGA is substantially more accurate at estimating the number of simulated topics than several different LDA algorithms. Our empirical example shows that DynEGA revealed topics that were pertinent to several consequential events in the election cycle, demonstrating the coordinated effort of trolls capitalizing on current events in the US. This demonstrates the potential power of our approach for revealing temporally relevant information from qualitative text data.


Exploratory Graph Analysis (EGA) has emerged as a popular approach for estimating the dimensionality of multivariate data using psychometric networks. Sampling variability, however, has made reproducibility and generalizability a key issue in network psychometrics. To address this issue, we have developed a novel bootstrap approach called Bootstrap Exploratory Graph Analysis (bootEGA). bootEGA generates a sampling distribution of EGA results from which several statistics can be computed. Descriptive statistics (median, standard error, and dimension frequency) provide researchers with a general sense of the stability of their empirical EGA dimensions. Structural consistency estimates how often dimensions are replicated exactly across the bootstrap replicates. Item stability statistics provide information about whether dimensions are unstable due to misallocation (e.g., an item placed in the wrong dimension), multidimensionality (e.g., an item belonging to more than one dimension), or item redundancy (e.g., similar semantic content). Using a Monte Carlo simulation, we determine guidelines for acceptable item stability. We then provide an empirical example that demonstrates how bootEGA can be used to identify structural consistency issues (including a fully reproducible R tutorial). In sum, we demonstrate that bootEGA is a robust approach for assessing the stability and robustness of dimensionality in multivariate data.
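To make the structural consistency and item stability statistics described above more concrete, the following is a minimal Python sketch, not the authors' implementation or their R tutorial. It assumes the bootstrap step has already produced one item-to-dimension assignment per replicate. The function names `structural_consistency` and `item_stability` and the toy data are hypothetical, and matching replicate labels to empirical dimensions by majority overlap is a simplifying assumption made here for illustration.

```python
from collections import Counter


def structural_consistency(empirical, replicates):
    """Proportion of replicates in which each empirical dimension's
    exact item set reappears as a community (hypothetical helper)."""
    # Group items by their empirical dimension.
    dims = {}
    for item, dim in empirical.items():
        dims.setdefault(dim, set()).add(item)

    result = {}
    for dim, items in dims.items():
        hits = 0
        for rep in replicates:
            # Group items by the community label used in this replicate.
            communities = {}
            for item, label in rep.items():
                communities.setdefault(label, set()).add(item)
            # Labels are arbitrary, so compare item sets rather than labels.
            if any(members == items for members in communities.values()):
                hits += 1
        result[dim] = hits / len(replicates)
    return result


def item_stability(empirical, replicates):
    """Proportion of replicates in which each item falls in its
    empirical dimension (hypothetical helper)."""
    counts = Counter()
    for rep in replicates:
        # Assumption: align each replicate label with the empirical
        # dimension it shares the most items with.
        overlap = {}
        for item, label in rep.items():
            overlap.setdefault(label, Counter())[empirical[item]] += 1
        mapping = {lab: c.most_common(1)[0][0] for lab, c in overlap.items()}
        for item, label in rep.items():
            if mapping[label] == empirical[item]:
                counts[item] += 1
    return {item: counts[item] / len(replicates) for item in empirical}


# Toy data (made up for illustration): six items, two empirical dimensions.
empirical = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
replicates = [
    {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2},  # same as empirical
    {"a": 2, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1},  # same partition, labels swapped
    {"a": 1, "b": 1, "c": 2, "d": 2, "e": 2, "f": 2},  # item c misallocated
    {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2},  # same as empirical
]

consistency = structural_consistency(empirical, replicates)
stability = item_stability(empirical, replicates)
print(consistency)
print(stability)
```

In this toy example a single misallocated item (c) breaks the exact replication of both dimensions in one replicate, so both structural consistency values fall below 1 while only c's item stability drops. This mirrors the diagnostic role the abstract assigns to item stability: locating which items are responsible for an inconsistent dimension.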