In the previous post we presented one part of our topic modelling exercise, in which we investigated which of the 100 identified topics were represented in the journal Global Environmental Change (GEC) more than in any other journal. These topics represent the distinctive aspects of the journal, and analysing their distribution over the years will allow us to investigate whether and how the journal has changed. By correlating the results of our labelling exercise and multi-dimensional analysis, we can observe in what manner, or in which types of papers and contexts, these topics are usually discussed. More importantly, as these topics are computationally derived groups of words, identifying the contexts in which they occur will allow us to interpret the topics themselves.
We have conducted a topic modelling analysis on our corpus of 11 academic journals and created a model with 100 potential ‘topics’. Topics in this sense are collections of words and do not necessarily represent content topics in the traditional sense, such as ‘environmental protection’. Rather, they are groups of words that statistically tend to co-occur in the same paragraphs.
Topic modelling is a machine learning technique that identifies topics in a given corpus. We assume that a document consists of multiple topics with varying probability, and topic modelling estimates the distribution of topic probability in each document. From a topic model, we can extract keywords of each topic, as well as the distribution of topics in each document.
We ran topic modelling on our corpus consisting of 11 journals. The basic unit of analysis was the paragraph, and multiple paragraphs constituted a text that topic models were built from. We targeted paragraphs rather than papers because each paper can cover multiple topics, and it would be interesting to investigate how topics shift within papers.
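To make the idea concrete, here is a minimal sketch of paragraph-level topic modelling. It assumes latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling, which is one common approach and not necessarily the one used in this project. The toy paragraphs, the function name `fit_lda` and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Toy paragraphs, invented for illustration (not taken from the corpus).
paragraphs = [
    "climate policy adaptation climate risk",
    "adaptation risk policy climate governance",
    "forest land use forest carbon",
    "land carbon forest use soil",
]


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit a small LDA model with collapsed Gibbs sampling (illustrative only)."""
    rng = random.Random(seed)
    tokens = [d.split() for d in docs]
    vocab_size = len({w for doc in tokens for w in doc})

    doc_topic = [[0] * n_topics for _ in tokens]       # tokens per topic in each paragraph
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # total tokens per topic
    assign = []                                        # topic assigned to each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(tokens):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed topic proportions for each paragraph (each row sums to 1).
    theta = [
        [(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
        for row in doc_topic
    ]
    # Most frequent words assigned to each topic.
    keywords = [
        [w for w, c in topic_word[k].most_common(3) if c > 0]
        for k in range(n_topics)
    ]
    return theta, keywords


theta, keywords = fit_lda(paragraphs)
for k, words in enumerate(keywords):
    print(f"topic {k}: {words}")
for d, row in enumerate(theta):
    print(f"paragraph {d}: {[round(p, 2) for p in row]}")
```

The two outputs correspond to what we extract from a topic model: `keywords` lists the top words for each topic, and `theta` gives the topic proportions for each paragraph. Because the paragraph is the unit, reading the rows of `theta` in order for a single paper would show how its topics shift from one paragraph to the next.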