Topic modelling is a machine learning technique that identifies the topics present in a given corpus. It assumes that each document is a mixture of multiple topics with varying probabilities, and it estimates the probability distribution over topics for each document. From a fitted topic model, we can extract the keywords that characterise each topic, as well as the distribution of topics in each document.
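To make the two outputs concrete (keywords per topic and a topic distribution per document), here is a minimal sketch in pure Python. It is not the pipeline used for our corpus: it assumes latent Dirichlet allocation fitted by collapsed Gibbs sampling, which is one common way to build a topic model. The function name `fit_lda`, the hyperparameter values and the toy documents are all made up for illustration.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (lists of words).

    Hypothetical helper for illustration; alpha and beta are assumed priors.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    # Count tables: document-topic, topic-word, and tokens per topic.
    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * n_vocab for _ in range(n_topics)]
    topic_total = [0] * n_topics

    # Give every token a random initial topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_vocab * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic distribution per document; each row sums to 1.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    # The three most frequent words assigned to each topic.
    keywords = [
        [vocab[v] for v in sorted(range(n_vocab), key=lambda v: -topic_word[k][v])[:3]]
        for k in range(n_topics)
    ]
    return theta, keywords


# Toy documents, invented for this example only.
docs = [
    "verb noun syntax clause verb".split(),
    "syntax clause noun phrase".split(),
    "vowel consonant syllable vowel".split(),
    "syllable stress vowel consonant".split(),
]
theta, keywords = fit_lda(docs, n_topics=2)
for k, words in enumerate(keywords):
    print(f"topic {k}: {words}")
for d, dist in enumerate(theta):
    print(f"document {d}: {[round(p, 2) for p in dist]}")
```

Each row of `theta` is the estimated topic distribution for one document, and `keywords` lists the words most strongly associated with each topic. On real data, a maintained library implementation would be a more sensible choice than this hand-rolled sampler.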
We ran topic modelling on our corpus, which consists of 11 journals. The basic unit of analysis was the paragraph, and the texts from which the topic models were built each consisted of multiple paragraphs. We targeted paragraphs rather than whole papers because a single paper can cover multiple topics, and because it would be interesting to investigate how topics shift within a paper.
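As a rough illustration of why paragraph-level units help, the sketch below splits a paper into paragraphs and lists the points where the dominant topic changes. It assumes paragraphs are separated by blank lines and that a topic model has already produced one topic distribution per paragraph. The helper names `split_paragraphs` and `topic_transitions`, the sample text and the probabilities are hypothetical, not taken from our corpus.

```python
def split_paragraphs(paper_text):
    """Split a paper into paragraphs on blank lines (an assumed convention)."""
    return [p.strip() for p in paper_text.split("\n\n") if p.strip()]


def topic_transitions(paragraph_topics):
    """Return (from_topic, to_topic) pairs where the dominant topic changes
    between consecutive paragraphs."""
    dominant = [
        max(range(len(dist)), key=dist.__getitem__) for dist in paragraph_topics
    ]
    return [(a, b) for a, b in zip(dominant, dominant[1:]) if a != b]


# Placeholder text standing in for one paper.
paper_text = (
    "First paragraph.\n\nSecond paragraph.\n\n"
    "Third paragraph.\n\nFourth paragraph."
)
paragraphs = split_paragraphs(paper_text)

# Made-up topic distributions, one row per paragraph (three topics).
paragraph_topics = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
]
transitions = topic_transitions(paragraph_topics)
print(len(paragraphs), transitions)  # prints: 4 [(0, 1), (1, 2)]
```

With whole papers as the unit, the same paper would receive a single blended distribution and these within-paper shifts would not be visible.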