Paper labelling was an exercise in which we categorised papers published in the journal Global Environmental Change (GEC) in order to gain further insight into the types of papers the journal publishes and to aid the interpretation of the other analyses we employed. The papers were labelled, or categorised, using a bottom-up approach in which the categories were established by reading a number of papers from the journal.
Initially we developed a system with seven categories, but these proved either too specific or too closely related to one another, resulting in poor agreement when we categorised papers independently. We therefore reduced the number of categories and broadened their descriptions. The final set included four categories: 1) Empirical, 2) Policy discussion, 3) Research agenda and Research Framework, and 4) Other papers. Using this framework, two researchers labelling the papers independently achieved a reasonable agreement rate of 76.6%.
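An agreement rate like the one above is most simply computed as percent agreement: the share of papers to which both researchers assigned the same category. The following is a minimal sketch, assuming each researcher's labels are stored as a list in the same paper order. The function names and the sample labels are hypothetical and are not the project's data.

```python
from collections import Counter

# The four categories of the final labelling scheme.
CATEGORIES = (
    "Empirical",
    "Policy discussion",
    "Research agenda and Research Framework",
    "Other papers",
)


def percent_agreement(coder_a, coder_b):
    """Return the share of papers given the same category by both coders."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must label the same papers")
    if not coder_a:
        raise ValueError("no labels to compare")
    unknown = (set(coder_a) | set(coder_b)) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"labels outside the scheme: {sorted(unknown)}")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def disagreement_pairs(coder_a, coder_b):
    """Count (category_a, category_b) pairs where the two coders differ."""
    return Counter((a, b) for a, b in zip(coder_a, coder_b) if a != b)


# Hypothetical labels for four papers (illustrative only).
a = ["Empirical", "Policy discussion", "Empirical", "Other papers"]
b = ["Empirical", "Policy discussion", "Research agenda and Research Framework", "Other papers"]
print(percent_agreement(a, b))  # 0.75
print(disagreement_pairs(a, b))
```

Listing the disagreement pairs can help identify categories that are frequently confused with each other, which is the problem that led to merging the original seven categories.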
Topic modelling is a machine learning technique that identifies topics in a given corpus. We assume that each document is a mixture of multiple topics in varying proportions, and topic modelling estimates the probability distribution over those topics for each document. From a topic model, we can extract the keywords that characterise each topic, as well as the distribution of topics in each document.
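The text does not say which topic model was used. As one widely used example, the sketch below implements Latent Dirichlet Allocation (LDA) with collapsed Gibbs sampling in plain Python. The choice of LDA, the hyperparameters `alpha` and `beta`, the toy documents and the function name `lda_gibbs` are all assumptions for illustration. The function returns the two outputs described above: a topic distribution for each document and the top keywords of each topic.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, top_n=3, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of word tokens.
    Returns (doc_topic, topic_keywords): the estimated topic distribution of
    each document and the top_n most frequent words assigned to each topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    topics = range(n_topics)

    n_dk = [[0] * n_topics for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in topics]           # occurrences of word w in topic k
    n_k = [0] * n_topics                       # total tokens assigned to topic k
    z = []                                     # current topic of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalised conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed estimate of each document's topic distribution (rows sum to 1).
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    # Keywords: the most frequent words assigned to each topic.
    topic_keywords = [
        [vocab[v] for v in sorted(range(V), key=lambda v: -n_kw[t][v])[:top_n]]
        for t in topics
    ]
    return doc_topic, topic_keywords


# Hypothetical toy corpus; each "document" is a list of tokens.
docs = [
    "carbon emissions carbon climate".split(),
    "climate emissions carbon warming".split(),
    "policy governance policy adaptation".split(),
    "governance adaptation policy institutions".split(),
]
doc_topic, keywords = lda_gibbs(docs, n_topics=2)
print(keywords)
```

Because Gibbs sampling is stochastic, the exact keywords and proportions depend on the seed and the number of iterations; a real corpus would also need tokenisation, stop-word removal and a tuned number of topics.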
We ran topic modelling on our corpus of papers from 11 journals. The basic unit of analysis was the paragraph, and the collection of paragraphs formed the text from which the topic models were built. We targeted paragraphs rather than whole papers because a single paper can cover multiple topics, and paragraph-level units make it possible to investigate how topics shift within a paper.
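A minimal sketch of the paragraph-level setup, assuming paragraphs are separated by blank lines in a paper's full text and that a fitted topic model gives one topic distribution per paragraph. It splits a paper into ordered paragraph units and then counts changes of dominant topic between consecutive paragraphs, which is one simple way to look at topic transitions within a paper. The paper ID, the splitting rule and the helper names are hypothetical.

```python
import re
from collections import Counter


def split_paragraphs(paper_id, text):
    """Split a paper's text on blank lines into (paper_id, position, paragraph) units."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [(paper_id, i, p) for i, p in enumerate(paragraphs)]


def dominant_topic(topic_distribution):
    """Index of the most probable topic for one paragraph."""
    return max(range(len(topic_distribution)), key=topic_distribution.__getitem__)


def topic_transitions(paragraph_topics):
    """Count changes of dominant topic between consecutive paragraphs of one paper."""
    sequence = [dominant_topic(dist) for dist in paragraph_topics]
    return Counter((a, b) for a, b in zip(sequence, sequence[1:]) if a != b)


paper = "Introduction text.\n\nMethods text.\n\n\nResults text."
units = split_paragraphs("gec-001", paper)
print(len(units))  # 3

# Hypothetical topic distributions for four consecutive paragraphs.
dists = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.6, 0.4]]
print(topic_transitions(dists))  # Counter({(0, 1): 1, (1, 0): 1})
```

Keeping the position index with each paragraph is what allows the topic sequence of a paper to be reconstructed after the paragraphs have been modelled independently.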
To investigate which linguistic features distinguish monodisciplinary from interdisciplinary journal discourses, we employed Biber’s (1988) multidimensional (MD) analysis. This well-known technique in corpus linguistics examines quantitative correlations among the linguistic features of texts and reveals functional similarities and differences between corpora. The corpus used for this study consists of all the research papers published in 11 journals over 10 years, provided by our partner, the publisher Elsevier.
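In Biber's approach, feature counts are normalised per 1,000 words and standardised across the corpus, factor analysis groups co-occurring features into dimensions, and each text receives a dimension score by summing the standardised frequencies of its positively loading features and subtracting those of its negatively loading features. The sketch below covers only the normalisation and scoring steps, assuming the factor analysis has already been done. The feature names, their signs and the counts are illustrative; they are not Biber's published loadings or the project's data.

```python
import statistics


def per_thousand(counts, n_words):
    """Normalise raw feature counts to occurrences per 1,000 words."""
    return {feature: 1000 * c / n_words for feature, c in counts.items()}


def standardise(texts):
    """Convert each feature rate to a z-score across all texts in the corpus."""
    features = texts[0].keys()
    moments = {}
    for f in features:
        values = [t[f] for t in texts]
        moments[f] = (statistics.mean(values), statistics.pstdev(values))
    return [
        {f: (t[f] - mean) / sd if sd else 0.0 for f, (mean, sd) in moments.items()}
        for t in texts
    ]


def dimension_score(z_scores, positive, negative):
    """Sum of positively loading z-scores minus sum of negatively loading ones."""
    return sum(z_scores[f] for f in positive) - sum(z_scores[f] for f in negative)


# Illustrative counts for two texts (hypothetical features and values).
texts = [
    per_thousand({"nouns": 600, "private_verbs": 20}, n_words=2000),
    per_thousand({"nouns": 100, "private_verbs": 15}, n_words=500),
]
z = standardise(texts)
scores = [dimension_score(t, positive=["private_verbs"], negative=["nouns"]) for t in z]
print(scores)  # [-2.0, 2.0]
```

Comparing the mean dimension scores of texts from monodisciplinary and interdisciplinary journals is then one way to see where their discourses differ along each dimension.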