Interim seminar: topic modelling

Topic modelling is a machine learning technique that identifies topics in a given corpus. We assume that a document consists of multiple topics with varying probability, and topic modelling estimates the distribution of topic probability in each document. From a topic model, we can extract keywords of each topic, as well as the distribution of topics in each document.
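To make the two outputs concrete, here is a minimal sketch of one common topic model, latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling. This is an illustration only and not necessarily the method or tool used in the project: the choice of LDA, the function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy paragraphs are all assumptions made for the example.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenised documents.

    Returns (theta, keywords): theta[d][k] is the estimated probability of
    topic k in document d; keywords[k] lists the most frequent words of topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]          # tokens per topic in each document
    topic_word = [Counter() for _ in range(n_topics)]   # word counts per topic
    topic_total = [0] * n_topics                        # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Repeatedly resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic distribution for each document.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    # Top words per topic, ignoring words whose count dropped to zero.
    keywords = [
        [w for w, c in tw.most_common(3) if c > 0] for tw in topic_word
    ]
    return theta, keywords


# Invented toy "paragraphs", not data from the project corpus.
paragraphs = [
    "gene protein cell gene expression".split(),
    "cell protein gene cell".split(),
    "policy market trade policy economy".split(),
    "economy trade market policy".split(),
]
theta, keywords = fit_lda(paragraphs)
for k, words in enumerate(keywords):
    print(f"topic {k}: {words}")
for d, dist in enumerate(theta):
    print(f"paragraph {d}: {[round(p, 2) for p in dist]}")
```

The first loop prints the keywords of each topic and the second prints the topic distribution of each paragraph, which are the two things described above. A real corpus would need proper tokenisation, stop-word removal and many more topics and iterations than this toy setting.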

We ran topic modelling on our corpus of 11 journals. The basic unit of analysis was the paragraph, and multiple paragraphs together made up each text from which the topic models were built. We targeted paragraphs rather than whole papers because a single paper can cover multiple topics, and it would be interesting to investigate how topics shift within papers.

Interim seminar: multidimensional analysis

To investigate which linguistic features distinguish monodisciplinary from interdisciplinary journal discourse, we employed Biber’s (1988) multidimensional (MD) analysis. This well-known technique in corpus linguistics investigates quantitative correlations between language features in texts and reveals functional similarities and differences between corpora. The corpus used for this study consists of all the research papers published in 11 journals over 10 years, which were provided by our partners, Elsevier publishers.

Linked with the Sketch Engine

Today we established an official partnership with the Sketch Engine. Thanks to the efforts of Adam Kilgarriff and Dr Paul Thompson, the IDRD project is officially linked with the Sketch Engine, which will allow us to undertake a variety of corpus investigations. The results of our preliminary analyses will be posted soon!