Investigating Global Environmental Change: topic modelling (part one)


We have conducted a topic modelling analysis on our corpus of 11 academic journals and created a model with 100 potential ‘topics’. Topics in this sense are collections of words and do not necessarily represent content topics in the traditional sense, such as ‘environmental protection’. Rather, they are groups of words that statistically tend to co-occur in the same paragraphs.
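To make the idea of words ‘co-occurring in the same paragraphs’ concrete, here is a minimal sketch that simply counts how often pairs of words appear together in a paragraph. It is not a topic model and not the method used for our analysis; the four toy paragraphs and the function name `cooccurrence_counts` are invented for illustration only.

```python
from collections import Counter
from itertools import combinations

# Toy paragraphs, invented for illustration (not taken from the corpus).
paragraphs = [
    "emission carbon reduction gas",
    "carbon emission greenhouse",
    "government policy public",
    "policy government carbon",
]


def cooccurrence_counts(paragraphs):
    """Count, for every pair of words, the number of paragraphs containing both."""
    counts = Counter()
    for text in paragraphs:
        # Use the set of distinct words so a pair is counted once per paragraph;
        # sorting gives each pair a single canonical order.
        words = sorted(set(text.split()))
        counts.update(combinations(words, 2))
    return counts


counts = cooccurrence_counts(paragraphs)
print(counts[("carbon", "emission")])     # pair seen in two paragraphs
print(counts[("government", "policy")])   # pair seen in two paragraphs
print(counts[("emission", "policy")])     # pair never seen together
```

A topic model goes well beyond such pairwise counts, but the intuition is similar: words that keep turning up in the same paragraphs end up grouped together.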

Seven of these 100 topics are mainly associated with the journal Global Environmental Change (GEC), which means that they are less likely to occur in papers from journals other than GEC. We have analysed the distribution of these topics across the 20 years of the journal, as well as their distribution within individual papers. This has allowed us to observe whether a topic belongs to the general scope of the journal or is limited to a particular time span. Furthermore, the analysis has allowed us to determine the position within individual papers where each topic most often occurs, e.g. at the beginning or at the end of the paper.
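The kinds of comparison described above can be sketched as simple averages of a topic's weight, grouped by journal, by year, or by position in the paper. This is only an illustration under stated assumptions: the record layout, the helper `mean_by` and all the numbers below are hypothetical and are not our actual data or results.

```python
from collections import defaultdict

# Hypothetical records for ONE topic, one per paragraph:
# (journal, year, relative position in the paper from 0 to 1, topic weight).
# All values are made up for illustration.
records = [
    ("GEC", 1995, 0.1, 0.30),
    ("GEC", 1995, 0.9, 0.10),
    ("GEC", 2005, 0.2, 0.40),
    ("Other", 1995, 0.1, 0.05),
    ("Other", 2005, 0.8, 0.15),
]


def mean_by(records, key):
    """Average the topic weight over all records sharing the same key."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        k = key(rec)
        sums[k] += rec[3]
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Is the topic stronger in GEC than elsewhere?
by_journal = mean_by(records, key=lambda r: r[0])
# Is it spread across the years or concentrated in some of them?
by_year = mean_by(records, key=lambda r: r[1])
# Does it occur mostly at the beginning or at the end of papers?
by_half = mean_by(records, key=lambda r: "first half" if r[2] < 0.5 else "second half")

print(by_journal)
print(by_year)
print(by_half)
```

In a real analysis the position would probably be split into finer bins than two halves, but the grouping logic would stay the same.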

In the process of topic modelling, all words are ‘stemmed’: the third-person suffix ‘-s’, the plural suffixes ‘-s’ and ‘-es’, nominalisation suffixes such as ‘-ment’ and ‘-ing’, and similar endings are removed, so that related forms are treated as the same word; e.g. govern, governed and government. As the topics are identified automatically, they have to be interpreted post hoc. For example, ‘Topic 7’ is a list of the following 20 words: countries, development, global, world, nation, economic, industries, international, million, year, population, growth, environment, import, major, trade, manifold*, decades, centuries, economies. Comparing this with the results of our labelling exercise, we can observe that most of the papers scoring highly on this topic are of the types labelled ‘Policy Discussion’ and, to some extent, ‘Research Agenda’. Thus, we have interpreted ‘Topic 7’ as ‘International Development’, since both the included words and the high-scoring papers relate to that issue.
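The stemming step described above can be illustrated with a deliberately naive suffix stripper. This is a sketch only, not the stemmer used in our analysis: the suffix list, the minimum stem length and the function name `naive_stem` are assumptions made for the example, and real stemmers apply far more careful rules.

```python
# Suffixes to strip, longest first; an illustrative list, not a real stemmer's rule set.
SUFFIXES = ("ment", "ing", "ed", "es", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, provided at least min_stem letters remain."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Related forms collapse onto the same stem.
for w in ["govern", "governs", "governed", "government"]:
    print(w, "->", naive_stem(w))

# A false positive: the final "s" of "news" is not a suffix,
# but the rule removes it anyway.
print("news", "->", naive_stem("news"))
```

The last line shows how a purely mechanical rule can produce the kind of false-positive suffix removal mentioned in the footnote.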

The other six topics strongly associated with GEC are as follows:

Topic 50: will, can, may, need, requir, must, target, howev, limit, possibl, current, future, like, make, becom, potenti, necessari, provid, exist, exampl

Topic 58: research, paper, section, discuss, focus, approach, work, develop, analysi, issu, understand, framework, scienc, literatur, process, address, knowledg, review, studi, provid

Topic 60: govern, policies, develop, public, state, nation, local, institut, plan, issu, project, intern, agenc, polit, implement, communiti, environment, program, fund, particip

Topic 81: chang, climat, scenario, adapt, impact, vulner, futur, global, assess, capac, project, polici, uncertainti, respons, will, current, region, rise, warm, ipcc (Intergovernmental Panel on Climate Change)

Topic 86: emiss, carbon, reduct, gas, reduc, estim, greenhous, atmospher, sourc, contribut, year, sequestr, total, inventori, potenti, gase, methan, ghg (greenhouse gases), result, emit

Topic 96: environment, risk, natur, resourc, manag, environ, human, ecolog, sustain, ecosystem, health, protect, conserv, impact, concern, develop, biodivers, potenti, need, includ

The interpretation of these topics will be covered in the following blog post, where we will discuss their correlation with the labelling exercise as well as their distribution within papers and across the 20 years of the journal in detail.

*This is a case of false-positive suffix removal; such cases are fairly rare and do not influence the results.
