GEC corpus

The GEC corpus is finally complete! The corpus consists of 569 original research articles published in Global Environmental Change from 1990 to 2010. This amounts to 3.7 million word tokens and, although we are very happy with this achievement, it is only 1 of the 11 journals we are comparing in our analysis. The other 10 journals will comprise 5 discipline-specific and 5 other interdisciplinary journals. At the moment, the team at Elsevier are working hard on identifying these 10 journals, from which we’ll compile the full corpus. Thus, we expect our final corpus to contain between 30 and 40 million word tokens, which for a corpus linguistic analysis is a massive amount of data.
