The Icelandic Gigaword Corpus
Abstract
In May 2018, a new text corpus, the Icelandic Gigaword Corpus, was launched. The first version of the corpus contains over 1.2 billion running words, PoS-tagged and lemmatized. Texts will be collected continually for the corpus, and a new version will be published every year. Although the corpus is tailored for use in language technology and linguistic research, it can also be useful for students of linguistics. It is accessible in several ways: it can be searched in a graphical search interface powered by Korp, n-grams can be compared in an n-gram viewer, and the corpus is available for download under permissive licenses.
Copyright (c) 2019 Steinþór Steingrímsson
![Creative Commons License](http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.