The Icelandic Gigaword Corpus

Authors

  • Steinþór Steingrímsson, The Árni Magnússon Institute for Icelandic Studies

DOI:

https://doi.org/10.33112/ordogtunga.21.9

Abstract

In May 2018 a new text corpus, the Icelandic Gigaword Corpus, was launched. The first version of the corpus contains over 1.2 billion running words, PoS-tagged and lemmatized. Texts will be collected continually for the corpus, and a new version will be published every year. Although the corpus is tailored for use in language technology and linguistic research, it can also be very useful for students of linguistics. It is accessible in several ways: it can be searched through a graphical search interface powered by Korp, n-grams can be compared in an n-gram viewer, and the corpus is available for download under permissive licenses.

Published

2019-08-15

Section

Language News