Where Should a Dictionary's Vocabulary Be Sourced? (Hvert á að sækja orðaforðann í orðabók?)
Abstract
This paper discusses methods for assembling a headword list for Icelandic dictionaries. Until recently, the classic Icelandic dictionary Íslensk orðabók, first published in 1963, was the primary source of vocabulary data for Icelandic lexicographers, supplemented by the collections of the Department of Lexicography at the Árni Magnússon Institute for Icelandic Studies. With the advent of electronic text collections, notably the Tagged Icelandic Corpus (Mörkuð íslensk málheild, MÍM), lexicographers now have access to a very large new source of data. Frequency data from these sources are of great value in choosing headwords. In addition, new methods of word extraction cover every word and word form in the texts, and so complement the older material by filling accidental gaps left by traditional dictionary excerption. The paper nevertheless concludes that word frequency alone cannot at present serve as the basis for an Icelandic dictionary: partly because of data scarcity in a language with very rich morphology, the volume of text needed for good coverage of the vocabulary far exceeds anything now available. One of the main reasons is the highly productive system of compounding, which makes it necessary to split compounds before frequency can be used to select vocabulary.
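The final point, that compounds must be split before frequency becomes a useful signal, can be illustrated with a toy example. The sketch below is not the method used in the paper. It assumes a small hypothetical lexicon of word forms (`LEXICON`), a hypothetical four-token corpus (`TOKENS`), and a naive longest-first splitter (`split_compound`), all introduced here only for illustration. A realistic splitter for Icelandic would have to handle inflected linking forms and ambiguous splits, which this one does not.

```python
from collections import Counter

# Hypothetical toy lexicon of word FORMS (not lemmas); an assumption for
# illustration. "orða" and "bóka" are genitive plural forms used as first
# parts of compounds.
LEXICON = {"orða", "bók", "bóka", "safn"}

# Hypothetical toy corpus: three compounds and one simple word.
TOKENS = ["orðabók", "bókasafn", "orðasafn", "bók"]


def split_compound(word, lexicon, min_len=3):
    """Split `word` into known parts, or return [word] if no full split exists.

    Naive strategy: try split points from right to left, so the longest
    known first part is preferred, and recurse on the remainder.
    """
    if word in lexicon:
        return [word]
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = split_compound(tail, lexicon, min_len)
            # Accept the split only if every remaining part is known.
            if all(part in lexicon for part in rest):
                return [head] + rest
    return [word]


# Frequencies of unsplit tokens: every compound is a separate, rare type.
whole_counts = Counter(TOKENS)

# Frequencies after splitting: evidence is pooled over shared parts.
part_counts = Counter(
    part for token in TOKENS for part in split_compound(token, LEXICON)
)

print(whole_counts["bók"])  # 1
print(part_counts["bók"])   # 2
print(part_counts["safn"])  # 2
```

In the unsplit counts each of the four tokens occurs once, so frequency says nothing about which words matter. After splitting, the shared parts accumulate counts, which is the kind of pooling that a productive compounding system makes necessary in a corpus of limited size.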