Tölvutækur merkingarbrunnur fyrir íslenska máltækni (A Machine-Readable Semantic Database for Icelandic Language Technology)

Laying the foundation for computers to understand meaning in Icelandic texts

  • Anna B. Nikulásdóttir, Háskóli Íslands (University of Iceland)
Keywords: semantic database, wordnet, language technology, semantic relations, semantic information

Abstract

This article describes the development of a semantic database for Icelandic language technology. The database is being built with a monolingual approach, using automatic methods to extract semantic information from texts. Both pattern-based and statistical methods are used, as well as a hybrid methodology. The database already contains about 134,000 words, primarily nouns, and more than one million relations. The number of relations may change during the last stage of development, which consists of automatically validating the results, for example by using the results of one extraction method to support or reject the results of another.
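
The validation step mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual procedure: the function name `cross_validate`, the relation label `is_a`, the scores, and the threshold of 0.5 are all hypothetical. Relations proposed by a pattern-based extractor are kept only if a statistical relatedness score for the same word pair reaches the threshold.

```python
def cross_validate(pattern_relations, statistical_scores, threshold=0.5):
    """Split pattern-extracted relations into supported and rejected lists.

    pattern_relations: iterable of (word_a, relation, word_b) triples.
    statistical_scores: dict mapping frozenset({word_a, word_b}) to a score.
    Both inputs and the threshold are illustrative assumptions.
    """
    supported, rejected = [], []
    for word_a, relation, word_b in pattern_relations:
        # frozenset keys make the lookup independent of word order.
        score = statistical_scores.get(frozenset((word_a, word_b)), 0.0)
        if score >= threshold:
            supported.append((word_a, relation, word_b))
        else:
            rejected.append((word_a, relation, word_b))
    return supported, rejected


# Toy data, invented for illustration only.
pattern_relations = [
    ("hestur", "is_a", "dýr"),   # "horse is an animal"
    ("hestur", "is_a", "borð"),  # "horse is a table" (a spurious match)
]
statistical_scores = {
    frozenset(("hestur", "dýr")): 0.8,
    frozenset(("hestur", "borð")): 0.1,
}

supported, rejected = cross_validate(pattern_relations, statistical_scores)
```

With this toy input, the first relation lands in `supported` and the second in `rejected`; a pair with no statistical score at all is treated as unsupported.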

The structure of the database is not based on hierarchies, as in the Princeton WordNet, for example, but rather on clusters of strongly related words and on semantic relations that often describe common-sense knowledge and associations.
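
A non-hierarchical organisation of this kind could be modelled, as a minimal sketch, with named clusters of words plus a flat set of labelled relations. The class `SemanticDatabase`, its methods, the relation label `associated_with`, and the example words are assumptions made for illustration; they do not reflect the database's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class SemanticDatabase:
    """Hypothetical store: word clusters and labelled relations, no hierarchy."""
    clusters: Dict[str, Set[str]] = field(default_factory=dict)
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_cluster(self, name, words):
        # Merge the words into an existing cluster or create a new one.
        self.clusters.setdefault(name, set()).update(words)

    def add_relation(self, source, relation, target):
        self.relations.add((source, relation, target))

    def related(self, word):
        """Return words sharing a cluster with `word` or linked to it directly."""
        result = set()
        for members in self.clusters.values():
            if word in members:
                result |= members
        for source, _, target in self.relations:
            if source == word:
                result.add(target)
            elif target == word:
                result.add(source)
        result.discard(word)
        return result


# Toy content, invented for illustration only.
db = SemanticDatabase()
db.add_cluster("kaffi", {"kaffi", "bolli", "mjólk"})  # coffee, cup, milk
db.add_relation("kaffi", "associated_with", "morgunn")  # coffee, morning
neighbours = db.related("kaffi")
```

Here `neighbours` contains the cluster mates of "kaffi" together with the word it is associated with, and no word needs a position in a hypernym tree to be found.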

After its release at the beginning of 2012, the database will be freely available.

Published
2020-07-16
Section
Peer-reviewed Articles