Textasöfn og setningagerð: greining og leit (Text Corpora and Syntax: Analysis and Search)

  • Eiríkur Rögnvaldsson, Háskóli Íslands
Keywords: text corpora, example sentences, PoS tagging

Abstract

This paper discusses the use of text corpora in syntactic research and how to search corpora for example sentences. Over the past few decades, widely divergent views have been expressed on the value of corpora in syntactic argumentation. The paper argues that this disagreement stems from differing views on what the object of linguistic research is. It also discusses problems that arise in interpreting information extracted from corpora, especially in drawing conclusions from the absence of certain constructions in the texts. The main section examines the possibilities of searching for particular syntactic constructions in different types of Icelandic corpora: raw untagged text, PoS-tagged text, and text in which the major syntactic constituents and syntactic functions have been identified. Data-driven PoS taggers have now been trained on Icelandic texts, and the paper shows that, owing to the inflectional character of Icelandic and the richness of the tagset, the resulting PoS tagging is very effective in searching for various syntactic constructions.

Published
2020-07-26
Section
Non-refereed Short Papers