Diskursusemarker (ma) arvan (et); pp. 63–90

Tiit Hennoste, Külli Habicht, Helle Metslang, Külli Prillop, Kirsi Laanesoo, David Ogren, Liina Pärismaa, Elen Pärt, Andra Rumm, Andriela Rääbis, Carl Eric Simmul


The discourse marker (ma) arvan (et) ‘I think’

The article analyzes the usage of variants of the discourse marker (ma) arvan (et) in different registers of Estonian. The use of this marker is compared in 15 text groups, which fall into three registers: printed texts, online texts and spoken texts. The research material comes from the corpora available on Keeleveeb as well as from corpora of spoken language and online conversations, and contains a total of 2572 usage instances. We seek answers to three research questions:

         – Does the frequency of the marker differ by text group and/or register?
         – How does the frequency of variants of the marker differ according to syntactic position in different text groups and/or registers?
         – Are different variants of the marker associated with different text groups and/or registers?

Our analysis shows that four variants of the marker are in use: ma arvan, ma arvan et, arvan, and arvan et. Registers and text groups differ in the frequency of the marker, its typical position in the sentence, and its typical form. Based on the usage of the marker (ma) arvan (et), texts can be divided into four groups.
On one end of the scale lies the spoken language register. The frequency of the marker is substantially higher in spoken language than in other groups, and almost exclusively variants including the pronoun ma are used. Moreover, variants without the conjunction et are more common in spoken language than elsewhere. It is common for the verb and et to be pronounced together, such that the marker is in itself a prosodic whole. The marker often appears non-sentence-initially and is almost always prosodically integrated with the rest of the sentence.
The other end of the scale is formed by non-fiction printed texts, i.e. journalism and (popular) science. These groups are characterized by the sentence-initial position of the marker, the high frequency of pro-drop variants (i.e. without the subject pronoun ma ‘I’), and variants in which the orthographic boundary before the word et (marked by a comma) has been preserved. Fiction texts represent a middle ground, and are distinguished from other printed texts by the structure of the marker; similarly to spoken language, fiction texts feature more use of ma and less use of et.
Online texts form a heterogeneous group with vague internal boundaries. One identifiable subgroup is spontaneous real-time online conversations, in which the marker appears very frequently (as in spoken language), but its usage is peculiar. Next to this group on the scale are chat rooms. Generally, these groups behave similarly to spoken language. On the other end of the scale are newsgroups, which are most similar to printed texts. Forum texts and comments form a heterogeneous intermediate range.
In summary, the differences in marker usage across text groups are related to numerous factors, including the spoken/written distinction, processuality, dialogicity, spontaneity versus editedness, the personal or impersonal nature of the interaction, and the influence of the rules of the standard language.



