EAP - Trames Publications

PUBLISHED SINCE 1997

TRAMES. A Journal of the Humanities and Social Sciences

ISSN 1736-7514 (Electronic)
ISSN 1406-0922 (Print)

Open Access Journal

WORD LENGTH IN ESTONIAN PROSE; pp. 145–175

DOI: 10.3176/tr.2016.2.03

Author

Peter Grzybek

Abstract

The present study deals with the problem of word length in Estonian prose. As is well known from quantitative and synergetic linguistics, word length is not an isolated phenomenon; rather, it is closely interrelated with word frequency, sentence length, syllable length, and other properties, which together make language a dynamically balanced system. Moreover, the frequency with which words of a given length occur is neither haphazard nor chaotic, but is organized regularly, in a law-like manner. In this respect, the necessarily interdisciplinary approach to this issue may not only be helpful for analogous studies in other fields; it may also help to bridge the gap between what are usually juxtaposed as ‘soft’ vs. ‘hard’ or ‘human’ vs. ‘natural’ sciences, and the like. The results obviously depend on a number of factors, e.g. the definition of ‘word’ itself and of its constituent elements, the choice of a paradigmatic vs. syntagmatic approach (i.e. of dictionary vs. text material), and the study of lemmas vs. word forms. Relevant theoretical linguistic aspects are therefore discussed first, before the linguistic material to be investigated is presented: in total, five novels by modern Estonian authors (Pärtel Ekman, Jaan Kross, Reet Kudu, Viivi Luik) are analysed chapter by chapter, amounting to ca. ¼ million words, or ca. 20,000 sentences.
As a result, the (discrete) Zipf-Alekseev distribution turns out to be an excellent model for the word length frequencies of Estonian prose texts, which paves the way for future studies from various perspectives. Generally speaking, the result allows for a qualitative interpretation in terms of a diversification process. More concretely, it provides a solid basis not only for further intra-lingual studies of Estonian (including factors such as different discourse types, author-specific styles, periods of language development, etc.), but also for systematic comparative inter-lingual studies (including language specifics, parameter interpretation, etc.).
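
For readers unfamiliar with the model named in the abstract, the following is a minimal sketch of how a word length frequency table might be compared with a Zipf-Alekseev distribution. It assumes the commonly cited right-truncated form, in which the probability of length x (x = 1, ..., n) is proportional to x^-(a + b ln x). The abstract does not say which variant of the distribution, which unit of length, or which fitting procedure the study uses, so the function names, the parameter values a and b, and the toy data are all illustrative assumptions, not the author's method or results.

```python
import math
from collections import Counter


def zipf_alekseev_pmf(a, b, n):
    """Right-truncated Zipf-Alekseev probabilities for x = 1..n.

    Assumed form: P(x) proportional to x ** -(a + b * ln x).
    """
    weights = [x ** -(a + b * math.log(x)) for x in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def length_frequencies(lengths):
    """Count how many words have length 1, 2, ..., max(lengths)."""
    counts = Counter(lengths)
    return [counts.get(x, 0) for x in range(1, max(counts) + 1)]


def chi_square(observed, pmf):
    """Pearson chi-square statistic between observed counts and a model."""
    total = sum(observed)
    return sum((o - total * p) ** 2 / (total * p) for o, p in zip(observed, pmf))


if __name__ == "__main__":
    # Invented word lengths (e.g. in syllables), NOT data from the study.
    toy_lengths = [1, 2, 2, 3, 2, 1, 4, 3, 2, 2, 1, 3, 5, 2, 3]
    observed = length_frequencies(toy_lengths)

    # Hypothetical parameter values, chosen only for illustration.
    model = zipf_alekseev_pmf(a=-1.0, b=1.5, n=len(observed))

    print("observed counts:", observed)
    print("model probabilities:", [round(p, 3) for p in model])
    print("chi-square:", round(chi_square(observed, model), 3))
```

In practice the parameters would be estimated from the data (for instance by minimising the chi-square statistic) rather than fixed by hand; the sketch only shows how observed frequencies and model probabilities are lined up for comparison.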
