DISTRIBUTIONAL HYPOTHESIS: WORDS FOR ‘HUMAN BEING’ AND THEIR ESTONIAN COLLOCATES; pp. 141–158 | doi: 10.3176/tr.2013.2.03
The article was inspired by Zellig Harris's Distributional Hypothesis, which states that words occurring in similar contexts tend to have similar meanings. The hypothesis was tested by comparing the collocates of ten Estonian words for ‘human being’. In the present study, the term collocate is used in a neo-Firthian sense, covering the words that co-occur most often with the node word. The collocates of the words inimene ‘human’, mees ‘man’, naine ‘woman’, laps ‘child’, tüdruk ‘girl’, poiss ‘boy’, tütar ‘daughter’, poeg ‘son’, ema ‘mother’ and isa ‘father’ were drawn from the context ‘three words to the left’ of the node word in the Newspaper subcorpus of the Balanced Corpus of Estonian. The comparison involved the 30 most frequent collocates of each node word. Assuming that a larger number of shared collocates means greater semantic closeness, intersections of the collocate sets were computed for the Estonian words for ‘human being’. It turned out that antonymous words had the highest number of collocates in common, which indicates that the syntagmatic relations of words may also reflect some of their paradigmatic relations. In addition, the article analyses what may determine the part of speech of the collocates.
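The procedure summarised in the abstract (collect left-context collocates, keep the most frequent ones, intersect the sets) can be illustrated with a minimal sketch. It assumes the corpus is already tokenised and lemmatised into lists of words, and it ranks collocates by raw co-occurrence frequency. The abstract does not describe the tooling actually used, so the function names and the toy sentences below are hypothetical and serve illustration only.

```python
from collections import Counter
from itertools import combinations


def left_collocates(sentences, node, window=3):
    """Count the words found up to `window` positions to the left of `node`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == node:
                # Slice covers at most `window` tokens preceding the node word.
                counts.update(tokens[max(0, i - window):i])
    return counts


def top_collocates(sentences, node, n=30, window=3):
    """Return the set of the n most frequent left-context collocates of `node`."""
    counts = left_collocates(sentences, node, window)
    return {word for word, _ in counts.most_common(n)}


def shared_collocates(sentences, nodes, n=30, window=3):
    """Map each pair of node words to the number of top-n collocates they share."""
    tops = {node: top_collocates(sentences, node, n, window) for node in nodes}
    return {(a, b): len(tops[a] & tops[b]) for a, b in combinations(nodes, 2)}


# Invented toy data (not taken from the corpus): lemmatised sentences.
toy = [
    ["noor", "tark", "mees"],
    ["noor", "tark", "naine"],
    ["väike", "laps"],
    ["vana", "noor", "mees"],
]

overlap = shared_collocates(toy, ["mees", "naine", "laps"])
for pair, size in overlap.items():
    print(pair, size)
```

On real data, a larger intersection for a pair such as mees and naine would be read, under the article's assumption, as greater semantic closeness. Note that `Counter.most_common` breaks frequency ties by insertion order, so the cut-off at 30 may need an explicit tie rule.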
Bartsch, Sabine (2004) Structural and functional properties of collocations in English: a corpus study of lexical and pragmatic constraints on lexical co-occurrence. Tübingen: Gunter Narr Verlag.
Cruse, David Alan (2001) Lexical semantics. Cambridge University Press.
Evert, Stefan (2005) The statistics of word cooccurrences: word pairs and collocations. PhD dissertation, IMS, University of Stuttgart.
Eesti keele seletav sõnaraamat. 2nd ed. 6 vols. [Explanatory Dictionary of the Estonian language.] Tallinn: Eesti Keele Sihtasutus, 2009.
Firth, John Rupert (1957) Papers in linguistics, 1934–1951. Oxford University Press.
Fodor, Jerry A. and Jerrold J. Katz (1963) “The structure of a semantic theory”. Language 39, 2, 170–210.
Gesuato, Sara (2003) “The company women and men keep: what collocations can reveal about culture”. In Proceedings of the corpus linguistics 2003 conference, 253–262. Dawn Archer, Paul Rayson, Andrew Wilson, and Tony McEnery, eds. UCREL, Lancaster University. Available at <http://ucrel.lancs.ac.uk/publications/cl2003/papers/gesuato.pdf>. Accessed on 26.02.2013.
Harris, Zellig Sabbatai (1954) “Distributional structure”. Word. Journal of the linguistic circle of New York. 10, 2–3, 146–162.
Harris, Zellig Sabbatai (1957) “Co-occurrence and transformation in linguistic structure”. Language 33, 3, 283–340.
Jackson, Howard and Etienne Zé Amvela (2000) Words, meaning and vocabulary: an introduction to modern English lexicology. London and New York: Continuum.
Kaalep, Heiki-Jaan (1997) “An Estonian morphological analyser and the impact of a corpus on its development”. Computers and the Humanities 31, 115–133.
Kaalep, Heiki-Jaan and Kadri Muischnek (2002) Eesti kirjakeele sagedussõnastik. [Estonian Frequency Dictionary.] Tartu: Tartu Ülikooli kirjastus.
Lyons, John (1977) Semantics. 2 vols. Cambridge: Cambridge University Press.
McEnery, Tony and Andrew Hardie (2012) Corpus linguistics: method, theory and practice. New York: Cambridge University Press.
Miller, George A. and Walter Charles (1991) “Contextual correlates of semantic similarity”. Language and Cognitive Processes 6, 1, 1–28.
Muischnek, Kadri (2005) “Eesti keele verbikesksed püsiühendid tekstikorpuses”. [Estonian multi-word expressions in a text corpus.] Emakeele Seltsi aastaraamat (Tallinn) 51, 80–106.
Murphy, Lynne (2006) “Antonymy as lexical constructions: or, why paradigmatic construction is not an oxymoron”. In Constructions all over: case studies and theoretical implications. Constructions SV, 1–8. Doris Schönefeld, ed. Available online at <www.constructions-online.de>. Accessed on 01.04.2013.
Nida, Eugene Albert (1975) Componential analysis of meaning: an introduction to semantic structures. The Hague: Mouton.
Palmer, Harold E. (1949) A grammar of English words. London, New York, and Toronto: Longman.
Panther, Klaus-Uwe and Linda Thornburg (2012) “Antonymy in language structure and use”. In Cognitive linguistics between universality and variation, 159–186. Mario Brdar, Milena Žic Fuchs, and Ida Raffaelli, eds. Newcastle upon Tyne: Cambridge Scholars.
Rubenstein, Herbert and John B. Goodenough (1965) “Contextual correlates of synonymy”. Communications of the ACM [Association for Computing Machinery] 8, 10, 627–633.
Sahlgren, Magnus (2006) The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. PhD dissertation. Stockholm University.
Sahlgren, Magnus (2008) “The distributional hypothesis”. Rivista di Linguistica (Italian Journal of Linguistics) 20, 1, 33–53. Special issue From context to meaning: distributional models of the lexicon in linguistics and cognitive science.
Saussure, Ferdinand de (2000) Course in general linguistics. Charles Bally and Albert Sechehaye, eds. London: Duckworth.
Scott, Mike (2004) “WordSmith Tools version 4”: computer program. Oxford: Oxford University Press.
Sinclair, John McHardy (2004) Trust the text: language, corpus and discourse. Ronald Carter, ed. London and New York: Routledge.
Stubbs, Michael (2001) Words and phrases: corpus studies of lexical semantics. Oxford: Blackwell.
Sutrop, Urmas (2000) “Basic terms and basic vocabulary”. In Estonian: typological studies 4, 118–145. Mati Erelt, ed. Tartu.
Sutrop, Urmas (2011) “Mis on põhivärvinimi, põhitase ja põhitaseme objekt?”. [What is basic colour term, basic level and basic level object?] In Värvinimede raamat, 39–46. [Book on colour names.] Mari Uusküla and Urmas Sutrop, eds. Tallinn: Eesti Keele Sihtasutus.
NBCE = Newspaper subcorpus of the balanced corpus of Estonian. Available online at <http://www.cl.ut.ee/korpused/grammatikakorpus/ajalehekirjeldus>. Accessed on 25.08.2008.
Uiboaed, Kristel (2010) “Statistilised meetodid murdekorpuse ühendverbide tuvastamisel”. [Statistical methods for phrasal verb detection in Estonian dialects.] Eesti Rakenduslingvistika Ühingu Aastaraamat. Estonian Papers in Applied Linguistics 6, 307–326.