Multiword Expression Reading Group

Things to think about

    Determine (a) which of the expressions in this list are multiword expressions, and (b) which of Moon's categories each multiword expression belongs to.

Working Definitions, Postulations, Points of Contention

    Multiword expression (MWE): any phrase that is not entirely predictable on the basis of standard grammar rules and lexical entries

    We have found no immediate counterexamples to the claim that any expression that can be realised either hyphenated or as a single lexeme, or alternatively with spaces (e.g. mailman/postman vs. mail man/post man), is an MWE. This property could be used to evaluate extraction techniques, possibly using external resources to determine whether extracted expressions can be written hyphenated or without spaces (e.g. determining the "optimal extraction volume" as the cutoff point at which the proportion of such expressions is maximised).
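    A minimal sketch of this evaluation idea, in Python. It assumes the extractor's output is a ranked list of space-separated candidates, and it uses a plain set of words (the hypothetical `lexicon`) as a stand-in for the external resource; the function names are illustrative only. Because the proportion among the top k candidates is trivially high for very small k, a hypothetical `min_k` floor is included; how the cutoff should really be constrained remains an open question.

```python
def has_closed_variant(candidate: str, lexicon: set[str]) -> bool:
    """True if a space-separated candidate is attested hyphenated or written solid.

    `lexicon` is a hypothetical stand-in for an external resource (e.g. a word list).
    """
    words = candidate.lower().split()
    return "-".join(words) in lexicon or "".join(words) in lexicon


def optimal_extraction_volume(
    ranked: list[str], lexicon: set[str], min_k: int = 1
) -> tuple[int, float]:
    """Return (k, ratio) for the cutoff k >= min_k that maximises the share of
    the top-k candidates having a hyphenated/solid variant in `lexicon`.
    Returns (0, -1.0) if fewer than min_k candidates are given."""
    best_k, best_ratio = 0, -1.0
    hits = 0
    for k, cand in enumerate(ranked, start=1):
        hits += has_closed_variant(cand, lexicon)  # bool counts as 0/1
        ratio = hits / k
        # Ties go to the larger cutoff, i.e. keep more candidates.
        if k >= min_k and ratio >= best_ratio:
            best_k, best_ratio = k, ratio
    return best_k, best_ratio
```

    For example, with ranked candidates ["mail man", "post man", "red car", "e mail", "big house"] and lexicon {"mailman", "postman", "e-mail"}, the proportions at k = 1..5 are 1, 1, 2/3, 3/4 and 3/5, so the function returns (2, 1.0) by default and (4, 0.75) with min_k=3.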

Qualities of Multiword Expressions

    Institutionalisation/conventionalisation: the process by which an expression becomes recognised and accepted as a lexical item through consistent use over time (necessary but not sufficient condition on MWE-hood)

    Lexicogrammatical fixedness: formal rigidity, preferred lexical realisation, restrictions on aspect, mood, voice, etc. (neither necessary nor sufficient condition on MWE-hood)
    • lexicogrammatically fixed MWE: kick the bucket, #the bucket was kicked, #slowly kick the bucket
    • lexicogrammatically fixed non-MWE: look like, *(to be) looked like, *is looking like

    Semantic/pragmatic non-compositionality: there is a mismatch between the semantics/pragmatics of the parts and the whole; includes the case of the component lexical items having specialised meanings within the context of the MWE, not accessible in simplex contexts (not necessary but sufficient)
    • idiomatic expression (non-compositional): the expression is semantically opaque and functions as a gestalt (e.g. kick the bucket)
    • idiomatically combining expression (idiosyncratically compositional): the lexical parts can be seen (post hoc) to assume components of the semantics of the whole, such that the sum of the parts equals the whole (e.g. let the cat out of the bag)

    Syntactic irregularity: the expression cannot be parsed based on the simplex morphology (parts of speech) of the components (not necessary but sufficient)
    • syntactically irregular MWEs: all of a sudden, the be all and end all of NP
    • syntactically regular MWEs: kick the bucket, fly off the handle

    Non-identifiability: a hearer first exposed to the expression cannot predict its meaning from its surface form (not necessary but sufficient)
    • idiom of decoding (non-identifiable): "misleading lexical clusters" (e.g. kick the bucket, fly off the handle)
    • idiom of encoding (identifiable): idiosyncratic lexical combination; note that all idioms of decoding are also idioms of encoding (examples of strict idioms of encoding: wide awake, plain truth)

    Situatedness: the expression is associated with a fixed pragmatic point (neither necessary nor sufficient)
    • situated MWEs: good morning, all aboard
    • non-situated MWEs: first off, to and fro

    Figuration: the expression encodes some metaphor, metonymy, hyperbole, etc., even if the nature thereof is underspecified (neither necessary nor sufficient)
    • figurative expressions: bull market, beat around the bush
    • non-figurative expressions: first off, to and fro

    Proverbiality: the expression is used "to describe--and implicitly, to explain--a recurrent situation of particular social interest ... in virtue of its resemblance or relation to a scenario involving homely, concrete things and relations" (neither necessary nor sufficient)

    Informality: the expression is associated with more informal or colloquial registers (neither necessary nor sufficient)

    Affect: the expression encodes a certain evaluation or affective stance toward the thing it denotes (neither necessary nor sufficient)

Types of Multiword Expressions (à la Moon [1998])

    Anomalous collocations: lexicogrammatically marked
    • (syntactically) ill-formed collocations: (at all, by and large)
    • cranberry collocations: idiosyncratic lexical component -- one or more words found only in that collocation (in retrospect, kith and kin)
    • defective collocations: idiosyncratic meaning component (in effect, foot the bill)
    • phraseological collocations: semi-productive constructions, occurring in paradigms (in/into/out of action, on show/display)
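    The cranberry criterion (one or more words found only in that collocation) can be approximated against a corpus. The following is a hypothetical heuristic, not part of Moon's method: it assumes a pre-tokenised, lower-cased corpus given as a list of sentences, and the function name cranberry_words is invented for illustration. Absence from a finite corpus only suggests, and cannot prove, that a word is confined to the collocation.

```python
from collections import Counter


def cranberry_words(corpus: list[list[str]], colloc: tuple[str, ...]) -> set[str]:
    """Return the words of `colloc` that, in this corpus, occur only
    inside contiguous occurrences of the whole collocation."""
    n = len(colloc)
    targets = set(colloc)
    total = Counter()   # every occurrence of each collocation word
    inside = Counter()  # occurrences covered by a match of the full collocation
    for sent in corpus:
        # Mark every token position that lies within a match of `colloc`.
        covered = set()
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == colloc:
                covered.update(range(i, i + n))
        for j, tok in enumerate(sent):
            if tok in targets:
                total[tok] += 1
                if j in covered:
                    inside[tok] += 1
    return {w for w in targets if total[w] > 0 and inside[w] == total[w]}
```

    On the toy corpus [["kith", "and", "kin"], ["my", "kin", "and", "friends"]], only "kith" is returned for ("kith", "and", "kin"), since "kin" and "and" also occur outside the collocation. On a corpus containing nothing but "kith and kin", every word would be returned, which shows how strongly the result depends on corpus coverage.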

    Formulae: pragmatically marked
    • simple formulae/sayings: compositional strings with a special discourse function (alive and well; a horse, a horse, my kingdom for a horse)
    • metaphorical/literal proverbs: (you can't have your cake and eat it, enough is enough)
    • similes: (as good as gold)

    Metaphors: semantically marked (non-compositional)
    • transparent metaphors: (behind someone's back, pack one's bags)
    • semi-transparent metaphors: (on an even keel, pecking order)
    • opaque metaphors: (bite the bullet, kick the bucket)

    Collocations: compositional word co-occurrence of markedly high frequency
    • semantic collocations: co-occurrence preferences/priming effects (jam with FOOD)
    • lexico-semantic collocations: collocation paradigms (rancid butter/fat, face the truth/facts/problem)
    • syntactic collocations: fully-productive phraseological collocations (too ... to ...)
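    One common way to make "markedly high frequency" concrete, offered here only as an illustration and not as the reading group's method, is pointwise mutual information (PMI), which compares how often two words occur adjacently with how often they would be expected to co-occur if they were independent. A minimal sketch over adjacent word pairs follows; the function name and the `min_count` threshold are assumptions, and other association measures (e.g. t-score, log-likelihood ratio) are often preferred because PMI overrates rare pairs.

```python
import math
from collections import Counter


def bigram_pmi(tokens: list[str], min_count: int = 2) -> dict[tuple[str, str], float]:
    """PMI for adjacent word pairs: log2(P(x, y) / (P(x) * P(y))).

    Pairs seen fewer than `min_count` times are skipped, since PMI is
    unreliable (and inflated) for rare events."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        if count < min_count:
            continue
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores
```

    Only adjacent pairs are scored, so a lexico-semantic collocation such as face the problem, where a determiner intervenes, would need a wider co-occurrence window than this sketch uses.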

LinGO Multiword Expression (MWE) Reading Group Materials

(in reverse chronological order)
    30 Jul, 2002: a basic literature review of the semantics of verb-particles

    10 May, 2002: verb-particle extraction

    8 Feb, 2002: the Cambridge Grammar of the English Language view of verb-particles

    25 Jan, 2002: the QGLS view of verb-particles

    18 Jan, 2002: Tim's personal ramblings on collocation extraction

    27 Nov, 2001: basic introduction to collocation extraction

    30 Oct, 2001: tying up loose ends with respect to definition

    23 Oct, 2001: a closer look at Moon's classification, actually applying her system to classify MWEs

    16 Oct, 2001: presentation of R. Moon (1998) Fixed Expressions and Idioms in English: A Corpus-based Approach, Chapters 1-2, Clarendon Press

    5 Oct, 2001: basic outline of multiword expression project, based on the NSF proposal

Last modified: Thu Aug 21 12:42:24 PDT 2003