Zero Derivations

I. What are zero derivations
Conversion is a linguistic process that assigns an existing word to a new syntactic (grammatical) category (part of speech) without any concomitant change in form (Lieber 2005: 418). It is one of the processes that may take part in the creation of new lexemes in English (Valera 2004: 20). This process is also known as functional shift or zero derivation.

II. Zero derivation pairs in LEXICON
The base form and category of each word are coded in the LEXICON, so all words that share a base form but differ in category can be retrieved from it. However, the retrieved list is over-generated. Computer programs were developed to filter out invalid derivation pairs (acronyms, abbreviations, and words of size 1) and to facilitate the human tagging process that generates the zero derivations table.

In practice, zeroD also includes the following:

  • spelling variants:
    • cesarian|adj|cesarean|noun
    • partisan|adj|partizan|noun
  • hyphenation:
    • first-class|adj|first class|noun
    • latin-american|adj|Latin American|noun
    • low-density|adj|low density|noun

However, we don't generate this type of zeroD from the Lexicon because the spellings are not identical. Instead, we just categorize these dPairs as zeroD because no suffixation or prefixation process is involved.

Also, we include compounds in zeroD (Huddleston and Pullum do not consider compounds to be zero derivations). For example:

  • roadmap|noun|E0511182|roadmap|verb|E0727494|yes
  • side-swipe|noun|E0587613|side-swipe|verb|E0587614|yes
  • step section|noun|E0730390|step section|verb|E0730391|yes
  • step-section|noun|E0730390|step-section|verb|E0730391|yes
  • off site|adj|E0728198|off site|adv|E0728199|yes
  • off-site|adj|E0728198|off-site|adv|E0728199|yes
  • offsite|adj|E0728198|offsite|adv|E0728199|yes

Please also refer to Lynn's notes for more details.

III. Processes

  • Prepare input files
    • ${DERIVATIONS}/ZeroDerivations/data/${YEAR}/dataOrg/LEXICON
      The latest LEXICON from lexicon.${YEAR}
    • ${DERIVATIONS}/ZeroDerivations/data/${YEAR}/dataOrg/zeroD.tag.txt
      A manual tag file for zero derivations. The baseline of this file is the previous year's tag file, to which the tagged zeroD.tbd.data file is then added. The format of this file is:
      base|category-1|EUI-1|base|category-2|EUI-2|tag
      where tag: yes|no
  • Run the program
    shell>cd ${DERIVATIONS}/ZeroDerivations/bin
    shell>GetZeroD ${YEAR}
    5

    The following iterative steps are needed:
    • send the "zeroD.tbd.data" file to linguists to tag
    • add the tagged "zeroD.tbd.data" file to "zeroD.tag.txt"
    • rerun the program until there are no pairs in "zeroD.tbd.data"
  • Process overview
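
The iterative steps above rely on every line added to zeroD.tag.txt being a well-formed record. The following is a minimal Java sketch of checking one line before it is merged. It assumes the fields are pipe-separated as in the example pairs, and the class name ZeroDTagLine and its methods are hypothetical; they are not part of the GetZeroD program.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: validates one line of zeroD.tag.txt (assumed pipe-separated). */
final class ZeroDTagLine {
    private static final List<String> VALID_TAGS = Arrays.asList("yes", "no");

    final String[] fields;

    private ZeroDTagLine(String[] fields) {
        this.fields = fields;
    }

    /** Parses base|category-1|EUI-1|base|category-2|EUI-2|tag; throws on malformed input. */
    static ZeroDTagLine parse(String line) {
        // A limit of -1 keeps trailing empty fields, so the field count is reliable.
        String[] f = line.split("\\|", -1);
        if (f.length != 7) {
            throw new IllegalArgumentException("expected 7 fields, got " + f.length + ": " + line);
        }
        if (!VALID_TAGS.contains(f[6])) {
            throw new IllegalArgumentException("tag must be yes or no: " + line);
        }
        return new ZeroDTagLine(f);
    }

    /** The first six fields identify the pair independently of its tag. */
    String pairKey() {
        return String.join("|", Arrays.copyOf(fields, 6));
    }

    String tag() {
        return fields[6];
    }

    public static void main(String[] args) {
        ZeroDTagLine t = parse("fast|adj|E0027369|fast|noun|E0027371|no");
        System.out.println(t.pairKey() + " -> " + t.tag());
    }
}
```

Rejecting lines that still carry no "yes" or "no" tag keeps untagged pairs out of the tag file, so they show up again in "zeroD.tbd.data" on the next run.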

IV. Program Details (GetZeroDerivations)

  1. Generate bases of zero derivations from LEXICON
    • Descriptions:
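      Retrieve all legal base forms (base and spelling variants) from LEXICON. A base is filtered out if it meets either of the first two conditions below; the last two items describe filters that are deliberately not applied: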
      • acronym or abbreviation
      • word size is < 2 (= 1)
      • category filter is not used (all categories are accepted)
        Huddleston & Pullum (p. 1640) say that conversion (their preferred term for zero derivation) is not limited to the major parts of speech.
        For example:
        nobody|pron|E0042734|nobody|noun|E0042735|yes
        down|noun|E0023849|down|prep|E0023850|yes
        dare|verb|E0020794|dare|modal|E0020796|yes
        nightly|adj|E0042640|nightly|adv|E0042641|yes
        that|det|E0060479|that|compl|E0060480|yes
        before|adv|E0012237|before|conj|E0012239|yes

        No zero derivations are associated with the aux category.
      • derivation overlap filter is not used. Overlap is allowed: a base in a zero derivation pair may also take part in affix (prefix or suffix) derivations.
        For example:
        suffix:
        flexion|noun|flex|verb|true
        flexure|noun|flex|verb|true

        zero derivation:
        flex|noun|flex|verb|true
    • Input files:
      • LEXICON: LEXICON of the release year
    • Output files:
      • bases.data: all legal bases for zero derivations
        base|category|inflection|EUI
    • Other input parameters:
      • min. word size: 2
    • Associated Java files:
      • GetBasesFromLexicon.java
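
The two filter conditions of step 1 can be sketched in Java as a single predicate. This is an illustration only, not the code in GetBasesFromLexicon.java. The class and method names are hypothetical. "Word size" is assumed to mean the number of characters, and the acronym/abbreviation flag is assumed to be available from the lexical record.

```java
/** Hypothetical sketch of the step 1 base filter. */
final class BaseFilterSketch {
    /** Minimum word size, taken from the "Other input parameters" of step 1. */
    static final int MIN_WORD_SIZE = 2;

    /**
     * Returns true if the base may take part in zero derivations.
     *
     * @param base the base form or spelling variant
     * @param isAcronymOrAbbreviation assumed to be known from the lexical record
     */
    static boolean isLegalBase(String base, boolean isAcronymOrAbbreviation) {
        if (isAcronymOrAbbreviation) {
            return false;
        }
        // Assumption: word size is the character count, so single-character bases are dropped.
        return base.length() >= MIN_WORD_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(isLegalBase("flex", false));
        System.out.println(isLegalBase("a", false));
    }
}
```

No category argument appears in the predicate because the category filter is not used: all categories are accepted.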
  2. Retrieve possible zero derivations pairs from base list
    • Descriptions:
      Retrieve all possible zero derivation pairs from the legal bases. A base is retrieved if it has more than one category.
    • Input files:
      • bases.data: all legal bases for zero derivations
    • Output files:
      • zeroD.raw.data: raw data of possible zero derivation pairs
        base|category-1|EUI-1|base|category-2|EUI-2
    • Associated Java files:
      • GetZeroDFromBaseFile.java
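
Step 2 amounts to grouping the base list by base form and pairing up entries whose categories differ. The Java sketch below illustrates this under several assumptions that the source does not confirm: input lines are pipe-separated, each unordered pair is emitted once, and the two members are ordered by EUI (which matches the example pairs shown in this document). The class name ZeroDPairSketch and the "base" value used for the inflection field in the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of step 2; not the code in GetZeroDFromBaseFile.java. */
final class ZeroDPairSketch {
    /** Builds candidate pairs from bases.data lines (base|category|inflection|EUI). */
    static List<String> rawPairs(List<String> baseLines) {
        // Group {category, EUI} entries by base, preserving first-seen base order.
        Map<String, List<String[]>> byBase = new LinkedHashMap<>();
        for (String line : baseLines) {
            String[] f = line.split("\\|", -1);
            if (f.length != 4) {
                continue; // skip malformed lines
            }
            byBase.computeIfAbsent(f[0], k -> new ArrayList<>()).add(new String[] {f[1], f[3]});
        }
        // A set removes duplicates if the same entry is listed more than once.
        Set<String> pairs = new LinkedHashSet<>();
        for (Map.Entry<String, List<String[]>> e : byBase.entrySet()) {
            String base = e.getKey();
            List<String[]> entries = e.getValue();
            entries.sort(Comparator.comparing((String[] x) -> x[1])); // assumption: order by EUI
            for (int i = 0; i < entries.size(); i++) {
                for (int j = i + 1; j < entries.size(); j++) {
                    String[] a = entries.get(i);
                    String[] b = entries.get(j);
                    if (!a[0].equals(b[0])) { // only pairs whose categories differ
                        pairs.add(String.join("|", base, a[0], a[1], base, b[0], b[1]));
                    }
                }
            }
        }
        return new ArrayList<>(pairs);
    }

    public static void main(String[] args) {
        List<String> bases = Arrays.asList("fast|noun|base|E0027371", "fast|adj|base|E0027369");
        rawPairs(bases).forEach(System.out::println);
    }
}
```

A base with a single category produces no pairs, so it drops out of the raw data without a separate check.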
  3. Get zero derivations meta tagged file
    • Descriptions:
      Go through all pairs in "zeroD.raw.data" and add tag information (from zeroD.tag.txt):
      • yes: if tagged as "yes" in zeroD.tag.txt
      • no: if tagged as "no" in zeroD.tag.txt
      • tbd: if not tagged in zeroD.tag.txt

      Please note that not all zero derivation pairs retrieved from LEXICON (step 2) are valid derivation pairs. We define a seven-field (pipe-separated) format for tagging the zero derivation pairs to validate derivational variants:

      base|category-1|EUI-1|base|category-2|EUI-2|tag

      Examples

      fast|adj|E0027369|fast|noun|E0027371|no
      fair|noun|E0027197|fair|verb|E0538382|no
      
    • Input files:
      • zeroD.raw.data: raw data of possible zero derivation pairs
      • zeroD.tag.txt: tag file of zero derivation pairs
        base|category-1|EUI-1|base|category-2|EUI-2|tag
    • Output files:
      • zeroD.meta.data: meta file of tagged zero derivation pairs
        base|category-1|EUI-1|base|category-2|EUI-2|tag
    • Associated Java files:
      • GetZeroDMetaFile.java
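
The tagging rule of step 3 is a lookup with a default: a pair takes the tag recorded for it in zeroD.tag.txt, or "tbd" when no tag is recorded. The Java sketch below shows that rule on in-memory lists. It assumes pipe-separated records whose last field is the tag, and the class name ZeroDMetaSketch is hypothetical; it is not the code in GetZeroDMetaFile.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of step 3: attach yes|no|tbd to each raw pair. */
final class ZeroDMetaSketch {
    /**
     * @param rawPairs six-field lines as in zeroD.raw.data
     * @param tagLines seven-field lines as in zeroD.tag.txt
     * @return seven-field lines as in zeroD.meta.data
     */
    static List<String> tagPairs(List<String> rawPairs, List<String> tagLines) {
        // Key: the six pair fields; value: the tag in the last field.
        Map<String, String> tagByPair = new HashMap<>();
        for (String line : tagLines) {
            int cut = line.lastIndexOf('|');
            if (cut < 0) {
                continue; // skip lines without a separator
            }
            tagByPair.put(line.substring(0, cut), line.substring(cut + 1));
        }
        List<String> meta = new ArrayList<>();
        for (String pair : rawPairs) {
            // Pairs absent from the tag file still need a linguist's decision.
            meta.add(pair + "|" + tagByPair.getOrDefault(pair, "tbd"));
        }
        return meta;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
            "fast|adj|E0027369|fast|noun|E0027371",
            "nightly|adj|E0042640|nightly|adv|E0042641");
        List<String> tags = Arrays.asList("fast|adj|E0027369|fast|noun|E0027371|no");
        tagPairs(raw, tags).forEach(System.out::println);
    }
}
```

In the usage above, the second pair is reported as "tbd" only because the sample tag list omits it.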
  4. Split zero derivations meta file
    • Descriptions:
      Split "zeroD.meta.data" into three files according to the tag:
      • zeroD.yes.data: if tag is "yes"
      • zeroD.no.data: if tag is "no"
      • zeroD.tbd.data: if tag is "tbd"
    • Input files:
      • zeroD.meta.data: meta file of tagged zero derivation pairs
    • Output files:
      • zeroD.yes.data: used for the derivations table
        base|category-1|EUI-1|base|category-2|EUI-2
      • zeroD.no.data: not used, just for reference
      • zeroD.tbd.data: need to tag this file, add to zeroD.tag.txt, and rerun the program
    • Associated Java files:
      • SplitMetaFile.java
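
The split in step 4 can be pictured as routing each meta line into one of three buckets keyed by its last field. The Java sketch below works on in-memory lists rather than files. The class name ZeroDSplitSketch is hypothetical, and stripping the tag from all three outputs is an assumption: the source gives a six-field format only for zeroD.yes.data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of step 4; not the code in SplitMetaFile.java. */
final class ZeroDSplitSketch {
    /** Returns the pairs (tag stripped) grouped under "yes", "no", and "tbd". */
    static Map<String, List<String>> split(List<String> metaLines) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String tag : new String[] {"yes", "no", "tbd"}) {
            out.put(tag, new ArrayList<>());
        }
        for (String line : metaLines) {
            int cut = line.lastIndexOf('|');
            if (cut < 0) {
                continue; // skip lines without a separator
            }
            List<String> bucket = out.get(line.substring(cut + 1));
            if (bucket != null) { // lines with an unknown tag are ignored
                bucket.add(line.substring(0, cut));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> parts = split(Arrays.asList(
            "nightly|adj|E0042640|nightly|adv|E0042641|yes",
            "fast|adj|E0027369|fast|noun|E0027371|no"));
        System.out.println(parts);
    }
}
```

An empty "tbd" bucket corresponds to the stopping condition of the iterative tagging process: nothing is left to send to the linguists.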
  5. Add negation tag
    • Descriptions:
      Add the negation tag (O) to every pair in zeroD.yes.data, because no zeroD pair is a negation.
    • Input files:
      • zeroD.yes.data: valid zero derivation pairs
    • Output files:
      • zeroD.yes.data.${YEAR}: used for the derivations table
        base|category-1|EUI-1|base|category-2|EUI-2|negation
    • Associated Java files:
      • AddNegationTagToFile.java
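
Because the same negation value is used for every pair, step 5 reduces to appending one constant field to each line. A minimal Java sketch follows, assuming pipe-separated records; the class name NegationTagSketch is hypothetical and this is not the code in AddNegationTagToFile.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of step 5: append the negation field to each valid pair. */
final class NegationTagSketch {
    /** The tag value given in the step description for pairs that are not negations. */
    static final String NOT_NEGATION = "O";

    static List<String> addNegationTag(List<String> yesPairs) {
        List<String> out = new ArrayList<>();
        for (String pair : yesPairs) {
            out.add(pair + "|" + NOT_NEGATION);
        }
        return out;
    }

    public static void main(String[] args) {
        addNegationTag(Arrays.asList("nightly|adj|E0042640|nightly|adv|E0042641"))
            .forEach(System.out::println);
    }
}
```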