Derivations - Nominalizations

I. What is nominalization
A derivation forms a new word from an existing word by adding an affix (prefix or suffix) to it. When the derived word is a noun, both the process and the resulting noun are called a nominalization. For example:

  • kind|adj|E0036524|kindness|noun|E0036531
  • acid|adj|E0006889|acidity|noun|E0006897
  • deflate|verb|E0021233|deflation|noun|E0021234

Nominalizations are a type of derivation and are coded in LEXICON in the "nominalization=" slot. For example:

{base=kind
entry=E0036524
cat=adj
variants=reg
position=attrib(1)
position=pred
nominalization=kindness|noun|E0036531
}

{base=kindness
entry=E0036531
cat=noun
variants=reg
variants=uncount
compl=pphr(of,np)
nominalization_of=kind|adj|E0036524
}
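
As an illustration of how the "nominalization=" slot ties the two records together, the following minimal sketch reads one record in the text form shown above and emits the corresponding derivation pair. The function names parse_record and nominalizations are hypothetical and are not part of the LEXICON tools; the sketch assumes one "slot=value" entry per line.

```python
def parse_record(text):
    """Parse one {...} record into a dict mapping slot name -> list of values."""
    slots = {}
    for line in text.strip().splitlines():
        line = line.strip().lstrip("{")
        if not line or line == "}":
            continue
        key, _, value = line.partition("=")
        # A slot such as "position" may occur several times, so keep a list.
        slots.setdefault(key, []).append(value)
    return slots


def nominalizations(slots):
    """Yield (base, cat, eui, nom_base, nom_cat, nom_eui) per nominalization slot."""
    for value in slots.get("nominalization", []):
        nom_base, nom_cat, nom_eui = value.split("|")
        yield (slots["base"][0], slots["cat"][0], slots["entry"][0],
               nom_base, nom_cat, nom_eui)


record = """{base=kind
entry=E0036524
cat=adj
variants=reg
position=attrib(1)
position=pred
nominalization=kindness|noun|E0036531
}"""

pairs = list(nominalizations(parse_record(record)))
print("|".join(pairs[0]))  # kind|adj|E0036524|kindness|noun|E0036531
```
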

Nominalizations are either suffixD (suffix derivations) or zeroD (zero derivations). From the LEXICON, we are able to

  • update derivations by retrieving nominalizations from LEXICON
  • govern suffix rules from nominalizations

II. Nominalization derivation pairs from LEXICON
All nominalizations in LEXICON are retrieved and stored in the file LRNOM. The format is:

EUI 1|Base 1|Cat 1|EUI 2|Base 2|Cat 2
Computer programs were developed to filter out invalid derivation pairs from the nominalizations and to store the remaining pairs in derivation pair format:
Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
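
The change from the LRNOM layout to the derivation pair layout is a reordering of fields. A minimal sketch follows, assuming the fields are pipe-delimited (as in the examples on this page) and that a trailing delimiter may be present. The function name lrnom_to_dpair and the sample line are hypothetical; the sample is assembled from the kind/kindness example, and which word LRNOM lists first is an assumption.

```python
def lrnom_to_dpair(line):
    """Reorder EUI|Base|Cat|EUI|Base|Cat into Base|Cat|EUI|Base|Cat|EUI."""
    # [:6] ignores an optional empty field left by a trailing "|".
    eui1, base1, cat1, eui2, base2, cat2 = line.rstrip("\n").split("|")[:6]
    return "|".join([base1, cat1, eui1, base2, cat2, eui2])


# Hypothetical LRNOM line built from the kind/kindness example.
sample = "E0036531|kindness|noun|E0036524|kind|adj"
print(lrnom_to_dpair(sample))  # kindness|noun|E0036531|kind|adj|E0036524
```
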

Algorithm:

  • dPair validation:
    • filter out known invalid dPairs (nomD.tagNo.txt.${YEAR})
    • filter out dPairs that match an invalid pattern (prepositions.data.${YEAR}):
      • xxxparticle|noun|eui1|xxx|verb|eui2
        lookup|noun|E0222422|look|verb|E003804
      • xxx-particle|noun|eui1|xxx|verb|eui2
        grown-up|noun|E0030484|grow|verb|E0030480
  • dType validation
    • dPairs caused by spelling variants are excluded
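
The pattern part of the dPair validation can be sketched as below. This is an illustration only, not the GetNomD implementation: the function names, the particle set, and the inflection table are hypothetical stand-ins (in practice the particles would come from prepositions.data and the inflected forms from the lexicon). It also includes the "per"/"pper" exception described in the program details, so that pairs such as chopper|chop are not rejected.

```python
def is_particle_pattern(noun, verb_forms, particles):
    """True if noun is a verb form plus a particle, joined directly or by a hyphen."""
    for form in verb_forms:
        for particle in particles:
            if noun not in (form + particle, form + "-" + particle):
                continue
            # Exception: "chopper" is chop + suffix, not chop + particle "per".
            if particle == "per" and noun.endswith("pper"):
                continue
            return True
    return False


def matches_invalid_pattern(dpair, inflections, particles):
    """Check a Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2 line in both directions."""
    base1, cat1, _, base2, cat2, _ = dpair.split("|")[:6]
    if (cat1, cat2) == ("noun", "verb"):
        noun, verb = base1, base2
    elif (cat1, cat2) == ("verb", "noun"):
        noun, verb = base2, base1
    else:
        return False
    # The base form plus any known inflected forms (e.g. "grown" for "grow").
    forms = [verb] + inflections.get(verb, [])
    return is_particle_pattern(noun, forms, particles)


# Hypothetical sample data for illustration.
PARTICLES = {"up", "out", "over", "per"}
INFLECTIONS = {"grow": ["grows", "grew", "grown", "growing"]}

print(matches_invalid_pattern("grown-up|noun|E0030484|grow|verb|E0030480",
                              INFLECTIONS, PARTICLES))  # True
print(matches_invalid_pattern("chopper|noun|E0343361|chop|verb|E0016729",
                              INFLECTIONS, PARTICLES))  # False
```
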

III. Procedures

  • Prepare input files
    • ${DERIVATIONS}/Nominalizations/data/${YEAR}/dataOrg/LRNOM
      The latest nominalization file (LRNOM) from lexicon.${YEAR}
    • ${DERIVATIONS}/Nominalizations/data/${YEAR}/dataOrg/prepositions.data
      The latest prepositions from lexicon.${YEAR}. This file is used/generated in the latest LexCheck package.
    • ${DERIVATIONS}/Nominalizations/data/${YEAR}/dataOrg/nomD.tagNo.txt
      A file listing all invalid derivations from nominalizations that need to be fixed in LEXICON. The pairs in this list do not match the "noun + particle|verb" pattern.
  • Run the program
    shell> cd ${DERIVATIONS}/Nominalizations/bin
    shell> GetNomD ${YEAR}
    3
    
    The following iterative steps are needed:
    • update nomD.tagNo.txt.${YEAR}
      => All nomD pairs identified with an unknown (U) dType need to be reviewed.
    • update prepositions.data.${YEAR}

  • Process overview

IV. Program Details (GetNomD)

  1. Generate derivations from nominalizations
    • Description:
      Retrieve all possible derivation pairs from the nominalization file and convert them to derivation pair format
    • Input files:
      • /dataOrg/LRNOM: nominalization file
    • Output files:
      • ./data/nomD.raw.data: raw data of possible nominalization derivation pairs
        Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
    • Associated Java files:
      • GetNomDFromNomFile.java

  2. Generate the nominalization derivation meta file (nomD.meta.data) from nomD.raw.data, then split it into two files, nomD.yes.data and nomD.no.data:
    • Description:
      Go through all pairs in "nomD.raw.data" and write each pair with its tag to "nomD.meta.data", using the following algorithm:
      • yes: all valid derivations from "nomD.raw.data"
      • no: all invalid derivations from "nomD.raw.data"
        • Pattern Filter: if the pair matches an invalid pattern (checked in both directions)
          The most common way to nominalize a verb is by adding an affix. However, not every nominalization occurs that way. Thus, not every nominalization will be a derivation. For example, verb particles are not affixes. Four patterns of nominalization with verb particles are identified as invalid derivations. Derivation pairs are filtered out if they fall into these four patterns.
          • baseParticle|noun|eui 1|base|verb|eui 2
            Examples:
            backup|noun|E0321419|back|verb|E0011649|no
            cleanup|noun|E0319808|clean|verb|E0017272|no
            closeout|noun|E0587816|close|verb|E001744|no
            lineup|noun|E0521627|line|verb|E0037599|no
            lookup|noun|E0222422|look|verb|E003804|no
            setup|noun|E0320336|set|verb|E0055458|no
            takeover|noun|E0059818|take|verb|E0059816|no
            washout|noun|E0065084|wash|verb|E0065081|no
            ...

          • base-Particle|noun|eui 1|base|verb|eui 2
            Examples:
            cut-through|noun|E0588311|cut|verb|E0020215|no
            face-off|noun|E0588571|face|verb|E0027103|no
            fade-out|noun|E0587854|fade|verb|E0027177|no
            pull-up|noun|E0576246|pull|verb|E0051064|no
            phase-in|noun|E0588069|phase|verb|E0047185|no
            set-aside|noun|E0587818|set|verb|E0055458|no
            shake-up|noun|E0575525|shake|verb|E0055539|no
            warm-up|noun|E0586553|warm|verb|E0065055|no
            write-off|noun|E0587702|write|verb|E0065685|no
            ...

          • inflParticle|noun|eui 1|base|verb|eui 2
            Examples:
            grownup|noun|E0030484|grow|verb|E0030480|no
            ...

          • infl-Particle|noun|eui 1|base|verb|eui 2
            Examples:
            grown-up|noun|E0030484|grow|verb|E0030480|no
            salting-in|noun|E0587997|salt|verb|E0054234|no
            ...

            Please note that the four patterns above do not apply when:

            • the preposition is "per" and
            • the noun ends with "pper".

            The following examples are valid derivations:
            chopper|noun|E0343361|chop|verb|E0016729|yes
            ripper|noun|E0360460|rip|noun|E0053656|yes
            shipper|noun|E0360483|ship|noun|E0055655|yes
            shopper|noun|E0354647|shop|verb|E0055686|yes
            snapper|noun|E0346235|snap|verb|E0056428|yes
            worshipper|noun|E0554172|worship|verb|E0065637|yes

          Please note that the following example is a valid derivation because it does not match any of the patterns above:
          run-on|noun|E0338312|run on|verb|EUI 2|yes

        • Invalid Derivations: if the pair is in the invalid nomD list (nomD.tagNo.txt)
          Derivations are bi-directional. For example, if A is a derivation of B, then B is a derivation of A. Conversely, if A is not a derivation of B, then B is not a derivation of A. The file nomD.tagNo.txt lists each known invalid derivation pair in one direction only. Thus, the reversed direction of each pair must also be checked. Pairs that match in either direction are filtered out.
          • this list does not include pairs covered by the pattern filter described above
          • it lists all invalid derivations known by linguists
          • 22 exceptions found in lvg.2012
          • Examples:
            face-saving|noun|E0027112|save|verb|E0054430|no
            decision-making|noun|E0021045|make|verb|E0038623|no
            merry-making|noun|E0039645|make|verb|E0038623|no
            lovemaking|noun|E0502721|make|verb|E0038623|no
            warm|noun|E0065054|warmed-up|adj|E0588482|no
            instability|noun|E0034830|unstable|adj|E0063378|no
            irradiation|noun|E0035884|nonirradiated|adj|E0042869|no
            ...
    • Input files:
      • ./data/nomD.raw.data

      • ./dataOrg/nomD.tagNo.txt: nomD with "no" tag, invalid nomD.
        Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
      • ./dataOrg/prepositions.data: prepositions
        • This file lists all particles (prepositions) found in LEXICON
        • 198 prepositions found in lvg.2012
        • This file is generated by LEXICON program and should be updated annually (used in LexCheck)
    • Output files:
      • ./data/nomD.meta.data: meta data with "yes", "no", "tbd" tags.
        Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2|tag
      • ./data/nomD.yes.data: valid nomD pairs
        Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
      • ./data/nomD.no.data: filtered out invalid nomD pairs
    • Associated Java files:
      • GetNomDMetaFile.java
  3. Add negation tag
    Please refer to the suffix negation tag section
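
The tagging and splitting described in step 2 can be sketched as follows. This is a rough illustration rather than the logic of GetNomDMetaFile.java: the function names are hypothetical, the pattern filter is passed in as a caller-supplied predicate, and the "tbd" tag is never assigned because its criteria are not described here. Each pair from the invalid list is stored in both directions, which is how the bi-directional check is modeled.

```python
def tag_pairs(raw_pairs, invalid_pairs, pattern_filter):
    """Return meta lines: each raw pair with a trailing |yes or |no tag."""
    invalid = set()
    for line in invalid_pairs:
        f = line.split("|")[:6]
        invalid.add(tuple(f))                # direction as listed
        invalid.add(tuple(f[3:6] + f[:3]))   # reversed direction
    meta = []
    for line in raw_pairs:
        fields = tuple(line.split("|")[:6])
        tag = "no" if pattern_filter(line) or fields in invalid else "yes"
        meta.append(line + "|" + tag)
    return meta


def split_by_tag(meta):
    """Split meta lines into (yes_pairs, no_pairs); other tags are skipped."""
    yes, no = [], []
    for line in meta:
        pair, _, tag = line.rpartition("|")
        if tag == "yes":
            yes.append(pair)
        elif tag == "no":
            no.append(pair)
    return yes, no


raw = [
    "kindness|noun|E0036531|kind|adj|E0036524",
    "unstable|adj|E0063378|instability|noun|E0034830",  # reverse of a listed pair
    "backup|noun|E0321419|back|verb|E0011649",
]
known_invalid = ["instability|noun|E0034830|unstable|adj|E0063378"]
# Stand-in predicate for illustration; a real one would test the particle patterns.
meta = tag_pairs(raw, known_invalid, lambda line: line.startswith("backup|"))
yes_pairs, no_pairs = split_by_tag(meta)
print(yes_pairs)  # ['kindness|noun|E0036531|kind|adj|E0036524']
```
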