SD-FACTs

I. What are suffix derivations
A suffix is an affix which is placed after the stem of a word. Please see suffix derivations for details.

II. Suffix Rules
LSG derives the most common suffixD rules with high recall (occurrence) and precision (low exceptions). Please see suffixD rules for details.

III. Processes

  • Prepare input files
    • ${DERIVATIONS}/suffixD/data/${YEAR}/dataOrg/bases.data
      Use the same file (bases.data) from prefixD
    • ${DERIVATIONS}/suffixD/data/${YEAR}/dataOrg/sdRules.data
      Use the following sources:
      • Original suffixD-RULEs (97)
      • New suffixD-RULEs from nomD
      • New suffixD-RULEs from original suffixD pairs
      • New suffixD-RULEs from other sources
  • Run the program
    shell> cd ${DERIVATIONS}/suffixD/bin
    shell> GetSuffixD ${YEAR}
    5
    
    The following iterative steps are needed:
    • TBD

    Process overview

    TBD

IV. Program Details
A new approach for suffixD FACTs and RULEs was developed in the 2013 release. The suffixD pair FACTs are derived by the following processes:

  1. Retrieve possible suffixD pairs from LEXICON and suffixD-RULEs (raw)
    • Descriptions:
      Retrieve all possible suffixD pairs from LEXICON and suffixD-RULEs
    • Input files:
      • ./dataOrg/bases.data: base from the latest LEXICON
      • ./dataOrg/sdRules.data: sdRules for this release
    • Output files:
      • ./data/suffixD.raw.data: raw data of possible suffixD pairs
        Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
      • ./data/sdRules.rawNo.rpt: suffixD-RULE & raw no report (in order of raw no)
      • ./data/sdRules2.rawNo.rpt: suffixD-RULE & raw no report (in the same order as sdRules.data)
    • Associated Java files:
      • GetSuffixFromBaseFile.java
    TBD...
  2. Remove child suffixD-RULEs from parent suffixD-RULEs
  3. Use valid nomD pairs and suffixD pairs from previous years for tagging (meta)
  4. Send untagged suffixD pairs to linguists for tagging

  5. Use the resulting tags to validate suffixD-RULEs for precision (exception) and frequency
  6. Add all suffixD.no to the exceptions of each suffixD-RULE
    TBD...
    1. Generate derivations from nominalizations

    2. Get the nominalization derivations meta file (nomD.raw.data), and then split it into two files, nomD.yes.data and nomD.no.data:
      • Descriptions:
        Go through all pairs in "nomD.raw.data" and add tag information to "nomD.meta.data" using the following algorithm:
        • yes: all valid derivations from "nomD.raw.data"
        • no: all invalid derivations from "nomD.raw.data"
          • Pattern Filter: if it matches an invalid pattern (checked in both directions)
            The most common way to nominalize a verb is by adding an affix. However, not every nominalization occurs that way. Thus, not every nominalization will be a derivation. For example, verb particles are not affixes. Four patterns of nominalization with verb particles are identified as invalid derivations. Derivation pairs are filtered out if they fall into these four patterns.
            • baseParticle|noun|eui 1|base|verb|eui 2
              Examples:
              backup|noun|E0321419|back|verb|E0011649|no
              cleanup|noun|E0319808|clean|verb|E0017272|no
              closeout|noun|E0587816|close|verb|E001744|no
              lineup|noun|E0521627|line|verb|E0037599|no
              lookup|noun|E0222422|look|verb|E003804|no
              setup|noun|E0320336|set|verb|E0055458|no
              takeover|noun|E0059818|take|verb|E0059816|no
              washout|noun|E0065084|wash|verb|E0065081|no
              ...

            • base-Particle|noun|eui 1|base|verb|eui 2
              Examples:
              cut-through|noun|E0588311|cut|verb|E0020215|no
              face-off|noun|E0588571|face|verb|E0027103|no
              fade-out|noun|E0587854|fade|verb|E0027177|no
              pull-up|noun|E0576246|pull|verb|E0051064|no
              phase-in|noun|E0588069|phase|verb|E0047185|no
              set-aside|noun|E0587818|set|verb|E0055458|no
              shake-up|noun|E0575525|shake|verb|E0055539|no
              warm-up|noun|E0586553|warm|verb|E0065055|no
              write-off|noun|E0587702|write|verb|E0065685|no
              ...

            • inflParticle|noun|eui 1|base|verb|eui 2
              Examples:
              grownup|noun|E0030484|grow|verb|E0030480|no
              ...

            • infl-Particle|noun|eui 1|base|verb|eui 2
              Examples:
              grown-up|noun|E0030484|grow|verb|E0030480|no
              salting-in|noun|E0587997|salt|verb|E0054234|no
              ...

              Please also note that the above four patterns should not apply when:

              • preposition is "per" and
              • noun ends with "pper".

              The following examples are valid derivations:
              chopper|noun|E0343361|chop|verb|E0016729|yes
              ripper|noun|E0360460|rip|noun|E0053656|yes
              shipper|noun|E0360483|ship|noun|E0055655|yes
              shopper|noun|E0354647|shop|verb|E0055686|yes
              snapper|noun|E0346235|snap|verb|E0056428|yes
              worshipper|noun|E0554172|worship|verb|E0065637|yes

            Please note that the following example is a valid derivation because it does not belong to any of the above patterns:
            run-on|noun|E0338312|run on|verb|EUI 2|yes
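
The pattern filter above can be sketched as follows. This is a minimal illustration in Python, not the actual GetNomDMetaFile.java logic. The function names, the `particles` set (standing in for the contents of prepositions.data), and the `inflections` lookup (a hypothetical map from a base to its inflected forms) are assumptions made for this example.

```python
def is_particle_nominalization(noun, verb, particles, inflections=None):
    """Return True if noun is verb (or an inflected form of it) followed by
    a particle, with or without a hyphen between the two parts."""
    # Candidate stems: the base itself plus any supplied inflected forms
    # (the inflections map is a hypothetical input, e.g. {"grow": {"grown"}}).
    forms = {verb} | set((inflections or {}).get(verb, ()))
    for form in forms:
        if not noun.startswith(form):
            continue
        rest = noun[len(form):]
        if rest.startswith("-"):
            rest = rest[1:]
        if rest not in particles:
            continue
        # Documented exception: particle "per" with a noun ending in "pper"
        # (chopper, shopper, ...) is not treated as a particle pattern.
        if rest == "per" and noun.endswith("pper"):
            continue
        return True
    return False


def pattern_filter(fields, particles, inflections=None):
    """fields: [base1, cat1, eui1, base2, cat2, eui2].
    Returns True if the pair matches an invalid pattern in either direction."""
    b1, c1, _, b2, c2, _ = fields[:6]
    if c1 == "noun" and c2 == "verb":
        return is_particle_nominalization(b1, b2, particles, inflections)
    if c1 == "verb" and c2 == "noun":
        return is_particle_nominalization(b2, b1, particles, inflections)
    return False
```

A pair for which `pattern_filter` returns True would be tagged "no". The run-on example is not matched because the noun "run-on" does not start with the verb base "run on".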

          • Invalid Derivations: if it is in the invalid nomD list (nomD.tagNo.txt)
            Derivations are bidirectional. For example, if A is a derivation of B, then B is a derivation of A. Conversely, if A is not a derivation of B, then B is not a derivation of A. The file nomD.tagNo.txt lists each known invalid derivation pair in only one direction. Thus, the reversed direction of each invalid pair must also be checked. Pairs matching in either direction should be filtered out.
            • This list does not include the pattern filter described above
            • Lists all invalid derivations (known by linguists)
            • 22 exceptions found in lvg.2012
            • Examples:
              face-saving|noun|E0027112|save|verb|E0054430|no
              decision-making|noun|E0021045|make|verb|E0038623|no
              merry-making|noun|E0039645|make|verb|E0038623|no
              lovemaking|noun|E0502721|make|verb|E0038623|no
              warm|noun|E0065054|warmed-up|adj|E0588482|no
              instability|noun|E0034830|unstable|adj|E0063378|no
              irradiation|noun|E0035884|nonirradiated|adj|E0042869|no
              ...
      • Input files:
        • ./data/nomD.raw.data

        • ./dataOrg/nomD.tagNo.txt: nomD with "no" tag, invalid nomD.
          Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
        • ./dataOrg/prepositions.data: prepositions
          • This file lists all particles (prepositions) found in LEXICON
          • 198 prepositions found in lvg.2012
          • This file is generated by the LEXICON program and should be updated annually (used in LexCheck)
      • Output files:
        • ./data/nomD.meta.data: meta data with "yes", "no", "tbd" tags.
          Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2|tag
        • ./data/nomD.yes.data: valid nomD pairs
          Base 1|Cat 1|EUI 1|Base 2|Cat 2|EUI 2
        • ./data/nomD.no.data: filtered out invalid nomD pairs
      • Associated Java files:
        • GetNomDMetaFile.java
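
The tagging and split in step 2 can be sketched as follows. This is a minimal illustration in Python, not the actual GetNomDMetaFile.java implementation. It assumes pipe-delimited records, as in the examples above. The function names are hypothetical, and the `pattern_filter` argument is a placeholder for whatever pattern check is in use. The "tbd" tag is not modeled because the text does not say when it applies.

```python
def load_invalid_pairs(lines):
    """Build a lookup of invalid pairs from nomD.tagNo.txt-style lines.
    Each pair is stored in both directions, since the file lists only one."""
    invalid = set()
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        first, second = tuple(fields[0:3]), tuple(fields[3:6])
        invalid.add((first, second))
        invalid.add((second, first))
    return invalid


def tag_pairs(raw_lines, invalid, pattern_filter=lambda fields: False):
    """Tag each raw pair "yes" or "no".
    Returns (meta, yes, no) as lists of pipe-delimited records."""
    meta, yes, no = [], [], []
    for line in raw_lines:
        fields = line.strip().split("|")
        if len(fields) < 6:
            continue
        fields = fields[:6]
        key = (tuple(fields[0:3]), tuple(fields[3:6]))
        # A pair is invalid if it matches a pattern or is in the known list.
        is_no = pattern_filter(fields) or key in invalid
        record = "|".join(fields)
        meta.append(record + "|" + ("no" if is_no else "yes"))
        (no if is_no else yes).append(record)
    return meta, yes, no
```

The three returned lists correspond to the contents of nomD.meta.data, nomD.yes.data, and nomD.no.data. Writing them to disk is left out of the sketch.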