Generate Derivational Variants

  • Short Description: Generate derivational variants

  • Full Description:

    Derivational variants are terms that are related to the original term but do not share the same meaning. In linguistics, derivation is used to form new words, as with "happi-ness" and "un-happy" from "happy". A derivational variant often belongs to a different syntactic category than the original term. Derivational variants are generated either by a lookup in a table of known derivations (the derivation table in the database; case insensitive) or by rules that add, change, or remove common suffixes (case sensitive).

    The Java version implements two new heuristic rules that filter out unrealistic derivational variants generated by rules. They are governed by:

    • Min. length of a term:
      If a term is too short (the default minimum length is 3), it is usually an acronym or carries little meaning. This rule can filter out such terms.

    • Min. length of stem in trie tree:
      The stem length is the length of the word minus the length of the rule's input suffix. If the stem is too short, the rule-generated derivational variants are usually poor guesses and should be filtered out. The trie algorithm uses this value to filter out such cases.

      For example,

      RULE|ic$|adj|base|y$|noun|base

      The length of the input suffix (ic$) is 2. If the input term is "zoic", the length of the stem ("zo") is 2 (= 4 - 2). Accordingly, the rule-generated derivational variant "zoy" is filtered out from the derivational variants of "zoic" by this rule (with the default value of 3).
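
      The "zoic" example can be sketched in Java as follows. This is a minimal illustration, not the ToDerivation.java implementation: the class name, method name, and parameters are hypothetical, the suffixes are passed without the "$" anchor, and the use of a strict "less than" comparison is an assumption drawn only from the example above (a stem of length 2 is rejected when the minimum is 3).

```java
import java.util.Optional;

// Hypothetical sketch of one suffix rule plus the minimum-stem-length filter.
final class SuffixRuleSketch {

    // Applies "replace inSuffix with outSuffix" to term (case sensitive).
    // Returns empty when the rule does not match or the stem is too short.
    static Optional<String> apply(String term, String inSuffix,
                                  String outSuffix, int minStemLength) {
        if (!term.endsWith(inSuffix)) {
            return Optional.empty();              // rule does not match
        }
        // Stem length = word length minus input suffix length.
        int stemLength = term.length() - inSuffix.length();
        if (stemLength < minStemLength) {
            return Optional.empty();              // assumed filter: stem too short
        }
        return Optional.of(term.substring(0, stemLength) + outSuffix);
    }

    public static void main(String[] args) {
        // RULE|ic$|adj|base|y$|noun|base with a minimum stem length of 3
        System.out.println(apply("zoic", "ic", "y", 3));   // stem "zo" has length 2
        System.out.println(apply("ironic", "ic", "y", 3)); // stem "iron" has length 4
    }
}
```

      With a minimum stem length of 3, "zoic" yields no variant, while a longer term such as "ironic" passes the filter.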

    Both values are configurable in the configuration file (${LVG_DIR}/data/config/lvg.properties). The default value is 3 for both the min. length of a term (MIN_TERM_LENGTH) and the min. length of stem in trie tree (DIR_TRIE_STEM_LENGTH).
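
    As a rough illustration of reading these two settings, the sketch below loads them with java.util.Properties. It assumes the file uses the standard Java key=value properties format; the inline text stands in for the real file, and the class and helper names are hypothetical. Only the two variable names and the default of 3 come from the description above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: read the two thresholds from properties-style text.
final class LvgConfigSketch {
    static final int DEFAULT_LENGTH = 3;   // default stated in the documentation

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Falls back to the default when the key is absent.
    static int readLength(Properties props, String key) {
        String value = props.getProperty(key);
        return value == null ? DEFAULT_LENGTH : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        // Stand-in for the contents of lvg.properties (assumed format).
        Properties props = parse("MIN_TERM_LENGTH=3\nDIR_TRIE_STEM_LENGTH=3\n");
        System.out.println(readLength(props, "MIN_TERM_LENGTH"));
        System.out.println(readLength(props, "DIR_TRIE_STEM_LENGTH"));
    }
}
```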

    The results are sorted by category, length, and case-insensitive alphabetical order.
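
    The three-level ordering can be sketched as a chained comparator. This is an illustration under assumptions: the Variant class and method names are hypothetical, and the category order is passed in as a parameter because the actual ranking used by the tool is not spelled out here. The order 128, 1024, 2, 1 used in main is only what the sample output for "multiple" suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the category / length / alphabetical ordering.
final class VariantSortSketch {

    static final class Variant {
        final String term;
        final int category;
        Variant(String term, int category) {
            this.term = term;
            this.category = category;
        }
    }

    // Position of the category in the supplied order; unknown categories go last.
    static int rank(int category, List<Integer> order) {
        int index = order.indexOf(category);
        return index < 0 ? order.size() : index;
    }

    static List<Variant> sort(List<Variant> variants, List<Integer> categoryOrder) {
        Comparator<Variant> byCategory =
            Comparator.comparingInt(v -> rank(v.category, categoryOrder));
        List<Variant> sorted = new ArrayList<>(variants);
        sorted.sort(byCategory
            .thenComparingInt(v -> v.term.length())                      // shorter first
            .thenComparing(v -> v.term, String.CASE_INSENSITIVE_ORDER)); // then A-Z
        return sorted;
    }

    public static void main(String[] args) {
        List<Variant> sorted = sort(
            Arrays.asList(new Variant("multiplicity", 128),
                          new Variant("multiply", 1),
                          new Variant("multiply", 128)),
            Arrays.asList(128, 1024, 2, 1));   // assumed category order
        for (Variant v : sorted) {
            System.out.println(v.term + "|" + v.category);
        }
    }
}
```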

    The -m flag is used to display the additional information that can be retrieved with the derivation flow. The additional information consists of two parts: the fact or rule that generates the derivational variants, and the fact or rule that was applied to the derivational form to produce the output.
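
    For downstream processing, an output line produced with -m can be split on the "|" separator. The sketch below is one possible reading of such a line, not part of the tool: the class and field names are hypothetical labels, and the layout (six leading fields followed by FACT or RULE and its details) is inferred from sample lines such as "multiple|multiplicity|128|1|d|1|FACT|multiple|1|multiplicity|128|".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: split one "-f:d -m" output line into named parts.
final class DerivationLineSketch {
    final String input;
    final String output;
    final long category;
    final long inflection;
    final String flow;
    final int flowNumber;
    final String source;         // "FACT" or "RULE"
    final List<String> detail;   // remaining fields of the fact or rule

    private DerivationLineSketch(String[] f) {
        input = f[0];
        output = f[1];
        category = Long.parseLong(f[2]);
        inflection = Long.parseLong(f[3]);
        flow = f[4];
        flowNumber = Integer.parseInt(f[5]);
        source = f[6];
        detail = Arrays.asList(f).subList(7, f.length);
    }

    static DerivationLineSketch parse(String line) {
        // String.split drops the empty field after the trailing "|".
        String[] fields = line.split("\\|");
        if (fields.length < 7) {
            throw new IllegalArgumentException("expected at least 7 fields: " + line);
        }
        return new DerivationLineSketch(fields);
    }

    public static void main(String[] args) {
        DerivationLineSketch parsed =
            parse("multiple|multiply|2|1|d|1|RULE|le$|adj|base|ly$|adv|base|");
        System.out.println(parsed.output + " via " + parsed.source + " " + parsed.detail);
    }
}
```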

  • Difference:
    1. The Java version shows all variants generated by different rules, while the C version shows only one variant when different rules generate the same variant.


  • Features:
    1. Fact: Find all derivational variants from derivation table.
    2. Rules: Find all derivational variants from morphology rules.
    3. Assign category and inflection for all outputs.
    4. Filter outputs according to the restriction flag (-kd).
    5. Display outputs by the frequency of categories.


  • Symbol: d

  • Examples:
    
    shell> lvg -f:d -m
    multiple|multiply|128|1|d|1|RULE|e$|noun|base|y$|noun|base|
    multiple|multiplant|128|1|d|1|RULE|e$|verb|base|ant$|noun|base|
    multiple|multiplicity|128|1|d|1|FACT|multiple|1|multiplicity|128|
    multiple|multiply|1024|1|d|1|FACT|multiple|1|multiply|1024|
    multiple|multiply|2|1|d|1|RULE|le$|adj|base|ly$|adv|base|
    multiple|multiply|1|1|d|1|RULE|e$|noun|base|y$|adj|base|
    
    help|helper|128|1|d|1|RULE|$|verb|base|er$|noun|base|
    help|helpfulness|128|1|d|1|FACT|help|128|helpfulness|128|
    help|helplessness|128|1|d|1|FACT|help|128|helplessness|128|
    help|helpfully|2|1|d|1|FACT|help|128|helpfully|2|
    help|helplessly|2|1|d|1|FACT|help|128|helplessly|2|
    help|helpful|1|1|d|1|FACT|help|128|helpful|1|
    help|helpless|1|1|d|1|FACT|help|128|helpless|1|
    
    gene|genic|1|1|d|1|FACT|gene|128|genic|1|
    gene|genetic|1|1|d|1|FACT|gene|128|genetic|1|
    

  • Implementation Logic:
    • Use both facts and rules.
    • Facts:
      1. Perform a case-insensitive search for the input term in term1 of the derivation table.
      2. Perform a case-insensitive search for the input term in term2 of the derivation table.
      3. Check that the input categories are legal.
      4. Assign term, category, and inflection (base) for both source and target.
    • Rules:
      1. Use a persistent trie to apply rules (and check exceptions) on the input term.
      2. Assign term, category, and inflection (base) for both source and target.
    • Filter results according to the restriction filter.
    • Sort outputs by the frequency of categories, length, and case-insensitive alphabetical order.
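
    The fact lookup in both directions (term1 and term2) can be sketched with an in-memory table. This is a simplified stand-in for the database-backed derivation table: the Fact class, its field layout, and the output format are hypothetical, the category check and the rule path are omitted, and the sample rows are taken from the FACT entries in the examples above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the case-insensitive, two-way fact lookup.
final class DerivationFactSketch {

    static final class Fact {
        final String term1;
        final int cat1;
        final String term2;
        final int cat2;
        Fact(String term1, int cat1, String term2, int cat2) {
            this.term1 = term1;
            this.cat1 = cat1;
            this.term2 = term2;
            this.cat2 = cat2;
        }
    }

    // Returns "variant|category" for every fact that mentions the input term.
    static List<String> lookup(String input, List<Fact> table) {
        List<String> variants = new ArrayList<>();
        for (Fact fact : table) {
            if (fact.term1.equalsIgnoreCase(input)) {
                variants.add(fact.term2 + "|" + fact.cat2);   // match on term1
            }
            if (fact.term2.equalsIgnoreCase(input)) {
                variants.add(fact.term1 + "|" + fact.cat1);   // match on term2
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        List<Fact> table = Arrays.asList(
            new Fact("help", 128, "helpful", 1),
            new Fact("help", 128, "helpless", 1),
            new Fact("gene", 128, "genic", 1));
        System.out.println(lookup("HELP", table));
        System.out.println(lookup("genic", table));
    }
}
```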

  • Source Code: ToDerivation.java

  • Hierarchy: Object -> Transformation -> ToDerivation