Generate Inflectional Variants

  • Short Description: Generate inflectional variants

  • Full Description:

    Inflectional variants of terms include the singular and plural forms of nouns, the various tenses of verbs, and the positive, comparative, and superlative forms of adjectives and adverbs. By default, inflectional variants produced by facts are reported; rule-generated inflections are reported only if no such facts exist.
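    The default reporting policy (facts first, rules only as a fallback) can be sketched in Java as follows. This is a minimal illustration, not LVG code: the `InflectionPolicy` class and the use of plain `Function` lookups for the fact and rule sources are assumptions.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the default reporting policy: fact-based
// inflections are returned when any exist; otherwise the rule-based
// generator is consulted. Not the LVG API.
final class InflectionPolicy {
    private InflectionPolicy() {}

    static List<String> variants(String term,
                                 Function<String, List<String>> facts,
                                 Function<String, List<String>> rules) {
        List<String> fromFacts = facts.apply(term);
        if (!fromFacts.isEmpty()) {
            return fromFacts;         // facts found: rules are never consulted
        }
        return rules.apply(term);     // no facts: fall back to rules
    }

    public static void main(String[] args) {
        // Toy fact table and toy rule set, for illustration only.
        Function<String, List<String>> facts = t -> t.equals("sleep")
                ? List.of("sleep", "slept", "sleeps", "sleeping")
                : List.<String>of();
        Function<String, List<String>> rules = t -> List.of(t, t + "s", t + "ed", t + "ing");
        System.out.println(variants("sleep", facts, rules)); // [sleep, slept, sleeps, sleeping]
        System.out.println(variants("blorf", facts, rules)); // [blorf, blorfs, blorfed, blorfing]
    }
}
```

    Because the rule source is never called when facts exist, a known term such as "sleep" never picks up spurious regular forms like "sleeped".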

    Inflected forms are generated by first uninflecting the input term and then retrieving all inflected forms of the uninflected form. This differs from earlier versions of LVG: in the past, if an inflected form came in, only its uninflected form was generated, and if an uninflected form came in, only its inflected forms (not including the uninflected form itself) were generated.

    In prior versions of LVG, the facts covered only irregular forms from the lexicon, and the rules were relied upon to generate the regular inflections. Under those circumstances, the default was to report both facts and rules, with the results further filtered against a word list from the lexicon.

    All inflected forms are now contained in the facts file, so if a term is in the lexicon, all of its inflected forms are there as well, which removes the need to generate rule-derived forms. Because of this design change, the indexed keys of inflected terms in the database must be case insensitive to handle input terms in any case. Since case usually carries little meaning in NLP, this inflection flow is case insensitive, which yields more aggressive (broader) results.
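    A case-insensitive key can be illustrated with a small in-memory index. This is a sketch under assumptions: the `CaseInsensitiveIndex` class is hypothetical, and lowercasing with `Locale.ROOT` is one reasonable folding choice, not necessarily what the LVG database does. The toy entries show why results become more aggressive: terms that differ only in case share a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a case-insensitive index from inflected term to
// uninflected form. Keys are lowercased, so any casing of the input hits the
// same entries; this is also what makes the results more aggressive.
final class CaseInsensitiveIndex {
    private final Map<String, List<String>> index = new HashMap<>();

    private static String key(String term) {
        return term.toLowerCase(Locale.ROOT);   // locale-independent case folding
    }

    void add(String inflectedTerm, String uninflected) {
        index.computeIfAbsent(key(inflectedTerm), k -> new ArrayList<>()).add(uninflected);
    }

    List<String> lookup(String term) {
        return index.getOrDefault(key(term), List.of());
    }

    public static void main(String[] args) {
        CaseInsensitiveIndex idx = new CaseInsensitiveIndex();
        idx.add("slept", "sleep");
        idx.add("aids", "aid");    // toy entries: two terms that differ only in case
        idx.add("AIDS", "AIDS");
        System.out.println(idx.lookup("SLEPT")); // [sleep]
        System.out.println(idx.lookup("Aids"));  // [aid, AIDS]
    }
}
```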

    Items returned from the inflectional morphology unit are now sorted by part of speech, in an order that reflects frequency in the lexicon: nouns, adjectives, verbs, adverbs, modals, and auxiliary verbs.

    An additional heuristic has also been implemented within the inflectional morphology unit to limit spurious variants. If a term goes through an inflectional morphology mutation, the term is not known to the lexicon, and its rule-generated inflectional form is known to the lexicon, that variant is thrown out because it is likely to be wrong.
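    The heuristic can be sketched as a filter over rule-generated variants. The `SpuriousVariantFilter` class, the `Predicate` used for lexicon membership, and the toy lexicon are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of the spurious-variant heuristic: when the input term
// is unknown to the lexicon, any rule-generated variant that IS known to the
// lexicon is discarded as likely wrong.
final class SpuriousVariantFilter {
    private SpuriousVariantFilter() {}

    static List<String> filter(String input, List<String> ruleVariants,
                               Predicate<String> inLexicon) {
        if (inLexicon.test(input)) {
            return ruleVariants;              // heuristic applies only to unknown inputs
        }
        List<String> kept = new ArrayList<>();
        for (String variant : ruleVariants) {
            if (!inLexicon.test(variant)) {
                kept.add(variant);            // keep variants the lexicon does not know
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> lexicon = Set.of("bus");  // toy lexicon
        // "bu" is unknown; the toy rule output "bus" is known, so it is dropped.
        System.out.println(filter("bu", List.of("bu", "bus"), lexicon::contains)); // [bu]
    }
}
```

    The reasoning is that a lexicon-known form would have been reached through facts if it were truly related to the input, so a rule path that lands on it is suspect.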

    The results are sorted by category frequency, then by length, then in case-insensitive alphabetical order.
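    The three sort keys map naturally onto a chained `Comparator`. In this sketch (Java 16+ for the record), the `InflectionSort` class, the `Variant` record, and the category name strings are illustrative assumptions; only the category order itself comes from the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the output ordering: category rank (most frequent
// first), then term length, then case-insensitive alphabetical order.
final class InflectionSort {
    // Rank order taken from the text: nouns, adjectives, verbs, adverbs,
    // modals, auxiliaries. Unlisted categories sort after these.
    private static final List<String> CATEGORY_ORDER =
            List.of("noun", "adj", "verb", "adv", "modal", "aux");

    record Variant(String term, String category) {}

    private static int rank(String category) {
        int i = CATEGORY_ORDER.indexOf(category);
        return i >= 0 ? i : CATEGORY_ORDER.size();
    }

    static final Comparator<Variant> ORDER =
            Comparator.comparingInt((Variant v) -> rank(v.category()))
                      .thenComparingInt(v -> v.term().length())
                      .thenComparing(Variant::term, String.CASE_INSENSITIVE_ORDER);

    static List<Variant> sort(List<Variant> variants) {
        List<Variant> sorted = new ArrayList<>(variants);
        sorted.sort(ORDER);   // List.sort is stable: full ties keep input order
        return sorted;
    }

    public static void main(String[] args) {
        List<Variant> in = List.of(
                new Variant("sleeping", "verb"), new Variant("slept", "verb"),
                new Variant("sleeps", "verb"), new Variant("sleep", "verb"),
                new Variant("sleep", "noun"));
        sort(in).forEach(v -> System.out.println(v.term() + " " + v.category()));
    }
}
```

    Run on these "sleep" forms, the sketch yields the same term order as the example output in this section: the noun "sleep", then the verb forms "sleep", "slept", "sleeps", "sleeping".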

    The -m flag displays the additional information that can be retrieved with the inflection flow. This information consists of two parts: the fact or rule that uninflects the term, and the fact or rule that is applied to the uninflected form to produce the output. The formats of these two parts are:

    • |FACT|uninflected term|category|uninflected inflection|inflected term|category|inflected inflection|EUI|
    • |RULE|uninflected term|matched pattern|category|uninflected inflection|replaced pattern|category|inflected inflection|
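    To make the field order of the two formats concrete, the following hypothetical helpers assemble each line from its parts. The `MutateInfo` class, its parameter names, and the sample values (including the placeholder EUI) are illustrative assumptions, not LVG output captured from a real run.

```java
import java.util.StringJoiner;

// Hypothetical helpers that lay out the two -m detail formats field by field.
// Field order follows the documented formats; the method and parameter names
// are illustrative, not part of LVG.
final class MutateInfo {
    private MutateInfo() {}

    // Joins fields as |f1|f2|...|fn| (leading and trailing bars included).
    private static String bars(String... fields) {
        StringJoiner joiner = new StringJoiner("|", "|", "|");
        for (String field : fields) {
            joiner.add(field);
        }
        return joiner.toString();
    }

    static String fact(String uninflTerm, String uninflCat, String uninflInfl,
                       String inflTerm, String inflCat, String inflInfl, String eui) {
        return bars("FACT", uninflTerm, uninflCat, uninflInfl, inflTerm, inflCat, inflInfl, eui);
    }

    static String rule(String uninflTerm, String matchedPattern, String uninflCat,
                       String uninflInfl, String replacedPattern, String inflCat,
                       String inflInfl) {
        return bars("RULE", uninflTerm, matchedPattern, uninflCat, uninflInfl,
                    replacedPattern, inflCat, inflInfl);
    }

    public static void main(String[] args) {
        // "E0000001" is a placeholder EUI, not a real lexicon identifier.
        System.out.println(fact("sleep", "verb", "base", "slept", "verb", "past", "E0000001"));
        // prints |FACT|sleep|verb|base|slept|verb|past|E0000001|
    }
}
```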


  • Difference:
    1. The Java version adds a heuristic rule: if an inflectional term is generated by the morphology rules (trie) and is in the lexicon, it is filtered out because it is likely to be wrong.
    2. The Java version has new inflection table data in the database.
    3. The new data includes a new inflection scheme.
    4. The new data separates "base" from "singular", "positive", and "infinitive".
    5. EUI information was added to the -m option output (facts) in 2012.


  • Features:
    1. Fact: Find all inflectional variants from inflection table.
    2. Rules: Find all inflectional variants from morphology rules.
    3. Assign category and inflection for all outputs.


  • Symbol: i

  • Examples:
    
    shell> lvg -f:i
    sleep
    sleep|sleep|128|1|i|1|
    sleep|sleep|128|512|i|1|
    sleep|sleep|1024|1|i|1|
    sleep|sleep|1024|262144|i|1|
    sleep|sleep|1024|1024|i|1|
    sleep|slept|1024|32|i|1|
    sleep|slept|1024|64|i|1|
    sleep|sleeps|1024|128|i|1|
    sleep|sleeping|1024|16|i|1|
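    Each output line above is a pipe-delimited record. The sketch below parses one line and decodes the category field, assuming the field layout input|output|category|inflection|flow history|flow number| and LVG's category bit values (adj=1 through verb=1024, so 128 is noun and 1024 is verb). The `LvgOutputLine` class is hypothetical, and the inflection field is deliberately left as a raw bit value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for one output line such as "sleep|slept|1024|32|i|1|".
// Assumed field layout: input|output|category|inflection|flow history|flow number|
final class LvgOutputLine {
    record Row(String input, String output, long category, long inflection,
               String flowHistory, int flowNumber) {}

    // LVG category bit values: adj=1, adv=2, aux=4, ..., noun=128, ..., verb=1024.
    private static final String[] CATEGORY_NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det", "modal", "noun", "prep", "pron", "verb"
    };

    static Row parse(String line) {
        String[] f = line.split("\\|");       // split() drops the trailing empty field
        if (f.length < 6) {
            throw new IllegalArgumentException("expected 6 fields: " + line);
        }
        return new Row(f[0], f[1], Long.parseLong(f[2]), Long.parseLong(f[3]),
                       f[4], Integer.parseInt(f[5]));
    }

    // Decodes a category bit mask into names; a mask may combine several bits.
    static List<String> categoryNames(long mask) {
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < CATEGORY_NAMES.length; bit++) {
            if ((mask & (1L << bit)) != 0) {
                names.add(CATEGORY_NAMES[bit]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Row row = parse("sleep|slept|1024|32|i|1|");
        System.out.println(row.output() + " " + categoryNames(row.category())); // slept [verb]
    }
}
```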
    

  • Implementation Logic:
    • Facts:
      1. Performs a case-insensitive search for the input term among the inflected terms in the inflection database table.
      2. Grabs the EUI and uninflected form from the found records.
      3. Searches the inflection table on both the EUI and the uninflected form.
      4. Keeps only those records (rows) that match the specified categories and inflections.
      5. Assigns term and category for both source and target.
    • Rules:
      1. Uses a persistent trie to apply rules (and check exceptions) to the input term.
      2. Deletes the output if it is known to the lexicon.
      3. Assigns term and category for both source and target.
    • Restricts the output according to the inflection output restriction (-ki).
    • Sorts the outputs by category frequency, then length, then case-insensitive alphabetical order.
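    The fact steps can be modeled over an in-memory list of rows. In this Java 16+ sketch, the `FactLookup` class, the `InflRow` columns, the bit-mask filtering, and the placeholder EUIs "E1"/"E2" are all assumptions standing in for the real inflection table and its queries. Note that the noun and verb entries of "sleep" are modeled with different EUIs, so a verb-only input returns only verb forms.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the fact steps. InflRow stands in for a row
// of the inflection table; the real LVG schema and queries may differ.
final class FactLookup {
    record InflRow(String term, long category, long inflection, String eui, String base) {}

    private record Key(String eui, String base) {}

    private final List<InflRow> table;

    FactLookup(List<InflRow> table) {
        this.table = List.copyOf(table);
    }

    // Returns target rows; the source term is the input itself.
    List<InflRow> inflect(String input, long categoryMask, long inflectionMask) {
        // Steps 1-2: case-insensitive match on inflected terms, collecting
        // the (EUI, uninflected form) of every matching record.
        Set<Key> keys = new LinkedHashSet<>();
        for (InflRow row : table) {
            if (row.term().equalsIgnoreCase(input)) {
                keys.add(new Key(row.eui(), row.base()));
            }
        }
        // Steps 3-4: all rows sharing one of those keys, restricted to the
        // requested category and inflection bits.
        List<InflRow> out = new ArrayList<>();
        for (InflRow row : table) {
            if (keys.contains(new Key(row.eui(), row.base()))
                    && (row.category() & categoryMask) != 0
                    && (row.inflection() & inflectionMask) != 0) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy table: "E1"/"E2" are placeholder EUIs for the noun and verb entries.
        FactLookup facts = new FactLookup(List.of(
                new InflRow("sleep", 128, 1, "E1", "sleep"),
                new InflRow("sleeps", 128, 8, "E1", "sleep"),
                new InflRow("sleep", 1024, 1, "E2", "sleep"),
                new InflRow("slept", 1024, 32, "E2", "sleep"),
                new InflRow("sleeping", 1024, 16, "E2", "sleep")));
        // "SLEPT" matches only the verb entry, so only verb forms come back.
        facts.inflect("SLEPT", -1L, -1L).forEach(r -> System.out.println(r.term()));
    }
}
```

    Passing -1L as a mask sets every bit and therefore applies no restriction; narrower masks correspond to step 4.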

  • Source Code: ToInflection.java

  • Hierarchy: Object -> Transformation -> ToInflection