Generate Inflectional Variants - Specifying Output Categories and Inflections

  • Short Description: Generate inflectional variants, restricted to the specified bitwise-OR'ed output categories and output inflections.

  • Full Description:

    The inflection operation can be restricted to particular output categories and inflections by specifying a category bit vector and an inflection bit vector.

    The category values can be OR'ed from the following values:

    Category    Value
    adj         1
    adv         2
    aux         4
    compl       8
    conj        16
    det         32
    modal       64
    noun        128
    prep        256
    pron        512
    verb        1024
    all         2047
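    Each category value is a distinct power of two, so any OR'ed combination can be decoded back into its categories. The sketch below shows this with the values from the table above. The class CategoryMask and its methods are hypothetical illustrations and are not part of the LVG API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: combine and decode LVG category bit values (hypothetical helper, not LVG API).
class CategoryMask {
    // Category names in table order; the i-th name has value 2^i.
    static final Map<String, Long> CATEGORIES = new LinkedHashMap<>();
    static {
        String[] names = {"adj", "adv", "aux", "compl", "conj", "det",
                          "modal", "noun", "prep", "pron", "verb"};
        for (int i = 0; i < names.length; i++) {
            CATEGORIES.put(names[i], 1L << i); // adj=1, adv=2, ..., verb=1024
        }
    }
    // "all" sets every category bit: 2^11 - 1 = 2047.
    static final long ALL = (1L << CATEGORIES.size()) - 1;

    // Bitwise-OR the values of the named categories.
    static long combine(String... names) {
        long mask = 0;
        for (String name : names) {
            Long bit = CATEGORIES.get(name);
            if (bit == null) throw new IllegalArgumentException("unknown category: " + name);
            mask |= bit;
        }
        return mask;
    }

    // List the category names whose bits are set in the mask, in table order.
    static List<String> decode(long mask) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Long> e : CATEGORIES.entrySet()) {
            if ((mask & e.getValue()) != 0) names.add(e.getKey());
        }
        return names;
    }

    public static void main(String[] args) {
        long nounOrVerb = combine("noun", "verb");
        System.out.println(nounOrVerb);         // 1152
        System.out.println(decode(nounOrVerb)); // [noun, verb]
        System.out.println(ALL);                // 2047
    }
}
```

    For example, a category value of 1152 (128 + 1024) restricts output to nouns and verbs.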

    The inflection values can be OR'ed from the following values:

    Inflection      Value
    base            1
    comparative     2
    superlative     4
    plural          8
    presPart        16
    past            32
    pastPart        64
    pres3           128
    positive        256
    singular        512
    infinitive      1024
    pres123p        2048
    pastNeg         4096
    pres123pNeg     8192
    pres1s          16384
    past1p23pNeg    32768
    past1p23p       65536
    past1s3sNeg     131072
    pres1p23p       262144
    pres1p23pNeg    524288
    past1s3s        1048576
    pres            2097152
    pres3sNeg       4194304
    presNeg         8388608
    all             16777215
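    The inflection values follow the same scheme: the 24 entries are successive powers of two, and "all" (16777215) is 2^24 - 1, the OR of every entry. The sketch below tests whether a given OR'ed inflection mask selects a particular inflection. InflectionMask is a hypothetical helper written for illustration, not an LVG class.

```java
import java.util.List;

// Sketch: check membership of an inflection in an OR'ed mask (hypothetical helper, not LVG API).
class InflectionMask {
    // Inflection names in table order; the i-th name has value 2^i.
    static final List<String> NAMES = List.of(
        "base", "comparative", "superlative", "plural", "presPart", "past",
        "pastPart", "pres3", "positive", "singular", "infinitive", "pres123p",
        "pastNeg", "pres123pNeg", "pres1s", "past1p23pNeg", "past1p23p",
        "past1s3sNeg", "pres1p23p", "pres1p23pNeg", "past1s3s", "pres",
        "pres3sNeg", "presNeg");

    // "all" sets every one of the 24 bits: 2^24 - 1 = 16777215.
    static final long ALL = (1L << NAMES.size()) - 1;

    // Value of a single named inflection, e.g. "plural" -> 8.
    static long valueOf(String name) {
        int i = NAMES.indexOf(name);
        if (i < 0) throw new IllegalArgumentException("unknown inflection: " + name);
        return 1L << i;
    }

    // True if the mask includes the named inflection's bit.
    static boolean selects(long mask, String name) {
        return (mask & valueOf(name)) != 0;
    }

    public static void main(String[] args) {
        long pluralOrSingular = valueOf("plural") | valueOf("singular");
        System.out.println(pluralOrSingular);                  // 520
        System.out.println(selects(pluralOrSingular, "base")); // false
        System.out.println(ALL);                               // 16777215
    }
}
```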

    In some domains, nouns are the vast majority of terms in the vocabulary. Furthermore, for terms which can be interpreted as either nouns or as some other categories, the noun sense is much more likely. Under these circumstances, one might want to restrict one's output to nouns and ignore the other senses of words. Further, many indexing vocabularies have nouns in their plural form rather than in their singular form, so one might want to restrict one's output to plural nouns.

    Users may enter "all" in place of "2047" (all categories) or "16777215" (all inflections). For instance, the flow option "-f:ici~128+all" returns all inflectional variants of nouns, in every inflection.
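    The following sketch shows how the "ici~CAT+INFL" flow argument could be parsed, with "all" accepted in either slot and mapped to that slot's all-bits value. The class IciOption and its parse method are hypothetical. LVG's own command-line parsing is not reproduced here and may differ, for example in how it reports errors.

```java
// Sketch: parse "ici~CAT+INFL", accepting "all" in either slot (hypothetical, not LVG's parser).
class IciOption {
    static final long ALL_CATEGORIES = 2047L;
    static final long ALL_INFLECTIONS = 16777215L;

    final long categories;
    final long inflections;

    IciOption(long categories, long inflections) {
        this.categories = categories;
        this.inflections = inflections;
    }

    // Accepts the text after "-f:", e.g. "ici~128+8", "ici~128+all", or "ici~all+8".
    static IciOption parse(String flow) {
        String prefix = "ici~";
        if (!flow.startsWith(prefix)) {
            throw new IllegalArgumentException("not an ici flow: " + flow);
        }
        String[] parts = flow.substring(prefix.length()).split("\\+");
        // Both values are required (see Difference 2), so no defaults are applied.
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected CAT+INFL: " + flow);
        }
        return new IciOption(toLong(parts[0], ALL_CATEGORIES),
                             toLong(parts[1], ALL_INFLECTIONS));
    }

    // "all" maps to the slot-specific all-bits value; anything else must be a number.
    private static long toLong(String token, long allValue) {
        return token.equals("all") ? allValue : Long.parseLong(token);
    }

    public static void main(String[] args) {
        IciOption opt = parse("ici~128+all");
        System.out.println(opt.categories + " " + opt.inflections); // 128 16777215
    }
}
```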

    The results are sorted as in the inflection flow component: by category frequency, then by length, then in case-insensitive alphabetical order.

    If the -m flag is specified, one of two kinds of detail is appended to each output, depending on whether the variant came from the inflection table (FACT) or from a morphology rule (RULE). The formats are:

    • |FACT|uninflected term|category|uninflected inflection|inflected term|category|inflected inflection|EUI|
    • |RULE|uninflected term|matched pattern|category|uninflected inflection|replaced pattern|category|inflected inflection|
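    As an illustration, the sketch below labels the fields of a FACT or RULE segment using the field names from the formats above. It assumes the segment starts at the FACT or RULE keyword, as in the example outputs on this page, and that FACT entries carry the EUI field added in 2012. MutateInfo is a hypothetical helper, not part of LVG.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: label the fields of an -m detail segment (hypothetical helper, not LVG API).
class MutateInfo {
    static final List<String> FACT_FIELDS = List.of(
        "uninflectedTerm", "uninflectedCategory", "uninflectedInflection",
        "inflectedTerm", "inflectedCategory", "inflectedInflection", "eui");
    static final List<String> RULE_FIELDS = List.of(
        "uninflectedTerm", "matchedPattern", "uninflectedCategory",
        "uninflectedInflection", "replacedPattern", "inflectedCategory",
        "inflectedInflection");

    // Parses a segment such as "FACT|leaf|noun|base|leaves|noun|plural|E0037070|".
    static Map<String, String> parse(String detail) {
        // split() drops the trailing empty string left by the final '|'.
        String[] tokens = detail.split("\\|");
        List<String> fields;
        switch (tokens[0]) {
            case "FACT": fields = FACT_FIELDS; break;
            case "RULE": fields = RULE_FIELDS; break;
            default: throw new IllegalArgumentException("unknown source: " + tokens[0]);
        }
        if (tokens.length - 1 != fields.size()) {
            throw new IllegalArgumentException("unexpected field count: " + detail);
        }
        Map<String, String> result = new LinkedHashMap<>();
        result.put("source", tokens[0]);
        for (int i = 0; i < fields.size(); i++) {
            result.put(fields.get(i), tokens[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> m = parse("FACT|leaf|noun|base|leaves|noun|plural|E0037070|");
        System.out.println(m.get("inflectedTerm") + " " + m.get("eui")); // leaves E0037070
    }
}
```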


  • Difference:
    1. Refer to the inflection flow component.
    2. In the Java version, both the category and the inflection options must be specified; neither can be omitted. If you don't care about one of them, use "all". For example, to get the noun variants of an input regardless of inflection, use -f:ici~128+16777215 (or equivalently -f:ici~128+all).
    3. EUI information was added to the -m output for facts in 2012.


  • Features:
    1. Fact: find all inflectional variants in the inflection table.
    2. Rules: find all inflectional variants using morphology rules.
    3. Assign a category and inflection to every output.
    4. Filter outputs by the restricted categories and inflections.
    5. Filter outputs by the restriction flag (-ki).
    6. Order outputs by category frequency.


  • Symbol: ici~LONG+LONG

  • Examples:
    
    shell> lvg -f:ici~128+8 -m
    elderly
    elderly|elderly|128|8|ici|1|FACT|elderly|noun|base|elderly|noun|plural|E0024667|
    elderly|elderlies|128|8|ici|1|FACT|elderly|noun|base|elderlies|noun|plural|E0024667|
    
    leaf
    leaf|leafs|128|8|ici|1|FACT|leaf|noun|base|leafs|noun|plural|E0037070|
    leaf|leaves|128|8|ici|1|FACT|leaf|noun|base|leaves|noun|plural|E0037070|
    
    neoplasm
    neoplasm|neoplasms|128|8|ici|1|FACT|neoplasm|noun|base|neoplasms|noun|plural|E0042193|
    
    shell> lvg -f:ici~128+all -m
    neoplasm
    neoplasm|neoplasm|128|1|ici|1|FACT|neoplasm|noun|base|neoplasm|noun|base|E0042193|
    neoplasm|neoplasm|128|512|ici|1|FACT|neoplasm|noun|base|neoplasm|noun|singular|E0042193|
    neoplasm|neoplasms|128|8|ici|1|FACT|neoplasm|noun|base|neoplasms|noun|plural|E0042193|
    
    shell> lvg -f:ici~all+8 -m
    neoplasm
    neoplasm|neoplasms|128|8|ici|1|FACT|neoplasm|noun|base|neoplasms|noun|plural|E0042193|
    
    left
    left|left|128|8|ici|1|FACT|left|noun|base|left|noun|plural|E0037124|
    left|lefts|128|8|ici|1|FACT|left|noun|base|lefts|noun|plural|E0037124|
    

  • Implementation Logic:
    1. Call ToInflection.InflectWords (sorted)
    2. Filter the results by the specified output categories and inflections.
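    Step 2 can be sketched as a bitwise filter over the already-sorted results. A variant is kept only if its category bit intersects the category mask and its inflection bit intersects the inflection mask. The Variant class is a hypothetical stand-in for LVG's own output objects, and step 1 (ToInflection.InflectWords) is not reproduced here.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of step 2 only (hypothetical types; not the actual ToInflectionByCatInfl code).
class CatInflFilter {
    static final class Variant {
        final String term;
        final long category;   // single category bit, e.g. 128 for noun
        final long inflection; // single inflection bit, e.g. 8 for plural

        Variant(String term, long category, long inflection) {
            this.term = term;
            this.category = category;
            this.inflection = inflection;
        }

        @Override
        public String toString() {
            return term + "|" + category + "|" + inflection;
        }
    }

    // Stream filtering preserves encounter order, so the sorting from step 1 is kept.
    static List<Variant> filter(List<Variant> sorted, long catMask, long inflMask) {
        return sorted.stream()
            .filter(v -> (v.category & catMask) != 0 && (v.inflection & inflMask) != 0)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Variants of "neoplasm" as listed in the -f:ici~128+all example above.
        List<Variant> all = List.of(
            new Variant("neoplasm", 128, 1),
            new Variant("neoplasm", 128, 512),
            new Variant("neoplasms", 128, 8));
        System.out.println(filter(all, 128, 8)); // [neoplasms|128|8]
    }
}
```

    Because filtering never reorders elements, the output keeps the sort order produced by the inflection step.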

  • Source Code: ToInflectionByCatInfl.java

  • Hierarchy: Object -> Transformation -> ToInflectionByCatInfl