Derivations - Suffix
I. What are suffix derivations
In linguistics, a suffix (also sometimes called a postfix or ending) is an affix that is placed after the stem of a word. A derivational suffix usually applies to words of one syntactic category and changes them into words of another syntactic category. For example, the suffix -ness changes the adjective "kind" into the noun "kindness".
Although derivational suffixes may be applied to words of any category, they generally generate words in only three categories: noun, verb, and adjective. Some derivational suffixes do multiple tasks. For example, the suffix -ate can create nouns, adjectives, and verbs.
Derivational suffixes can be redundant. That is, two suffixes may indicate the same category.
Derivational suffixes can be applied multiple times to create words. The last derivational suffix determines the part of speech. For example: hope (noun), hopeful (adjective), hopefulness (noun).
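The stacking behavior can be illustrated with a minimal Python sketch. The function name `derive`, the step format (suffix to strip, suffix to add, resulting category), and the sample steps are assumptions made for illustration; they are not part of Lexical Tools.

```python
def derive(base, base_cat, steps):
    """Apply suffix steps in order and record the word and category after each.

    Each step is (strip, add, new_cat): remove `strip` from the end of the
    word, append `add`, and assign `new_cat`. This step format is hypothetical.
    """
    word, cat = base, base_cat
    history = [(word, cat)]
    for strip, add, new_cat in steps:
        if not word.endswith(strip):
            raise ValueError(f"{word!r} does not end with {strip!r}")
        word = word[: len(word) - len(strip)] + add
        cat = new_cat  # the most recently applied suffix sets the category
        history.append((word, cat))
    return history


history = derive(
    "nation", "noun",
    [("", "al", "adj"), ("", "ize", "verb"), ("e", "ation", "noun")],
)
for word, cat in history:
    print(word, cat)
# The last entry is ("nationalization", "noun"): the final suffix decides.
```

Only the final step's category survives, which matches the statement that the last derivational suffix determines the part of speech.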
II. Derivational suffix list
There are several hundred derivational suffixes. We collected the most common suffixes for derivations (the derivational suffix list) in Lexical Tools.
III. Derivational suffix rules
From the suffix list, we can generate derivation rules with the following steps:
- evaluate all rules using the LEXICON as the corpus
- go through each rule and find all derivation pairs
- tag all possible suffix derivational pairs (possibly only those that differ from the output of the old lvg -f:do)
- generate a derivational suffix rule if:
  - the total number of derivation pairs is >= min. number
  - the "yes" tag percentage is >= min. acceptance
- find exceptions (pairs with a "no" tag)
- each suffix rule is in this format:
- All rules should be bi-directional and stored in one rule-list file
- The rule-list file can be used to generate the rule documents in HTML format
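The evaluation steps above can be sketched in Python. This is an illustration under stated assumptions, not the Lexical Tools implementation: the `SuffixRule` tuple, the toy lexicon, the manual tags, and the threshold values (`min_pairs=3`, `min_acceptance=0.75`) are all hypothetical.

```python
from typing import NamedTuple


class SuffixRule(NamedTuple):
    # Hypothetical in-memory representation, not the rule-list file format.
    src_suffix: str
    src_cat: str
    tgt_suffix: str
    tgt_cat: str


def reverse(rule):
    """Rules are bi-directional: swap the source and target sides."""
    return SuffixRule(rule.tgt_suffix, rule.tgt_cat, rule.src_suffix, rule.src_cat)


def find_pairs(rule, lexicon):
    """Find all (source, target) pairs where both words are in the lexicon."""
    pairs = []
    for word, cats in sorted(lexicon.items()):
        if rule.src_cat in cats and word.endswith(rule.src_suffix):
            stem = word[: len(word) - len(rule.src_suffix)]
            target = stem + rule.tgt_suffix
            if rule.tgt_cat in lexicon.get(target, set()):
                pairs.append((word, target))
    return pairs


def evaluate_rule(rule, lexicon, tags, min_pairs=3, min_acceptance=0.75):
    """Accept the rule if it has enough pairs and enough "yes" tags."""
    pairs = find_pairs(rule, lexicon)
    yes = [p for p in pairs if tags.get(p, False)]
    exceptions = [p for p in pairs if not tags.get(p, False)]
    ratio = len(yes) / len(pairs) if pairs else 0.0
    accepted = len(pairs) >= min_pairs and ratio >= min_acceptance
    return accepted, ratio, exceptions


# Toy lexicon (word -> categories) and manual yes/no tags, invented for the demo.
lexicon = {
    "teacher": {"noun"}, "teach": {"verb"},
    "singer": {"noun"}, "sing": {"verb"},
    "worker": {"noun"}, "work": {"noun", "verb"},
    "number": {"noun"}, "numb": {"adj", "verb"},
}
tags = {
    ("teacher", "teach"): True,
    ("singer", "sing"): True,
    ("worker", "work"): True,
    ("number", "numb"): False,  # both words exist, but they are not related
}

rule = SuffixRule("er", "noun", "", "verb")
accepted, ratio, exceptions = evaluate_rule(rule, lexicon, tags)
print(accepted, ratio, exceptions)
```

With this toy data the rule matches four pairs, three of them tagged "yes", so it meets both thresholds, and the "number"/"numb" pair is reported as an exception.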
IV. Suffix derivations generation
Derivational suffix rules are stored in and retrieved from a trie to generate derivational variants. Heuristic rules are implemented to filter out unrealistic derivational variants generated by the rules. They are:
- Min. length of stem in trie tree:
The stem length is the length of the word minus the length of the input suffix in the rule. If the stem is too short, the derivational variants generated from the rules are usually poor guesses and should be filtered out. The trie algorithm uses the stem length to filter out such cases. For example, the length of the input suffix (ic$) is 2. If the input term is "zoic", the length of the stem ("zo") is 2 (= 4 - 2). Accordingly, the rule-generated derivational variant "zoy" is filtered out from the derivational variants of "zoic" by this rule (with the default value of 3).
- Min. length of a term:
If the length of a term is too small (default value is 3), the word is usually an acronym or carries little meaning. Such terms can be filtered out by this rule.
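A minimal Python sketch of trie-based rule lookup with both filters follows. It rests on several assumptions: the class `SuffixTrie`, its methods, and the sample rule (ic$ to y$) are hypothetical; suffixes are stored in reverse so that lookup walks from the end of the term; a length equal to the minimum is treated as acceptable; and the term-length filter is applied to the input term only.

```python
class SuffixTrie:
    """Stores suffix rules keyed by the reversed input suffix (hypothetical design)."""

    def __init__(self, min_stem_len=3, min_term_len=3):
        self.root = {"children": {}, "rules": []}
        self.min_stem_len = min_stem_len
        self.min_term_len = min_term_len

    def add_rule(self, in_suffix, out_suffix):
        node = self.root
        for ch in reversed(in_suffix):
            node = node["children"].setdefault(ch, {"children": {}, "rules": []})
        node["rules"].append((in_suffix, out_suffix))

    def variants(self, term):
        # Filter: min. length of a term.
        if len(term) < self.min_term_len:
            return []
        results = []
        node = self.root
        depth = 0
        for ch in reversed(term):
            node = node["children"].get(ch)
            if node is None:
                break
            depth += 1
            stem_len = len(term) - depth
            # Filter: min. length of stem. Longer suffixes would leave even
            # shorter stems, so the walk can stop here.
            if stem_len < self.min_stem_len:
                break
            stem = term[:stem_len]
            for _in_suffix, out_suffix in node["rules"]:
                results.append(stem + out_suffix)
        return results


trie = SuffixTrie()
trie.add_rule("ic", "y")  # assumed rule for the demo

print(trie.variants("allergic"))  # stem "allerg" has length 6, so "allergy" is kept
print(trie.variants("zoic"))      # stem "zo" has length 2, so "zoy" is filtered out
print(trie.variants("ic"))        # term shorter than the minimum term length
```

Checking the stem length during the walk, rather than after generating candidates, means short-stem variants are never built in the first place.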