Lexical Tools uses the SPECIALIST LEXICON as the corpus for mutations in many flow components. The data in the LEXICON are processed, converted into relational database format, and stored in embedded database tables in Lexical Tools. These flows include inflectional variants, derivational variants, acronyms, antiNorm, canonical forms, fruitful variants, nominalizations, properNoun, synonyms, etc. Most of these tables are generated and updated by computer programs for the annual release. However, the derivations and synonyms tables are not updated by these programs. This section describes a new, enhanced methodology for generating the derivations table from the LEXICON annually.
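To illustrate what storing derivation facts in relational form and retrieving them by SQL can look like, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the embedded database. The table name `derivation`, its columns, the helper functions, and the example pairs are hypothetical and are not the actual Lexical Tools schema or data.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_derivation_table(pairs: Iterable[Tuple[str, str, str, str]]) -> sqlite3.Connection:
    """Load (term, category, derived term, derived category) facts into an
    in-memory table. The schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE derivation (term TEXT, cat TEXT, der_term TEXT, der_cat TEXT)"
    )
    conn.executemany("INSERT INTO derivation VALUES (?, ?, ?, ?)", pairs)
    # Index both lookup columns so queries in either direction stay fast.
    conn.execute("CREATE INDEX idx_term ON derivation (term)")
    conn.execute("CREATE INDEX idx_der_term ON derivation (der_term)")
    return conn


def lookup_derivations(conn: sqlite3.Connection, term: str) -> List[str]:
    """Return the known derivations of term. Each pair is stored once,
    so both directions are queried and the results merged."""
    rows = conn.execute(
        "SELECT der_term FROM derivation WHERE term = ? "
        "UNION SELECT term FROM derivation WHERE der_term = ? "
        "ORDER BY 1",
        (term, term),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_derivation_table([
        ("activate", "verb", "activation", "noun"),  # illustrative pairs only
        ("abduct", "verb", "abduction", "noun"),
    ])
    print(lookup_derivations(conn, "activation"))  # ['activate']
```

Storing each pair once and querying both columns avoids duplicate rows; storing both directions explicitly would be an equally valid design with simpler queries.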
The derivational variants flow component is one of the most commonly used functions in Lexical Tools. Derivational variants are terms that are related to the original term but do not share exactly the same meaning (they are close in meaning). The existing derivational variants algorithm uses both facts (known derivations in the derivations table) and rules (adding, changing, or removing common suffixes). The facts, 4,559 records developed since the C version of Lexical Tools, are stored in a database and retrieved by SQL queries. Suffix derivation rules are stored and retrieved through a trie to generate derivational variants. Three heuristic options are implemented in the Java version to filter out unrealistic derivational variants generated by the rules:
- Minimum term length: ensures the derivation is a real word
- Minimum stem length in the trie: ensures the applied rules are realistic for real words
- Derivational-flow-specific filter option (-kd:int): ensures the word is known to the LEXICON
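The fact-then-rule flow and the three filters above can be sketched as follows. This is a minimal, hypothetical illustration, not the Lexical Tools implementation: the class and function names, the example rules, and the default thresholds are assumptions, and the `-kd` option is approximated by a plain set of known words.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class SuffixRule:
    """Hypothetical suffix rule: replace in_suffix with out_suffix."""
    in_suffix: str
    out_suffix: str


@dataclass
class _TrieNode:
    children: Dict[str, "_TrieNode"] = field(default_factory=dict)
    rules: List[SuffixRule] = field(default_factory=list)


class SuffixTrie:
    """Trie keyed on reversed suffixes: one right-to-left walk over a word
    finds every rule whose in_suffix ends that word."""

    def __init__(self) -> None:
        self.root = _TrieNode()

    def add(self, rule: SuffixRule) -> None:
        node = self.root
        for ch in reversed(rule.in_suffix):
            node = node.children.setdefault(ch, _TrieNode())
        node.rules.append(rule)

    def matches(self, word: str) -> List[SuffixRule]:
        found = list(self.root.rules)  # rules with an empty in_suffix
        node = self.root
        for ch in reversed(word):
            node = node.children.get(ch)
            if node is None:
                break
            found.extend(node.rules)
        return found


def derivational_variants(
    term: str,
    facts: Dict[str, Set[str]],
    trie: SuffixTrie,
    min_term_length: int = 3,  # assumed default
    min_stem_length: int = 2,  # assumed default
    known_words: Optional[Set[str]] = None,
) -> List[str]:
    """Known derivations (facts) plus rule-generated candidates that pass the filters."""
    variants = set(facts.get(term, set()))
    for rule in trie.matches(term):
        stem = term[: len(term) - len(rule.in_suffix)]
        if len(stem) < min_stem_length:
            continue  # stem too short for the rule to be realistic
        candidate = stem + rule.out_suffix
        if candidate == term or len(candidate) < min_term_length:
            continue  # not a new, plausibly real word
        if known_words is not None and candidate not in known_words:
            continue  # simplified stand-in for the LEXICON (-kd) filter
        variants.add(candidate)
    return sorted(variants)


if __name__ == "__main__":
    trie = SuffixTrie()
    for r in [SuffixRule("ate", "ation"), SuffixRule("ation", "ate"),
              SuffixRule("ize", "ization"), SuffixRule("al", "ality")]:
        trie.add(r)
    print(derivational_variants("activate", {}, trie))  # ['activation']
    print(derivational_variants("date", {}, trie))  # [] (stem "d" is too short)
    print(derivational_variants("metal", {}, trie, known_words={"formality"}))  # []
```

Keying the trie on reversed suffixes means every rule that applies to a word is found in one pass proportional to the word's length, instead of testing each rule separately.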
The derivations table (facts) has not been updated since the first Java release of Lexical Tools (2002), while the LEXICON is released annually. The main reason is that no derivational relationships (or meanings) are coded directly in the LEXICON. The current derivations table (facts) has four issues:
- Synonyms: some derivational pairs (facts) in the original derivations table are synonyms and should be removed from the table.
- Zero derivations: only very few zero derivation pairs are included.
- Prefix derivations: derivations by prefix are not included.
- Suffix derivations: only a limited number of known suffix derivations (facts) are included. Most suffix derivations are generated by suffix rules, which reduces precision.
The following section describes the original derivations table in detail.
The following section describes a new methodology that addresses the above issues to enhance derivational variants generation.