LVG Rules: Trie Introduction
A trie algorithm is used in LVG to find inflection variants and derivation. Rules are:
- used when LVG does not find the result from fact (database table)
- used as an independent program to find the results
- driven by rules and exceptions specified in files
- implements as trie tree (refer to Data Structure and Algorithms, by Aho, Hopcroft, and Ullman, PP.163-169)
The high level algorithm flows are described as follows:
- Establish the trie tree by reading data (tree structure, rules, and exceptions) from files.
- Read in input term.
- Traverse the trie tree and find all match nodes.
- Check if the input term belongs to exceptions of the applied rules.
- Apply rules in match nodes to the input term.
III. Trie and Tree
Tree has been used as powerful search algorithm. It includes binary tree and B-tree. A binary tree (searching) stores all data into a tree structure with all nodes have 2 branches at most. The B-tree was invented by Bayer and McCreight in the early 70's. It's now part of every database system. A node in a B-tree may have multiple branches.
Trie is derived from the middle letters of the word "retrieval.
In real applications, tree stores a whole word as a node while trie break a word into characters and store each single character into the tree structure.
< Example >
Data: be, bed, beset, bess, best, bet, better.
+--- Node (bed)
| +--- Node (be)
| +--- Node (beset)
+--- Node (bet)
+--- Node (best)
+--- Node (better)
+--- Node (e *)
+--- Node (d *)
+--- Node (s)
| +--- Node (e)
| | |
| | +--- Node (t *)
| +--- Node (s *)
| +--- Node (t *)
+--- Node (t *)
+--- Node (t)
+--- Node (e)
+--- Node (r *)