LVG Database Issues

  • Difficulties in distributing a shrink-wrapped version of LVG:

    Prior to lvg2002, the Lexical Tools were written in C and, for part of the code only, Java. Sun Microsystems designed Java as a platform-independent language, so distributing the shrink-wrapped version of LVG in Java should resolve many of the platform compatibility problems that the C version faced. In addition, the C source code of LVG was developed over many years by several different developers. It is legacy code that is very hard to maintain or enhance.

  • Difficulties in developing LVG in Java:

    As discussed above, Java is a good solution for the shrink-wrapped version of LVG. However, several technical issues need to be addressed first.

    1. Database:

      Issues: LVG requires a large number of table lookup operations. A Berkeley B-tree (rather than a full database) is used for these operations. However, the Berkeley B-tree does not work well in Java (on the NT platform). In short, a mechanism is needed to replace the Berkeley B-tree in order to develop LVG in Java.


    2. Data persistence:

      Issues: LVG includes a morphology function. Words are stored in reverse order to facilitate its operation, and this information needs to be persistent. Java provides object serialization through the Serializable and Externalizable interfaces. However, its performance is relatively slow. Thus, a better mechanism for storing this information on the hard disk is needed.
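
    The reversed-word storage and the serialization step described in item 2 can be sketched as follows. This is a minimal illustration in modern Java, not LVG code: the class name ReverseWordIndex and all of its methods are hypothetical, and it assumes that words are reversed so that words sharing a suffix sort next to each other. It uses a sorted in-memory set and the standard Serializable mechanism, which is the approach the text describes as relatively slow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not part of the LVG code base.
class ReverseWordIndex implements Serializable {
    private static final long serialVersionUID = 1L;

    // Keys are reversed words, so a shared suffix becomes a shared prefix
    // and the matching keys form one contiguous range in sorted order.
    private final TreeSet<String> reversedWords = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    void add(String word) {
        reversedWords.add(reverse(word));
    }

    // Returns every stored word that ends with the given suffix,
    // in the sorted order of the reversed keys.
    List<String> wordsEndingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> result = new ArrayList<>();
        for (String key : reversedWords.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break; // past the end of the matching range
            }
            result.add(reverse(key));
        }
        return result;
    }

    // Serializes the index with standard Java object serialization.
    byte[] toBytes() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores an index previously produced by toBytes().
    static ReverseWordIndex fromBytes(byte[] data) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ReverseWordIndex) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReverseWordIndex index = new ReverseWordIndex();
        index.add("walking");
        index.add("talking");
        index.add("walked");
        ReverseWordIndex restored = fromBytes(index.toBytes());
        System.out.println(restored.wordsEndingWith("ing"));
    }
}
```

    The byte array stands in for a file on disk. A real replacement would need to avoid deserializing the whole structure on every start, which this sketch does not attempt.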