Install/Run Other Version of Data Set
Since the TC.2009 release, Jdi, Sti, Stri, and StWsd can be run with other versions of the TC data set. The supported data sets include TC.2007, TC.2008, and TC.2009. The procedures are detailed as follows:
II. Install Data Set
- Download data set from TC web site:
- Uncompress and unarchive this file into the top directory of TC. On a Linux machine, using the 2008 version as an example, this would look like:
> mv tcData.2008.tgz ${TC_DIR}
> cd ${TC_DIR}
> gtar -xzvf tcData.2008.tgz
- Notes:
After this step, you should see the data directory under ${TC_DIR} as data.2008. This directory includes a complete data set for TC.2008:
- Config: default configuration file (used for reference)
- HSqlDb: Database for TC.2008
- Jdi: files used for Jdi
  - contractions.txt
  - jds.txt
  - jidTaJds.txt
  - restrictWords.txt
  - shs.txt
  - stopWords.txt
  - wordSignalWcDcGt1.txt
- Sti: files used for Sti
  - sts.txt
- Stri: files used for Stri
  - stJdTable.txt
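As a quick sanity check after unpacking, the listing above can be turned into a small shell function. This is only a sketch: the function name check_tc_data is hypothetical, and it assumes that the files are nested under the Jdi, Sti, and Stri subdirectories exactly as listed, which may not match every release.

```shell
#!/bin/sh
# Hypothetical helper (not part of TC): report entries from the listing above
# that are missing under a data directory. The nested layout is an assumption.
check_tc_data() {
    data_dir=$1
    missing=0
    for path in \
        Config HSqlDb \
        Jdi/contractions.txt Jdi/jds.txt Jdi/jidTaJds.txt \
        Jdi/restrictWords.txt Jdi/shs.txt Jdi/stopWords.txt \
        Jdi/wordSignalWcDcGt1.txt \
        Sti/sts.txt \
        Stri/stJdTable.txt
    do
        # -e accepts both directories (Config, HSqlDb) and regular files
        if [ ! -e "$data_dir/$path" ]; then
            echo "missing: $data_dir/$path"
            missing=1
        fi
    done
    return $missing
}
```

For example, check_tc_data "${TC_DIR}/data.2008" prints one line per missing entry and returns a non-zero status if anything is absent.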
III. Run Program with Specified Data Set
- Use the run specified version option (-rv:STR). On a Linux machine, using Jdi with the 2008 version as an example, this would look like:
> cd ${TC_DIR}/bin
> jdi -rv:2008 -p
- Notes:
The -rv:STR option performs the following steps in the background to run a different version:
- Database: overrides the database name in the configuration file according to the specified version (see table below).
- Data files directory: overrides the directory name of all TC input files in the configuration file according to the specified version (see table below).
- Default Max. normalized signal: the value of Max. normalized signal is updated yearly. The option sets the value of Max. signal in inputFilterOption.legalWordOption according to the specified version (see table below).
Version  DATA_DIR    DB_NAME  Max. Signal
2011     data/       tc2011   792054
2010     data.2010   tc2010   754648
2009     data.2009   tc2009   705815
2008     data.2008/  tc2008   645881
2007     data.2007/  tc2004   510754
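
The per-version overrides in the table can be written down as a simple lookup. The sketch below is illustrative only and is not how the -rv:STR option is implemented in TC. The function name tc_version_settings is hypothetical, the values are copied from the table, and trailing slashes on directory names are omitted.

```shell
#!/bin/sh
# Hypothetical lookup (not part of TC): print "DATA_DIR DB_NAME MAX_SIGNAL"
# for a given version, using the values from the table above.
tc_version_settings() {
    case $1 in
        2011) echo "data tc2011 792054" ;;
        2010) echo "data.2010 tc2010 754648" ;;
        2009) echo "data.2009 tc2009 705815" ;;
        2008) echo "data.2008 tc2008 645881" ;;
        2007) echo "data.2007 tc2004 510754" ;;  # DB name exactly as listed in the table
        *)    echo "unsupported version: $1" >&2; return 1 ;;
    esac
}
```

For example, tc_version_settings 2008 prints the data directory, database name, and Max. signal for the 2008 data set on one line, and an unknown version returns a non-zero status.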