Release Notes
These release notes summarize the new and enhanced features and the bug fixes in the most recent releases of the Java Text Categorization (TC) Tools.
Version 2008
The 2008 release is the second version of the Java Text Categorization Tools.
This version includes four main script programs:
- mlt (MEDLINE Tokenizer)
- jdi (Journal Descriptor Indexing)
- sti (Semantic Type Indexing)
- stri (Semantic Type Indexing, Real-Time)
This release completes 45 software change requests (SCRs), which are summarized below.
I. Main feature enhancements:
- Release Package
  - Distributed with JRE 1.6.0_5
  - Distributed with HSqlDb 1.8.0.7 (HyperSonic SQL DB)
  - Provides sample Java code for using the Text Categorization Tools APIs
  - Provides a configurable option to automatically locate the TC top directory
  - Provides version information in ${TC_DIR}/data/versions.txt
- TC Data
  - New design of the WORD_ST table in the database
  - Released with the latest tables from MEDLINE.2008 and Metathesaurus.2007AC
  - Released with the latest tables for JDI/STI/STRI
  - Uses a newly designed algorithm to generate stDocuments for STI and STRI
  - Updated stopWords list
  - Updated default value of the Max. cutoff for normalized count
- Java APIs
  - New APIs for calculating the similarity of JD vectors
  - New APIs for calculating the similarity of ST vectors
- Tool Options
  - New feature for choosing the score display factor used to display scores in JDI/STI/STRI
  - New feature for choosing an output fields filter in JDI/STI/STRI
  - New feature for displaying detailed scores for all legal words in JDI/STI/STRI
  - New feature for displaying Semantic Types as TUIs, ST names, and ST abbreviations in STI/STRI
  - New feature for applying an Acronym filter option in JDI/STI/STRI
  - New feature for indexing MH/SH in STRI
  - New feature for preserving the PMID in MLT
  - New feature for sorting results by PMID in MLT
  - New feature for accepting a unique identifier in the input for JDI/STI/STRI
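The Java APIs item above mentions new APIs for calculating the similarity of JD and ST vectors. These notes do not say which similarity measure or which class names the TC APIs use, so the following is only a minimal sketch under stated assumptions: it treats a JD or ST vector as a map from descriptor name to score and compares two such vectors with cosine similarity. The class name VectorSimilaritySketch, the method cosine, and the sample descriptor names and scores are all hypothetical and are not part of the TC distribution.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the TC API.
public class VectorSimilaritySketch {

    // Cosine similarity of two sparse score vectors keyed by descriptor name.
    // Assumption: cosine similarity is used purely as an example measure.
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;

        for (Map.Entry<String, Double> e : a.entrySet()) {
            double x = e.getValue();
            normA += x * x;
            Double y = b.get(e.getKey());
            if (y != null) {
                dot += x * y; // only shared descriptors contribute to the dot product
            }
        }
        for (double y : b.values()) {
            normB += y * y;
        }

        // A zero vector has no direction; report 0 similarity by convention.
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up scores for two documents; names are sample data only.
        Map<String, Double> doc1 = new LinkedHashMap<>();
        doc1.put("Cardiology", 0.8);
        doc1.put("Pediatrics", 0.1);

        Map<String, Double> doc2 = new LinkedHashMap<>();
        doc2.put("Cardiology", 0.6);
        doc2.put("Neurology", 0.3);

        System.out.println("similarity = " + cosine(doc1, doc2));
    }
}
```

For the actual classes and method signatures, consult the JavaDoc and the sample Java code shipped with the release package.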
II. Bug fixes:
- Bug fixes in JavaDoc
- Bug fixes in the Count2fOperator Java class
- Bug fixes in the Count2fStOperator Java class