The 2007 version of the Text Categorization tool is the first official public release. It was developed in pure Java and can handle UTF-8 text. Below are some specifications of this tool.
IO
UTF-8.
Program
Provides scripts for the command-line tools
Configuration file
Provides options that can be set in the configuration file
APIs
Provides Java API classes
Database
Embedded in the project, using the HyperSQL database
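
As an illustration of the UTF-8 I/O item above, the following is a minimal sketch of reading and writing UTF-8 text in pure Java using only the standard library. The class name Utf8IoSketch and its methods are hypothetical and are not part of the tool's API; the sketch only shows the general technique of passing an explicit charset rather than relying on the platform default.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the Text Categorization tool.
public class Utf8IoSketch {

    // Write the lines to the file, encoding them explicitly as UTF-8.
    static void writeLines(Path file, List<String> lines) throws IOException {
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Read all lines from the file, decoding them explicitly as UTF-8.
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("utf8-sketch-", ".txt");
        try {
            List<String> input = Arrays.asList("naïve café", "分類");
            writeLines(tmp, input);
            List<String> output = readLines(tmp);
            // Prints whether the non-ASCII text survived the round trip.
            System.out.println(input.equals(output));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Naming the charset on both the write and the read side keeps the result independent of the default encoding of the machine the program runs on.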