stWsd
The StWsd tool applies Sti/Stri to word sense disambiguation (WSD). Sti uses the context (a phrase or one or more sentences) to disambiguate an ambiguous word whose meanings belong to different semantic types (STs), called ST candidates. The ST candidate with the highest score (rank) is presumed to be correct. ST documents (stDocuments) and Journal Descriptor Indexing (JDI) scores are the two main elements of STI scores. An ST document is a set of one-word Metathesaurus strings associated with an ST. JDI is a methodology that gives consistent results for categorizing input text according to biomedical specialties, known as JDs. An optimal ST document contains the words that best represent the ST; the better the representation, the better the STI result. A new methodology was developed to enhance ST documents and so achieve better WSD precision.
The StWsd tool provides a simple interface for finding the best sense (ST) of an ambiguous word, chosen from the given ST candidates, for a phrase or one or more sentences. In short, three inputs are required to run StWsd:
- Ambiguous word
- ST candidates (the possible senses, given as STs)
- Context (a phrase or one or more sentences)
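The selection step described above can be sketched in a few lines of Java. This is a minimal illustration, assuming the STI score of each ST candidate for the given context has already been computed. The class name `BestStPicker`, the method `pickBest`, and the scores used in `main` are hypothetical; they are not part of the Text Categorization tools and are not real STI output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: not part of the Text Categorization tools. */
final class BestStPicker {

    /**
     * Returns the ST candidate with the highest score, or empty when no
     * candidates are given. On a tie, the candidate encountered first in
     * the map's iteration order is kept.
     */
    static Optional<String> pickBest(Map<String, Double> scoresByCandidate) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scoresByCandidate.entrySet()) {
            if (best == null || entry.getValue() > bestScore) {
                best = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        // Hypothetical scores for the ambiguous word "culture";
        // real values would come from STI.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("idcn", 0.62);
        scores.put("lbpr", 0.41);
        System.out.println(pickBest(scores).orElse("none"));
    }
}
```

An insertion-ordered map is used so that tie handling is predictable; how StWsd itself resolves ties is not stated here.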
Follow the installation instructions to install the Text Categorization tools and run the sti program. Check the following items only if you did not use the provided script to install the Text Categorization tools.
- CLASSPATH:
  - Include the Text Categorization tools distribution jar file, ${TC_DIR}/lib/tc2011dist.jar, in your CLASSPATH.
  - Include the TC top directory in your CLASSPATH.
- Configuration file: assign the full path of the top directory of tc2011 to a variable named ROOT_DIR in the configuration file, data/Config/tc.properties.
- Run the Java program
Enter the command:
    > stWsd -aw:culture -can:idcn:lbpr -p
    - Please input a phrase or sentence/s (type "Ctl-d" to quit) > Cultural assessment in home healthcare.
    --> Found best sense for [culture] in the ST of [idcn|Idea or Concept]
    - Please input a phrase or sentence/s (type "Ctl-d" to quit) > The major differentiation products of maturing keratinocytes contain AP-1 regulatory motifs, and AP-1 DNA binding activity increases in cultured keratinocytes induced to differentiate by calcium.
    --> Found best sense for [culture] in the ST of [lbpr|Laboratory Procedure]
where:
- stWsd: the script that runs the StWsd Java class
- -aw: the ambiguous word
- -can: the ST candidates (separated by colons in the example above)
- -p: sets the StWsd system option to show the prompt (try the -h option!)
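To make the option syntax concrete, the sketch below splits arguments of the form shown above into their parts. It is an illustration in Java and is not the parser used by StWsd. The class `StWsdArgs` and its fields are hypothetical names, and the assumption that ST candidates are separated by colons is inferred only from the example `-can:idcn:lbpr`. Only the three options listed above are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical holder for the options listed above. */
final class StWsdArgs {
    String ambiguousWord;                               // value of -aw
    final List<String> candidates = new ArrayList<>();  // values of -can
    boolean showPrompt;                                 // true when -p is present

    /** Parses the -aw, -can and -p options; rejects anything else. */
    static StWsdArgs parse(String[] args) {
        StWsdArgs parsed = new StWsdArgs();
        for (String arg : args) {
            if (arg.startsWith("-aw:")) {
                parsed.ambiguousWord = arg.substring("-aw:".length());
            } else if (arg.startsWith("-can:")) {
                // Assumption: candidates are colon-separated, as in -can:idcn:lbpr.
                String list = arg.substring("-can:".length());
                parsed.candidates.addAll(Arrays.asList(list.split(":")));
            } else if (arg.equals("-p")) {
                parsed.showPrompt = true;
            } else {
                throw new IllegalArgumentException("Unrecognized option: " + arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        StWsdArgs parsed = parse(new String[] {"-aw:culture", "-can:idcn:lbpr", "-p"});
        System.out.println("word=" + parsed.ambiguousWord
                + " candidates=" + parsed.candidates
                + " prompt=" + parsed.showPrompt);
    }
}
```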
StWsd takes text as input:
- Phrase
- Sentence
- Sentences
StWsd calculates the combined STI scores of the input text for both word counts and document counts, and sends the highest-ranked ST among the ST candidates to the output. If the detail flag, -d, is used, the results also include the filtering details of the final words and the ST scores in the following format:
| Rank | ST Scores | ST abbreviation | ST name |
|---|---|---|---|
Please refer to the design document.
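As a rough illustration of how ranked results map onto the four columns listed above (rank, ST score, ST abbreviation, ST name), the Java sketch below sorts scored candidates and renders one row per candidate. It is not the StWsd output code. The class `StRanking`, the pipe-separated row layout, the four-decimal score format, and the scores in `main` are all assumptions made for the example; only the ST abbreviations and names are taken from the session shown earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative only: not the StWsd output code. */
final class StRanking {

    /** One scored ST candidate. */
    static final class ScoredSt {
        final String abbreviation;
        final String name;
        final double score;

        ScoredSt(String abbreviation, String name, double score) {
            this.abbreviation = abbreviation;
            this.name = name;
            this.score = score;
        }
    }

    /** Sorts by descending score and renders "rank|score|abbreviation|name" rows. */
    static List<String> toRows(List<ScoredSt> candidates) {
        List<ScoredSt> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((ScoredSt s) -> s.score).reversed());
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            ScoredSt s = sorted.get(i);
            // Assumed layout: pipe-separated fields, score with four decimals.
            rows.add(String.format(Locale.ROOT, "%d|%.4f|%s|%s",
                    i + 1, s.score, s.abbreviation, s.name));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical scores; real values would come from STI.
        List<ScoredSt> candidates = Arrays.asList(
                new ScoredSt("lbpr", "Laboratory Procedure", 0.41),
                new ScoredSt("idcn", "Idea or Concept", 0.62));
        toRows(candidates).forEach(System.out::println);
    }
}
```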