StWsd Java

Introduction

The StWsd tool applies Sti/Stri to disambiguate ambiguous words. Sti uses the context (a phrase or one or more sentences) to disambiguate an ambiguous word whose meanings correspond to different semantic types (STs), called ST candidates. The ST candidate with the highest score (rank) is presumed to be correct. St documents (stDocuments) and Journal Descriptor Indexing (JDI) scores are the two main elements of STI scores. An St document is a set of one-word Metathesaurus strings associated with an ST. JDI is a methodology that produces consistent results when categorizing input text according to biomedical specialties, known as Journal Descriptors (JDs). An optimal St document contains the words that best represent its ST; the better the representation, the better the STI result. A new methodology has been developed to enhance St documents and so improve the precision of WSD.
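The selection rule described above (the ST candidate with the highest score is presumed correct) can be sketched as follows. This is only an illustration: the class `StCandidateSelector`, its method `best`, and the scores are hypothetical and are not part of the StWsd distribution; `dsyn` and `tmco` are used simply as two example ST abbreviations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of StWsd.
final class StCandidateSelector {

    // Returns the ST candidate with the highest score, or empty if no
    // candidates were given. On a tie, the candidate seen first wins.
    static Optional<String> best(Map<String, Double> scoresByCandidate) {
        String bestSt = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : scoresByCandidate.entrySet()) {
            if (bestSt == null || entry.getValue() > bestScore) {
                bestSt = entry.getKey();
                bestScore = entry.getValue();
            }
        }
        return Optional.ofNullable(bestSt);
    }

    public static void main(String[] args) {
        // Made-up scores; real STI scores come from the sti program.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("dsyn", 0.62);
        scores.put("tmco", 0.41);
        System.out.println(best(scores).orElse("none"));
    }
}
```

How StWsd actually computes and combines the scores is not shown here; the sketch covers only the final "pick the top-ranked candidate" step.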

The StWsd tool provides a simple interface for finding the best sense (ST) of an ambiguous word, from a given set of ST candidates, for a phrase or sentence(s). In short, three inputs are required to run StWsd:

It also provides other options, such as using the ambiguous sentences when the input is a paragraph, or showing details.

SetUp

Follow the installation instructions to install the Text Categorization tools and run the sti program. Check the following items only if you did not use the provided script to install the Text Categorization tools.

TestRun

Input

StWsd takes text as input:

Output

StWsd calculates the combined STI scores of the input text for both word counts and document counts and sends the highest-ranked ST among the ST candidates to the output. If the detail flag, -d, is used, the results also include the filtering details of the final words and the ST scores, in the following format:

Rank|ST Scores|ST abbreviation|ST name
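A detail line with these four fields could be read into a small record as sketched below. The sketch assumes the fields are separated by `|`, which the text above does not confirm, and the class `StDetailLine` is hypothetical. The score field is kept as text because the exact score formatting is not specified.

```java
// Hypothetical parser for one detail line; assumes '|' as field separator.
final class StDetailLine {
    final int rank;
    final String score;        // kept as text; numeric format is not specified
    final String abbreviation;
    final String name;

    private StDetailLine(int rank, String score, String abbreviation, String name) {
        this.rank = rank;
        this.score = score;
        this.abbreviation = abbreviation;
        this.name = name;
    }

    // Splits a line into the four fields: rank, ST score, ST abbreviation, ST name.
    static StDetailLine parse(String line) {
        String[] fields = line.split("\\|", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields but found " + fields.length);
        }
        return new StDetailLine(
                Integer.parseInt(fields[0].trim()),
                fields[1].trim(),
                fields[2].trim(),
                fields[3].trim());
    }

    public static void main(String[] args) {
        // Made-up sample line, not real StWsd output.
        StDetailLine detail = parse("1|0.6200|dsyn|Disease or Syndrome");
        System.out.println(detail.rank + " " + detail.abbreviation + " " + detail.name);
    }
}
```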

StWsd Options

Please refer to the design document.