STRI: Text
- Description:
- word frequency count
- document count for word
- Inputs:
- a phrase, such as the combination of title and abstract
- a file, such as 9801.2004.TIAB.in
- Algorithm:
- Pre-Process (Input Filter):
- Tokenize all words of the input term
- Apply Word Extraction Filter
- Apply acronym filter (TBD)
- Filter out words that are not legal
- Filter out duplicated words if unique flag is true
- Assign the final words for processing
- Process:
- Get JDI scores for the input text
- Calculate vector similarity (cosine coefficient) between the JDI scores (from above, word-JD) and the ST-JD scores
- Post-process (Output Filter):
- Print out input text (term)
- Detail output filter
- Score entries display number
- No output message
- Cluster option
- ST candidates
- Use alphabetical order for STs that have the same score
- Sample commands:
Read in the input text and perform ST real-time indexing based on
> stri -p => index a text from standard input with prompt
> stri -i:9801.2004.TIAB.in -o:9801.2004.TIAB.out => index text from file, 9801.2004.TIAB.in, and send the results to a file, 9801.2004.TIAB.out
- a file, such as 9801.2004.TIAB.out
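
The algorithm steps listed above (input filter, cosine coefficient, ordering of tied STs) can be illustrated with a minimal Python sketch. This is not the STRI implementation. The function names, the regular-expression tokenizer, the `legal_words` set, and the labels `st_a`, `st_b`, `st_c` are all hypothetical, and the score vectors are assumed to be already available as plain lists of numbers, since this outline does not describe how JDI or ST-JD scores are computed.

```python
import math
import re


def preprocess(text, legal_words, unique=False):
    """Tokenize the text, keep only legal words, optionally drop duplicates.

    The tokenizer (lowercase alphanumeric runs) and the legal-word set
    are assumptions made for this sketch.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = [t for t in tokens if t in legal_words]
    if unique:
        # dict.fromkeys keeps the first occurrence of each word, in order
        words = list(dict.fromkeys(words))
    return words


def cosine(u, v):
    """Cosine coefficient of two equal-length score vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_sts(text_vector, st_vectors, top_n=None):
    """Score each ST against the text vector and sort the candidates.

    Higher scores come first; STs with the same score are listed in
    alphabetical order. top_n=None returns every candidate.
    """
    scored = [(st, cosine(text_vector, vec)) for st, vec in st_vectors.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    legal = {"heart", "attack", "failure"}  # hypothetical legal-word list
    print(preprocess("Heart attack and heart failure", legal, unique=True))

    # Hypothetical score vectors, for illustration only
    st_vectors = {"st_b": [1.0, 0.0], "st_a": [1.0, 0.0], "st_c": [0.0, 1.0]}
    for st, score in rank_sts([1.0, 0.0], st_vectors):
        print(st, score)
```

Sorting on the key `(-score, name)` handles both the descending score order and the alphabetical tie-break in a single pass, which keeps the output deterministic when several STs share a score.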