Exclusive Filter: A Term is all Digit
- Description:
If a term contains nothing but digits, punctuation, and spaces, it is not a valid multiword. Two normalizations (strip punctuation and strip spaces) are performed in this filter.
- Examples:
- "3 + 1"
- $1,500
- 192.168.1.1
- (+/- 0.05)
- (+15%),
- [0-5]
- [192, 168]
- Input Term: core-term.lc
- Filter Algorithm:
- Logics:
| Description | FilterType | Notes |
|---|---|---|
| Get words from inTerm | FT_TBD | |
| Norm: strip punctuation and space | FT_TBD | |
| Check if all digit | FT_DIGIT | filtered invalid terms; all digit after strip punctuation and space |
- source code: ExFilterDigit.java
- FilterType:
FilterType.FT_DIGIT
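The steps listed above (get the words, strip punctuation and spaces, check whether only digits remain) can be sketched as follows. This is a minimal illustration, not the code in ExFilterDigit.java: the class name `DigitFilterSketch` and the method `isAllDigit` are hypothetical. It also assumes that "punctuation" covers Unicode punctuation and symbol characters (so that `$`, `+`, and `%` are stripped), and that a term must contain at least one digit to be trapped. Word splitting is folded into the whitespace stripping for brevity.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the all-digit check; not the actual ExFilterDigit.java.
class DigitFilterSketch {
    // Assumption: punctuation = Unicode punctuation (\p{P}) plus symbols (\p{S}),
    // so '$', '+', '%' are removed together with '.', ',', '/', and brackets.
    private static final Pattern PUNCT_OR_SPACE =
        Pattern.compile("[\\p{P}\\p{S}\\s]+");

    // One or more ASCII digits and nothing else.
    private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");

    // Returns true when the term is trapped by the filter, that is,
    // only digits remain after stripping punctuation and spaces.
    static boolean isAllDigit(String inTerm) {
        String stripped = PUNCT_OR_SPACE.matcher(inTerm).replaceAll("");
        return ALL_DIGITS.matcher(stripped).matches();
    }

    public static void main(String[] args) {
        String[] terms = {
            "3 + 1", "$1,500", "192.168.1.1", "(+/- 0.05)",
            "[0-5]", "[192, 168]", "20/20", "vitamin b12"
        };
        for (String term : terms) {
            System.out.println(term + " -> trapped: " + isAllDigit(term));
        }
    }
}
```

Under these assumptions every example term from the description is trapped, and so is "20/20", while a term containing letters such as "vitamin b12" passes.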
- Accuracy Test on Lexicon:
| Lexicon | Filter | Sample No | Pass No | Trap No | Exp No | Pass-Rate |
|---|---|---|---|---|---|---|
| 2014 | FT_DIGIT | 875090 | 875089 | 1 | 0 | 99.9999% |
| 2015 | FT_DIGIT | 896213 | 896212 | 1 | 0 | 99.9999% |

There is a valid word "20/20" in the Lexicon, which is trapped by this filter.
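The reported pass rates can be reproduced from the counts, assuming that Pass-Rate is Pass No divided by Sample No, expressed as a percentage with four decimal places. That formula is an inference from the figures, not something the page states, and the class name `PassRateSketch` and the method `passRate` are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper; assumes Pass-Rate = Pass No / Sample No.
class PassRateSketch {
    // Formats the pass rate as a percentage with four decimal places.
    static String passRate(long passNo, long sampleNo) {
        double rate = 100.0 * passNo / sampleNo;
        // Locale.ROOT keeps '.' as the decimal separator on every system.
        return String.format(Locale.ROOT, "%.4f%%", rate);
    }

    public static void main(String[] args) {
        System.out.println("2014: " + passRate(875089L, 875090L)); // 2014: 99.9999%
        System.out.println("2015: " + passRate(896212L, 896213L)); // 2015: 99.9999%
    }
}
```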