Exclusive Filter: A Term is all Digit

  • Description:
    If a term contains nothing but digits, punctuation, and spaces, it is not a valid multiword. Two normalizations (strip punctuation and strip space) are performed in this filter.

  • Examples:
    • "3 + 1"
    • $1,500
    • 192.168.1.1
    • (+/- 0.05)
    • (+15%),
    • [0-5]
    • [192, 168]

  • Input Term: core-term.lc
  • Filter Algorithm:
    • Logic:

      Description                         FilterType   Notes
      Get words from inTerm               FT_TBD
      Norm: strip punctuation and space   FT_TBD
      Check if all digit                  FT_DIGIT     filtered invalid terms - all digit after strip punctuation and space

    • source code: ExFilterDigit.java
    • FilterType: FilterType.FT_DIGIT
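
The steps listed above can be sketched roughly as follows. This is a minimal illustration only, not the actual ExFilterDigit.java source: the class name DigitFilterSketch and the method isAllDigit are hypothetical. It also assumes that "punctuation" covers both Unicode punctuation and symbol characters (so that "$", "+", and "%" are stripped), and that a term with nothing left after normalization is not trapped.

```java
import java.util.List;

// Hypothetical illustration; NOT the actual ExFilterDigit.java source.
final class DigitFilterSketch {

    // Returns true when the term would be trapped (filtered out):
    // after stripping punctuation and whitespace, only digits remain.
    static boolean isAllDigit(String term) {
        // Assumption: strip Unicode punctuation (\p{P}), symbols (\p{S}),
        // and whitespace (\s) in a single normalization pass.
        String stripped = term.replaceAll("[\\p{P}\\p{S}\\s]", "");
        // Assumption: a term that is empty after normalization is not trapped.
        if (stripped.isEmpty()) {
            return false;
        }
        // Check that every remaining character is a digit.
        return stripped.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        List<String> terms = List.of(
            "3 + 1", "$1,500", "192.168.1.1", "(+/- 0.05)", "[0-5]",
            "20/20", "vitamin b12");
        for (String term : terms) {
            System.out.println(term + " -> "
                + (isAllDigit(term) ? "trapped" : "passed"));
        }
    }
}
```

Under these assumptions, every term in the Examples list is trapped, and so is "20/20", because it normalizes to "2020". A term such as "vitamin b12" passes, since letters remain after normalization.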

  • Accuracy Test on Lexicon:

    Lexicon   Filter     Sample No   Pass No   Trap No   Exp No   Pass-Rate
    2014      FT_DIGIT   875090      875089    1         0        99.9999%
    2015      FT_DIGIT   896213      896212    1         0        99.9999%

    There is a valid word "20/20" in the Lexicon, which is trapped by this filter.