About Sub-Term Mapping Tools

Sub-Term Mapping Tools (STMT) is a generic tool set that provides sub-term features for NLP applications through Java APIs and command-line tools. Given an input term, it finds the longest prefix sub-term, all prefix sub-terms, all sub-terms, and all synonymous sub-term substitutions, which are useful in query expansion. STMT is built around six major concepts:

  • Corpus: the collection of terms of interest to a specific project. STMT stores it in a tree structure, with each term as a branch of the tree and each word in the term as a node on that branch.
  • Sub-term: a corpus term that occurs as part of another term, such as the input term.
  • Prefix sub-term: a sub-term that starts at the beginning of the input term.
  • The longest prefix sub-term: the longest sub-term that starts at the beginning of the input term.
  • Sub-term patterns: the possible selections of sub-terms within the input term, grouped by the number of sub-terms selected.
  • Synonymous sub-term substitutions: all new terms formed by replacing the selected sub-terms with their synonyms.
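
The tree-structured corpus and the prefix lookups can be sketched as a word-level trie. This is a minimal illustration in Python, not STMT's Java API: the names `TermTrie`, `tokenize`, `sub_terms`, `prefix_sub_terms`, and `longest_prefix_sub_term` are hypothetical, and the tokenization rule (lowercase, punctuation dropped) is an assumption. The terms are taken from the example later in this section.

```python
import re


def tokenize(term):
    # Assumption: lowercase the term and drop punctuation such as commas.
    return re.findall(r"[a-z0-9-]+", term.lower())


class TermTrie:
    """Word-level trie: each corpus term is a branch, each word a node."""

    def __init__(self):
        self.children = {}    # word -> child TermTrie
        self.is_term = False  # True when the path from the root spells a corpus term

    def add(self, term):
        node = self
        for word in tokenize(term):
            node = node.children.setdefault(word, TermTrie())
        node.is_term = True

    def matches_from(self, words, start):
        """Return every corpus term that begins at words[start], shortest first."""
        node, found = self, []
        for end in range(start, len(words)):
            node = node.children.get(words[end])
            if node is None:
                break
            if node.is_term:
                found.append(" ".join(words[start:end + 1]))
        return found


def sub_terms(trie, term):
    """All corpus terms found anywhere in the input term, by position."""
    words = tokenize(term)
    return [t for start in range(len(words)) for t in trie.matches_from(words, start)]


def prefix_sub_terms(trie, term):
    """Corpus terms that start at the first word of the input term."""
    return trie.matches_from(tokenize(term), 0)


def longest_prefix_sub_term(trie, term):
    prefixes = prefix_sub_terms(trie, term)
    return prefixes[-1] if prefixes else None  # matches are ordered shortest first


trie = TermTrie()
for corpus_term in ["chronic", "infectious", "otitis externa"]:
    trie.add(corpus_term)

term = "otitis externa, chronic infectious"
print(sub_terms(trie, term))                # ['otitis externa', 'chronic', 'infectious']
print(prefix_sub_terms(trie, term))         # ['otitis externa']
print(longest_prefix_sub_term(trie, term))  # otitis externa
```

Walking the trie one word at a time means a lookup stops as soon as no corpus term continues with the next word, so every prefix sub-term is found in a single pass from the start position.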

These six concepts are illustrated by the following example:

  • Corpus:
    Terms            Synonymous Terms
    chronic          chron|long-term|persistent|recurrent periodic|relapsing
    infectious       communicable|contagious|infection
    otitis externa   auditory|auditory canal|aural|ear
    ...              ...

  • Input term:
    otitis externa, chronic infectious

  • Sub-terms (3):
    • otitis externa
    • chronic
    • infectious

  • Prefix sub-term (1):
    • otitis externa

  • The longest prefix sub-term (1):
    • otitis externa

  • Sub-term patterns (8), with brackets marking the selected sub-terms:
    • Zero sub-term patterns (1):
      • otitis externa chronic infectious
    • One sub-term patterns (3):
      • [otitis externa] chronic infectious
      • otitis externa [chronic] infectious
      • otitis externa chronic [infectious]
    • Two sub-term patterns (3):
      • [otitis externa] [chronic] infectious
      • [otitis externa] chronic [infectious]
      • otitis externa [chronic] [infectious]
    • Three sub-term patterns (1):
      • [otitis externa] [chronic] [infectious]

  • Synonymous sub-term substitutions (119 = 12 + 47 + 60):
    • Synonymous sub-term substitutions with one sub-term patterns (12 = 4 + 5 + 3):
      • [otitis externa] chronic infectious (4)
        • auditory chronic infectious
        • auditory canal chronic infectious
        • aural chronic infectious
        • ear chronic infectious
      • otitis externa [chronic] infectious (5)
        • otitis externa chron infectious
        • otitis externa long-term infectious
        • otitis externa persistent infectious
        • otitis externa relapsing infectious
        • otitis externa recurrent periodic infectious
      • otitis externa chronic [infectious] (3)
        • otitis externa chronic communicable
        • otitis externa chronic contagious
        • otitis externa chronic infection
    • Synonymous sub-term substitutions with two sub-term patterns (47 = 20 + 12 + 15):
      • [otitis externa] [chronic] infectious (20 = 4 x 5)
      • [otitis externa] chronic [infectious] (12 = 4 x 3)
      • otitis externa [chronic] [infectious] (15 = 5 x 3)
    • Synonymous sub-term substitutions with three sub-term patterns (60):
      • [otitis externa] [chronic] [infectious] (60 = 4 x 5 x 3)
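
The pattern and substitution counts in the example can be reproduced with a short sketch. It is written in Python for brevity and does not use STMT's Java API; `sub_term_patterns` and `substitutions` are hypothetical names. It assumes the sub-terms cover the whole input term without overlapping, as they do in this example, so a new term can be rebuilt by joining the parts with spaces.

```python
from itertools import combinations, product

# Synonym table copied from the example corpus.
SYNONYMS = {
    "otitis externa": ["auditory", "auditory canal", "aural", "ear"],
    "chronic": ["chron", "long-term", "persistent", "recurrent periodic", "relapsing"],
    "infectious": ["communicable", "contagious", "infection"],
}


def sub_term_patterns(sub_terms):
    """Map k -> every way to select k of the sub-terms, as tuples of indexes."""
    indexes = range(len(sub_terms))
    return {k: list(combinations(indexes, k)) for k in range(len(sub_terms) + 1)}


def substitutions(sub_terms, pattern, synonyms=SYNONYMS):
    """All new terms formed by replacing each selected sub-term with a synonym."""
    options = [synonyms[sub_terms[i]] for i in pattern]
    results = []
    for picks in product(*options):  # one synonym per selected sub-term
        parts = list(sub_terms)
        for i, pick in zip(pattern, picks):
            parts[i] = pick
        results.append(" ".join(parts))
    return results


sub_terms = ["otitis externa", "chronic", "infectious"]
patterns = sub_term_patterns(sub_terms)
print({k: len(v) for k, v in patterns.items()})  # {0: 1, 1: 3, 2: 3, 3: 1}

for k in (1, 2, 3):
    counts = [len(substitutions(sub_terms, p)) for p in patterns[k]]
    print(k, counts, sum(counts))
# 1 [4, 5, 3] 12
# 2 [20, 12, 15] 47
# 3 [60] 60
```

Each pattern contributes the product of the synonym counts of its selected sub-terms, which is why the two-sub-term and three-sub-term groups dominate the total.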