SPT - Terms Match Design (SubTerm)

I. Introduction
This section describes how to find every term stored in a Trie Tree that occurs in a given input. In other words, it finds every sub-term of the input that has synonyms.
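The structure of the Trie Tree is not specified in this section. As a minimal sketch, assuming a word-level trie whose children are kept in a HashMap keyed by lowercase word, each stored term is a path of word nodes closed by a child for the END marker "$_END" that the algorithm relies on. The class TrieNodeSketch and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trie node: children are keyed by lowercase word, and a child
// keyed by END ("$_END") marks that the path from the root spells a full term.
class TrieNodeSketch {
    static final String END = "$_END";

    final Map<String, TrieNodeSketch> children = new HashMap<>();

    // Adds a term word by word, then terminates it with an END child.
    void addTerm(String term) {
        TrieNodeSketch cur = this;
        for (String word : term.toLowerCase().trim().split("\\s+")) {
            cur = cur.children.computeIfAbsent(word, w -> new TrieNodeSketch());
        }
        cur.children.putIfAbsent(END, new TrieNodeSketch());
    }

    // True only for a complete stored term, not for a mere prefix of one.
    boolean containsTerm(String term) {
        TrieNodeSketch cur = this;
        for (String word : term.toLowerCase().trim().split("\\s+")) {
            cur = cur.children.get(word);
            if (cur == null) {
                return false;
            }
        }
        return cur.children.containsKey(END);
    }
}
```

With only "dog and cat" stored, containsTerm("dog") is false: the "dog" node exists as a prefix but has no END child. This END child is what lets the matcher distinguish complete terms from partial paths.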

II. Algorithm

  • Initialize Vector<String> matchTerms
  • Lowercase the input term into newInTerm
  • Tokenize newInTerm into inWords
  • For each startIndex in inWords:
    • Set curTerm to the words of inWords from startIndex to the end
    • Find branchMatches, the stored terms that are prefixes of curTerm:
      • Normalize curTerm:
        • Lowercase it
        • Append " $_END" (the END node)
      • Tokenize the normalized term into words, a Vector<String>
      • Set curNode to the ROOT node
      • Initialize Vector<String> branchMatches
      • For each curWord in words:
        • Create curWordNode from curWord
        • Get curChilds, the children of curNode
        • Check whether curChilds has the END node
          • Yes => add the branch term (the words matched so far) to branchMatches
        • Check whether curChilds contains curWordNode
          • Yes => set curNode to curWordNode
          • No => no further match; break
    • Add branchMatches to matchTerms
  • Return matchTerms
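The steps above can be sketched in Java as follows. This is an illustration under assumptions, not the TrieTreeMatch implementation: the class TrieMatchSketch, its Node type, and the methods addTerm and findBranchMatches are hypothetical, and findMatchTerms mirrors the documented FindMatchTerms(String inTerm) in lower camel case. The " $_END" sentinel is appended as in the normalization step, so a term that ends at the last input word is still detected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Vector;

// Sketch of the matching algorithm. The class name, the Node type and the
// word-keyed HashMap trie are assumptions; the steps follow the design above.
class TrieMatchSketch {
    static final String END = "$_END";

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Stores one term in the trie, terminated by an END child.
    void addTerm(String term) {
        Node cur = root;
        for (String word : tokenize(term.toLowerCase())) {
            cur = cur.children.computeIfAbsent(word, w -> new Node());
        }
        cur.children.putIfAbsent(END, new Node());
    }

    // Mirrors the documented FindMatchTerms(String inTerm) step by step.
    public Vector<String> findMatchTerms(String inTerm) {
        Vector<String> matchTerms = new Vector<>();
        String[] inWords = tokenize(inTerm.toLowerCase());
        for (int startIndex = 0; startIndex < inWords.length; startIndex++) {
            // curTerm: the input words from startIndex to the end
            String curTerm = String.join(" ",
                    Arrays.copyOfRange(inWords, startIndex, inWords.length));
            matchTerms.addAll(findBranchMatches(curTerm));
        }
        return matchTerms;
    }

    // Returns every stored term that is a word-level prefix of curTerm.
    Vector<String> findBranchMatches(String curTerm) {
        // Normalize: lowercase and append the END token as a sentinel word
        String[] words = tokenize(curTerm.toLowerCase() + " " + END);
        Vector<String> branchMatches = new Vector<>();
        Node curNode = root;
        StringBuilder branch = new StringBuilder();
        for (String curWord : words) {
            Map<String, Node> curChilds = curNode.children;
            // An END child means the words walked so far form a complete term
            if (curChilds.containsKey(END) && branch.length() > 0) {
                branchMatches.add(branch.toString());
            }
            Node next = curChilds.get(curWord);
            if (next == null) {
                break; // no stored term continues with curWord
            }
            curNode = next;
            if (branch.length() > 0) {
                branch.append(' ');
            }
            branch.append(curWord);
        }
        return branchMatches;
    }

    private static String[] tokenize(String s) {
        return s.trim().split("\\s+");
    }
}
```

Loaded with "dog", "cat", "bull dog", "dog and cat" and "puppy and kitty" (the single-word terms "dog" and "cat" are assumed here, since the example output matches them), findMatchTerms("Who let dog and cat out") returns [dog, dog and cat, cat]. Restarting from the root for each startIndex keeps the code simple; each walk stops at the first word with no matching child, so the work per startIndex is bounded by the length of the longest stored term.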

III. Java Classes & Method

  • TrieTreeMatch.java: Java class that matches input terms against the Trie Tree
  • public Vector<String> FindMatchTerms(String inTerm)

IV. Examples

  • Synonym Rules:

    K9 | bull dog
    Dog and cat | pets
    puppy and kitty | pets

  • Synonym Terms:

    bull dog
    dog and cat
    puppy and kitty

  • Input Term:
    Who let dog and cat out
    • LowerCase: who let dog and cat out
    • Go through the terms of "who let dog and cat out" by startIndex:
      0: who let dog and cat out (no match)
      1: let dog and cat out (no match)
      2: dog and cat out
        • branchMatches: dog, dog and cat
        • matchTerms: dog, dog and cat
      3: and cat out (no match)
        • matchTerms: dog, dog and cat
      4: cat out
        • branchMatches: cat
        • matchTerms: dog, dog and cat, cat

  • Output:

    Returned matched terms, formatted as term|start index|end index (the end index is exclusive):

    • dog|2|3
    • dog and cat|2|5
    • cat|4|5
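In this output the end index is one past the last matched word: "dog and cat" covers words 2 to 4 and is reported with end index 5. A hypothetical variant that produces the same term|start|end strings is sketched below. Instead of appending the " $_END" token, it checks for the END child right after consuming each word, which detects the same matches. IndexedMatchSketch and findIndexedMatches are illustrative names, not part of TrieTreeMatch.java.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical variant that reports matches as "term|startIndex|endIndex",
// where endIndex is exclusive (one past the last matched word).
class IndexedMatchSketch {
    static final String END = "$_END";

    static final class Node {
        final Map<String, Node> children = new HashMap<>();
    }

    private final Node root = new Node();

    // Stores one term in the trie, terminated by an END child.
    void addTerm(String term) {
        Node cur = root;
        for (String word : term.toLowerCase().trim().split("\\s+")) {
            cur = cur.children.computeIfAbsent(word, w -> new Node());
        }
        cur.children.putIfAbsent(END, new Node());
    }

    List<String> findIndexedMatches(String inTerm) {
        String[] inWords = inTerm.toLowerCase().trim().split("\\s+");
        List<String> matches = new ArrayList<>();
        for (int start = 0; start < inWords.length; start++) {
            Node cur = root;
            // After consuming inWords[start..end], an END child means those
            // words form a complete term; report end + 1 as the exclusive end.
            for (int end = start; end < inWords.length; end++) {
                cur = cur.children.get(inWords[end]);
                if (cur == null) {
                    break;
                }
                if (cur.children.containsKey(END)) {
                    String term = String.join(" ", Arrays.copyOfRange(inWords, start, end + 1));
                    matches.add(term + "|" + start + "|" + (end + 1));
                }
            }
        }
        return matches;
    }
}
```

Using exclusive end indexes means end minus start is the number of matched words, and the pair can be passed directly to Arrays.copyOfRange to recover the term.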

  • Trie Tree