Synonym Norm Development
I. Requirements
Use normalization to aggressively map a term to its synonyms by abstracting away from:
- g: genitives
- rs: parenthetical plural forms (s), (es), (ies)
- o: punctuation
- l: case
- Ct: spelling variants and inflectional variants
- duplicated spaces
- leading and trailing spaces (trim)
- duplicated results
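The non-lexical requirements above can be sketched in a few lines of Python. This is a minimal illustration, not lvg's implementation: the function names and regular expressions are assumptions made for this example, and the Ct step is left out because it needs a lexicon lookup.

```python
import re


def remove_genitive(term: str) -> str:
    # g: drop possessive endings ("Addison's" -> "Addison", "patients'" -> "patients").
    # Simplified stand-in; the real flow may handle more cases.
    return re.sub(r"(?<=\w)'s\b|(?<=s)'(?!\w)", "", term)


def strip_parenthetical_plural(term: str) -> str:
    # rs: drop the parenthetical plural markers (s), (es), (ies).
    return re.sub(r"\((?:s|es|ies)\)", "", term)


def replace_punctuation(term: str) -> str:
    # o: replace every punctuation character with a space.
    return re.sub(r"[^\w\s]", " ", term)


def normalize_without_ct(term: str) -> str:
    term = remove_genitive(term)
    term = strip_parenthetical_plural(term)
    term = replace_punctuation(term)
    term = term.lower()            # l: ignore case
    return " ".join(term.split())  # collapse duplicated spaces and trim


def unique(results: list[str]) -> list[str]:
    # Drop duplicated results while keeping the first-seen order.
    return list(dict.fromkeys(results))


print(normalize_without_ct("CLOTTING FACTOR DEFICIENCY, CONGENITAL"))
# clotting factor deficiency congenital
```

Genitives and parenthetical plurals are handled before punctuation is replaced, because both rely on characters (the apostrophe and the parentheses) that the punctuation step would otherwise turn into spaces.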
II. Developments
- Approach 1 (Ct on input term):
  - Use lvg -f:g:rs:o:l:Ct
  - Ct retrieves the citation form of the input term as a whole
  - Fast performance
  - Lower coverage rate (98% of that of the approaches below)
  - Example 1:

    ID | Term | norm term | synonym substitutions | CUI |
    ---|---|---|---|---|
    KP102818 | CLOTTING FACTOR DEFICIENCY, CONGENITAL | | ... | not found |
- Approach 2 (Ct on every word of the input term):
  - Use lvg -f:g:rs:o:l:Ct
  - Customize Ct to get the citation form of every word of the input term
  - More mutations and results
  - Slower performance
  - Higher coverage rate
  - Example 1:

    ID | Term | norm term | synonym substitutions | CUI |
    ---|---|---|---|---|
    KP102818 | CLOTTING FACTOR DEFICIENCY, CONGENITAL | clot factor deficiency congenital | coagulation factor deficiency hereditary; ... | C0272316 |

  - However, this approach still misses some mappings when the citation form contains punctuation; for example, "carcino-embryonic" is the citation form of "carcinoembryonic"
  - Example 2:

    ID | Term | norm term | synonym substitutions | CUI |
    ---|---|---|---|---|
    KP194142 | Elevated carcinoembryonic antigen | elevate carcino-embryonic antigen | increase carcino-embryonic antigen; increased carcino-embryonic antigen; high carcino-embryonic antigen; ... | C0549371 |
- Approach 3 (Move Ct before removing punctuation):
  - Use lvg -f:g:rs:Ct:l:o
  - Example 2:

    ID | Term | norm term | synonym substitutions | CUI |
    ---|---|---|---|---|
    KP194142 | Elevated carcinoembryonic antigen | elevate carcino embryonic antigen | increase carcino embryonic antigen; increased carcino embryonic antigen; high carcino embryonic antigen; ... | C0549371 |
    | | | | elevate cea | C0742014 |

  - Add a remove-genitive step after Ct, for records such as:
    - E0000135|Addison's disease|Addisons disease
  - No records have a citation form containing (s), (es), or (ies), so -f:rs is not needed after Ct
  - Use a database for CUI mapping to improve performance
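The difference between the three flows can be compared with a small sketch. It is only an illustration under stated assumptions: the step functions are simplified stand-ins for the lvg flow components, `CITATION` is a toy table holding just the citation forms that appear in the examples, and names such as `run_flow` are hypothetical rather than part of lvg.

```python
import re

# Toy citation-form table built from the examples above (assumption: the real
# Ct flow looks citation forms up in the lexicon, not in a dict like this).
CITATION = {
    "clotting": "clot",
    "elevated": "elevate",
    "carcinoembryonic": "carcino-embryonic",
}


def remove_genitive(term):        # stands in for g
    return re.sub(r"(?<=\w)'s\b", "", term)


def strip_plural_markers(term):   # stands in for rs
    return re.sub(r"\((?:s|es|ies)\)", "", term)


def replace_punctuation(term):    # stands in for o
    return re.sub(r"[^\w\s]", " ", term)


def lowercase(term):              # stands in for l
    return term.lower()


def ct_whole_term(term):          # Ct applied to the term as a single unit
    return CITATION.get(term.lower(), term)


def ct_every_word(term):          # Ct applied to each word separately
    return " ".join(CITATION.get(w.lower(), w) for w in term.split())


def run_flow(steps, term):
    # Apply the steps in order, then collapse duplicated spaces and trim.
    for step in steps:
        term = step(term)
    return " ".join(term.split())


APPROACH_1 = [remove_genitive, strip_plural_markers, replace_punctuation, lowercase, ct_whole_term]
APPROACH_2 = [remove_genitive, strip_plural_markers, replace_punctuation, lowercase, ct_every_word]
APPROACH_3 = [remove_genitive, strip_plural_markers, ct_every_word, lowercase, replace_punctuation]

term = "Elevated carcinoembryonic antigen"
print(run_flow(APPROACH_2, term))  # elevate carcino-embryonic antigen
print(run_flow(APPROACH_3, term))  # elevate carcino embryonic antigen
```

In this sketch, Approach 2 leaves the hyphen that Ct introduces, because punctuation was already replaced before Ct ran. Approach 3 runs the punctuation step last, so the hyphen from the citation form is replaced as well.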
III. Comparisons
| | Approach 1 (Ct on term) | Approach 2 (CuiMap) | Approach 3 (Smt) |
---|---|---|---|
Performance | Fast | Slow | Fast |
Coverage-KP (26890 terms) | | | |
Coverage-VA (21221 terms) | | | |
IV. Notes
In practice, we normalize only the key of each synonym pair. This can make lookups asymmetric. For example, the synonym pair impaired|abnormality is stored as follows in the database table:
normalized key | synonym |
---|---|
impair | abnormality |
abnormality | impaired |
The resulting lookups (input term -> normalized key -> synonym) are not symmetric:
- abnormality -> abnormality -> impaired
- impair -> impair -> abnormality (not symmetric: abnormality maps back to impaired, not to impair)
- impaired -> impair -> abnormality
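
The asymmetry can be reproduced with a small in-memory stand-in for the database table. This is a sketch, not the actual storage code: `NORM`, `build_table`, and `lookup` are hypothetical names, and the toy normalizer only knows that "impaired" normalizes to "impair".

```python
# Toy normalizer (assumption): only "impaired" changes; other terms map to themselves.
NORM = {"impaired": "impair"}


def norm(term: str) -> str:
    return NORM.get(term, term)


def build_table(pairs):
    # Store each pair in both directions, normalizing only the key.
    table = {}
    for a, b in pairs:
        table.setdefault(norm(a), []).append(b)
        table.setdefault(norm(b), []).append(a)
    return table


def lookup(table, term):
    # input term -> normalized key -> synonyms
    return table.get(norm(term), [])


table = build_table([("impaired", "abnormality")])
print(table)                         # {'impair': ['abnormality'], 'abnormality': ['impaired']}
print(lookup(table, "abnormality"))  # ['impaired']
print(lookup(table, "impair"))       # ['abnormality']
print(lookup(table, "impaired"))     # ['abnormality']
```

Looking up "impair" returns "abnormality", but looking up "abnormality" returns "impaired" rather than "impair", because the synonym side is stored without normalization.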