About Lexicon

The SPECIALIST LEXICON was first released in 1994. It is intended to be a general English lexicon that includes many biomedical terms. It records a range of linguistic knowledge for each entry, including syntactic category, variant forms, and specifications of acronyms and abbreviations. It is a fundamental resource for biomedical natural language processing (NLP) and is used in the development of the UMLS Metathesaurus.

The SPECIALIST LEXICON consists of lexical records, each of which is the basic unit of the lexicon. A lexical record can be represented in text format, in XML format, or as a Java object. For example, the record for "medicine" can be represented as:

  • Text Format
    {base=medicine
    entry=E0039272
    	cat=noun
    	variants=reg
    	variants=uncount
    }
    

  • XML Format
    <?xml version="1.0" encoding="UTF-8"?>
    <lexRecord>
    	<base>medicine</base>
    	<eui>E0039272</eui>
    	<cat>noun</cat>
    	<inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="base" type="basic">medicine</inflVars>
    	<inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="singular" type="basic">medicine</inflVars>
    	<inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="plural" type="reg">medicines</inflVars>
    	<nounEntry>
    		<variants>reg</variants>
    		<variants>uncount</variants>
    	</nounEntry>
    </lexRecord>
    

  • Java Object
    lexRecord
    • base:string
    • eui:string
    • cat:string
    • inflVars: vector(inflVars)
    • nounEntry: (nounEntry)

    inflVars
    • inflVar:string
    • cit:string
    • unInfl:string
    • eui:string
    • cat:string
    • infl:string
    • type:string

    nounEntry
    • variants: vector(string)
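
The three representations above carry the same record, so it can help to see one small program read both serialized forms into a single in-memory structure. The following is a minimal sketch in Python, using only the standard library. The class and function names (`LexRecord`, `InflVar`, `parse_text_record`, `parse_xml_record`) are hypothetical and chosen for illustration; they are not part of any official SPECIALIST Lexicon API, and the parser handles only the fields shown in the "medicine" example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class InflVar:
    """One inflectional variant; fields mirror the inflVars XML attributes."""
    infl_var: str
    cit: str
    un_infl: str
    eui: str
    cat: str
    infl: str
    type: str


@dataclass
class LexRecord:
    """Illustrative stand-in for the lexRecord object (names are assumptions)."""
    base: str
    eui: str
    cat: str
    infl_vars: list = field(default_factory=list)
    variants: list = field(default_factory=list)


def parse_text_record(text: str) -> LexRecord:
    """Parse a single '{base=... }' record in the text format."""
    fields, variants = {}, []
    for raw in text.strip().splitlines():
        line = raw.strip().lstrip("{").strip()
        if not line or line == "}":
            continue
        key, _, value = line.partition("=")
        if key == "variants":
            variants.append(value)  # a record may repeat this key
        else:
            fields[key] = value
    # The text format stores the EUI under the key "entry".
    return LexRecord(fields["base"], fields["entry"], fields["cat"],
                     variants=variants)


def parse_xml_record(xml_text: str) -> LexRecord:
    """Parse a single <lexRecord> element in the XML format."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    infl_vars = [
        InflVar(el.text, el.get("cit"), el.get("unInfl"), el.get("eui"),
                el.get("cat"), el.get("infl"), el.get("type"))
        for el in root.findall("inflVars")
    ]
    variants = [el.text for el in root.findall("nounEntry/variants")]
    return LexRecord(root.findtext("base"), root.findtext("eui"),
                     root.findtext("cat"), infl_vars, variants)


TEXT_RECORD = """
{base=medicine
entry=E0039272
\tcat=noun
\tvariants=reg
\tvariants=uncount
}
"""

XML_RECORD = """
<?xml version="1.0" encoding="UTF-8"?>
<lexRecord>
  <base>medicine</base>
  <eui>E0039272</eui>
  <cat>noun</cat>
  <inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="base" type="basic">medicine</inflVars>
  <inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="singular" type="basic">medicine</inflVars>
  <inflVars cit="medicine" unInfl="medicine" eui="E0039272" cat="noun" infl="plural" type="reg">medicines</inflVars>
  <nounEntry>
    <variants>reg</variants>
    <variants>uncount</variants>
  </nounEntry>
</lexRecord>
"""

text_rec = parse_text_record(TEXT_RECORD)
xml_rec = parse_xml_record(XML_RECORD)
print(text_rec.base, text_rec.eui, text_rec.variants)
print([v.infl_var for v in xml_rec.infl_vars])
```

In this sketch the text-format path leaves `infl_vars` empty, because the text record shown above lists only the variant codes (`reg`, `uncount`) and not the inflected forms that appear as `inflVars` elements in the XML.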