Words from MEDLINE
This page describes the details of adding words (multiwords) from MEDLINE to the SPECIALIST Lexicon:
- The SPECIALIST Lexicon
- Approaches
- Element Words Approach (2014-)
- N-gram Approaches (2014+)
- Source Corpus
- Software Components
- Multiwords
- Lead-End-Term Model
- Acronyms
- Spelling Variants Model
- N-gram Utilities
- Normalization
- core-term
- Rule Types for all words from n-grams
- Lexicon Words
- Multiwords
- Filters (Exclusive Filters)
- Exclusive filter: examples used in the paper "using element words to generate multiwords with predictive N-gram"
- Current Enhanced Filters (design & test)
- General Filters:
- Filter: Pipe
- Filter: Punctuation and space
- Filter: Digit
- Filter: Number
- Filter: Digit and Stopword
- Pattern Filters:
- Filter: Pattern of Parenthetic Acronym (ACR)
- Filter: Pattern of Indefinite Article
- Filter: Pattern of Colon - UPPERCASE
- Filter: Pattern of disallowed punctuation (replaced by the disallowed characters filter in 2020+)
- Filter: Pattern of disallowed characters
- Filter: Pattern of measurement
- Filter: Pattern of incomplete
- Filter: Pattern of disallowed Chars
- Lead-End-Term Filters:
- Filter: Leads with Absolute Invalid-Lead-Terms (ILTs)
- Filter: Ends with Absolute Invalid-End-Terms (IETs)
- Filter: Lead-End-Term Pattern (LETP)
- Filter: Leads with valid-lead-term without SpVar pattern (VLTP)
- Filter: Ends with valid-end-term without SpVar pattern (VETP)
- Filter: Ends with a Parenthetic Abbreviation (not used, replaced by (ACR))
- Project Domain Filters:
- Filter: Lexicon
- Filter: SingleWord
- Filter: Frequency - Doc Count and Word Count
- The Distilled MEDLINE n-gram set: applying the filters to the MEDLINE n-gram set
- Matchers (Inclusive Filters)
- Matcher: Parenthetic Acronym (ACR) Pattern
=> I. Annual LMW candidate list (acronymExp.tag.data.tag.final.tbd.${YEAR}); must be generated after the previous year's file is done
- Matcher: SpVar Pattern
=> III. Also an LMW candidate list, using spVar (DMNS) + CUI + frequency + distilled n-grams (a tedious process, done in 07/2016)
- Matcher: with CUI(s)
=> II. Annual LMW candidate list (distilled n-grams + coreTerm.Lc + CUI + MW + 33 endWord list + sort, steps 30-34/35)
- Matcher: EndWord Pattern
- Matcher: with CUI(s)
- Matcher: Parenthetic Acronym (ACR) Pattern
- Analysis
- Lexicon MultiWord Candidate List
- References