About CSpell
CSpell is a generic spelling detection and correction tool written in Java. It is distributed by the National Library of Medicine (NLM) under an open source license agreement. It was originally developed for the Consumer Health Question Answering project, so the project's consumer health corpus is used to compute word frequencies and context scores (word vectors). Configurable options make it easy to customize the various data files. The correction features include:
- Non-dictionary-based correction:

  | Type | Input Text | Output Correction |
  |---|---|---|
  | Xml/Html handler | &quot;germs&quot; | "germs" |
  | Informal expression handler | pls | please |
  | Leading digit splitter | 1.5years | 1.5 years |
  | Ending digit splitter | from2007 | from 2007 |
  | Leading punctuation splitter | volunteers(healthy) | volunteers (healthy) |
  | Ending punctuation splitter | cancer?if so | cancer? if so |

- Dictionary-based correction:
  - Non-word:

    | Type | Input Text | Output Correction |
    |---|---|---|
    | Spelling | dianosed | diagnosed |
    | Split | knowabout | know about |
    | Merge | stiff n ess | stiffness |

  - Real-word:

    | Type | Input Text | Output Correction |
    |---|---|---|
    | Spelling | bowl movement | bowel movement |
    | Split | for along time | for a long time |
    | Merge | early on set | early onset |
- Combination of the above:
  - Example-1:

    Input Text: He was dianosed early on set deminita 3years ago.

    Output Correction: He was diagnosed early onset dementia 3 years ago.

    | Input Text | Output Correction | Type |
    |---|---|---|
    | dianosed | diagnosed | non-word, spelling |
    | on set | onset | real-word, merge |
    | deminita | dementia | non-word, spelling |
    | 3years | 3 years | non-dictionary, split |

  - Example-2:

    Input Text: No bowl movement for along time.

    Output Correction: No bowel movement for a long time.

    | Input Text | Output Correction | Type |
    |---|---|---|
    | bowl | bowel | real-word, spelling |
    | along | a long | real-word, split |
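The non-dictionary handlers work on the surface form of a token without any dictionary lookup. Here is a minimal sketch of that idea, assuming plain regular expressions. The class `NonDictionaryHandlers`, its entity list, its one-entry informal-expression table, and its patterns are all hypothetical illustrations, not CSpell's actual rules or API.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of non-dictionary handlers: HTML entity decoding,
 * an informal-expression lookup and four regex splitters.
 * The entity list, lookup table and patterns are illustrative assumptions.
 */
final class NonDictionaryHandlers {
    // Tiny stand-in for an informal-expression table (assumed)
    private static final Map<String, String> INFORMAL = Map.of("pls", "please");

    private static final Pattern[] SPLITTERS = {
        Pattern.compile("^(\\d+(?:\\.\\d+)?)([A-Za-z]+)$"), // leading digit: 1.5years -> 1.5 years
        Pattern.compile("^([A-Za-z]+)(\\d+(?:\\.\\d+)?)$"), // ending digit: from2007 -> from 2007
        Pattern.compile("^([A-Za-z]+)([(\\[{].*)$"),       // leading punctuation: volunteers(healthy) -> volunteers (healthy)
        Pattern.compile("^(.*[?!;:,])([A-Za-z]+)$")         // ending punctuation: cancer?if -> cancer? if
    };

    /** Decode a few common entities; &amp; goes last so "&amp;quot;" is decoded only once. */
    static String decodeEntities(String text) {
        return text.replace("&quot;", "\"").replace("&lt;", "<")
                   .replace("&gt;", ">").replace("&amp;", "&");
    }

    /** Replace an informal expression, or apply the first splitter that matches. */
    static String handleToken(String token) {
        String informal = INFORMAL.get(token.toLowerCase());
        if (informal != null) return informal;
        for (Pattern p : SPLITTERS) {
            Matcher m = p.matcher(token);
            if (m.matches()) return m.group(1) + " " + m.group(2); // at most one split per token
        }
        return token;
    }

    /** Decode entities, then handle each whitespace-separated token. */
    static String normalize(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : decodeEntities(text).trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            out.append(handleToken(token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("pls note cancer?if so, from2007 for 1.5years"));
        // please note cancer? if so, from 2007 for 1.5 years
    }
}
```

A token that needs more than one split is only split once here. Running the handlers repeatedly until nothing changes would be one way to handle such tokens.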
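Non-word spelling errors such as dianosed can be corrected by comparing the token with dictionary words by edit distance. The sketch below assumes optimal string alignment distance (insertions, deletions, substitutions and adjacent transpositions), a maximum distance, and corpus frequency as a tie-breaker. `SpellingCorrector`, its parameters, and the toy dictionary are hypothetical and do not describe how CSpell itself generates or ranks candidates.

```java
import java.util.Map;

/**
 * Hypothetical non-word spelling corrector: choose the dictionary word with the
 * smallest optimal-string-alignment distance (up to maxDistance), breaking ties
 * by corpus frequency. Dictionary and counts are toy values.
 */
final class SpellingCorrector {
    private final Map<String, Long> dictionary; // word -> corpus frequency (assumed)
    private final int maxDistance;

    SpellingCorrector(Map<String, Long> dictionary, int maxDistance) {
        this.dictionary = dictionary;
        this.maxDistance = maxDistance;
    }

    /** Edit distance counting insertions, deletions, substitutions and adjacent transpositions. */
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2) && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent transposition
                }
            }
        }
        return d[a.length()][b.length()];
    }

    /** Return the best dictionary word within maxDistance, or the token unchanged. */
    String correct(String token) {
        String word = token.toLowerCase();
        if (dictionary.containsKey(word)) return token; // not a non-word
        String best = token;
        int bestDist = Integer.MAX_VALUE;
        long bestFreq = -1;
        for (Map.Entry<String, Long> e : dictionary.entrySet()) {
            int dist = osaDistance(word, e.getKey());
            long freq = e.getValue();
            boolean better = dist < bestDist || (dist == bestDist && freq > bestFreq);
            if (dist <= maxDistance && better) {
                best = e.getKey();
                bestDist = dist;
                bestFreq = freq;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SpellingCorrector sc = new SpellingCorrector(
            Map.of("diagnosed", 120L, "diagnose", 80L, "dementia", 60L, "years", 300L), 2);
        System.out.println(sc.correct("dianosed")); // diagnosed
        System.out.println(sc.correct("deminita")); // dementia
    }
}
```

Scanning the whole dictionary for every token is simple but slow for a large lexicon. Precomputed deletion indexes or tries are common ways to narrow the search.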
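Split and merge corrections, such as knowabout to know about and stiff n ess to stiffness, can be illustrated with plain dictionary lookups. In this sketch, a non-word is split when both halves are dictionary words. A run of adjacent tokens is merged when the joined string is a dictionary word and at least one token in the run is a non-word. That guard keeps valid text such as "a long" from being merged into "along". The class `SplitMerge`, its methods, and the toy lexicon are hypothetical. Ranking among competing candidates is omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of dictionary-based split and merge for non-words.
 * The lexicon is a toy set, not CSpell's dictionary.
 */
final class SplitMerge {
    private final Set<String> dictionary;

    SplitMerge(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Every way to cut a non-word into two dictionary words. */
    List<String> splitCandidates(String token) {
        List<String> out = new ArrayList<>();
        if (dictionary.contains(token)) return out; // only non-words are split here
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                out.add(left + " " + right);
            }
        }
        return out;
    }

    /**
     * Join up to maxTokens adjacent tokens when the result is a dictionary word
     * and the run contains at least one non-word. Longer runs are tried first.
     */
    List<String> merge(List<String> tokens, int maxTokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String word = tokens.get(i);
            int used = 1;
            for (int n = Math.min(maxTokens, tokens.size() - i); n >= 2; n--) {
                List<String> window = tokens.subList(i, i + n);
                String joined = String.join("", window);
                boolean hasNonWord = window.stream().anyMatch(t -> !dictionary.contains(t));
                if (hasNonWord && dictionary.contains(joined)) {
                    word = joined;
                    used = n;
                    break;
                }
            }
            out.add(word);
            i += used;
        }
        return out;
    }

    public static void main(String[] args) {
        SplitMerge sm = new SplitMerge(Set.of("know", "about", "stiff", "stiffness", "a", "long", "along", "time"));
        System.out.println(sm.splitCandidates("knowabout"));           // [know about]
        System.out.println(sm.merge(List.of("stiff", "n", "ess"), 3)); // [stiffness]
        System.out.println(sm.merge(List.of("a", "long", "time"), 3)); // [a, long, time]
    }
}
```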
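Real-word errors such as bowl in "bowl movement" cannot be caught by dictionary lookup alone. This is where context scores from word vectors come in. The sketch below assumes that a context score is the cosine similarity between a candidate's vector and the average vector of the surrounding words, with corpus frequency breaking ties. `ContextScorer` and the two-dimensional toy vectors are hypothetical. They are not CSpell's model, scoring formula, or data.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical context scorer: cosine similarity between a candidate's vector
 * and the mean vector of its context words, ties broken by corpus frequency.
 * Vectors and counts are toy values, not CSpell data.
 */
final class ContextScorer {
    private final Map<String, double[]> vectors;  // word -> embedding (assumed)
    private final Map<String, Long> frequencies;  // word -> corpus count (assumed)

    ContextScorer(Map<String, double[]> vectors, Map<String, Long> frequencies) {
        this.vectors = vectors;
        this.frequencies = frequencies;
    }

    /** Mean of the vectors of context words that have one; null if none do. */
    double[] contextVector(List<String> context) {
        double[] sum = null;
        int n = 0;
        for (String w : context) {
            double[] v = vectors.get(w);
            if (v == null) continue;
            if (sum == null) sum = new double[v.length];
            for (int i = 0; i < v.length; i++) sum[i] += v[i];
            n++;
        }
        if (sum == null) return null;
        for (int i = 0; i < sum.length; i++) sum[i] /= n;
        return sum;
    }

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Highest context score wins; candidates without a vector (or no context) score 0; ties go to the more frequent word. */
    String best(List<String> candidates, List<String> context) {
        double[] ctx = contextVector(context);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        long bestFreq = -1;
        for (String c : candidates) {
            double[] v = vectors.get(c);
            double score = (ctx == null || v == null) ? 0 : cosine(v, ctx);
            long freq = frequencies.getOrDefault(c, 0L);
            if (score > bestScore || (score == bestScore && freq > bestFreq)) {
                best = c;
                bestScore = score;
                bestFreq = freq;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> vecs = Map.of(
            "bowl", new double[] {0.9, 0.1},
            "bowel", new double[] {0.1, 0.9},
            "movement", new double[] {0.2, 0.8});
        Map<String, Long> freqs = Map.of("bowl", 500L, "bowel", 200L);
        ContextScorer s = new ContextScorer(vecs, freqs);
        System.out.println(s.best(List.of("bowl", "bowel"), List.of("movement"))); // bowel
    }
}
```

Without usable context, every candidate scores 0 and the more frequent word wins. This shows why context, rather than frequency alone, is needed for real-word errors.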