About CSpell

CSpell is a generic spelling detection and correction tool developed in Java. It is distributed by NLM under an open source license agreement. It was originally developed for the Consumer Health Question Answering project, so a consumer health corpus is used for word frequency and context scores (word vectors). Easily configurable options are provided for customizing the data files. The correction features include:

  • Non-dictionary based correction

    Type                          Input Text              Output Correction
    Xml/Html handler              &quot;germs&quot;       "germs"
    Informal expression handler   pls                     please
    Leading digit splitter        1.5years                1.5 years
    Ending digit splitter         from2007                from 2007
    Leading punctuation splitter  volunteers(healthy)     volunteers (healthy)
    Ending punctuation splitter   cancer?if so            cancer? if so

  • Dictionary based correction
    • non-word

      Type      Input Text      Output Correction
      Spelling  dianosed        diagnosed
      Split     knowabout       know about
      Merge     stiff n ess     stiffness
    • real-word

      Type      Input Text      Output Correction
      Spelling  bowl movement   bowel movement
      Split     for along time  for a long time
      Merge     early on set    early onset

  • Combination of the above:
    • Example-1:
      Input Text:        He was dianosed early on set deminita 3years ago.
      Output Correction: He was diagnosed early onset dementia 3 years ago.

      Input Text  Output Correction  Type
      dianosed    diagnosed          non-word, spelling
      on set      onset              real-word, merge
      deminita    dementia           non-word, spelling
      3years      3 years            non-dictionary, split

    • Example-2:
      Input Text:        No bowl movement for along time.
      Output Correction: No bowel movement for a long time.

      Input Text  Output Correction  Type
      bowl        bowel              real-word, spelling
      along       a long             real-word, split
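
The rule-based correction types listed above can be illustrated with a short sketch. It is written in Python for brevity and is not CSpell's Java implementation. The regular expressions, the function names (correct_non_dictionary, split_non_word, merge_non_words), and the tiny INFORMAL and DICTIONARY tables are all hypothetical, chosen only to reproduce the sample rows shown above.

```python
import re

# Hypothetical toy data; CSpell ships its own, much larger data files.
INFORMAL = {"pls": "please"}
DICTIONARY = {"know", "about", "stiff", "stiffness"}


def correct_non_dictionary(text):
    """Apply informal-expression, digit, and punctuation rules to each token."""
    out = []
    for token in text.split():
        # Informal expression handler: "pls" -> "please"
        token = INFORMAL.get(token.lower(), token)
        # Leading digit splitter: "1.5years" -> "1.5 years"
        token = re.sub(r"^(\d+(?:\.\d+)?)([A-Za-z]+)$", r"\1 \2", token)
        # Ending digit splitter: "from2007" -> "from 2007"
        token = re.sub(r"^([A-Za-z]+)(\d+)$", r"\1 \2", token)
        # Leading punctuation splitter: "volunteers(healthy)" -> "volunteers (healthy)"
        token = re.sub(r"(?<=\w)([(\[{])", r" \1", token)
        # Ending punctuation splitter: "cancer?if" -> "cancer? if"
        token = re.sub(r"([?!,;:)\]}])(?=[A-Za-z])", r"\1 ", token)
        out.append(token)
    return " ".join(out)


def split_non_word(token):
    """Non-word split: break a token into two dictionary words if possible."""
    if token.lower() in DICTIONARY:
        return token
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left.lower() in DICTIONARY and right.lower() in DICTIONARY:
            return f"{left} {right}"
    return token


def merge_non_words(tokens, max_span=3):
    """Non-word merge: join adjacent tokens when the result is a dictionary word."""
    out, i = [], 0
    while i < len(tokens):
        for span in range(max_span, 1, -1):
            chunk = tokens[i:i + span]
            merged = "".join(chunk)
            has_non_word = any(t.lower() not in DICTIONARY for t in chunk)
            if len(chunk) == span and has_non_word and merged.lower() in DICTIONARY:
                out.append(merged)
                i += span
                break
        else:
            # No merge applied at this position; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


print(correct_non_dictionary("cancer?if so"))
print(split_non_word("knowabout"))
print(merge_non_words("stiff n ess".split()))
```

Spelling corrections and all real-word corrections are left out of this sketch, because they rely on the word frequency and context scores described above rather than on simple rules and dictionary lookups.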