Test Set

I. Download the test set:

  • brat format: Test Set (brat), 90 KB
  • text format: Test Set (text), 136 KB
    • 2.7 MB
    • OrgData.224: the 224 consumer health questions with the highest OOV counts in the NER collection
    • GoldStd-NonWord: non-word gold standard
    • GoldStd-RealWord: real-word gold standard

II. Description

The test set used in CSpell was generated by selecting the consumer health questions with the highest counts of OOV (out-of-vocabulary) terms from the NER (Named Entity Recognition) collection. The SPECIALIST Lexicon 2017 release was used as the dictionary to identify OOV terms. The errors were annotated manually and independently by two annotators (Dr. Alan R. Aronson and Sonya E. Shooshan). Disagreements were reconciled by the annotators, with arbitration by Dr. Dina Demner-Fushman as needed. This test set is summarized as follows:

  • Summary statistics:
    Consumer health questions:             224
    Tokens:                                16,707
    Annotation tags:                       1,946
    Instances of non-word corrections:     974
    Instances of real-word corrections:    1,178
    Word count per question:               3 - 337
    Average word count per question:       72.36
    Error per question:                    0 - 22
    Average error per question:            4.90
    Error rate (error per token):          0.07 (= 1,178/16,707)
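
The selection step described in this section, ranking questions by their number of OOV terms against a dictionary, can be sketched roughly as follows. This is only an illustration under stated assumptions and not the procedure actually used by CSpell: the tokenizer is a simple regular expression, and the names tokenize, oov_count, and top_oov_questions, as well as the toy lexicon and questions, are hypothetical stand-ins for the SPECIALIST Lexicon and the NER collection.

```python
import re


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a deliberate simplification)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def oov_count(text, lexicon):
    """Count the tokens in text that do not appear in the lexicon."""
    return sum(1 for token in tokenize(text) if token not in lexicon)


def top_oov_questions(questions, lexicon, n):
    """Return the n (question id, OOV count) pairs with the highest OOV counts."""
    scored = [(qid, oov_count(text, lexicon)) for qid, text in questions.items()]
    # Highest OOV count first; ties are broken by question id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:n]


# Toy data for illustration only; not taken from the actual collection.
toy_lexicon = {
    "i", "have", "a", "what", "is", "the", "for", "my", "in",
    "pain", "pane", "knee", "treatment", "diabetes",
}
toy_questions = {
    "q1": "What is the treatment for diabetes",
    "q2": "I have a pane in my kneee",          # "pane" is in the lexicon, "kneee" is not
    "q3": "What is the treatmnt for diabetis",  # two OOV tokens
}

print(top_oov_questions(toy_questions, toy_lexicon, 2))
```

Note that an OOV count only detects non-word candidates such as "kneee". A real-word error such as "pane" for "pain" is in the dictionary and is not counted, which is why the test set was annotated manually.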

III. Distribution of Errors in the Test Set

  • Stats on file size and error tags
    Count        Minimum    Maximum    Average
    Character         23       1985     504.71
    Word               3        337      72.36
    Error Tag          0         22       4.90
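
As a rough illustration of how figures of this kind can be derived, the sketch below computes the minimum, maximum, and average of a list of per-question counts, and reproduces the error-rate arithmetic reported in the summary (1,178 / 16,707). The function names summarize and error_rate and the per-question counts are hypothetical; only the two totals come from this page.

```python
from statistics import mean


def summarize(counts):
    """Return (minimum, maximum, average rounded to 2 decimals) for a list of counts."""
    return min(counts), max(counts), round(mean(counts), 2)


def error_rate(errors, tokens):
    """Errors per token, rounded to 2 decimals."""
    return round(errors / tokens, 2)


# Hypothetical per-question error-tag counts, for illustration only.
toy_error_tags = [0, 3, 5, 22]
print(summarize(toy_error_tags))

# Totals as reported in the summary statistics of this page.
print(error_rate(1178, 16707))
```

With the real per-question character, word, and error-tag counts as input, summarize would yield one row of the table above per measure.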

IV. Generation Processes

V. Performance Tests