Category

A syntactic category is a part of speech (noun, verb, adjective, etc.). A word form can have more than one category; for example, "square" can be a noun, a verb, an adjective, or an adverb. The categories a word form can have are represented in the Lexical Tools as a bit vector, in which each bit represents the presence or absence of one category. In the Java implementation, the Category class is an extension of the BitMaskBase class.
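
To make the bit-vector idea concrete, here is a minimal sketch in Java. It is not the Lexical Tools Category class: the class name `CategoryBitVector`, its constants, and the `contains` helper are hypothetical, and only the numeric bit values are taken from the category table in this document.

```java
// Hypothetical sketch of a category bit vector; NOT the Lexical Tools API.
final class CategoryBitVector {
    // Bit values as listed in the category table of this document.
    static final long ADJ = 1L;
    static final long ADV = 2L;
    static final long NOUN = 128L;
    static final long VERB = 1024L;

    /** Returns true if every bit of 'category' is set in 'categories'. */
    static boolean contains(long categories, long category) {
        return (categories & category) == category;
    }

    public static void main(String[] args) {
        // "square" can be a noun, a verb, an adjective, or an adverb.
        long square = ADJ | ADV | NOUN | VERB;
        System.out.println(square);                  // prints 1155
        System.out.println(contains(square, NOUN));  // prints true
        System.out.println(contains(NOUN, VERB));    // prints false
    }
}
```

Each category occupies its own bit, so testing for presence is a single bitwise AND.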

  • Category variants
    Category variants are described in the following table:

    Bit  Value  Variant  Other Symbols          Example                    Possible Inflections
    0    1      adj      adjective, ADJ         red                        base (1)
                                                redder                     comparative (2)
                                                reddest                    superlative (4)
                                                red                        positive (256)
    1    2      adv      adverb, ADV            fast                       base (1)
                                                faster                     comparative (2)
                                                fastest                    superlative (4)
                                                fast                       positive (256)
    2    4      aux      auxiliary              be                         base (1)
                                                being                      presPart (16)
                                                did                        past (32)
                                                been                       pastPart (64)
                                                is                         pres3s (128)
                                                be                         infinitive (1024)
                                                do                         pres123p (2048)
                                                didn't                     pastNeg (4096)
                                                don't                      pres123pNeg (8192)
                                                am                         pres1s (16384)
                                                weren't                    past1p23pNeg (32768)
                                                were                       past1p23p (65536)
                                                wasn't                     past1s3sNeg (131072)
                                                are                        pres1p23p (262144)
                                                aren't                     pres1p23pNeg (524288)
                                                was                        past1s3s (1048576)
                                                isn't                      pres3sNeg (4194304)
    3    8      compl    complementizer         that                       base (1)
    4    16     conj     conjunction, CON, con  and, or, but               base (1)
    5    32     det      determiner, DET        a, the, some, each         base (1)
    6    64     modal                           dare, may, must, ought,    base (1)
                                                shall, will, can, could,   past (32)
                                                couldn't, can, can't       pastNeg (4096)
                                                                           pres (2097152)
                                                                           presNeg (8388608)
    7    128    noun     NOM, NPR               dog                        base (1)
                                                dogs                       plural (8)
                                                dog                        singular (512)
    8    256    prep     preposition, PRE, pre  to, on, in, at, by         base (1)
    9    512    pron     pronoun                it, he, they               base (1)
    10   1024   verb     VER, ver               break                      base (1)
                                                breaking                   presPart (16)
                                                broke                      past (32)
                                                broken                     pastPart (64)
                                                breaks                     pres3s (128)
                                                break                      infinitive (1024)
                                                break                      pres123p (2048)
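
In the table above, each category's value is 2 raised to its bit position. The sketch below illustrates that relationship in Java. The class name `CategoryTable` and the method `valueOf` are hypothetical and are not part of the Lexical Tools API; only the variant names and their bit order come from the table.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; NOT the Lexical Tools Category class.
final class CategoryTable {
    // Variant names in bit order (bit 0 = adj ... bit 10 = verb), per the table.
    static final List<String> VARIANTS = Arrays.asList(
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb");

    /** Returns the bit value (2^bit) for a variant name. */
    static long valueOf(String variant) {
        int bit = VARIANTS.indexOf(variant);
        if (bit < 0) {
            throw new IllegalArgumentException("Unknown category: " + variant);
        }
        return 1L << bit;
    }

    public static void main(String[] args) {
        // Print bit, value, and variant for every category.
        for (int bit = 0; bit < VARIANTS.size(); bit++) {
            String variant = VARIANTS.get(bit);
            System.out.println(bit + " " + valueOf(variant) + " " + variant);
        }
    }
}
```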
  • Combination of multiple categories
    As described on the BitMaskBase page, a value can represent not only a single category but also a combination of multiple categories.

    For example, "saw" is a noun (128) and a verb (1024), so it can be represented by the value 1152 (= 128 + 1024). This is useful when the -CR:o (combine records by outputs) option is used. The value can be displayed as names, <noun + verb>, by using the -SC (show category) option. Some Lexical Tools flow operations, such as lowercase (-f:l), carry no information about the category. In such cases, a value of 2047 is used to represent all categories, <all>, because:
    2047 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 128 + 256 + 512 + 1024
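
The arithmetic above can be sketched in Java as follows. This is an illustration under stated assumptions, not the Lexical Tools implementation: the class `CategoryCombination` and its methods `combine` and `toNames` are hypothetical, and the output format only imitates the <noun + verb> and <all> notation used in this section.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; NOT the Lexical Tools Category class.
final class CategoryCombination {
    // Variant names in bit order (bit 0 = adj ... bit 10 = verb).
    static final String[] VARIANTS = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"};

    // All 11 category bits set: 2^11 - 1 = 2047.
    static final long ALL = (1L << VARIANTS.length) - 1;

    /** Combines the named categories into one value with bitwise OR. */
    static long combine(String... names) {
        long value = 0L;
        for (String name : names) {
            int bit = java.util.Arrays.asList(VARIANTS).indexOf(name);
            if (bit < 0) {
                throw new IllegalArgumentException("Unknown category: " + name);
            }
            value |= 1L << bit;
        }
        return value;
    }

    /** Decodes a value into names, e.g. 1152 -> "<noun + verb>". */
    static String toNames(long value) {
        if (value == ALL) {
            return "<all>";
        }
        List<String> names = new ArrayList<>();
        for (int bit = 0; bit < VARIANTS.length; bit++) {
            if ((value & (1L << bit)) != 0) {
                names.add(VARIANTS[bit]);
            }
        }
        return "<" + String.join(" + ", names) + ">";
    }

    public static void main(String[] args) {
        long saw = combine("noun", "verb");
        System.out.println(saw);            // prints 1152
        System.out.println(toNames(saw));   // prints <noun + verb>
        System.out.println(ALL);            // prints 2047
        System.out.println(toNames(ALL));   // prints <all>
    }
}
```

Because each category has its own bit, adding the values and OR-ing them give the same result as long as no category is counted twice.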