Derivation Type

I. Introduction

There are three main types of derivation, and this information is now available in Lexical Tools. Affix information is also provided when the type is prefixD.

dType    | dTypeStr: Tag | dTypeStr: Affix
prefixD  | P             | prefix
suffixD  | S             | None
zeroD    | Z             | None

In addition, four secondary types of derivation are used during the type assignment process. They include dPairs caused by spelling variants (SpVars). These are valid dPairs; however, they are excluded from the derivational DB table because the relationship involves an extra step (SpVars).

dType              | dTypeStr: Tag | dTypeStr: Affix
prefixD by SpVars  | PS            | prefix
suffixD by SpVars  | SS            | None
zeroD by SpVars    | ZS            | None
unknown            | U             | None
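
The two tables above can be mirrored in code as a single enum. The following is a minimal sketch and not the contents of DType.java; the name DTypeSketch and all of its fields and methods are hypothetical and chosen only for illustration.

```java
import java.util.Optional;

// Hypothetical enum mirroring the dType tables; not the actual DType.java.
enum DTypeSketch {
    PREFIX_D("prefixD", "P", true),
    SUFFIX_D("suffixD", "S", false),
    ZERO_D("zeroD", "Z", false),
    PREFIX_D_SPVARS("prefixD by SpVars", "PS", true),
    SUFFIX_D_SPVARS("suffixD by SpVars", "SS", false),
    ZERO_D_SPVARS("zeroD by SpVars", "ZS", false),
    UNKNOWN("unknown", "U", false);

    final String label;      // dType column
    final String tag;        // dTypeStr tag column
    final boolean hasAffix;  // true when the affix column holds a prefix

    DTypeSketch(String label, String tag, boolean hasAffix) {
        this.label = label;
        this.tag = tag;
        this.hasAffix = hasAffix;
    }

    // Only the three main types are stored in the derivational DB table.
    boolean isMainType() {
        return this == PREFIX_D || this == SUFFIX_D || this == ZERO_D;
    }

    // Looks up a type by its tag, e.g. "PS".
    static Optional<DTypeSketch> fromTag(String tag) {
        for (DTypeSketch t : values()) {
            if (t.tag.equals(tag)) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }

    // Renders the "Tag|Affix" pair; non-prefix types always use "None".
    String toDTypeStr(String prefix) {
        return tag + "|" + (hasAffix ? prefix : "None");
    }
}
```

Under these assumptions, DTypeSketch.PREFIX_D.toDTypeStr("un") yields P|un, and DTypeSketch.ZERO_D_SPVARS.isMainType() is false.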

II. Java APIs

${DERIVATION_DIR}/5.allD/sources/DType.java

Required files:

  • ${DERIVATION_DIR}/5.allD/data/${YEAR}/dataOrg/LRSPL
    => For finding dPairs caused by SpVars
  • ${DERIVATION_DIR}/5.allD/data/${YEAR}/dataOrg/dTypeStr.data
    => For finding the dType of dPairs that can't be determined by the algorithm
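
A lookup built from dTypeStr.data might be sketched as follows. The exact layout of that file is not documented here, so the sketch assumes one record per line in the form d1|cat1|EUI1|d2|cat2|EUI2|dType|affix; the class DTypeOverrideSketch and its parse method are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical loader; the 8-field layout is an assumption, not a documented format.
class DTypeOverrideSketch {

    // Builds a map from "d1|cat1|EUI1|d2|cat2|EUI2" to "dType|affix".
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> overrides = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            // limit -1 keeps empty fields, such as a missing EUI
            String[] f = trimmed.split("\\|", -1);
            if (f.length != 8) {
                continue; // skip lines that do not match the assumed layout
            }
            String key = String.join("|", f[0], f[1], f[2], f[3], f[4], f[5]);
            overrides.put(key, f[6] + "|" + f[7]);
        }
        return overrides;
    }
}
```

The lines could be read with java.nio.file.Files.readAllLines before being passed to parse; taking a list keeps the sketch independent of any particular file location.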

III. Algorithm & Example

Input format: d1|cat1|EUI1|d2|cat2|EUI2
The checks below must be applied in the order listed for the algorithm to work:

  • zeroD (Z)
    • Same base form: d1.equals(d2)
      • flex|noun||flex|verb|E0026587|Z|None
      • excuse|noun|E0026586|excuse|verb|E0026587|Z|None

  • prefixD (P)
    • Same ending characters (lastIndexOf + length1 = length2)
      • unzip|verb|zip|verb|P|un

  • zeroD by SpVars (ZS)
    • Any SpVars of d1 and d2 are zeroD
      • ignore case:
      • hyphenation:
        hand-search|noun|E0527651|handsearch|verb|E0527650|ZS|None
        first-aid|adj|first aid|noun|Z|None
        low-fat|adj|low fat|noun|Z|None
      • space:
        hand search|verb|E0527650|handsearch|noun|E0527651|ZS|None
      • Other spelling variants:
        check all spelling variants of d1 and d2 to see whether any are equal
        endeavor|verb|E0025158|endeavour|noun|E0025157|ZS|None
        cesarian|adj|cesarean|noun|Z|None
        partisan|adj|partizan|noun|Z|None

  • prefixD by SpVars (PS)
    • Any SpVars of d1 and d2 are prefixD

  • suffixD (S)
    • Same starting characters (at least 2)
      • treatment|noun|treat|noun|S|None
    • If d1 and d2 have the same length, remove the last three characters
      and check whether the remaining starting characters are the same
      => sell|verb|sale|noun|S|None

  • suffixD by SpVars (SS)
    • Any SpVars of d1 and d2 are suffixD

  • Mapping dType from dTypeStr.data (P|Z|S|PS|ZS|SS)
  • unknown (U)
    • Any dPair whose type cannot be found by the steps above.
      These dPairs need to be reviewed manually:
      • Invalid dPairs:
        => Add to 1.nomD/nomD.tagNo.txt if it is from nomD
      • Valid dPairs:
        => Add to 5.allD/dTypeStr.data with dType|affix
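
The ordered checks above can be sketched in Java as follows. This is a minimal illustration and not the code in DType.java: the class DTypeClassifierSketch and its methods are hypothetical, the spVars function stands in for a lookup against LRSPL and is assumed to return the word itself plus its spelling variants, the prefix test treats the longer form as the derived one regardless of field order, and the dTypeStr.data mapping step is left out.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the ordered checks; not the code in DType.java.
class DTypeClassifierSketch {

    // zeroD test: identical base forms.
    static boolean isZero(String a, String b) {
        return a.equals(b);
    }

    // prefixD test: returns the prefix when the shorter form is a proper
    // ending of the longer one, otherwise null.
    static String prefixOf(String a, String b) {
        boolean aLonger = a.length() >= b.length();
        String longer = aLonger ? a : b;
        String shorter = aLonger ? b : a;
        if (shorter.isEmpty()) {
            return null;
        }
        int idx = longer.lastIndexOf(shorter);
        boolean isEnding = idx > 0 && idx + shorter.length() == longer.length();
        return isEnding ? longer.substring(0, idx) : null;
    }

    // suffixD test: at least two shared starting characters, or, for forms of
    // equal length, identical text once the last three characters are dropped.
    static boolean isSuffix(String a, String b) {
        int common = 0;
        int max = Math.min(a.length(), b.length());
        while (common < max && a.charAt(common) == b.charAt(common)) {
            common++;
        }
        if (common >= 2) {
            return true;
        }
        int keep = a.length() - 3;
        return a.length() == b.length() && keep > 0 && a.regionMatches(0, b, 0, keep);
    }

    // Applies the checks in the documented order and returns "dType|affix".
    static String classify(String d1, String d2, Function<String, List<String>> spVars) {
        if (isZero(d1, d2)) {
            return "Z|None";
        }
        String prefix = prefixOf(d1, d2);
        if (prefix != null) {
            return "P|" + prefix;
        }
        List<String> vars1 = spVars.apply(d1);
        List<String> vars2 = spVars.apply(d2);
        for (String v1 : vars1) {
            for (String v2 : vars2) {
                if (isZero(v1, v2)) {
                    return "ZS|None";
                }
            }
        }
        for (String v1 : vars1) {
            for (String v2 : vars2) {
                String p = prefixOf(v1, v2);
                if (p != null) {
                    return "PS|" + p;
                }
            }
        }
        if (isSuffix(d1, d2)) {
            return "S|None";
        }
        for (String v1 : vars1) {
            for (String v2 : vars2) {
                if (isSuffix(v1, v2)) {
                    return "SS|None";
                }
            }
        }
        return "U|None"; // left for manual review
    }
}
```

With a spVars function that returns only the word itself, classify("unzip", "zip", ...) gives P|un and classify("sell", "sale", ...) gives S|None. The order matters in this sketch: endeavor and endeavour share more than two starting characters, so they would be tagged S if the ZS check did not run first.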