Tagging Derivational Pairs

Tagging derivational pairs as valid or invalid requires derivational analysis by experts with linguistic knowledge. The analysis can become very complicated when more than one affix is involved. For example, multioptional|adj could be derived from “optional|adj” with the prefix “multi” or from “multioption|noun” with the suffix “al”. In such cases, we have to determine the order of derivation. In other words, was the prefix "multi" added to the base before or after the derivational suffix "al"? The order can be determined by checking whether the prefix+base form exists as an independent noun. If the noun exists, the pair with that noun is tagged yes and the pair with the unprefixed adjective is tagged no; if no such noun exists, the pair with the unprefixed adjective is tagged yes:

  • multioptional|adj|E0622933|multioption|noun|E0726104|yes
  • multioptional|adj|E0622933|optional|adj|E0044055|no
    => noun exists
  • multienzymic|adj|E0608361|enzymic|adj|E0442501|yes
    => no noun exists
  • multiphyletic|adj|E0574510|phyletic|adj|E0047700|yes
    => no noun exists
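
The noun-existence check above can be sketched in Python. This is only an illustration, not how Lvg implements tagging: the field names in DerivPair, the function tag_prefixed_adjective, and the known_nouns set are hypothetical, and the suffix is stripped by plain string slicing, so spelling changes (for example "enzymic" vs. "enzyme") are not handled.

```python
from typing import NamedTuple


class DerivPair(NamedTuple):
    """One pipe-delimited record; the field names are assumed, not official."""
    term1: str
    cat1: str
    eui1: str
    term2: str
    cat2: str
    eui2: str
    tag: str


def parse_pair(line: str) -> DerivPair:
    """Split a record such as 'multioptional|adj|E0622933|optional|adj|E0044055|no'."""
    fields = line.strip().split("|")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    return DerivPair(*fields)


def tag_prefixed_adjective(prefix: str, base_adj: str, suffix: str,
                           known_nouns: set) -> dict:
    """Suggest yes/no tags for the adjective prefix+base_adj.

    If prefix + (base_adj without suffix) is a known noun, the adjective is
    treated as derived from that noun, so the pair with the noun is 'yes'
    and the pair with the unprefixed adjective is 'no'. Otherwise the pair
    with the unprefixed adjective is 'yes'.
    """
    if not base_adj.endswith(suffix):
        raise ValueError(f"{base_adj!r} does not end with {suffix!r}")
    noun_candidate = prefix + base_adj[: -len(suffix)]
    if noun_candidate in known_nouns:
        return {(noun_candidate, "noun"): "yes", (base_adj, "adj"): "no"}
    return {(base_adj, "adj"): "yes"}
```

Calling tag_prefixed_adjective("multi", "optional", "al", {"multioption"}) suggests yes for the pair with multioption|noun and no for the pair with optional|adj, matching the first two records above. The suggestion is only a starting point; the final tag remains a linguist's decision.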

It gets even more complicated when multiple affixes (prefixes and/or suffixes) are involved, because the order of derivation must still be determined. Linguists need to look up the usage of related terms, peel off the derivational affixes, and look at the result. If the result is not a valid word, then that particular result does not affect the yes-no tagging of the derivational pair at hand. When the result is a valid word, we have to determine the order in which the prefixes and/or suffixes were added to the base. For example, “pseudo-hyper-para-thyroid-ism”, which has three prefixes and one suffix, is tagged as follows:

  • pseudohyperparathyroidism|noun|E0233853|hyperparathyroidism|noun|E0032763|no
  • pseudohyperparathyroidism|noun|E0233853|pseudohyperparathyroid|noun|TBD|yes
    => "pseudohyperparathyroid" is used in terms such as "pseudohyperparathyroid syndrome"
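
The "peel off an affix and look at the result" step can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the affix lists, and the attested set (standing in for a lexicon or usage lookup) are all hypothetical, and the code only lists candidates for review. It does not decide the order of derivation or the yes/no tag.

```python
def peel_one_affix(word, prefixes, suffixes):
    """Return (remainder, affix_type, affix) for each outermost affix
    that can be removed from word by plain string matching."""
    candidates = []
    for p in prefixes:
        if word.startswith(p) and len(word) > len(p):
            candidates.append((word[len(p):], "prefix", p))
    for s in suffixes:
        if word.endswith(s) and len(word) > len(s):
            candidates.append((word[: -len(s)], "suffix", s))
    return candidates


def candidates_for_review(word, prefixes, suffixes, attested):
    """Keep only remainders that are valid (attested) words.

    Remainders that are not valid words are dropped, since they do not
    affect the tagging of the pair at hand.
    """
    return [c for c in peel_one_affix(word, prefixes, suffixes)
            if c[0] in attested]


if __name__ == "__main__":
    # Hypothetical affix lists and attested-word set for the example above.
    for remainder, kind, affix in candidates_for_review(
        "pseudohyperparathyroidism",
        prefixes=["pseudo", "hyper", "para"],
        suffixes=["ism"],
        attested={"hyperparathyroidism", "pseudohyperparathyroid"},
    ):
        print(f"{remainder} (after removing {kind} '{affix}')")
```

For "pseudohyperparathyroidism", both remainders are valid words, so both pairs need a tag, and a linguist still has to decide which one reflects the actual order of derivation.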

In such cases, the usage of all related words needs to be checked and the order of derivation must be determined for an accurate derivational analysis and tagging. This process is tedious, time-consuming, and labor-intensive. Because of this difficulty and limited resources, the derivational Facts and Rules in Lvg have not grown in proportion to the Lexicon over the years.

Also, a given yes-or-no tag will not necessarily stay correct forever, because languages are always changing. The lexicon is perhaps the most changeable component of a language; new terms are being coined all the time. Something that starts out as a nonce usage, invented on the spot to describe some new situation, can become accepted usage. If enough other speakers of the language hear the nonce usage and like it enough to start using it themselves, or if the same nonce usage happens to be coined independently by other speakers (or writers), then a consensus begins to develop that this new term should exist. An example is the verb "google", which rapidly gained solid usage. In the biomedical domain, a fair number of new terms will involve the prefixes we are looking at now. The yes-or-no tag is reasonably solid and not prone to change when only one derivational affix is involved.