LVG Transformations (Flow Components)

In LVG, individual transformations are represented by flow components, which are collected into flows. A flow has one or more components and applies them serially: the output of each component is the input to the next. LVG can run a single flow or multiple flows in parallel.
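
The serial and parallel behavior described above can be sketched in a few lines of Java. This is only a toy model, not the LVG API: the class `FlowSketch`, its methods, and the regex-based component bodies are hypothetical stand-ins, loosely named after the l, p, and o flow components. It also assumes each component returns exactly one output per input, whereas real components may return several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Toy model of flows and flow components. Hypothetical names; NOT the LVG API.
final class FlowSketch {

    // Assumed stand-in for component "l" (lowercase).
    static final UnaryOperator<String> LOWERCASE =
            s -> s.toLowerCase(Locale.ROOT);

    // Assumed stand-in for component "p" (strip punctuation).
    static final UnaryOperator<String> STRIP_PUNCTUATION =
            s -> s.replaceAll("\\p{Punct}", "");

    // Assumed stand-in for component "o" (replace punctuation with spaces).
    static final UnaryOperator<String> PUNCTUATION_TO_SPACE =
            s -> s.replaceAll("\\p{Punct}", " ");

    // A flow applies its components serially: each output feeds the next component.
    static String runFlow(String input, List<UnaryOperator<String>> flow) {
        String current = input;
        for (UnaryOperator<String> component : flow) {
            current = component.apply(current);
        }
        return current;
    }

    // Parallel flows each start from the same original input, independently.
    static List<String> runParallelFlows(String input,
                                         List<List<UnaryOperator<String>>> flows) {
        List<String> results = new ArrayList<>();
        for (List<UnaryOperator<String>> flow : flows) {
            results.add(runFlow(input, flow));
        }
        return results;
    }

    public static void main(String[] args) {
        String term = "Left-Handed";
        List<UnaryOperator<String>> lowercaseThenStrip =
                Arrays.asList(LOWERCASE, STRIP_PUNCTUATION);
        List<UnaryOperator<String>> punctuationToSpace =
                Arrays.asList(PUNCTUATION_TO_SPACE);

        // One flow with two components applied in order.
        System.out.println(runFlow(term, lowercaseThenStrip));

        // Two parallel flows over the same input.
        System.out.println(
                runParallelFlows(term, Arrays.asList(lowercaseThenStrip, punctuationToSpace)));
    }
}
```

Modeling a component as a function from term to term keeps the composition rule visible: a flow is function chaining, and parallel flows are independent chains over the same input.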

The Java version of LVG uses a new implementation design for transformations, LexItem, Category, Inflection, and related classes. The table below lists the implemented flow components with their flags in the C and Java versions. Java class usage for the flow components is also provided.

Flag in C  Flag in Java   Feature Description
---------  ------------   -------------------
0          0              Strip NEC and NOS
a          a              Return known acronym expansions
A          A              Return known acronyms
           An             Return possible mapping terms (approximate match) in Lexicon
b          b              Uninflect a term
B          B              Uninflect words in a term
           Bn             Normalized uninflect words in a term
c          c              Tokenize a term into "words"
c:a        ca             Tokenize, keep everything
c:h        ch             Tokenize without breaking hyphens
C          C              Canonicalize
           Ct             Retrieve the lexical name (base=, BAS) form
d          d              Generate derivational variants
d:N        dc~LONG        Generate derivational variants with specified output categories
e          e              Generate known uninflected from spelling variants
E          E              Retrieve the unique EUI for a term
f          f              Filter output to contain only forms from lexicon
f:a        fa             Filter out acronyms and abbreviations from the output
f:p        fp             Filter out proper nouns from the output
g          g              Remove genitive
G          G              Generate all fruitful variants
           Ge             Generate fruitful variants, enhanced
           Gn             Generate known fruitful variants
i          i              Generate inflectional variants
i:N:N      ici~LONG+LONG  Generate inflectional variants with specified output categories and output inflections
           is             Generate inflectional variants with simple inflections
l          l              Lowercase
L          L              Retrieve category and inflection for a term
L:n        Ln             Retrieve category and inflection from lexicon
L:p        Lp             Retrieve category and inflection for all terms that begin with the given word
m          m              Generate the Metaphone spelling normalized form
n          n              No operation
           nom            Retrieve nominalizations
N, N:2     N              Normalize the input text in a non-canonical way (Norm)
N:3        N3             LuiNorm (canonical way normalization)
o          o              Replace punctuation with spaces
p          p              Strip punctuation
P          P              Strip punctuation, enhanced
q          q              Strip diacritics
           q0             Map Symbols & Punctuation to ASCII
           q1             Map Unicode to ASCII
           q2             Split Ligatures
           q3             Get Unicode names
           q4             Get Unicode base synonym
           q5             Normalize Unicode to ASCII
           q6             Normalize Unicode to ASCII with synonym Option
           q7             Unicode Core Norm
           q8             Strip or Map Unicode to ASCII
r          r              Generate synonyms, recursively
           rs             Remove plural patterns of (s), (es), and (ies)
R          R              Generate derivational variants, recursively
s          s              Generate known spelling variants
S          S              Syntactic uninvert
           Si             Map inflections into simple inflections
t          t              Strip stop words
T          T              Strip ambiguity tags
u          u              Uninvert the input phrase around commas
U          U              Convert the output of the Xerox PARC stochastic tagger into lvg-style pipe-delimited format
v          v              Retrieve fruitful variants from precomputed data
w          w              Sort words by order
wsN        ws~INT         Filter words by specified word size
y          y              Generate synonyms