Normalize Unicode to ASCII
- Short Description: Normalize Unicode characters of the input term to pure ASCII.
- Full Description:
- Difference:
- Features:
- Unicode core norm, recursively perform:
- Map Unicode symbols and punctuation to ASCII
- Map Unicode to ASCII
- Split ligatures
- Strip diacritics
- Get Unicode symbol name if the character is not ASCII
- Symbol: q5
- Examples:
This flow normalizes the characters of the input term to pure ASCII. It first applies Unicode core norm and then substitutes the Unicode symbol name for any character that is still not ASCII. This flow is equivalent to the combined flow options -f:q7:q3. Please refer to the design documents of Normalize Unicode characters to ASCII for details.
No effect on the -m option. "none" is added at the end of the output.
This flow uses the recursive algorithm of Unicode core norm (-f:q7) instead of the combined flows of stripping diacritics (-f:q) and splitting ligatures (-f:q2) used in the previous version.
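To illustrate why a recursive algorithm can matter here, the following minimal Python sketch compares a single fixed-order pass (strip diacritics, then split ligatures) with repeating both steps until the text stops changing. It is not lvg's implementation: the LIGATURES table, the function names, and the choice of the character U+01C6 as a test case are all assumptions made for illustration, and lvg's real mapping tables may treat this character differently.

```python
import unicodedata

# Hypothetical one-step ligature table (NOT lvg's real table).
# U+00E6 (ae ligature) -> "ae"; U+01C6 (dz with caron) -> "d" + U+017E (z with caron)
LIGATURES = {"\u00e6": "ae", "\u01c6": "d\u017e"}


def strip_diacritics_once(text):
    """One pass: canonically decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def split_ligatures_once(text):
    """One pass: replace each ligature found in the table."""
    return "".join(LIGATURES.get(c, c) for c in text)


def fixed_order(text):
    """A single strip pass followed by a single split pass."""
    return split_ligatures_once(strip_diacritics_once(text))


def recursive(text):
    """Repeat both steps until the result no longer changes."""
    result = split_ligatures_once(strip_diacritics_once(text))
    return result if result == text else recursive(result)


print(fixed_order("\u01c6").isascii())  # False: a z with caron is left over
print(recursive("\u01c6"))              # dz
```

In this sketch the fixed-order pass leaves a non-ASCII character behind because splitting the ligature exposes a new diacritic after the stripping step has already run; the recursive version picks it up on the next round.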
Normalize Unicode characters of the input term to pure ASCII:
shell> lvg -f:q5
Evolène ©2002
Evolène ©2002|Evolene ![COPYRIGHT SIGN]!2002|2047|16777215|q5|1|
Heavenly Bathrobes®
Heavenly Bathrobes®|Heavenly Bathrobes![REGISTERED SIGN]!|2047|16777215|q5|1|
- Utilize Unicode core norm
- Get Unicode symbol name if the character is not ASCII
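
The two steps above can be sketched in Python with the standard unicodedata module. This is only an approximation under stated assumptions, not lvg's actual code: SYMBOL_MAP and LIGATURE_MAP are small hypothetical stand-ins for lvg's mapping tables, NFKD decomposition is used as a substitute for lvg's own diacritic and ligature handling, and the ![NAME]! wrapper simply copies the delimiter style seen in the example output.

```python
import unicodedata

# Hypothetical stand-ins for lvg's mapping tables (assumptions, not lvg data).
SYMBOL_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
}
LIGATURE_MAP = {"\u00e6": "ae", "\u00c6": "AE", "\u0153": "oe", "\u0152": "OE"}


def core_norm(text, max_rounds=10):
    """Approximate 'Unicode core norm': repeat the mapping, splitting and
    stripping steps until the text stops changing (or max_rounds is hit)."""
    for _ in range(max_rounds):
        out = []
        for ch in text:
            if ord(ch) < 128:
                out.append(ch)                  # already ASCII
            elif ch in SYMBOL_MAP:
                out.append(SYMBOL_MAP[ch])      # symbols and punctuation
            elif ch in LIGATURE_MAP:
                out.append(LIGATURE_MAP[ch])    # split ligatures
            else:
                # Decompose, then drop combining marks (strip diacritics).
                decomposed = unicodedata.normalize("NFKD", ch)
                out.append("".join(c for c in decomposed
                                   if not unicodedata.combining(c)))
        result = "".join(out)
        if result == text:
            break
        text = result
    return text


def to_ascii(text):
    """Core norm first, then replace any remaining non-ASCII character
    with its Unicode name wrapped in ![...]!."""
    out = []
    for ch in core_norm(text):
        if ord(ch) < 128:
            out.append(ch)
        else:
            # Fall back to the code point if the character has no name.
            name = unicodedata.name(ch, "U+%04X" % ord(ch))
            out.append("![" + name + "]!")
    return "".join(out)


print(to_ascii("Evol\u00e8ne \u00a92002"))     # Evolene ![COPYRIGHT SIGN]!2002
print(to_ascii("Heavenly Bathrobes\u00ae"))    # Heavenly Bathrobes![REGISTERED SIGN]!
```

The sketch reproduces only the output term for the two example inputs; the remaining fields of the lvg output line (category, inflection, flow history, flow number) are outside its scope.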