[CentOS] Unicode related query

Joseph L. Casale jcasale at activenetwerx.com
Wed Feb 3 05:33:15 UTC 2010


>I am able to get an English word list from <file> using the following command:
>
>cat <file> | tr -sc A-Za-z '\012'
>
>My question is how to specify Unicode characters as well as ASCII in the
>tr command. Specifically, the text file contains 3-byte sequences starting
>with \xe0.
>
>I am able to see the character using:
>
>echo -e '\xe0\xa5\xbf'
>
>What regex incantation would make tr give the results I want?
>
>I am new to Unicode.
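A small sketch of why tr alone struggles here, assuming GNU coreutils tr (its manual notes that tr fully supports only single-byte characters). In the command above, -c complements the set A-Za-z and -s squeezes each run of the resulting newlines into one, so every run of non-letters becomes a single line break. The character from the question, U+097F, is the three UTF-8 bytes E0 A5 BF, and tr treats each of those bytes as a separate non-letter:

```shell
#!/bin/sh
# Show the UTF-8 bytes of U+097F (the character echoed in the question).
printf '\340\245\277' | od -An -tx1

# -c complements A-Za-z, -s squeezes runs of the replacement newline.
# tr works byte by byte, so the three bytes of the Devanagari character
# are three "non-letters" and disappear into the separator.
# Prints "hello" and "world" on separate lines.
printf 'hello \340\245\277 world\n' | LC_ALL=C tr -sc 'A-Za-z' '\012'
```

Listing the raw bytes in the set instead would match them individually, not as a sequence, so it would also keep stray bytes from unrelated characters.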

You don't say much about what delimits the words. Spaces? Give more info,
but http://www.regular-expressions.info/unicode.html leads to some Perl solutions.


