[CentOS] Unicode related query

Rajagopal Swaminathan raju.rajsand at gmail.com
Wed Feb 3 04:56:09 UTC 2010


I am able to get an English word list from <file> by using the following command:

cat <file> | tr -sc A-Za-z '\012'
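
To make clear what that pipeline does, here is a minimal sketch run on a made-up sample line (the sample text and the variable names are only for illustration, not taken from my file). -c complements the set A-Za-z, so every other byte is translated to a newline ('\012'), and -s squeezes runs of those newlines into one:

```shell
# Hypothetical sample input, standing in for the contents of <file>.
sample='Hello, world! 42 times'

# Every byte outside A-Za-z becomes a newline; -s collapses repeated newlines.
words=$(printf '%s\n' "$sample" | tr -sc 'A-Za-z' '\012')

# Print one word per line.
printf '%s\n' "$words"
```

Digits and punctuation are dropped along with the whitespace, because they are also outside A-Za-z.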

My question is how to specify Unicode characters as well as ASCII in
the tr command. Specifically, the text file contains 3-byte sequences
starting with \xe0.

I am able to see the character using:

echo -e '\xe0\xa5\xbf'
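
As a quick check of the bytes behind that character, the following sketch dumps them as hex with od. It writes the same three bytes using octal escapes (\340\245\277), on the assumption that those are understood by more printf implementations than \x escapes; the variable name is only for illustration:

```shell
# Emit the three bytes 0xe0 0xa5 0xbf and dump them as hex.
# od -An drops the offset column, -tx1 prints one byte per hex pair;
# the final tr removes the spaces and newline that od adds.
bytes=$(printf '\340\245\277' | od -An -tx1 | tr -d ' \n')

printf '%s\n' "$bytes"   # prints e0a5bf
```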

What regex incantation would make tr give the results I want?

I am new to Unicode.


