[CentOS-docs] Proposal for UTF8 vs performance tip.
Wojciech Pilorz
wpilorz at gmail.com
Mon Apr 30 23:02:27 UTC 2007
Avoid UTF-8 processing if you don't need it, and gain extra speed
Many commonly used utilities are much slower when they run in a UTF-8 locale.
If you want extra speed and do not need UTF-8 processing, disable it using:
export LANG=C
export LC_ALL=C (needed only if LC_ALL is already set, because LC_ALL overrides LANG)
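The reason LC_ALL matters is the lookup order for locale settings: LC_ALL wins over the individual LC_* variables, which in turn win over LANG. A minimal sketch of that order follows. The function name effective_ctype is made up for this illustration, and the fallback to "C" when nothing is set is an assumption (the default is implementation-defined).

```shell
# Hypothetical helper: print the locale name that character handling
# would use, given the values of LC_ALL, LC_CTYPE and LANG.
effective_ctype() {
    # $1 = LC_ALL, $2 = LC_CTYPE, $3 = LANG (each may be empty)
    if [ -n "$1" ]; then
        echo "$1"        # LC_ALL overrides everything else
    elif [ -n "$2" ]; then
        echo "$2"        # category-specific variable comes next
    elif [ -n "$3" ]; then
        echo "$3"        # LANG is only the fallback
    else
        echo "C"         # assumption: nothing set means the C locale
    fi
}

# Show the result for the current environment.
effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
```

So with LC_ALL=en_US.UTF-8 still set, exporting only LANG=C changes nothing. On most systems the `locale` command prints the values actually in effect.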
To see the difference, compare the run time of
time grep -i -c some_string some_large_files
with
LANG=en_US.UTF-8
and the same command with
LANG=C
On a modern CPU (a 3 GHz Celeron), grepping a 100 MB file like this takes
about 2 seconds with UTF-8 and is about a hundred times faster (0.02 s)
with LANG=C.
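The comparison can also be made without changing the locale of the whole session, by setting LC_ALL for a single command. The sketch below is only an illustration: count_matches is a made-up helper, the sample file is generated on the spot and is far smaller than 100 MB, and en_US.UTF-8 is assumed to be installed. On ASCII-only data both runs should report the same count. With non-ASCII text the results may differ, since the C locale works on bytes and folds case for ASCII letters only.

```shell
# Hypothetical helper: count lines matching a pattern, ignoring case,
# with the locale forced for this one grep invocation only.
count_matches() {
    # $1 = locale name, $2 = pattern, $3 = file
    LC_ALL="$1" grep -i -c -- "$2" "$3"
}

# Build a small ASCII-only sample file (a real test would use ~100 MB).
sample=$(mktemp)
i=0
while [ "$i" -lt 1000 ]; do
    echo "line $i contains Some_String here"
    echo "line $i has nothing of interest"
    i=$((i + 1))
done > "$sample"

utf8_count=$(count_matches en_US.UTF-8 some_string "$sample")
c_count=$(count_matches C some_string "$sample")

echo "en_US.UTF-8: $utf8_count matching lines"
echo "C:           $c_count matching lines"

rm -f "$sample"
```

In bash, putting `time` in front of each count_matches call on a large file shows the speed difference, while the rest of the session keeps its original locale.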
The speedup is even more spectacular for
sort -f
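As a small illustration (the three sample words are made up), sort -f still folds case for ASCII letters in the C locale. Only the locale's own collation rules are skipped, so plain ASCII input sorts as expected.

```shell
# Sketch: sort -f (fold lowercase to uppercase) on ASCII words,
# with the C locale forced for the sort command only.
folded=$(printf '%s\n' banana Cherry apple | LC_ALL=C sort -f)
plain=$(printf '%s\n' banana Cherry apple | LC_ALL=C sort)

echo "$folded"   # apple, banana, Cherry (one per line)
echo "$plain"    # Cherry, apple, banana (uppercase sorts first in C)
```

Without -f, the C locale compares raw byte values, so all uppercase letters sort before lowercase ones.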