[CentOS-docs] Proposal for UTF8 vs performance tip.

Wojciech Pilorz

wpilorz at gmail.com
Mon Apr 30 23:02:27 UTC 2007


Avoid UTF-8 processing if you don't need it, and gain extra speed

Many commonly used utilities are much slower when processing UTF-8.
If you want extra speed and do not need UTF-8 processing, disable it using
export LANG=C
export LC_ALL=C (overrides LANG and every LC_* variable; not needed if no LC_* variable is set)
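
If you do not want to change the locale for the whole session, the override
can be limited to a single command or to a subshell. The following is only a
sketch: the sample file and its contents are made up for illustration.

```shell
# Hypothetical sample data, used only for illustration.
sample=$(mktemp)
printf 'Alpha\nbeta\nALPHA\n' > "$sample"

# Override for one command only: the assignment prefix applies to grep alone.
hits=$(LC_ALL=C grep -i -c alpha "$sample")
echo "matches: $hits"

# Override for every command inside a subshell; the parent shell keeps its locale.
(
  export LC_ALL=C
  grep -i -c alpha "$sample"
)

rm -f "$sample"
```

The prefix form is handy inside scripts that must keep UTF-8 elsewhere.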

Compare
  time grep -i -c some_string some_large_files
with
LANG=en_US.UTF-8
and the same with
LANG=C

On a modern CPU (Celeron 3GHz), grepping a 100MB file like this takes
about 2 seconds with UTF-8 and is about a hundred times faster (0.02s)
with LANG=C
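
The comparison above can be reproduced with a small script along these lines.
It is a sketch under assumptions: the test data is generated on the spot, the
en_US.UTF-8 locale is assumed to be installed, and the nanosecond format
`date +%s%N` is assumed to be available (GNU date). The ratio you measure will
depend on the grep version, the data and the hardware.

```shell
# Generate a throwaway test file of identical ASCII lines (illustrative data).
testfile=$(mktemp)
yes "Some_String in a line of filler text" | head -n 200000 > "$testfile"

# Count case-insensitive matches with the locale given as the first argument.
count_matches() {
  LC_ALL="$1" grep -i -c some_string "$testfile"
}

# Run the same count under both locales and report the elapsed wall time.
for loc in en_US.UTF-8 C; do
  start=$(date +%s%N)
  n=$(count_matches "$loc")
  end=$(date +%s%N)
  echo "LC_ALL=$loc: $n matches, $(( (end - start) / 1000000 )) ms"
done

rm -f "$testfile"
```

LC_ALL is used rather than LANG so that the result does not depend on other
LC_* variables already set in the environment.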

An even more spectacular speedup is seen with
sort -f
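
A similar sketch for sort -f, under the same assumptions as before (generated
test data, en_US.UTF-8 installed, GNU date for `%N`). Note that the two locales
may order the lines differently, so only the timing and the line count are
compared here, not the sorted output itself.

```shell
# Generate mixed-case keys in a scrambled order (illustrative data).
wordfile=$(mktemp)
awk 'BEGIN {
  for (i = 0; i < 200000; i++)
    printf "%s%d\n", (i % 2 ? "Key" : "key"), (i * 7919) % 100003
}' > "$wordfile"

# Sort case-insensitively under both locales and report the elapsed wall time.
for loc in en_US.UTF-8 C; do
  start=$(date +%s%N)
  sorted_lines=$(LC_ALL="$loc" sort -f "$wordfile" | wc -l)
  end=$(date +%s%N)
  echo "LC_ALL=$loc: $sorted_lines lines sorted, $(( (end - start) / 1000000 )) ms"
done

rm -f "$wordfile"
```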


