[CentOS] Fwd: Problem of "sort" utf8 file.

Tue Sep 9 02:23:58 UTC 2008
Peter Cai <newptcai at gmail.com>

I also tried running

localedef -f UTF-8 -i zh_CN zh_CN.UTF-8

but nothing happened.
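
For reference, here is a rough sketch of one way to check whether the
locale is actually present after running localedef. It assumes a glibc
system, where "locale -a" usually prints the codeset in a normalized
spelling (for example zh_CN.utf8), so both sides are lowercased and
stripped of hyphens before comparing. The helper names normalize and
locale_installed are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: lowercase a locale name and drop hyphens, so that
# "zh_CN.UTF-8" and "zh_CN.utf8" compare as equal.
normalize() {
    tr 'A-Z' 'a-z' | sed 's/-//g'
}

# Hypothetical helper: succeed if the locale named in $1 is listed by
# `locale -a` (compared after normalization).
locale_installed() {
    wanted=$(printf '%s\n' "$1" | normalize)
    locale -a 2>/dev/null | normalize | grep -qxF -- "$wanted"
}

if locale_installed zh_CN.UTF-8; then
    echo "zh_CN.UTF-8 is installed"
else
    echo "zh_CN.UTF-8 is missing"
fi
```

If the locale is reported as missing, sort has nothing to load for
LC_COLLATE and will most likely fall back to plain byte order.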

---------- Forwarded message ----------
From: Peter Cai <newptcai at gmail.com>
Date: Tue, Sep 9, 2008 at 9:42 AM
Subject: Problem of "sort" utf8 file.
To: centos at centos.org


Hi all,

I have two Linux distributions, Ubuntu and CentOS.

My problem is that the sort command behaves differently on the two
systems when sorting Chinese strings in a UTF-8 encoded file.

On Ubuntu the result is correct, but on CentOS it is wrong.

I googled this problem, and it seems to be caused by LC_COLLATE.
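
A minimal sketch for checking whether collation is really what
differs, assuming GNU sort. The three sample characters and the helper
names sort_with and sample are made up for illustration. In the C
locale, sort compares raw bytes, which for UTF-8 is the same as code
point order. Under zh_CN.UTF-8 the order depends on the collation
tables installed on the system, so no expected output is shown for
that case.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin under the locale given as $1.
sort_with() {
    LC_ALL="$1" sort
}

# Sample data (assumed for illustration):
# 啊 is U+554A, 北 is U+5317, 中 is U+4E2D.
sample() {
    printf '%s\n' '啊' '北' '中'
}

# Byte order: prints 中, 北, 啊, because UTF-8 byte order follows
# code point order.
sample | sort_with C

# Locale order: depends on the collation rules available on this system.
sample | sort_with zh_CN.UTF-8
```

If both commands print the same order on one machine but not on the
other, the difference is in the locale data rather than in sort itself.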

So I changed "/etc/sysconfig/i18n" on CentOS, and now the two systems
have the same LC_* settings, like this:

CentOS:
=============================
[root@localhost ~]# locale
LANG=zh_CN.UTF-8
LC_CTYPE=zh_CN.UTF-8
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE=zh_CN.UTF-8
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

Ubuntu:
=============================
peter@ubuntu:~$ locale
LANG=zh_CN.UTF-8
LANGUAGE=zh_CN:zh
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=
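
In "locale" output, a value in double quotes is implied by another
variable (here LANG), while an unquoted value was set explicitly, so
the two listings above can look different even when the effective
settings are the same. As a small sketch, the hypothetical helper
effective_locale below strips the quotes and sorts the lines so that
the output of the two machines can be compared with diff.

```shell
#!/bin/sh
# Hypothetical helper: read `locale` output on stdin, remove the double
# quotes that mark implied values, and sort the lines in byte order so
# that two listings can be compared line by line.
effective_locale() {
    tr -d '"' | LC_ALL=C sort
}

# Print the effective settings of the current machine.
locale | effective_locale
```

Saving this output on each machine and running diff on the two files
shows only the settings that really differ, such as LANGUAGE.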

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

But the result is still incorrect on CentOS!  It is driving me crazy!

PS: The background of this problem is that PostgreSQL's "ORDER BY"
clause depends on the sort order provided by the OS.



-- 
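Science fiction may be philosophically naive, morally simple, and aesthetically somewhat subjective or crude, but at its best it seems to touch the nerve center of humanity's collective dreams, releasing some of the fantasies hidden deep within the machine that is the human being.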