[CentOS] Unicode in C++

Wed Feb 23 19:17:10 UTC 2011
Cameron Kerr <cameron at humbledown.org>

The same as on any other Linux box.
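On a typical Linux box the terminal and locale are UTF-8, so printing a Unicode character mostly means writing its UTF-8 bytes to standard output. Here is a minimal sketch. It assumes a UTF-8 terminal and locale (e.g. LANG=en_US.UTF-8). `to_utf8` is a hypothetical helper, not a library API. It encodes a single code point as UTF-8.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// Returns an empty string for UTF-16 surrogate halves (U+D800..U+DFFF)
// and for values above U+10FFFF, which are not valid Unicode scalar values.
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                        // 1 byte: US-ASCII range
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp >= 0xD800 && cp <= 0xDFFF) {
        // Surrogate halves are not characters; leave the result empty.
    } else if (cp < 0x10000) {              // 3 bytes: rest of the BMP
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {            // 4 bytes: supplementary planes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, after `std::setlocale(LC_ALL, "")`, the statement `std::cout << to_utf8(0x20AC) << '\n';` should print the euro sign on a UTF-8 terminal. In practice a literal such as `"\u20AC"` usually does the same job, because GCC encodes narrow string literals as UTF-8 by default.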

Some important tips for beginners:

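	*  Don't forget to set your locale at the beginning of your program, e.g. with setlocale(LC_ALL, ""), so that the C library's multibyte and wide-character functions follow the user's LANG/LC_* settings.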

	*  Use ONE encoding CONSISTENTLY (UTF-8 or UTF-16) inside your program, and transcode to/from outside encodings as needed (all such transcoding should happen at the IO edges). If you use UTF-16 and store it in files, make sure you standardise on a byte order. UTF-8 doesn't have that issue. Valid US-ASCII text is also valid UTF-8 (the reverse is not true).

	*  Do not mix data representations. As far as you can, stay with either wide characters (where every code point is stored as a single 32-bit value; wchar_t is 32 bits on Linux) or a multi-byte encoding (e.g. UTF-8, UTF-16).

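		*  Yes, UTF-16 is also a multi-byte encoding: each code unit is two bytes, and characters outside the Basic Multilingual Plane need two code units (a surrogate pair).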

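	*  Learn about Unicode normalisation. The same text can be encoded as different code point sequences (e.g. "é" as U+00E9, or as "e" followed by U+0301 COMBINING ACUTE ACCENT), so normalise strings (e.g. to NFC) before comparing them. It is VERY IMPORTANT when comparing strings in a security context.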

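	*  Software you will want to learn:
		*  libiconv (or the iconv(3) functions built into glibc) for transcoding.
		*  ICU (International Components for Unicode), originally developed at IBM. This is a large library of commonly needed Unicode algorithms (normalisation, collation, case mapping, text boundaries and more) that libc doesn't have.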
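Transcoding at the IO edges can be done with the iconv(3) interface. On glibc systems such as CentOS, this interface is part of libc itself, and GNU libiconv offers the same API. Here is a minimal sketch. It assumes a glibc-style iconv() that takes `char**` input. `transcode` and `IconvHandle` are hypothetical names, and the error handling is reduced to throwing std::runtime_error.

```cpp
#include <iconv.h>
#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical RAII helper: closes the conversion descriptor on scope exit.
struct IconvHandle {
    iconv_t cd;
    ~IconvHandle() { iconv_close(cd); }
};

// Convert `in` from encoding `from` to encoding `to`, e.g.
// transcode(bytes, "UTF-8", "UTF-16LE") when reading a UTF-16LE file.
// Encoding names are whatever the system's iconv_open() accepts.
std::string transcode(const std::string& in, const char* to, const char* from) {
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)(-1))
        throw std::runtime_error("iconv_open: conversion not supported");
    IconvHandle guard{cd};

    std::string out;
    char buf[256];
    // glibc's iconv() takes char** even though it never writes to the input.
    char* inptr = const_cast<char*>(in.data());
    std::size_t inleft = in.size();

    for (;;) {
        char* outptr = buf;
        std::size_t outleft = sizeof buf;
        std::size_t rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
        out.append(buf, static_cast<std::size_t>(outptr - buf));
        if (rc != static_cast<std::size_t>(-1))
            break;  // all input converted
        if (errno != E2BIG)  // EILSEQ (invalid) or EINVAL (truncated) input
            throw std::runtime_error("iconv: invalid or incomplete input");
        // E2BIG: buf was full; loop again to convert the remaining input.
    }

    // Emit any closing shift sequence (only stateful encodings need one).
    char* outptr = buf;
    std::size_t outleft = sizeof buf;
    iconv(cd, nullptr, nullptr, &outptr, &outleft);
    out.append(buf, static_cast<std::size_t>(outptr - buf));
    return out;
}
```

A function like this is meant to be called once at each boundary. For example, a program converts a UTF-16LE file's bytes to UTF-8 when reading it. The rest of the program then only ever handles a single internal encoding.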

Hope it helps,
Cameron

On 23/02/2011, at 11:37 AM, Michael D. Berger wrote:

> On my CentOS box, in C++ programs, is there a way to print
> Unicode characters?
> 
> Thanks,
> Mike.
> 
> _______________________________________________
> CentOS mailing list
> CentOS at centos.org
> http://lists.centos.org/mailman/listinfo/centos