ofbiz-user mailing list archives

From David E Jones <jone...@undersunconsulting.com>
Subject Re: why is mysql default character set latin1?
Date Mon, 09 Oct 2006 17:34:30 GMT

On Oct 9, 2006, at 4:52 PM, Kurt T Stam wrote:

> UTF-8 can consume up to 6 bytes per character. UCS2 is strictly 2 bytes.
> However most people prefer the 'backwards compatible' utf-8 where ASCII
> range characters still only consume 1 byte, so it should NOT overflow
> using ASCII, but it might using Asian characters. BTW, on average it
> takes 3 bytes per character for Asian characters, so a rule of thumb is
> to increase your string lengths by 3 when doing i18n.
>
> Any db will have this 'problem'..

To some extent this is true, but it seems that many other databases
"hide" this internally by treating field sizes as the total number of
characters instead of the total number of bytes. In other words, if
you are using a multi-byte character set like UTF-8, the database
reserves 3 bytes per character, and you say your column should be 255
characters, then internally it will allocate 765 bytes to cover the
255 characters you asked for.
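
To make the arithmetic concrete, here is a minimal Python sketch of the two ways of counting. The function names are hypothetical and are not part of MySQL or OFBiz, and the 3-bytes-per-character reservation is simply the figure used in this thread, not a statement about any particular database.

```python
# Illustrative only: the helper names below are made up for this example.

def utf8_byte_length(text: str) -> int:
    """Number of bytes needed to store text encoded as UTF-8."""
    return len(text.encode("utf-8"))


def reserved_bytes(char_limit: int, max_bytes_per_char: int = 3) -> int:
    """Bytes reserved when the column size is counted in characters.

    Assumes the database sets aside max_bytes_per_char for every character.
    """
    return char_limit * max_bytes_per_char


def fits_byte_limited_column(text: str, byte_limit: int) -> bool:
    """True if text fits a column whose size is counted in bytes."""
    return utf8_byte_length(text) <= byte_limit


if __name__ == "__main__":
    ascii_text = "a" * 255
    cjk_text = "\u65e5" * 255  # a CJK character that takes 3 bytes in UTF-8

    print(reserved_bytes(255))                      # bytes reserved for 255 characters
    print(utf8_byte_length(ascii_text))             # ASCII: 1 byte per character
    print(utf8_byte_length(cjk_text))               # CJK: 3 bytes per character
    print(fits_byte_limited_column(cjk_text, 255))  # overflows a 255-byte column
```

With character-based sizing, both strings fit a 255-character column. With byte-based sizing, only the ASCII string fits in 255 bytes.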

In the 4 series MySQL didn't do this, hence the latin1 character set
default. I don't know if this has changed in the 5 series, but it
sure would be nice!

-David

