httpd-apreq-dev mailing list archives

From Boris Zentner <...@2bz.de>
Subject Re: Apache::Request, APR::Table and UTF8
Date Tue, 05 Oct 2004 22:40:40 GMT

Hi,

On 05.10.2004 at 18:34, David Wheeler wrote:

> On Oct 5, 2004, at 9:20 AM, Joe Schaefer wrote:
>
>> We could use a three-bit field for marking the charset:
>>
>>   0 - unknown
>>   1 - ASCII
>>   2 - UTF-8
>>   3 - UTF-16
>>   [ room for 4 more iso? charsets ]
>
> Note that data encoded in UTF-8 is not the same as decoded to Perl's 
> internal utf8 format. The latter has the same bytes, but the "utf8" 
> flag has been set on the variable so that Perl knows how to properly 
> count characters, among other things. I suspect that it is the loss of 
> the setting of this flag on the string variable that Boris is 
> reporting.
>

What I would like to see is:

0 - unknown
1 - ASCII
2 - UTF-8
3 - UTF-16
...
7 - perlutf8

The perlutf8 flag is somewhat unrelated to the charsets above, but it 
would allow us to convert from Perl strings to those charsets and would 
resolve the issue of Perl's lost utf8 flag. Perl's internal utf8 strings 
can differ from UTF-8, even though I have never noticed the difference 
myself.


> Regards,
>
> David
>
>
--
Boris

