httpd-apreq-dev mailing list archives

From Markus Wichitill <ma...@gmx.de>
Subject Re: New charset support breaks existing app charset/utf8 support
Date Mon, 18 Apr 2005 02:19:01 GMT
Joe Schaefer wrote:
> I'm hoping that our stuff just 
> plain-old works without any monkeying around with decoders.

In case you or other readers haven't had to deal much with utf8 flags yet, 
here are a few examples of what happens when a module sets utf8 flags and the 
application isn't prepared to handle that, because it's used to treating UTF-8 
like an 8-bit encoding (which is a lot simpler, although not perfect):

- Anytime you print a utf8 string, or write it to a file, Perl will convert 
it to latin1 if possible; otherwise it will warn about "Wide character in 
print at ..." and write the UTF-8 bytes anyway. Either way the output 
encoding depends on the content of each string, effectively writing random 
garbage. Handling this requires setting :utf8 layers on handles, which is 
complicated by having to deal with tied IO and PerlIO, as in the case of 
mod_perl.

- If you pass a utf8 string to an XS module that doesn't handle utf8, 
there's a good chance it will do the same as above or die. Handling this 
requires plenty of ugly utf8::decode/utf8::encode pairs.

- Anytime the utf8-flagged strings are combined with other strings that are 
already in UTF-8 format, but don't have the flag, the unflagged strings are 
wrongly converted from latin1 to UTF-8 by Perl, destroying them. Handling 
this requires taking care of decoding all possible data sources early.
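
A minimal sketch of the third point, assuming the same word arrives once as a 
utf8-flagged string (as a module might return it) and once as unflagged UTF-8 
bytes (as an 8-bit-style application holds it). The variable names and the 
sample data are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data: "Zürich" as raw UTF-8 bytes, no utf8 flag.
my $raw = "Z\xC3\xBCrich";          # 7 bytes, the u-umlaut is C3 BC

# The same text as a flagged character string.
my $flagged = $raw;
utf8::decode($flagged);             # now 6 characters, utf8 flag on

# Mixing the two: Perl upgrades $raw as if it were latin1, so the bytes
# C3 BC turn into the two characters U+00C3 and U+00BC.
my $mixed = $flagged . " / " . $raw;

# Encoding the result for output shows the damage (double encoding).
my $bytes = $mixed;
utf8::encode($bytes);               # "Z\xC3\xBCrich / Z\xC3\x83\xC2\xBCrich"

# Decoding the unflagged source before combining avoids the problem.
my $decoded = $raw;
utf8::decode($decoded);
my $fixed_bytes = $flagged . " / " . $decoded;
utf8::encode($fixed_bytes);         # "Z\xC3\xBCrich / Z\xC3\xBCrich"
```

The second half is the "decode early" approach: every data source has to be 
decoded before its strings meet a flagged one.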

All in all, handling utf8-flagged strings in Perl isn't all that easy, and 
it's not made any simpler by the scattered and partly confusing Perl docs.
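
As a rough sketch of the extra handling the first two points call for, 
assuming a flagged string that contains a character above 0xFF. The function 
bytes_only_upper is a made-up stand-in for an XS routine that only 
understands bytes, and the in-memory filehandles stand in for real output 
handles:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an XS function that works on bytes only.
sub bytes_only_upper {
    my ($bytes) = @_;
    return uc $bytes;               # only touches ASCII letters on byte strings
}

# A flagged string: u-umlaut (U+00FC) plus a character above 0xFF (U+263A).
my $text = "Z\x{FC}rich \x{263A}";

# Printing to a handle without a :utf8 layer; Perl is expected to warn
# "Wide character in print" here, so the warning is captured.
my @warnings;
my $plain = '';
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    open(my $fh, '>', \$plain) or die "open failed: $!";
    print {$fh} $text;
    close($fh);
}

# Printing to a handle with a :utf8 layer writes well-defined UTF-8 bytes.
my $buffer = '';
open(my $out, '>:utf8', \$buffer) or die "open failed: $!";
print {$out} $text;
close($out);

# The encode/decode pair around a bytes-only function.
my $tmp = $text;
utf8::encode($tmp);                 # characters -> UTF-8 bytes, flag off
my $result = bytes_only_upper($tmp);
utf8::decode($result);              # UTF-8 bytes -> characters, flag on
```

Under mod_perl the output handle is usually tied, so the layer part of this 
sketch does not carry over directly; it only shows the plain PerlIO case.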

Some more points:

- Perl 5.6 may have the beginnings of UTF-8 support built-in, but it's buggy 
and there's no interface to use that functionality, so for all intents and 
purposes it doesn't support UTF-8. I'm not sure whether apreq actually sets 
the flag under 5.6 now, but you really don't want to set it under that 
version.

- Even Perl 5.8.0 - 5.8.2 are too buggy to be safely used with utf8-flagged 
scalars.

- Few XS modules support utf8, and this will probably never change, what 
with many modules, including important ones, being barely or not at all 
maintained. In this environment, I see no point in forcing utf8 flags on users.

- Analyzing the parameters for encoding, when that's not required in many 
cases, seems wasteful from a performance perspective. And performance was 
always an important aspect of apreq.
