perl-modperl mailing list archives

From "Aaron Hawryluk" <>
Subject UTF-8 fun [was: UTF8 fun with SOAP::Lite and mod_perl 1.3.33]
Date Fri, 16 Mar 2007 22:26:52 GMT
If I instruct the browser to render the page as UTF-8, the strange characters disappear, but the
proper characters don't show up. Instead I get the gap indicative of a non-rendering character, or
nothing at all, depending on the browser (IE and FF do different things here - big surprise).

The problem as I see it is that the system locale is set to ISO-8859-1 (and MySQL should be
using the system locale), Apache is set to ISO-8859-1, and yet for some reason UTF-8 (possibly,
though not necessarily, just double-byte instead of single-byte) is coming out of mod_perl, where
regular CGI is just pumping out (normal) ISO-8859-1.  Switching system locales might have
some effect, so I'll test that on a development machine and see what happens.  Here goes nothing...

-----Original Message-----
From: Chris Jacobson [] 
Sent: March-16-07 3:27 PM
To: Aaron Hawryluk
Cc: 'Drew Wilson'; 'modperl mod_perl'
Subject: Re: UTF8 fun with SOAP::Lite and mod_perl 1.3.33

FWIW, if you tell the client to render the page as UTF-8, your 'broken' 
mod_perl version works correctly.  The content-type header is 
instructing the client to render the page using ISO-8859-1, which will 
result in gremlin characters being rendered.
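
A minimal sketch of that effect, assuming only the core Encode module: the same
response bytes are decoded once the way an ISO-8859-1 client would read them and
once the way a UTF-8 client would. The sample character (U+00E9) and the
variable names are illustrative and not taken from the pages in question.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# UTF-8 bytes for U+00E9 (e with acute), as a server might send them: c3 a9.
my $body = encode('UTF-8', "\x{e9}");

# A client trusting "charset=ISO-8859-1" reads two characters ...
my $as_latin1 = decode('ISO-8859-1', $body);
# ... while a client told "charset=UTF-8" reads the single intended one.
my $as_utf8 = decode('UTF-8', $body);

# List the code points each client ends up with.
printf "ISO-8859-1: %d chars (%s)\n", length($as_latin1),
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $as_latin1);
printf "UTF-8:      %d chars (%s)\n", length($as_utf8),
    join(' ', map { sprintf 'U+%04X', ord($_) } split //, $as_utf8);
```

If the body really is UTF-8, declaring it in the handler (for example with
$r->content_type('text/html; charset=UTF-8')) should make the header agree with
the bytes. Whether that is the right fix here depends on which encoding the
data is supposed to be in.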

Aaron Hawryluk wrote:
> This is suspiciously similar to the problem I had with double-byte characters coming
> up where single-byte characters were expected.  If you find the answer to this, could you
> let me know?  I still can't migrate to mod_perl due to the problem. Mind you I'm on Apache2/mp2
> so they could be completely unrelated...
> Here's a sample of what happens:
> Here it is under my old CGI model (which is now far too CPU-intensive):
> And here it is under mod_perl:
> Hey! Mod_perl guys! Can you say "reproducibility"?
> --Aaron Hawryluk
> Webmaster, The Calgary Sun
> Ph: 403-250-4371
> -----Original Message-----
> From: Drew Wilson [] 
> Sent: March-16-07 1:15 PM
> To: modperl mod_perl
> Subject: UTF8 fun with SOAP::Lite and mod_perl 1.3.33
> I'm trying to track down a Unicode malcoding problem using SOAP::Lite  
> 0.67 with mod_perl 1.29 on apache 1.3.33.
> The problem I'm seeing is my UTF8 strings are transformed in the http  
> response.
> The strings look correct inside the perl space (e.g. printing to  
> STDERR inside the perl handler) but the strings are converted in the  
> http packet returned (captured using tcpdump).
> For example, if I want to send back a string containing the Unicode  
> snowman U+2603 (UTF-8 bytes E2 98 83), I manually encode the string as:
>             my $snowman = '☃';
>             my %result = ( 'snowman' => SOAP::Data->type( string => $snowman ) );
> and return it
>             return %result;
> When watching with tcpdump, I expect to see this UTF8 byte sequence:
> 	 e2 98 83
> but instead see
> 	c3 a2 c2 98 c2 83
> I suspect the UTF-8 byte sequence is being treated as a UTF-16  
> sequence [00 e2 00 98 00 83], which is then converted to the UTF-8  
> equivalent byte sequence [c3 a2 c2 98 c2 83].
> But I cannot figure out WHERE this conversion is being done.
> Is there any way to trace data being written to the response?
> BTW - the $snowman string returns 1 for utf8::is_utf8 and utf8::valid.
> Thanks for any suggestions,
> Drew
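
The byte sequence Drew reports can be reproduced outside SOAP::Lite. This is a
sketch using only the core Encode module; it shows one way to arrive at
c3 a2 c2 98 c2 83 (each UTF-8 byte taken for a Latin-1 character and encoded
again) and does not show where in the SOAP::Lite / mod_perl stack that happens.
The helper name hex_bytes is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hex dump helper (hypothetical name, for illustration only).
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# A one-character string holding U+2603 SNOWMAN.
my $snowman = "\x{2603}";

# Correct: encode the character string to UTF-8 exactly once.
my $once = encode('UTF-8', $snowman);

# Double encoding: the UTF-8 bytes are taken for Latin-1 characters
# (U+00E2, U+0098, U+0083) and each one is encoded to UTF-8 again.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

print hex_bytes($once),  "\n";   # e2 98 83
print hex_bytes($twice), "\n";   # c3 a2 c2 98 c2 83
```

Dumping the payload with a helper like this at each stage of the handler is one
way to narrow down where the second encoding step is applied.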

Chris Jacobson                         Phone: (513) 665-9070 x310
Online-Rewards                         Fax  : (214) 242-4448
403 Vine Street, Second Floor
Cincinnati, OH 45202
