perl-modperl mailing list archives

From Drew Wilson <...@apple.com>
Subject UTF8 fun with SOAP::Lite and mod_perl 1.3.33
Date Fri, 16 Mar 2007 19:14:41 GMT
I'm trying to track down a Unicode encoding problem using SOAP::Lite
0.67 with mod_perl 1.29 on Apache 1.3.33.

The problem I'm seeing is that my UTF-8 strings are altered in the HTTP
response.

The strings look correct inside Perl (e.g. when printed to STDERR
inside the Perl handler), but they are converted in the HTTP packet
that is returned (captured using tcpdump).

For example, if I want to send back a string containing the Unicode
snowman U+2603 (UTF-8 E2 98 83), I manually encode the string as:
            my $snowman = '☃';
            my %result = ( 'snowman' => SOAP::Data->type( string => $snowman ) );

and return it
            return %result;

When watching with tcpdump, I expect to see this UTF-8 byte sequence:
	e2 98 83
but instead see
	c3 a2 c2 98 c2 83

I suspect each byte of the UTF-8 sequence is being treated as a
separate character (U+00E2 U+0098 U+0083, or 00 e2 00 98 00 83 as
UTF-16), and those three characters are then encoded as UTF-8 again,
giving [c3 a2 c2 98 c2 83].
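
The suspected double encoding can be reproduced outside Apache. This is only a minimal sketch, assuming the core Encode module is available; the variable names and the hex() helper are illustrative and not part of SOAP::Lite or mod_perl. It does not show where the second encode happens in the real stack, only that a second encode yields exactly the bytes seen in tcpdump.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# One character, U+2603 SNOWMAN, as a Perl character string.
my $snowman = "\x{2603}";

# Encoding once gives the expected UTF-8 bytes.
my $once = encode('UTF-8', $snowman);

# Encoding the already-encoded bytes again treats each byte
# (0xE2, 0x98, 0x83) as its own character and encodes each one.
my $twice = encode('UTF-8', $once);

my $once_hex  = hex_bytes($once);
my $twice_hex = hex_bytes($twice);

print STDERR "encoded once:  $once_hex\n";   # e2 98 83
print STDERR "encoded twice: $twice_hex\n";  # c3 a2 c2 98 c2 83
```

If this matches, the thing to look for is a layer that encodes to UTF-8 a string that already holds UTF-8 bytes rather than characters.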

But I cannot figure out WHERE this conversion is being done.

Is there any way to trace data being written to the response?

BTW - the $snowman string returns 1 for utf8::is_utf8 and utf8::valid.
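
Note that utf8::is_utf8 only reports Perl's internal flag, so it can return 1 both for a string holding the single character U+2603 and for one holding the three characters U+00E2 U+0098 U+0083. A rough sketch of a check that tells the two apart follows; describe() is a made-up helper name, and utf8::upgrade is used here only to force the flag on for the comparison.

```perl
use strict;
use warnings;

# Illustrative helper: report the UTF-8 flag, the character count,
# and the code point of every character in the string.
sub describe {
    my ($s) = @_;
    my $flag   = utf8::is_utf8($s) ? 1 : 0;
    my $points = join ' ', map { sprintf('U+%04X', ord($_)) } split(//, $s);
    return sprintf('flag=%d len=%d chars=%s', $flag, length($s), $points);
}

# A properly decoded string: one character.
my $decoded = "\x{2603}";

# The raw UTF-8 bytes held as three characters; upgrading sets the
# flag without changing which characters the string contains.
my $undecoded = "\xe2\x98\x83";
utf8::upgrade($undecoded);

my $decoded_info   = describe($decoded);
my $undecoded_info = describe($undecoded);

print STDERR "$decoded_info\n";    # flag=1 len=1 chars=U+2603
print STDERR "$undecoded_info\n";  # flag=1 len=3 chars=U+00E2 U+0098 U+0083
```

A length of 3 for $snowman inside the handler would mean the string reaches SOAP::Lite as undecoded bytes, whatever is_utf8 says.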

Thanks for any suggestions,

Drew