perl-modperl mailing list archives

From Drew Wilson <...@apple.com>
Subject Re: UTF8 fun with SOAP::Lite and mod_perl 1.3.33
Date Sat, 17 Mar 2007 01:45:29 GMT
FWIW, I did try forcing the page encoding, but that turned out to be
unnecessary because the XML is already UTF-8.

Drew

On Mar 16, 2007, at 2:27 PM, Chris Jacobson wrote:

> FWIW, if you tell the client to render the page as UTF-8, your  
> 'broken' mod_perl version works correctly.  The content-type header  
> is instructing the client to render the page using ISO-8859-1,  
> which will result in gremlin characters being rendered.
>
> Aaron Hawryluk wrote:
>> This is suspiciously similar to the problem I had with double-byte  
>> characters coming up where single-byte characters were expected.   
>> If you find the answer to this, could you let me know?  I still  
>> can't migrate to mod_perl due to the problem. Mind you I'm on  
>> Apache2/mp2 so they could be completely unrelated...
>> Here's a sample of what happens:
>> Here it is under my old CGI model (which is now far too CPU-intensive):
>> http://www.calgarysun.com/cgi-bin/publish.cgi?p=171082&x=articles&s=showbiz
>> And here it is under mod_perl:
>> http://www.calgarysun.com/perl-bin/publish.cgi?p=171082&x=articles&s=showbiz
>> Hey! Mod_perl guys! Can you say "reproducibility"?
>> --Aaron Hawryluk
>> Webmaster, The Calgary Sun
>> http://www.calgarysun.com
>> webmaster@calgarysun.com
>> Ph: 403-250-4371
>> -----Original Message-----
>> From: Drew Wilson [mailto:amw@apple.com]
>> Sent: March-16-07 1:15 PM
>> To: modperl mod_perl
>> Subject: UTF8 fun with SOAP::Lite and mod_perl 1.3.33
>> I'm trying to track down a Unicode mis-encoding problem using
>> SOAP::Lite 0.67 with mod_perl 1.29 on Apache 1.3.33.
>> The problem I'm seeing is that my UTF-8 strings are transformed in the
>> HTTP response.
>> The strings look correct inside Perl (e.g. when printed to
>> STDERR inside the Perl handler), but they are converted in
>> the HTTP packet returned (captured using tcpdump).
>> For example, if I want to send back a string containing the
>> Unicode snowman U+2603 (UTF-8 E2 98 83), I manually encode the
>> string as:
>>             my $snowman = '☃';
>>             my %result = ( 'snowman' => SOAP::Data->type( string => $snowman ) );
>> and return it
>>             return %result;
>> When watching with tcpdump, I expect to see this UTF-8 byte sequence:
>> 	e2 98 83
>> but instead see
>> 	c3 a2 c2 98 c2 83
>> I suspect the UTF-8 byte sequence is being treated as a UTF-16
>> sequence [00 e2 00 98 00 83], which is then converted to the
>> equivalent UTF-8 byte sequence [c3 a2 c2 98 c2 83].
>> But I cannot figure out WHERE this conversion is being done.
>> Is there any way to trace data being written to the response?
>> BTW - the $snowman string returns 1 for utf8::is_utf8 and  
>> utf8::valid.
>> Thanks for any suggestions,
>> Drew
>
> -- 
> ____________________________________________________________________
> Chris Jacobson                         Phone: (513) 665-9070 x310
> Online-Rewards                         Fax  : (214) 242-4448
> 403 Vine Street, Second Floor          http://www.online-rewards.com
> Cincinnati, OH 45202
>
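The byte sequences quoted above (e2 98 83 expected, c3 a2 c2 98 c2 83 observed) match what is usually called double encoding: each byte of already-encoded UTF-8 is read as a separate character and then encoded to UTF-8 a second time. The sketch below only reproduces that byte pattern with the core Encode module. It assumes the intermediate step behaves like an ISO-8859-1 decode, and it makes no claim about where in SOAP::Lite, mod_perl, or Apache the second encoding actually happens. The hex_bytes helper is a hypothetical name introduced for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+2603 SNOWMAN as a Perl character string (a single character).
my $snowman = "\x{2603}";

# Expected wire form: the character string encoded to UTF-8 exactly once.
my $once = encode('UTF-8', $snowman);

# Assumed failure mode: the UTF-8 bytes are misread as ISO-8859-1
# characters (one character per byte) and then encoded to UTF-8 again.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

# Hypothetical helper: format a byte string as space-separated hex,
# similar to what tcpdump shows.
sub hex_bytes { return join ' ', unpack '(H2)*', $_[0] }

print hex_bytes($once),  "\n";   # e2 98 83
print hex_bytes($twice), "\n";   # c3 a2 c2 98 c2 83
```

If this is what is happening, the three intermediate characters are U+00E2, U+0098 and U+0083. Those are the same code points as the [00 e2 00 98 00 83] sequence suspected in the original message, so the two explanations describe the same transformation.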

