perl-asp mailing list archives

From Thanos Chatziathanassiou <tcha...@arx.gr>
Subject Re: Output character encoding
Date Wed, 06 Jun 2012 12:44:45 GMT
Apologies, Arnon: I received your original message with the problem
description only after I had sent mine.

> 
> To explain where there is some magic at play:
> 
> Apache::ASP::Response does a "use bytes", which I believe is there to
> handle the output stream correctly, mostly for the content length
> calculations.  I think this is fine here, and turning it off makes
> things worse for these examples.
> 
> Apache::ASP::Response is more importantly tied as a file handle when
> this code is run:
> 
>         tie *RESPONSE, 'Apache::ASP::Response', $self->{Response};
>         select(RESPONSE);
> 
> This allows print to go to $Response->PRINT, which aliases to
> $Response->Write. Fundamentally, all output ends up going through
> $Response->Write, including the script's static content itself.
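To look at the tied-handle layer in isolation, here is a minimal standalone sketch. ProbeResponse is a hypothetical stand-in, not Apache::ASP::Response, and it deliberately omits the "use bytes" the real class has. It only shows that once the tied handle is select()ed, a plain print reaches PRINT with the original scalars, and records whether each one carries the utf8 flag.

```perl
#!/usr/bin/perl
# Sketch: plain print() on a select()ed tied handle calls PRINT.
use strict;
use warnings;
use Encode ();

{
    # Hypothetical stand-in for a response object; NOT Apache::ASP::Response.
    package ProbeResponse;

    sub TIEHANDLE {
        my ($class) = @_;
        return bless { out => '', flags => [] }, $class;
    }

    # print LIST on the tied handle arrives here with the original scalars.
    sub PRINT {
        my ($self, @args) = @_;
        for my $arg (@args) {
            # Record the utf8 flag of each argument as PRINT sees it.
            push @{ $self->{flags} }, utf8::is_utf8($arg) ? 1 : 0;
            $self->{out} .= $arg;
        }
        return 1;
    }
}

tie *RESPONSE, 'ProbeResponse';
my $old = select(RESPONSE);   # default output handle is now the tied one
print Encode::decode('ISO-8859-1', "\xE2"), chr(0xE2);
select($old);                 # restore the previous default handle

my $probe = tied *RESPONSE;
print "utf8 flags seen by PRINT: @{ $probe->{flags} }\n";
```

Run outside Apache, this should report the flag as on for the decoded string and off for chr(0xE2), which suggests the tie itself passes the flag through and the loss happens later.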
> 
> What I have found is that this will output the correct bytes in this
> Apache::ASP script:
> 
> <% print STDOUT Encode::decode('ISO-8859-1',"\xE2"); %>
> 
> as it bypasses the tied file handle layer to $Response, so we know perl
> is working at this point!
> 
> but doing this is where we have a problem:
> 
> <% print Encode::decode('ISO-8859-1',"\xE2"); %>
> 
> and immediately in the Apache::ASP::Response::Write() method the data
> has already been converted incorrectly, without any processing having
> occurred.  It's as if merely going through the tied interface puts the
> data through some conversion process.  I have played with various
> IO settings, as in "open ...", and various "use" pragmas to no avail,
> but I am really shooting blind here as to what is not working.
> 
> So the way I see it...
> 

That rang a bell for me: read the section ``The UTF8 flag'' in Encode
to see the problem.
${$Response->{out}} contains a copy of the data you send to
$Response->Write() (AKA $Response->WriteRef()), but the copy does not
carry the utf8 flag.
You can make the example work by simply turning the utf8 flag on
unconditionally via ``Encode::_utf8_on(${$Response->{out}});''
after the print statements in Latin-1.rasp.
Of course, your data should then either ALL have the utf8 flag on (e.g.
via Encode::decode) or ALL have it off, because ${$Response->{out}} can
have it either on or off, but obviously not both.
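The flag-less copy and the _utf8_on workaround can be reproduced outside Apache in a few lines. This is only a sketch: the copy is simulated with Encode::_utf8_off rather than going through $Response, and $mixed merely stands in for a buffer that received one decoded string and one plain Latin-1 byte.

```perl
#!/usr/bin/perl
# Sketch: a copy that keeps the internal bytes but drops the utf8 flag,
# and why forcing the flag on only helps if ALL pieces agree.
use strict;
use warnings;
use Encode ();

# A decoded string: one character, U+00E2, stored internally as C3 A2.
my $decoded = Encode::decode('ISO-8859-1', "\xE2");

# Simulate the flag-less copy.
my $copy = $decoded;
Encode::_utf8_off($copy);
printf "flag-less copy: %d octets (%s)\n",
    length($copy), join(' ', map { sprintf '%02X', ord } split //, $copy);

# The workaround: switch the flag back on for the whole buffer.
Encode::_utf8_on($copy);
printf "after _utf8_on: %d character(s)\n", length($copy);

# A buffer that also holds an unflagged Latin-1 byte (chr(0xE2))
# is malformed once the flag is forced on.
my $mixed = Encode::encode('UTF-8', $decoded) . chr(0xE2);
Encode::_utf8_on($mixed);
printf "mixed buffer valid: %s\n", utf8::valid($mixed) ? 'yes' : 'no';
```

The flag-less copy shows up as the two octets C3 A2 (rendered as Latin-1, "Ã¢"), turning the flag back on restores a single character, and the mixed buffer is reported as invalid.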

> Encode and perltie seem to have some conflicting bits here.
> 
> If there were some workaround here I would be glad to hear it but I seem
> to have exhausted my ability to troubleshoot this.

I'm not sure there is a generic solution, except perhaps checking
``is_utf8($$dataref)'' before appending the data to $Response->{out}
and making sure that only one kind of data (flag either ON or OFF) is
ever appended to $Response->{out}.
See below for why this is a problem.
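A minimal sketch of that is_utf8 check, assuming the "ALL off" choice: flagged strings are encoded into the page charset before they are appended, so the buffer only ever holds octets. append_as_octets and its arguments are hypothetical, not part of Apache::ASP; the four input strings are the ones from Latin-1.rasp.

```perl
#!/usr/bin/perl
# Sketch: normalise every piece to octets before it reaches the buffer,
# so the buffer never mixes flagged and unflagged data.
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Apache::ASP.
sub append_as_octets {
    my ($outref, $dataref, $charset) = @_;
    my $data = $$dataref;
    # Character strings (flag on) are encoded; unflagged strings are
    # ASSUMED to be octets in $charset already and appended as-is.
    $data = Encode::encode($charset, $data) if utf8::is_utf8($data);
    $$outref .= $data;
    return;
}

my $out = '';
for my $piece (
    Encode::decode('ISO-8859-1', "\xE2"),
    Encode::decode('UTF-8', Encode::encode('UTF-8', "\xE2")),
    "\x{00E2}",
    chr(0xE2),
) {
    append_as_octets(\$out, \$piece, 'ISO-8859-1');
}

printf "%d octets: %s\n", length($out),
    join(' ', map { sprintf '%02X', ord } split //, $out);
```

With ISO-8859-1 as the page charset, all four pieces end up as the single octet E2. The weak spot is the assumption about unflagged data: the helper cannot know which charset a plain byte string is in, which is where a user-configurable setting would be needed.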

> 
>> # Latin-1.rasp: #############
>>
>> <%
>> #use open ( ":utf8", ":std" );
>> #binmode ( STDOUT, ":encoding(ISO-8859-1)" );
>>
>> $::Response->{Charset} = "ISO-8859-1";
>>
>> use Encode;
>>
>> print Encode::decode('ISO-8859-1',"\xE2"),
>> Encode::decode('UTF-8',Encode::encode('UTF-8',"\xE2")),

#these two will now work if
#Encode::_utf8_on(${$Response->{out}});
#is called, because they carry the flag themselves

>> "\x{00E2}",
>> chr(0x00E2);

#these two, on the other hand, will not
#
#the opposite holds true for
#Encode::_utf8_off(${$Response->{out}}),
#of course

>> %>

I'm sure we can design a ``proper'' solution, but not without some
user-configurable settings and a bit of ugly code.

Best Regards,
Thanos Chatziathanassiou


