perl-modperl mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: Charset in response
Date Thu, 29 Nov 2012 15:53:17 GMT
André Warnier wrote:
> Torsten Förtsch wrote:
>> On 11/29/2012 10:37 AM, André Warnier wrote:
>>> When I say that it doesn't work, what I mean is:
>>> - the "Content-Type" response header sent by the server is properly set
>>> according to what I do above (as verified in a browser plugin)
>>> - but if what I print contains "accented" characters, they are not being
>>> encoded properly
>>>
>>> So, do I need to set something else so that $r->print(string) will
>>> output "string" properly?
>>>
>>>
>>> Background:
>>>
>>> My PerlResponseHandler reads an HTML file from disk, replaces some
>>> strings in it, and sends the result out via $r->print.
>>> The source HTML file can be encoded in iso-8859-1 or UTF-8, and it
>>> contains a proper declaration of the charset in which it is really
>>> encoded:
>>>
>>> <meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
>>> or
>>> <meta http-equiv="content-type" content="text/html; charset=UTF-8">
>>>
>>> To read the file, I first open it "raw", read a few lines, checking for
>>> the above <meta> tag.  If found, I note the charset (say in $charset),
>>> close the file, and re-open it as
>>>
>>> open(my $fh,"<:encoding($charset)", $file);
>>>
>>> (note: if $charset is "UTF-8", then the open becomes
>>> open(my $fh,'<:utf8', $file);)
>>
>> So, you convert the octet stream into a character stream when you read
>> the file. You have to do the reverse when you write it.
> 
> I have to, so that my string-replacement logic stays consistent.
> 
>>
>>   $r->print(Encode::encode $encoding, $string);
>>
>> mod_perl usually uses PerlIO. So, a perl-script handler should be able to
>> push an encoding layer on top of the :Apache2 layer.
>>
>>   binmode STDOUT, ':encoding(...)'
>>
>> But I haven't tried this yet.
>>
>> Now that I think of it, perhaps even the following would work:
>>
>>   open my $fh, '>:Apache2:encoding(...)', $r;
>>   print $fh $string;
>>
>> If it does not work it would be good to make it so.
>>
> 
> I'll try the above and let you know.
> 
> I guess that if I can do
> open my $fh, '>:Apache2:encoding(...)', $r;
> then $r, under the hood, must be some kind of filehandle too.
> And then I could just do
> binmode($r,":encoding($charset)");
> but then, this being mod_perl, it may stay that way and have
> unexpected side-effects somewhere else.
> 
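
For reference, the reading step described in the quoted background (open the file raw, look for the <meta> charset, then re-open it with an :encoding layer) could look roughly like this in plain Perl. This is a sketch, not the actual AM::SendForm code: the sub name open_form, the 20-line look-ahead and the iso-8859-1 fallback are assumptions. It uses :encoding(UTF-8) for UTF-8 files as well, because that layer validates its input while the lax :utf8 layer does not.

```perl
use strict;
use warnings;

# Hypothetical helper (not the actual AM::SendForm code): sniff the charset
# declared in an HTML file's <meta> tag, then re-open the file with a
# matching :encoding layer so every line read is a decoded character string.
sub open_form {
    my ($file, $default) = @_;
    $default //= 'iso-8859-1';    # assumption: fallback when no <meta> charset is found

    # First pass: raw bytes, looking at the first 20 lines only (arbitrary limit).
    open(my $raw, '<:raw', $file) or die "Cannot open $file: $!";
    my $charset;
    while (defined(my $line = <$raw>)) {
        if ($line =~ /<meta[^>]+charset=["']?([A-Za-z0-9_.:-]+)/i) {
            $charset = $1;
            last;
        }
        last if $. >= 20;
    }
    close($raw);
    $charset //= $default;

    # Second pass: decode while reading. :encoding(UTF-8) validates the
    # input, unlike the lax :utf8 layer, so it is used for UTF-8 files too.
    open(my $fh, "<:encoding($charset)", $file)
        or die "Cannot open $file as $charset: $!";
    return ($fh, $charset);
}
```

In a handler, my ($form_fh, $charset) = open_form($file); would then stand in for the two open() calls described above.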
> 

Results:

1) Using: open my $fh, '>:Apache2:encoding(...)', $r;

(Note: I can't find Apache2::encoding anywhere. Was that a typo?)

	$logger->warn("$pfx: reading form using encoding [$enc]") if $Debug>1;
...
	my $response_fh;
	unless (open ($response_fh,">:$enc",$r)) {
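		# $! holds the reason open() failed; the log below was produced with $?
		# (the last child-process status), which is why it shows 0
		$logger->error("$pfx Cannot open \$r : $!");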
		return Apache2::Const::SERVER_ERROR;
	}

causes a server error and logs:

[Thu Nov 29 15:48:42 2012] [warn] [client 192.168.245.129] AM::SendForm::response: reading form using encoding [encoding(iso-8859-1)]
[Thu Nov 29 15:48:42 2012] [error] [client 192.168.245.129] AM::SendForm::response Cannot open $r : 0
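
In (1), the log shows ": 0" because the message interpolates $?, which is the status of the last child process (system(), backticks, closing a pipe), not the reason open() failed; that reason is in $!. Note also that the mode string ">:$enc" lists only the :encoding(...) layer. In the suggestion, :Apache2 and :encoding(...) are two separate PerlIO layers stacked in one mode string, which is presumably why no Apache2::encoding module can be found. Without :Apache2, open() most likely treated $r as an ordinary file name; $! would confirm that. A plain-Perl illustration of $! versus $? (the sub name open_error and the path are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: try to open a path for writing and, on failure,
# return both error variables. $! holds the reason open() failed; $? is the
# status of the last child process and is not set by open() on a file.
sub open_error {
    my ($path) = @_;
    if (open(my $fh, '>', $path)) {
        close $fh;
        return;                      # no error
    }
    return { errno => "$!", child_status => $? };
}

my $err = open_error('/nonexistent-dir/out.html');
printf "\$! = %s, \$? = %d\n", $err->{errno}, $err->{child_status} if $err;
```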

2) Using: binmode STDOUT, ':encoding(...)'

	$logger->warn("$pfx: reading form using encoding [$enc]") if $Debug>1;
...
	binmode(STDOUT,":$enc");
...
		$logger->warn(" input line is [$line], utf8 flag : " . (Encode::is_utf8($line) ? "y" : "n"));
...
	$r->print($line);
...

does not cause a server error and outputs the page, but apparently has no effect (the characters are still wrong):

[Thu Nov 29 15:55:52 2012] [warn] [client 192.168.245.129]  input line is [\t\t\t\t<input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;m\xc3\xa4nnlich\n], utf8 flag : y

(and the same in the response)
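
A likely reason (2) changes nothing: binmode() pushes a layer onto one filehandle only, and $r->print() writes to the request itself rather than going through STDOUT, so a layer on STDOUT never sees that output. The mod_perl half of this is an assumption here. The plain-Perl sketch below shows only the per-handle part, using an in-memory handle in place of STDOUT:

```perl
use strict;
use warnings;

# Sketch: a layer pushed with binmode() applies to that handle only.
# An in-memory handle stands in for STDOUT; $r->print() is not modelled.
open(my $out, '>', \my $buf) or die "Cannot open in-memory handle: $!";
binmode($out, ':encoding(iso-8859-1)') or die "binmode failed: $!";

my @out_layers = PerlIO::get_layers($out);
my @err_layers = PerlIO::get_layers(*STDERR);    # untouched handle

# A character string printed through the encoding layer becomes Latin-1 bytes.
print {$out} "m\x{e4}nnlich\n";
close $out;

print "layers on \$out:   @out_layers\n";
print "layers on STDERR: @err_layers\n";
```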

3) Same as (2), but using a plain "print $line;" instead of "$r->print($line);"

That is very bizarre. It runs through the code for many lines and still prints the one
"Männlich" line wrong (in the log and in the HTML output as well):
[client 192.168.245.129]  input line is [\t\t\t\t<input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;m\xc3\xa4nnlich\n], utf8 flag : y

but now, in addition, it crashes a few lines further on with a server error, and this in the log:

[Thu Nov 29 16:01:45 2012] [warn] [client 192.168.245.129]  input line is [<tr><td>&nbsp;</td></tr>\n], utf8 flag : y
[Thu Nov 29 16:01:45 2012] [error] [client 192.168.245.129] "\\x{4bae}" does not map to iso-8859-1 at /usr/local/lib/apache2/perllib/AM/SendForm.pm line 203, <$form_fh> line 101.\n

Line 101 of the input form is the one shown in the log just before the error:
<tr><td>&nbsp;</td></tr>

and the next line is a simple
<tr>

I have examined the form with a UTF-8 capable editor, and I see no strange characters anywhere
near it. I have no idea where this "\\x{4bae}" could be coming from.
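
To narrow down where a character like U+4BAE comes from, one option is to check the decoded lines themselves before they reach any output layer. Below is a hypothetical helper (the name find_unmappable is made up) that reports every character of a string that has no mapping in the target charset, using Encode's FB_CROAK check. If it finds nothing in line 101, the character is probably being produced on the output side rather than read from the form.

```perl
use strict;
use warnings;
use Encode qw(encode FB_CROAK LEAVE_SRC);

# Hypothetical diagnostic helper: return [offset, code point] for every
# character of $text that cannot be encoded in $charset.
sub find_unmappable {
    my ($text, $charset) = @_;
    my @bad;
    for my $i (0 .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        # FB_CROAK makes encode() die on an unmappable character;
        # LEAVE_SRC stops it from modifying $ch.
        my $ok = eval { encode($charset, $ch, FB_CROAK | LEAVE_SRC); 1 };
        push @bad, [$i, sprintf('U+%04X', ord $ch)] unless $ok;
    }
    return @bad;
}
```

It could be called on each $line just before printing, logging any hits with $logger->warn.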

4) Trying: $r->print(Encode::encode $encoding, $string);

as: $r->print(Encode::encode($charset, $line));

Bingo !

It still prints in the log:
[Thu Nov 29 16:21:42 2012] [warn] [client 192.168.245.129]  input line is [\t\t\t\t<input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;m\xc3\xa4nnlich\n], utf8 flag : y

But it outputs it correctly in the response document sent to the browser:
				<input name="ANSPR" type="radio" value="m" id="ANSPR">&nbsp;männlich

and it also doesn't choke on the line on which it choked before:
[Thu Nov 29 16:21:42 2012] [warn] [client 192.168.245.129]  input line is [<tr><td>&nbsp;</td></tr>\n], utf8 flag : y
[Thu Nov 29 16:21:42 2012] [warn] [client 192.168.245.129]  input line is [<tr>\n], utf8 flag : y

This works, but it does not seem very efficient: it makes an additional function call for
each output line. I don't know, though, how this compares with letting PerlIO do the encoding.
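
One way to cut down on per-line calls, assuming the whole page fits comfortably in memory, is to collect the decoded lines and encode them once, just before sending. In this sketch, the sub name send_encoded is hypothetical, and $print stands in for sub { $r->print(@_) } so the code runs outside mod_perl. Encode::encode called without a CHECK argument replaces characters that have no mapping with a substitution character instead of dying.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode the complete response body once, then hand
# the bytes to $print in a single call. In the handler, $print would be
# something like sub { $r->print(@_) }.
sub send_encoded {
    my ($print, $charset, @lines) = @_;
    my $bytes = encode($charset, join('', @lines));
    $print->($bytes);
    return length $bytes;    # byte count, e.g. for a Content-Length header
}
```

In the handler this would be called as send_encoded(sub { $r->print(@_) }, $charset, @lines) after the replacement loop. Whether it is actually faster than per-line encode() calls, or than an :encoding layer, is something to measure rather than assume.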
