perl-modperl mailing list archives

From André Warnier ...@ice-sa.com>
Subject Charset in response
Date Thu, 29 Nov 2012 09:37:55 GMT
Hi.

I have a problem with a PerlResponseHandler, regarding the character set used in the 
response to a request.
Basically, the question is: how do I set the character set properly for the "handle" used in
$r->print("string") ?
(where string can be "äéèöü", for example)

Neither of the following (which I do before starting to print output) seems to work :

$r->headers_out->unset('content-type');
$r->headers_out->set('content-type','text/html;charset=xxxx');

or

$r->content_type('text/html;charset=xxxx');

When I say that it doesn't work, I mean in fact :
- the "Content-Type" response header sent by the server is properly set according to what
  I do above (as verified in a browser plugin)
- but if what I print contains "accented" characters, they are not being encoded properly

So, do I need to set something else so that the $r->print(string) will output "string"
properly ?


Background :

My PerlResponseHandler reads an HTML file from disk, replaces some strings in it, and
sends the result out via $r->print.
The source HTML file can be encoded in iso-8859-1 or UTF-8, and it contains a proper
declaration of the charset in which it is really encoded :

<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
or
<meta http-equiv="content-type" content="text/html; charset=UTF-8">

To read the file, I first open it "raw", read a few lines, checking for the above <meta>
tag.  If found, I note the charset (say in $charset), close the file, and re-open it as

open(my $fh,"<:encoding($charset)", $file);

(note : if $charset is "UTF-8", then the open becomes
open(my $fh,'<:utf8', $file);)
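
The two-pass read described above can be sketched roughly as follows. This is only an illustration, not the actual handler code: the helper names detect_charset and open_decoded, the 20-line limit and the regular expression are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read the first few lines as raw bytes and look for
# the charset declared in the <meta> tag. Returns undef if none is found.
sub detect_charset {
    my ($file, $max_lines) = @_;
    $max_lines ||= 20;    # assumed limit for "a few lines"
    open(my $raw, '<:raw', $file) or die "Cannot open $file: $!";
    my $charset;
    while (my $line = <$raw>) {
        last if $. > $max_lines;
        if ($line =~ /<meta\b[^>]*\bcharset\s*=\s*["']?([\w.:-]+)/i) {
            $charset = $1;
            last;
        }
    }
    close($raw);
    return $charset;
}

# Hypothetical helper: re-open the file with a decoding layer, so that
# every line read from it is a character string rather than raw bytes.
sub open_decoded {
    my ($file, $charset) = @_;
    open(my $fh, "<:encoding($charset)", $file)
        or die "Cannot open $file as $charset: $!";
    return $fh;
}
```

With this sketch, ":encoding(UTF-8)" could also be used for UTF-8 files; unlike ":utf8" it validates the input while decoding.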

I also at that point set the response charset by one of the means above.

Then I read the file line by line, substituting some strings in the line, and print out 
the line via
$r->print($line);
etc..

My problem is that, if the input file is for example iso-8859-1 and contains the word
"Männer", the output comes out as "M(A tilde)(some byte)nner" (the bytes corresponding to
the UTF-8 encoding of the "a umlaut").
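
The symptom can be reproduced outside mod_perl with the Encode module alone. A minimal sketch, assuming the decoded characters end up being written as UTF-8 on the way out (the helper hex_dump is made up for the example):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a byte string as space-separated hex values.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# "Männer" as stored in an iso-8859-1 file: the "ä" is the single byte 0xE4.
my $latin1_bytes = "M\xE4nner";

# What the "<:encoding(iso-8859-1)" layer does on read: bytes become characters.
my $chars = decode('iso-8859-1', $latin1_bytes);

# If those characters leave the process as UTF-8, "ä" becomes 0xC3 0xA4.
my $utf8_bytes = encode('UTF-8', $chars);

# Encoding back to the declared charset restores the single byte.
my $roundtrip = encode('iso-8859-1', $chars);

print hex_dump($utf8_bytes), "\n";   # 4D C3 A4 6E 6E 65 72
print hex_dump($roundtrip),  "\n";   # 4D E4 6E 6E 65 72
```

A browser told that the page is iso-8859-1 shows 0xC3 as "A tilde" followed by the character for 0xA4, which matches what I see.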

Can I / should I do something like
binmode($r,":$charset"); # ??
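
An alternative to putting a layer on $r would be to encode each line explicitly before printing it. The following is only a sketch under assumptions: it assumes that $r->print writes a byte string unchanged, and it uses a code reference as a stand-in for the request object so that it runs without mod_perl. The name print_encoded is made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a decoded (character) string into bytes in the
# charset announced in the Content-Type header, then pass them to a sink.
# In a handler the sink would be something like sub { $r->print($_[0]) }.
sub print_encoded {
    my ($sink, $charset, $line) = @_;
    $sink->(encode($charset, $line));
}

# Stand-in for the request object: collect the output in a byte buffer.
my $buffer = '';
my $sink   = sub { $buffer .= $_[0] };

my $charset = 'iso-8859-1';
my $line    = decode($charset, "M\xE4nner\n");   # as read through the layer
print_encoded($sink, $charset, $line);
```

Here $buffer ends up holding the same bytes as the source file, with "ä" as the single byte 0xE4.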

TIA

