From: André Warnier
To: mod_perl list
Date: Thu, 29 Nov 2012 13:19:14 +0100
Subject: Re: Charset in response
Message-ID: <50B752C2.4070309@ice-sa.com>
In-Reply-To: <50B72CF3.6030704@ice-sa.com>

Addendum at end.

André Warnier wrote:
> Hi.
>
> I have a problem with a PerlResponseHandler, regarding the character set
> used in the response to a request.
> Basically, the question is : how do I set the character set properly for
> the "handle" used in
> $r->print("string") ?
> (where string can be "äéèöü" for example)
>
> Neither of the following (which I do before starting to print output)
> seems to work :
>
> $r->headers_out->unset('content-type');
> $r->headers_out->set('content-type','text/html;charset=xxxx');
>
> or
>
> $r->content_type('text/html;charset=xxxx');
>
> When I say that it doesn't work, I mean in fact :
> - the "Content-Type" response header sent by the server is properly set
>   according to what I do above (as verified in a browser plugin)
> - but if what I print contains "accented" characters, they are not being
>   encoded properly
>
> So, do I need to set something else so that $r->print(string) will
> output "string" properly ?
>
>
> Background :
>
> My PerlResponseHandler reads an HTML file from disk, replaces some
> strings in it, and sends the result out via $r->print.
> The source HTML file can be encoded in iso-8859-1 or UTF-8, and it
> contains a proper declaration of the charset under which it is really
> encoded :
>
>
> or
>
>
> To read the file, I first open it "raw" and read a few lines, checking for
> the above tag. If found, I note the charset (say in $charset),
> close the file, and re-open it as
>
> open(my $fh,"<:encoding($charset)", $file);
>
> (note : if $charset is "UTF-8", then the open becomes
> open(my $fh,'<:utf8', $file);)
>
> At that point I also set the response charset by one of the means above.
>
> Then I read the file line by line, substituting some strings in the
> line, and print out the line via
> $r->print($line);
> etc..
>
> My problem is that, if the input file is for example iso-8859-1 and
> contains the word "Männer", the output comes out as "M(A tilde)(some
> byte)nner" (the bytes corresponding to the UTF-8 encoding of the "a
> umlaut").
>
> Can I / should I do something like
> binmode($r,":$charset"); # ??
>
> TIA
>

Addendum :

I added some logging to the ResponseHandler as follows :

    PARAM: while (defined($line = <$form_fh>)) {
        if ($Debug > 1) {
            $r->log->warn(" input line is [$line], utf8 flag : " . (Encode::is_utf8($line) ? "y" : "n"));
        }

The corresponding line in the log, for a line containing the word "männlich", is :

    [Thu Nov 29 10:34:37 2012] [warn] [client 192.168.245.129] input line is [\t\t\t\t m\xc3\xa4nnlich\n], utf8 flag : y

Of course, as is usual in this type of case, one never knows how the logfile itself is written..
But it does confirm that, as read in the Handler, the string is properly encoded internally in perl, with the utf8 flag set.

However, when I look at the result as received by the browser,
- the browser says that the page received is encoded as iso-8859-1
- the browser's "view page source" confirms that this character is (incorrectly) represented by 2 bytes : männlich
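
For reference, here is a minimal sketch of what encoding the string explicitly before printing would look like. It rests on an assumption that is not confirmed anywhere above: that $r->print sends out the string's internal bytes unchanged when the utf8 flag is set, which would match the 2-byte symptom. The helper name encode_for_response is hypothetical, and the sketch uses only the core Encode module, so it runs outside mod_perl.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of the handler shown above): convert a
# decoded Perl string into the bytes of the charset announced in the
# Content-Type header.
sub encode_for_response {
    my ($charset, $text) = @_;
    return encode($charset, $text);
}

# "männlich" as a decoded character string, as it would be after reading
# the file through an :encoding(...) layer.
my $line = "m\x{e4}nnlich";
utf8::upgrade($line);    # force the internal utf8 flag on, as reported in the log

my $latin1 = encode_for_response('iso-8859-1', $line);
my $utf8   = encode_for_response('UTF-8',      $line);

# Show how many bytes each encoding produces and which bytes stand for the "ä".
printf "iso-8859-1: %d bytes, a-umlaut = %s\n",
    length($latin1), unpack('H*', substr($latin1, 1, 1));
printf "UTF-8:      %d bytes, a-umlaut = %s\n",
    length($utf8), unpack('H*', substr($utf8, 1, 2));
```

If the assumption holds, the handler would call $r->print(encode_for_response($charset, $line)) instead of $r->print($line), with $charset being the same value used in the Content-Type header.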