From: mickg
Date: Wed, 08 Nov 2006 12:56:28 -0500
To: users@httpd.apache.org
Subject: Re: [users@httpd] Question about mod_charset_light and mod_proxy_html (Solved!)

Nick Kew wrote:
> On Wed, 08 Nov 2006 00:48:39 -0500
> mickg wrote:
>
>> Just to put my money where my mouth is, I have implemented a (stupid)
>> prototype that does this: if no charset native to libxml2 is
>> detected, a recompiled version of mod_proxy_html now uses iconv
>> (eventually via the xmlFindCharEncodingHandler function) to convert
>> from the source encoding to UTF-8.
>
> Interesting. You've gone one up on my aliasing proposal, for
> what looks like rather less work than I thought that would take.
> I might snarf the basic idea for Version 3.

Do you want the full working code once I clean up the memory problem? It is GPL, after all, so releasing the modified source would be in keeping with its spirit. :)

To be honest, though, what the thing does IS somewhat backwards. The dataflow is roughly as follows (I am more familiar with Python, as the next snippet will show):
    # data comes in:
    if ctxt.encoder is None:
        charset = obtain_charset()
        if need_iconv(charset):
            ctxt.encoder = charset
            return "UTF-8"
        else:
            return charset

    # prior to processing buf:
    if ctxt.encoder is not None:
        buf = convert(buf)  # convert only if an encoder is set (non-null)

This guarantees that the data is either in an encoding libxml2 already knows, was UTF-8 to begin with, or was converted to UTF-8. Failing that, the conversion failed miserably (and the miserable failure was logged).

>
>> If no encoding info is specified, it assumes windows-1251 (yes,
>> stupid, but still).
>
> But not stupid if we make it a configurable default!
>

Yeah, preferably via a directive such as

    HTMLSourceDefaultEnc windows-1251

or some such.

>
>> It does work on my _own_ website, where it quite happily converts
>> win-1251 to utf-8. Once I fix the memory leak (any help appreciated),
>> I'll be happy.
>
> See http://www.apachetutor.org/dev/pools for an easy way to
> deal with the memory.
>
>> And a great many thanks to Nick Kew for getting me off my lazy ... to
>> start coding (which, honestly, I am better at than administering
>> systems).
>
> :-)
>
>> BTW, I still have no clue why I cannot do this with mod_charset_lite.
>
> Neither do I. But a closer look at mod_charset_lite has been on
> my TODO list for so long it's probably on a permanent back-burner.
> Did you also look at the full mod_charset? AIUI it was written by
> Russian developers, so cyrillic was presumably important to them.
>

The thing about mod_charset is that its authors assume no iconv and do all the translation internally, with translation settings and weird maps where needed. That seems a bit insane to me unless it is really needed. I believe the reason was that we had:

    win1251 read as koi8, transcoded into LATIN1

Now we need to make sense of *that*. Also, mod_charset does not cleanly support UTF-8 (it cannot translate back from UTF-8); iconv can.

Honestly, remaking mod_proxy_html into a mod_proxy_charset_convert would be trivial now, IMO. And maybe that's the better idea.
Although that does duplicate mod_charset_lite, at least I know it'll work. And it would use libxml2 where possible, not iconv.

mickg
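For anyone who wants to play with the dataflow sketched above, here is a minimal runnable Python version. It is NOT the modified mod_proxy_html (which is C, using libxml2 and iconv). Assumptions: Python's codecs module stands in for iconv; the set of libxml2-native encodings is illustrative only; FilterContext, choose_encoding, prepare_buffer and DEFAULT_CHARSET are hypothetical names, with DEFAULT_CHARSET playing the role of the proposed HTMLSourceDefaultEnc directive.

```python
import codecs
import logging
from typing import Optional

# ASSUMPTION: illustrative set of encodings libxml2 handles without iconv,
# written as Python codec names (codecs.lookup() normalizes aliases to these).
LIBXML2_NATIVE = {"utf-8", "utf-16", "iso8859-1", "ascii"}

# HYPOTHETICAL: plays the role of the proposed "HTMLSourceDefaultEnc" directive.
DEFAULT_CHARSET = "windows-1251"


class FilterContext:
    """Per-response state; `encoder` plays the role of ctxt.encoder."""

    def __init__(self) -> None:
        self.encoder: Optional[codecs.IncrementalDecoder] = None
        self.source: Optional[str] = None


def choose_encoding(ctxt: FilterContext, declared: Optional[str]) -> str:
    """Run once when data first comes in; return the encoding the parser sees."""
    charset = declared or DEFAULT_CHARSET  # fall back to the configured default
    name = codecs.lookup(charset).name  # e.g. "windows-1251" -> "cp1251"
    if name in LIBXML2_NATIVE:
        return name  # the parser can take the bytes as they are
    # Not native: remember a decoder and tell the parser to expect UTF-8.
    ctxt.encoder = codecs.getincrementaldecoder(name)()
    ctxt.source = name
    return "utf-8"


def prepare_buffer(ctxt: FilterContext, buf: bytes, final: bool = False) -> bytes:
    """Before parsing a chunk, convert it to UTF-8 if an encoder is set."""
    if ctxt.encoder is None:
        return buf
    try:
        # The incremental decoder holds back a multibyte sequence that is
        # split across chunks until the rest of it arrives.
        return ctxt.encoder.decode(buf, final).encode("utf-8")
    except UnicodeDecodeError:
        logging.error("conversion from %s to UTF-8 failed", ctxt.source)
        raise
```

With no declared charset, choose_encoding() falls back to windows-1251, returns "utf-8", and every later chunk passes through prepare_buffer() on its way to the parser. An incremental decoder is used because a filter sees the body in pieces. Charset names Python does not recognize raise LookupError, which this sketch leaves unhandled.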