From: Nick Kew
To: modules-dev@httpd.apache.org
Date: Thu, 7 Feb 2008 13:05:35 +0000
Subject: ANN: mod_xml2enc: improved i18n for markup filters.
Organization: WebThing

I'm happy to announce that mod_xml2enc is now ready for use.

mod_xml2enc is designed to be used with libxml2-based filter modules,
such as:

    mod_accessibility
    mod_proxy_html
    mod_publisher
    mod_transform
    mod_xml2
    mod_xslt

and serves to improve their internationalisation support:

(1) It sniffs the encoding of incoming documents, using HTTP headers
    where available, or XML or HTML rules where there is no HTTP
    information.
(2) If a character set is not supported by libxml2, it converts to
    UTF-8 ahead of the markup filter.
(3) It removes any encoding information that is invalidated by the
    processing, and substitutes a correct HTTP header.

To take advantage of this, filter modules should use the
xml2enc_charset optional function to retrieve the charset argument
to pass to the libxml2 parser. Note that you may have to handle
APR_EAGAIN, if your module sets up the parser before mod_xml2enc
has been able to sniff the first data. I'll be updating published
versions of my filter modules to use it as round tuits permit.

Filter modules can also postprocess to output a different charset
again, using the xml2enc_filter optional function.

Additional capabilities are preprocessing of bad HTML (a function
introduced in mod_proxy_html 3, but also relevant to other HTML
modules), and an additional optional hook for preprocessing.
These extra functions are untested.

Developers, feel free to explore and send feedback!

http://apache.webthing.com/mod_xml2enc/

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
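
As a rough illustration of the sniffing order described in step (1), the sketch below prefers the HTTP Content-Type charset, then falls back to a byte-order mark, then to the XML declaration. It is not mod_xml2enc's code: the function sniff_charset and its helpers are hypothetical names invented for this example, it uses only the C standard library, and it leaves out the HTML (meta element) rules and UTF-32 detection.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for needle within the first len bytes of hay. */
static const char *find_ci(const char *hay, size_t len, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0 || len < nlen)
        return NULL;
    for (size_t i = 0; i + nlen <= len; i++) {
        size_t j = 0;
        while (j < nlen && tolower((unsigned char)hay[i + j]) ==
                           tolower((unsigned char)needle[j]))
            j++;
        if (j == nlen)
            return hay + i;
    }
    return NULL;
}

/* Copy a charset token from [p, end) into out, skipping one opening
 * quote and stopping at a quote, ';', '?', '>' or whitespace. */
static int copy_token(const char *p, const char *end, char *out, size_t outlen)
{
    size_t n = 0;
    if (p < end && (*p == '"' || *p == '\''))
        p++;
    while (p < end && *p != '"' && *p != '\'' && *p != ';' && *p != '?' &&
           *p != '>' && !isspace((unsigned char)*p) && n + 1 < outlen)
        out[n++] = *p++;
    out[n] = '\0';
    return n > 0;
}

/* Copy a fixed charset name into out if it fits. */
static int set_name(char *out, size_t outlen, const char *name)
{
    size_t n = strlen(name);
    if (n + 1 > outlen)
        return 0;
    memcpy(out, name, n + 1);
    return 1;
}

/* Hypothetical helper: returns 1 and fills out with a charset name,
 * or returns 0 if nothing could be determined from the inputs. */
int sniff_charset(const char *content_type, const char *data, size_t len,
                  char *out, size_t outlen)
{
    const char *p;

    if (outlen == 0)
        return 0;
    out[0] = '\0';

    /* 1. HTTP information wins where it is available. */
    if (content_type != NULL) {
        size_t ctlen = strlen(content_type);
        p = find_ci(content_type, ctlen, "charset=");
        if (p != NULL && copy_token(p + 8, content_type + ctlen, out, outlen))
            return 1;
    }

    /* 2. Byte-order mark (simplified: UTF-32 is not distinguished). */
    if (len >= 3 && memcmp(data, "\xEF\xBB\xBF", 3) == 0)
        return set_name(out, outlen, "UTF-8");
    if (len >= 2 && (memcmp(data, "\xFF\xFE", 2) == 0 ||
                     memcmp(data, "\xFE\xFF", 2) == 0))
        return set_name(out, outlen, "UTF-16");

    /* 3. XML declaration: <?xml version="1.0" encoding="..."?> */
    if (len >= 5 && memcmp(data, "<?xml", 5) == 0) {
        const char *decl_end = memchr(data, '>', len);
        if (decl_end != NULL) {
            p = find_ci(data, (size_t)(decl_end - data), "encoding=");
            if (p != NULL && copy_token(p + 9, decl_end, out, outlen))
                return 1;
        }
    }
    return 0;
}
```

A caller would pass the Content-Type header value (or NULL) together with the first bytes of the document. A return of 0 means no charset could be determined from what was supplied, and the caller has to decide what default to apply.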