httpd-modules-dev mailing list archives

From Nick Kew <>
Subject Transcoding module for libxml2-based filters
Date Tue, 25 Dec 2007 22:54:55 GMT
As developer or co-developer of several libxml2-based filter
modules, I sometimes find myself wanting to replicate functionality
across a number of modules.  One such case is improved
internationalisation, which is a good candidate for a separate
module.  So I've been hacking just such a module: mod_xml2enc
is now at

The basic features are:
  1. Sniff charset of incoming data, from (in order):
     (a) HTTP headers, if available
     (b) XML BOM / XML Declaration
     (c) HTML <meta> elements
     (d) Configuration default
  2. If the charset is not supported by libxml2,
     convert it to UTF-8 using apr_xlate (if supported).
  3. Remove <meta> elements that are invalidated by
     any such conversion.
  4. Perform other preprocessing fixups, and offer an
     optional hook for preprocessing.
  5. Support post-filtering from UTF-8 to a server admin's
     choice of charset.
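
The sniffing order in item 1 can be sketched in plain C, as below. This is
only an illustration of the priority order, not the mod_xml2enc code: the
names sniff_charset, find_ci and copy_token are hypothetical, the parsing is
deliberately naive, and a real module would work on bucket brigades rather
than a flat buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/* Hypothetical helper: case-insensitive search for key in buf.
 * Returns the offset just past the match, or NOT_FOUND. */
static size_t find_ci(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key), i, j;
    if (klen == 0 || len < klen)
        return NOT_FOUND;
    for (i = 0; i + klen <= len; i++) {
        for (j = 0; j < klen; j++)
            if (tolower((unsigned char)buf[i + j]) !=
                tolower((unsigned char)key[j]))
                break;
        if (j == klen)
            return i + klen;
    }
    return NOT_FOUND;
}

/* Hypothetical helper: copy a charset name starting at pos, skipping
 * leading quotes/spaces and stopping at a delimiter. */
static int copy_token(const char *buf, size_t len, size_t pos,
                      char *out, size_t outlen)
{
    size_t n = 0;
    while (pos < len && (buf[pos] == '"' || buf[pos] == '\'' || buf[pos] == ' '))
        pos++;
    while (pos < len && n + 1 < outlen && buf[pos] != '\0' &&
           strchr("\"' ;>/?", buf[pos]) == NULL)
        out[n++] = buf[pos++];
    out[n] = '\0';
    return n > 0;
}

/* Try each source in the order (a)..(d) and return the first hit. */
const char *sniff_charset(const char *ctype, const char *buf, size_t len,
                          const char *dflt, char *out, size_t outlen)
{
    size_t pos, off = 0;
    assert(buf != NULL && out != NULL && outlen > 0);

    /* (a) HTTP headers: Content-Type "...; charset=X" */
    if (ctype != NULL) {
        pos = find_ci(ctype, strlen(ctype), "charset=");
        if (pos != NOT_FOUND && copy_token(ctype, strlen(ctype), pos, out, outlen))
            return out;
    }
    /* (b) byte order mark, then XML declaration */
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) return "UTF-8";
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0) return "UTF-16BE";
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0) return "UTF-16LE";
    if (len >= 5 && memcmp(buf, "<?xml", 5) == 0) {
        size_t end = find_ci(buf, len, "?>");
        if (end != NOT_FOUND) {
            pos = find_ci(buf, end, "encoding=");
            if (pos != NOT_FOUND && copy_token(buf, end, pos, out, outlen))
                return out;
        }
    }
    /* (c) HTML <meta> elements: look for charset= inside each tag */
    while ((pos = find_ci(buf + off, len - off, "<meta")) != NOT_FOUND) {
        const char *tag = buf + off + pos;
        size_t rest = len - off - pos;
        const char *gt = memchr(tag, '>', rest);
        size_t taglen = gt ? (size_t)(gt - tag) : rest;
        size_t cs = find_ci(tag, taglen, "charset=");
        if (cs != NOT_FOUND && copy_token(tag, taglen, cs, out, outlen))
            return out;
        off += pos + taglen;
    }
    /* (d) configuration default */
    return dflt;
}
```

A caller would pass the Content-Type header value (or NULL), the first chunk
of the body, and the configured default; the returned name is what step 2
would then check against the encodings libxml2 supports.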

This is work-in-progress, and currently won't do anything more
useful than crash your server.  But I think it's time to
solicit developer feedback, particularly from those of you who
use libxml2 with Apache.  So I've committed it to public SVN,
and started on a module page:

The challenging aspect of this is to enable it to be inserted
twice in a filter chain (before and after libxml2), and perform
different transformations each time.  Currently it offers
configuration options appropriate to a pre-filter, and will
export a function for other filter modules to insert it with
their own configuration options (f->ctx) for post-filtering.
Unless anyone has a better suggestion.
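
To make the "same filter, inserted twice, different f->ctx" idea concrete,
here is a toy model in plain C. It does not use the httpd API (in a real
module the per-instance context would be the ctx argument to
ap_add_output_filter); every type and function name below is hypothetical,
and ISO-8859-1 <-> UTF-8 stands in for whatever conversion is configured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOY_PRE, TOY_POST } toy_mode;   /* hypothetical */
typedef struct { toy_mode mode; } toy_ctx;     /* per-instance config */

typedef struct toy_filter {
    size_t (*func)(struct toy_filter *f, const unsigned char *in, size_t n,
                   unsigned char *out, size_t cap);
    void *ctx;                                 /* plays the role of f->ctx */
    struct toy_filter *next;
} toy_filter;

static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out, size_t cap)
{
    size_t i, o = 0;
    for (i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            if (o + 1 > cap) break;
            out[o++] = in[i];
        } else {
            if (o + 2 > cap) break;
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}

static size_t utf8_to_latin1(const unsigned char *in, size_t n,
                             unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;
    while (i < n && o < cap) {
        if (in[i] < 0x80) {
            out[o++] = in[i++];
        } else if ((in[i] == 0xC2 || in[i] == 0xC3) && i + 1 < n &&
                   (in[i + 1] & 0xC0) == 0x80) {
            /* two-byte sequence encoding U+0080..U+00FF */
            out[o++] = (unsigned char)(((in[i] & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            out[o++] = '?';                    /* not representable */
            i++;
            while (i < n && (in[i] & 0xC0) == 0x80) i++;
        }
    }
    return o;
}

/* One filter function; its behaviour depends only on the ctx it was
 * inserted with. */
static size_t toy_xml2enc(toy_filter *f, const unsigned char *in, size_t n,
                          unsigned char *out, size_t cap)
{
    const toy_ctx *ctx = f->ctx;
    assert(ctx != NULL);
    return ctx->mode == TOY_PRE ? latin1_to_utf8(in, n, out, cap)
                                : utf8_to_latin1(in, n, out, cap);
}

/* Stand-in for the libxml2-based filter in the middle: passes UTF-8 through. */
static size_t toy_passthrough(toy_filter *f, const unsigned char *in, size_t n,
                              unsigned char *out, size_t cap)
{
    (void)f;
    if (n > cap) n = cap;
    memcpy(out, in, n);
    return n;
}

/* Run data through the chain, alternating between two scratch buffers. */
static size_t run_chain(toy_filter *f, const unsigned char *in, size_t n,
                        unsigned char *out, size_t cap)
{
    unsigned char a[256], b[256];
    const unsigned char *src = in;
    unsigned char *dst = a;
    for (; f != NULL; f = f->next) {
        n = f->func(f, src, n, dst, sizeof a);
        src = dst;
        dst = (dst == a) ? b : a;
    }
    if (n > cap) n = cap;
    memcpy(out, src, n);
    return n;
}
```

Building the chain as toy_xml2enc (ctx mode TOY_PRE), toy_passthrough,
toy_xml2enc (ctx mode TOY_POST) gives the pre-filter/post-filter arrangement
described above: the middle stage only ever sees UTF-8, and the two outer
stages share one function but carry different contexts.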

Nick Kew

Application Development with Apache - the Apache Modules Book
