httpd-modules-dev mailing list archives

From Nick Kew <n...@webthing.com>
Subject Re: Transcoding module for libxml2-based filters
Date Fri, 04 Jan 2008 22:06:44 GMT
On Fri, 04 Jan 2008 22:47:16 +0100
Joachim Zobel <jzobel@heute-morgen.de> wrote:

> On Tuesday, 25.12.2007 at 22:54 +0000, Nick Kew wrote:
> > As developer or co-developer of several libxml2-based filter
> > modules, ...
> 
> Hey, I thought you were on the expat side :) 

Just mod_xmlns.  All my other SAX parsing modules are libxml2.

> > The basic features are:
> >   1. Sniff charset of incoming data, from (in order):
> > 	(a) HTTP headers, if available
> > 	(b) XML BOM / XML Declaration
> > 	(c) HTML <meta> elements
> > 	(d) Configuration default
> 
> A configuration like
> XML2EncSniff HTTP XML META CONF
> might be desirable for this in the long run, so that one could, for
> example, ignore META.

Indeed, that's a thought.  Not to mention sniffing according
to Content-Type, since one purpose of this is *also* to support
non-markup text.

> >   2. If the charset is not supported by libxml2,
> >      convert it to UTF-8 using apr_xlate (if supported).
> >   3. Remove <meta> elements that are invalidated by
> >      any such conversion.
> >   4. Perform other preprocessing fixups, and offer an
> >      optional hook for preprocessing.
> 
> Does this mean, e.g., fixing the XML declaration if the header says otherwise?

Yes, though that's a TBD.

> >   5. Support post-filtering from UTF-8 to a server admin's
> >      choice of charset.
> 
> Good.
> 
> > The challenging aspect of this is to enable it to be inserted
> > twice in a filter chain (before and after libxml2), and perform
> > different transformations each time. 
> 
> This means two different filter functions, right?

No, one function, with its behaviour determined by its ctx.

> > Currently it offers
> > configuration options appropriate to a pre-filter, and will
> > export a function for other filter modules to insert it with
> > their own configuration options (f->ctx) for post-filtering.
> > Unless anyone has a better suggestion.
> 
> Why do you think it is necessary to ask other filters for
> configuration this way? What is the advantage of this over simply
> having configuration options for the post-filter?

That gets messy, with two filters both of AP_FTYPE_RESOURCE.
If I hack it with offsets, that breaks interaction with other
filters.

> Hey, you may want to interface with mod_negotiation :) Charsets are not
> really negotiable now, but with your module they will be.

Hehe.  Well, there's also mod_charset_lite:-)

Thanks for the comments.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
