httpd-dev mailing list archives

From: Nick Kew <n...@webthing.com>
Subject: Re: mod_mbox and atom 1.0
Date: Wed, 12 Oct 2005 10:29:07 GMT
On Wednesday 12 October 2005 04:31, Paul Querna wrote:

> > An outline of what needs to be done can be found here:
> > 
> >   http://intertwingly.net/stories/2005/09/28/xchar.rb

Erm, no.  We need to re-encode from any incoming charset.
We don't need to reinvent any wheels by recreating individual
charset conversion tables.

> Right now mod_mbox does *no* encoding translation.  We really need to be
> calling apr_xlate all over, and turning everything into UTF-8 First.
> Currently, each item is encoded in whatever the client program sent it
> as... which isn't good.
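For illustration only, here is a minimal sketch of that "turn everything into UTF-8 first" step using POSIX iconv(3) directly. It assumes the source charset name has already been taken from the message's Content-Type header. APR-util's apr_xlate is generally a portable wrapper over this kind of converter, so inside mod_mbox the equivalent would go through apr_xlate_open() and apr_xlate_conv_buffer() instead. The helper name `to_utf8` is hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical helper: transcode `len` bytes of `in`, encoded in charset
   `from` (e.g. the charset= parameter of a message's Content-Type), into a
   newly malloc'd, NUL-terminated UTF-8 string. Returns NULL if the charset
   is unknown or the input is not valid in that charset. */
static char *to_utf8(const char *from, const char *in, size_t len)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;                          /* charset not supported */

    size_t cap = len + 16;                    /* initial guess, grown on E2BIG */
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;                   /* iconv() takes char ** */
    size_t src_left = len;
    char *dst = out;
    size_t dst_left = cap - 1;                /* keep one byte for the NUL */

    while (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        if (errno != E2BIG) {                 /* EILSEQ or EINVAL: bad input */
            free(out);
            iconv_close(cd);
            return NULL;
        }
        /* Output buffer full: double it and resume where iconv stopped. */
        size_t used = (size_t)(dst - out);
        char *bigger = realloc(out, cap * 2);
        if (bigger == NULL) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = bigger;
        cap *= 2;
        dst = out + used;
        dst_left = cap - 1 - used;
    }

    /* Finish the conversion. UTF-8 output is stateless, so this writes
       nothing here, but it is the standard way to end an iconv run. */
    iconv(cd, NULL, NULL, &dst, &dst_left);
    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

Growing the buffer on E2BIG, rather than sizing it up front, avoids having to guess the worst-case expansion ratio for every possible source charset.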

Even the HTML is erroneously sent as ISO-8859-1, so posts that arrive as
UTF-8 (e.g. from wrowe) display incorrectly!  As of now it's not really fit for
purpose.  We should fix this once for the benefit of all display formats, rather
than address HTML, Atom, or indeed anything else in isolation.

Regarding the mail archives, the ideal solution would be to transcode
incoming messages to homogeneous UTF-8 before storing them.  To make
that useful, we'd need to transcode the existing archives too, though that
would just be a one-off script.  I see a mod_smtpd filter thrashing around
that to-do list ...  dammit, that's the long-awaited update to charset_lite!

The harder bit to deal with is _local_ encoding in different charsets within
header lines.  That's a PITA, and is AFAIK peculiar to SMTP.
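Those per-fragment header encodings are RFC 2047 encoded-words, `=?charset?encoding?text?=`, where each word names its own charset and uses either Q (quoted-printable-like) or B (base64) encoding. Below is a minimal sketch of decoding a single encoded-word, assuming the word has already been isolated from the header line. It does not handle whitespace between adjacent encoded-words, RFC 2231 language suffixes, or line-length limits. `decode_encoded_word` is a hypothetical name, and the bytes it returns are still in the named charset, so they would need the same transcoding to UTF-8 as message bodies.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Value of one base64 digit, or -1 if `c` is not in the alphabet. */
static int b64val(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Hypothetical decoder for one encoded-word "=?charset?enc?text?=".
   Copies the charset name into `charset` (NUL-terminated) and the decoded,
   still charset-encoded bytes into `out`. Returns the byte count, or -1. */
static long decode_encoded_word(const char *w, char *charset, size_t cs_sz,
                                unsigned char *out, size_t out_sz)
{
    size_t n = strlen(w);
    if (n < 8 || strncmp(w, "=?", 2) != 0 || strcmp(w + n - 2, "?=") != 0)
        return -1;

    const char *cs = w + 2;
    const char *q1 = strchr(cs, '?');            /* ends the charset name */
    if (q1 == NULL || q1[1] == '\0' || q1[2] != '?')
        return -1;
    size_t cs_len = (size_t)(q1 - cs);
    if (cs_len == 0 || cs_len >= cs_sz)
        return -1;
    memcpy(charset, cs, cs_len);
    charset[cs_len] = '\0';

    int enc = toupper((unsigned char)q1[1]);
    const char *p = q1 + 3;                      /* start of encoded text */
    const char *end = w + n - 2;                 /* the closing "?=" */
    if (p > end)
        return -1;

    size_t o = 0;
    if (enc == 'Q') {
        while (p < end) {
            int c = (unsigned char)*p++;
            if (c == '_') {
                c = ' ';                         /* '_' stands for a space */
            } else if (c == '=') {               /* "=XX" is a hex-coded byte */
                if (end - p < 2) return -1;
                int hi = hexval(p[0]), lo = hexval(p[1]);
                if (hi < 0 || lo < 0) return -1;
                c = hi * 16 + lo;
                p += 2;
            }
            if (o >= out_sz) return -1;
            out[o++] = (unsigned char)c;
        }
    } else if (enc == 'B') {
        unsigned long acc = 0;                   /* bit accumulator */
        int bits = 0;                            /* unconsumed bits in acc */
        while (p < end && *p != '=') {           /* '=' starts the padding */
            int v = b64val((unsigned char)*p++);
            if (v < 0) return -1;
            acc = (acc << 6) | (unsigned long)v;
            bits += 6;
            if (bits >= 8) {                     /* a whole byte is ready */
                bits -= 8;
                if (o >= out_sz) return -1;
                out[o++] = (unsigned char)((acc >> bits) & 0xFF);
            }
        }
    } else {
        return -1;                               /* unknown encoding letter */
    }
    return (long)o;
}
```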

-- 
Nick Kew