httpd-dev mailing list archives

From "William A. Rowe, Jr." <>
Subject RE: Charset translation and buckets.
Date Thu, 02 Nov 2000 15:46:53 GMT
> From: []
> Sent: Thursday, November 02, 2000 10:04 AM
> We started this conversation last week sometime, and with all 
> the travel to London and bad net access, it kind of dropped, 
> so I am picking it up again now.  :-)

Just in time; perhaps that's my problem with mod_autoindex.

> I see only one solution to this.  Each bucket needs to get a charset
> flag.  That flag just informs the server what charset the data is in.  I
> would prefer to use a global key to associate each charset with a unique
> integer, because that will be faster to compare than using a strcmp.

And these are defined by IANA, let's not reinvent things please.
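
To make the integer-key idea concrete, here is a minimal sketch that reuses the IANA registry's MIBenum numbers rather than inventing new ones. The table and helper names (IANA_MIBENUM, charset_id, same_charset) are hypothetical and not part of any Apache API; only the numbers come from the IANA character-set registry.

```python
# Hypothetical lookup table: the MIBenum values are from the IANA
# character-set registry; the names below are illustrative only.
IANA_MIBENUM = {
    "us-ascii": 3,
    "iso-8859-1": 4,
    "iso-8859-2": 5,
    "utf-8": 106,
}


def charset_id(name):
    """Map a charset name (case-insensitive) to its IANA MIBenum."""
    return IANA_MIBENUM[name.strip().lower()]


def same_charset(a, b):
    # One integer comparison instead of a string comparison per bucket.
    return charset_id(a) == charset_id(b)
```

A bucket would then carry the integer, and the name lookup happens once when the bucket is created.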

> This solves all of the issues mentioned above, because we can move the
> charset-lite filter down the stack to a connection filter, and it can just
> run through the brigade once for each charset in the brigade, and
> translate the appropriate buckets as it goes.
> Thoughts?

Yes, as Dirk kicked into my head, this doesn't work as you expect.

Take the following fragment of a text/html response:

<a href=""

OK, here is the problem: we are translating from UTF-8 to ISO-8859-x, so what
do you change?  The href?  What happens when the code position of the ':'
symbol maps to the wrong character code?  The display looks right, but the
href becomes an illegal value.

The design you propose is a huge problem, and charset translation is never
a connection level event.  It must be a content filter, and the appropriate
filter must be associated with the individual fragments, -where it is 
appropriate-!  So the output of mod_autoindex should be passed through the
charset filter until it creates a bucket for the html fragments, and then
no translation can occur within those buckets, period.  Even simple things
like lt, gt, and the like must be maintained.
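
A rough sketch of what I mean, under the assumption that each bucket carries a flag saying whether it may be translated. The Bucket class, the translatable flag and charset_filter are all hypothetical names for illustration, not the real bucket API.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    data: bytes
    translatable: bool  # hypothetical flag: False for markup fragments


def charset_filter(brigade, src="utf-8", dst="iso-8859-1"):
    """Recode only buckets marked translatable; pass markup through untouched."""
    out = []
    for bucket in brigade:
        if bucket.translatable:
            recoded = bucket.data.decode(src).encode(dst)
            # Mark the result as done so a later filter does not recode it again.
            out.append(Bucket(recoded, False))
        else:
            out.append(bucket)
    return out


brigade = [
    Bucket('<a href="caf\u00e9/">'.encode("utf-8"), False),  # markup: hands off
    Bucket("caf\u00e9".encode("utf-8"), True),               # display text
    Bucket(b"</a>", False),
]
result = b"".join(b.data for b in charset_filter(brigade))
```

In this sketch the bytes inside the href stay exactly as the generator wrote them, while the display text is recoded to the target charset.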

If we are serving HTML, and UTF-8 is a problem, then we want to escape
characters the target charset cannot display with &#n; - but this won't
work for any other content type.
Charset filters must be content level and mime-subjective or we explode :-)
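
For the HTML case only, the &#n; escaping can be sketched with Python's standard "xmlcharrefreplace" encoding error handler. The helper name to_html_bytes is hypothetical, and the sketch assumes the input is display text, not markup.

```python
def to_html_bytes(text, charset="iso-8859-1"):
    """Encode display text; characters the charset lacks become &#n; references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# U+00E9 exists in ISO-8859-1 and is encoded directly;
# U+20AC (euro sign) does not and becomes a numeric character reference.
encoded = to_html_bytes("caf\u00e9 \u20ac5")
```

The references only mean something to an HTML or XML consumer, which is why the same trick cannot be applied to other content types.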
