httpd-dev mailing list archives

Subject Charset translation and buckets.
Date Thu, 02 Nov 2000 16:04:17 GMT

We started this conversation last week sometime, and with all the travel
to London and bad net access, it kind of dropped, so I am picking it up
again now.  :-)

Our charset translation is broken in 2.0 right now, although it does look
okay.  The problem is that we only translate from a single charset to a
single charset for each request, and even then we only do it for body
data.

The latter problem requires that we handle all translation of protocol
data when actually writing the data to a bucket.  This is annoying at
best, and a real issue at worst.  Imagine somebody writes a module that
implements a new transport encoding, but they forget to include the
charset translation.  That module will only work on some platforms.

The former problem, however, is an actual bug, and this is why we need to
fix it.  :-)  Think of mod_include, which reads a file in one
charset, and then adds the date in the implementation charset.  When we
send this, the bulk of the response will be okay, because we translated
the whole file into the correct charset, and we translated the headers
when writing them, but that date is still in the wrong charset.
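
To make the failure concrete, here is a minimal sketch in Python.  It is
only a model of the situation, not server code.  The three charset names
and the translate() helper are assumptions chosen for illustration: the
file on disk is taken to be ISO-8859-1, the implementation charset is
taken to be EBCDIC (cp037), and the client is taken to expect UTF-8.

```python
# Assumed charsets, chosen only for illustration.
FILE_CHARSET = "iso-8859-1"   # charset of the file read from disk
NATIVE_CHARSET = "cp037"      # implementation charset (an EBCDIC variant)
OUTPUT_CHARSET = "utf-8"      # charset the client expects

def translate(data, src, dst):
    """Hypothetical helper: re-encode bytes from charset src to dst."""
    return data.decode(src).encode(dst)

# The file content is translated as a whole, up front.
file_part = translate("Last modified: ".encode(FILE_CHARSET),
                      FILE_CHARSET, OUTPUT_CHARSET)

# The date is generated later in the implementation charset and is
# appended without any translation.
date_part = "02 Nov 2000".encode(NATIVE_CHARSET)

response = file_part + date_part

# What a client sees when it reads the whole response as UTF-8.
seen = response.decode(OUTPUT_CHARSET, errors="replace")
```

The file portion of `seen` is readable, but the date is not, because
nothing in the response records that those bytes were in a different
charset.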

I see only one solution to this.  Each bucket needs to get a charset
flag.  That flag just informs the server what charset the data is in.  I
would prefer to use a global key to associate each charset with a unique
integer, because that will be faster to compare than using a strcmp.  This
solves all of the issues mentioned above, because we can move the
charset-lite filter down the stack to a connection filter, and it can just
run through the brigade once for each charset in the brigade, and
translate the appropriate buckets as it goes.
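
A rough sketch of that idea, modelled in Python rather than against the
real bucket API.  Everything named here (Bucket, charset_id,
translate_brigade, the registry) is hypothetical and exists only to show
the shape of the proposal: a global table that maps each charset name to
a small integer, a per-bucket charset field, and a filter that makes one
pass over the brigade for each foreign charset it finds.

```python
from dataclasses import dataclass

# Hypothetical global registry: charset name <-> unique small integer.
_CHARSET_IDS = {}
_CHARSET_NAMES = []

def charset_id(name):
    """Return the integer key for a charset, registering it if needed."""
    key = name.lower()
    if key not in _CHARSET_IDS:
        _CHARSET_IDS[key] = len(_CHARSET_NAMES)
        _CHARSET_NAMES.append(key)
    return _CHARSET_IDS[key]

@dataclass
class Bucket:
    data: bytes
    charset: int   # integer key, so comparisons avoid string compares

def translate_brigade(brigade, target):
    """Translate every bucket in the list to the target charset, in place."""
    dst_name = _CHARSET_NAMES[target]
    # Collect the charsets that actually need translating.
    pending = {b.charset for b in brigade if b.charset != target}
    for src in pending:                  # one pass per charset present
        src_name = _CHARSET_NAMES[src]
        for b in brigade:
            if b.charset == src:         # plain integer comparison
                b.data = b.data.decode(src_name).encode(dst_name)
                b.charset = target

LATIN1 = charset_id("iso-8859-1")
EBCDIC = charset_id("cp037")
UTF8 = charset_id("utf-8")

# The mod_include case: file data in one charset, the date in another.
brigade = [
    Bucket("Last modified: ".encode("iso-8859-1"), LATIN1),
    Bucket("02 Nov 2000".encode("cp037"), EBCDIC),
]
translate_brigade(brigade, UTF8)
body = b"".join(b.data for b in brigade)
```

Because each bucket carries its own charset key, the date bucket is
translated from its own source charset, and a module that forgets to
translate its output still produces correctly tagged data for the filter
to fix up.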



Ryan Bloom               
406 29th St.
San Francisco, CA 94131
