httpd-dev mailing list archives

From Greg Ames <grega...@raleigh.ibm.com>
Subject Re: BUFF, IOL, Chunking, and Unicode in 2.0 (long)
Date Tue, 02 May 2000 21:41:03 GMT


Martin Kraemer wrote:

> 
> In my understanding, we need a layered buff.c (which I number from 0
> upwards):
> 

Jeff and I have given up on the idea of charset conversion iol's above
or below a monolithic buff, for the reasons you've articulated.  The
code we are running with (which you will see shortly) is structured like
1.3: buff functions call APR-ized conversion routines with a handle
which identifies a pair of charsets.

> 2) this layer handles conversion. I was thinking about a concept
>    where a generic character set conversion would be possible based on
>    Unicode-to-any translation tables. This would also deal with
>    multibyte character sets, because at this layer, it would
>    be easy to convert SBCS to MBCS.

We are heading in that direction, using iconv() to do the dirty work
with APR wrappers.  Do you have iconv() on BS2000, Martin?  If not, I
suppose it would be possible to roll-your-own inside of APR.
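
A minimal sketch of what a handle-based wrapper over iconv() could look
like, assuming a POSIX iconv() is available.  The names xlate_handle,
xlate_open, xlate_buf and xlate_close are hypothetical, chosen for
illustration only; they are not the APR interface.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical handle identifying one (from, to) charset pair. */
typedef struct {
    iconv_t cd;
} xlate_handle;

/* Returns 0 on success, -1 if the platform does not support the pair. */
static int xlate_open(xlate_handle *h, const char *to, const char *from)
{
    assert(h != NULL);
    h->cd = iconv_open(to, from);
    return h->cd == (iconv_t)-1 ? -1 : 0;
}

/* Converts as much of in[0..inlen) as fits into out[0..outlen).
 * *consumed and *produced receive the byte counts actually handled.
 * Returns 0 on success, or when the output filled up (E2BIG) or the
 * input ended in the middle of a character (EINVAL); the caller can
 * retry with the unconsumed tail.  Returns -1 on invalid input. */
static int xlate_buf(xlate_handle *h,
                     const char *in, size_t inlen, size_t *consumed,
                     char *out, size_t outlen, size_t *produced)
{
    char *ip = (char *)in;   /* iconv() takes char **, but does not modify input */
    char *op = out;
    size_t ileft = inlen;
    size_t oleft = outlen;
    size_t rc = iconv(h->cd, &ip, &ileft, &op, &oleft);

    *consumed = inlen - ileft;
    *produced = outlen - oleft;
    if (rc == (size_t)-1 && errno != E2BIG && errno != EINVAL) {
        return -1;
    }
    return 0;
}

static void xlate_close(xlate_handle *h)
{
    iconv_close(h->cd);
}
```

Reporting consumed and produced separately matters for the SBCS to MBCS
case, where the output can be longer than the input and a buff-sized
output area may fill before the input is exhausted.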

>    Note that conversion *MUST* be positioned above the chunking layer
>    and below the buffering layer. The former guarantees that chunking
>    information is not converted twice (or not at all), and the latter
>    guarantees that ap_bgets() is looking at the converted data
>    (-- otherwise it would fail to find the '\n' which indicates end-
>    of-line).

yep - unless ap_bgets() uses a char variable to search for end-of-line.
Init it to '\012' when reading from the network, '\n' when reading from
a cgi script, and wrap it in a macro so ASCII boxes have no performance
hit.
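
A rough sketch of that macro idea.  CHARSET_EBCDIC is assumed here to
be the build-time switch for EBCDIC hosts; line_source, LINE_END and
find_line are hypothetical names for illustration, not code from
buff.c.

```c
#include <assert.h>
#include <stddef.h>

/* Where the bytes being scanned came from. */
typedef enum { SRC_NETWORK, SRC_CGI } line_source;

#ifdef CHARSET_EBCDIC
/* Network data is still ASCII, so look for ASCII LF (octal 012);
 * CGI output is in the native charset, so look for native '\n'. */
#define LINE_END(src) ((src) == SRC_NETWORK ? '\012' : '\n')
#else
/* On ASCII hosts both are the same byte, so there is nothing to choose. */
#define LINE_END(src) ((void)(src), '\n')
#endif

/* Returns the length of the first line in buf[0..len), including its
 * terminator, or 0 if no terminator is present. */
static size_t find_line(const char *buf, size_t len, line_source src)
{
    const char eol = LINE_END(src);
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] == eol) {
            return i + 1;
        }
    }
    return 0;
}
```

With this arrangement the search can run on unconverted network data,
which is the case where the layering order would otherwise matter.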

btw, after staring at ap_bgets() for a few days, I think we can speed it
up some for all platforms.  The inner loop tests for both LF and
off-the-end-of-the-buffer for every character read.  The off-the-end
test in the loop can be eliminated by using a sentinel LF after the last
char in the buff, then once the loop terminates, sort out why.  That
eliminates a conditional branch in this loop - generally a good thing on
a pipelined processor.  I'll post that as a separate patch.  
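
To illustrate the sentinel technique in isolation (this is a sketch,
not the forthcoming patch): find_lf_sentinel is a hypothetical helper,
and it assumes the caller always leaves one writable spare byte after
the last valid character.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the first LF in buf[0..len).
 * Precondition: buf[len] is writable (one spare byte past the data).
 * Returns the index of the LF, or len if the data contains none. */
static size_t find_lf_sentinel(char *buf, size_t len)
{
    char saved;
    char *p = buf;

    assert(buf != NULL);
    saved = buf[len];
    buf[len] = '\n';          /* sentinel: the loop must stop here at the latest */

    while (*p != '\n') {      /* one test per character, no bounds check */
        p++;
    }

    buf[len] = saved;         /* restore the spare byte */
    return (size_t)(p - buf); /* == len means only the sentinel matched */
}
```

The "sort out why" step is the comparison of the result against len
after the loop has finished, so it costs one test per call instead of
one per character.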
 
> The resulting layering would look like this:
> 
>     | Caller: using ap_bputs() | or ap_bgets/apbwrite etc.
>     +--------------------------+
>     | Layer 3: Buffered I/O    | gets/puts/getchar functionality
>     +--------------------------+
>     | Layer 2: Code Conversion | (optional conversions)
>     +--------------------------+
>     | Layer 1: Chunking Layer  | Adding chunks on writes
>     +--------------------------+
>     | Layer 0: Binary Output   | bwrite/bwritev, error handling
>     +--------------------------+
>     | iol_* functionality      | basic i/o
>     +--------------------------+
>     | apr_* functionality      |
>     ....

I don't think we want to separate chunking from buffering (layer 1 from
0) for reasons Dean mentions.  They seem to be pretty happy together.
We have a version of buff.c (unfortunately not the one in the diffs Jeff
just sent you) where the only EBCDIC hit to chunking is converting the
printable hex chunk count to ASCII.  It looks pretty clean.
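
For illustration, a sketch of why the chunk count is the only part of
chunking that touches the charset: the chunk-size line must go out in
ASCII whatever the native charset is.  Note that this variant emits
ASCII codes directly rather than formatting natively and then
converting, which is what the buff.c version described above does.
chunk_header_ascii is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the chunk-size line for a chunk of n bytes into out as ASCII
 * bytes, independent of the native charset: lowercase hex digits
 * followed by CR LF.  Returns the number of bytes written, or 0 if
 * out is too small. */
static size_t chunk_header_ascii(size_t n, unsigned char *out, size_t outlen)
{
    /* ASCII codes for '0'-'9' and 'a'-'f', spelled numerically so the
     * table is the same on an EBCDIC compiler. */
    static const unsigned char hex[16] = {
        0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37,
        0x38, 0x39, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66
    };
    unsigned char tmp[sizeof(size_t) * 2];  /* two hex digits per byte */
    size_t digits = 0;
    size_t i;

    assert(out != NULL);
    do {                                    /* least significant digit first */
        tmp[digits++] = hex[n & 0xF];
        n >>= 4;
    } while (n != 0);

    if (digits + 2 > outlen) {
        return 0;
    }
    for (i = 0; i < digits; i++) {          /* reverse into the output */
        out[i] = tmp[digits - 1 - i];
    }
    out[digits] = 0x0D;                     /* ASCII CR */
    out[digits + 1] = 0x0A;                 /* ASCII LF */
    return digits + 2;
}
```

The chunk body itself passes through untouched, since any conversion
has already happened before the data reaches the chunking code.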

Greg Ames
