httpd-dev mailing list archives

From Jeff Trawick <trawi...@bellsouth.net>
Subject Re: BUFF, IOL, Chunking, and Unicode in 2.0 (long)
Date Sat, 06 May 2000 03:45:15 GMT
> Date: Tue, 2 May 2000 15:51:30 +0200
> From: Martin Kraemer <Martin.Kraemer@Mch.SNI.De>
> In my understanding, we need a layered buff.c (which I number from 0
> upwards):
> 
> 0) at the lowest layer, there's a "block mode" which basically
>    supports bread/bwrite/bwritev by calling the equivalent iol_*
>    routines. It doesn't know about chunking, conversion, buffering and
>    the like. All it does is read/write with error handling.
> 
> 1) the next layer handles chunking. It knows about the current
>    chunking state and adds chunking information into the written
>    byte stream at appropriate places. It does not need to know about
>    buffering, or what the current (ebcdic?) conversion setting is.
> 
> 2) this layer handles conversion. I was thinking about a concept
>    where a generic character set conversion would be possible based on
>    Unicode-to-any translation tables. This would also deal with
>    multibyte character sets, because at this layer, it would
>    be easy to convert SBCS to MBCS.
>    Note that conversion *MUST* be positioned above the chunking layer
>    and below the buffering layer. The former guarantees that chunking
>    information is not converted twice (or not at all), and the latter
>    guarantees that ap_bgets() is looking at the converted data
>    (-- otherwise it would fail to find the '\n' which indicates end-
>    of-line).
>    Using (loadable?) translation tables based on unicode definitions
>    is a very similar approach to what libiconv offers you (see
>    http://clisp.cons.org/~haible/packages-libiconv.html -- though my
>    inspiration came from the russian apache, and I only heard about
>    libiconv recently). Every character set can be defined as a list
>    of <hex code> <unicode equiv> pairs, and translations between
>    several SBCS's can be collapsed into a single 256 char table.
>    Efficiently building them once only, and finding them fast is an
>    optimization task.
> 
> 3) This last layer adds buffering to the byte stream of the lower
>    layers. Because chunking and translation have already been dealt
>    with, it only needs to implement efficient buffering. Code
>    complexity is reduced to simple stdio-like buffering.
> 
> 
> Creating a BUFF stream involves creation of the basic (layer 0) BUFF,
> and then pushing zero or more filters (in the right order) on top of
> it. Usually, this will always add the chunking layer, optionally add
> the conversion layer, and usually add the buffering layer (look for
> ap_bcreate() in the code: it almost always uses B_RD/B_WR).
> 
> Here's code from a conceptual prototype I wrote:
>     BUFF *buf = ap_bcreate(NULL, B_RDWR), *chunked, *buffered;
>     chunked   = ap_bpush_filter(buf,     chunked_filter, 0);
>     buffered  = ap_bpush_filter(chunked, buffered_filter, B_RDWR);
>     ap_bputs("Data for buffered ap_bputs\n", buffered);
> 
> 
> Using a BUFF stream doesn't change: simply invoke the well known API
> and call ap_bputs() or ap_bwrite() as you would today. Only, these
> would be wrapper macros
> 
>     #define ap_bputs(data, buf)             buf->bf_puts(data, buf)
>     #define ap_bwrite(buf, data, max, lenp) buf->bf_write(buf, data, max, lenp)
> 
> where a BUFF struct would hold function pointers and flags for the
> various levels' input/output functions, in addition to today's BUFF
> layout.

Greg Ames and I finally sat down and started playing with buff.c
again today.  Previously we had converted almost all of the buff
operations to use the new APR translation functions.  What we did
today was start implementing hooks so that different code is called
when translating.  I hesitate to call it layering, though it can look
very similar (perhaps indistinguishable, depending on which buff
primitive you look at? :) ).

We started with ap_bwrite().

ap_bwrite() was renamed to ap_bwrite_core().  We introduced a new
ap_bwrite_xlate().  ap_bwrite() is a macro, similar to what you
showed. 

#ifdef CHARSET_EBCDIC /* pretend APACHE_XLATE */
typedef struct biol {
    ap_status_t (*bwrite)(BUFF *, const void *, ap_size_t,
                          ap_ssize_t *);
} biol;
#endif

#ifdef CHARSET_EBCDIC /* pretend APACHE_XLATE */
#define ap_bwrite(fb,buf,nbyte,bytes_written) \
(fb)->biol.bwrite(fb,buf,nbyte,bytes_written)
#else
#define ap_bwrite(fb,buf,nbyte,bytes_written) \
ap_bwrite_core(fb,buf,nbyte,bytes_written)
#endif

While calling via a function pointer isn't a big speed problem when
translation isn't supported, it is also trivial to keep the plain
call: if CHARSET_EBCDIC (substitute the hypothetical APACHE_XLATE) is
not defined, the macro just calls ap_bwrite_core().

When we enable translate-on-write, ap_bwrite_xlate() gets hooked in by
storing its address in fb->biol.bwrite.  (We want to store an APR
translation handle at the same time.)  When we disable
translate-on-write, we call ap_bwrite_core() directly.  This is just a
minimal change to the old code, which first translated and then did
the real work.

ap_bwrite_xlate() has the translation part of the old ap_bwrite(), and
then calls ap_bwrite_core() to do the dirty work.
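
To make the hook idea concrete, here is a minimal standalone sketch.
Everything named toy_* is hypothetical and not part of buff.c, and
toupper() is only a stand-in for a real character set conversion.  It
also assumes that disabling translation simply stores the core
function's address back in the pointer, which is one way to get the
"call the core directly" behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BUFF: just the hook and a capture area. */
typedef struct toy_buff toy_buff;
typedef int (*toy_bwrite_fn)(toy_buff *, const void *, size_t, size_t *);

struct toy_buff {
    toy_bwrite_fn bwrite;  /* plays the role of fb->biol.bwrite */
    char out[64];          /* captured output instead of a socket */
    size_t outlen;
};

/* Core write: appends the bytes unchanged. */
static int toy_bwrite_core(toy_buff *fb, const void *buf, size_t n,
                           size_t *written)
{
    assert(fb->outlen + n <= sizeof fb->out);
    memcpy(fb->out + fb->outlen, buf, n);
    fb->outlen += n;
    *written = n;
    return 0;
}

/* Translating write: converts into a scratch area, then lets the
 * core routine do the real work. */
static int toy_bwrite_xlate(toy_buff *fb, const void *buf, size_t n,
                            size_t *written)
{
    char tmp[64];
    const unsigned char *in = buf;
    size_t i;

    assert(n <= sizeof tmp);
    for (i = 0; i < n; i++) {
        tmp[i] = (char)toupper(in[i]);  /* stand-in conversion */
    }
    return toy_bwrite_core(fb, tmp, n, written);
}

static void toy_init(toy_buff *fb)
{
    fb->bwrite = toy_bwrite_core;
    fb->outlen = 0;
}

/* Enable or disable translate-on-write by swapping the hook. */
static void toy_set_xlate(toy_buff *fb, int on)
{
    fb->bwrite = on ? toy_bwrite_xlate : toy_bwrite_core;
}

/* Callers keep using one name; the hook decides what runs. */
#define toy_bwrite(fb, buf, n, w) ((fb)->bwrite((fb), (buf), (n), (w)))
```

Callers never see which routine is installed: writing "ab", turning
translation on, writing "cd", turning it off and writing "ef" leaves
"abCDef" in the capture area.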

(By the way... the following code shows part of a thread-safe
implementation of the translate buffer previously declared as static
in ap_bwrite(), so don't be surprised at the fb->xbuf stuff.)

#ifdef CHARSET_EBCDIC
static ap_status_t ap_bwrite_xlate(BUFF *fb, const void *buf,
                                   ap_size_t nbyte,
                                   ap_ssize_t *bytes_written)
{
    ap_size_t inbytes_left, outbytes_left;
    ap_status_t rv;

    if (fb->flags & (B_WRERR | B_EOUT)) {
        *bytes_written = 0;
        return fb->saved_errno;
    }
    if (nbyte == 0) {
        *bytes_written = 0;
        return APR_SUCCESS;
    }

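    /* Grow the per-BUFF translate buffer if this write won't fit. */
    if (nbyte > fb->xbufsize) {
        if (fb->xbuf != NULL) {
            free(fb->xbuf);
        }
        /* Add headroom, then round up to a multiple of 1024 bytes. */
        fb->xbufsize = (nbyte + HUGE_STRING_LEN + 1023) & ~1023;
        fb->xbuf = (char *)malloc(fb->xbufsize);
        ap_assert(fb->xbuf);
    }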
    inbytes_left = outbytes_left = nbyte;
    rv = ap_xlate_conv_buffer(fb->xlate->to_net, buf, &inbytes_left,
                              fb->xbuf ? fb->xbuf : (void *)buf,
                              &outbytes_left);
    /* we still only handle SBCS conversions: every input byte must be
     * consumed and must produce exactly one output byte */
    ap_assert(!rv && !inbytes_left && !outbytes_left);

    return ap_bwrite_core(fb, fb->xbuf, nbyte, bytes_written);
}
#endif /* CHARSET_EBCDIC */

In this example, ap_bwrite_xlate() is an extra layer on top of
ap_bwrite_core().  I would guess that when all is said and done, in
some cases translation may be an extra layer but in other cases 
translation will simply be an alternate version of the read/write
primitive.  Perhaps in some cases there will be one function for some
primitives (e.g., ap_bgets()) with checks for translation mixed in
with the "normal" code.
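
As an illustration of the per-stream translate buffer and the SBCS
assumption in ap_bwrite_xlate() above, here is a small standalone
sketch.  The toy_* names are hypothetical, TOY_SLACK stands in for
HUGE_STRING_LEN (its value here is an assumption), and the 256-entry
table is a simplification of what the APR translation handle would do
for a single-byte character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SLACK 8192  /* stand-in for HUGE_STRING_LEN (assumed value) */

/* Hypothetical per-stream scratch buffer, kept with the stream
 * rather than in a function-level static so threads don't share it. */
typedef struct {
    char  *xbuf;
    size_t xbufsize;
} toy_xbuf;

/* Add headroom and round up to the next multiple of 1024 bytes. */
static size_t toy_round_size(size_t nbyte)
{
    return (nbyte + TOY_SLACK + 1023) & ~(size_t)1023;
}

/* Make sure the scratch buffer can hold nbyte bytes; the old
 * contents are not preserved when it grows. */
static char *toy_reserve(toy_xbuf *x, size_t nbyte)
{
    if (nbyte > x->xbufsize) {
        free(x->xbuf);              /* free(NULL) is a no-op */
        x->xbufsize = toy_round_size(nbyte);
        x->xbuf = malloc(x->xbufsize);
        assert(x->xbuf != NULL);
    }
    return x->xbuf;
}

/* SBCS conversion through a 256-entry table: exactly one output
 * byte per input byte, so the output length equals the input length. */
static size_t toy_xlate_sbcs(const unsigned char table[256],
                             const void *in, size_t n, char *out)
{
    const unsigned char *src = in;
    size_t i;

    for (i = 0; i < n; i++) {
        out[i] = (char)table[src[i]];
    }
    return n;
}
```

Because a table lookup maps one byte to one byte, the input and
output counts always run out together, which is the condition the
ap_assert() in ap_bwrite_xlate() checks.  A multibyte conversion
would break that and need a different sizing strategy.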

> The resulting layering would look like this:
> 
>     | Caller: using ap_bputs() | or ap_bgets/ap_bwrite etc.
>     +--------------------------+
>     | Layer 3: Buffered I/O    | gets/puts/getchar functionality
>     +--------------------------+
>     | Layer 2: Code Conversion | (optional conversions)
>     +--------------------------+
>     | Layer 1: Chunking Layer  | Adding chunks on writes
>     +--------------------------+
>     | Layer 0: Binary Output   | bwrite/bwritev, error handling
>     +--------------------------+
>     | iol_* functionality      | basic i/o
>     +--------------------------+
>     | apr_* functionality      |
>     ....
> 
> -- 
> <Martin.Kraemer@MchP.Siemens.De>             |    Fujitsu Siemens
> Fon: +49-89-636-46021, FAX: +49-89-636-41143 | 81730  Munich,  Germany

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...
