Message-Id: <m0tS2tm-0000miC@mamba.ast.cam.ac.uk>
Date: Tue, 19 Dec 95 14:20 GMT
From: drtr@ast.cam.ac.uk (David Robinson)
To: new-httpd@hyperreal.com
Subject: Re: Generalising Connections
Content-Length: 4277
Sender: owner-new-httpd@apache.org
Precedence: bulk
Reply-To: new-httpd@apache.org

Ben wrote:
>As I have mentioned before, the problem with Apache and modules which want to
>take over the data transport, is that Apache knows that a connection is a file
>descriptor. This is not generally true - especially under non-Unix OSes, where
>typically even a plain ordinary TCP/IP connection is not a file descriptor.
>
>Making connections totally generalised is a non-trivial task, but there is at
>least one thing that is clear: the "client" and "request_in" members of
>conn_rec must go (see httpd.h). This implies that all functions that use them
>need changing, and all low-level functionality (e.g. read, write, open, close)
>must be supplied in a modular way.
>
>This is, of course, similar to the way that modules work, but not quite the
>same; the function table should be associated with each connection (this
>allows dynamic matching of transports to connection, and layering), rather
>than being part of a static list.
>
>I propose a scheme along these lines; we have a transport function table:
>
>typedef struct connection connection;
>typedef struct transport_fn_table transport_fn_table;
>
>struct transport_fn_table
>        {
>        int (*write)(connection *conn,const char *buf,int n);
>        int (*read)(connection *conn,char *buf,int n);
>/* etc... */
>        };
>
>client and request_in in conn_rec are replaced by:
>
>        connection *conn;
>
>and a connection looks like:
>
>struct connection
>        {
>        void *info;     // private data for the particular type of connection
>        transport_fn_table *fn;
>        };
>
>Simple, huh? Add a few macros, and the whole thing is (nearly) transparent to
>the ordinary module, for example:
>
>#define conn_write(conn,buf,n)  (conn)->fn->write(conn,buf,n)
>
>Of course, C++ fans will note how much neater this would be in C++.
>
>The reason I intended to write about this in conjunction with CVS is simple;
>with the patch and vote system it could take a long and painful time to get
>this change in. It'll be a good test of the efficacy of CVS trying to get such
>a global change done.

I've been giving this a lot of thought recently. The main problem with
your scheme is that it is insufficiently modularised. Instead, I would suggest
something based loosely on the SVR4 STREAMS interface; i.e. allow
multiple modules to intercept the data flowing to/from the client.

A stream is a sequence of, err, 'boxes' (the standard name of 'modules' would
be confusing for Apache). 

The active handler talks to the box at the head of the stream.
To output data to the client, the handler sends a message to the stream
head by calling its put() routine. This box then passes the message downstream
by calling the next box's put() routine. This repeats until a box actually
sends the data.

                 /----------\
                 | mod_asis |           Handler routine
                 \----------/
                   |     /|\
                  \|/     |
                 +----------+
                 | box_tr   |           Stream head
                 +----------+
                   |     /|\
                  \|/     |
                 +----------+
                 | box_http |           Driver
                 +----------+


Boxes can be added to the stream head by 'pushing' them onto the stream.
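
As a rough sketch of the above (not a worked-out design), a box could be
a structure holding a put() routine and a pointer to the next box
downstream. All the names here (box, stream, stream_push, stream_put,
sink) are made up for illustration. I have assumed that box_tr is some
kind of translating filter, so the stand-in below just upper-cases the
data, and the driver appends to a buffer rather than writing to the
client. Only the downstream direction is shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct box box;

struct box {
    const char *name;
    /* put() receives a message travelling downstream (towards the client) */
    int (*put)(box *self, const char *buf, size_t n);
    box *next;      /* next box downstream; NULL for the driver */
    void *info;     /* private data for this box */
};

typedef struct { box *head; } stream;

/* Push a box onto the stream: it becomes the new stream head. */
static void stream_push(stream *s, box *b)
{
    assert(s != NULL && b != NULL);
    b->next = s->head;
    s->head = b;
}

/* The handler sends a message by calling the stream head's put(). */
static int stream_put(stream *s, const char *buf, size_t n)
{
    assert(s != NULL && s->head != NULL);
    return s->head->put(s->head, buf, n);
}

/* Hypothetical driver: collects the data instead of writing to a socket. */
typedef struct { char data[256]; size_t len; } sink;

static int driver_put(box *self, const char *buf, size_t n)
{
    sink *out = self->info;
    if (n > sizeof out->data - out->len)
        return -1;
    memcpy(out->data + out->len, buf, n);
    out->len += n;
    return (int)n;
}

/* Hypothetical box_tr: upper-cases the data and passes it downstream. */
static int upper_put(box *self, const char *buf, size_t n)
{
    char tmp[64];
    size_t done = 0, chunk, i;

    assert(self->next != NULL);
    while (done < n) {
        chunk = n - done < sizeof tmp ? n - done : sizeof tmp;
        for (i = 0; i < chunk; i++)
            tmp[i] = (char)toupper((unsigned char)buf[done + i]);
        if (self->next->put(self->next, tmp, chunk) < 0)
            return -1;
        done += chunk;
    }
    return (int)n;
}

/* Build the two-box stream from the diagram and send "hello" down it.
   Copies what the driver received into result and returns its length. */
size_t stream_demo(char *result, size_t cap)
{
    sink out = { {0}, 0 };
    box driver = { "box_http", driver_put, NULL, &out };
    box tr = { "box_tr", upper_put, NULL, NULL };
    stream s = { NULL };

    stream_push(&s, &driver);
    stream_push(&s, &tr);
    if (stream_put(&s, "hello", 5) < 0 || out.len > cap)
        return 0;
    memcpy(result, out.data, out.len);
    return out.len;
}
```

The handler only ever sees the stream head, so boxes can be pushed
without the handler knowing anything about them.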

mod_include.c could be usefully re-written as a box (assuming the problem
of saved state could be solved).

Example: a CGI request which returns a server-parsed document.

1. A connection is created.
2. A stream for the connection is created, containing a basic 'I/O' box.
3. Other boxes are pushed onto the stream, e.g. one for chunked encoding
   on a persistent connection.
4. A request is received; the stream is duplicated and initialised for this
   request.
5. The document type of the script is found to be text/x-server-parsed-html
   so box_include is pushed onto the stream.
6. The CGI module runs the script and sends its output down the stream.
7. The per-request stream is closed down.
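
The steps above might look roughly like this in code. This is only a
sketch under several assumptions of mine: the box and stream types and
all the function names are invented; 'duplicating' the stream is done
by copying the head pointer, so boxes pushed for one request do not
affect the connection's stream; the include box is a stand-in that
expands '@' to a fixed string instead of parsing anything; and the
chunked box frames each message as one chunk and leaves out the final
zero-length chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct box box;
struct box {
    int (*put)(box *self, const char *buf, size_t n);
    box *next;      /* next box downstream */
    void *info;     /* private data for this box */
};
typedef struct { box *head; } stream;

static void stream_push(stream *s, box *b) { b->next = s->head; s->head = b; }

/* Step 2: basic I/O box; appends to a buffer in place of a socket write. */
typedef struct { char data[256]; size_t len; } sink;

static int io_put(box *self, const char *buf, size_t n)
{
    sink *out = self->info;
    if (n > sizeof out->data - out->len)
        return -1;
    memcpy(out->data + out->len, buf, n);
    out->len += n;
    return (int)n;
}

/* Step 3: chunked box; sends size line, data and CRLF downstream. */
static int chunked_put(box *self, const char *buf, size_t n)
{
    char hdr[32];
    int h = sprintf(hdr, "%lx\r\n", (unsigned long)n);

    if (self->next->put(self->next, hdr, (size_t)h) < 0 ||
        self->next->put(self->next, buf, n) < 0 ||
        self->next->put(self->next, "\r\n", 2) < 0)
        return -1;
    return (int)n;
}

/* Step 5: stand-in for box_include; expands each '@' to "[inc]". */
static int include_put(box *self, const char *buf, size_t n)
{
    static const char subst[] = "[inc]";
    char tmp[128];
    size_t i, len = 0;

    for (i = 0; i < n; i++) {
        if (buf[i] == '@') {
            if (len + sizeof subst - 1 > sizeof tmp)
                return -1;
            memcpy(tmp + len, subst, sizeof subst - 1);
            len += sizeof subst - 1;
        } else {
            if (len + 1 > sizeof tmp)
                return -1;
            tmp[len++] = buf[i];
        }
    }
    if (self->next->put(self->next, tmp, len) < 0)
        return -1;
    return (int)n;
}

/* Walk through steps 1 to 7 for a single request. */
size_t serve_one(char *result, size_t cap)
{
    sink out = { {0}, 0 };
    box io = { io_put, NULL, &out };
    box chunked = { chunked_put, NULL, NULL };
    box include = { include_put, NULL, NULL };
    stream conn = { NULL }, req;

    stream_push(&conn, &io);                /* steps 1-2 */
    stream_push(&conn, &chunked);           /* step 3 */
    req = conn;                             /* step 4: per-request copy */
    stream_push(&req, &include);            /* step 5 */
    if (req.head->put(req.head, "a@b", 3) < 0)  /* step 6 */
        return 0;
    req.head = NULL;                        /* step 7 */
    assert(conn.head == &chunked);          /* connection stream untouched */

    if (out.len > cap)
        return 0;
    memcpy(result, out.data, out.len);
    return out.len;
}
```

After step 7 the connection's own stream still has only the I/O and
chunked boxes, ready to be duplicated again for the next request.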

It isn't really that complicated. The only tricky part is what to do
about the HTTP message headers; are they sent down the stream as well?
(Memo: in the chunked transfer encoding, extra HTTP headers can be sent
to the client _after_ the object body has been sent.)

 David.