httpd-dev mailing list archives

From Greg Stein <>
Subject Re: PLEASE READ: Filter I/O
Date Thu, 22 Jun 2000 12:09:36 GMT
I believe Greg Marr's analysis is completely correct, and is a very good
exposition of my own thoughts on why the hook scheme has fundamental flaws.
Ryan responded to Greg Marr, but some of those responses are a bit unclear
or incorrect. I'll address them below...

[ always saying "Greg Marr" above and below sounds a bit weird, but there
  are obvious difficulties given two Gregs in the conversation... :-) ]

On Wed, Jun 21, 2000 at 09:09:58PM -0700, wrote:
> On <some time>, Greg Marr wrote:
> > On <some time>, Ryan wrote:

> > >A)  The 100MB is passed on to the next filter until it reaches the bottom 
> > >filter, and then it is sent to the network.  The hook scheme can allow for 
> > >a configuration directive that would throttle how much of the 100MB is 
> > >passed to subsequent filters at any one time.  [...]  Regardless, both of 
> > >the current schemes require the same working set size, because the 100MB 
> > >is allocated on the heap and it isn't cleared until the request is done.
> > 
> > Okay, how about a slightly different phrasing of the problem: (very 
> > hypothetical, since I don't do any work with database servers)
> > 
> > I fetch records that are approximately 100 kB, but vary in size, from a 
> > database 1000 times for insertion into the content stream.  How does that 
> > work in each scheme?
> > Hook: The approximately 100 kB blocks are allocated 1000 times on the heap, 
> > and the data is passed on to the next filter.  All of the data is allocated 
> > and passed through all the filters before any of it is sent to the 
> > network.  The client sits and waits for the entire 100 MB to be 
> > processed.  If the network is congested, then the 100 MB sits around in 
> > memory until the network send is completed.
> This is a poorly written module.  It could VERY easily send just the first
> 100 kB block down to the next hook.

While it is "sending the first block down to the next hook", what does it do
with the other 999 blocks? There are two options:

1) Go ahead and read them into the heap, saving them for later processing.
   The continued processing can occur automatically by the hook system, or
   the blocks can be stored on a context for piecemeal return to the hook.

2) Rebuild the module in an asynchronous fashion. On the first entry, it
   opens the database cursor and fetches a chunk of data. It returns this,
   but signals "there is more." The hook system processes the first chunk,
   then goes back for more. After 1000 iterations, the module finally closes
   the cursor and signals completion. Somewhere in here, the module must
   take particular care to create/wipe a pool so that memory can be reused
   on each iteration.

The first option has a huge working set, and the second is a complex beast
of coding. While it is certainly doable, it emphasizes the impact that the
hook scheme has on module design -- async styles of behavior could be
required where a simple procedural approach could easily be used.
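
To make option (2) concrete, here is a minimal sketch of a module rebuilt as
a state machine. Everything in it is an assumption made for illustration:
async_db_state, async_db_step(), drive_async_db() and the FILTER_MORE /
FILTER_DONE codes are hypothetical names, not part of either proposal, and
the "database" is simulated with memset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of these are Apache APIs. */
enum { CHUNK_SIZE = 8, TOTAL_ROWS = 5 };

typedef enum { FILTER_MORE, FILTER_DONE } filter_status;

/* State that must survive between entries, because the module may not loop. */
typedef struct {
    int  cursor_open;          /* has the simulated cursor been opened? */
    int  next_row;             /* next row the cursor would return */
    char buf[CHUNK_SIZE];      /* reused for every chunk */
} async_db_state;

/* One entry into the module: produce at most one chunk, then return. */
filter_status async_db_step(async_db_state *st, const char **out, size_t *len)
{
    assert(st != NULL && out != NULL && len != NULL);

    if (!st->cursor_open) {            /* first entry: open the cursor */
        st->cursor_open = 1;
        st->next_row = 0;
    }
    if (st->next_row == TOTAL_ROWS) {  /* exhausted: close and signal completion */
        st->cursor_open = 0;
        *out = NULL;
        *len = 0;
        return FILTER_DONE;
    }
    memset(st->buf, 'a' + st->next_row, CHUNK_SIZE);  /* simulated fetch */
    st->next_row++;
    *out = st->buf;
    *len = CHUNK_SIZE;
    return FILTER_MORE;                /* signal "there is more" */
}

/* The loop the hook system would have to supply on the module's behalf. */
size_t drive_async_db(void)
{
    async_db_state st = {0};
    const char *data;
    size_t len, total = 0;

    while (async_db_step(&st, &data, &len) == FILTER_MORE)
        total += len;                  /* stand-in for passing data downstream */
    return total;
}
```

Even in this toy form, the cursor position and the open/closed flag have to
be carried in an explicit state structure, and the loop lives outside the
module. Calling drive_async_db() returns 40 (five chunks of eight bytes).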

> In fact, this scheme allows the
> module to specify which sections of the output should be passed down and
> which should be saved until later.

"saved until later" implies option (1) above, which means your working set
is now around 100 megabytes. For that single request.

> > Link: ~100 kB is allocated on the heap, the data is passed on to the next 
> > layer.  The ~100 kB is cleared or reused for the next database 
> This is not the way Apache works.  Apache does not free the memory until
> the request is finished.

I disagree.

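#define BUFSIZE 100000

ap_status_t my_link_callback(...)
{
    void *buf = ap_palloc(p, BUFSIZE);    /* one buffer, reused for every chunk */

    while ( cursor_has_data(cursor) ) {
        db_read_data(cursor, buf, BUFSIZE);
        ap_lwrite(layer, buf, BUFSIZE);   /* hand the chunk to the next layer */
    }
    return APR_SUCCESS;
}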

This is exactly the kind of code that Greg Marr posited. The 100k block is
allocated on the heap and used over and over. The total working set is 100k
rather than 100M. Play with the constant to meet your needs; no need to
worry about queries that return too much.

Note there is one change that I would suggest to the above code:

ap_status_t my_link_callback(ap_layer_t *layer, ...)
{
    void *buf;

    if (layer->ctx == NULL)
        layer->ctx = ap_palloc(p, BUFSIZE);   /* first entry only */

    buf = layer->ctx;
    /* ... same read/write loop as before ... */
}

This ensures that you allocate the buffer once rather than on each entry to
the callback :-)
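
A self-contained sketch of the same pattern, so it can be compiled and run
outside the server. The names fake_layer, fake_cursor, layer_write() and
run_demo() are hypothetical stand-ins (malloc() replaces ap_palloc(), and
layer_write() replaces ap_lwrite()); the real layer API may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BUFSIZE 16

/* Hypothetical stand-ins; not the proposed Apache layer API. */
typedef struct { int rows_left; } fake_cursor;

typedef struct {
    void  *ctx;            /* per-layer state: here, the reusable buffer */
    size_t bytes_written;  /* what the downstream layer has received */
    int    allocations;    /* how many buffers this layer ever allocated */
} fake_layer;

static int cursor_has_data(const fake_cursor *c) { return c->rows_left > 0; }

static void db_read_data(fake_cursor *c, void *buf, size_t n)
{
    memset(buf, 'x', n);   /* pretend to fetch one record */
    c->rows_left--;
}

/* Stand-in for ap_lwrite(): the data is consumed before this returns. */
static void layer_write(fake_layer *l, const void *buf, size_t n)
{
    (void)buf;
    l->bytes_written += n;
}

/* Allocate once per layer instance, then loop and reuse the buffer. */
int my_link_callback(fake_layer *layer, fake_cursor *cursor)
{
    assert(layer != NULL && cursor != NULL);

    if (layer->ctx == NULL) {
        layer->ctx = malloc(BUFSIZE);
        if (layer->ctx == NULL)
            return -1;
        layer->allocations++;
    }
    void *buf = layer->ctx;

    while (cursor_has_data(cursor)) {
        db_read_data(cursor, buf, BUFSIZE);
        layer_write(layer, buf, BUFSIZE);
    }
    return 0;
}

typedef struct { int allocations; size_t bytes_written; } demo_result;

/* Enter the callback twice on one layer, 1000 records each time. */
demo_result run_demo(void)
{
    fake_layer layer = { NULL, 0, 0 };
    fake_cursor first = { 1000 }, second = { 1000 };
    demo_result r = { 0, 0 };

    if (my_link_callback(&layer, &first) == 0 &&
        my_link_callback(&layer, &second) == 0) {
        r.allocations = layer.allocations;
        r.bytes_written = layer.bytes_written;
    }
    free(layer.ctx);
    return r;
}
```

In this sketch, run_demo() reports a single allocation for 32000 bytes
written: the working set stays at one BUFSIZE buffer no matter how many
records pass through or how often the callback is entered.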

> However, the hook based scheme makes it
> possible to implement ap_create_sibling pool, which would allow for some
> memory to be cleared and re-used.  I have not figured out how the link
> based scheme could use this.  But that may be because I haven't really
> thought about it too much.

From the context of your post, I'm presuming that "sibling" pools means
something like this:

    if (layer->ctx == NULL)
        layer->ctx = ap_make_sub_pool(p, NULL);
    my_pool = layer->ctx;
    buf = ap_palloc(my_pool, BUFSIZE);

The above approach certainly works in the link-based scheme.

For the hook-based scheme, you would need to convert your module to the
async pattern of option (2). In each iteration, you could reuse this child
pool. (of course, a simple buffer is possible, too; the child pool is nicer
when a number of various allocations may be made -- it allows grouping them
all up quite easily)

Note that the link-based scheme associates this buffer with the particular
layer instance. If the layer is inserted multiple times, then each instance
gets its own copy of the state. Similar per-instance provisions are not
available for the hook scheme.
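
A small sketch of what per-instance state buys you. The demo_layer type,
counting_callback() and run_two_instances() are hypothetical names used only
for this illustration; the one assumption carried over from the fragments
above is that each layer instance has its own ctx pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layer type: each inserted instance carries its own ctx. */
typedef struct {
    void *ctx;
} demo_layer;

typedef struct {
    size_t bytes_seen;
} counter_state;

/* One callback shared by all instances; state is found through the layer. */
void counting_callback(demo_layer *layer, size_t len)
{
    assert(layer != NULL && layer->ctx != NULL);
    counter_state *st = layer->ctx;
    st->bytes_seen += len;
}

typedef struct {
    size_t first;
    size_t second;
} demo_counts;

/* Insert the same filter twice and feed each instance different traffic. */
demo_counts run_two_instances(void)
{
    counter_state s1 = {0}, s2 = {0};
    demo_layer a = { &s1 }, b = { &s2 };

    counting_callback(&a, 10);
    counting_callback(&a, 5);
    counting_callback(&b, 7);

    demo_counts out = { s1.bytes_seen, s2.bytes_seen };
    return out;
}
```

The two instances end up with 15 and 7 bytes respectively; neither sees the
other's state, even though they run the same callback code.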

> > retrieval.  Each chunk is sent down to the network, and the client starts 
> > to receive it as the next chunk is being retrieved.  If the network is 
> > currently clear, all the data is absorbed, and sent out right away.  If the 
> > network is congested, then the ~100 kB sits around in memory until the 
> > network send is completed.
> This can be done with the hook based scheme.

This can only happen if the module author switches to an async model and
the hook scheme introduces a notion of looping until the filter module has
no new data to return. In each iteration (presumably in filter_io()), the
data would be dropped down to the network. To make things even more
complicated, the second filter could do much the same thing -- this implies
you have a loop going for the first filter, sending chunks to a loop that is
processing data from the second filter. These N loops could make filter_io()
quite complicated.
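
To illustrate the nesting, here is a rough sketch of a driver for a chain of
async filters. It is not a description of how filter_io() is or would be
written; async_filter, filter_step(), drive() and the "fanout" counter are
all hypothetical, and recursion is used here as a compact way to write N
nested loops.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STEP_MORE, STEP_DONE } step_status;

/* Hypothetical async filter: emits `fanout` chunks per input chunk. */
typedef struct {
    int remaining;   /* chunks still owed for the current input */
    int fanout;      /* chunks produced per input chunk */
} async_filter;

static void filter_begin(async_filter *f) { f->remaining = f->fanout; }

/* One entry into the filter: emit one chunk, or report completion. */
static step_status filter_step(async_filter *f)
{
    if (f->remaining == 0)
        return STEP_DONE;
    f->remaining--;
    return STEP_MORE;
}

/*
 * For every chunk that filter i emits, filter i+1 must be looped until it
 * has nothing more to return. Returns the number of chunks that reached
 * the end of the chain (the "network").
 */
static size_t drive(async_filter *chain, size_t n, size_t i)
{
    if (i == n)
        return 1;                       /* chunk reached the network */

    size_t sent = 0;
    filter_begin(&chain[i]);
    while (filter_step(&chain[i]) == STEP_MORE)
        sent += drive(chain, n, i + 1); /* inner loop for the next filter */
    return sent;
}

/* Two filters: 1000 database chunks, each split into 3 by the second. */
size_t chunks_sent_two_filters(void)
{
    async_filter chain[2] = { {0, 1000}, {0, 3} };
    return drive(chain, 2, 0);
}
```

With two filters the driver already runs a loop inside a loop, and
chunks_sent_two_filters() returns 3000. Each additional async filter adds
another level, and the driver has to own all of that iteration state.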

Note that the link-based scheme places the loop in the filter and avoids the
complexities of the async style.

> > >4)  Flow control
> > >A)  Flow control can be controlled with configuration directives, by not 
> > >allowing all of a chunk to be passed to subsequent filters.  Quite 
> > >honestly, this is one place where this design does flounder a bit, but 
> > >with all of the other optimizations that can be added on top of this 
> > >design, I think this is ok.
> > 
> > A configuration directive will not take into account current network or 
> > server load conditions.  A chunk value that is perfectly reasonable at 2 AM 
> > on a weekend on a large company's customer service database server may be 
> > way too big during the week at peak usage times.
> As I said, the hook based scheme does fall down a little bit
> here.  However, a well written module will not exhibit any of the
> properties you are suggesting.

If "well written" means async, then I protest. The link-based scheme
provides a very simple, clear model of implementing a filter and
automatically getting the network-pushback effect. For a hook-based filter
to get any of this benefit, it must rejigger its code to operate in an
asynchronous manner. As described above, this also creates further
complexities within filter_io() itself and the overall execution and
processing model.

> > Forcing every filter to process the entire request before the next one gets 
> > a shot at it not only requires more memory in large applications, but 
> > decreases the apparent response time as seen by the user, since the page 
> > won't even start displaying until everything is handled.
> The hook based scheme does not force each module to process the entire
> request.  It does allow a module to save pieces off to the side.

"... save pieces off to the side [on the heap]."

The link-based scheme doesn't even generate/fetch the additional data when
network-pushback occurs.
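
The difference in working set can be shown with a toy simulation. This is a
sketch under stated assumptions: fake_socket, blocking_write() and the
constants are hypothetical, and the "network" is a counter that drains in
fixed steps whenever a write has to wait.

```c
#include <assert.h>
#include <stddef.h>

#define RECORDS         1000
#define RECORD_SIZE     ((size_t)100)
#define SOCKET_CAPACITY ((size_t)250)   /* bytes the socket buffer can hold */
#define DRAIN_STEP      ((size_t)100)   /* bytes the client reads per wait */

typedef struct {
    size_t queued;   /* bytes waiting in the socket buffer */
    size_t sent;     /* bytes the client has read */
} fake_socket;

/* Simulated blocking write: wait for the client to drain until n bytes fit. */
static void blocking_write(fake_socket *s, size_t n)
{
    assert(n <= SOCKET_CAPACITY);
    while (s->queued + n > SOCKET_CAPACITY) {
        size_t d = s->queued < DRAIN_STEP ? s->queued : DRAIN_STEP;
        s->queued -= d;
        s->sent += d;
    }
    s->queued += n;
}

/* Link style: fetch a record only after the previous write has returned. */
size_t peak_held_streaming(void)
{
    fake_socket sock = { 0, 0 };
    size_t held = 0, peak = 0;

    for (int i = 0; i < RECORDS; i++) {
        held += RECORD_SIZE;                 /* fetch one record */
        if (held > peak)
            peak = held;
        blocking_write(&sock, RECORD_SIZE);  /* may stall on a slow client */
        held -= RECORD_SIZE;                 /* buffer is free for reuse */
    }
    return peak;
}

/* "Save until later": fetch everything first, then start writing. */
size_t peak_held_eager(void)
{
    fake_socket sock = { 0, 0 };
    size_t held = 0, peak = 0;

    for (int i = 0; i < RECORDS; i++) {
        held += RECORD_SIZE;
        if (held > peak)
            peak = held;
    }
    for (int i = 0; i < RECORDS; i++) {
        blocking_write(&sock, RECORD_SIZE);
        held -= RECORD_SIZE;
    }
    return peak;
}
```

In this simulation the streaming loop never holds more than one record (100
bytes), because a stalled write also stalls the next fetch. The eager
variant peaks at 100000 bytes before the first byte is sent.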

> It also
> allows a module to process more of the request at one time, but only if
> that is reasonable.

The buffering of content, implied here, is applicable to both schemes.

> The buffering and apparent response times of the
> server are not affected by using the hook based scheme.  I have a module
> that does display the response as it has been processed.

The effect can be very dramatic based on how that module is written. As Greg
Marr pointed out, if the filter is required to fetch the whole 100 meg, then
the client sees zippo while that happens. If they tip their module
upside-down into an async model, *then* they could blip out filtered content
during request processing.

In summary:


1) to solve working set and network-pushback issues, the hook-based scheme
   requires filter writers to use an asynchronous model (and the inherent
   complexities of that model)

2) to support (1), filter_io() would need to introduce a complex, nested
   loop mechanism to iterate the filters through their async state machines
   as they generate content.

3) per-instance context mechanisms have not been defined for the hook-based
   scheme.


Greg Stein,
