Delivered-To: apmail-new-httpd-archive@apache.org
Received: (qmail 88223 invoked by uid 500); 22 Jun 2000 04:09:40 -0000
Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
Reply-To: new-httpd@apache.org
Delivered-To: mailing list new-httpd@apache.org
Received: (qmail 88208 invoked from network); 22 Jun 2000 04:09:37 -0000
X-Authentication-Warning: koj.rkbloom.net: rbb owned process doing -bs
Date: Wed, 21 Jun 2000 21:09:58 -0700 (PDT)
From: rbb@covalent.net
X-Sender: rbb@koj.rkbloom.net
To: new-httpd@apache.org
Subject: Re: PLEASE READ: Filter I/O
In-Reply-To: <4.3.1.2.20000621233402.00ae79c0@pop.ma.ultranet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Spam-Rating: locus.apache.org 1.6.2 0/1000/N

> >A) The 100MB is passed on to the next filter until it reaches the bottom
> >filter, and then it is sent to the network.  The hook scheme can allow for
> >a configuration directive that would throttle how much of the 100MB is
> >passed to subsequent filters at any one time.  [...]  Regardless, both of
> >the current schemes require the same working set size, because the 100MB
> >is allocated on the heap and it isn't cleared until the request is done.
>
> Okay, how about a slightly different phrasing of the problem (very
> hypothetical, since I don't do any work with database servers):
>
> I fetch records that are approximately 100 kB, but vary in size, from a
> database 1000 times for insertion into the content stream.  How does that
> work in each scheme?
>
> Hook: The approximately 100 kB blocks are allocated 1000 times on the heap,
> and the data is passed on to the next filter.  All of the data is allocated
> and passed through all the filters before any of it is sent to the
> network.  The client sits and waits for the entire 100 MB to be
> processed.
> If the network is congested, then the 100 MB sits around in memory
> until the network send is completed.

This is a poorly written module.  It could VERY easily send just the
first 100 kB block down to the next hook.  In fact, this scheme allows
the module to specify which sections of the output should be passed
down and which should be saved until later.

> Link: ~100 kB is allocated on the heap, the data is passed on to the next
> layer.  The ~100 kB is cleared or reused for the next database

This is not the way Apache works.  Apache does not free the memory
until the request is finished.  However, the hook-based scheme makes it
possible to implement ap_create_sibling pool, which would allow some
memory to be cleared and re-used.  I have not figured out how the
link-based scheme could use this, but that may be because I haven't
really thought about it much.

> retrieval.  Each chunk is sent down to the network, and the client starts
> to receive it as the next chunk is being retrieved.  If the network is
> currently clear, all the data is absorbed, and sent out right away.  If the
> network is congested, then the ~100 kB sits around in memory until the
> network send is completed.

This can be done with the hook-based scheme.

> >4) Flow control
> >A) Flow control can be controlled with configuration directives, by not
> >allowing all of a chunk to be passed to subsequent filters.  Quite
> >honestly, this is one place where this design does flounder a bit, but
> >with all of the other optimizations that can be added on top of this
> >design, I think this is ok.
>
> A configuration directive will not take into account current network or
> server load conditions.  A chunk value that is perfectly reasonable at 2 AM
> on a weekend on a large company's customer service database server may be
> way too big during the week at peak usage times.

As I said, the hook-based scheme does fall down a little bit here.
However, a well-written module will not exhibit any of the properties
you are suggesting.

> Forcing every filter to process the entire request before the next one gets
> a shot at it not only requires more memory in large applications, but
> increases the apparent response time as seen by the user, since the page
> won't even start displaying until everything is handled.

The hook-based scheme does not force each module to process the entire
request.  It does allow a module to save pieces off to the side, and it
also allows a module to process more of the request at one time, but
only if that is reasonable.  The buffering and apparent response time
of the server are not affected by using the hook-based scheme.  I have
a module that displays the response as it is processed.

Ryan