Delivered-To: apmail-new-httpd-archive@apache.org
Received: (qmail 88223 invoked by uid 500); 22 Jun 2000 04:09:40 -0000
Mailing-List: contact new-httpd-help@apache.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
Reply-To: new-httpd@apache.org
Delivered-To: mailing list new-httpd@apache.org
Received: (qmail 88208 invoked from network); 22 Jun 2000 04:09:37 -0000
X-Authentication-Warning: koj.rkbloom.net: rbb owned process doing -bs
Date: Wed, 21 Jun 2000 21:09:58 -0700 (PDT)
From: rbb@covalent.net
X-Sender: rbb@koj.rkbloom.net
To: new-httpd@apache.org
Subject: Re: PLEASE READ: Filter I/O
In-Reply-To: <4.3.1.2.20000621233402.00ae79c0@pop.ma.ultranet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-Spam-Rating: locus.apache.org 1.6.2 0/1000/N

> >A) The 100MB is passed on to the next filter until it reaches the bottom
> >filter, and then it is sent to the network.  The hook scheme can allow for
> >a configuration directive that would throttle how much of the 100MB is
> >passed to subsequent filters at any one time.  [...]  Regardless, both of
> >the current schemes require the same working set size, because the 100MB
> >is allocated on the heap and it isn't cleared until the request is done.
>
> Okay, how about a slightly different phrasing of the problem (very
> hypothetical, since I don't do any work with database servers):
>
> I fetch records that are approximately 100 kB, but vary in size, from a
> database 1000 times for insertion into the content stream.  How does that
> work in each scheme?
>
> Hook: The approximately 100 kB blocks are allocated 1000 times on the heap,
> and the data is passed on to the next filter.  All of the data is allocated
> and passed through all the filters before any of it is sent to the
> network.  The client sits and waits for the entire 100 MB to be
> processed.
> If the network is congested, then the 100 MB sits around in memory
> until the network send is completed.

This is a poorly written module.  It could VERY easily send just the
first 100 kB block down to the next hook.  In fact, this scheme allows
the module to specify which sections of the output should be passed
down and which should be saved until later.

> Link: ~100 kB is allocated on the heap, the data is passed on to the next
> layer.  The ~100 kB is cleared or reused for the next database

This is not the way Apache works.  Apache does not free the memory
until the request is finished.  However, the hook-based scheme makes it
possible to implement ap_create_sibling pool, which would allow some
memory to be cleared and re-used.  I have not figured out how the
link-based scheme could use this, but that may be because I haven't
really thought about it much.

> retrieval.  Each chunk is sent down to the network, and the client starts
> to receive it as the next chunk is being retrieved.  If the network is
> currently clear, all the data is absorbed, and sent out right away.  If the
> network is congested, then the ~100 kB sits around in memory until the
> network send is completed.

This can be done with the hook-based scheme.

> >4) Flow control
> >A) Flow control can be controlled with configuration directives, by not
> >allowing all of a chunk to be passed to subsequent filters.  Quite
> >honestly, this is one place where this design does flounder a bit, but
> >with all of the other optimizations that can be added on top of this
> >design, I think this is ok.
>
> A configuration directive will not take into account current network or
> server load conditions.  A chunk value that is perfectly reasonable at 2 AM
> on a weekend on a large company's customer service database server may be
> way too big during the week at peak usage times.

As I said, the hook-based scheme does fall down a little bit here.
However, a well-written module will not exhibit any of the properties
you are suggesting.

> Forcing every filter to process the entire request before the next one gets
> a shot at it not only requires more memory in large applications, but
> increases the apparent response time as seen by the user, since the page
> won't even start displaying until everything is handled.

The hook-based scheme does not force each module to process the entire
request.  It does allow a module to save pieces off to the side, and it
also allows a module to process more of the request at one time, but
only if that is reasonable.  The buffering and apparent response time
of the server are not affected by using the hook-based scheme.  I have
a module that displays the response as it is processed.

Ryan