Date: Mon, 27 Aug 2001 19:59:55 -0700
From: Brian Pane
Subject: Re: chunking of content in mod_include?
To: dev@httpd.apache.org
Message-id: <3B8B092B.20201@pacbell.net>
References: <3B8AD23B.2040305@pacbell.net> <0108271616510J.15633@koj.rkbloom.net>
 <3B8B0112.1E5910CB@remulak.net>

Paul J. Reder wrote:

>Ryan Bloom wrote:
>
>>On Monday 27 August 2001 16:05, Brian Pane wrote:
>>
>>>In mod_include's find_start_sequence function, there's some code that
>>>splits the current bucket if "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD".
>>>
>>>Can somebody explain the rationale for this? It seems redundant to
>>>split the data into smaller chunks in a content filter; I'd expect
>>>mod_include to defer network block sizing to downstream filters. In the
>>>profile data I'm looking at currently, this check accounts for 35% of
>>>the total run time of find_start_sequence, so there's some performance
>>>to be gained if the "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD" check
>>>can be eliminated.
>>>
>>It is used to ensure that we don't buffer all the data in mod_include.
>>It isn't really done correctly, though. What we should be doing is
>>continuing to read as much data as possible, and as soon as we can't
>>read something, sending what we have down the filter stack.
>>
>>This variable basically ensures that we don't keep reading data until
>>we process the whole file or reach the first tag.
>>
>In what manner do you mean "as soon as we can't read something"? It is
>my understanding that the bucket code hides reading delays from the
>mod_include code. If that is true, how would the mod_include code know
>when to send a chunk along? Are you saying the bucket code should do
>some magic like sending all buckets in the brigade up to the current
>one? This would wreak havoc on code like mod_include that may be
>setting aside or tagging buckets for replacement when the end of the
>tag is found.
>
>This code was put in because we were seeing the mod_include code buffer
>up the entire collection of buckets until an SSI tag was found. If you
>have a 200 MB file with an SSI tag footer at the end of the brigade,
>the whole thing gets buffered. How do you propose that this be done
>differently?
>
>The only thing I can think of is to add to and check the byte tally at
>bucket boundaries. We might go over the BYTE_COUNT_THRESHOLD, but the
>check wouldn't happen on every byte, and there wouldn't need to be a
>bucket split to send along the first part. Is this what you mean?
>
I think checking at bucket boundaries would be better. And to guard
against the case where a single bucket might contain 200 MB of data,
wouldn't it work to just check the bucket size right after the
apr_bucket_read in find_start_sequence and split the bucket there if
its size exceeds some reasonable threshold?

--Brian
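
[Editor's note: a minimal sketch of the bucket-boundary approach
discussed above, written against the APR bucket API of that era. The
function name pass_when_full, the THRESHOLD constant, and the flushing
policy are illustrative assumptions, not the committed mod_include
code; the real constant is BYTE_COUNT_THRESHOLD and the real scan
lives in find_start_sequence. The point is that the threshold
comparison runs once per bucket rather than once per byte, which is
what addresses the 35% profiling cost Brian measured, while the
one-time apr_bucket_split guards against a single enormous bucket.]

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    #define THRESHOLD 8192  /* illustrative stand-in for
                             * BYTE_COUNT_THRESHOLD */

    static apr_status_t pass_when_full(ap_filter_t *f,
                                       apr_bucket_brigade *bb)
    {
        apr_bucket *b = APR_BRIGADE_FIRST(bb);
        apr_size_t tally = 0;

        while (b != APR_BRIGADE_SENTINEL(bb)) {
            const char *data;
            apr_size_t len;
            apr_status_t rv;

            if (APR_BUCKET_IS_EOS(b)) {
                break;
            }
            rv = apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
            if (rv != APR_SUCCESS) {
                return rv;
            }

            /* Guard against one huge bucket: split it at the
             * threshold so the first part can be sent along now.
             * (After apr_bucket_read the data is in memory, so the
             * split is cheap.) */
            if (len > THRESHOLD) {
                apr_bucket_split(b, THRESHOLD);
                len = THRESHOLD;
            }

            /* ... scan data[0..len) for the SSI start sequence ... */

            tally += len;
            b = APR_BUCKET_NEXT(b);

            /* Check only at bucket boundaries: once the tally crosses
             * the threshold, pass everything seen so far to the next
             * filter instead of buffering it. */
            if (tally >= THRESHOLD) {
                apr_bucket_brigade *rest = apr_brigade_split(bb, b);
                rv = ap_pass_brigade(f->next, bb);
                if (rv != APR_SUCCESS) {
                    return rv;
                }
                bb = rest;
                b = APR_BRIGADE_FIRST(bb);
                tally = 0;
            }
        }
        return APR_SUCCESS;
    }

As Paul notes, this can overshoot BYTE_COUNT_THRESHOLD by up to one
bucket's worth of data, but it avoids both the per-byte comparison and
the unconditional split of the current bucket.]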