Date: Mon, 27 Aug 2001 19:59:55 -0700
From: Brian Pane
Subject: Re: chunking of content in mod_include?
To: dev@httpd.apache.org
Message-id: <3B8B092B.20201@pacbell.net>
References: <3B8AD23B.2040305@pacbell.net> <0108271616510J.15633@koj.rkbloom.net>
 <3B8B0112.1E5910CB@remulak.net>

Paul J. Reder wrote:

>Ryan Bloom wrote:
>
>>On Monday 27 August 2001 16:05, Brian Pane wrote:
>>
>>>In mod_include's find_start_sequence function, there's some code that
>>>splits the current bucket if "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD".
>>>
>>>Can somebody explain the rationale for this? It seems redundant to
>>>split the data into smaller chunks in a content filter; I'd expect
>>>mod_include to defer network block sizing to downstream filters. In the
>>>profile data I'm looking at currently, this check accounts for 35% of
>>>the total run time of find_start_sequence, so there's some performance
>>>to be gained if the "ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD" check
>>>can be eliminated.
>>>
>>It is used to ensure that we don't buffer all the data in mod_include.
>>It isn't really done correctly, though. What we should be doing is
>>continuing to read as much data as possible, and as soon as we can't
>>read something, sending what we have down the filter stack.
>>
>>This variable basically ensures that we don't keep reading data until
>>we process the whole file or reach the first tag.
>>
>In what manner do you mean "as soon as we can't read something"? It is
>my understanding that the bucket code hides reading delays from the
>mod_include code. If that is true, how would the mod_include code know
>when to send a chunk along? Are you saying the bucket code should do
>some magic like sending all buckets in the brigade up to the current
>one? This would wreak havoc on code like mod_include that may be
>setting aside or tagging buckets for replacement when the end of the
>tag is found.
>
>This code was put in because we were seeing the mod_include code buffer
>up the entire collection of buckets until an SSI tag was found. If you
>have a 200 MB file with an SSI tag footer at the end of the brigade,
>the whole thing gets buffered. How do you propose that this be done
>differently?
>
>The only thing I can think of is to add to and check the byte tally at
>bucket boundaries. We might go over the BYTE_COUNT_THRESHOLD, but the
>check wouldn't happen on every byte, and there wouldn't need to be a
>bucket split to send along the first part. Is this what you mean?
>
I think checking at bucket boundaries would be better. And to guard
against the case where a single bucket might contain 200 MB of data,
wouldn't it work to just check the bucket size right after the
apr_bucket_read in find_start_sequence and split the bucket there if
its size exceeds some reasonable threshold?

--Brian
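
[Editor's note: a minimal sketch of the bucket-boundary approach
discussed above, written against the APR bucket API of that era. The
function name pass_when_full, the THRESHOLD constant, and the flushing
policy are illustrative assumptions, not the committed mod_include
code; the real constant is BYTE_COUNT_THRESHOLD and the real scan
lives in find_start_sequence. The point is that the threshold
comparison runs once per bucket rather than once per byte, which is
what addresses the 35% profiling cost Brian measured, while the
one-time apr_bucket_split guards against a single enormous bucket.]

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    #define THRESHOLD 8192  /* illustrative stand-in for
                             * BYTE_COUNT_THRESHOLD */

    static apr_status_t pass_when_full(ap_filter_t *f,
                                       apr_bucket_brigade *bb)
    {
        apr_bucket *b = APR_BRIGADE_FIRST(bb);
        apr_size_t tally = 0;

        while (b != APR_BRIGADE_SENTINEL(bb)) {
            const char *data;
            apr_size_t len;
            apr_status_t rv;

            if (APR_BUCKET_IS_EOS(b)) {
                break;
            }
            rv = apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
            if (rv != APR_SUCCESS) {
                return rv;
            }

            /* Guard against one huge bucket: split it at the
             * threshold so the first part can be sent along now.
             * (After apr_bucket_read the data is in memory, so the
             * split is cheap.) */
            if (len > THRESHOLD) {
                apr_bucket_split(b, THRESHOLD);
                len = THRESHOLD;
            }

            /* ... scan data[0..len) for the SSI start sequence ... */

            tally += len;
            b = APR_BUCKET_NEXT(b);

            /* Check only at bucket boundaries: once the tally crosses
             * the threshold, pass everything seen so far to the next
             * filter instead of buffering it. */
            if (tally >= THRESHOLD) {
                apr_bucket_brigade *rest = apr_brigade_split(bb, b);
                rv = ap_pass_brigade(f->next, bb);
                if (rv != APR_SUCCESS) {
                    return rv;
                }
                bb = rest;
                b = APR_BRIGADE_FIRST(bb);
                tally = 0;
            }
        }
        return APR_SUCCESS;
    }

As Paul notes, this can overshoot BYTE_COUNT_THRESHOLD by up to one
bucket's worth of data, but it avoids both the per-byte comparison and
the unconditional split of the current bucket.]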