From: Ryan Bloom
Reply-To: rbb@covalent.net
Organization: Covalent Technologies
To: dev@httpd.apache.org, "Paul J. Reder"
Subject: Re: chunking of content in mod_include?
Date: Mon, 27 Aug 2001 20:24:50 -0700

On Monday 27 August 2001 19:25, Paul J. Reder wrote:
> Ryan Bloom wrote:
> > On Monday 27 August 2001 16:05, Brian Pane wrote:
> > > In mod_include's find_start_sequence function, there's some code that
> > > splits the current bucket if "ctx->bytes_parsed >=
> > > BYTE_COUNT_THRESHOLD."
> > >
> > > Can somebody explain the rationale for this?  It seems redundant to be
> > > splitting the data into smaller chunks in a content filter; I'd expect
> > > mod_include to defer network block sizing to downstream filters.  In
> > > the profile data I'm looking at currently, this check accounts for 35%
> > > of the total run time of find_start_sequence, so there's some
> > > performance to be gained if the "ctx->bytes_parsed >=
> > > BYTE_COUNT_THRESHOLD" check can be eliminated.
> >
> > It is used to ensure that we don't buffer all the data in mod_include.
> > It isn't really done correctly, though, because what we should be doing
> > is continuing to read as much data as possible, and as soon as we can't
> > read something, sending what we have down the filter stack.
> >
> > This variable basically ensures we don't keep reading all the data
> > until we process the whole file or reach the first tag.
>
> In what manner do you mean "as soon as we can't read something"? It is my
> understanding that the bucket code hides reading delays from the
> mod_include code. If that is true, how would the mod_include code know when

The bucket code does not hide reading delays at all.  Basically, you call
apr_bucket_read with APR_NONBLOCK_READ, and when you get a return code of
APR_EAGAIN, you send what you have, and then call apr_bucket_read with
APR_BLOCK_READ.  (There is a rough sketch of this pattern just below.)

> to send a chunk along? Are you saying the bucket code should do some magic
> like send all buckets in the brigade up to the current one? This would
> wreak havoc on code like mod_include that may be setting aside or tagging
> buckets for replacement when the end of the tag is found.

Huh?  The bucket code doesn't ever send data down the filter stack unless
you tell it to.  Take a look at the content-length filter to see what I
mean.

> This code was put in because we were seeing the mod_include code buffer up
> the entire collection of buckets until an SSI tag was found. If you have a
> 200 MB file with an SSI tag footer at the end of the brigade, the whole
> thing was being buffered. How do you propose that this be done differently?
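To make that read pattern concrete, here is a minimal sketch of a filter
loop that tries a non-blocking read first and, on APR_EAGAIN, passes
everything scanned so far down the filter stack before falling back to a
blocking read.  It is illustrative only, written against the current
APR/httpd filter API; the names scan_filter and scanned are made up for
this example, and it is not the actual mod_include code.

    #include "httpd.h"
    #include "util_filter.h"
    #include "apr_buckets.h"

    /* Sketch only: scan data without buffering it indefinitely.  Try a
     * non-blocking read; when it returns APR_EAGAIN, flush what has been
     * scanned so far downstream, then wait with a blocking read. */
    static apr_status_t scan_filter(ap_filter_t *f, apr_bucket_brigade *bb)
    {
        apr_bucket_brigade *scanned =
            apr_brigade_create(f->r->pool, f->c->bucket_alloc);
        apr_bucket *b;
        apr_status_t rv;

        while ((b = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
            const char *data;
            apr_size_t len;

            if (APR_BUCKET_IS_EOS(b)) {
                APR_BUCKET_REMOVE(b);
                APR_BRIGADE_INSERT_TAIL(scanned, b);
                break;
            }

            rv = apr_bucket_read(b, &data, &len, APR_NONBLOCK_READ);
            if (APR_STATUS_IS_EAGAIN(rv)) {
                /* No data ready: send what we have so it isn't buffered
                 * here, then block until more data arrives. */
                APR_BRIGADE_INSERT_TAIL(scanned,
                    apr_bucket_flush_create(f->c->bucket_alloc));
                rv = ap_pass_brigade(f->next, scanned);
                if (rv != APR_SUCCESS) {
                    return rv;
                }
                apr_brigade_cleanup(scanned);
                rv = apr_bucket_read(b, &data, &len, APR_BLOCK_READ);
            }
            if (rv != APR_SUCCESS) {
                return rv;
            }

            /* ... scan data[0..len-1] for SSI tags here ... */

            APR_BUCKET_REMOVE(b);
            APR_BRIGADE_INSERT_TAIL(scanned, b);
        }

        return ap_pass_brigade(f->next, scanned);
    }

The point is only that the filter never sits on already-scanned data while
the upstream source is stalled.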
I don't care if mod_include buffers 200 Megs, as long as it is constantly
doing something with the data.  If we have a 200 Meg file that has no SSI
tags in it, but we can get all 200 Megs at one time, then we shouldn't have
any problem just scanning through the entire 200 Megs very quickly.  Worst
case, we do what Brian suggested, and just check the bucket length once we
have finished processing all of the data in that bucket (a rough sketch of
that check is appended below).  The buffering only becomes a real problem
when we sit waiting for data from a CGI or some other slow content
generator.

Ryan

______________________________________________________________
Ryan Bloom                              rbb@apache.org
Covalent Technologies                   rbb@covalent.net
--------------------------------------------------------------
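For reference, here is a rough sketch of that per-bucket variant of the
threshold check: scan a whole bucket, then test a running byte count once,
instead of testing ctx->bytes_parsed on every byte inside
find_start_sequence.  The scan_ctx type, the scanned_enough name, and the
threshold value below are invented for illustration; this is not the actual
mod_include code.

    #include "apr_buckets.h"

    #define BYTE_COUNT_THRESHOLD 4096   /* illustrative value only */

    /* Hypothetical per-filter state for this sketch (not mod_include's
     * real context structure). */
    typedef struct {
        apr_size_t bytes_parsed;    /* bytes scanned since the last flush */
    } scan_ctx;

    /* Returns 1 when enough data has been scanned that the caller should
     * pass the already-scanned buckets down the filter stack. */
    static int scanned_enough(scan_ctx *ctx, apr_bucket *b)
    {
        const char *data;
        apr_size_t len;

        if (apr_bucket_read(b, &data, &len, APR_BLOCK_READ) != APR_SUCCESS) {
            return 0;
        }

        /* ... scan data[0..len-1] for "<!--#" here ... */

        ctx->bytes_parsed += len;

        /* One comparison per bucket instead of one per byte. */
        if (ctx->bytes_parsed >= BYTE_COUNT_THRESHOLD) {
            ctx->bytes_parsed = 0;
            return 1;
        }
        return 0;
    }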