Mailing-List: contact dev-help@httpd.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@httpd.apache.org
Date: Sun, 26 Aug 2001 12:24:34 -0400
From: "Eric Prud'hommeaux" <eric@w3.org>
To: Greg Stein <gstein@lyra.org>
Cc: dev@httpd.apache.org
Subject: Re: appending to the content brigade
Message-ID: <20010826122434.B30842@w3.org>
References: <20010824154233.E4104@w3.org>
 <Pine.GSO.4.32.0108241544500.12287-100000@cobra.cs.Virginia.EDU>
 <20010824143232.L28464@lyra.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <20010824143232.L28464@lyra.org>;
 from gstein@lyra.org on Fri, Aug 24, 2001 at 02:32:32PM -0700
Status: O

On Fri, Aug 24, 2001 at 02:32:32PM -0700, Greg Stein wrote:
> On Fri, Aug 24, 2001 at 04:00:46PM -0400, Cliff Woolley wrote:
> > On Fri, 24 Aug 2001, Eric Prud'hommeaux wrote:
> >...
> > > > Note that if the last bucket is not eos, depending on how your filter is
> > > > supposed to work that might be an indication that you need to stash the
> > > > brigade away until you DO get an eos, because the content you're wrapping
> 
> Okay... let's back up. I believe that some poor advice may have been given.
> 
> For now, let's assume that bPayload *does* have the entire content. Given
> that, there is *no* reason to copy the darn payload into a new brigade.
> 
> The whole *idea* of the filters and the brigades is that you modify the
> brigade passed to you, and then pass it on down. Thus, your operation is
> like this:
> 
> #define BOUNDARY "---- boundary ----"
> 
>   /* insert into head in reverse order (like pushing a stack) */
>   bucket = apr_bucket_immortal_create(BOUNDARY, sizeof(BOUNDARY)-1);
>   APR_BRIGADE_INSERT_HEAD(bPayload, bucket); /* boundary of data/payload */
> 
>   my_data_bucket = ...
>   APR_BRIGADE_INSERT_HEAD(bPayload, my_data_bucket);

I beleive there are good reasons for this data being a brigade. It may
come from disk, *allloc, or shared memory. Also, this generator can
benifit from convenient apr_brigade_write functions.

Is there a good way to write into a list of buckets rather than a
brigade? This would permit a splice
  APR_BUCKET_INSERT_HEAD(bPayload, myBuckets)
which would, I believe, be more efficient than doing the maintainance
to copy the buckets from myBrigade.

In a survey (grep), of uses of APR_BRIGADE_INSERT_HEAD
APR_BRIGADE_INSERT_TAIL APR_BUCKET_INSERT_BEFORE
APR_BUCKET_INSERT_AFTER. All were inserting a single bucket that was
either newly created or newly removed from another brigade. Given
this, I would add a hint to buckets.html that said that there were no
known practical uses of lists of buckets outside of a brigade - the
right way to manage an arbitrary set of buckets is with a brigade.

>   bucket = apr_bucket_immortal_create(BOUNDARY, sizeof(BOUNDARY)-1);
>   APR_BRIGADE_INSERT_HEAD(bPayload, bucket); /* first boundary */
> 
>   /* insert a boundary before the EOS bucket */
>   eos = APR_BRIGADE_LAST(bPayload);
>   APR_BRIGADE_INSERT_BEFORE(eos, bucket);
> 
>   return ap_pass_brigade(f->next, bPayload);
> 
> 
> Construction of the boundary is probably dynamic, and I'd bet you have much
> more data to insert. But the concept demonstrated above is how you "should"
> deal with the passing brigade, and how to pass it along.
> 
> Copying the brigade is *not* the answer, if you can avoid it.

Assuming my analysis above is correct I have brigade of payload and a
brigade of metadata. The headers and associated crap can be
{pre,post}pended to either. Now, do I stick them together
(APR_BRIGADE_CONCAT) or make multiple calls to ap_pass_brigade? I
assume the CONCAT is more efficient.

> Now... the second point is that, as Cliff said, you may not get the EOS in
> your brigade. Not a problem. On the *first* invocation of your filter,
> insert the stuff up front and pass it along. (record the "first" in a
> filter-specific context structure). When you see an EOS bucket at the end of
> the brigade (using APR_BRIGADE_LAST and APR_BUCKET_IS_EOS), then you insert
> the final boundary.

Yeah, I removed my single-call dependency on my filter. I should put
an example of using f->ctx in filters.html.

> If your boundary is dynamically constructed, the store the string into your
> context (to reuse on the various invocations of your filter). When you go to
> insert it, wrap it in a POOL bucket, using whatever pool your boundary was
> allocated in (to ensure the lifetimes match).

roger, done.

> >...
> > > Oops, I forgot to include the code snippet to which I was
> > > referring. From ap_byterange_filter:
> >...
> > Hooo, that's ugly.  You don't have to do that for what you want.  The
> > reason it's done here is that byteranges might overlap within the same
> > request, so rather than just moving buckets to where they need to go, it
> > has to make copies of all the requisite buckets for each of the ranges.
> 
> To be clearer, the client might request the same range multiple times. Thus,
> we need multiple copies of the buckets.
> 
> [ and yes: we could optimize the thing to avoid copying by analyzing the
>   ranges up front, determining where overlaps may occur, and only copy those
>   portions; thus using the original brigade as much as possible, without any
>   copies.
>   
>   that said: many buckets are very efficient at copying because of the whole
>   refcount and sub-bucket stuff. ]
> 
> >...
> > heap) and then THAT can be copied.  Actually, the block here is broken
> > because it puts str into two buckets at a time, which will double-free
> > str when the second bucket gets destroyed.  It should read:
> > 
> >     if ((rv = apr_bucket_copy(ec, &foo)) != APR_SUCCESS) {
> >         if (rv == APR_ENOTIMPL) {
> >             /* morph to a copyable bucket by reading */
> >             apr_bucket_read(ec, &str, &len, APR_BLOCK_READ);
> >             apr_bucket_copy(ec, &foo);
> 
> Or, it could simply pass 1 to heap_create to COPY the data passed. That *is*
> what that parameter is for :-)
> 
> > >                 foo = apr_bucket_heap_create(str, len, 0, NULL);
> 
> becomes:
> 
> > >                 foo = apr_bucket_heap_create(str, len, 1, NULL);
> 
> >...
> > That way, ec and foo end up as two apr_buckets of type heap that share a
> > single apr_bucket_heap structure which points to str with a refcount of
> > 2, rather than having two entirely separate apr_bucket/apr_bucket_heap
> > pairs both of which point to str and each of which has a refcount of
> > 1.

This brings up a meta question I have: Assuming you want the same
string in multiple places in your response, may one have a pool bucket
pointing at the same pool memory as is used in a header as long as the
lifetimes are the same? Would one do this by using the request pool
for the bucket? Would it matter if the pools were different as long as
they were both reaped at the end of the request? Probably a bad idea,
but I'm just curious.

-- 
-eric

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.