httpd-cvs mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
Subject cvs commit: apache-2.0/src/lib/apr/buckets doc_bucket_brigades.txt
Date Thu, 13 Jul 2000 08:01:46 GMT
fielding    00/07/13 01:01:46

  Added:       src/lib/apr/buckets doc_bucket_brigades.txt
  Gather together the main points of bucket brigades.
  Revision  Changes    Path
  1.1                  apache-2.0/src/lib/apr/buckets/doc_bucket_brigades.txt
  Index: doc_bucket_brigades.txt
  Subject: bucket brigades and IOL
  Date: Fri, 12 Nov 1999 23:57:43 -0800
  From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
  Message-ID:  <>
  About two years ago I wasted a lot of time writing an Ada95 library
  called Onions that provides a stackable stream abstraction for files,
  sockets, etc.  It is at <>
  if you want to take a look at it, but I don't recommend looking at the
  code since it is almost all just working around Ada95's lack of a
  system interface.  I'll describe the worthwhile bits here.
  The heart of Onions is the input and output stream object
  classes and classwide types for building a data stream via a
  stack of stream objects (Input_Pipe and Output_Pipe).  Reading
  from the head of an input pipe causes the head stream object
  to read from the next outbound stream object, and on down the line.
  Likewise for writing to the head of an output pipe.  One of the
  main features of streams is that they can filter the data as it
  passes, converting, adding to, and/or removing from the data
  before giving it to the next object.  Since multiple streams can be
  cascaded, the complete data conversion is the sum of the individual
  data conversions performed by the stream objects.
  So far, no big deal -- this can be manually created by stacking ap_iol
  types in a meaningful way.  But, the one unique thing I did in Onions was 
  abstract the memory handling into something called Buckets and moved them
  around in Bucket_Brigades.  A bucket is an allocated segment of memory
  with pointers to its allocation address and current size.  If I were doing
  this in C, I'd also add a pointer to current start address and allocated
  size, so that a single bucket could be shrunk from both ends without
  copying, and a function pointer for freeing it at the stream end.
  Note that this is the same type of memory structure that IO-Lite uses,
  though developed independently and for different reasons.
  A bucket brigade is a list-queue of buckets.  Each of the stream read/write
  calls would pass a bucket brigade instead of single bucket, since this
  made insertion by filters more efficient, with the general idea being that
  the outbound end of the sream would be writing them out using writev
  or reading them in using readv, which is about as efficient as I could
  get with Ada95.  [I call it a list-queue instead of just queue because you
  have the choice of removing buckets from (or adding to) the queue one
  bucket at a time or an entire linked list of buckets.]
  But we could go one step further.  A bucket is an ADT, and as such can
  be used as a general handle for read-only memory, read-write memory,
  cache object, file handle, mmap handle, file name, URL, whatever.
  What if, instead of just a stream of memory, it could pass around a
  stream of memory interspersed with file handles or references to
  remote objects?  A filter could then add stuff around the stream without
  causing too much parsing overhead, and if it needed to look at all the
  bytes in the stream it would just replace the bucket handle with a stream
  of memory sucked from that handle.  Something like this was talked about
  last year (see threads on "Stacking up Response Handling" on 23 Sep 1998
  and "I/O filters & reference counts" in late December 1998 and January 1999).
  And Dean started something with ap_buf.h, but I don't know how he meant
  to finish it.
  What I was thinking of was
     typedef enum {
     } ap_bucket_color_t;
     typedef struct ap_bucket_t ap_bucket_t;
     struct ap_bucket_t {
         ap_bucket_color_t color;
         void              *content;
         ap_status_t       (*free)(ap_bucket_t *bucket);
         unsigned int      refcount;
     typedef struct ap_bucket_rwmem_t ap_bucket_rwmem_t;
     struct ap_bucket_rwmem_t {
         void   *alloc_addr;
         size_t  alloc_len;
         void   *addr;
         size_t  len;
     typedef struct ap_bucket_rmem_t ap_bucket_rmem_t;
     struct ap_bucket_rmem_t {
         void   *addr;
         size_t  len;
     typedef struct ap_bucket_filename ap_bucket_filename;
     struct ap_bucket_filename {
         ap_context_t *ctx;
         char         *name;
         ap_stat_t    *stat;  /* useful if already stat'ed */
         ap_aa_t      *conf;  /* access control structure for this file */
  and then
     typedef struct ap_bucket_list_t ap_bucket_list_t;
     struct ap_bucket_list_t {
         ap_bucket_t *bucket;
         ap_bucket_list_t *prev;
         ap_bucket_list_t *next;
     typedef struct ap_brigade_t ap_brigade_t;
     struct ap_brigade_t {
         ap_context_t     *ctx;
         ap_bucket_list_t *first;
         ap_bucket_list_t *last;
         unsigned int     count;
  and then construct the input and output streams as pushing these
  bucket brigades to or from the client.  The streams would have to
  be a little more complicated than Onions, since I learned later that
  you also need a parallel stream of header fields (in tokenized form)
  in order for it to work with anything HTTP-like.
  Why use an enum instead of a bunch of file pointers for each type
  of bucket, kind of like ap_iol?  Because it allows adjacent memory
  buckets (the most frequent kind after a filter operation) to be
  gathered into a single writev.  Also, we need a way to be able to
  set up an operation and figure out what it will produce without
  actually performing the operation -- this is for OPTIONS and HEAD.
  Note that this would completely change the way we handle internal
  redirects, subrequests, server-side include, mod_proxy, access control, etc.
  And then most of the API hooks would need to change.  I think that is why
  Dean was putting it off until 2.1.  The annoying thing is that this is the
  most useful rearchitecting of the server -- the MPM, APR, and hook changes
  make 2.0 easier/cleaner/faster to port to other platforms, but layering
  enables in one fell swoop almost every significant non-config feature
  that our users have requested.  A cache would just be a hash table or
  btree of file buckets, complete with AA info.
  Anyway, that was stuck in the back of my head and had to get out.
  I won't be able to work on it until after the dissertation is done,
  which every day seems to be further away.  Maybe 3.0, with rHTTP/2.0.
  Subject: Re: bucket brigades and IOL
  In-reply-to: Your message of "Sat, 13 Nov 1999 20:43:58 GMT."
  Date: Sun, 14 Nov 1999 22:24:03 -0800
  From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
  Message-ID:  <>
  BenL wrote:
  >I've got to say that this is the most coherent suggestion along these
  >lines that I've seen yet. I rather like it. One thing I'd add is that if
  >you are going to have a movable "start of block" pointer, and changeable
  >length, it can be nice to allocate extra around the edges under some
  >circumstances, so that lower layers can expand the block without having
  >to add extra chunks.
  Or, alternatively, allocate equal size blocks and just pass around
  a reference pair within the buckets that, when the bucket is freed,
  access a more complicated reference-counting pool.  I think that is
  closer to what IO-Lite does.
  >Also, the usual objections still apply - i.e. it is awkward to do things
  >like searching for particular strings, since they may cross boundaries.
  >I'm beginning to think that the right answer to this is to provide nice
  >matching functions that know about the chunked structures, and last
  >resort functions that'll glue it all back into one chunk...
  Yep, that's what I ended up doing for Ada95, though in that case there
  were no easier alternatives.
  Subject: Re: layered I/O (was: cvs commit: ...) 
  In-reply-to: Your message of "Wed, 29 Mar 2000 01:21:09 PST."
  Date: Wed, 29 Mar 2000 02:05:08 -0800
  From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
  Message-ID:  <>
  >Selection of IO Layers
  >The core selects a source module and IO layers based on the urlspace
  >configuration.  Content might be generated by mod_perl, and the result is
  >piped through mod_chunk, mod_ssl, and mod_net, in turn.  When the content
  >generator runs, the core enforces that the module set the content type
  >before the first call to ap_bput.  The content type is set by a function
  >call.  The function (ap_set_content_type(request_rec *, char *)) examines
  >the content type and adds IO layers as neccessary.  For server parsed
  >html, the core might insert mod_include immediately after mod_perl.
  The problem of thinking of it that way is that, like Dean mentioned,
  the output of one module may be filtered and the filter indicate that
  content should be embedded from another URL, which turns out to be a
  CGI script that outputs further parseable content.  In this instance,
  the goal of layered-IO is to abstract away such behavior so that the
  instance is processed recursively and thus doesn't result in some tangled
  mess of processing code for subrequests.  Doing it requires that each
  layer be able to pass both data and metadata, and have both data and
  metadata be processed at each layer (if desired), rather than call a
  single function that would set the metadata for the entire response.
  My "solution" to that is to pass three interlaced streams -- data,
  metadata, and meta-metadata -- through each layer.  The metadata
  streams would point to a table of tokenized name-value pairs.
  There are lots of ways to do that, going back to my description of
  bucket brigades long ago.  Basically, each block of memory would
  indicate what type of data, with metadata occurring in a block before
  the data block(s) that it describes (just like chunk-size describes
  the subsequent chunk-data) and the layers could be dynamically
  rearranged based on the metadata that passed through them, in
  accordance with the purpose of the filter.
  >(Can anyone produce a use case where the IO chain could change after
  >output begins?)
  Output is a little easier, but that is the normal case for input.
  We don't know what filters to apply to the request body until after
  we have passed through the HTTP headers, and the HTTP message processor
  is itself a filter in this model.
  Subject: Re: filtering patches
  In-reply-to: Your message of "Mon, 10 Jul 2000 15:33:25 PDT."
  Date: Mon, 10 Jul 2000 16:58:00 -0700
  From: "Roy T. Fielding" <fielding@kiwi.ICS.UCI.EDU>
  Message-ID:  <>
  I meant that the filters, when written to as part of the output stream,
  are treated as a stack (write to the top-most filter without any knowledge
  of what may lie underneath it).  So the process of arranging filters
  for a particular response is like dropping them onto a stack.  When a
  filter is done or the stream is closed, each instantiated filter cleans
  up according to its local state and then destroys itself (as it is popped
  off the stack).
  This is completely separate from the registration of filters by
  name and purpose, which could be done by hooks.  The difference is that
  filters are registered at config time but only instantiated (given local
  storage) and arranged on a per stream basis.
  Bucket brigades is simply a way to encapsulate and pass data down the stream
  such that it can be as efficient as the sender desires, while retaining
  a simple interface.  The purpose of the bucket is to make handling of the
  data uniform regardless of its type, or make type-specific conversions
  via a single ADT call if and only if they are needed by some filter.
  The purpose of the brigade is to reduce the number of calling arguments
  and linearize the calling sequence for insertion filters.  Each filter
  definition is separate from its instantiation on the stream because
  there may be many streams operating at once within a single program.
  Each bucket is independent of the brigade so that the filters can rearrange
  and insert buckets at will.  Each data item is isolated by the bucket
  structure, which allows them to be split across child buckets or shared
  with multiple streams (e.g., cached objects).  We don't need to implement
  all of this on the first pass -- we just need to implement the ADT external
  interfaces such that they don't have to change as we make the overall
  stream more efficient.
  BTW, in case I didn't make this clear in past messages, this design is
  an amalgam of the best aspects of the designs from Henrik's Streams
  (see w3c-libwww), sfio (AT&T Research), IO-Lite (Rice Univ.), and
  libwww-ada95 (UCI).  The MIME stuff in MSIE is based on Henrik's streams.
  Henrik's stuff is very fast, but is spaghetti code because it relies on
  callbacks and legacy stuff in libwww.  sfio is great but is intended to
  be a complete replacement for stdio and hence does way too much and is
  subject to a few patents that I don't appreciate.  IO-Lite is cool but
  is probably only worth it when the entire OS is based on IO-Lite memory
  management, but regardless the code isn't available for commercial use.
  As Dean has mentioned many times, we already get most of the performance
  benefit of IO-Lite simply by avoiding memory copies on large writes.
  libwww-ada95 was an attempt to make Ada95 suck less for systems programming,
  which was only a partial success (it is very fast compared to other Ada95
  libraries, but memory management became a problem with complex filters).
  Writing our own streams library isn't NIH syndrome -- both Dean and I
  have independently investigated the other available alternatives and they
  just aren't suitable for our purpose.  Even with all that, my own design
  only truly pays off (versus plain old BUFF) when you make good use of
  sendfile and shared object caching.
  Other stuff Roy wrote on new-httpd:
  My buckets are passed around in list-queues (really just lists with front
  and back pointers).  My buckets carry data and metadata and meta-metadata.
  My buckets are used to indicate stream-end, and the filter configuration
  itself is determined by the stream content.  It probably sounds weird, but
  the effects of this interface are completely different than mere content
  filters.  They simplify everything.  I'm not saying that we have to
  simplify everything right away, but I am saying that it is just as easy
  to implement a fully-general filter using bucket brigades as it is
  to implement string interface filters -- all of the complex parts
  are handled by the ADTs.
  The real psychedelic stuff happens when you can pass metadata (tokenized
  header fields) as buckets and the filters know how to pass that down the
  chain without treating them as data.
  The purpose of passing a list of buckets around is to linearize
  the call stack for the frequent case of filtered content
  splitting one large bucket into separate buckets with filtered results
  interspersed in between.  The effect is that a filter chain can frequently
  process an entire message in one pass down the chain, which enables the
  stream end to send the entire response in one go, which also allows it
  to do interesting things like provide a content length by summing the
  data length of all the buckets' data, and set a last-modified time
  by picking the most recent time from a set of static file buckets.
  I think it would help if we stopped using artificial examples.  Let's
  try something simple:
         socket <-- http <-- add_footer <-- add_header <-- send_file
  send_file calls its filter with an ap_file_t bucket and End-of-Stream (EOS)
  in the bucket list.  add_header sets a flag, prepends another ap_file_t
  bucket to the list and sends the list to its filter.  add_footer looks
  at the list, finds the EOS, inserts another ap_file_t bucket in
  front of the EOS, and sends the list on to its filter.  http walks through
  the list picking up the (cached) stat values, notes the EOS and seeing
  that its own flag for headers_sent is false, sets the cumulative metadata
  and sends the header fields, followed by three calls to the kernel to
  send out the three files using whatever mechanism is most efficient.
  The point here isn't that this is the only way to implement filters.
  The point is that no other interface can implement them as efficiently.
  Not even close.  Yes, there are cases where string filters are just as
  efficient as any other design, but there is no case in which they are
  more efficient than bucket brigades.  The reason is that being able
  to process a list of strings in one call more than offsets the extra
  cost of list processing, regardless of the filter type, and allows
  for additional features that have benefits for http processing.
  Like, for example, being able to determine the entire set of resources
  that make up the source of this dynamic resource without teaching every
  filter about WebDAV.
  Making many small calls down the filter chain is something best
  avoided, which is why the bucket brigades interface consists of
  a linked list of buckets, such that all of the currently available
  data can be passed-on in a single call.
  Being able to handle sendfile, cached objects and subrequests is very
  effective at improving efficiency, which is why the buckets are typed.
  A filter that needs to do byte-level processing will have to call a
  routine to convert the typed bucket into a data stream, but that decision
  is delayed until no other choice is available and adds no overhead to
  the common cases of non-filtered or pre-/post-filtered objects.
  Being able to process header fields (metadata) through the same filter
  set as the data is necessary for correctness and simplicity in the
  proper ordering of independently developed filter modules, which is
  why the buckets can carry metadata on the same stream.  Every filter
  has to be knowledgeable about metadata because only the filter knows
  whether or not its actions will change the nature of the data.

View raw message