httpd-dev mailing list archives

From Greg Stein <>
Subject Re: Filter I/O take 3.
Date Sun, 18 Jun 2000 10:39:45 GMT
On Sat, Jun 17, 2000 at 06:00:59AM -0700, wrote:
> > *) complexity due to the ioblock/ioqueue stuff
> > 
> > *) I do not understand how sub requests are handled in this scheme
> >    (essentially, how are the two sets of hooks joined together)
> > 
> > *) if I fetch 100Mb from a database for insertion into the content stream,
> >    how does that work in this scheme? (flow control, working set)
> > 
> > *) flow control (I didn't see how this works)
> > 
> > *) working set size (since all data must occur on the heap)
> > 
> > I'm off to sleep soon, but today I'm going to work on fixing up the common
> > stuff between the alternate schemes. For example, we agree that an
> > "insert_filters" hook needs to be added. There are a number of ap_rputs()
> > (and similar) calls that need to change to BUFF operations. etc.
> A couple of things.  Don't do this!  By you making these changes now, you
> break a patch that is being considered.

1) There are common elements between both approaches. I believe we should
   get that stuff ironed out first. That will make the "meat" of the
   filtering much smaller and reviewable.

   It also means that the non-meat portions can be effectively reviewed. For
   example, your change from ap_rvputs() to ap_bvputs() was incorrect. I saw
   it because I'm reviewing your filtering patches very closely. Did anybody
   else? I seriously doubt it. It seems most people have tuned out :-(

   We are in no rush to get the "meat" checked in, so it seems eminently
   better to work on getting the fundamentals completed first. Specifically,
   that means changing ap_rwrite types of calls, which write non-content,
   over to direct buffer calls.
   [ caveat: actually, there may be subtle gradations here; need to check
     with Ronald for the digest stuff; for example: when generating a
     multipart message, what parts do we digest? just the parts, or do we
     include the whole response body, which includes the boundaries and
     per-part headers? the answer could change some of the writes. ]

2) "considered" isn't entirely true, as the patch has been vetoed. Until
   some basic changes occur, I don't see that veto changing.

>... commentary on link-based scheme:

> 1)  Modules must maintain their own state for the data they haven't dealt
> with yet.

In the link scheme, each ap_layer_t structure has a context pointer. This
allows a layer to store arbitrary context about where they are.

These context pointers are analogous to the r->unfiltered[] concept that you
introduced in your latest patch. You alloc one per layer; I put them right
in with the layer. The unfiltered[] thing is specific to "unprocessed input"
while the context pointer is arbitrary (and may include unprocessed input
along with other values, such as the parsing state of a finite automaton
operating within the filter).
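As a concrete illustration of that context pointer, here is a minimal sketch. The ap_layer_t shape and the my_state_t automaton below are hypothetical, assumed for illustration only; they are not the actual declarations from either patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shape of a link-based layer: just a next pointer and an
 * arbitrary per-layer context slot. */
typedef struct ap_layer_t {
    struct ap_layer_t *next;  /* next filter in the chain */
    void *ctx;                /* arbitrary per-layer state */
} ap_layer_t;

/* Example state: a tiny automaton tracking whether we are inside a tag.
 * Real filters could also stash unprocessed input bytes here. */
typedef struct {
    int in_tag;               /* parser state carried across calls */
} my_state_t;

/* Scan one chunk, updating the automaton stored in layer->ctx. The state
 * survives between calls, so a tag split across chunks is handled. */
static void my_callback(ap_layer_t *layer, const char *buf, size_t len)
{
    my_state_t *st = layer->ctx;
    if (st == NULL) {
        st = calloc(1, sizeof(*st));  /* lazily create state on first call */
        layer->ctx = st;
    }
    for (size_t i = 0; i < len; ++i) {
        if (buf[i] == '<')
            st->in_tag = 1;
        else if (buf[i] == '>')
            st->in_tag = 0;
    }
}
```

Feeding "<ta" and then "g>" as two separate chunks shows the point: the automaton is mid-tag after the first call and back out after the second, with no secondary array needed.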

IMO, associating the state with the ap_layer_t is more direct than a
secondary array. I'm also a bit uneasy about keeping the two in sync
(although it is quite unlikely that a filter would be added/removed/
re-ordered after the first content byte goes out), and about the fact that
it relies on the .nelts field, which seems rather private to the hook
implementation.

> The hook scheme just has them return that data, and Apache
> hands it back to them later.  BTW, the only way I can see to really solve
> this is to either use iovec's inside the module (invalidating the
> ioblock/ioqueue argument) or strcat data (invalidating the working
> set/ stack/heap argument).

I'm unaware of any particular issues with storing state in the ap_layer_t.
It doesn't seem like the considerations mentioned just above apply; please
correct me otherwise.

> 2)  Take a look at the bwrite code.  If we don't write enough data at
> once, bwrite makes a copy of all of the data in order to concatenate it
> together.  The hook based scheme minimizes this problem in two
> ways.  1)  It encourages people to maintain big chunks to send at
> once.  2)  There is an obvious optimization where we can combine data to
> create larger chunks.  The link based scheme is going to promote writing
> small chunks of data down the pipe.  IMNSHO, this invalidates the working
> set/stack/heap argument, because both cases have this problem due to
> bwrite.

Eh? This is the entire design of the BUFF code. Accumulate data into a
pre-alloc'd buffer. When it gets large enough, dump it to the network. If
an incoming chunk is already large enough on its own, drop it straight to
the network.
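A simplified model of that accumulate-or-bypass behavior, not the real buff.c code, just the shape of the algorithm with a toy 8-byte buffer and counters standing in for the socket:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUFSZ 8  /* toy size; the real BUFF buffer is much larger */

typedef struct {
    char buf[BUFSZ];
    size_t used;
    size_t net_writes;  /* how many times we hit the "network" */
    size_t net_bytes;   /* how many bytes went out */
} model_buff;

/* Stand-in for the socket write. */
static void net_write(model_buff *b, const char *p, size_t n)
{
    (void)p;
    b->net_writes++;
    b->net_bytes += n;
}

static void model_flush(model_buff *b)
{
    if (b->used) {
        net_write(b, b->buf, b->used);
        b->used = 0;
    }
}

static void model_bwrite(model_buff *b, const char *p, size_t n)
{
    if (n >= BUFSZ) {            /* big chunk: flush pending, go direct */
        model_flush(b);
        net_write(b, p, n);
        return;
    }
    if (b->used + n > BUFSZ)     /* small chunk that won't fit: flush first */
        model_flush(b);
    memcpy(b->buf + b->used, p, n);
    b->used += n;                /* small chunk accumulates */
}
```

So a 2-byte write just sits in the buffer, while a 10-byte write flushes what's pending and goes straight out. That is why per-filter coalescing would just duplicate work.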

I disagree with your characterization of the two schemes. I look at your
mod_gargle and see a bunch of small buffers created, and each is dumped to
ap_bwrite() in the filter_io() function.

As you make it easier to create ioblocks (e.g. ap_ioblock_add("string")),
then people will start appending little bits to the ioqueue. Same situation
as the link-based scheme. Any coalescing that can be done in the hook scheme
would duplicate what BUFF is already doing. IOW, even if one of the schemes
"can" coalesce blocks, it is a moot point.

In today's Apache, I look at output_directories() in mod_autoindex.c and
see a function that makes a bazillion ap_rputs() calls with small buffers.
Apache does okay there, buffering those babies up.

Neither filtering scheme is going to encourage any particular buffer size.
They'll be big or small, many or few, as the module writer desires. I don't
see this (2) as an actionable "issue" with the link-based scheme.

> 3)  Modules must filter data without any context about what else is going
> on around the data they are directly working on.  Unless of course they
> maintain that data themselves.  There is an obvious change to the hook
> scheme that allows us to send a minimum amount of data to a hook at any
> one time.  This cannot be done for the link based scheme (at least not
> cleanly IMHO.)

I am unsure what you are saying here. It sounds like you're suggesting that
we buffer up data and pass larger chunks to the filters. That is possible
with either scheme.

Am I missing something here?
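For instance, a "minimum chunk" adapter could sit in front of any filter callback and only invoke it once enough data has accumulated. Everything here is hypothetical (chunker, MIN_CHUNK, the callback type), sketched just to show the idea works independently of hook vs. link:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_CHUNK 16  /* don't call downstream with less than this */

typedef void (*filter_fn)(const char *buf, size_t len, void *baton);

typedef struct {
    char pending[MIN_CHUNK * 4];  /* assume input fits, for the sketch */
    size_t len;
    filter_fn fn;                 /* the real filter to invoke */
    void *baton;
} chunker;

/* Tiny downstream filter used for demonstration: counts invocations. */
static size_t calls, bytes;
static void count_filter(const char *buf, size_t n, void *baton)
{
    (void)buf; (void)baton;
    calls++;
    bytes += n;
}

/* Accumulate; only hand data downstream once MIN_CHUNK is reached. */
static void chunker_put(chunker *c, const char *p, size_t n)
{
    memcpy(c->pending + c->len, p, n);
    c->len += n;
    if (c->len >= MIN_CHUNK) {
        c->fn(c->pending, c->len, c->baton);  /* one big call downstream */
        c->len = 0;
    }
}

/* Push out whatever is left, e.g. at end of response. */
static void chunker_flush(chunker *c)
{
    if (c->len) {
        c->fn(c->pending, c->len, c->baton);
        c->len = 0;
    }
}
```

Two 8-byte writes become one 16-byte call downstream; the leftover goes out on the final flush. Nothing in this depends on how the filter chain itself is wired up.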

> 4)  We are jumping up and down the stack and basically arriving at the
> same place.  The link based scheme encourages people to write small bits
> of data to the next filter, which will make us jump up and down the stack
> all the time.  The only solution to this is to use iovecs (Roy's bucket
> brigade) which invalidates the ioblock/ioqueue argument.

This characterization is similar to (2), which I don't feel is a valid
assessment. There is no "encouragement" one way or another.

I do not see how an iovec can assist here (in accumulating larger bits).


To summarize the four points:

1) state. stored in the ap_layer_t as a void*
2) "the link scheme encourages small writes; small writes are a problem"
   I disagree on both counts.
3) buffering data to pass to the filters. possible for both. this would look
   about the same for both, too.
4) similar to (2). something about an iovec.

> I think that's it.  BTW, as far as getting negative feedback from the
> network, Apache doesn't do that currently.  Apache checks to see how much
> data is being written, and if it is less than a certain size, it buffers
> it.  If it is larger, it writes.

And that write will block if you shove too much into the socket.

Consider the following link-based filter:

    my_callback(layer, buf, len):
        ... figure out we're inserting database content here ...
        blob = fetch_blob_cursor(...)
        while db.more_exists():
            chunk = blob.read_next()
            ap_lwrite(layer->next, chunk)   ... may block on the network ...
        ... other stuff ...

Since the ap_lwrite() can potentially block on the network, it means that
the entire database -> network copy will be paced at the speed that the
client truly needs/desires.

In the hook-based scheme, the entire blob must be read onto the heap and
passed back through the ioqueue. That falls down *very* hard when that blob
is a gigabyte of streaming video being yanked out of a database.

[ this example demonstrates flow control and working set problems with the
  hook-based scheme ]

> buff.c.  The hook scheme easily duplicates this behavior and there is a
> comment on how to do it.  In the patch I submitted there is a comment with
> MAX_STRING_SIZE in it.  That needs to be changed to

That would be an artificial limiter. The network is where the true limit and
pace is determined.

Okay, the "bad" news. I'm off to Napa for a few days. I'll be back
Wednesday. I might get a chance to read email in between the wine and
relaxation :-), but maybe not. Unfortunately, this also means that I won't
have an opportunity to remove my current veto on any hook-based patches.

Common work that still needs to be done:
- add the "insert_filters" hook
- some ap_rputs() calls switched to use an internal function (which looks
  exactly like today's ap_rputs); later, we'll switch ap_rputs() to filter
  content, which won't affect the non-content writes.
- automatically call ap_send_http_header() on the first output, or when the
  request terminates with an empty-body response such as 204
- add/use trailer_hdrs to request_rec for mod_auth_digest
- create a flush_filters() to be called from ap_finalize_request_protocol();
  will be filled in later.
  (this is also where the trailer_hdrs are sent out)
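The header-on-first-output item above amounts to a simple latch. A sketch of the control flow, using placeholder names (fake_request, output_write) rather than the real httpd request_rec and API:

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for request_rec, with counters standing in for real I/O. */
typedef struct {
    int headers_sent;
    int header_writes;
    int body_writes;
} fake_request;

/* Stand-in for ap_send_http_header(). */
static void send_http_header(fake_request *r)
{
    r->header_writes++;
}

/* Every content write funnels through here; the first one triggers the
 * header send exactly once. */
static void output_write(fake_request *r, const char *buf, size_t len)
{
    if (!r->headers_sent) {
        send_http_header(r);
        r->headers_sent = 1;
    }
    (void)buf; (void)len;
    r->body_writes++;
}

/* Request finalization: an empty-body response (e.g. a 204) never calls
 * output_write, so the headers must still go out here. */
static void finalize_request(fake_request *r)
{
    if (!r->headers_sent) {
        send_http_header(r);
        r->headers_sent = 1;
    }
}
```

Two body writes produce exactly one header send, and a request with no body at all still gets its headers at finalize time.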

Back on Wednesday...


Greg Stein,
