httpd-dev mailing list archives

From "Roy T. Fielding" <field...@kiwi.ICS.UCI.EDU>
Subject Re: PLEASE READ: Filter I/O
Date Sat, 24 Jun 2000 07:50:45 GMT
Gee, another few months of this and you guys will arrive at the
bucket brigades design.  ;-)

Seriously though, there are two central problems that are being
overlooked in the desire to just hack a solution:

  1) no layered-IO solution will work with the existing memory
     allocation mechanisms of Apache.

The reason is simply that some filters can incrementally process data and
some filters cannot, and they often won't know the answer until they have
processed the data they are given.  This means the buffering mechanism
needs some form of overflow mechanism that diverts parts of the stream
into a slower-but-larger buffer (file), and the only clean way to do
that is to have the memory allocator for the stream also do paging
to disk.  You can't do this within the request pool because each layer
may need to allocate more total memory than is available on the
machine, and you can't depend on some parts of the response being
written before later parts are generated because some filtering
decisions require knowledge of the end of the stream before they
can process the beginning.  That's life.  If it were easy, I would
have put it in Apache 1.2 when we added HTTP/1.1 support.
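
As a rough illustration of the overflow idea only (not the allocator
design itself, and not Apache code), a stream buffer could keep a small
in-memory region and divert everything past it into a temporary file.
The names `spill_buf`, `spill_write`, `spill_read_all` and the tiny
`SPILL_MEM_LIMIT` are assumptions made up for this sketch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical limit, kept tiny so the overflow path is easy to exercise. */
#define SPILL_MEM_LIMIT 16

typedef struct {
    char   mem[SPILL_MEM_LIMIT]; /* fast but small buffer */
    size_t mem_len;              /* bytes currently held in mem */
    FILE  *overflow;             /* slower-but-larger buffer, created on demand */
    size_t file_len;             /* bytes diverted to the overflow file */
} spill_buf;

void spill_init(spill_buf *b)
{
    assert(b != NULL);
    memset(b, 0, sizeof *b);
}

/* Append len bytes. Memory is filled first; the rest goes to a temp file.
 * Returns 0 on success, -1 on I/O failure. */
int spill_write(spill_buf *b, const char *data, size_t len)
{
    if (b->overflow == NULL) {
        size_t room = SPILL_MEM_LIMIT - b->mem_len;
        size_t n = len < room ? len : room;
        memcpy(b->mem + b->mem_len, data, n);
        b->mem_len += n;
        data += n;
        len -= n;
        if (len == 0)
            return 0;
        b->overflow = tmpfile();
        if (b->overflow == NULL)
            return -1;
    }
    if (fwrite(data, 1, len, b->overflow) != len)
        return -1;
    b->file_len += len;
    return 0;
}

/* Copy the whole stream, in order, into out (mem_len + file_len bytes). */
int spill_read_all(spill_buf *b, char *out)
{
    memcpy(out, b->mem, b->mem_len);
    if (b->overflow != NULL) {
        rewind(b->overflow);
        if (fread(out + b->mem_len, 1, b->file_len, b->overflow) != b->file_len)
            return -1;
        fseek(b->overflow, 0, SEEK_END); /* reposition so later writes append */
    }
    return 0;
}

void spill_close(spill_buf *b)
{
    if (b->overflow != NULL)
        fclose(b->overflow);
    b->overflow = NULL;
}
```

A filter that cannot emit anything until it has seen the end of the
stream could write into such a buffer without caring where the bytes
physically live, then read the whole thing back once the end arrives.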

This is not solved by copying the buffers at each layer.  This is
not solved by having the generator "hold onto" some parts of the
stream and generate them at a later time.  You can't solve the problem
by limiting the data stream because the front of the stream cannot
"unblock" itself until it gets a chance to see the end.  This is why
I said early on that I wasn't interested in any partial solution
to the filtering problem, and haven't cared to debate half-measures.
Besides, solving memory allocation also gets you sendfile, caching,
and subrequests, all using the same uniform interface, which was the
original goal of layered-IO from back on the nature trail in '98.

  2) the purpose of the filtering mechanism is to provide a useful
     and easy-to-understand means for extending the functionality of
     independent modules (filters) by rearranging them in stacks
     via a uniform interface.

As much as I appreciate Ryan's desire to get something working within
the existing architectural hooks, the fact of the matter is that the
hook mechanism absolutely sucks when it comes to ease of understanding.
We could spend months working on that code and I will bet you that it
still won't work robustly and you will never be able to figure out
why module authors can't seem to restrict themselves to the dozen or
so invisible side-effect-driven design constraints inherent in a
relay architecture.  That isn't because it can't work -- it is because
it isn't a good abstraction of the basic design concept of a data-flow
network of filters.  Using a link-based mechanism is a better abstraction,
which is why it has been used in every other filtering library available.

I don't understand this concern about stack size and register windows.
Filters do not have large per-routine stacks -- the filter state and
new data will be passed in by two or three pointers to heap structures,
at most.  While there is a theoretical possibility of recursion and
long filter lists, in practice (and remember that I've implemented this
already for libwww-ada95 and was peripherally involved in libwww) the
filters tend to be arranged in strings of four or five.  The notion that
this would cause visible delays in processing is just ludicrous and
has been proven not to be the case in all of the filtering libraries.
There have been many published articles on sfio and libwww performance
relative to other architectures and I don't recall any of them mentioning
register windows or stack size as a concern.  And they all work on SPARC.
Recursion on SPARC is only an issue for things like quick sort on very
large data sets.

Making many small calls down the filter chain is something best
avoided, which is why the bucket brigades interface consists of
a linked list of buckets, such that all of the currently available
data can be passed on in a single call.
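
A minimal sketch of that shape, assuming made-up names (`bucket`,
`filter`, `upcase_filter`, `count_filter`) rather than any real Apache
API: each filter holds a link to the next one, and a whole list of
buckets travels down the chain in one call per filter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One chunk of currently available data; buckets are chained into a brigade. */
typedef struct bucket {
    char          *data;
    size_t         len;
    struct bucket *next;
} bucket;

/* A link-based filter: it knows only its own routine and its successor. */
typedef struct filter {
    void (*process)(struct filter *self, bucket *brigade);
    struct filter *next;
    int            calls;      /* how many times this filter was invoked */
    size_t         bytes_seen; /* used by the counting sink below */
} filter;

/* Hand the entire brigade to the next filter, if there is one. */
static void pass_on(filter *f, bucket *brigade)
{
    if (f->next != NULL)
        f->next->process(f->next, brigade);
}

/* Upper-cases every bucket in place, then passes the whole list along. */
void upcase_filter(filter *self, bucket *brigade)
{
    assert(self != NULL);
    self->calls++;
    for (bucket *b = brigade; b != NULL; b = b->next)
        for (size_t i = 0; i < b->len; i++)
            b->data[i] = (char)toupper((unsigned char)b->data[i]);
    pass_on(self, brigade);
}

/* End of the chain: just totals the bytes it was handed. */
void count_filter(filter *self, bucket *brigade)
{
    assert(self != NULL);
    self->calls++;
    for (bucket *b = brigade; b != NULL; b = b->next)
        self->bytes_seen += b->len;
    pass_on(self, brigade);
}
```

With two buckets queued, each filter in the chain is entered once, not
once per bucket, and the call depth equals the number of filters.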

Being able to handle sendfile, cached objects and subrequests is very
effective at improving efficiency, which is why the buckets are typed.
A filter that needs to do byte-level processing will have to call a
routine to convert the typed bucket into a data stream, but that decision
is delayed until no other choice is available and adds no overhead to
the common cases of non-filtered or pre-/post-filtered objects.
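
To illustrate the delayed conversion, here is a sketch with two assumed
bucket types (in memory and file-backed). The names `typed_bucket`,
`bucket_read`, `passthrough_length` and `count_newlines` are
hypothetical, and the `conversions` counter exists only to show when
bytes were actually materialized.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { BUCKET_MEMORY, BUCKET_FILE } bucket_type;

typedef struct {
    bucket_type type;
    const char *mem;         /* valid when type == BUCKET_MEMORY */
    FILE       *file;        /* valid when type == BUCKET_FILE */
    size_t      len;         /* length of the content in bytes */
    int         conversions; /* times the bucket was turned into raw bytes */
} typed_bucket;

/* Convert a bucket of any type into bytes. Only filters that need
 * byte-level access call this. Returns the byte count, or -1 on error. */
long bucket_read(typed_bucket *b, char *out, size_t cap)
{
    assert(b != NULL && out != NULL);
    if (b->len > cap)
        return -1;
    b->conversions++;
    if (b->type == BUCKET_MEMORY) {
        memcpy(out, b->mem, b->len);
    } else {
        rewind(b->file);
        if (fread(out, 1, b->len, b->file) != b->len)
            return -1;
    }
    return (long)b->len;
}

/* A stage that never inspects content: a file bucket could be handed
 * to something like sendfile untouched, so no conversion happens. */
size_t passthrough_length(const typed_bucket *b)
{
    return b->len;
}

/* A byte-level stage: it has no choice but to convert.
 * The fixed 256-byte scratch buffer is a simplification for this sketch. */
long count_newlines(typed_bucket *b)
{
    char buf[256];
    long count = 0;
    long n = bucket_read(b, buf, sizeof buf);
    if (n < 0)
        return -1;
    for (long i = 0; i < n; i++)
        if (buf[i] == '\n')
            count++;
    return count;
}
```

The pass-through stage leaves `conversions` at zero; only the stage
that must look at individual bytes pays for the read.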

Being able to process header fields (metadata) through the same filter
set as the data is necessary for correctness and simplicity in the
proper ordering of independently developed filter modules, which is
why the buckets can carry metadata on the same stream.  Every filter
has to be knowledgeable about metadata because only the filter knows
whether or not its actions will change the nature of the data.
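
One way to picture this, as a sketch under assumptions: the stream
carries a length item alongside the data items, and a filter that
removes bytes corrects that item itself. The `item` type, its
`ITEM_META_LENGTH` kind and `strip_cr_filter` are invented for the
example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ITEM_META_LENGTH, ITEM_DATA } item_kind;

/* Metadata and data share one stream, distinguished only by kind. */
typedef struct item {
    item_kind    kind;
    size_t       length_value; /* ITEM_META_LENGTH: declared body length */
    char        *data;         /* ITEM_DATA: content bytes */
    size_t       len;          /* ITEM_DATA: number of content bytes */
    struct item *next;
} item;

/* Removes every '\r' from the data items in place. Because this changes
 * the body size, the filter also updates the length metadata it saw on
 * the same stream; no other component knows how many bytes were dropped. */
void strip_cr_filter(item *stream)
{
    item  *meta = NULL;
    size_t removed = 0;

    for (item *it = stream; it != NULL; it = it->next) {
        if (it->kind == ITEM_META_LENGTH) {
            meta = it;
            continue;
        }
        size_t w = 0;
        for (size_t r = 0; r < it->len; r++) {
            if (it->data[r] == '\r')
                removed++;
            else
                it->data[w++] = it->data[r];
        }
        it->len = w;
    }
    if (meta != NULL) {
        assert(meta->length_value >= removed);
        meta->length_value -= removed;
    }
}
```

A filter that leaves the byte count alone would simply pass the
metadata item through unchanged.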

You don't get to the bucket brigades design by piecemeal adaptation
of the existing architecture.  I got there by experiencing several
other failed designs.  I was hoping that someone would take the
design I posted on 12 Nov 1999
(<199911122357.aa18914@gremlin-relay.ics.uci.edu>) and run with that,
but I wouldn't require it in general.
I'm just not interested in anything less powerful than the right design.
In other words, count me as -0 on the whole filter deal.  I'll read the
threads as they come along (when I can keep up), but my lack of comment
is due to me being a stubborn old fart and not because I am not paying
attention to the discussion.

....Roy   :p
