httpd-dev mailing list archives

From Greg Stein <>
Subject Re: recap of filtered I/O
Date Sun, 04 Jun 2000 23:16:21 GMT
On Sun, 4 Jun 2000, Life is hard, and then you die wrote:
> On Sat, Jun 03, 2000 at 02:26:16PM -0700, Greg Stein wrote:
> > *) how would my insert-100Mb-filter be written?
> I think Ryan addressed this point with the filter being able to signal that
> it hasn't sent all yet.

In the hook-based scheme, the hook is only called when the
content-generator writes some output (e.g. via ap_rwrite()). If the filter
isn't going to write all the data at that point, then when will it?

    ap_rwrite("foobar<!--#giant-data -->bleek");
    ap_rwrite("more stuff");

That 100Meg needs to go out before the "bleek" can be written. Either it
gets completely dumped into an iovec[], or the hook calling will look like:

    iovec[0] = (buf, len)
    run_hook(iovec, &more_to_write)
    while more_to_write:
        iovec = empty
        run_hook(iovec, &more_to_write)

That just seems a bit complicated :-)
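To make the "more_to_write" loop concrete, here is a rough Python sketch. It assumes a hook that returns a list of byte strings (standing in for iovec[]) plus a "call me again" flag; GiantDataFilter, hook() and write_through_hook() are hypothetical names, not Apache APIs. The payload is produced lazily so the filter never holds the whole 100Meg at once.

```python
from collections import deque

# Hypothetical sketch: hook(buf) returns (iovec, more_to_write), where the
# iovec is modelled as a list of byte strings. Not a real Apache API.


class GiantDataFilter:
    MARKER = b"<!--#giant-data -->"

    def __init__(self, make_payload):
        self.make_payload = make_payload  # returns an iterator of byte chunks
        self.queue = deque()              # sources still to drain, in output order

    def hook(self, buf):
        # Queue the new input, replacing each marker with a lazily generated payload.
        for i, part in enumerate(buf.split(self.MARKER)):
            if i > 0:
                self.queue.append(self.make_payload())
            if part:
                self.queue.append(iter([part]))
        # Hand back at most one chunk per call, so "bleek" waits behind the payload.
        while self.queue:
            chunk = next(self.queue[0], None)
            if chunk is None:
                self.queue.popleft()  # this source is exhausted
                continue
            return [chunk], True
        return [], False


def write_through_hook(filt, buf, network):
    """The run_hook loop: keep invoking the hook while it says it has more."""
    iovec, more_to_write = filt.hook(buf)
    network.extend(iovec)
    while more_to_write:
        iovec, more_to_write = filt.hook(b"")
        network.extend(iovec)
```

Every caller of the hook has to carry that loop, which is the complication being pointed at.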

I'm not sure, but you might be confusing the "I have more to write" with
the concept of "I haven't processed all the input yet."

The latter situation is for something like:

    ap_rwrite("<!--#inc");
    ap_rwrite("lude foo.html -->");

In the above scenario, mod_include is going to stash away the "<!--#inc"
part and wait for the rest of the command. At the end of the response, a
"flush" is performed. If mod_include has a partial command, then it raises
a syntax error (due to EOF) at that point.
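A rough Python sketch of that stash-and-flush behaviour in a link-based filter, assuming each layer is an object with write() and flush(). IncludeStashLayer, SSISyntaxError and the "[command]" placeholder output are hypothetical; real mod_include parsing is far more involved.

```python
class SSISyntaxError(Exception):
    """Raised when the response ends in the middle of an SSI command."""


class IncludeStashLayer:
    """Hypothetical link-based filter that holds back an incomplete '<!--#' command."""

    START, END = "<!--#", "-->"

    def __init__(self, next_write):
        self.next_write = next_write  # write function of the next layer
        self.stash = ""               # input held back until it can be parsed

    def _split_safe(self, text):
        # Hold back a trailing fragment that might be the beginning of START.
        for k in range(min(len(self.START) - 1, len(text)), 0, -1):
            if text.endswith(self.START[:k]):
                return text[:-k], text[-k:]
        return text, ""

    def write(self, buf):
        data, self.stash = self.stash + buf, ""
        while data:
            start = data.find(self.START)
            if start < 0:
                out, self.stash = self._split_safe(data)
                if out:
                    self.next_write(out)
                return
            if start:
                self.next_write(data[:start])
            end = data.find(self.END, start + len(self.START))
            if end < 0:
                self.stash = data[start:]  # partial command: wait for more input
                return
            command = data[start + len(self.START):end].strip()
            self.next_write(f"[{command}]")  # placeholder for running the directive
            data = data[end + len(self.END):]

    def flush(self):
        # End of response: a stashed partial command is now a syntax error.
        if self.stash.startswith(self.START):
            raise SSISyntaxError(f"unexpected EOF in {self.stash!r}")
        if self.stash:
            self.next_write(self.stash)
        self.stash = ""
```

Writing "foo<!--#inc" emits only "foo"; the following "lude foo.html -->" completes the command, and a flush with a half-finished command raises instead.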

[ both schemes can handle the "wait for the rest of the input" scenario; I
  believe that the link-based approach is a bit easier, in that it
  provides an explicit context pointer for maintaining this data; the
  hook-based approach requires the filter to attach "user data" to the
  request (which then prevents "SetFilter SSI PHP SSI" because there is
  only one "SSI user data slot") ]
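A small Python sketch of the "one user data slot" problem. It uses a deliberately trivial stateful filter (it numbers each write) in place of SSI, installed twice as in "SSI PHP SSI"; Request, counting_hook and CountingLayer are hypothetical names.

```python
class Request:
    """Hypothetical request record: hook-based filters keep state here, one slot per name."""

    def __init__(self):
        self.user_data = {}


def counting_hook(request, buf):
    # Hook-based: every instance of this filter shares the request's "COUNT" slot.
    request.user_data["COUNT"] = request.user_data.get("COUNT", 0) + 1
    return f"{request.user_data['COUNT']}:{buf}"


class CountingLayer:
    # Link-based: each layer instance carries its own context.
    def __init__(self, next_write):
        self.next_write = next_write
        self.count = 0

    def write(self, buf):
        self.count += 1
        self.next_write(f"{self.count}:{buf}")


# Link-based chain: two independent instances.
out = []
outer = CountingLayer(CountingLayer(out.append).write)
for buf in ("a", "b"):
    outer.write(buf)
# out == ["1:1:a", "2:2:b"]

# Hook-based chain: the same function twice, sharing one slot.
req = Request()
hooked = []
for buf in ("a", "b"):
    for hook in (counting_hook, counting_hook):
        buf = hook(req, buf)
    hooked.append(buf)
# hooked == ["2:1:a", "4:3:b"]: the two instances trample each other's counter
```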

> > *) if the filter wants to insert a file, then how do we get the FD
> >    returned to Apache so that it can use sendfile/TransmitFile?
> This is something that's not clear to me in *either* scheme. Can you
> elaborate how this should be done using layers?

Quite easily :-)

  mod_include::filter_callback(this_layer, buf, len):
     ap_lwrite(this_layer->next, plain_text)
     ... oh! found a #include ...
     ap_lsend_fd(this_layer->next, fd)
     ap_lwrite(this_layer->next, plain_text)

Under the covers, ap_lsend_fd() looks like:

  ap_lsend_fd(next_layer, fd):
     if next_layer == NULL:
        send_fd_to_network(fd)      /* sendfile/TransmitFile */
     else:
        mmap_thingy map = mmap_file(fd)
        invoke_callback(next_layer, map.buf, map.len)

Essentially, if you are the last layer, then you can sendfile/TransmitFile
directly to the network. Otherwise, the file gets mmap'd in and passed to
the next layer.
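The same dispatch as a runnable Python sketch, assuming a stub network object that merely records whether it was handed a buffer or a file descriptor (a real server would call sendfile()/TransmitFile() at that point). NetworkSink, Layer, lwrite, lsend_fd and upcase are hypothetical stand-ins for ap_lwrite()/ap_lsend_fd() and a filter.

```python
import mmap
import os
import tempfile


class NetworkSink:
    """Hypothetical network layer: records how each piece of output arrived."""

    def __init__(self):
        self.events = []

    def send_buffer(self, data):
        self.events.append(("write", bytes(data)))

    def send_fd(self, fd):
        # A real server would call sendfile()/TransmitFile() here.
        self.events.append(("sendfile", fd))


class Layer:
    def __init__(self, callback, next_layer=None):
        self.callback = callback  # callback(layer, buf, network)
        self.next = next_layer


def lwrite(next_layer, buf, network):
    if next_layer is None:
        network.send_buffer(buf)
    else:
        next_layer.callback(next_layer, buf, network)


def lsend_fd(next_layer, fd, network):
    if next_layer is None:
        network.send_fd(fd)  # last layer: the fd goes straight to the network
        return
    if os.fstat(fd).st_size == 0:
        return  # mmap cannot map an empty file, and there is nothing to pass on
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as mapped:
        next_layer.callback(next_layer, mapped, network)


def upcase(layer, buf, network):
    # Example filter; copies buf so nothing refers to the mapping after it closes.
    lwrite(layer.next, bytes(buf).upper(), network)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        fd = f.fileno()
        direct, filtered = NetworkSink(), NetworkSink()
        lsend_fd(None, fd, direct)              # no filter: fd reaches the network
        lsend_fd(Layer(upcase), fd, filtered)   # one filter: it sees mapped bytes
        return direct.events[0][0], filtered.events
```

The filter author only ever calls one function; whether the fd survives to the network is decided by where in the chain the call lands.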

> > *) flow control
> I don't see this as really any different from the layer approach. But
> I'm assuming that a filter/content-generator that creates large output
> will do so in parts (third point above), and only smaller stuff will
> slurp all up and send it in one write.

A hook-based filter never generates a block-on-the-network event. It
stuffs everything into the iovec[], regardless of what is happening with
the network.

Consider the following:

  my_layer_callback(layer, buf, len):
     while 1:
        buf = read_next_file_block()
        if buf == EOF:
           break
        buf2 = do_some_processing(buf)
        ap_lwrite(layer->next, buf2)

In the above code, the ap_lwrite() can block on the network, so we pause
in reading the file.

In the hook-based scheme, the filter must read in the whole file, process
it, and return the processed data to the caller. Congestion on the network
will not slow this processing down.
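A Python sketch of the flow-control point, assuming a bounded queue.Queue stands in for a congested socket: put() blocks when the queue is full, much as ap_lwrite() would block on the network. The function names and the MAX_QUEUED figure are made up for illustration.

```python
import queue
import threading
import time

MAX_QUEUED = 2  # hypothetical: how many blocks the "network" can buffer


def link_based_filter(blocks, net_queue, stats):
    # Mirrors my_layer_callback: read a block, process it, write it on.
    for block in blocks:
        stats["read"] += 1
        stats["max_ahead"] = max(stats["max_ahead"], stats["read"] - stats["sent"])
        net_queue.put(block.upper())  # blocks while the network is congested
    net_queue.put(None)  # end of response


def slow_network(net_queue, stats, delay=0.005):
    while True:
        block = net_queue.get()
        if block is None:
            return
        time.sleep(delay)  # a slow client
        stats["sent"] += 1  # only this thread writes "sent"


def run(n_blocks):
    stats = {"read": 0, "sent": 0, "max_ahead": 0}
    net_queue = queue.Queue(maxsize=MAX_QUEUED)
    network = threading.Thread(target=slow_network, args=(net_queue, stats))
    network.start()
    link_based_filter((f"block{i}" for i in range(n_blocks)), net_queue, stats)
    network.join()
    return stats
```

With run(20), reading never gets more than MAX_QUEUED + 2 blocks ahead of what has been sent. A hook-based filter that returns the whole processed file would be all 20 blocks ahead before the first byte went out.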

> > *) how would "SetFilter SSI PHP" be implemented in the hook scheme?
> >    (note this implies a table mapping names to functions; also, I believe
> >     that solving this directive for the hook-based scheme will essentially
> >     look just like the link-based scheme)
> I think this is a minor point. The hook scheme allows you to order a specific
> filter after another one. Yes, it essentially degrades to a linked-list.

Actually, I don't believe it is all that minor. If the typical behavior is
to order them, and this degrades to a linked-list, then why are we
bothering with the hook-based scheme and its iovec complexity? (among my
other concerns raised in the recap)

One other item here:

In both schemes, we call the (new) "install_filters" hook on the modules.
In the hook-based scheme, this inserts per-request hooks. In the
link-based scheme, this adds to a linked-list of layers.

One of the things about the hook-based scheme is the notion of "we don't
care who registered in the per-request chain... we'll just call them; they
might have ordered themselves." Well, this "opaqueness" of the callbacks
totally disappears when you add support for the "SetFilter" directive.
Why? Because modules are no longer adding private functions into the
per-request chain. An external mechanism must find the function pointers
and insert them appropriately.
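A rough Python sketch of that external mechanism: a name-to-factory table consulted when a "SetFilter" line is processed. FILTERS, register_filter, build_chain and the toy SSI/PHP filters are hypothetical; the point is only that the directive forces a visible, ordered table, which is a linked list in all but name.

```python
from typing import Callable, Dict

Write = Callable[[str], None]

# Hypothetical name -> factory table that a "SetFilter" directive would consult.
FILTERS: Dict[str, Callable[[Write], Write]] = {}


def register_filter(name):
    def decorator(factory):
        FILTERS[name] = factory
        return factory
    return decorator


@register_filter("SSI")
def ssi_filter(next_write):
    # Toy stand-in for mod_include: expands one directive.
    return lambda buf: next_write(buf.replace("<!--#echo -->", "ECHO"))


@register_filter("PHP")
def php_filter(next_write):
    # Toy stand-in for PHP: its output happens to contain an SSI directive.
    return lambda buf: next_write(buf.replace("<?php ?>", "<!--#echo -->"))


def build_chain(directive, network_write):
    """Turn e.g. "SetFilter PHP SSI" into a linked chain ending at the network."""
    keyword, *names = directive.split()
    if keyword != "SetFilter":
        raise ValueError(f"not a SetFilter directive: {directive!r}")
    write = network_write
    for name in reversed(names):  # build from the network end backwards
        if name not in FILTERS:
            raise ValueError(f"unknown filter {name!r}")
        write = FILTERS[name](write)  # a fresh instance per occurrence
    return write
```

Feeding "x <?php ?>" through "SetFilter PHP SSI" yields "x ECHO", while "SetFilter SSI PHP" leaves "x <!--#echo -->" unexpanded, so the order really does matter.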

> OTOH, I've been thinking how this would integrate with Dean's proposal to
> have a dedicated process/thread which serves slow clients (via some
> select/poll based pseudo-asynch-io) so as to free up the processes/threads
> for doing the content-generation/filtering work as quickly as possible,
> which I think is a really cool idea.  With the iovec's it seems more
> straightforward at first glance, because it would get all the data it
> needs in one go. However, as soon as the filters can produce partial
> output this reverts to what I'm guessing will be needed for the layer
> scheme: a largish buffer into which the stuff is assembled (i.e. copied),
> which the slow-client-process can then dribble to the client at will. So
> I'm not sure I see any advantage here using iovec's either. However, I
> suspect I may be missing something, because I also don't see how sendfile
> integrates into these schemes - I'm starting to think Roy's
> bucket-brigades may be the way to go after all.

I do not believe you're missing anything either. Both schemes can take
the (filtered) network output and pass it off to the delivery thread. If
the response is fully generated and passed off, then the worker thread can
start on another request/response. Both schemes could do this invisibly
w.r.t. the filters.

There is no advantage to either scheme in terms of the delivery thread,
presuming the "iovec" in the hook-based scheme is made a bit more complex.

I do believe the link-based approach has some simplicity advantages, for
example the ability to pass a file descriptor to the network layer. The
network layer can pass that fd off to the delivery thread, which can then
use sendfile() on it. In the hook-based system, when we say "iovec", we
really mean something like Roy's bucket brigades: each element would be
typed as a buffer, as a file descriptor, or whatever. IMO, introducing the
buckets into the system is more complex than providing different types of
writes (e.g. ap_lwrite, ap_lsend_fd, ap_lwritev).
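For comparison, a rough Python sketch of what a typed "iovec" element might look like. HeapBucket, FileBucket and deliver() are hypothetical names loosely inspired by the bucket idea, not Roy's actual design; the network layer has to inspect each element's type, whereas the link-based scheme expresses the same distinction through which write call is made.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class HeapBucket:
    data: bytes  # ordinary in-memory output


@dataclass
class FileBucket:
    fd: int      # file that could go out via sendfile()
    offset: int
    length: int


Bucket = Union[HeapBucket, FileBucket]


def deliver(brigade: List[Bucket]) -> List[str]:
    """Hypothetical network layer: describe how each bucket would be sent."""
    plan = []
    for bucket in brigade:
        if isinstance(bucket, HeapBucket):
            plan.append(f"writev {len(bucket.data)} bytes")
        elif isinstance(bucket, FileBucket):
            plan.append(f"sendfile fd={bucket.fd} off={bucket.offset} len={bucket.length}")
        else:
            raise TypeError(f"unknown bucket type: {type(bucket).__name__}")
    return plan
```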


Greg Stein,
