httpd-dev mailing list archives

From Graham Leggett <>
Subject Re: Httpd 3.0 or something else
Date Mon, 09 Nov 2009 23:47:40 GMT
Greg Stein wrote:

>> How is "pull" different from "push"[1]?
> The network loop pulls data from the content-generator.
> Apache 1.x and 2.x had a handler that pushed data at the network.
> There is no loop, of course, since each worker had direct control of
> the socket to push data into.

As I said in [1], apart from the obvious ;)

>> Pull, by definition, is blocking behaviour.
> You may want to check your definitions.
> When you read from a serf bucket, it will return however much you ask
> for, or as much as it has without blocking. When it gives you that
> data, it can say "I have more", "I'm done", or "This is what I had
> without blocking".

Who is "you"?

Up till now, my understanding is that "you" is the core, and therefore
not under control of a module writer.
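
For concreteness, the read contract Greg describes can be modelled with
a toy bucket. This is a self-contained sketch, not the serf API itself:
the three outcomes mirror serf's convention, but the names and struct
layout here are invented for illustration.

```c
#include <stddef.h>

/* Toy model of a serf-style non-blocking read: return up to
 * 'requested' bytes and report one of three outcomes.  Real serf
 * signals these via apr_status_t codes; this is illustrative only. */
typedef enum {
    READ_MORE,        /* "I have more" */
    READ_DONE,        /* "I'm done" */
    READ_WOULDBLOCK   /* "this is what I had without blocking" */
} read_status;

typedef struct {
    const char *data;  /* bytes currently buffered */
    size_t len;
    int source_done;   /* the producer will deliver nothing further */
} bucket;

static read_status bucket_read(bucket *b, size_t requested,
                               const char **out, size_t *outlen)
{
    *outlen = requested < b->len ? requested : b->len;
    *out = b->data;
    b->data += *outlen;
    b->len -= *outlen;
    if (b->len == 0)
        return b->source_done ? READ_DONE : READ_WOULDBLOCK;
    return READ_MORE;
}
```

The caller never blocks: it either gets data, learns the response is
complete, or learns it must come back later.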

Let me put it another way. Imagine I am a cache module. I want to read
as much as possible as fast as possible from a backend, and I want to
write this data to two places simultaneously: the cache, and the
downstream network. I know the cache is always writable, but of the
downstream network I cannot be sure; I only want to write to the
downstream network when the downstream network is ready for me.

How would I do this in a serf model?
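
One conceivable shape, assuming a serf-style read API (all names here
are hypothetical, not existing httpd or serf code): a "tee" bucket that
wraps the backend bucket and copies every chunk it hands out into the
cache as a side effect of being read.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: the tee bucket writes each chunk to the cache
 * (always writable) at the moment the chunk is handed to the caller. */
typedef struct {
    const char *data;   /* backend response still to be read */
    size_t len;
} bucket;

typedef struct {
    bucket *wrapped;
    char cache[256];    /* stand-in for the cache store */
    size_t cached;
} tee_bucket;

static size_t tee_read(tee_bucket *t, size_t requested, const char **out)
{
    size_t n = requested < t->wrapped->len ? requested : t->wrapped->len;
    *out = t->wrapped->data;
    memcpy(t->cache + t->cached, *out, n);  /* cache keeps exact pace */
    t->cached += n;
    t->wrapped->data += n;
    t->wrapped->len -= n;
    return n;
}
```

Note the limitation such a bucket inherits: it only runs when something
pulls it, so the cache copy advances at whatever pace the caller sets.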

>> You will only run as often as you are pulled, and never more often. And
>> if the pull is controlled by how quickly the client is accepting the
>> data, which is typically orders of magnitude slower than the backend can
>> push, you have no opportunity to try to speed up the server in any way.
> Eh? Are you kidding me?
> One single network thread can manage N client connections. As each
> becomes writable, the loop reads ("pulls") from the bucket and jams it
> into the client socket. If you're really fancy, then you know what the
> window is, and you ask the bucket for that much data.

That I understand, but it makes no difference as I see it - your loop
only reads from the bucket and jams it into the client socket if the
client socket is good and ready to accept data.

If the client socket isn't good and ready, the bucket doesn't get pulled
from, and resources used by the bucket are left in limbo until the
client is done. If the bucket wants to do something clever, like cache,
or release resources early, it can't - because as soon as it returns the
data it has to wait for the client socket to be good and ready all over
again. The server runs as slowly as the browser, which in computing
terms
is glacially slow.
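
The stall being described can be made concrete with a toy loop.
Writability is faked with a flag here (a real loop would learn it from
poll() or epoll()), and all names are invented for illustration.

```c
#include <stddef.h>

/* One pass of a toy single-threaded write loop over N connections: a
 * connection's bucket is pulled only when its socket is writable.  A
 * slow client's bucket -- and every resource it holds -- sits idle. */
typedef struct {
    const char *pending;  /* data still held inside the bucket */
    size_t len;
    int writable;         /* would come from poll() in real code */
    size_t sent;          /* bytes "written" to the client so far */
} conn;

static void loop_once(conn *c, size_t nconns, size_t window)
{
    for (size_t i = 0; i < nconns; i++) {
        if (!c[i].writable || c[i].len == 0)
            continue;                 /* not ready: bucket left in limbo */
        size_t chunk = window < c[i].len ? window : c[i].len;
        c[i].sent += chunk;           /* stand-in for write(2) */
        c[i].pending += chunk;
        c[i].len -= chunk;
    }
}
```

After a pass, only the writable connection has moved; the other bucket
has had no chance to do anything at all.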

>> Push however, gives you a choice: the push either worked (yay! go
>> browser!), or it didn't (sensible alternative behaviour, like cache it
>> for later in a connection filter). Push happens as fast as the backend,
>> not
>> as slow as the frontend.
> Push means that you have a worker per connection, pushing the response
> onto the network. I really would like to see us get away from a worker
> per connection.

Only if you write it that way (which we have done till now).

There is no reason why one event loop can't handle many requests at the
same time.

One event loop handling many requests each == event MPM (fast and
resource-efficient, but we'd better be bug free).
Many event loops handling many requests each == worker MPM (compromise).
Many event loops handling one request each == prefork (reliable old
workhorse).

In theory, if we turn the content handler into a filter and bootstrap the
filter stack with a bucket of some kind, this may work.

In fact, using both "push" and "pull" at the same time might also make
some sense - your event loop creates a bucket from which data is
"pulled" (serf model), which is in turn "pulled" by a filter stack
(existing filter stack model) and "pushed" upstream.
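
The hybrid can be sketched end to end (all names invented for
illustration, not existing httpd or serf APIs): a pump pulls chunks
from a source standing in for a serf-style bucket, and each chunk is
then pushed down a filter stack whose last element stands in for the
network.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef struct filter filter;
struct filter {
    void (*push)(filter *self, const char *data, size_t len);
    filter *next;
    char out[64];      /* capture buffer for the terminal filter */
    size_t outlen;
};

/* A transforming filter: upcases the chunk, then pushes downstream.
 * Assumes len <= 64, which the pump's chunk size guarantees here. */
static void upcase_push(filter *self, const char *data, size_t len) {
    char buf[64];
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)toupper((unsigned char)data[i]);
    self->next->push(self->next, buf, len);
}

/* Terminal "network" filter: in a real server this would write to the
 * client socket; here it just records what it was pushed. */
static void sink_push(filter *self, const char *data, size_t len) {
    memcpy(self->out + self->outlen, data, len);
    self->outlen += len;
}

/* The event loop "pulls" from the source in chunks and "pushes" each
 * chunk into the filter stack. */
static void pump(const char *src, size_t chunk, filter *stack) {
    size_t len = strlen(src);
    for (size_t off = 0; off < len; off += chunk) {
        size_t n = len - off < chunk ? len - off : chunk;
        stack->push(stack, src + off, n);
    }
}
```

The pull side paces the data; the push side lets each filter act on the
data, or divert it, the moment it arrives.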

Functions that work better as a "pull" (proxy and friends) can be
pulled, functions that work better as a "push" (like caching) can be
pushed.
