httpd-dev mailing list archives

From Brian Pane <brian.p...@cnet.com>
Subject Re: httpd memory ownership and response pools was Re: [PATCH] Update to Brian's patch...
Date Sun, 22 Dec 2002 23:24:52 GMT
On Sat, 2002-12-21 at 15:50, Justin Erenkrantz wrote:
> --On Saturday, December 21, 2002 10:00 AM -0800 Brian Pane 
> <brian.pane@cnet.com> wrote:
> 
> > On Sat, 2002-12-21 at 02:05, Justin Erenkrantz wrote:
> >
> >> Saying a response pool 'doubles' the memory is an absolute
> >> non-starter.  Back that up, please.  If you're meaning the
> >> additional  8kb minimum for the separate pool, I seriously can't
> >> believe that's  going to be a major concern.
> >
> > I do consider it a cause for concern as we try to scale
> > the httpd to handle thousands of connections.
> 
> And, an additional 8k is going to grossly harm that?  Sorry, but I 
> don't believe that.  If that is your main reason against it, then go 
> change the pool code to have a minimum threshold of 4k.  Again, I'd 
> like to see proof that this will irreparably harm httpd's 
> architecture and scalability.  (Oh, 4k might kill any alignment 
> benefits, but hey, you want to keep our memory footprint constant 
> rather than our speed.)

Actually, I don't want to keep the memory footprint constant;
I want to decrease it.  Currently, we dedicate at least
24KB (8KB each for request pool, connection pool, and bucket
allocator) to each connection.  At 1,000 connections, that's
24MB of allocated but mostly unused memory that I'd much rather be
devoting to something useful like the filesystem cache.
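
To make the numbers concrete, the per-connection setup looks roughly
like this (a simplified sketch; 'pchild' stands in for whatever parent
pool the MPM actually uses, and error handling is omitted):

    /* Rough sketch of today's per-connection allocations.  Each of
     * these grabs at least one 8KB block up front, whether or not
     * the connection ever uses it. */
    apr_pool_t *ptrans;                 /* connection pool  (~8KB) */
    apr_pool_t *rpool;                  /* request pool     (~8KB) */
    apr_bucket_alloc_t *bucket_alloc;   /* bucket allocator (~8KB) */

    apr_pool_create(&ptrans, pchild);
    bucket_alloc = apr_bucket_alloc_create(ptrans);
    apr_pool_create(&rpool, ptrans);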

> > I have mixed opinions on response pools.  They have all the usual
> > benefits of pools, but with a couple of big complications.  They're
> > great if they enable you to free up the request pool when you're
> > finished generating the response.  But there's data in the request
> > pool that really needs to survive until after the response has been
> > sent--specifically, anything needed in support of the logger.  That
> 
> Doesn't that mean that that information should be allocated from the 
> response pool instead?

Probably, although that's a slippery slope; when you move everything
needed to either deliver or log the response into a response pool,
there may be very little left in the request pool for typical
requests.  In that case, it would be easier to just alloc everything
from the request pool, and keep the request pool in existence until
the full response is sent.

That's for typical requests, though.  For any app that uses a lot of
temporary space when generating a response (I think you had an example
of a module that did this, but I can't remember which one), it would
be beneficial to be able to free up all that working memory as soon
as the generation is finished.

That raises the question, though: how common are such modules?  If
it's common to have a large ratio of "request data" to "response
data," then it makes sense to have separate pools.  If it's uncommon,
then it's better to have just one pool, to eliminate the risk of
the core or any module ever using the wrong pool for a given
allocation--and let the exceptional modules that do need to free
working memory before the response is done create their own temp pools.
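
For those exceptional modules, the temp-pool approach is about as
simple as it gets--something like this sketch (assuming nothing in the
brigade still points at the scratch memory):

    /* Sketch: do the expensive generation work in a private subpool
     * and free it as soon as the output has been passed down.  Only
     * intermediate scratch data may live in 'work'; anything put in
     * the brigade must be in heap/file buckets with their own
     * lifetime. */
    apr_pool_t *work;
    apr_pool_create(&work, r->pool);

    /* ... build the response, using 'work' for temporary buffers ... */

    rv = ap_pass_brigade(r->output_filters, bb);
    apr_pool_destroy(work);   /* all the working memory comes back here */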

I suspect that modules that need separate request and response pools
are very uncommon, but let me know what you think...  Either way,
it would be a big win to be able to keep the pool containing the
response data in existence until we've sent all the data allocated
from it; that could eliminate the need for some very brittle bucket
setaside logic.
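
For reference, the brittle logic in question looks roughly like this
(simplified from what core_output_filter has to do before it can
buffer data whose backing memory may go away):

    apr_bucket *e;
    for (e = APR_BRIGADE_FIRST(bb);
         e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e)) {
        /* morph/copy each bucket so its data survives the pool that
         * originally backed it; fragile, and easy to get wrong */
        rv = apr_bucket_setaside(e, c->pool);
        if (rv != APR_SUCCESS) {
            /* ... punt: block and write it out now, or copy ... */
        }
    }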

> > leaves us with the choice of either keeping the request pool around
> > (which somewhat defeats the point of using a response pool) or
> > copying the needed request and connection info to the response pool
> > (which works but is slightly wasteful).  The other problem with
> 
> I'm not saying this is an easy change - it's a radical departure from 
> our current architecture (only suited for 2.1 and beyond).  I don't 
> think you'd copy anything from the request pool to the response pool. 
> We'd change the lifetime of whatever needs to live as long as the 
> response instead of the request.  The request lifetime ends as soon 
> as the EOS is generated.  The response lifetime ends as soon as the 
> EOS is consumed.  Only when we're positive that all code is 
> allocating from the correct pools can we switch to destroying the 
> request pool when the EOS is seen rather than consumed.  And, since 
> we're relying on pools, we get error cleanups for free.
> 
> > response pools is that they're pools.  If you have an app that's
> > generating a large response in small units, you'll have a lot of
> > small brigades being allocated from the response pool, and you
> > won't be able to free up any of that space until the entire
> > response is finished.  This is bearable for most HTTP connections,
> > but it won't work for connections that live forever.
> 
> Connections that live forever?  Don't you mean a response that never 
> ends?  (If you had a connection that lived forever with multiple 
> responses, the memory would be cleaned up as the response is sent.) 

No, the problem is that if the brigade is ever copied to a
connection-level pool, it won't get deallocated until the
connection closes.  For HTTP, it's bearable.  But if we use
brigades for connections that live forever (think app server
or database connections, for example), then we'll have a memory
leak if ever a brigade is allocated from a connection-level pool.
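
In other words, the failure mode is a pattern like this on a connection
that never closes (the loop condition is hypothetical, just to
illustrate):

    while (connection_still_open(c)) {   /* hypothetical */
        apr_bucket_brigade *bb = apr_brigade_create(c->pool,
                                                    c->bucket_alloc);
        /* ... fill bb with one response's worth of data and send it ... */
        apr_brigade_destroy(bb);  /* empties the buckets, but the brigade
                                   * structure itself stays allocated in
                                   * c->pool until the connection dies */
    }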

> So, how is your model going to make things appreciably better?  The 
> only thing you'd gain is that you might be able to reuse a brigade 
> sooner.
> 
> As seen in our code, we're *trying* to reuse the brigade memory right 
> now.  So, where is the big benefit of a bucket allocator for brigades 
> right now?  I don't think we're creating an infinite number of 
> brigades.  Most of our filters seem to be trying to reuse the 
> brigades they create.  They aren't creating a brigade on every filter 
> call - they are usually doing their best to create only one brigade (perhaps 
> two or three - yet a constant number).  While the number of buckets 
> may be unbounded (since we don't know how large the response is), the 
> number of brigades should almost certainly be (near) constant 
> (probably as a factor of the number of filters involved).
> 
> This is why everything falls down with your patch.  The 
> core_output_filter shouldn't be destroying a brigade it doesn't own. 
> It should only be clearing them.  It didn't create them, so it 
> shouldn't destroy them.

That's actually independent of my patch.  The core_output_filter
has been deleting brigades all along.  That's most likely a bug,
given Bill's observation that the code is then reusing brigades
after destroying them; it only works as a side effect of the
destroyed brigade's memory still being resident in its pool.
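
The distinction, in code terms (just a sketch of the two calls, not a
patch):

    apr_brigade_destroy(bb);   /* what the code does today: destroys a
                                * brigade it doesn't own; later reuse of
                                * 'bb' only "works" because the memory is
                                * still sitting in the pool */

    apr_brigade_cleanup(bb);   /* what a non-owner should do: empty the
                                * buckets but leave the brigade itself
                                * alive for whoever created it */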

>   (The only time it should destroy is when it 
> sets the brigade aside - seems easy enough to fix.)  We're already 
> reusing the bucket memory, so no big win there.
> 
> I have a feeling you want to switch ownership of all memory into an 
> implicit ownership by core_output_filter.

More generally, I want to transfer ownership to the entity that
consumes the data, so that we don't require the generator of the
data to be responsible for cleaning it up.

>   (I think you were 
> expecting this to be occurring right now, but it isn't.)

No, I'm only expecting that for post-2.0.  In 2.0, the brigade is
owned by the pool.

>   Only 
> core_output_filter would be responsible for freeing all memory (note 
> that it isn't responsible for allocating memory).  However, this 
> would require a rewrite of all the code to be built around the 
> assumption that if it passes a brigade down, it loses ownership of 
> that brigade.

I think we'll have to lose that assumption no matter what in
2.1 and later, because passing a brigade may mean handing it
to another thread.
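
Concretely, a filter written for that model can't hold on to anything
it has passed down--something along these lines (a sketch of a
request-level filter, nothing more):

    /* Post-2.0 ownership rule: once a brigade goes to the next
     * filter, it may be on another thread, so the producer must not
     * touch it again. */
    apr_bucket_brigade *out = apr_brigade_create(f->r->pool,
                                                 f->c->bucket_alloc);
    /* ... move/transform buckets from the input brigade into 'out' ... */
    return ap_pass_brigade(f->next, out);   /* ownership transferred;
                                             * no caching, no reuse */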

>   Then, you also introduce the problem with brigades 
> becoming numerous (parallel to the number of buckets).  Hence, we 
> must allocate brigades from the bucket allocator rather than a pool.
> 
> But, isn't this creating more work for httpd if it were using a 
> synchronous MPM rather than an async MPM?  Do we really think that 
> your ownership model works well for *all* cases?  I'm beginning to 
> think it doesn't.  Is it really cheaper to call the bucket allocator 
> every time we pass something down the filter chain than reusing the 
> same brigade for the entire duration of the request in a filter?  I 
> would be suspicious of that claim under a sync MPM.

Reusing the brigade is easy, but I don't think it's likely to be
an option for a filter that needs to work with both sync and async
MPMs.

> Is there a compromise?  I think so.  A hybrid completion MPM isn't a 
> true async MPM.  However, the memory ownership model in this hybrid 
> could still be identical to the sync MPM.  Once we see EOS, we can 
> clear the request pool and transfer the request to a 'completion' 
> thread/process that multiplexes all finished-not-sent-responses. 
> This is why it is crucial to have a response pool in addition to the 
> request pool - all information pertaining to the response lives in the 
> response pool, not the request pool.  (All intermediate data that 
> helps to produce the response is in the request pool, of course.)

I've been a proponent of the hybrid completion model in the past
(see some of my notes in the ROADMAP file), but I think of it as
a stepping-stone to a full async MPM, rather than as something
actually usable.

The problem with the hybrid completion model is that it's too
easy to conduct a DoS against it.  If we release an MPM that
reads requests and generates responses in a thread-per-request
pool and then hands off the brigades to an event-loop thread to
do the write, all an attacker needs to do is open enough connections
to fill up the request thread pool (which typically will be a lot
smaller than the thread pool in, say, worker, because the whole
point of the hybrid design is to let the server administrator
run with fewer threads) and never send a request.

> This allows the current sync thread to go back and process a request 
> as soon as it can.  When this completion thread is done, it can 
> simply clear the pool as it is guaranteed ownership.  Once the EOS is 
> seen, we *know* that all filters are done with the response.  So, if 
> we implement this hybrid MPM, what benefits are we gaining from 
> further switching to a fully async MPM?  -- justin

This is also susceptible to a DoS on the write side.  If a malicious
client requests a URI that's handled by a streaming generator (like
a CGI) and then never reads the response, the writes will block as
soon as the outgoing TCP buffer fills up.  If the writes are being
handled by a multiplexed I/O completion thread at that point, no
problem.  But in the case of a streamed response, if the request
handler thread is allowed to do the socket writes until it produces
an EOS, then you'll block the request handler thread.

In order to have a small number of request handler threads without
making the server too vulnerable to such attacks, we'll need to
ensure that the client can't have a huge influence on how much
time a request handler thread (our scarce resource in an async
MPM) spends processing a request.  This basically means doing all
the network reads and writes in a non-scarce resource, such as a
dedicated I/O thread that can multiplex hundreds of sockets.
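
Something along these lines for the writer side (a very rough sketch;
MAX_CONNECTIONS and write_pending() are made up for illustration):

    apr_pollset_t *pollset;
    apr_pollset_create(&pollset, MAX_CONNECTIONS, p, 0);

    for (;;) {
        apr_int32_t i, n;
        const apr_pollfd_t *ready;
        apr_pollset_poll(pollset, -1, &n, &ready);
        for (i = 0; i < n; i++) {
            /* write whatever this connection will accept without
             * blocking, then re-arm or retire its descriptor */
            write_pending(ready[i].client_data);   /* hypothetical */
        }
    }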

We'll still face some major design challenges due to the "client
sends a request for a streamed resource and then never reads" case.
We don't want to let the request processing thread generate an
unbounded amount of response data that we have to buffer until
the connection becomes writable or is aborted.  That leaves two
choices that I can think of: either make the request processing
thread block if there's too much unsent data on the connection,
or design such modules as state machines that can be set aside
in the output overflow condition and then resumed when the connection
is writable.  Either way, it's a nasty problem. For the post-2.0
releases, we may also want to explore ways to partition the
request handling thread pool to keep simple static requests
separate from dynamic requests, so that a server can still
serve static requests even if all the dynamic request threads
are busy.
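
The first option, at least, is easy to sketch (the per-connection byte
count and condition variable here are hypothetical fields, not
anything we have today):

    /* Block the generator when too much unsent data is queued for
     * this connection; the writer thread signals as it drains. */
    apr_thread_mutex_lock(c->write_mutex);
    while (c->unsent_bytes > MAX_PENDING_BYTES) {    /* hypothetical */
        apr_thread_cond_wait(c->drained, c->write_mutex);
    }
    apr_thread_mutex_unlock(c->write_mutex);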

Brian


