httpd-dev mailing list archives

From Justin Erenkrantz <jerenkra...@apache.org>
Subject Re: httpd memory ownership and response pools was Re: [PATCH] Update to Brian's patch...
Date Sat, 21 Dec 2002 23:50:19 GMT
--On Saturday, December 21, 2002 10:00 AM -0800 Brian Pane 
<brian.pane@cnet.com> wrote:

> On Sat, 2002-12-21 at 02:05, Justin Erenkrantz wrote:
>
>> Saying a response pool 'doubles' the memory is an absolute
>> non-starter.  Back that up, please.  If you're meaning the
>> additional  8kb minimum for the separate pool, I seriously can't
>> believe that's  going to be a major concern.
>
> I do consider it a cause for concern as we try to scale
> the httpd to handle thousands of connections.

And, an additional 8k is going to grossly harm that?  Sorry, but I 
don't believe that.  If that is your main reason against it, then go 
change the pool code to have a minimum threshold of 4k.  Again, I'd 
like to see proof that this will irreparably harm httpd's 
architecture and scalability.  (Oh, 4k might kill any alignment 
benefits, but hey, you seem to care more about keeping our memory 
footprint down than about keeping our speed up.)

> I have mixed opinions on response pools.  They have all the usual
> benefits of pools, but with a couple of big complications.  They're
> great if they enable you to free up the request pool when you're
> finished generating the response.  But there's data in the request
> pool that really needs to survive until after the response has been
> sent--specifically, anything needed in support of the logger.  That

Doesn't that mean that that information should be allocated from the 
response pool instead?

> leaves us with the choice of either keeping the request pool around
> (which somewhat defeats the point of using a response pool) or
> copying the needed request and connection info to the response pool
> (which works but is slightly wasteful).  The other problem with

I'm not saying this is an easy change - it's a radical departure from 
our current architecture (only suited for 2.1 and beyond).  I don't 
think you'd need to copy anything from the request pool to the 
response pool. 
We'd change the lifetime of whatever needs to live as long as the 
response instead of the request.  The request lifetime ends as soon 
as the EOS is generated.  The response lifetime ends as soon as the 
EOS is consumed.  Only when we're positive that all code is 
allocating from the correct pools can we switch to destroying the 
request pool when the EOS is seen rather than consumed.  And, since 
we're relying on pools, we get error cleanups for free.
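To make that lifetime rule concrete, here is a toy sketch in C.  It is 
*not* APR's API (all names here are hypothetical stand-ins): the point 
is just that the request pool dies when EOS is generated, the response 
pool dies only when EOS is consumed, and anything the sender or logger 
still needs must therefore come from the response pool.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an APR-style pool: tracks whether it is alive and
 * how many bytes it has handed out.  Hypothetical names, not APR's API. */
typedef struct toy_pool {
    int alive;
    size_t bytes;
} toy_pool;

static toy_pool *toy_pool_create(void) {
    toy_pool *p = malloc(sizeof(*p));
    p->alive = 1;
    p->bytes = 0;
    return p;
}

static void *toy_palloc(toy_pool *p, size_t n) {
    if (!p->alive) return NULL;       /* allocating from a dead pool fails */
    p->bytes += n;
    return malloc(n);                 /* the toy doesn't track allocations;
                                         real APR would free them on destroy */
}

static void toy_pool_destroy(toy_pool *p) {
    p->alive = 0;                     /* real APR would run cleanups here */
}

/* The lifetime rule from the post: the request pool dies when EOS is
 * generated; the response pool dies only when EOS is consumed. */
static int simulate_request(void) {
    toy_pool *req  = toy_pool_create();
    toy_pool *resp = toy_pool_create();

    toy_palloc(req, 512);              /* intermediate parsing state */
    char *hdr = toy_palloc(resp, 64);  /* data the sender/logger still needs */
    strcpy(hdr, "HTTP/1.1 200 OK");

    toy_pool_destroy(req);             /* EOS generated: request pool goes */
    int ok = resp->alive && !req->alive
             && strcmp(hdr, "HTTP/1.1 200 OK") == 0;

    toy_pool_destroy(resp);            /* EOS consumed: response pool goes */
    free(hdr);
    free(req);
    free(resp);
    return ok;
}
```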

> response pools is that they're pools.  If you have an app that's
> generating a large response in small units, you'll have a lot of
> small brigades being allocated from the response pool, and you
> won't be able to free up any of that space until the entire
> response is finished.  This is bearable for most HTTP connections,
> but it won't work for connections that live forever.

Connections that live forever?  Don't you mean a response that never 
ends?  (If you had a connection that lived forever with multiple 
responses, the memory would be cleaned up as the response is sent.) 
So, how is your model going to make things appreciably better?  The 
only thing you'd gain is that you might be able to reuse a brigade 
sooner.

As seen in our code, we're *trying* to reuse the brigade memory right 
now.  So, where is the big benefit of a bucket allocator for brigades 
right now?  I don't think we're creating an infinite number of 
brigades.  Most of our filters seem to be trying to reuse the 
brigades they create.  A filter isn't creating a brigade on every 
call - it is usually doing its best to create only one brigade 
(perhaps two or three - still a constant number).  While the number 
of buckets 
may be unbounded (since we don't know how large the response is), the 
number of brigades should almost certainly be (near) constant 
(probably as a factor of the number of filters involved).

This is why everything falls down with your patch.  The 
core_output_filter shouldn't be destroying a brigade it doesn't own. 
It should only be clearing it.  It didn't create the brigade, so it 
shouldn't destroy it.  (The only time it should destroy a brigade is 
when it has set one aside itself - seems easy enough to fix.)  We're 
already 
reusing the bucket memory, so no big win there.
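The clear-versus-destroy distinction is easy to model.  Below is a 
hypothetical sketch (toy types, not the real apr_bucket_brigade): a 
filter owns one brigade and reuses it across calls, which only works 
as long as the downstream consumer clears the brigade rather than 
destroying it.

```c
/* Toy brigade: just an alive flag and a bucket count.  Illustrative
 * only; the real apr_bucket_brigade is a ring allocated from a pool. */
typedef struct toy_brigade {
    int alive;
    int nbuckets;
} toy_brigade;

static void toy_brigade_init(toy_brigade *b)    { b->alive = 1; b->nbuckets = 0; }
static void toy_brigade_add(toy_brigade *b, int n) { if (b->alive) b->nbuckets += n; }
static void toy_brigade_clear(toy_brigade *b)   { b->nbuckets = 0; } /* reusable */
static void toy_brigade_destroy(toy_brigade *b) { b->alive = 0; }    /* gone */

/* A filter that owns ONE brigade and reuses it on every call: the
 * consumer (core_output_filter in the post) must only clear it.  If
 * anything downstream destroys the brigade instead, reuse breaks. */
static int run_filter_passes(int passes) {
    toy_brigade b;
    toy_brigade_init(&b);
    for (int i = 0; i < passes; i++) {
        toy_brigade_add(&b, 3);       /* produce a few buckets */
        if (!b.alive) return 0;       /* a destroy downstream would land here */
        toy_brigade_clear(&b);        /* downstream clears, does not destroy */
    }
    return b.alive;                   /* still usable after many passes */
}
```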

I have a feeling you want to switch ownership of all memory into an 
implicit ownership by core_output_filter.  (I think you were 
expecting this to be occurring right now, but it isn't.)  Only 
core_output_filter would be responsible for freeing all memory (note 
that it isn't responsible for allocating memory).  However, this 
would require a rewrite of all the code to be built around the 
assumption that if it passes a brigade down, it loses ownership of 
that brigade.  Then, you also introduce the problem of brigades 
becoming numerous (growing in parallel with the number of buckets). 
Hence, we 
must allocate brigades from the bucket allocator rather than a pool.

But, isn't this creating more work for httpd if it were using a 
synchronous MPM rather than an async MPM?  Do we really think that 
your ownership model works well for *all* cases?  I'm beginning to 
think it doesn't.  Is it really cheaper to call the bucket allocator 
every time we pass something down the filter chain than reusing the 
same brigade for the entire duration of the request in a filter?  I 
would be suspicious of that claim under a sync MPM.

Is there a compromise?  I think so.  A hybrid completion MPM isn't a 
true async MPM.  However, the memory ownership model in this hybrid 
could still be identical to the sync MPM.  Once we see EOS, we can 
clear the request pool and transfer the request to a 'completion' 
thread/process that multiplexes all finished-not-sent-responses. 
This is why it is crucial to have a response pool in addition to the 
request pool - all information pertaining to the response lives in 
the response pool, not the request pool.  (All intermediate data 
that 
helps to produce the response is in the request pool, of course.)
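A toy sketch of that handoff (hypothetical names, nothing like 
httpd's actual MPM code): once EOS is seen, the worker clears the 
request-side state and enqueues the response pool for a completion 
thread, which clears the response pool only after the remaining bytes 
have been sent - safe, because the completion side is guaranteed sole 
ownership at that point.

```c
/* Toy model of the hybrid-completion handoff described above.
 * All names are hypothetical; this is not httpd's API. */
typedef struct toy_pool { int alive; } toy_pool;

#define QUEUE_MAX 8
static toy_pool *completion_queue[QUEUE_MAX];
static int queue_len = 0;

/* Worker side: EOS seen -> request pool is cleared immediately and the
 * finished-but-not-sent response is handed to the completion side. */
static void worker_finish(toy_pool *req, toy_pool *resp) {
    req->alive = 0;                        /* request pool cleared at EOS */
    completion_queue[queue_len++] = resp;  /* transfer ownership of response */
}

/* Completion side: multiplexes finished-not-sent responses; once the
 * bytes are on the wire it clears each response pool it now owns. */
static int completion_drain(void) {
    int sent = 0;
    while (queue_len > 0) {
        toy_pool *resp = completion_queue[--queue_len];
        /* ... write remaining response bytes to the client here ... */
        resp->alive = 0;                   /* done sending: clear, we own it */
        sent++;
    }
    return sent;
}
```

The worker never touches the response pool again after `worker_finish`, 
which is what lets it go straight back to processing the next request.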

This allows the current sync thread to go back and process a request 
as soon as it can.  When this completion thread is done, it can 
simply clear the pool as it is guaranteed ownership.  Once the EOS is 
seen, we *know* that all filters are done with the response.  So, if 
we implement this hybrid MPM, what benefits are we gaining from 
further switching to a fully async MPM?  -- justin
