Mailing-List: contact dev-help@apr.apache.org; run by ezmlm
Precedence: bulk
Date: Sun, 23 Sep 2001 20:09:36 -0400 (EDT)
From: Cliff Woolley <cliffwoolley@yahoo.com>
To: Ryan Bloom <rbb@covalent.net>
cc: <dev@apr.apache.org>
Subject: Re: bucket free lists (was Re: [PATCH] buckets sms phase 2)
In-Reply-To: <20010923215739.B998246DFC@koj.rkbloom.net>
Message-ID: 
 <Pine.LNX.4.33.0109231920120.672-100000@deepthought.cs.virginia.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

On Sun, 23 Sep 2001, Ryan Bloom wrote:

> Mostly good.  I did notice a few ommissions that will be necessary.
>
> apr_status_t apr_bucket_list_init(int num_threads)
>
> This function creates the static array.  We need to tell them how many
> threads will be creating pools.

We'll also need the apr_bucket_alloc(int list) and apr_bucket_free
functions, whatever they're called.  I left all these out for now because
I just wanted to be sure that the other part was what you had had in mind.

> I assume we'll be removing the apr_bucket_*create_in functions once
> the code is finished.

We can.  But if we're going to do that then let's forget all this #define
business and just go ahead and change the API.  That saves us changing the
names to _create_in and then changing them back.

> I am trying to decide if we should pass in a pool or an integer. I
> think we need to at least associate a pool with each array slot.
> That way, the buckets will always be allocated out of a pool, so that
> they are guaranteed to be cleaned up properly.  The reason to not pass
> a pool each time, is that we want to thread pool to be the pool
> associated with array slot, not the request pool.
>
> So, I think we will also need:
>
> apr_status_t apr_bucket_list_pool(int num, apr_pool_t *p)
>
> Feel free to change that name.

I've been debating this whole allocate-out-of-the-thread-pool thing.  If
some request manages to generate 8192 1-byte buckets before freeing any of
them, then we have forced the amount of allocated memory for that thread
to be roughly 8192*80 ~= 640KB and to stay that way for as long as the
thread lives.  I don't think that's what we want.  Perhaps if we created a
subpool for each slot and allocated the buckets out of those?  We could
then clean up the pool at certain intervals [once a connection, once every
ten connections, it doesn't matter] and start over again... of course that
would require some other API function to do it
(apr_bucket_list_cleanup(int num) or something like that).  What do you
think?

Alternatively, of course (and I'm kind of leaning this way), we could just
keep using malloc and basically just use the logic from the (former)
blocks SMS, which after all was written expressly with the buckets in mind
anyway.  It behaves a lot like pools except it gives you the free-list
capabilities and already has the cleanup feature built in.  I know you
really want everything to use the same allocator, of course...

> I am not sure about using the value from apr_thread_current to do this.
> That function just returns a tid from the OS, so it unlikely to be really
> useful.  The core should just use the thread id, which can be retrieved
> cleanly from the conn_rec->id field, using AP_CHILD_THREAD_FROM_ID.

Okay, you've convinced me that we don't need this _list_find() thing.
Therefore we also don't need the _create_in() stuff.  We can just expect
the caller to pass us a number unique to that thread.  We can always give
them the hint that apr_os_thread_current() is one way to create a number
unique to each thread in the documentation, of course, though we should
also mention that they probably just want to find that unique number once
and cache its value somewhere.  Single-threaded apps can of course just
use 0 or some other constant; it's better to let the caller decide that
than for us to do it and base the decision on whether APR_HAS_THREADS or
not.  For instance, in prefork we could always use a constant number, but
apr-util has no way to know that prefork won't ever have buckets in more
than one thread since APR is still threaded.

> That will be faster, because we don't need to call into the thread libs, and
> it is more likely to work cleanly.

Yep.  My intention (even when we were talking about using sms's) was
always that this 'key' in whatever form it takes would get stashed in the
conn_rec.  It's convenient that the id is already in there... though you
can't quite use AP_CHILD_THREAD_FROM_ID, because that macro returns two
numbers separated by a comma.  Just the second half of it is
what we want (that's c->id%HARD_THREAD_LIMIT for mpmt MPM's, c->id for
spmt MPM's, and 0 for prefork), but we'd have to make a new macro to get
just the second half.

My only question is this: if we use a static array, then we expect the
caller to pass us an index basically; a number between 0 and num_threads.
Fine.  But what if num_threads isn't constant?  What if in some
application it's hard to determine the index from what information you do
have (thread ID) for some reason?  Something in the back of my mind tells
me we might have to use a hash table instead of a static array so that we
can account for cases like this, even though a hash table would be less
efficient.  The extra benefit would be that the key the caller gives us
can really be any value at all, as long as that value is unique to that
thread.  They don't have to worry about shoehorning the value into the
range 0..num_threads.  I fear that if we use a static array we're tying
this too closely to httpd, and more than that to current MPM's for httpd.
Thoughts?

--Cliff

PS: Why is conn_rec->id a long and not an APR type?


--------------------------------------------------------------
   Cliff Woolley
   cliffwoolley@yahoo.com
   Charlottesville, VA