apr-dev mailing list archives

From "Jay Freeman (saurik)" <sau...@saurik.com>
Subject Re: any documentation on the point of having pools?
Date Fri, 29 Mar 2002 02:55:14 GMT
Karl:

Well, thank you very much for taking the time to write these comments...
this is definitely a C-centric way of thinking :).  However, this is
definitely not encouraging... I'm probably going to end up looking more into
the Netscape Portable Runtime as a replacement for APR.  As a C++ developer,
I manage object lifetime using scoping and try to avoid heap allocations
wherever possible.  When I allocate objects, they usually get allocated
onto the stack.  If I really need to heap allocate an object for some
reason, I'll have it wrapped into a small handle of some sort that can
manage its lifetime for me.  This ends up being one of the driving tenets
of C++ design:  wrap resources into stack allocated objects in order to take
advantage of destructors.

The reasons for this are various, but there are two main ones.  First of
all, it keeps you from running into the problem of needing an arbitrary
number of free()s at the ends of functions to get rid of memory that was
allocated for the scope of that operation; hence leading to simpler code
(one of the reasons you mention for pools).  Secondly, it makes it easy to
support exceptions within the language.  If I were to be heap allocating
memory, and an exception occurred, I would need to make sure I go to whatever
lengths necessary to catch the various exceptions and destroy that
allocation.  Using C++ destructors, and the guarantee that they will be
called as the stack unwinds, I don't need to go through any extra work to
know that my function won't leak memory in the case of an exception.
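
A minimal sketch of the unwinding guarantee described above, assuming a
C++11 compiler.  The "resource" class and the "events" log are hypothetical
names used only for illustration; they stand in for any handle that must be
released exactly once.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Log of what happened, in order, so the sequence can be inspected.
static std::vector<std::string> events;

// Hypothetical stand-in for any resource that must be released exactly once.
class resource {
    std::string name_;
  public:
    explicit resource(const std::string &name) : name_(name) {
        events.push_back("acquire " + name_);
    }
    ~resource() { events.push_back("release " + name_); }
    // Copying is disabled so the resource is never released twice.
    resource(const resource &) = delete;
    resource &operator=(const resource &) = delete;
};

// Acquires a resource on the stack, then optionally throws.
void might_throw(bool fail) {
    resource r("handle");
    if (fail) throw std::runtime_error("some code failed");
    events.push_back("finished normally");
}

// Returns true if the destructor ran during unwinding, before the handler.
bool released_after_throw() {
    events.clear();
    try {
        might_throw(true);
    } catch (const std::runtime_error &) {
        events.push_back("caught");
    }
    return events == std::vector<std::string>{
        "acquire handle", "release handle", "caught"};
}
```

No try/catch is needed inside might_throw() itself; the release happens on
both the normal and the exceptional path.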

Garbage collection (which you mention as a good thing for programmers) tends
to be frowned upon in this programming model, as it leads to
"non-deterministic finalization".  As much as I think garbage collectors
have promise to solve the problem of _memory_ management, they don't help at
all when you try to manage non-memory resources.  Really, they end up
hindering the process, as you have to explicitly call release() and free()
functions on the resources to get rid of them.  Look at the APR mutex.  I
still need to release the mutex when I'm done using it... if I were actually
putting control of that mutex into a garbage collected object I'd have to
either explicitly release() it or wait until the object got collected
(which might not be for a _long_ time after I stop using the mutex object).
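
For contrast, here is a rough sketch of deterministic release using scope,
assuming C++11.  "toy_lock" and "lock_holder" are hypothetical names; the
toy lock is just a flag rather than a real OS or APR mutex, so that the
moment of release can be observed directly.

```cpp
#include <cassert>

// Hypothetical toy lock: a flag instead of a real OS mutex.
struct toy_lock {
    bool held = false;
    void acquire() { held = true; }
    void release() { held = false; }
};

// Holds the lock for exactly as long as this object is in scope.
class lock_holder {
    toy_lock &lock_;
  public:
    explicit lock_holder(toy_lock &lock) : lock_(lock) { lock_.acquire(); }
    ~lock_holder() { lock_.release(); }
    lock_holder(const lock_holder &) = delete;
    lock_holder &operator=(const lock_holder &) = delete;
};

// Reports whether the lock is held while the holder is alive.
bool held_inside_scope(toy_lock &lock) {
    lock_holder guard(lock);
    return lock.held;  // evaluated before guard's destructor runs
}
```

The lock is released at the closing brace of held_inside_scope(), not at
some later point chosen by a collector.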

That's something that seems to be common to a lot of these memory management
schemes... they try to make managing memory easier, but they end up A)
making you manage the meta-level of how to allocate things and B) making it
virtually impossible to abstract away non-memory resource deallocation from
the user of the resource.

To demonstrate how C++ tries to handle these problems, let's work our way
towards using a mutex in C++ (to pick an APR friendly example that doesn't
use many lines of code... if the analogy is stretched too far pretend it is a
file handle or a database connection or something).  This is an example of a
non-memory related resource that needs to be tracked and handled
appropriately.

To start with, here is how (on Win32), this is accomplished at the C level
(please note that all formatting is going to be with the "short e-mail"
convention that I'm making up for the scope of this e-mail, hehe):

void myfunction() {
    HANDLE lock = CreateMutex(NULL, FALSE, NULL);
    /* some code goes here that uses the mutex */
    CloseHandle(lock);
}

All right.  This works, but (in addition to having the "what if you forget"
issue) it doesn't hold up well when you are working in C++ and in many
situations is simply incorrect code.  The problem is that "some code" might
call some function that throws an exception, and we wouldn't get that mutex
handle closed (and thereby waste OS resources until such time as our process
exited).  To solve this, the mutex implementation might be rewritten to be
wrapped into a class:

class mutex {
  protected:
    HANDLE mutex_;
  public:
    mutex() {   mutex_ = CreateMutex(NULL, FALSE, NULL);   }
    ~mutex() {   CloseHandle(mutex_);   }
    void lock() {   WaitForSingleObject(mutex_, INFINITE);   }
    void unlock() {   ReleaseMutex(mutex_);   }
};

void myfunction() {
    mutex lock;
    /* some code goes here that uses the mutex */
}

Given a reasonably good compiler (in cases where exceptions were _known_ to
not be an issue for whatever reason) this would even generate the exact same
code as the previous version, so it isn't as if we added any un-needed
abstraction penalty.  Now, if an exception occurs in "some code", the mutex
handle will still get closed, and there won't be a resource leak.  Also, there
is no longer much of a risk of accidentally freeing the mutex twice, or of
forgetting to free it at all.  Obviously I'd want some error checking in
there, but I'm trying to keep this simple :).

So, over the last week or so, I've been occasionally applying this C++
mentality to writing a small little library to provide higher-level features
(such as networking) to my C++ programs using APR as the underlying OS
abstraction.  This involves wrapping core APR concepts into classes,
providing appropriate destructors, and mapping some things up to more
standard C++isms (my initial usage was to write a C++ iostream/streambuf for
tcp client connections).

If I start writing an APR implementation of mutexes, then I end up with code
like this:

class mutex {
  protected:
    apr_thread_mutex_t *mutex_;
  public:
    mutex() {
        apr_thread_mutex_create(&mutex_, APR_THREAD_MUTEX_DEFAULT, /*POOL*/);
    }
    ~mutex() {   apr_thread_mutex_destroy(mutex_);   }
    void lock() {   apr_thread_mutex_lock(mutex_);   }
    void unlock() {   apr_thread_mutex_unlock(mutex_);   }
};

Note that there is this extra little thing there having to do with a "pool".
Well, to deal with this, I wrote a wrapper for APR's pools called "pool".
The constructor creates a pool, the destructor destroys it, and it has a
clear() method.  It also has an autocast operator so you can use it as if it
were an apr_pool_t *.  OK, so let's use it:

    mutex(pool &context) {
        apr_thread_mutex_create(&mutex_, APR_THREAD_MUTEX_DEFAULT, context);
    }
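
For reference, the "pool" wrapper described above might look roughly like
the sketch below.  This is an assumption about its shape, not the actual
class.  To keep it compilable without APR installed, a toy arena
("toy_pool_t" and the "toy_*" functions, all hypothetical) stands in for
apr_pool_t; against real APR the corresponding calls would be
apr_pool_create(), apr_pool_clear() and apr_pool_destroy().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical toy arena standing in for apr_pool_t.
struct toy_pool_t {
    std::vector<void *> blocks;  // every allocation made from this pool
};

static int live_pools = 0;  // bookkeeping for the demonstration only

toy_pool_t *toy_pool_create() { ++live_pools; return new toy_pool_t; }

void *toy_palloc(toy_pool_t *p, std::size_t n) {
    void *block = std::malloc(n);
    p->blocks.push_back(block);
    return block;
}

// Frees everything allocated from the pool but keeps the pool usable.
void toy_pool_clear(toy_pool_t *p) {
    for (void *block : p->blocks) std::free(block);
    p->blocks.clear();
}

void toy_pool_destroy(toy_pool_t *p) {
    toy_pool_clear(p);
    delete p;
    --live_pools;
}

// The wrapper: constructor creates, destructor destroys, clear() resets,
// and the conversion operator lets it be passed where a raw pool is wanted.
class pool {
    toy_pool_t *pool_;
  public:
    pool() : pool_(toy_pool_create()) {}
    ~pool() { toy_pool_destroy(pool_); }
    pool(const pool &) = delete;
    pool &operator=(const pool &) = delete;
    void clear() { toy_pool_clear(pool_); }
    operator toy_pool_t *() const { return pool_; }
};

std::size_t blocks_in(toy_pool_t *p) { return p->blocks.size(); }
```

Copying is disabled in this sketch because two wrappers destroying the same
underlying pool would be a double free.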

Now I can write my function as follows:

void myfunction() {
    pool context;
    mutex lock(context);
    /* some code that uses the mutex */
}

If an exception gets thrown in this case, both the pool and the mutex will be
destroyed.  That isn't that bad for the mutex example
(as if someone else were to have freed that pool it likely isn't going to
cause an issue, and the pool is only used once in that single constructor),
but it requires every single thing that wants to use a mutex to have a pool
around to allocate it into.  It actually starts to cause problems when you
are trying to provide more complex features.  Let's look at the networking
example... here is my netaddr class:

class netaddr {
  protected:
    pool pool_;
    apr_sockaddr_t *addr_;

  public:
    netaddr(const std::string &addr) {
        char *host;   apr_port_t port;
        apr_parse_addr_port(&host, NULL, &port, addr.c_str(), pool_);
        apr_sockaddr_info_get(&addr_, host, APR_INET, port, 0, pool_);
    }

    explicit netaddr(socket &sock) {
        apr_socket_addr_get(&addr_, APR_LOCAL, sock);
    }
    netaddr(const std::string &host, apr_port_t port) {
        apr_sockaddr_info_get(&addr_, host.c_str(), APR_INET, port, 0, pool_);
    }

    void set_port(apr_port_t port) {
        apr_sockaddr_port_set(addr_, port);
    }
    operator apr_sockaddr_t *() const {   return addr_;   }
};

What's nice about this class is, if you have a function that takes a "const
netaddr &", and you pass it a string (such as "saurik.com:80"), the compiler
will take care of calling the constructor and making it work for you.  When
the class goes away, so does the pool that allocated its single snippet of
memory.  Everything is self contained.  What's annoying about this class is
that it has its own private pool for no reason other than to make APR happy
:-P.  More annoyingly, this strategy doesn't even continue to work for other
types of objects.  My TCP Server class's accept() method is (according to a
recent thread on this mailing list) leaking memory, as the pool that you use
to accept the connection with gets stuff allocated into it, and therefore
needs to be cleared more often than the pool that you have the server socket
working with.  That pool should apparently be something bound to the new
connection, not something bound to the thing listening on the old
connection...

This means that I should really be allowing these different functions that
need pools to actually take pools as arguments (thereby exposing the memory
management to the user of the object, as well as removing the ability to
have implicit constructors and overloaded operators).  In the case of my
network address class, if I allowed it to take a pool and allocate out of
that pool (as in the mutex example), then the pool might get destroyed
before this object does, and then this entire object would be invalidated.
That's a bad thing.  There's no reason I should have to run into that
situation.  I've actually got to the point this morning (soon before I sent
my original post) where I was seriously considering adding a reference
counting abstraction over the pool memory system, and then having any object
that has any memory allocated into the pool holding a reference to said pool
to make sure the memory didn't get taken out from underneath it.  It was
soon after I realized there was no way in hell I could still feel good about
proposing this solution to the people I work with as a replacement for our
existing, not really OS-independent networking and threading library that I
started researching the Netscape Portable Runtime :(.
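
The reference-counting idea mentioned above could be sketched roughly as
follows, assuming C++11 and std::shared_ptr (which was not available in
standard C++ at the time of writing).  The "pool" here is a hypothetical
empty stand-in for a wrapped apr_pool_t, and this "netaddr" stores a plain
string rather than an apr_sockaddr_t; only the ownership relationship is
being illustrated.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

static int live_pools = 0;  // bookkeeping for the demonstration only

// Hypothetical stand-in for a wrapper around apr_pool_t.
class pool {
  public:
    pool() { ++live_pools; }
    ~pool() { --live_pools; }
    pool(const pool &) = delete;
    pool &operator=(const pool &) = delete;
};

// An object whose memory lives in a pool holds a counted reference to that
// pool, so the pool cannot be destroyed while the object is still alive.
class netaddr {
    std::shared_ptr<pool> pool_;
    std::string host_;
  public:
    netaddr(std::shared_ptr<pool> context, const std::string &host)
        : pool_(std::move(context)), host_(host) {}
    const std::string &host() const { return host_; }
};
```

The pool is destroyed only when the last object referring to it goes away,
at the cost of a reference count on every pool and a pool handle stored in
every object.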

Sincerely,
Jay Freeman (saurik)
saurik@saurik.com

----- Original Message -----
From: "Karl Fogel" <kfogel@newton.ch.collab.net>
To: "Jay Freeman (saurik)" <saurik@saurik.com>
Cc: "apr-dev" <dev@apr.apache.org>
Sent: Thursday, March 28, 2002 7:34 PM
Subject: Re: any documentation on the point of having pools?


> Jay,
>
> I can partially answer your question.
>
> Let's say there are three kinds of memory allocation in the world:
>
>    1. raw -- you know, like C malloc() and free()
>    2. pools
>    3. fully garbage-collected
>
> For the programmer, full GC is ideal.  Unfortunately, it takes time
...
> Anyway, APR is written in C, and that's actually an important part of
> its design as a portability layer.  So full GC would be technically,
...
> So let's look at the remaining two options: raw vs pools.
>
> Some programmers find pools easier to work with, some prefer raw
> allocation.  We'll probably never get agreement on that.
...
> Aside from the efficiency aspect (which I suspect is not so great as
> to be a major motivation, perhaps Sander or someone can comment?),
> people who like pools like them because they give a convenient idiom
> for expressing the lifetimes of objects.  If you have a run of code
> that's going to cons up [er, excuse me, allocate] some objects, all of
> which need to remain valid for the duration of a certain set of
> operations, it's handy to put them all in the same pool, and just
> destroy the pool at the end.  When the same code is written using raw
> allocation, it usually flaunts a dozen calls to free() at the end, and
> when you add a new object to that run of code, it's easy to forget to
> add yet another call to free().  Note that in the pool style, it's
> usually easy to see which pool you're supposed to allocate the thing
> in, or at least the presence of multiple pools there will force you to
> ask yourself about the object's lifetime, which malloc won't.
>
> Wow, I can't believe I stopped coding to write this :-).  I hope it's
> at least technically accurate (fixes welcome!), if not persuasive.
> For the record, I like pools when I don't hate them.
>
> -Karl

