apr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Emery Berger" <em...@cs.utexas.edu>
Subject RE: any documentation on the point of having pools?
Date Fri, 29 Mar 2002 02:07:44 GMT
Hi all,

I (together with my co-authors, Ben Zorn & Kathryn McKinley) have just
submitted a paper to OOPSLA which discusses lots of custom allocation
strategies, including pools (more commonly known as regions). It
explains the benefits of pools, along with some of their drawbacks & a
proposed solution. Here's the abstract:

Programmers hoping to achieve performance improvements often use
custom memory allocators. This in-depth study examines eight
applications that use custom allocators. Surprisingly, for six of
these applications, a state-of-the-art general-purpose allocator
performs as well as or better than the custom allocators. The two
exceptions use regions, which deliver higher performance (improvements
of up to 44\%). Regions also reduce programmer burden and eliminate a
source of memory leaks. However, we show that the inability of
programmers to free individual objects within regions can lead to a
substantial increase in memory consumption. Worse, this limitation
precludes the use of regions in common programming idioms, reducing
their usefulness.

We present a generalization of general-purpose and region-based
allocators that we call {\em reaps}. Reaps are a combination of
regions and heaps, providing a full range of region semantics with the
addition of individual object deletion. We show that our
implementation of reaps provides high performance, outperforming other
allocators with region-like semantics. Our results indicate that most
programmers needing faster memory allocators should use a better
general-purpose allocator rather than writing a custom allocator, and
that programmers needing regions should instead use reaps.

A pre-print of the paper is available at:

http://www.cs.utexas.edu/users/emery/download.cgi?location=custom.pdf

--
Emery Berger
Dept. of Computer Science
The University of Texas at Austin
www.cs.utexas.edu/users/emery


> -----Original Message-----
> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> Sent: Thursday, March 28, 2002 7:34 PM
> To: Jay Freeman (saurik)
> Cc: apr-dev
> Subject: Re: any documentation on the point of having pools?
> 
> Jay,
> 
> I can partially answer your question.
> 
> Let's say there are three kinds of memory allocation in the world:
> 
>    1. raw -- you know, like C malloc() and free()
>    2. pools
>    3. fully garbage-collected
> 
> For the programmer, full GC is ideal.  Unfortunately, it takes time
> for the GC code to figure out what's garbage and what's not, and to
> free it.  Or if that phase is to be instantaneous, then there must be
> lots of little bits of overhead scattered all around, since all
> allocations will be required to do some GC bookkeeping.  Usually GC is
> implemented with a mixture of these two strategies, but they total up
> to the same penalty, speaking *very* broadly of course.
> 
> I don't want to get into one of programming's longest-running debates,
> but let's just say that despite occasional claims that full GC can, in
> principle, be just as efficient as "raw" allocation methods, in
> practice it never has been, and looks unlikely to be so in the near
> future.  There are also some issues when it comes to interacting with
> non-GC'd languages, as you might expect.  Not knocking GC -- my
> far-and-away favorite language is Lisp -- but GC comes with a penalty.
> 
> Anyway, APR is written in C, and that's actually an important part of
> its design as a portability layer.  So full GC would be technically,
> uh, difficult under the circumstances, even without considering the
> performance hit. :-)
> 
> So let's look at the remaining two options: raw vs pools.
> 
> Some programmers find pools easier to work with, some prefer raw
> allocation.  We'll probably never get agreement on that.
> 
> However, there is one nice thing about pools: they can fulfill the
> promise that GC never did -- the promise of being more efficient than
> malloc() and free().  The reason is that in raw-style allocation,
> every malloc() call must have a matching free() call.  But a pool can
> clean up multiple mallocs() with one free().  I'm not talking about
> literal "malloc" calls, of course, but just the act of allocating
> something in that pool; and by "free" I mean apr_pool_clear or
> apr_pool_destroy, but you get the idea.  Pool bookkeeping is done in
> such a way that you can mark the whole pool as reclaimable in one
> essentially constant-time operation, independent of how many objects
> (of whatever lengths) you may have allocated in that pool.
> 
> Aside from the efficiency aspect (which I suspect is not so great as
> to be a major motivation, perhaps Sander or someone can comment?),
> people who like pools like them because they give a convenient idiom
> for expressing the lifetimes of objects.  If you have a run of code
> that's going to cons up [er, excuse me, allocate] some objects, all of
> which need to remain valid for the duration of a certain set of
> operations, it's handy to put them all in the same pool, and just
> destroy the pool at the end.  When the same code is written using raw
> allocation, it usually flaunts a dozen calls to free() at the end, and
> when you add a new object to that run of code, it's easy to forget to
> add yet another call to free().  Note that in the pool style, it's
> usually easy to see which pool you're supposed to allocate the thing
> in, or at least the presence of multiple pools there will force you to
> ask yourself about the object's lifetime, which malloc won't.
> 
> Wow, I can't believe I stopped coding to write this :-).  I hope it's
> at least technically accurate (fixes welcome!), if not persuasive.
> For the record, I like pools when I don't hate them.
> 
> -Karl
> 
> "Jay Freeman \(saurik\)" <saurik@saurik.com> writes:
> > Is there any documentation anywhere that describes "why you would
want
> to
> > use pools"?  I've been using APR for over a year now in virtually
all of
> my
> > projects, and I _still_ don't get what the advantage of this pool
> management
> > that's strewn all over my programs is.  I finally got fed up, wrote
a
> C++
> > class named "pool" (with an autocast operator for getting an
apr_pool_t
> *
> > and a destructor that destroys the pool), and have an instance of it
in
> > _every APR related object_ so I have something I can pass to the APR
> > functions when they scream out for their precious pools :-P.  I pray
at
> > nights that I'm not using an insane amount of working set by doing
this,
> > hehe.
> >
> > [...]


Mime
View raw message