harmony-dev mailing list archives

From Robin Garner <robin.gar...@anu.edu.au>
Subject Re: [general] google perftool
Date Sat, 09 Dec 2006 13:33:10 GMT
Stefano Mazzocchi wrote:
> have we considered using
>  http://goog-perftools.sourceforge.net/
This could possibly be useful for the VM's internal memory allocation, 
but I doubt if there would be a noticeable win.

> If I look at
>  http://wiki.wikked.net/wiki/Squid_memory_fragmentation_problem
> it shows you some serious potential for improvement in memory management
> (but I don't know enough about our GC to know how useful that is)
> Thoughts?

In a garbage collected system the issues are very different.  I haven't 
studied Xiaofeng's code, but in a standard implementation of a 
generational GC with a mark-compact nursery:

- Allocation is done with a bump pointer.  The actual allocation can be 
done in something like 7 instructions (although I'm not sure whether in 
DRLVM the fetch of the pointer from the TLS can be optimized away yet).  
Appel's paper is the standard reference for this.

- Each thread grabs a chunk of memory and allocates from it without 
synchronization, then uses a lock to synchronize access to the global 
pool when it needs another chunk.  The synchronization cost of 
allocation is virtually negligible given an appropriate chunk size.

- The various malloc implementations are free-list allocators.  They 
need to maintain lists of free and allocated memory.  They need to worry 
about merging adjacent free blocks, finding a best-fit block in a list 
of variable size free blocks, fragmentation and all that.  In a copying 
GC, we don't have that problem because we can move objects around.
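
To make the first two points concrete, here is a minimal sketch in C of 
bump-pointer allocation from per-thread chunks.  It is not DRLVM's code: 
the names (gc_alloc, refill_chunk), the static array standing in for the 
nursery, and the sizes are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: a 1 MiB nursery handed out in 32 KiB chunks. */
#define HEAP_SIZE  (1u << 20)
#define CHUNK_SIZE (32u * 1024u)

/* Global pool: only touched under heap_lock. */
static _Alignas(16) unsigned char heap[HEAP_SIZE];
static size_t heap_next = 0;
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread chunk: touched without any synchronization. */
static _Thread_local unsigned char *cursor = NULL;
static _Thread_local unsigned char *limit = NULL;

/* Slow path: take the lock and grab a fresh chunk from the global pool.
 * Any unused tail of the previous chunk is simply abandoned. */
static int refill_chunk(void) {
    int ok = 0;
    pthread_mutex_lock(&heap_lock);
    if (heap_next + CHUNK_SIZE <= HEAP_SIZE) {
        cursor = heap + heap_next;
        limit = cursor + CHUNK_SIZE;
        heap_next += CHUNK_SIZE;
        ok = 1;
    }
    pthread_mutex_unlock(&heap_lock);
    return ok;
}

/* Fast path: a bounds check and a pointer bump.  Returns NULL where a
 * real VM would trigger a collection or use a large-object space. */
void *gc_alloc(size_t size) {
    if (size == 0 || size > CHUNK_SIZE) {
        return NULL;
    }
    size = (size + 7u) & ~(size_t)7u;   /* round up to 8-byte alignment */
    if (cursor == NULL || (size_t)(limit - cursor) < size) {
        if (!refill_chunk()) {
            return NULL;
        }
    }
    assert(cursor + size <= limit);
    void *obj = cursor;
    cursor += size;
    return obj;
}
```

With these assumed sizes, a thread allocating 32-byte objects takes the 
lock once per 1024 allocations, which is why the synchronization cost 
mostly disappears.  Note there is no free list and no per-object free: 
space is reclaimed wholesale when the collector evacuates the nursery.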

The main ways to get performance out of a garbage collected memory 
subsystem are:
- Co-locating objects that exhibit temporal locality
- Engineering the mechanisms with minimal overhead (eg the helper 
inlining work now being done)
- Choosing algorithms that minimise copying overhead, such as 
generational collection and pre-tenuring objects we know will be 
long-lived.
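
As a rough illustration of the pre-tenuring idea, the sketch below 
decides per allocation site whether new objects should skip the nursery. 
The per-site survival counters, the names (alloc_site, choose_space) and 
both thresholds are assumptions of mine, not something taken from DRLVM 
or any particular collector.

```c
#include <assert.h>
#include <stddef.h>

enum space { NURSERY, MATURE };

/* Hypothetical per-allocation-site statistics gathered by the collector. */
struct alloc_site {
    unsigned long allocated;  /* objects allocated from this site          */
    unsigned long survived;   /* of those, survivors of a nursery collection */
};

#define MIN_SAMPLES       1000ul  /* assumed: don't decide on too little data */
#define PRETENURE_PERCENT 80ul    /* assumed: survival rate that justifies it  */

/* Objects from sites whose objects nearly always survive are allocated
 * directly in the mature space, so they are never copied out of the
 * nursery.  Everything else takes the normal nursery path. */
enum space choose_space(const struct alloc_site *site) {
    assert(site->survived <= site->allocated);
    if (site->allocated < MIN_SAMPLES) {
        return NURSERY;
    }
    if (site->survived * 100ul >= site->allocated * PRETENURE_PERCENT) {
        return MATURE;
    }
    return NURSERY;
}
```

The saving is the copy that would otherwise happen at the first nursery 
collection; the risk is filling the mature space with garbage if the 
prediction is wrong, hence the conservative threshold.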

In short, there has been work on improving the performance of both 
garbage-collected and explicit memory management.  The two overlap in 
the design of free-list allocators (used by mark-sweep and 
reference-counting collectors), but beyond that they are very different 
worlds.


Robin Garner
Dept. of Computer Science
Australian National University
