harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Weldon Washburn" <weldon...@gmail.com>
Subject Re: [drlvm] Class unloading support - tested one approach
Date Thu, 02 Nov 2006 05:11:54 GMT
On 11/1/06, Robin Garner <robin.garner@anu.edu.au> wrote:
>
> Rana Dasgupta wrote:
> > Robin,
> >    The basic difference of this with Etienne's method is that the flag
> is
> > on the vtable, instead of the CL instance. Do I understand correctly ?
> The
> > GC perf impact is therefore reduced because you need to lookup
> > object->vtable instead of object->class->CLinstancewhen tracing the
> heap.
> > The space overhead is correspondingly slightly higher. Also the GC
> impact
> > may look lower because there are a couple of pseudo GC cycles to reset
> the
> > vtables and sweep the vtables.
> >
> > Thanks,
> > Rana
>
> The relevant part of Etienne's design I believe is this:
>
> > 7- Each class loader structure maintains a set of boolean flags, one
> >  flag per "non-nursery" garbage collected area (even when thread-local
> >  heaps are used).  The flag is set when an instance of a class loaded by
> >  this class leader is moved into the related GC-area.  The flag is unset
> >  when the GC-area is emptied, or (optionally) when it can be determined
> >  that no instance of a class loaded by this class loader remains in the
> >  GC-area.  This is best implemented as follows: a) use an unconditional
> >  write of "true" in the flag every time an object is moved into the
> >  GC-area by the garbage collector, b) unset the related flag in "all"
> >  class loader structures just before collecting a GC-area, then setting
> >  the flag back when an object survives in the area.
>
> My design differs in several key ways from this:
> 1. There is no requirement for a per non-nursery area flag

2. The mark byte/word is set unconditionally whenever an object is
> visited by the GC, not when an object is moved into a particular mature
> space.  This may be the same for some GCs, but not all.

3. The mark byte/word is an unconditional write - Etienne's proposal
> would use a load/mask/write sequence.  This is performance critical.
> 4. My memory of x86 assembler is a little rusty, but I believe a
> constant store can be done without requiring a register for the value to
> be written, where as or-ing a bit value into a word requires a temporary
> register or two.
> 5. In a parallel GC, setting bits in a mask requires a synchronized
> update.  My design doesn't.
>
> The point is an unconditional store to a structure you are already
> accessing is very cheap, whereas register spills, loads and synchronized
> updates are expensive.


I might be missing something here.  But my take is that Robin's design is
really the best one.  Its probably in the noise but it might be possible to
even reduce the overhead of clearing the vtable "mark" by using a epoch
number instead of a boolean.  The idea is that after every major GC,
increment the value used for the mark.  When sweeping the vtables, the stale
mark values are the unreachable classes.

cheers
>
> > On 10/31/06, Robin Garner <robin.garner@anu.edu.au> wrote:
> >>
> >> Actually, just thinking about how I would implement this in JikesRVM, I
> >> would use the reachability based algorithm, but piggyback on the
> >> existing GC mechanisms:
> >>
> >> - Allocate a byte (or word) in each vtable for the purpose of tracking
> >> class reachability.
> >> - Periodically, at a time when no GC is running (even the most
> >> aggressive concurrent GC algorithms have these, I believe), zero this
> >> flag across all vtables.  This is the beginning of a class-unloading
> >> epoch.
> >> - During each GC, when the GC is fetching the GC map for an object,
> >> *unconditionally* write a value to the class reachability byte.  It may
> >> make sense for this byte to be in either the first cache-line of the
> >> vtable, or the cache line that points to the GC map - just make sure
> the
> >> mark operation doesn't in general fetch an additional cache line.
> >> - At a point in the sufficiently far future, when all reachable objects
> >> are known to have been traced by the GC, sweep the vtables and check
> the
> >> reachability of the classloaders.
> >>
> >> The features of this approach are:
> >>
> >> - Minimal additional work at GC time.  The additional write will cause
> >> some additional memory traffic, but a) it's to memory that is already
> >> guaranteed to be in L1 cache, and b) it's an unconditional independent
> >> write, and c) multiple writes to common classes will be absorbed by the
> >> write buffer.
> >>
> >> - Space cost of at most 1 word per vtable.
> >>
> >> - This works whether vtables are objects or VM structures
> >>
> >> - If the relationship between a class and a vtable is not 1:1, this
> only
> >> need affect the periodic sweep process, which should be infrequent and
> >> small compared to a GC.
> >>
> >> - shouldn't need a stop-the-world at any point.
> >>
> >> I've implemented and tested the GC-relevant part of this in JikesRVM,
> >> and the GC time overhead appears to be just under 1% in the MMTk
> >> MarkSweep collector.
> >>
> >> cheers,
> >> Robin
> >>
> >
>
>


-- 
Weldon Washburn
Intel Enterprise Solutions Software Division

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message