harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Garner <Robin.Gar...@anu.edu.au>
Subject Re: Questions about GC implementations
Date Sat, 29 Oct 2005 05:57:46 GMT

>>In the garbage collectors I've worked with, the essential design is:
>>
>>- 'new' allocates space on the heap.
>>- The header of each object contains a pointer (or equiv) to a per-class
>>data structure, that points to the GC map for the object (and virtual
>>dispatch tables etc etc)
>>- Reference fields in objects contain pointers directly to the heap
>>objects they reference.
>>- Pointer loads and stores are (optionally) performed via barriers - the
>>terminology is a little confusing: these are not synchronization barriers,
>>but opportunities for the GC to intercept the load/store and do some
>>additional processing.  Write barriers are common, read barriers less so.
>>    
>>
>
>This is also the approach I have taken, so I think we are
>on the same page.  I think we are just saying the same thing
>in different words.  When an object is to be allocated,
>its pointer will be set by the allocator.  This pointer
>is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>The reason you did not find one is because I have only
>provided a stub GC implementation.  The 'new' operation
>is performed by object_instance_new() in 'jvm/src/object.c'
>and includes a call to GC_OBJECT_NEW().
>  
>
Ok, so now I understand what the 'robject.pgarbage' pointer is - this is 
the pointer to the body of an object.  So do I understand correctly that 
the body of an array is pointed to by a different pointer in an object 
header ?  Do you realise that this means that a GETFIELD requires two 
indirections ?

Do you realise that your object header is somewhere in the vicinity of 
40 bytes ?  The average payload of a java object (at least for 
SPECjvm98) is 16 bytes.  Many of the fields in the bootJVM object header 
are actually constant across all objects of a class.

The object table needs to go - I think this is a fundamental enough 
design feature that it needs to be removed as soon as possible.  An 
object header should be 2 words at the most (3 for arrays), and they 
should be contiguous with the object. 

>Any time a reference to that object is made, its object
>hash, of type 'jvm_object_hash' has a reference recorded
>by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF()
>or GC_OBJECT_MKREF_FROM_JVM().  These are for internal
>JVM references, references from Java object reference variables,
>and references from Java local method reference variables,
>respectively.  When the reference is no longer needed, such as
>when an object is destroyed or when a local method returns,
>the reference is no longer needed.  When this occurs,
>then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF()
>or GC_OBJECT_RMREF_FROM_JVM() are called, respectively.  When
>an object is no longer used, then GC_OBJECT_DELETE() is called.
>  
>
So these I guess are what we would refer to in the GC world as 
barriers.  My confusion is that a) they don't have enough parameters, 
and b) there are too many of them.  Am I right in assuming that you are 
invoking a barrier on every update to a local variable, every update to 
a stack ?  This is going to be way too inefficient.

For a stop the world GC, what is needed is a barrier on a PUTFIELD and 
AASTORE, and a way that the GC can query the current pointers that the 
VM has into the heap when it starts a collection.  Optionally, we can 
also have a PUTSTATIC barrier, and corresponding read barriers.

I also don't get the distinction between the classes and objects.  
Either a class is an object, and lives in the heap - in which case it 
can be treated as an object - or it isn't and lives in the VM.

>When a local methods is called, GC_STACK_NEW() is invoked to set
>up GC for that stack frame.  Adding and removing objects is done
>per above.  Adding and removing references to objects is done
>with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM().
>When it returns, GC_STACK_DELETE() is called.
>  
>
This is way too heavyweight.  Stack manipulation is fast-path code, and 
shouldn't have to call out to the GC all the time.

>>- There are many styles of collector, but the most common class uses
>>tracing, in which a root set of pointers is used to determine an initial
>>set of live objects, and the collector performs a transitive closure over
>>this set to establish the set of all live objects.  The root set is
>>commonly the thread stacks and the static pointer fields.
>>    
>>
>
>The macro GC_RUN() refers to the collector.  The stub implementation
>shows what it is supposed to do.  The OBJECT_STATUS_GCREQ status bit
>controls when an object needs collecting.
>  
>
So where does OBJECT_STATUS_GCREQ get set ?  It's the GC (and only the 
GC) that knows when an object is alive or not (unless we do reachability 
analysis in the compiler/interpreter).

>All of the above setting up and tearing down of objects and references
>to objects have equivalents for classes using the same rationale.
>(The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>GC_STACK_DELETE().)
>
>  
>
>>- The above is also complicated by
>>  . Reference types
>>  . Finalization
>>  . Locks
>>  . Address-based hashing
>>  The solutions to these are all pretty well known, but complicate the
>>    
>>
>design
>  
>
>>This is pretty much it - the rest (45 years of research) is optimizing the
>>way this is all done.
>>
>>    
>>
>>>          Perhaps I am taking it a bit too simplistically,
>>>but all I did was define a GC hook anywhere an object or
>>>array reference or a class load or unload happened.  How
>>>does your concept differ?  And what sort of approach should
>>>be used instead.
>>>      
>>>
>>This I think brings us back to my initial question, asking what these
>>hooks were supposed to do.  I guess you're saying you had a vague idea
>>that the GC might need to know about these events, so put hooks in for
>>them.
>>
>>When I was looking around the code trying to find out where to start
>>hooking in a managed heap, I looked for the 'new' or 'alloc' operation,
>>and couldn't seem to find it.
>>
>>    
>>
>
>'new' --  see object_instance_new() for objects,
>          class_static_new() for classes.
>
>'delete' -- see object_instance_delete() for objects,
>            class_static_delete() for classes.
>  
>
Again, what is it that calls 'delete' - isn't that the GC's job ?  I'm 
still confused.

>All four of these functions perform a number of GC activities,
>espcially calling the GC_xxx() macros listed above.
>  
>
>Notice that, throughout the body of the code, _any_ time a reference
>to an object (or class) is created or destroyed, one or more related
>GC functions are called.
>
>What is _done_ with these events is up to the implementor
>of the GC algorithm itself.
>  
>
The GC and the VM need to be in bed a bit more closely than this, at 
least for an implementation with reasonable performance.

Attached is a tarball with 3 header files, and two (untested) GC 
implementations - unfortunately there's too much work for me to refactor 
the VM to match the interface and therefore to test.  It's probably best 
treated as example code for now.

cheers
Robin

Mime
View raw message