harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Geir Magnusson Jr." <ge...@apache.org>
Subject Re: Questions about GC implementations
Date Mon, 31 Oct 2005 23:38:57 GMT
Robin - can you post this to a JIRA please?

geir

On Oct 29, 2005, at 1:57 AM, Robin Garner wrote:

>
>
>>> In the garbage collectors I've worked with, the essential design is:
>>>
>>> - 'new' allocates space on the heap.
>>> - The header of each object contains a pointer (or equiv) to a  
>>> per-class
>>> data structure, that points to the GC map for the object (and  
>>> virtual
>>> dispatch tables etc etc)
>>> - Reference fields in objects contain pointers directly to the heap
>>> objects they reference.
>>> - Pointer loads and stores are (optionally) performed via  
>>> barriers - the
>>> terminology is a little confusing: these are not synchronization  
>>> barriers,
>>> but opportunities for the GC to intercept the load/store and do some
>>> additional processing.  Write barriers are common, read barriers  
>>> less so.
>>>
>>>
>>
>> This is also the approach I have taken, so I think we are
>> on the same page.  I think we are just saying the same thing
>> in different words.  When an object is to be allocated,
>> its pointer will be set by the allocator.  This pointer
>> is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>> The reason you did not find one is because I have only
>> provided a stub GC implementation.  The 'new' operation
>> is performed by object_instance_new() in 'jvm/src/object.c'
>> and includes a call to GC_OBJECT_NEW().
>>
>>
> Ok, so now I understand what the 'robject.pgarbage' pointer is -  
> this is the pointer to the body of an object.  So do I understand  
> correctly that the body of an array is pointed to by a different  
> pointer in an object header ?  Do you realise that this means that  
> a GETFIELD requires two indirections ?
>
> Do you realise that your object header is somewhere in the vicinity  
> of 40 bytes ?  The average payload of a java object (at least for  
> SPECjvm98) is 16 bytes.  Many of the fields in the bootJVM object  
> header are actually constant across all objects of a class.
>
> The object table needs to go - I think this is a fundamental enough  
> design feature that it needs to be removed as soon as possible.  An  
> object header should be 2 words at the most (3 for arrays), and  
> they should be contiguous with the object.
>
>> Any time a reference to that object is made, its object
>> hash, of type 'jvm_object_hash' has a reference recorded
>> by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF()
>> or GC_OBJECT_MKREF_FROM_JVM().  These are for internal
>> JVM references, references from Java object reference variables,
>> and references from Java local method reference variables,
>> respectively.  When the reference is no longer needed, such as
>> when an object is destroyed or when a local method returns,
>> the reference is no longer needed.  When this occurs,
>> then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF()
>> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively.  When
>> an object is no longer used, then GC_OBJECT_DELETE() is called.
>>
>>
> So these I guess are what we would refer to in the GC world as  
> barriers.  My confusion is that a) they don't have enough  
> parameters, and b) there are too many of them.  Am I right in  
> assuming that you are invoking a barrier on every update to a local  
> variable, every update to a stack ?  This is going to be way too  
> inefficient.
>
> For a stop the world GC, what is needed is a barrier on a PUTFIELD  
> and AASTORE, and a way that the GC can query the current pointers  
> that the VM has into the heap when it starts a collection.   
> Optionally, we can also have a PUTSTATIC barrier, and corresponding  
> read barriers.
>
> I also don't get the distinction between the classes and objects.   
> Either a class is an object, and lives in the heap - in which case  
> it can be treated as an object - or it isn't and lives in the VM.
>
>
>> When a local methods is called, GC_STACK_NEW() is invoked to set
>> up GC for that stack frame.  Adding and removing objects is done
>> per above.  Adding and removing references to objects is done
>> with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM().
>> When it returns, GC_STACK_DELETE() is called.
>>
>>
> This is way too heavyweight.  Stack manipulation is fast-path code,  
> and shouldn't have to call out to the GC all the time.
>
>
>>> - There are many styles of collector, but the most common class uses
>>> tracing, in which a root set of pointers is used to determine an  
>>> initial
>>> set of live objects, and the collector performs a transitive  
>>> closure over
>>> this set to establish the set of all live objects.  The root set is
>>> commonly the thread stacks and the static pointer fields.
>>>
>>>
>>
>> The macro GC_RUN() refers to the collector.  The stub implementation
>> shows what it is supposed to do.  The OBJECT_STATUS_GCREQ status bit
>> controls when an object needs collecting.
>>
>>
> So where does OBJECT_STATUS_GCREQ get set ?  It's the GC (and only  
> the GC) that knows when an object is alive or not (unless we do  
> reachability analysis in the compiler/interpreter).
>
>
>> All of the above setting up and tearing down of objects and  
>> references
>> to objects have equivalents for classes using the same rationale.
>> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>> GC_STACK_DELETE().)
>>
>>
>>
>>> - The above is also complicated by
>>>  . Reference types
>>>  . Finalization
>>>  . Locks
>>>  . Address-based hashing
>>>  The solutions to these are all pretty well known, but complicate  
>>> the
>>>
>>>
>> design
>>
>>
>>> This is pretty much it - the rest (45 years of research) is  
>>> optimizing the
>>> way this is all done.
>>>
>>>
>>>
>>>>          Perhaps I am taking it a bit too simplistically,
>>>> but all I did was define a GC hook anywhere an object or
>>>> array reference or a class load or unload happened.  How
>>>> does your concept differ?  And what sort of approach should
>>>> be used instead.
>>>>
>>>>
>>> This I think brings us back to my initial question, asking what  
>>> these
>>> hooks were supposed to do.  I guess you're saying you had a vague  
>>> idea
>>> that the GC might need to know about these events, so put hooks  
>>> in for
>>> them.
>>>
>>> When I was looking around the code trying to find out where to start
>>> hooking in a managed heap, I looked for the 'new' or 'alloc'  
>>> operation,
>>> and couldn't seem to find it.
>>>
>>>
>>>
>>
>> 'new' --  see object_instance_new() for objects,
>>          class_static_new() for classes.
>>
>> 'delete' -- see object_instance_delete() for objects,
>>            class_static_delete() for classes.
>>
>>
> Again, what is it that calls 'delete' - isn't that the GC's job ?   
> I'm still confused.
>
>
>> All four of these functions perform a number of GC activities,
>> espcially calling the GC_xxx() macros listed above.
>>
>> Notice that, throughout the body of the code, _any_ time a reference
>> to an object (or class) is created or destroyed, one or more related
>> GC functions are called.
>>
>> What is _done_ with these events is up to the implementor
>> of the GC algorithm itself.
>>
>>
> The GC and the VM need to be in bed a bit more closely than this,  
> at least for an implementation with reasonable performance.
>
> Attached is a tarball with 3 header files, and two (untested) GC  
> implementations - unfortunately there's too much work for me to  
> refactor the VM to match the interface and therefore to test.  It's  
> probably best treated as example code for now.
>
> cheers
> Robin
>
> <gc.tgz>
>

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org



Mime
View raw message