harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Geir Magnusson Jr." <ge...@apache.org>
Subject Re: Questions about GC implementations
Date Tue, 01 Nov 2005 12:38:46 GMT
thanks

On Oct 31, 2005, at 10:57 PM, Robin Garner wrote:

> Done. This is issue #11.
>
> cheers
>
> Geir Magnusson Jr. wrote:
>
>
>> Robin - can you post this to a JIRA please?
>>
>> geir
>>
>> On Oct 29, 2005, at 1:57 AM, Robin Garner wrote:
>>
>>
>>>
>>>
>>>
>>>>> In the garbage collectors I've worked with, the essential  
>>>>> design is:
>>>>>
>>>>> - 'new' allocates space on the heap.
>>>>> - The header of each object contains a pointer (or equiv) to a  
>>>>> per-class
>>>>> data structure, that points to the GC map for the object (and  
>>>>> virtual
>>>>> dispatch tables etc etc)
>>>>> - Reference fields in objects contain pointers directly to the  
>>>>> heap
>>>>> objects they reference.
>>>>> - Pointer loads and stores are (optionally) performed via  
>>>>> barriers - the
>>>>> terminology is a little confusing: these are not  
>>>>> synchronization barriers,
>>>>> but opportunities for the GC to intercept the load/store and do  
>>>>> some
>>>>> additional processing. Write barriers are common, read barriers  
>>>>> less so.
>>>>>
>>>>>
>>>>>
>>>>
>>>> This is also the approach I have taken, so I think we are
>>>> on the same page. I think we are just saying the same thing
>>>> in different words. When an object is to be allocated,
>>>> its pointer will be set by the allocator. This pointer
>>>> is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>>>> The reason you did not find one is because I have only
>>>> provided a stub GC implementation. The 'new' operation
>>>> is performed by object_instance_new() in 'jvm/src/object.c'
>>>> and includes a call to GC_OBJECT_NEW().
>>>>
>>>>
>>>>
>>> Ok, so now I understand what the 'robject.pgarbage' pointer is -  
>>> this is the pointer to the body of an object. So do I understand  
>>> correctly that the body of an array is pointed to by a different  
>>> pointer in an object header ? Do you realise that this means that  
>>> a GETFIELD requires two indirections ?
>>>
>>> Do you realise that your object header is somewhere in the  
>>> vicinity of 40 bytes ? The average payload of a java object (at  
>>> least for SPECjvm98) is 16 bytes. Many of the fields in the  
>>> bootJVM object header are actually constant across all objects of  
>>> a class.
>>>
>>> The object table needs to go - I think this is a fundamental  
>>> enough design feature that it needs to be removed as soon as  
>>> possible. An object header should be 2 words at the most (3 for  
>>> arrays), and they should be contiguous with the object.
>>>
>>>
>>>> Any time a reference to that object is made, its object
>>>> hash, of type 'jvm_object_hash' has a reference recorded
>>>> by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF()
>>>> or GC_OBJECT_MKREF_FROM_JVM(). These are for internal
>>>> JVM references, references from Java object reference variables,
>>>> and references from Java local method reference variables,
>>>> respectively. When the reference is no longer needed, such as
>>>> when an object is destroyed or when a local method returns,
>>>> the reference is no longer needed. When this occurs,
>>>> then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF()
>>>> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively. When
>>>> an object is no longer used, then GC_OBJECT_DELETE() is called.
>>>>
>>>>
>>>>
>>> So these I guess are what we would refer to in the GC world as  
>>> barriers. My confusion is that a) they don't have enough  
>>> parameters, and b) there are too many of them. Am I right in  
>>> assuming that you are invoking a barrier on every update to a  
>>> local variable, every update to a stack ? This is going to be way  
>>> too inefficient.
>>>
>>> For a stop the world GC, what is needed is a barrier on a  
>>> PUTFIELD and AASTORE, and a way that the GC can query the current  
>>> pointers that the VM has into the heap when it starts a  
>>> collection. Optionally, we can also have a PUTSTATIC barrier, and  
>>> corresponding read barriers.
>>>
>>> I also don't get the distinction between the classes and objects.  
>>> Either a class is an object, and lives in the heap - in which  
>>> case it can be treated as an object - or it isn't and lives in  
>>> the VM.
>>>
>>>
>>>
>>>> When a local methods is called, GC_STACK_NEW() is invoked to set
>>>> up GC for that stack frame. Adding and removing objects is done
>>>> per above. Adding and removing references to objects is done
>>>> with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM().
>>>> When it returns, GC_STACK_DELETE() is called.
>>>>
>>>>
>>>>
>>> This is way too heavyweight. Stack manipulation is fast-path  
>>> code, and shouldn't have to call out to the GC all the time.
>>>
>>>
>>>
>>>>> - There are many styles of collector, but the most common class  
>>>>> uses
>>>>> tracing, in which a root set of pointers is used to determine  
>>>>> an initial
>>>>> set of live objects, and the collector performs a transitive  
>>>>> closure over
>>>>> this set to establish the set of all live objects. The root set is
>>>>> commonly the thread stacks and the static pointer fields.
>>>>>
>>>>>
>>>>>
>>>>
>>>> The macro GC_RUN() refers to the collector. The stub implementation
>>>> shows what it is supposed to do. The OBJECT_STATUS_GCREQ status bit
>>>> controls when an object needs collecting.
>>>>
>>>>
>>>>
>>> So where does OBJECT_STATUS_GCREQ get set ? It's the GC (and only  
>>> the GC) that knows when an object is alive or not (unless we do  
>>> reachability analysis in the compiler/interpreter).
>>>
>>>
>>>
>>>> All of the above setting up and tearing down of objects and  
>>>> references
>>>> to objects have equivalents for classes using the same rationale.
>>>> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>>>> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>>>> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>>>> GC_STACK_DELETE().)
>>>>
>>>>
>>>>
>>>>
>>>>> - The above is also complicated by
>>>>> . Reference types
>>>>> . Finalization
>>>>> . Locks
>>>>> . Address-based hashing
>>>>> The solutions to these are all pretty well known, but  
>>>>> complicate the
>>>>>
>>>>>
>>>>>
>>>> design
>>>>
>>>>
>>>>
>>>>> This is pretty much it - the rest (45 years of research) is  
>>>>> optimizing the
>>>>> way this is all done.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Perhaps I am taking it a bit too simplistically,
>>>>>> but all I did was define a GC hook anywhere an object or
>>>>>> array reference or a class load or unload happened. How
>>>>>> does your concept differ? And what sort of approach should
>>>>>> be used instead.
>>>>>>
>>>>>>
>>>>>>
>>>>> This I think brings us back to my initial question, asking what  
>>>>> these
>>>>> hooks were supposed to do. I guess you're saying you had a  
>>>>> vague idea
>>>>> that the GC might need to know about these events, so put hooks  
>>>>> in for
>>>>> them.
>>>>>
>>>>> When I was looking around the code trying to find out where to  
>>>>> start
>>>>> hooking in a managed heap, I looked for the 'new' or 'alloc'  
>>>>> operation,
>>>>> and couldn't seem to find it.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>> 'new' -- see object_instance_new() for objects,
>>>> class_static_new() for classes.
>>>>
>>>> 'delete' -- see object_instance_delete() for objects,
>>>> class_static_delete() for classes.
>>>>
>>>>
>>>>
>>> Again, what is it that calls 'delete' - isn't that the GC's job ?  
>>> I'm still confused.
>>>
>>>
>>>
>>>> All four of these functions perform a number of GC activities,
>>>> espcially calling the GC_xxx() macros listed above.
>>>>
>>>> Notice that, throughout the body of the code, _any_ time a  
>>>> reference
>>>> to an object (or class) is created or destroyed, one or more  
>>>> related
>>>> GC functions are called.
>>>>
>>>> What is _done_ with these events is up to the implementor
>>>> of the GC algorithm itself.
>>>>
>>>>
>>>>
>>> The GC and the VM need to be in bed a bit more closely than this,  
>>> at least for an implementation with reasonable performance.
>>>
>>> Attached is a tarball with 3 header files, and two (untested) GC  
>>> implementations - unfortunately there's too much work for me to  
>>> refactor the VM to match the interface and therefore to test.  
>>> It's probably best treated as example code for now.
>>>
>>> cheers
>>> Robin
>>>
>>> <gc.tgz>
>>>
>>>
>>
>>
>
>

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org



Mime
View raw message