harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Garner <Robin.Gar...@anu.edu.au>
Subject Re: Questions about GC implementations
Date Tue, 01 Nov 2005 03:57:00 GMT
Done. This is issue #11.


Geir Magnusson Jr. wrote:

> Robin - can you post this to a JIRA please?
> geir
> On Oct 29, 2005, at 1:57 AM, Robin Garner wrote:
>>>> In the garbage collectors I've worked with, the essential design is:
>>>> - 'new' allocates space on the heap.
>>>> - The header of each object contains a pointer (or equiv) to a 
>>>> per-class
>>>> data structure, that points to the GC map for the object (and virtual
>>>> dispatch tables etc etc)
>>>> - Reference fields in objects contain pointers directly to the heap
>>>> objects they reference.
>>>> - Pointer loads and stores are (optionally) performed via barriers 
>>>> - the
>>>> terminology is a little confusing: these are not synchronization 
>>>> barriers,
>>>> but opportunities for the GC to intercept the load/store and do some
>>>> additional processing. Write barriers are common, read barriers 
>>>> less so.
>>> This is also the approach I have taken, so I think we are
>>> on the same page. I think we are just saying the same thing
>>> in different words. When an object is to be allocated,
>>> its pointer will be set by the allocator. This pointer
>>> is 'robject.pgarbage' and is found in 'jvm/src/object.h'.
>>> The reason you did not find one is because I have only
>>> provided a stub GC implementation. The 'new' operation
>>> is performed by object_instance_new() in 'jvm/src/object.c'
>>> and includes a call to GC_OBJECT_NEW().
>> Ok, so now I understand what the 'robject.pgarbage' pointer is - this 
>> is the pointer to the body of an object. So do I understand correctly 
>> that the body of an array is pointed to by a different pointer in an 
>> object header ? Do you realise that this means that a GETFIELD 
>> requires two indirections ?
>> Do you realise that your object header is somewhere in the vicinity 
>> of 40 bytes ? The average payload of a java object (at least for 
>> SPECjvm98) is 16 bytes. Many of the fields in the bootJVM object 
>> header are actually constant across all objects of a class.
>> The object table needs to go - I think this is a fundamental enough 
>> design feature that it needs to be removed as soon as possible. An 
>> object header should be 2 words at the most (3 for arrays), and they 
>> should be contiguous with the object.
>>> Any time a reference to that object is made, its object
>>> hash, of type 'jvm_object_hash' has a reference recorded
>>> or GC_OBJECT_MKREF_FROM_JVM(). These are for internal
>>> JVM references, references from Java object reference variables,
>>> and references from Java local method reference variables,
>>> respectively. When the reference is no longer needed, such as
>>> when an object is destroyed or when a local method returns,
>>> the reference is no longer needed. When this occurs,
>>> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively. When
>>> an object is no longer used, then GC_OBJECT_DELETE() is called.
>> So these I guess are what we would refer to in the GC world as 
>> barriers. My confusion is that a) they don't have enough parameters, 
>> and b) there are too many of them. Am I right in assuming that you 
>> are invoking a barrier on every update to a local variable, every 
>> update to a stack ? This is going to be way too inefficient.
>> For a stop the world GC, what is needed is a barrier on a PUTFIELD 
>> and AASTORE, and a way that the GC can query the current pointers 
>> that the VM has into the heap when it starts a collection. 
>> Optionally, we can also have a PUTSTATIC barrier, and corresponding 
>> read barriers.
>> I also don't get the distinction between the classes and objects. 
>> Either a class is an object, and lives in the heap - in which case it 
>> can be treated as an object - or it isn't and lives in the VM.
>>> When a local methods is called, GC_STACK_NEW() is invoked to set
>>> up GC for that stack frame. Adding and removing objects is done
>>> per above. Adding and removing references to objects is done
>>> When it returns, GC_STACK_DELETE() is called.
>> This is way too heavyweight. Stack manipulation is fast-path code, 
>> and shouldn't have to call out to the GC all the time.
>>>> - There are many styles of collector, but the most common class uses
>>>> tracing, in which a root set of pointers is used to determine an 
>>>> initial
>>>> set of live objects, and the collector performs a transitive 
>>>> closure over
>>>> this set to establish the set of all live objects. The root set is
>>>> commonly the thread stacks and the static pointer fields.
>>> The macro GC_RUN() refers to the collector. The stub implementation
>>> shows what it is supposed to do. The OBJECT_STATUS_GCREQ status bit
>>> controls when an object needs collecting.
>> So where does OBJECT_STATUS_GCREQ get set ? It's the GC (and only the 
>> GC) that knows when an object is alive or not (unless we do 
>> reachability analysis in the compiler/interpreter).
>>> All of the above setting up and tearing down of objects and references
>>> to objects have equivalents for classes using the same rationale.
>>> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'.
>>> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in
>>> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and
>>>> - The above is also complicated by
>>>> . Reference types
>>>> . Finalization
>>>> . Locks
>>>> . Address-based hashing
>>>> The solutions to these are all pretty well known, but complicate the
>>> design
>>>> This is pretty much it - the rest (45 years of research) is 
>>>> optimizing the
>>>> way this is all done.
>>>>> Perhaps I am taking it a bit too simplistically,
>>>>> but all I did was define a GC hook anywhere an object or
>>>>> array reference or a class load or unload happened. How
>>>>> does your concept differ? And what sort of approach should
>>>>> be used instead.
>>>> This I think brings us back to my initial question, asking what these
>>>> hooks were supposed to do. I guess you're saying you had a vague idea
>>>> that the GC might need to know about these events, so put hooks in for
>>>> them.
>>>> When I was looking around the code trying to find out where to start
>>>> hooking in a managed heap, I looked for the 'new' or 'alloc' 
>>>> operation,
>>>> and couldn't seem to find it.
>>> 'new' -- see object_instance_new() for objects,
>>> class_static_new() for classes.
>>> 'delete' -- see object_instance_delete() for objects,
>>> class_static_delete() for classes.
>> Again, what is it that calls 'delete' - isn't that the GC's job ? I'm 
>> still confused.
>>> All four of these functions perform a number of GC activities,
>>> espcially calling the GC_xxx() macros listed above.
>>> Notice that, throughout the body of the code, _any_ time a reference
>>> to an object (or class) is created or destroyed, one or more related
>>> GC functions are called.
>>> What is _done_ with these events is up to the implementor
>>> of the GC algorithm itself.
>> The GC and the VM need to be in bed a bit more closely than this, at 
>> least for an implementation with reasonable performance.
>> Attached is a tarball with 3 header files, and two (untested) GC 
>> implementations - unfortunately there's too much work for me to 
>> refactor the VM to match the interface and therefore to test. It's 
>> probably best treated as example code for now.
>> cheers
>> Robin
>> <gc.tgz>

View raw message