Return-Path: Delivered-To: apmail-incubator-harmony-dev-archive@www.apache.org Received: (qmail 55852 invoked from network); 31 Oct 2005 23:39:58 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 31 Oct 2005 23:39:58 -0000 Received: (qmail 85428 invoked by uid 500); 31 Oct 2005 23:39:50 -0000 Delivered-To: apmail-incubator-harmony-dev-archive@incubator.apache.org Received: (qmail 85353 invoked by uid 500); 31 Oct 2005 23:39:49 -0000 Mailing-List: contact harmony-dev-help@incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: harmony-dev@incubator.apache.org Delivered-To: mailing list harmony-dev@incubator.apache.org Received: (qmail 85342 invoked by uid 99); 31 Oct 2005 23:39:49 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 Oct 2005 15:39:49 -0800 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: neutral (asf.osuosl.org: local policy) Received: from [167.206.4.203] (HELO mta8.srv.hcvlny.cv.net) (167.206.4.203) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 31 Oct 2005 15:39:45 -0800 Received: from [10.0.1.8] (ool-43560634.dyn.optonline.net [67.86.6.52]) by mta8.srv.hcvlny.cv.net (Sun Java System Messaging Server 6.2-2.06 (built May 11 2005)) with ESMTP id <0IP900DIB0CRN000@mta8.srv.hcvlny.cv.net> for harmony-dev@incubator.apache.org; Mon, 31 Oct 2005 18:38:51 -0500 (EST) Date: Mon, 31 Oct 2005 18:38:57 -0500 From: "Geir Magnusson Jr." Subject: Re: Questions about GC implementations In-reply-to: <43630F5A.3080407@anu.edu.au> To: harmony-dev@incubator.apache.org Message-id: <11323422-5C84-422B-BCCB-7EBAF0BB46F4@apache.org> MIME-version: 1.0 X-Mailer: Apple Mail (2.734) Content-type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Content-transfer-encoding: 7BIT References: <4120051042716650420@earthlink.net> <43630F5A.3080407@anu.edu.au> X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Robin - can you post this to a JIRA please? geir On Oct 29, 2005, at 1:57 AM, Robin Garner wrote: > > >>> In the garbage collectors I've worked with, the essential design is: >>> >>> - 'new' allocates space on the heap. >>> - The header of each object contains a pointer (or equiv) to a >>> per-class >>> data structure, that points to the GC map for the object (and >>> virtual >>> dispatch tables etc etc) >>> - Reference fields in objects contain pointers directly to the heap >>> objects they reference. >>> - Pointer loads and stores are (optionally) performed via >>> barriers - the >>> terminology is a little confusing: these are not synchronization >>> barriers, >>> but opportunities for the GC to intercept the load/store and do some >>> additional processing. Write barriers are common, read barriers >>> less so. >>> >>> >> >> This is also the approach I have taken, so I think we are >> on the same page. I think we are just saying the same thing >> in different words. When an object is to be allocated, >> its pointer will be set by the allocator. This pointer >> is 'robject.pgarbage' and is found in 'jvm/src/object.h'. >> The reason you did not find one is because I have only >> provided a stub GC implementation. The 'new' operation >> is performed by object_instance_new() in 'jvm/src/object.c' >> and includes a call to GC_OBJECT_NEW(). >> >> > Ok, so now I understand what the 'robject.pgarbage' pointer is - > this is the pointer to the body of an object. So do I understand > correctly that the body of an array is pointed to by a different > pointer in an object header ? Do you realise that this means that > a GETFIELD requires two indirections ? > > Do you realise that your object header is somewhere in the vicinity > of 40 bytes ? The average payload of a java object (at least for > SPECjvm98) is 16 bytes. Many of the fields in the bootJVM object > header are actually constant across all objects of a class. > > The object table needs to go - I think this is a fundamental enough > design feature that it needs to be removed as soon as possible. An > object header should be 2 words at the most (3 for arrays), and > they should be contiguous with the object. > >> Any time a reference to that object is made, its object >> hash, of type 'jvm_object_hash' has a reference recorded >> by GC_OBJECT_MKREF_FROM_OBJECT() or GC_OBJECT_FIELD_MKREF() >> or GC_OBJECT_MKREF_FROM_JVM(). These are for internal >> JVM references, references from Java object reference variables, >> and references from Java local method reference variables, >> respectively. When the reference is no longer needed, such as >> when an object is destroyed or when a local method returns, >> the reference is no longer needed. When this occurs, >> then GC_OBJECT_RMREF_FROM_OBJECT() or GC_OBJECT_FIELD_RMREF() >> or GC_OBJECT_RMREF_FROM_JVM() are called, respectively. When >> an object is no longer used, then GC_OBJECT_DELETE() is called. >> >> > So these I guess are what we would refer to in the GC world as > barriers. My confusion is that a) they don't have enough > parameters, and b) there are too many of them. Am I right in > assuming that you are invoking a barrier on every update to a local > variable, every update to a stack ? This is going to be way too > inefficient. > > For a stop the world GC, what is needed is a barrier on a PUTFIELD > and AASTORE, and a way that the GC can query the current pointers > that the VM has into the heap when it starts a collection. > Optionally, we can also have a PUTSTATIC barrier, and corresponding > read barriers. > > I also don't get the distinction between the classes and objects. > Either a class is an object, and lives in the heap - in which case > it can be treated as an object - or it isn't and lives in the VM. > > >> When a local methods is called, GC_STACK_NEW() is invoked to set >> up GC for that stack frame. Adding and removing objects is done >> per above. Adding and removing references to objects is done >> with GC_STACK_MKREF_FROM_JVM() and GC_STACK_RMREF_FROM_JVM(). >> When it returns, GC_STACK_DELETE() is called. >> >> > This is way too heavyweight. Stack manipulation is fast-path code, > and shouldn't have to call out to the GC all the time. > > >>> - There are many styles of collector, but the most common class uses >>> tracing, in which a root set of pointers is used to determine an >>> initial >>> set of live objects, and the collector performs a transitive >>> closure over >>> this set to establish the set of all live objects. The root set is >>> commonly the thread stacks and the static pointer fields. >>> >>> >> >> The macro GC_RUN() refers to the collector. The stub implementation >> shows what it is supposed to do. The OBJECT_STATUS_GCREQ status bit >> controls when an object needs collecting. >> >> > So where does OBJECT_STATUS_GCREQ get set ? It's the GC (and only > the GC) that knows when an object is alive or not (unless we do > reachability analysis in the compiler/interpreter). > > >> All of the above setting up and tearing down of objects and >> references >> to objects have equivalents for classes using the same rationale. >> (The class GC pointer is 'rclass.pgarbage' in 'jvm/src/class.h'. >> The local method GC pointer is 'JVMREG_STACK_GC_OFFSET' in >> 'jvm/src/jvmreg.h' and is manipulated by GC_STACK_NEW() and >> GC_STACK_DELETE().) >> >> >> >>> - The above is also complicated by >>> . Reference types >>> . Finalization >>> . Locks >>> . Address-based hashing >>> The solutions to these are all pretty well known, but complicate >>> the >>> >>> >> design >> >> >>> This is pretty much it - the rest (45 years of research) is >>> optimizing the >>> way this is all done. >>> >>> >>> >>>> Perhaps I am taking it a bit too simplistically, >>>> but all I did was define a GC hook anywhere an object or >>>> array reference or a class load or unload happened. How >>>> does your concept differ? And what sort of approach should >>>> be used instead. >>>> >>>> >>> This I think brings us back to my initial question, asking what >>> these >>> hooks were supposed to do. I guess you're saying you had a vague >>> idea >>> that the GC might need to know about these events, so put hooks >>> in for >>> them. >>> >>> When I was looking around the code trying to find out where to start >>> hooking in a managed heap, I looked for the 'new' or 'alloc' >>> operation, >>> and couldn't seem to find it. >>> >>> >>> >> >> 'new' -- see object_instance_new() for objects, >> class_static_new() for classes. >> >> 'delete' -- see object_instance_delete() for objects, >> class_static_delete() for classes. >> >> > Again, what is it that calls 'delete' - isn't that the GC's job ? > I'm still confused. > > >> All four of these functions perform a number of GC activities, >> espcially calling the GC_xxx() macros listed above. >> >> Notice that, throughout the body of the code, _any_ time a reference >> to an object (or class) is created or destroyed, one or more related >> GC functions are called. >> >> What is _done_ with these events is up to the implementor >> of the GC algorithm itself. >> >> > The GC and the VM need to be in bed a bit more closely than this, > at least for an implementation with reasonable performance. > > Attached is a tarball with 3 header files, and two (untested) GC > implementations - unfortunately there's too much work for me to > refactor the VM to match the interface and therefore to test. It's > probably best treated as example code for now. > > cheers > Robin > > > -- Geir Magnusson Jr +1-203-665-6437 geirm@apache.org