Message-ID: <5935926.1129868061997.JavaMail.root@elwamui-norfolk.atl.sa.earthlink.net>
Date: Thu, 20 Oct 2005 23:14:21 -0500 (GMT-05:00)
From: Apache Harmony Bootstrap JVM
Reply-To: Apache Harmony Bootstrap JVM
To: harmony-dev@incubator.apache.org
Subject: Re: Some questions about the architecture
Mailing-List: contact harmony-dev-help@incubator.apache.org; run by ezmlm
-----Original Message-----
From: Robin Garner
Sent: Oct 20, 2005 3:08 PM
To: Apache Harmony Bootstrap JVM
Cc: harmony-dev@incubator.apache.org
Subject: Re: Some questions about the architecture

> Robin, Rodrigo,
>
> Perhaps the two of you could get your heads together
> on GC issues? I think both of you have been thinking
> along related lines on the structure of GC for this JVM.
> What do you think?

I think the current challenge is to get the GC people and the VM people
thinking along the same lines when it comes to GC issues. I think
we're both coming from the same place.

---
Probably!
---

> Further comments follow...
>
> -----Original Message-----
> From: Rodrigo Kumpera
> Sent: Oct 19, 2005 4:49 PM
> To: harmony-dev@incubator.apache.org
> Subject: Re: Some questions about the architecture
>
> On 10/19/05, Apache Harmony Bootstrap JVM wrote:
>>
>>
>> -----Original Message-----
>> From: Rodrigo Kumpera
>> Sent: Oct 19, 2005 1:49 PM
>> To: harmony-dev@incubator.apache.org, Apache Harmony Bootstrap JVM
>>
>> Subject: Re: Some questions about the architecture
>>
>> On 10/19/05, Apache Harmony Bootstrap JVM wrote:
>> >
> ...snip...
>>
>> Notice that in 'jvm/src/jvmcfg.h' there is a JVMCFG_GC_THREAD
>> that is used in jvm_run() as a regular thread like any other.
>> It calls gc_run() on a scheduled basis. Also, any time an object
>> finalize() is done, gc_run() is possible. Yes, I treat GC as a
>> stop-the-world process, but here is the key: Due to the lack
>> of asynchronous native POSIX threads, there are no safe points
>> required. The only thread is the SIGALRM target that sets the
>> volatile boolean in timeslice_tick() for use by opcode_run() to
>> test. This is the _only_ formally asynchronous data structure in
>> the whole machine. (Bold if you use an HTML browser, otherwise
>> clutter meant for emphasis.) Objects that contain no references can
>> be GC'd since they are merely table entries.
>> Depending on how the
>> GC algorithm is done, gc_run() may or may not even need to look
>> at a particular object.
>>
>> Notice also that classes are treated in the same way by the GC API.
>> If a class is no longer referenced by any objects, it may be GC'd also.
>> First, its intrinsic class object must be GC'd, then the class itself.
>> This may take more than one pass of gc_run() to make it happen.

There's a major misconception here. As I was describing it to someone
a while ago, conceptually a garbage collected heap is actually simpler
than an explicitly managed heap. The standard heap has 'malloc' and
'free'. A managed heap (with GC) just has 'malloc'. In practice it's
more complex but the principle is the same.

From the interpreter's point of view, you just allocate. Forever.
Reclaiming free space is the GC's problem, because it's the only part
of the VM that can know when something is dead. Things die when (or
soon after) all references to them die.

---
This design uses "heap.h" and friends _only_ for management of internal
JVM data structures. That heap is _never_, repeat _never_, available,
visible, or controllable, directly or indirectly, by the effects of
Java bytecodes. The one exception is the pair of functions
object_instance_new() and object_instance_delete(), and then only for
array objects and for the arrays of 'jvalue' that hold the fields in a
class (one for static fields, the other for instance fields). These go
to the heap for their data storage only: a 'jlong' array of 10 elements
gets 10 x 8, or 80 bytes, with the rest coming out of the object table;
an object with, say, 5 fields gets 5 x 8, or 40 bytes.

I have separated GC out as a _completely_ different issue that relates
_only_ to the effects of Java bytecodes. In other words, I have two
separate memory management domains so that, no matter what sort of GC
is used by the virtual machine, the real-machine code that implements
it is not affected by it at all.
In this way, GC can have neither a positive nor a negative effect on
the real-machine implementation of the JVM. This was by design, and
whether or not it was a good design choice is for more experienced JVM
architects than myself to decide. Because GC controls when object
references are reclaimed, the array and field storage ultimately falls
under its control instead of being explicitly managed:
object_instance_new() and object_instance_delete() are driven by 'new'
and by GC, respectively.

*** WHAT SAY YOU JVM EXPERTS LURKING OUT THERE? I KNOW YOU'RE READING
THIS! Please speak up! I would like to hear what your experience has
been so we can create the best solution to the issue of real and
virtual machine memory management. ***

Now, in the interest of full disclosure: my objects are currently
allocated from the 'pjvm->object[]' array of 'robject', a static array
with a fixed maximum size. The same goes for classes, with the
'pjvm->class[]' array of 'rclass'. The OBJECT() and CLASS() macros can
be adjusted to reflect any different allocation mechanism that might be
chosen for any implementation, either now or in the future, hopefully
making this JVM _extremely_ modular. (See also 'README' for a section
on "Subsystem component abstraction".)

Keep in mind that this JVM was _not_ designed with blinding speed in
mind for its first cut, but with the Henry Ford approach:

  1. Sweat blood and create a Model "A".
  2. Sell enough to make it worth the while.
  3. Work on improvements and create a Model "B".
  4. Go from one failure to the next with no loss of enthusiasm
     (Quote from Mr. Ford).
  5. Get down to the Model "K", which had some significant success.
  6. Keep working until you build the Model "T", which sold by the
     million.

I guess I'd like us to get the Model "A" out the door even as we look
toward improvements such as those being suggested by a number of folks.
If we need to adjust the heap and GC models, sure, we can do it.
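
For readers who have not looked at the source, the fixed-size object table and the OBJECT() macro mentioned above can be pictured roughly as follows. This is only a minimal sketch under stated assumptions, not the actual code: 'pjvm' and OBJECT() are named in this message, but the struct layout, the field names, MAX_OBJECTS, NULL_OBJECT, and the two helper functions are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_OBJECTS 8   /* hypothetical fixed table size */
#define NULL_OBJECT 0   /* hypothetical: slot 0 reserved as the null reference */

/* Hypothetical table entry; the real 'robject' certainly holds more. */
typedef struct {
    int in_use;         /* nonzero while the slot holds a live object */
    int class_index;    /* hypothetical index into a class table */
} robject_sketch;

typedef struct {
    robject_sketch object[MAX_OBJECTS];  /* whole table reserved up front */
} rjvm_sketch;

static rjvm_sketch jvm_storage;          /* zero-initialized, lives until exit */
static rjvm_sketch *pjvm = &jvm_storage;

/* All access goes through the macro, so the storage scheme can be
 * swapped later without touching the callers. */
#define OBJECT(i) (pjvm->object[(i)])

/* Claim the first free slot; returns NULL_OBJECT when the table is full. */
static int object_slot_alloc(int class_index)
{
    for (int i = 1; i < MAX_OBJECTS; i++) {
        if (!OBJECT(i).in_use) {
            OBJECT(i).in_use = 1;
            OBJECT(i).class_index = class_index;
            return i;
        }
    }
    return NULL_OBJECT;
}

/* Return a slot to the table; nothing goes back to the system heap. */
static void object_slot_free(int i)
{
    assert(i > NULL_OBJECT && i < MAX_OBJECTS);
    memset(&OBJECT(i), 0, sizeof OBJECT(i));
}
```

The point of the indirection is that a different allocator (for example, a growable table) would only require redefining OBJECT() and the two helpers, which is the kind of modularity the macros are meant to give.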
Perhaps a new and better GC interface would also be appropriate (as
Robin pointed out to me off the list). As he also pointed out, now is
the time to make an API change like this, before we get deeply into the
project as a group. I would like to see what this JVM has going for it
with its design in its Model "A" state, whether or not we adjust the GC
interface paradigm. Part of the reason I didn't supply GC is that
(1) it is a crucial element, (2) I've never done one, and (3) there are
people like Robin who have written honours theses on GC and are
therefore much more qualified.
---

GC is triggered in two cases:

  1) the user code calls System.gc().
  2) the heap fills up (for some suitable definition of 'fills up').

There is never any need for the VM code to call the garbage collector.

A consequence is that every call to 'new' needs to be a gc safe point.
If the heap is full, there's no way to keep executing until a timer
event triggers.

What the VM needs to do is to provide services that allow the GC to do
its job. These are at core:

  - A way to allocate bulk memory (e.g. mmap)
  - A way to enumerate roots (this is where stack scanning happens)
  - A scheduling mechanism (especially for parallel GC)
  - A way to enumerate the pointers in an object
  - Notification (which the GC can ignore) for pointer read and write
    operations (read and write barriers)

Understanding this will go a long way to getting past the disconnect we
currently have over GC issues. When I propose the new gc interfaces,
this should become more concrete.

> That depends on the GC implementation. Look at 'jvm/src/gc_stub.c'
> for the stub reference implementation. To see the mechanics of
> how to fit it into the compile environment, look at the GC and heap
> setup in 'config.sh' and at 'jvm/src/heap.h' for how multiple heap
> implementations get configured in.

As mentioned before, the heap *is* the GC.

---
I think we are using the terms "heap" and "GC" with slightly different
definitions.
My definitions are stated above, whereas I think you are using the
terms synonymously. Also, I have GC set up to meet the two conditions
you state. But a 'new' event never needs a GC safe point in this
implementation because of the outer/inner loop implementation on the
_same_ real-machine thread, as described in other posts to this list.
---

> The GC interface API that I defined may or may not be adequate
> for everything. I basically set it up so that any time an object
> reference was added or deleted, I called a GC function.

So is this a write barrier? I.e., are these functions called for every
PUTFIELD, PUTSTATIC and AASTORE bytecode?

---
No. There are no barriers of any kind except the mutex mechanism for
the time slice thread. The outer/inner loop interpreter structure
precludes the need for it. Notice that the implementation will
determine whether this is an efficient way to do it or not, especially
since I distinguish between fields and local variables.
---

> The same goes for
> class loading and unloading. For local variables on the JVM stack for
> each thread, the GC functions are slightly different than for fields
> in an object, but the principle is the same.

You can write the interface so that the GC needs to know when a new
class is loaded (or not, but IMO it's a good design). As far as the GC
is concerned, a class is alive as long as there are objects of that
type in the heap. If the class data structures are actually in the
heap, this becomes easy, but if you want to keep them on the VM side of
the fence, you could potentially hijack the weak reference mechanism to
get notified when the last object dies.

---
I suspected that this might be a good idea for classes. Thanks for the
confirmation. There should not be any reason to hijack the weak
reference mechanism with this GC interface design, as the GC mechanism
is notified when a class is deleted, which can _only_ occur when there
are no references to it.
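
To make the notification idea in this exchange concrete, here is a minimal sketch of a GC hook interface in which the interpreter reports every added or removed reference and a later collection pass reclaims what is no longer referenced. Everything in it is an assumption made for illustration: the gc_sketch_* names, the table sizes, and the use of plain reference counting are not taken from 'jvm/src/gc_stub.c' or from any proposed interface. It follows the rule stated above that a class stays alive while objects of that type exist, and it shows why unloading a class can take more than one pass.

```c
#include <assert.h>

#define SKETCH_MAX_OBJECTS 8   /* hypothetical table sizes */
#define SKETCH_MAX_CLASSES 4

typedef struct {
    int live;          /* slot currently holds an object */
    int refs;          /* references reported through the hooks below */
    int class_index;   /* class this object is an instance of */
} gc_object_slot;

typedef struct {
    int loaded;        /* slot currently holds a loaded class */
    int instances;     /* live objects of this class */
} gc_class_slot;

static gc_object_slot gc_objects[SKETCH_MAX_OBJECTS];
static gc_class_slot  gc_classes[SKETCH_MAX_CLASSES];

/* Hook: a class was loaded into slot c. */
static void gc_sketch_class_new(int c)
{
    gc_classes[c].loaded = 1;
    gc_classes[c].instances = 0;
}

/* Hook: an object of class c was created in slot o. */
static void gc_sketch_object_new(int o, int c)
{
    gc_objects[o].live = 1;
    gc_objects[o].refs = 0;
    gc_objects[o].class_index = c;
    gc_classes[c].instances++;
}

/* Hooks: a field or local variable started or stopped referring to o. */
static void gc_sketch_ref_add(int o)
{
    gc_objects[o].refs++;
}

static void gc_sketch_ref_remove(int o)
{
    assert(gc_objects[o].refs > 0);
    gc_objects[o].refs--;
}

/* One collection pass; returns how many slots were reclaimed.
 * Classes are examined before objects, so a class whose last instance
 * dies in this pass is only unloaded by the following pass. */
static int gc_sketch_run(void)
{
    int reclaimed = 0;
    for (int c = 0; c < SKETCH_MAX_CLASSES; c++) {
        if (gc_classes[c].loaded && gc_classes[c].instances == 0) {
            gc_classes[c].loaded = 0;
            reclaimed++;
        }
    }
    for (int o = 0; o < SKETCH_MAX_OBJECTS; o++) {
        if (gc_objects[o].live && gc_objects[o].refs == 0) {
            gc_objects[o].live = 0;
            gc_classes[gc_objects[o].class_index].instances--;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Two caveats on this sketch: simple counting like this cannot reclaim objects that refer to each other in a cycle, and a freshly loaded class with no instances yet would be unloaded by the very next pass, so a real collector would need something more than this.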

Maybe I should state something that I consider to be of value to this
JVM design. Both Robin and Rodrigo have been sniffing around the edges
of it in their critique, and they both have some _good_ points about
what I have put together. I am learning quite a bit as I think about
their issues.

I think this JVM design has some strong intrinsic features in that I
explicitly do _not_ depend on a lot of heap allocation for my major
runtime structures, that is, for those that exist over a significant
part of the life of the JVM, namely the thread, class, and object
tables. In their place, a somewhat static malloc-type allocation
(huh?) is done for the THREAD(), CLASS() and OBJECT() structures,
meaning that I do a single heap allocation for the whole of each table
and keep it until the JVM shuts down. (These table designs, of course,
may be changed as necessary.)

I do _nothing_ fancy in the way of managing memory. Period. And
intentionally. This attitude probably comes from my experience in
real-time embedded systems, where the resources are limited and
non-extensible. By applying what I consider to be _extremely_
conservative memory management tactics, I think that this design will
have some inherent reliability and speed built into it that may not be
obvious at first blush.

With that said, I am very interested in the numerous ideas for
architectural changes and improvements, and I look forward to Robin's
forthcoming suggestions for a new API for the GC mechanism.
---

Regards,
Robin

Dan Lydick