harmony-dev mailing list archives

From "Geir Magnusson Jr." <ge...@apache.org>
Subject Re: Some questions about the architecture
Date Fri, 21 Oct 2005 12:32:12 GMT
Gah! Dan (or should we call you AHBJ? :)  Can you turn on a standard  
quoting mechanism in your mailer to make threads easier to follow?

I keep running aground with this "-----" thing....

geir


On Oct 21, 2005, at 5:14 AM, Apache Harmony Bootstrap JVM wrote:

>
>
> -----Original Message-----
> From: Robin Garner <robin.garner@anu.edu.au>
> Sent: Oct 20, 2005 3:08 PM
> To: Apache Harmony Bootstrap JVM <bootjvm@earthlink.net>
> Cc: harmony-dev@incubator.apache.org
> Subject: Re: Some questions about the architecture
>
>
>> Robin, Rodrigo,
>>
>> Perhaps the two of you could get your heads together
>> on GC issues?  I think both of you have been thinking
>> along related lines on the structure of GC for this JVM.
>> What do you think?
>>
>
> I think the current challenge is to get the GC people and the VM people
> thinking along the same lines when it comes to GC issues.  I think we're
> both coming from the same place.
>
> ---
>
> Probably!
>
> ---
>
>
>> Further comments follow...
>>
>> -----Original Message-----
>> From: Rodrigo Kumpera <kumpera@gmail.com>
>> Sent: Oct 19, 2005 4:49 PM
>> To: harmony-dev@incubator.apache.org
>> Subject: Re: Some questions about the architecture
>>
>> On 10/19/05, Apache Harmony Bootstrap JVM <bootjvm@earthlink.net>  
>> wrote:
>>
>>>
>>>
>>> -----Original Message-----
>>> From: Rodrigo Kumpera <kumpera@gmail.com>
>>> Sent: Oct 19, 2005 1:49 PM
>>> To: harmony-dev@incubator.apache.org, Apache Harmony Bootstrap JVM
>>> <bootjvm@earthlink.net>
>>> Subject: Re: Some questions about the architecture
>>>
>>> On 10/19/05, Apache Harmony Bootstrap JVM <bootjvm@earthlink.net>  
>>> wrote:
>>>
>>>>
>>>>
>> ...snip...
>>
>>>
>>> Notice that in 'jvm/src/jvmcfg.h' there is a JVMCFG_GC_THREAD
>>> that is used in jvm_run() as a regular thread like any other.
>>> It calls gc_run() on a scheduled basis.  Also, any time an object
>>> finalize() is done, gc_run() is possible.  Yes, I treat GC as a
>>> stop-the-world process, but here is the key:  Due to the lack
>>> of asynchronous native POSIX threads, there are no safe points
>>> required.  The only thread is the SIGALRM target that sets the
>>> volatile boolean in timeslice_tick() for use by opcode_run() to
>>> test.  <b>This is the _only_ formally asynchronous data structure in
>>> the whole machine.</b>  (Bold if you use an HTML browser, otherwise
>>> clutter meant for emphasis.)  Objects that contain no references can
>>> be GC'd since they are merely table entries.  Depending on how the
>>> GC algorithm is done, gc_run() may or may not even need to look
>>> at a particular object.
>>>
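A minimal sketch of the flag-polling pattern described above, assuming a POSIX system. The names timeslice_expired, on_sigalrm(), timeslice_install() and run_until_tick() are hypothetical stand-ins, not the actual timeslice_tick() or opcode_run() code: the signal handler only sets a flag, and the interpreter loop tests it between opcodes.

```c
#include <signal.h>

/* Hypothetical stand-in for the flag that timeslice_tick() sets.
 * sig_atomic_t is the type C guarantees can be written from a handler. */
static volatile sig_atomic_t timeslice_expired = 0;

/* SIGALRM handler: the only asynchronous writer in this sketch. */
void on_sigalrm(int signo)
{
    (void)signo;
    timeslice_expired = 1;
}

/* Install the handler.  Returns 0 on success, -1 on failure. */
int timeslice_install(void)
{
    return (signal(SIGALRM, on_sigalrm) == SIG_ERR) ? -1 : 0;
}

/* Inner loop: clear the flag, then execute up to max_ops placeholder
 * "opcodes", polling the flag before each one.  Returns the number of
 * opcodes executed before the tick arrived or max_ops was reached. */
long run_until_tick(long max_ops)
{
    long executed = 0;

    timeslice_expired = 0;
    while (executed < max_ops && !timeslice_expired) {
        executed++;            /* placeholder for running one opcode */
    }
    return executed;
}
```

Because everything else runs on the one interpreter thread, the flag is the only datum shared with asynchronous code in this sketch.
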
>>> Notice also that classes are treated in the same way by the GC API.
>>> If a class is no longer referenced by any objects, it may be GC'd also.
>>> First, its intrinsic class object must be GC'd, then the class itself.
>>> This may take more than one pass of gc_run() to make it happen.
>>>
>
> There's a major misconception here.  As I was describing it to someone a
> while ago, conceptually a garbage collected heap is actually simpler than
> an explicitly managed heap.  The standard heap has 'malloc' and 'free'.  A
> managed heap (with GC) just has 'malloc'.
>
> In practice it's more complex but the principle is the same.  From the
> interpreter's point of view, you just allocate.  Forever.  Reclaiming free
> space is the GC's problem, because it's the only part of the VM that can
> know when something is dead.  Things die when (or soon after) all
> references to them die.
>
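The "just malloc" view can be illustrated with a bump allocator. This is a sketch under stated assumptions, not code from the bootstrap JVM: managed_alloc(), ARENA_BYTES and the fixed static arena are all hypothetical. The mutator-facing interface has an allocate call and no free call; reclaiming space would be the collector's job and is not shown.

```c
#include <stddef.h>

#define ARENA_BYTES 1024u   /* assumed arena size, for illustration only */

/* The union forces alignment suitable for the common scalar types. */
static union {
    unsigned char bytes[ARENA_BYTES];
    long long     align_ll;
    long double   align_ld;
    void         *align_ptr;
} arena;

static size_t arena_used = 0;

/* Allocate nbytes, rounded up to a multiple of 8.  Returns NULL when the
 * arena cannot satisfy the request.  There is deliberately no free(). */
void *managed_alloc(size_t nbytes)
{
    size_t rounded = (nbytes + 7u) & ~(size_t)7u;
    void *p;

    if (rounded < nbytes || rounded > ARENA_BYTES - arena_used) {
        return NULL;        /* overflow, or the arena is full */
    }
    p = &arena.bytes[arena_used];
    arena_used += rounded;
    return p;
}
```

A NULL return here is the point at which a real managed heap would run a collection instead of giving up.
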
> ---
>
> This design uses "heap.h" and friends _only_ to manage internal JVM
> data structures.  That heap is _never_, repeat _never_, available to,
> visible to, or controllable by Java bytecodes, directly or indirectly.
> The one exception is the pair of functions object_instance_new() and
> object_instance_delete(), and then only for array objects and for the
> arrays of 'jvalue' that hold the fields in a class (one for static
> fields, the other for instance fields).  Those go to the heap for their
> data storage only: a 'jlong' array of 10 elements gets 10 x 8, or 80
> bytes, and an object with, say, 5 fields gets 5 x 8, or 40 bytes, with
> the rest coming out of the object table.
> I have separated GC out as a _completely_ different issue that relates
> _only_ to the effects of Java bytecodes.
>
> In other words, I have two separate memory management domains
> so that, no matter what sort of GC is used by the virtual machine,
> the real-machine code that implements it is not affected by it at all.
> In this way, GC can have neither a positive nor a negative effect
> on the real-machine implementation of the JVM.  This was by design,
> and whether or not this was a good design choice, I think that is for
> more experienced JVM architects than myself to decide.
>
> By virtue of having GC control when object references are reclaimed,
> the array and field storage ultimately falls under its control  
> instead of
> being explicitly managed.  object_instance_new() and
> object_instance_delete() are controlled by 'new' and by GC, respectively.
>
> *** WHAT SAY YOU JVM EXPERTS LURKING OUT THERE?  I KNOW
>       YOU'RE READING THIS!  Please speak up!  I would like to hear
>       what your experience has been so we can create the best
>       solution to the issue of real and virtual machine memory  
> management.  ***
>
> Now, to be fair and give complete disclosure at this time: my object
> allocation comes from a static array, 'pjvm->object[]', of 'robject'
> entries, which has a fixed maximum size.  The same goes for classes, with
> the 'pjvm->class[]' array of 'rclass'.  The OBJECT() and CLASS() macros
> can be adjusted to reflect any different allocation mechanism that might
> be chosen for any implementation, either now or in the future, hopefully
> making this JVM _extremely_ modular.  (See also 'README' for a section on
> "Subsystem component abstraction".)
>
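A sketch of the fixed-size table idea, with access funneled through a macro so the storage strategy can change in one place. Only the names pjvm, object[] and OBJECT() come from the text above; robject_sketch, rjvm_sketch, MAX_OBJECTS, the reserved slot 0 and the two slot functions are assumptions made for illustration and do not reflect the real structure layouts.

```c
#include <stddef.h>

#define MAX_OBJECTS 8        /* assumed table size, for illustration */
#define NULL_SLOT   0        /* assumption: slot 0 is reserved as "null" */

typedef struct {
    int in_use;              /* nonzero while the slot holds an object */
    int class_index;         /* hypothetical link into a class table */
} robject_sketch;

typedef struct {
    robject_sketch object[MAX_OBJECTS];
} rjvm_sketch;

static rjvm_sketch  jvm_storage;
static rjvm_sketch *pjvm = &jvm_storage;

/* Every access goes through the macro, so swapping the static array
 * for another allocation mechanism means editing only this line. */
#define OBJECT(i) (pjvm->object[(i)])

/* Claim the first free slot.  Returns its index, or NULL_SLOT if full. */
int object_slot_alloc(int class_index)
{
    int i;

    for (i = 1; i < MAX_OBJECTS; i++) {
        if (!OBJECT(i).in_use) {
            OBJECT(i).in_use = 1;
            OBJECT(i).class_index = class_index;
            return i;
        }
    }
    return NULL_SLOT;
}

/* Release a slot so a later allocation can reuse it. */
void object_slot_free(int i)
{
    if (i > NULL_SLOT && i < MAX_OBJECTS) {
        OBJECT(i).in_use = 0;
    }
}
```

The table is allocated once and never grows, which matches the conservative, embedded-style approach described in this message.
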
> Keep in mind that this JVM was _not_ designed with blinding speed  
> in mind for
> its first cut, but with the Henry Ford approach:
>
>     1.  Sweat blood and create a Model "A".
>     2.  Sell enough to make it worth the while.
>     3.  Work on improvements and create a Model "B".
>     4.  Go from one failure to the next with no loss of enthusiasm  
> (Quote from Mr. Ford)
>     5.  Get down to the Model "K", which had some significant success.
>     6.  Keep working until you build the Model "T", which sold by  
> the million.
>
> I guess I'd like us to get the Model "A" out the door even as we  
> look toward
> improvements such as are being suggested from a number of folks.   
> If we
> need to adjust the heap and GC models, sure, we can do it.  And  
> perhaps a
> new and better GC interface would be appropriate (As Robin pointed  
> out to me
> off the list).  As he also pointed out, now is the time to make an  
> API change
> like this before we get deeply into the project as a group.
>
> I would like to see what this JVM has going for it with its design  
> in its Model "A" state,
> whether or not we adjust the GC interface paradigm.  Part of the  
> reason I didn't
> supply GC is (1) it is a crucial element, and (2) I've never done  
> one, and (3) there
> are people like Robin who have written honours theses on GC and are  
> therefore
> much more qualified.
>
> ---
>
>
> GC is triggered in two cases: 1) the user code calls System.gc(); 2) the
> heap fills up (for some suitable definition of 'fills up').  There is
> never any need for the VM code to call the garbage collector.
>
> A consequence is that every call to 'new' needs to be a GC safe point.  If
> the heap is full, there's no way to keep executing until a timer event
> triggers.
>
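The "heap fills up" trigger can be sketched as an allocation routine that runs a collection and retries before it reports failure. Everything here is hypothetical: heap_new(), gc_collect_sketch() and the slot arrays are illustration only, and the slot_live[] flags stand in for the reachability information a real collector would compute by tracing from the roots.

```c
#define HEAP_SLOTS 4                 /* assumed tiny heap, for illustration */

static int slot_used[HEAP_SLOTS];    /* 1 = slot currently allocated */
static int slot_live[HEAP_SLOTS];    /* 1 = still reachable (stand-in) */
static int gc_cycles = 0;            /* how many collections have run */

/* Hypothetical collector: frees every used slot not marked live.
 * Returns the number of slots reclaimed. */
int gc_collect_sketch(void)
{
    int i, reclaimed = 0;

    gc_cycles++;
    for (i = 0; i < HEAP_SLOTS; i++) {
        if (slot_used[i] && !slot_live[i]) {
            slot_used[i] = 0;
            reclaimed++;
        }
    }
    return reclaimed;
}

/* Allocation as a GC safe point: if no slot is free, collect once and
 * retry.  Returns the slot index, or -1 if the heap is still full. */
int heap_new(void)
{
    int attempt, i;

    for (attempt = 0; attempt < 2; attempt++) {
        for (i = 0; i < HEAP_SLOTS; i++) {
            if (!slot_used[i]) {
                slot_used[i] = 1;
                slot_live[i] = 1;
                return i;
            }
        }
        if (attempt == 0 && gc_collect_sketch() == 0) {
            break;                   /* nothing reclaimed; give up */
        }
    }
    return -1;
}
```

In this sketch the collector is entered only from the allocator, never on a timer, which is the behaviour argued for in the quoted paragraph above.
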
> What the VM needs to do is to provide services that allow the GC to  
> do its
> job.  These are at core:
> - A way to allocate bulk memory (eg mmap)
> - A way to enumerate roots (this is where stack scanning happens)
> - A scheduling mechanism (especially for parallel GC)
> - A way to enumerate the pointers in an object
> - Notification (which the GC can ignore) for pointer read and write
> operations (read and write barriers)
>
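One way to picture the services listed above is a table of function pointers that the VM hands to the collector. This is only a sketch of the shape such an interface could take, not the interface that was later proposed; vm_services, ref_visitor, the toy_ names and gc_count_nonnull_roots() are all hypothetical. Barrier notifications flow the other way (VM to GC), so they are left out of this struct.

```c
#include <stddef.h>

/* Callback the GC supplies; it is handed the address of each reference. */
typedef void (*ref_visitor)(void **slot, void *ctx);

/* Services the VM provides so that the GC can do its job. */
typedef struct vm_services {
    void *(*alloc_bulk)(size_t nbytes);                      /* e.g. mmap */
    void  (*enumerate_roots)(ref_visitor visit, void *ctx);  /* stack scan */
    void  (*enumerate_pointers)(void *object,
                                ref_visitor visit, void *ctx);
    void  (*stop_mutators)(void);                            /* scheduling */
    void  (*resume_mutators)(void);
} vm_services;

/* Toy VM whose only roots are three static reference slots. */
static void *toy_roots[3];

static void toy_enumerate_roots(ref_visitor visit, void *ctx)
{
    size_t i;

    for (i = 0; i < sizeof toy_roots / sizeof toy_roots[0]; i++) {
        visit(&toy_roots[i], ctx);
    }
}

/* Services not needed by this sketch are left NULL. */
static const vm_services toy_vm = {
    .enumerate_roots = toy_enumerate_roots
};

/* GC-side visitor: counts the roots that hold a non-null reference. */
static void count_visit(void **slot, void *ctx)
{
    if (*slot != NULL) {
        ++*(size_t *)ctx;
    }
}

/* GC-side use of the interface: walk the roots through the VM's table. */
size_t gc_count_nonnull_roots(const vm_services *vm)
{
    size_t n = 0;

    vm->enumerate_roots(count_visit, &n);
    return n;
}
```

The collector never touches VM stacks directly in this arrangement; it sees only the reference slots that the VM chooses to enumerate.
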
> Understanding this will go a long way to getting past the  
> disconnect we
> currently have over GC issues.  When I propose the new gc  
> interfaces, this
> should become more concrete.
>
>
>> That depends on the GC implementation.  Look at 'jvm/src/gc_stub.c'
>> for the stub reference implementation.  To see the mechanics of
>> how to fit it into the compile environment, look at the GC and heap
>> setup in 'config.sh' and at 'jvm/src/heap.h' for how multiple heap
>> implementations get configured in.
>>
>
> As mentioned before, the heap *is* the GC.
>
> ---
>
> I think we are using the terms "heap" and "GC" with slightly different
> definitions.  My definitions are stated above, where I think you  
> are using
> the terms synonymously.
>
> Also, I have GC set up to meet the two conditions you state.  But a  
> 'new'
> event never needs a GC safe point in this implementation because of  
> the
> outer/inner loop implementation on the _same_ real-machine thread, as
> described in other posts to this list.
>
> ---
>
>
>> The GC interface API that I defined may or may not be adequate
>> for everything.  I basically set it up so that any time an object
>> reference was added or deleted, I called a GC function.
>>
>
> So is this a write barrier ?  IE, are these functions called for every
> PUTFIELD, PUTSTATIC and AASTORE bytecode ?
>
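For readers unfamiliar with the term, a write barrier in this sense is a hook that runs on every reference store. The sketch below shows the general pattern only; obj, gc_write_barrier(), putfield_ref() and barrier_hits are hypothetical, and the hook merely counts calls where a real collector would record the store (for example in a remembered set).

```c
#include <stddef.h>

/* Toy object with two reference fields. */
typedef struct obj {
    struct obj *field[2];
} obj;

static unsigned long barrier_hits = 0;

/* Hypothetical GC hook.  It is told which object is written, which slot,
 * and the new value; this sketch ignores them and counts the call. */
static void gc_write_barrier(obj *holder, obj **slot, obj *new_value)
{
    (void)holder;
    (void)slot;
    (void)new_value;
    barrier_hits++;
}

/* What a reference-typed PUTFIELD would do in an interpreter that has a
 * write barrier: notify the GC first, then perform the store. */
void putfield_ref(obj *holder, size_t index, obj *value)
{
    obj **slot = &holder->field[index];

    gc_write_barrier(holder, slot, value);
    *slot = value;
}
```

PUTSTATIC and AASTORE would call the same hook with a static slot or an array element slot in place of the field slot.
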
> ---
>
> No.  There are no barriers of any kind except the mutex mechanism  
> for the
> time slice thread.  The outer/inner loop interpreter structure  
> precludes the
> need for it.
>
> Notice that the implementation will determine whether this is an  
> efficient way
> to do it or not, especially since I distinguish between fields and  
> local variables.
>
> ---
>
>
>> The same goes for class loading and unloading.  For local variables on
>> the JVM stack for each thread, the GC functions are slightly different
>> than for fields in an object, but the principle is the same.
>>
>
> You can write the interface so that the GC needs to know when a new  
> class
> is loaded (or not, but IMO it's a good design).  As far as the GC is
> concerned, a class is alive as long as there are objects of that  
> type in
> the heap.  If the class data structures are actually in the heap, this
> becomes easy, but if you want to keep them on the VM side of the  
> fence,
> you could potentially hijack the weak reference mechanism to get  
> notified
> when the last object dies.
>
> ---
>
> I suspected that this might be a good idea for classes.  Thanks for
> the confirmation.
>
> There should not be any reason to hijack the weak reference
> mechanism with this GC interface design as the GC mechanism
> is notified when a class is deleted, which can _only_ occur when
> there are no references to it.
>
> Maybe I should state something that I consider to be of value to this
> JVM design.  Both Robin and Rodrigo have been sniffing around the
> edges of it in their critique.  And they both have some _good_ points
> about what I have put together.  And I am learning quite a bit as I
> think about their issues.
>
> I think this JVM design has some strong intrinsic features in that I
> explicitly do _not_ depend on a lot of heap allocation for my major
> runtime structures, that is, for those that exist over a  
> significant part
> of the life of the JVM, namely the thread, class, and object tables.
> In their place, a somewhat static malloc-type allocation (huh?) is  
> done
> for THREAD(), CLASS() and OBJECT() structures-- meaning that I do
> a single heap allocation for the whole of each table and keep it until
> the JVM shuts down.  (These table designs, of course, may be
> changed as necessary.)  I do _nothing_ fancy in the way of
> managing memory.  Period.  And Intentionally.  This attitude probably
> comes from my experience in real-time embedded systems where the
> resources are limited and non-extensible.  By applying what I consider
> to be _extremely_ conservative memory management tactics, I think
> that this design will have some inherent reliability and speed  
> built into
> it that may not be obvious upon first blush.  With that said, I am  
> very
> interested in the numerous ideas for architectural changes and
> improvements, and I look forward to Robin's forthcoming suggestions
> for a new API for the GC mechanism.
>
> ---
>
>
> Regards,
> Robin
>
>
>
>
>
>
> Dan Lydick
>

-- 
Geir Magnusson Jr                                  +1-203-665-6437
geirm@apache.org


