harmony-dev mailing list archives

From Apache Harmony Bootstrap JVM <boot...@earthlink.net>
Subject Re: Some questions about the architecture
Date Wed, 19 Oct 2005 20:26:36 GMT

-----Original Message-----
From: Rodrigo Kumpera <kumpera@gmail.com>
Sent: Oct 19, 2005 1:49 PM
To: harmony-dev@incubator.apache.org, Apache Harmony Bootstrap JVM <bootjvm@earthlink.net>
Subject: Re: Some questions about the architecture

On 10/19/05, Apache Harmony Bootstrap JVM <bootjvm@earthlink.net> wrote:
> Rodrigo,
> At some point, _somebody_ has to wait on I/O.  I agree
> that this is not the most efficient implementation, but one
> of the advantages it has is that it does not need _any_
> gc_safepoint() type calls for read or write barriers.
> I am _definitely_ interested in your suggestions, and
> I think others will agree with you, but let's get the code
> up and running as it stands so we can try other approaches
> and compare what good things they bring to the table
> instead of, or even in addition to, the existing approach.

I think I have not been clear enough. Safepoints are needed by the
garbage collector to know when it is safe to stop a given thread (in
bounded time) for a stop-the-world garbage collection. This has
nothing to do with read/write barriers.


Notice that in 'jvm/src/jvmcfg.h' there is a JVMCFG_GC_THREAD
that is used in jvm_run() as a regular thread like any other.
It calls gc_run() on a scheduled basis.  Also, any time an object's
finalize() method is run, gc_run() may be called.  Yes, I treat GC as
a stop-the-world process, but here is the key:  Because the JVM does
not use asynchronous native POSIX threads, no safepoints are
required.  The only asynchronous agent is the SIGALRM handler, which
sets the volatile boolean in timeslice_tick() for opcode_run() to
test.  <b>This is the _only_ formally asynchronous data structure in
the whole machine.</b>  (Bold if you use an HTML browser, otherwise
clutter meant for emphasis.)  Objects that contain no references can
be GC'd since they are merely table entries.  Depending on how the
GC algorithm is done, gc_run() may or may not even need to look
at a particular object.

Notice also that classes are treated in the same way by the GC API.
If a class is no longer referenced by any objects, it may be GC'd also.
First, its intrinsic class object must be GC'd, then the class itself.  This
may take more than one pass of gc_run() to make it happen.


For example, as I understand it, JikesRVM implements GC safepoints
(the points in the bytecode where GC maps are generated) at loop
backedges and method calls.

> The priorities that I set were (1) get the logic working
> without resorting to design changes such as multi-threading,
> then (2) optimize the implementation and evaluate
> improvements and architectural changes, then (3) implement
> improvements and architectural changes.  The same goes
> for the object model using the OBJECT() macro and the
> 'robject' structure in 'jvm/src/object.h'.  And the CLASS()
> macro, and the STACK() macro, and other components
> that I have tried to implement in a modular fashion (see 'README'
> for a discussion of this issue).  Let's get it working, then look into
> design changes, even having more than one option available at
> configuration time, compile time, or even run time, such as is
> now the case with the HEAP_xxx() macros and the GC_xxx()
> macros that Robin Garner has been asking about.
> As to the 'jvm/src/timeslice.c' code, notice that each
> time that SIGALRM is received, the handler sets a
> volatile boolean that is read by the JVM inner loop
> in 'while ( ... || (rfalse == pjvm->timeslice_expired))'
> in 'jvm/src/opcode.c' to check if it is time to give the
> next thread some time.  I don't expect this to be the
> most efficient check, but it _should_ work properly
> since I have unit tested the time slicing code, both
> the while() test and the setting of the boolean in
> timeslice_tick().  One thing I have heard on this
> list is that one of the implementations (IBM's Jikes, I think?)
> chose an interpreter over a JIT.  Now that is not directly related to time
> slicing, but it does mean that a mechanism like what I
> implemented does not have to have compile-time
> support.
> *** How about you JVM experts out there?  Do you have
>       any wisdom for me on the subject of time slicing
>       on an outer/inner interpreter loop interpreter
>       implementation?  And compared to JIT?  Archie Cobb,
>       what do you think?  How about you lurkers out there? ***

All open source JVMs I checked use native threads; you can take a look
at how IBM did it with Next Generation POSIX Threads (NGPT), which
implements userland threads on Linux.


I would be interested in your evaluation of the existing implementation
against what could be done to implement such an approach.


> As to your question about setjmp/longjmp, I agree that
> there are other ways to do it.  In fact, my original heap
> allocator, which used malloc/free, used a form of stack walking
> to return from fatal errors.  If I got an error
> from malloc(), I simply returned a NULL pointer, which
> I tested from the calling function.  If I got this error,
> I returned to its caller with an error, and so on, all the
> way up.  However, what happens when you have a
> normally (void) return?  Use TRUE/FALSE instead?
> Could be.  But the more I developed the code, the
> harder this became to support.  Therefore, since fatal
> errors kill the application anyway, I decided to _VASTLY_
> simplify the code by using what is effectively the OO concept
> of an exception as available in the 'C' runtime library
> with setjmp/longjmp.  Notice that many complicated models
> can end up with irresolvable terminal conditions and that
> the simplest way to escape is back to a known good state.
> This is the purpose of setjmp/longjmp.  Try this on for size
> some time with any communication protocol implementation, such
> as TCP/IP.  When you get to a snarled condition where
> there just is not any graceful way out, the non-local character
> of setjmp/longjmp cuts that knot instead of untying it with
> horrible error code checking back up the stack.  This is why
> I finally decided to go this way.  (Does this answer your main
> question here?)

It does, but by stack walking I meant not returning null, but having
the code analyze the call stack for a proper IP address to use.

What do you mean by 'IP address' in this context?  I think I am
missing something.

> Also, I sort of get the impression that you may be blurring the
> distinction between the native 'C' code runtime environment
> and the virtual Java runtime environment when you talk
> about serialization, security, GC, and JNI.  (This is _very_
> easy to do!  This is why I begin my real-machine data types
> with 'r' and Java data types with 'j'.  I was confusing myself
> all the time!)  Obviously, there is no such thing as setjmp/longjmp
> in the OO paradigm, but they do have a better method,
> namely, the concept of the exception.  That is effectively
> what I have tried to implement here in the native 'C' code
> on the real platform, to use OO terms.  Did I misunderstand you?

Not exactly.  GC must walk the stack to find the root set;
serialization needs to find the most recent user class loader on the
stack (since it is the one used to look up classes for
deserialization); security needs to walk the stack to perform checks
on the code base of each method on it; and JNI needs this because
exceptions are queued for use by the ExceptionOccurred call.

I did look at opcode.c and thread.c but I could not find the stack
unwinding code; could you point me to where it is located?


Which stack do you mean?  A thread's JVM stack?  The real machine
stack?  I think I'm confused.


> Thanks,
> Dan Lydick

Dan Lydick
