harmony-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Steve Blackburn <Steve.Blackb...@anu.edu.au>
Subject Re: I hope the JVM implements most using Java itself
Date Fri, 13 May 2005 06:31:34 GMT
Hi Dmitry,

> <constructive_interest> 

[...]

> First one is of the chicken-vs-egg variety -- as the GC algorithm 
> written in Java executes, won't it generate garbage of its own, and 
> won't it then need to be stopped and the tiny little "real" garbage 
> collector run to clean up after it? I can only see two alternatives -- 
> either it is going to cleanup its own garbage, which would be truly 
> fantastacal... Or it will somehow not generate any garbage, which I 
> think is not realistic for a Java program...

This is a very important issue.

The short answer is as follows:
    a) Within the GC code itself we don't really use Java, we use
       a special subset of Java and a few extensions.
    b) We never call new() within the GC at runtime
    c) We try not to collect ourselves ;-)

You will find the long answer buried in the source code and a somewhat 
out of date paper:

http://cvs.sourceforge.net/viewcvs.py/jikesrvm/MMTk/
http://jikesrvm.sourceforge.net/api/org/vmmagic/unboxed/package-summary.html
http://jikesrvm.sourceforge.net/api/org/vmmagic/pragma/package-summary.html
http://cs.anu.edu.au/~Steve.Blackburn/pubs/abstracts.html#mmtk-icse-2004

I'll try to give a more succinct answer here:

As for a), we essentially apply a few design patterns and idioms for 
correctness and performance (more on performance later).  We don't use 
patterns that depend on allocating instances.  In fact the only 
instances we create are per-thread metadata instances which drive the 
GC.  These are allocated only when new threads are instantiated 
(actually these are per posix thread, Jikes RVM uses an N-M threading 
model).

As for b), there is not much call for dynamic memory management within a 
GC.  The exceptions are a) short-lived metadata such as work queues, and 
b) per-object metadata such as that associated with free lists and mark 
bits etc etc.  We solve this by explicitly managing these special cases 
from within our own framework.  We have a queue mechanism that works off 
raw memory and a mechanism for associating metadata with allocated 
space.  The details are beyond the scope of this email.

Actually c) is one of the hardest parts.  It is essential that heap 
objects associated with the VM and the GC are not inadvertently 
collected.  This requires some very careful thought (remembering that 
the compiler will place our *code* into the heap too!).

As to whether this is feasible, its been done at least three times 
over.  First in the original Jalapeno, then in GCTk (developed while I 
was at UMass) and now MMTk.  Right now I am working with my students 
here to push the MMTk design even cleaner while not sacrificing 
performance---fun!

So, can it perform?  Well it is very hard to do apples to apples 
comparisons, but we measure the performance of our raw mechanisms with C 
implementations as a milestone and we do very well (by this I mean we 
can beat glibc's malloc for allocation performance, but this claim needs 
to be covered with caveats because it is very hard to make fair 
comparisons).  So the raw mechanisms perform well.  But then the 
software engineering benefits of Java come to the fore and our capacity 
to implement a toolkit and thus have a choice of many different GC 
algorithms gives us a real advantage (the GC mechanism/algorithm thing 
was the subject of a previous thread).

I've glossed over a huge amount of important stuff (like how we get raw 
memory from the OS, how we introduce type safe pointers and object 
references, etc etc).

> To summarize (and to get to the question already) - the point is that 
> language shapes thought. In other words, a program designed in Java 
> will naturally tend to be slower then a program designed in C, simply 
> because Java most naturally expresses slower designs then C does. And 
> the question is - does this agree with anyone elses experiences? And 
> if it does, is it a valid argument against using Java for the design 
> and implementation of the VM?

OK so there is already at least one response to this, but let me add my 
experience.

I am very focused on performance.  The approach Perry Cheng and I took 
when writing the code for MMTk was very much that premature optimization 
is indeed the root of all evil.  Moreover, we placed enormous faith in 
the optimizing compiler.  The philosophy was to assume the optimizing 
compiler was smart enough to optimize around our coding abstractions, 
and then to do careful performance analysis after the fact and see where 
we were being let down.  In some cases the compiler was improved to deal 
with our approach, other times we modified our approach.

Over time we learned certain idioms which on one hand meant we tended to 
get reasonable performance first shot, but on the other may have 
undermined the natural Java style we started with.

While I understand what you mean when you say: "a program designed in 
Java will naturally tend to be slower then a program designed in C", 
addressing that concern is one of the most important challenges of 
language implementation, and is why Java performance has improved so 
greatly over the past five years.

> </constructive_interest>


Cheers,

--Steve

Mime
View raw message