harmony-dev mailing list archives

From Steve Blackburn <Steve.Blackb...@anu.edu.au>
Subject Re: Harmony Project Structure Attempt
Date Tue, 17 May 2005 21:36:27 GMT
I appeal to everyone to get the facts on the Java-in-Java issue.

It is an important issue and despite a number of comprehensive posts to 
the list from a variety of writers, people still perpetuate ideas which 
have been debunked on the list.

Listreader wrote:

>>A lot of people have expressed interest in a JVM written in Java, provided
>>performance is adequate. There has been substantial evidence presented to
>>show that it can be.
>>    
>>
>
>I simply collected what I felt was the opinion of the majority of posts
>on the list thus far. Personally, I care little about the exact language
>as long as it is relevant and optimal for writing the compiler.
>
>Since Java lacks low-level memory management capabilities, and the 
>JVM obviously needs to deal with these issues, I would be somewhat 
>hesitant to write the full JVM in Java myself. 
>  
>
Please read through the earlier posts.

1. Java-in-Java VMs deal with low-level memory management through a 
small set of type safe extensions, supported by the compiler, which 
allow typed access to memory in a variety of ways.

http://jikesrvm.sourceforge.net/api/org/vmmagic/unboxed/package-summary.html

2. Having written an extensive high performance memory management 
toolkit, which includes a wide range of GC algorithms, and which can 
outperform the standard glibc malloc(), I can assure you (as I and 
others have stated in previous posts), that a Java-in-Java VM is not 
encumbered by any such limitations.

3. The Java-in-Java VM has some performance *advantages* through the 
lack of impedance mismatch between the supported language and the 
implementation language (this is one of the reasons for 2. above).

4. There are a number of successful instances of this technology, 
including OVM, Jikes RVM, and Bartok (a C# in C# VM at MSR which can 
match the MS product VM on some benchmarks, despite having vastly less 
resources applied to it).
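
To make point 1 concrete, here is a minimal sketch of the idea of typed
access to memory. It is not the org.vmmagic API: the names Addr, MEMORY,
and fill are hypothetical, and a direct ByteBuffer stands in for raw
memory. In a real Java-in-Java VM the compiler treats such address types
specially so that they cost no more than a machine word; in this plain-Java
sketch each Addr is an ordinary heap object.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class TypedMemorySketch {
    // Hypothetical stand-in for raw memory; a real VM would obtain pages from the OS.
    static final ByteBuffer MEMORY =
        ByteBuffer.allocateDirect(1024).order(ByteOrder.nativeOrder());

    // Hypothetical typed address: a distinct type, so it cannot be confused
    // with an ordinary int or with an object reference.
    static final class Addr {
        private final int offset;

        private Addr(int offset) { this.offset = offset; }

        static Addr fromOffset(int offset) { return new Addr(offset); }

        // Address arithmetic yields another Addr, never a bare integer.
        Addr plus(int bytes) { return new Addr(offset + bytes); }

        int loadInt() { return MEMORY.getInt(offset); }

        void storeInt(int value) { MEMORY.putInt(offset, value); }

        boolean lessThan(Addr other) { return offset < other.offset; }
    }

    // Store `value` into `count` consecutive int slots and return the end address.
    static Addr fill(Addr start, int count, int value) {
        Addr cursor = start;
        Addr end = start.plus(count * Integer.BYTES);
        while (cursor.lessThan(end)) {
            cursor.storeInt(value);
            cursor = cursor.plus(Integer.BYTES);
        }
        return end;
    }

    public static void main(String[] args) {
        Addr base = Addr.fromOffset(64);
        Addr end = fill(base, 4, 7);
        System.out.println(base.plus(12).loadInt()); // last filled slot
        System.out.println(end.loadInt());           // first slot past the fill
    }
}
```

The point of the distinct type is that loads, stores, and pointer
arithmetic all go through Addr, so the type checker rejects code that
mixes addresses with plain integers.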

If the community decides that it would be more helpful to build the VM 
in C++ that's fine. I for one hope that the component-based architecture 
will support a variety of implementation languages.

But let's not make such a choice from a position of ignorance.

--Steve

> Hi Dmitry,
>
>> <constructive_interest> 
>
>
> [...]
>
>> First one is of the chicken-vs-egg variety -- as the GC algorithm 
>> written in Java executes, won't it generate garbage of its own, and 
>> won't it then need to be stopped and the tiny little "real" garbage 
>> collector run to clean up after it? I can only see two alternatives 
>> -- either it is going to clean up its own garbage, which would be 
>> truly fantastical... Or it will somehow not generate any garbage, 
>> which I think is not realistic for a Java program...
>
>
> This is a very important issue.
>
> The short answer is as follows:
>    a) Within the GC code itself we don't really use Java, we use
>       a special subset of Java and a few extensions.
>    b) We never call new() within the GC at runtime
>    c) We try not to collect ourselves
>
> You will find the long answer buried in the source code and a somewhat 
> out of date paper:
>
> http://cvs.sourceforge.net/viewcvs.py/jikesrvm/MMTk/
> http://jikesrvm.sourceforge.net/api/org/vmmagic/unboxed/package-summary.html 
>
> http://jikesrvm.sourceforge.net/api/org/vmmagic/pragma/package-summary.html 
>
> http://cs.anu.edu.au/~Steve.Blackburn/pubs/abstracts.html#mmtk-icse-2004
>
> I'll try to give a more succinct answer here:
>
> As for a), we essentially apply a few design patterns and idioms for 
> correctness and performance (more on performance later).  We don't use 
> patterns that depend on allocating instances.  In fact the only 
> instances we create are per-thread metadata instances which drive the 
> GC.  These are allocated only when new threads are instantiated 
> (actually these are per POSIX thread, Jikes RVM uses an N-M threading 
> model).
>
> As for b), there is not much call for dynamic memory management within 
> a GC.  The exceptions are a) short-lived metadata such as work queues, 
> and b) per-object metadata such as that associated with free lists and 
> mark bits etc etc.  We solve this by explicitly managing these special 
> cases from within our own framework.  We have a queue mechanism that 
> works off raw memory and a mechanism for associating metadata with 
> allocated space.  The details are beyond the scope of this email.
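
As a rough illustration of a queue that works off raw memory, here is a
minimal sketch. It is not MMTk code: the class RawWorkQueueSketch, the
EMPTY sentinel, and the use of a direct ByteBuffer as the "raw" region are
all assumptions made for the example. The property it shows is that the
only allocation happens once at setup; enqueue and dequeue never call new.

```java
import java.nio.ByteBuffer;

class RawWorkQueueSketch {
    // Hypothetical sentinel: address 0 is assumed never to be a valid entry.
    static final long EMPTY = 0L;

    private final ByteBuffer region; // reserved once, before any collection starts
    private int head;                // byte offset of the next entry to dequeue
    private int tail;                // byte offset of the next free slot

    RawWorkQueueSketch(int capacityEntries) {
        region = ByteBuffer.allocateDirect(capacityEntries * Long.BYTES);
    }

    // Append an address; returns false rather than growing when the region is full.
    boolean enqueue(long address) {
        if (tail == region.capacity()) return false;
        region.putLong(tail, address);
        tail += Long.BYTES;
        return true;
    }

    // Remove the oldest address, or return EMPTY when nothing is queued.
    long dequeue() {
        if (head == tail) return EMPTY;
        long address = region.getLong(head);
        head += Long.BYTES;
        if (head == tail) { // drained: reuse the region from the start
            head = 0;
            tail = 0;
        }
        return address;
    }

    boolean isEmpty() { return head == tail; }

    public static void main(String[] args) {
        RawWorkQueueSketch queue = new RawWorkQueueSketch(4);
        queue.enqueue(0x1000L);
        queue.enqueue(0x2000L);
        while (!queue.isEmpty()) {
            System.out.println(Long.toHexString(queue.dequeue()));
        }
    }
}
```

A full queue reports failure instead of growing, and an empty one returns
a sentinel instead of throwing, because either growing or constructing an
exception would allocate.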
>
> Actually c) is one of the hardest parts.  It is essential that heap 
> objects associated with the VM and the GC are not inadvertently 
> collected.  This requires some very careful thought (remembering that 
> the compiler will place our *code* into the heap too!).
>
> As to whether this is feasible, it's been done at least three times 
> over.  First in the original Jalapeno, then in GCTk (developed while I 
> was at UMass) and now MMTk.  Right now I am working with my students 
> here to make the MMTk design even cleaner while not sacrificing 
> performance---fun!
>
> So, can it perform?  Well it is very hard to do apples to apples 
> comparisons, but we measure the performance of our raw mechanisms 
> against C implementations as a milestone and we do very well (by this I mean 
> we can beat glibc's malloc for allocation performance, but this claim 
> needs to be covered with caveats because it is very hard to make fair 
> comparisons).  So the raw mechanisms perform well.  But then the 
> software engineering benefits of Java come to the fore and our 
> capacity to implement a toolkit and thus have a choice of many 
> different GC algorithms gives us a real advantage (the GC 
> mechanism/algorithm thing was the subject of a previous thread).
>
> I've glossed over a huge amount of important stuff (like how we get 
> raw memory from the OS, how we introduce type safe pointers and object 
> references, etc etc).
>
>> To summarize (and to get to the question already) - the point is that 
>> language shapes thought. In other words, a program designed in Java 
>> will naturally tend to be slower than a program designed in C, simply 
>> because Java most naturally expresses slower designs than C does. And 
>> the question is - does this agree with anyone else's experiences? And 
>> if it does, is it a valid argument against using Java for the design 
>> and implementation of the VM?
>
>
> OK so there is already at least one response to this, but let me add 
> my experience.
>
> I am very focused on performance.  The approach Perry Cheng and I took 
> when writing the code for MMTk was very much that premature 
> optimization is indeed the root of all evil.  Moreover, we placed 
> enormous faith in the optimizing compiler.  The philosophy was to 
> assume the optimizing compiler was smart enough to optimize around our 
> coding abstractions, and then to do careful performance analysis after 
> the fact and see where we were being let down.  In some cases the 
> compiler was improved to deal with our approach, other times we 
> modified our approach.
>
> Over time we learned certain idioms which on one hand meant we tended 
> to get reasonable performance first shot, but on the other may have 
> undermined the natural Java style we started with.
>
> While I understand what you mean when you say: "a program designed in 
> Java will naturally tend to be slower than a program designed in C", 
> addressing that concern is one of the most important challenges of 
> language implementation, and is why Java performance has improved so 
> greatly over the past five years.
>
>> </constructive_interest>
>
>
>
> Cheers,
>
> --Steve 


