hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: Java Versions and Hadoop
Date Mon, 10 Oct 2011 18:57:45 GMT

On 10/10/11 3:45 AM, "Steve Loughran" <stevel@apache.org> wrote:

>On 08/10/11 19:19, Ted Dunning wrote:
>> I hate to sound like the folks who only recently stopped using 1.4, but
>> I am afraid that Todd is right on here.
>> The folks who are desperate for new features are being siphoned off by
>> and Clojure which is leaving a core of recalcitrant termagants like me.
>Language wise, you are right. I actually use Groovy under Java6 for a
>lot of work these days, as it offers many of the features of the Java7
>language today yet runs on Java6, and lets you subclass Java classes and
>vice versa.
>Here, for example, is a MiniDFSCluster for Hadoop 0.20.20x that fixes
>its need for a JVM property to set the dest dir for data (fixed in
>trunk, BTW)
>> I
>> think that it is going to take something as major as the EOL of Java 6
>> to get big projects to allow code that requires Java 7.
>JVM-wise, people have problems with the hotspot compiler, which means
>"don't use Hadoop with it"
>I don't know if the updates fix this. Eventually they shall.

It was fixed days before that blog post in OpenJDK trunk and for the JRE6
and JRE7 'next' releases (due out very soon now).  Some of those bugs
also exist in JRE 6, but are not exposed because the default configuration
parameters differ.  JDK7 is safe if you disable the loop predication
optimizations (which first appeared ~JRE 6u21).
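A minimal sketch of how a service could check at startup whether loop predication has been switched off. The message does not name the switch; this assumes it is the HotSpot option -XX:-UseLoopPredicate. The class and method names are hypothetical, made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: warns when running on Java 7 without loop
// predication disabled. The flag name below is an assumption; the
// original message does not spell it out.
final class LoopPredicationCheck {
    static final String DISABLE_FLAG = "-XX:-UseLoopPredicate";
    static final String ENABLE_FLAG = "-XX:+UseLoopPredicate";

    // Returns true if the last loop-predication switch on the command
    // line disables the optimization (later flags override earlier ones).
    static boolean isDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (DISABLE_FLAG.equals(arg)) {
                disabled = true;
            } else if (ENABLE_FLAG.equals(arg)) {
                disabled = false;
            }
        }
        return disabled;
    }

    // Java 7 reports its specification version as "1.7".
    static boolean isJava7(String specVersion) {
        return "1.7".equals(specVersion);
    }

    public static void main(String[] args) {
        List<String> jvmArgs =
                ManagementFactory.getRuntimeMXBean().getInputArguments();
        String spec = System.getProperty("java.specification.version");
        if (isJava7(spec) && !isDisabled(jvmArgs)) {
            System.err.println("Warning: Java 7 without " + DISABLE_FLAG);
        } else {
            System.out.println("Loop predication check passed.");
        }
    }
}
```

Running it as `java -XX:-UseLoopPredicate LoopPredicationCheck` exercises the passing path. For Hadoop daemons the same option would typically be appended to HADOOP_OPTS in hadoop-env.sh, if that is how your deployment passes JVM options.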

>For people to be confident that it works someone needs to bring up a
>large cluster and run it on Hadoop long enough -with a more complex
>workload than terasort -to find the bugs, the race conditions, the
>problems that only surface once you serve up 2+PB of data.

This is the same requirement for a normal Hadoop release.  The testing
needs to be done, regardless of the JRE version.  Simply testing 0.23 or
0.24 with JRE 7u2 (which has the bugfixes above) or later is all that is
needed.

What JRE (6 update ?) is planned to be used when testing 0.23 at scale?
Should JRE 7u2 also be tested?  New updates to both JRE 6 and JRE 7 are due
out very soon.  0.23 will be code complete after that.  If I had enough
resources and time, I'd test both the latest JRE 6 and JRE 7.

>I hope someone volunteers for this. As Oracle have announced a
>Hadoop-based system, they could be the people to step up here, or they
>could pay someone else to do the work.
>In an ideal world, the Hadoop stack would be part of the test stack for
>a JVM release.

A performance regression in Hadoop's pure Java CRC32 appeared in a recent
JRE6 update; a bug was filed, they fixed it, and they now include that
algorithm in their test suite.  JVM releases don't include whole stacks,
but someone could engage the OpenJDK developers to find out what kind of
contributions OpenJDK can accept for test code -- I'm not sure how
compatible it is with Apache.
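For readers unfamiliar with what a "pure Java CRC32" looks like, here is a minimal table-driven sketch, checked against the JDK's java.util.zip.CRC32. It is not Hadoop's actual implementation, which is more heavily optimized. The class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative byte-at-a-time CRC32 (reflected polynomial 0xEDB88320),
// the same checksum that java.util.zip.CRC32 computes.
final class TableCrc32Sketch {
    private static final int[] TABLE = new int[256];

    static {
        // Precompute the CRC of every possible byte value.
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
            }
            TABLE[n] = c;
        }
    }

    // Returns the CRC32 of data as an unsigned 32-bit value in a long.
    static long compute(byte[] data) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(data, 0, data.length);
        // Print both values so a mismatch is easy to spot.
        System.out.println(Long.toHexString(compute(data)));
        System.out.println(Long.toHexString(reference.getValue()));
    }
}
```

A tight loop of this kind leans heavily on the JIT's array and loop optimizations, so a regression test could time compute() over a large buffer on each JRE update being evaluated.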

>In the meantime, even if Oracle say Java6 is EOL, if people pay money to
>keep it alive -and they will have to in any project you don't want to
>have to requalify for java7- then it may keep going for longer, except
>the updates won't be so widely available.

You can always keep running on the old JVM with the old version of Hadoop
you have had in your cluster, but if you upgrade Hadoop to a new version,
you might as well upgrade your JVM at the same time and pay the testing
cost once.

>I don't know if anyone with a Hadoop cluster has a support contract with
>Oracle for it. I know Yahoo! did, but I don't know its current state.
