spark-dev mailing list archives

From Tom Graves <tgraves...@yahoo.com.INVALID>
Subject Re: [discuss] ending support for Java 7 in Spark 2.0
Date Wed, 30 Mar 2016 13:46:14 GMT
Steve, those are good points; I had forgotten Hadoop had those issues. We run with JDK 8, our Hadoop is built for JDK 7 compatibility, and we are running Hadoop 2.7 on our clusters; by the time Spark 2.0 is out I would expect a mix of Hadoop 2.7 and 2.8. We also don't use SPNEGO.
I didn't quite follow what you were saying about the Hadoop services being on JDK 7. Are you saying building Spark with, say, Hadoop 2.8 libraries while your Hadoop cluster is running Hadoop 2.6 or earlier? If so, I agree that isn't a good idea.
Personally, and from Yahoo's point of view, I'm still fine with going to JDK 8, but I can see how it might be a problem for other people on older versions of Hadoop.
Tom 

    On Wednesday, March 30, 2016 5:42 AM, Steve Loughran <stevel@hortonworks.com> wrote:
 

 
Can I note that if Spark 2.0 is going to be Java 8+ only, then that means Hadoop 2.6.x should
be the minimum Hadoop version.
https://issues.apache.org/jira/browse/HADOOP-11090
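[Editor's aside: a "Java 8+ only" runtime requirement is typically enforced with a startup guard like the following. This is a hypothetical sketch, not anything Spark actually ships; the class and method names are made up, and the parsing assumes the pre-Java-9 "1.x.y" version-string format that was current at the time of this thread.]

```java
// Hypothetical sketch of a JVM-version guard for a Java-8-only release.
// Assumes the pre-Java-9 version string format, e.g. "1.7.0_80", "1.8.0_77".
public class JavaVersionCheck {

    // Returns the major Java version: 7 for "1.7.0_80", 8 for "1.8.0_77".
    static int majorVersion(String javaVersion) {
        String[] parts = javaVersion.split("\\.");
        // Pre-Java-9 JVMs report "1.<major>.<minor>"; take the second component.
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.version"));
        if (major < 8) {
            System.err.println("Java 8 or later is required; found Java " + major);
            System.exit(1);
        }
        System.out.println("Running on Java " + major);
    }
}
```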
Where things get complicated is the situation of Hadoop services on Java 7 with Spark on Java 8 in its own JVM.
I'm not sure that you could get away with having the newer version of the Hadoop classes in the Spark assembly/lib dir without coming up against incompatibilities with the Hadoop JNI libraries. These are currently backwards compatible, but trying to link Hadoop 2.7 classes against a Hadoop 2.6 native library will generate an UnsatisfiedLinkError. Meaning: the whole cluster's Hadoop libs have to be in sync, or at least the main cluster release must be on a version of Hadoop 2.x >= the Spark-bundled edition.
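[Editor's aside: the failure mode described above surfaces as java.lang.UnsatisfiedLinkError, which is an Error rather than a checked exception, so it usually kills the JVM at class-initialization time unless caught deliberately. A minimal illustration, with a made-up library name standing in for a native method the older libhadoop does not export:]

```java
// Sketch of what a class/native-library mismatch surfaces as at runtime.
// "hadoop_2_7_only" is a fabricated library name used for illustration.
public class NativeMismatchDemo {

    // Attempts to load a native library; reports the error instead of crashing.
    static String tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return "loaded";
        } catch (UnsatisfiedLinkError e) {
            return "UnsatisfiedLinkError";
        }
    }

    public static void main(String[] args) {
        // A library that is missing (or missing the expected symbols)
        // produces UnsatisfiedLinkError, not a checked exception.
        System.out.println(tryLoad("hadoop_2_7_only"));
    }
}
```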
Ignoring that detail: Hadoop 2.6.1+, Guava >= 15? 17?
I think the outcome of Hadoop < 2.6 and JDK >= 8 is "undefined"; all bug reports will be met with a "please upgrade, re-open if the problem is still there".
Kerberos is a particular trouble spot here: you need Hadoop 2.6.1+ for Kerberos to work on Java 8 and recent versions of Java 7 (HADOOP-10786).
Note also that HADOOP-11628 (SPNEGO + CNAMEs) is in 2.8 only. I'll see about pulling that into 2.7.x, though I'm reluctant to go near 2.6, so as to keep that branch extra stable.

Thomas: you've got the big clusters; what versions of Hadoop will they be on by the time you look at Spark 2.0?
-Steve