spark-dev mailing list archives

From Steve Loughran <>
Subject Re: [discuss] ending support for Java 7 in Spark 2.0
Date Thu, 24 Mar 2016 12:41:13 GMT

> On 24 Mar 2016, at 07:27, Reynold Xin <> wrote:
>
> About a year ago we decided to drop Java 6 support in Spark 1.5. I am wondering if we
> should also just drop Java 7 support in Spark 2.0 (i.e. Spark 2.0 would require Java 8 to
> run).
>
> Oracle ended public updates for JDK 7 about a year ago (Apr 2015), and removed public
> downloads for JDK 7 in July 2015.

Still there; Jan 2016 was the last public one.

> In the past I've actually been against dropping Java 7, but today I ran into an issue with
> the new Dataset API not working well with Java 8 lambdas, and that changed my opinion on
> this.
>
> I've been thinking more about this issue today and also talked with a lot of people offline
> to gather feedback, and I actually think the pros outweigh the cons, for the following
> reasons (in some rough order of importance):
>
> 1. It is complicated to test how well Spark APIs work with Java 8 lambdas if we also
> support Java 7. Jenkins machines need both Java 7 and Java 8 installed, and we must run the
> full set of test suites on Java 7 and then the lambda tests on Java 8. This complicates
> build environments/scripts and makes them less robust. Without good testing infrastructure,
> I have no confidence in building good APIs for Java 8.

+ it also complicates the test matrix for diagnosing problems: if something works on Java 8
and fails on Java 7, is that a Java 8 problem or a Java 7 one?
+ most developers would want to be on Java 8 on their desktop if they could; the risk is
that people accidentally code for Java 8 without realising it, just by using Java-8-only
library methods. The sketch below shows how little it takes.
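
To make that concrete, here's an illustrative sketch (not from either mail: the class name
and sample data are made up, though the Spark Java API calls are real). It's the same
transformation written both ways:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;

    // Hypothetical demo class, not from the Spark codebase.
    public class LambdaVsAnonClass {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(
                    new SparkConf().setAppName("lambda-demo").setMaster("local[2]"));
            JavaRDD<String> lines = sc.parallelize(Arrays.asList("spark", "java"));

            // Java 7 style: anonymous inner class implementing Spark's Function.
            JavaRDD<Integer> lengths7 = lines.map(new Function<String, Integer>() {
                @Override
                public Integer call(String s) {
                    return s.length();
                }
            });

            // Java 8 style: the same map as a lambda. This only compiles with
            // -source 1.8, so one such line quietly ties the build to Java 8.
            JavaRDD<Integer> lengths8 = lines.map(s -> s.length());

            System.out.println(lengths7.collect() + " " + lengths8.collect());
            sc.stop();
        }
    }

Both versions run the same job; the difference only shows up in which javac and JVM the
build and test matrix has to carry.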

> 2. Dataset/DataFrame performance will be between 1x and 10x slower on Java 7. The primary
> APIs we want users to use in Spark 2.x are Dataset/DataFrame, and this impacts pretty much
> everything from machine learning to structured streaming. We have made great progress in
> their performance through extensive use of code generation. (In many dimensions, Spark 2.0
> with DataFrames/Datasets looks more like a compiler than a MapReduce or query engine.)
> These optimizations don't work well on Java 7 due to broken code cache flushing; that
> problem has been fixed by Oracle in Java 8. In addition, Java 8 comes with better support
> for Unsafe and SIMD.
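
As an illustration (not from either mail; the class name and query are invented, and this
assumes a Spark 2.x SparkSession): explain(true) prints the query plans, and the stages
that went through code generation are visible in the physical plan.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    // Hypothetical sketch, assuming a Spark 2.x build with SparkSession.
    public class CodegenPeek {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("codegen-peek").master("local[2]").getOrCreate();

            Dataset<Row> df = spark.range(0, 1000).toDF("id")
                    .selectExpr("id", "id * 2 AS doubled")
                    .filter("doubled % 4 = 0");

            // explain(true) prints the parsed, optimized and physical plans;
            // the code-generated stages appear in the last of these.
            df.explain(true);
            spark.stop();
        }
    }
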
> 3. Scala 2.12 will come out soon, and we will want to add support for it. Scala 2.12 only
> works on Java 8. If we also support Java 7, we'd have a fairly complicated compatibility
> matrix and testing infrastructure.
>
> 4. There are libraries I've looked into in the past that support only Java 8. This is more
> common in high-performance libraries such as Aeron (a messaging library). Having to support
> Java 7 means we are not able to use these. It is not that big of a deal right now, but it
> will become increasingly difficult as we optimize performance.
>
> The downside of not supporting Java 7 is also obvious: some organizations are stuck with
> Java 7, and they wouldn't be able to use Spark 2.0 without upgrading Java.

One thing you have to consider here is: will the organisations that don't want to upgrade
to Java 8 want to be upgrading to Spark 2.0 anyway?

If there is a price, it's that all apps that use any remote Spark APIs will also have to be
on Java 8. Something like a REST API is less of an issue, but anything loading a JAR in the
group org.apache.spark will have to be Java 8+. That's what held Hadoop back on Java 7 in
2015: Twitter made the case that it shouldn't be the Hadoop cluster forcing them to upgrade
all their client apps just to use the IPC and filesystem code. I don't believe that's so
much of a constraint on Spark.
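
The mechanics behind that: classes compiled for Java 8 carry class-file major version 52,
and a Java 7 JVM, which accepts at most 51, rejects them with UnsupportedClassVersionError.
A small hypothetical checker (the class name is made up) that reads the version straight
out of a .class file header:

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Hypothetical sketch: print the JVM version a class file was built for.
    // javac -target 1.7 emits major version 51; -target 1.8 emits 52. A Java 7
    // JVM throws UnsupportedClassVersionError for anything above 51.
    public class ClassVersionCheck {
        public static void main(String[] args) throws IOException {
            try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
                int magic = in.readInt();             // always 0xCAFEBABE
                int minor = in.readUnsignedShort();
                int major = in.readUnsignedShort();   // 51 = Java 7, 52 = Java 8
                System.out.printf("magic=%x major=%d minor=%d%n", magic, major, minor);
            }
        }
    }

So the moment any jar in the org.apache.spark group ships version-52 classes, every JVM
that loads them has to be Java 8+.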

Finally, Java 8 lines you up better for worrying about Java 9, which is on the horizon.
