spark-dev mailing list archives

From Raymond Honderdors <Raymond.Honderd...@sizmek.com>
Subject RE: [discuss] ending support for Java 7 in Spark 2.0
Date Thu, 24 Mar 2016 07:42:18 GMT
Very good points

Moving to Java 8 looks like a good direction.
2.0 would be a good release to start with.

Raymond Honderdors
Team Lead Analytics BI
Business Intelligence Developer
raymond.honderdors@sizmek.com
T +972.7325.3569
Herzliya

From: Reynold Xin [mailto:rxin@databricks.com]
Sent: Thursday, March 24, 2016 9:37 AM
To: dev@spark.apache.org
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0

One other benefit that I didn't mention is that we'd be able to use Java 8's Optional class
to replace our built-in Optional.
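
As a minimal sketch of what java.util.Optional offers (the lookup method, the key name, and the default value below are hypothetical and are not taken from Spark's code):

```java
import java.util.Optional;

public class OptionalSketch {
    // Hypothetical lookup: only the key "spark.master" has a value here.
    static Optional<String> lookup(String key) {
        return "spark.master".equals(key)
                ? Optional.of("local[*]")
                : Optional.empty();
    }

    // Chains a transformation and falls back to a default when the value is absent.
    static String masterOrDefault(String key) {
        return lookup(key).map(String::trim).orElse("unset");
    }

    public static void main(String[] args) {
        System.out.println(masterOrDefault("spark.master"));
        System.out.println(masterOrDefault("missing.key"));
    }
}
```

Because map and orElse accept lambdas and method references, this class only exists from Java 8 onward, which is presumably why it could not be adopted while Java 7 was supported.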


On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin <rxin@databricks.com> wrote:
About a year ago we decided to drop Java 6 support in Spark 1.5. I am wondering if we should
also just drop Java 7 support in Spark 2.0 (i.e. Spark 2.0 would require Java 8 to run).

Oracle ended public updates for JDK 7 one year ago (Apr 2015), and removed public downloads
for JDK 7 in July 2015. In the past I've actually been against dropping Java 7, but today
I ran into an issue with the new Dataset API not working well with Java 8 lambdas, and that
changed my opinion on this.

I've been thinking more about this issue today and also talked with a lot of people offline to
gather feedback, and I actually think the pros outweigh the cons, for the following reasons
(in some rough order of importance):

1. It is complicated to test how well Spark APIs work for Java lambdas if we support Java
7. Jenkins machines need to have both Java 7 and Java 8 installed and we must run through
a set of test suites in 7, and then the lambda tests in Java 8. This complicates build environments/scripts,
and makes them less robust. Without good testing infrastructure, I have no confidence in building
good APIs for Java 8.

2. Dataset/DataFrame performance will be between 1x and 10x slower in Java 7. The primary APIs
we want users to use in Spark 2.x are Dataset/DataFrame, and this impacts pretty much everything
from machine learning to structured streaming. We have made great progress in their performance
through extensive use of code generation. (In many dimensions Spark 2.0 with DataFrames/Datasets
looks more like a compiler than a MapReduce or query engine.) These optimizations don't work
well in Java 7 due to broken code cache flushing. This problem has been fixed by Oracle in
Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.

3. Scala 2.12 will come out soon, and we will want to add support for that. Scala 2.12 only
works on Java 8. If we do support Java 7, we'd have a fairly complicated compatibility matrix
and testing infrastructure.

4. There are libraries that I've looked into in the past that support only Java 8. This is
more common in high performance libraries such as Aeron (a messaging library). Having to support
Java 7 means we are not able to use these. It is not that big of a deal right now, but will
become increasingly difficult as we optimize performance.
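
To illustrate the testing concern in point 1, here is a rough sketch of the same call written in Java 7 style (anonymous inner class) and Java 8 style (lambda). The MapFunction interface and the mapAll helper are hypothetical stand-ins defined locally for this example; they are not Spark's actual classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LambdaSketch {
    // Hypothetical single-method interface, standing in for a Java-facing function type.
    interface MapFunction<T, R> {
        R call(T value);
    }

    // Hypothetical helper that applies f to every element of input.
    static <T, R> List<R> mapAll(List<T> input, MapFunction<T, R> f) {
        List<R> out = new ArrayList<>();
        for (T item : input) {
            out.add(f.call(item));
        }
        return out;
    }

    // Java 7 style: compiles on both Java 7 and Java 8.
    static List<Integer> lengthsJava7Style(List<String> words) {
        return mapAll(words, new MapFunction<String, Integer>() {
            @Override
            public Integer call(String s) {
                return s.length();
            }
        });
    }

    // Java 8 style: the lambda only compiles with a Java 8 (or later) compiler.
    static List<Integer> lengthsJava8Style(List<String> words) {
        return mapAll(words, s -> s.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java");
        System.out.println(lengthsJava7Style(words));
        System.out.println(lengthsJava8Style(words));
    }
}
```

Both methods exercise the same API, but only the first can be built and run on a Java 7 JVM, so a project that supports both versions has to keep tests for the second in a separate, Java 8-only suite.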


The downside of not supporting Java 7 is also obvious. Some organizations are stuck with Java
7, and they wouldn't be able to use Spark 2.0 without upgrading Java.


