Mailing-List: contact dev-help@spark.apache.org; run by ezmlm
Precedence: bulk
MIME-Version: 1.0
In-Reply-To: 
 <CAPh_B=ZLfqtf_B9K-_r8ffCem0azoO487_zhJFmT_qFQnLRQCQ@mail.gmail.com>
References: 
 <CAPh_B=ZLfqtf_B9K-_r8ffCem0azoO487_zhJFmT_qFQnLRQCQ@mail.gmail.com>
From: Reynold Xin <rxin@databricks.com>
Date: Thu, 24 Mar 2016 00:36:58 -0700
Message-ID: 
 <CAPh_B=YQJyGiKVdBCx9qpSyvd5+a4Aofq+FiykafLM9aVojrQA@mail.gmail.com>
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0
To: "dev@spark.apache.org" <dev@spark.apache.org>
Content-Type: multipart/alternative; boundary=047d7b417e631bc898052ec684df

--047d7b417e631bc898052ec684df
Content-Type: text/plain; charset=UTF-8

One other benefit that I didn't mention is that we'd be able to use Java
8's Optional class to replace our built-in Optional.
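
To make that concrete, here is a rough sketch of the java.util.Optional
style we could lean on once Java 8 is the baseline. The class name
OptionalSketch and the lookup helper are made up for illustration; this
is not existing Spark code, just an example of the JDK class together
with lambdas.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionalSketch {
    // Hypothetical helper: returns an empty Optional instead of null
    // when the key is absent.
    static Optional<Integer> lookup(Map<String, Integer> counts, String key) {
        return Optional.ofNullable(counts.get(key));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("spark", 2);

        // Present value: map() applies the lambda, orElse() unwraps the result.
        int doubled = lookup(counts, "spark").map(n -> n * 2).orElse(0);

        // Absent value: the lambda is skipped and the default is returned.
        int missing = lookup(counts, "flink").map(n -> n * 2).orElse(0);

        System.out.println(doubled + " " + missing);
    }
}
```

Since map() and orElse() ship with the JDK, Java users would get this
behavior without us maintaining a separate Optional type.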


On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin <rxin@databricks.com> wrote:

> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
> Spark 2.0 would require Java 8 to run).
>
> Oracle ended public updates for JDK 7 a year ago (Apr 2015), and
> removed public downloads for JDK 7 in July 2015. In the past I've actually
> been against dropping Java 7, but today I ran into an issue with the new
> Dataset API not working well with Java 8 lambdas, and that changed my
> opinion on this.
>
> I've been thinking more about this issue today and also talked with a lot
> of people offline to gather feedback, and I actually think the pros outweigh
> the cons, for the following reasons (in some rough order of importance):
>
> 1. It is complicated to test how well Spark APIs work for Java lambdas if
> we support Java 7. Jenkins machines need to have both Java 7 and Java 8
> installed and we must run through a set of test suites in 7, and then the
> lambda tests in Java 8. This complicates build environments/scripts, and
> makes them less robust. Without good testing infrastructure, I have no
> confidence in building good APIs for Java 8.
>
> 2. Dataset/DataFrame performance will be between 1x and 10x slower in Java
> 7. The primary APIs we want users to use in Spark 2.x are
> Dataset/DataFrame, and this impacts pretty much everything from machine
> learning to structured streaming. We have made great progress in their
> performance through extensive use of code generation. (In many dimensions
> Spark 2.0 with DataFrames/Datasets looks more like a compiler than a
> MapReduce or query engine.) These optimizations don't work well in Java 7
> due to broken code cache flushing. This problem has been fixed by Oracle in
> Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.
>
> 3. Scala 2.12 will come out soon, and we will want to add support for
> that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a
> fairly complicated compatibility matrix and testing infrastructure.
>
> 4. There are libraries that I've looked into in the past that support only
> Java 8. This is more common in high performance libraries such as Aeron (a
> messaging library). Having to support Java 7 means we are not able to use
> these. It is not that big of a deal right now, but will become increasingly
> difficult as we optimize performance.
>
>
> The downside of not supporting Java 7 is also obvious. Some organizations
> are stuck with Java 7, and they wouldn't be able to use Spark 2.0 without
> upgrading Java.
>
>
>

--047d7b417e631bc898052ec684df--