From: Reynold Xin
Date: Thu, 24 Mar 2016 00:36:58 -0700
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0
To: "dev@spark.apache.org"
Mailing-List: contact dev-help@spark.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b417e631bc898052ec684df

--047d7b417e631bc898052ec684df
Content-Type: text/plain; charset=UTF-8

One other benefit that I didn't mention is that we'd be able
to use Java 8's Optional class to replace our built-in Optional.

On Thu, Mar 24, 2016 at 12:27 AM, Reynold Xin wrote:

> About a year ago we decided to drop Java 6 support in Spark 1.5. I am
> wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
> Spark 2.0 would require Java 8 to run).
>
> Oracle ended public updates for JDK 7 one year ago (Apr 2015), and
> removed public downloads for JDK 7 in July 2015. In the past I've actually
> been against dropping Java 7, but today I ran into an issue with the new
> Dataset API not working well with Java 8 lambdas, and that changed my
> opinion on this.
>
> I've been thinking more about this issue today and also talked with a lot
> of people offline to gather feedback, and I actually think the pros outweigh
> the cons, for the following reasons (in rough order of importance):
>
> 1. It is complicated to test how well Spark APIs work for Java lambdas if
> we support Java 7. Jenkins machines need to have both Java 7 and Java 8
> installed, and we must run through a set of test suites in Java 7, and then
> the lambda tests in Java 8. This complicates build environments/scripts, and
> makes them less robust. Without good testing infrastructure, I have no
> confidence in building good APIs for Java 8.
>
> 2. Dataset/DataFrame performance will be between 1x and 10x slower in Java
> 7. The primary APIs we want users to use in Spark 2.x are
> Dataset/DataFrame, and this impacts pretty much everything from machine
> learning to structured streaming. We have made great progress in their
> performance through extensive use of code generation. (In many dimensions
> Spark 2.0 with DataFrames/Datasets looks more like a compiler than a
> MapReduce or query engine.) These optimizations don't work well in Java 7
> due to broken code cache flushing. Oracle has fixed this problem in
> Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.
>
> 3. 
Scala 2.12 will come out soon, and we will want to add support for
> that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a
> fairly complicated compatibility matrix and testing infrastructure.
>
> 4. There are libraries that I've looked into in the past that support only
> Java 8. This is more common in high-performance libraries such as Aeron (a
> messaging library). Having to support Java 7 means we are not able to use
> these. It is not that big of a deal right now, but will become increasingly
> difficult as we optimize performance.
>
>
> The downside of not supporting Java 7 is also obvious. Some organizations
> are stuck with Java 7, and they wouldn't be able to use Spark 2.0 without
> upgrading Java.
>
>
>

--047d7b417e631bc898052ec684df
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

--047d7b417e631bc898052ec684df--