spark-dev mailing list archives

From Reynold Xin <>
Subject Re: RFC: removing Scala 2.10
Date Tue, 07 Mar 2017 05:51:42 GMT
Hit sent too soon.

Actually my chart included only clusters on Spark 2.x, i.e. I excluded 1.x. I
also did one with Spark 1.x and saw no substantial difference in the
distribution of Scala versions. On the question of how many "would be
unable to" upgrade to Scala 2.12, I have no way to find out unless I go
talk to every one of them, which is too expensive. My experience with Scala
upgrades, having done a few of them for Spark and for other projects, is
that they are very difficult and frustrating.

On Databricks this is actually not an issue at all because our customers
can manage multiple clusters with different versions of Spark easily
(select an old version of Spark with Scala 2.10 in one click).

As engineers, we all love to delete old code and simplify the build (5,000
lines gone!). In a previous email I said we never deprecated it. After
looking at it more, I realized we did deprecate it partially: we
updated the docs and added a warning in SparkContext, but didn't announce
it in the release notes (mostly my fault). As a result, even I thought Scala
2.10 wasn't deprecated when I saw no mention of it in the release notes.

(Given we had partially deprecated Scala 2.10 support in Spark 2.1, I feel
less strongly about keeping it.)

Now look at the cost of keeping Scala 2.10: the part of the build that
defines Scala 2.10/2.11 support rarely changes, at least until we want to
add support for Scala 2.12 (and we are not adding 2.12 support in Spark
2.2). The actual cost, which annoys some of us, is just the occasional
build break (mostly due to the use of Option.contains). It looks like this
happened roughly once a month, and each time it took just a few minutes to
resolve.
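For context, Option.contains was added to the Scala standard library in
2.11, so code using it compiles against 2.11 but breaks the 2.10 build. A
minimal sketch of this kind of break and its cross-version fix (the names
here are illustrative, not from Spark's source):

```scala
object ContainsExample {
  def main(args: Array[String]): Unit = {
    val maybePort: Option[Int] = Some(8080)

    // Compiles on Scala 2.11+ but fails on 2.10, where Option.contains
    // does not exist yet:
    // val ok = maybePort.contains(8080)

    // A 2.10-compatible equivalent using Option.exists:
    val ok = maybePort.exists(_ == 8080)
    println(ok) // prints "true"
  }
}
```

Writing exists(_ == x) directly, or catching contains in review/CI, is the
sort of small fix that keeps a cross-built 2.10/2.11 project green.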

So the cost seems very low. Perhaps we should just deprecate it more
formally in 2.2, given the whole build is already set up to support it, and
kill it in the next release.

On Mon, Mar 6, 2017 at 9:23 PM, Reynold Xin <> wrote:

> Actually my chart included only clusters on Spark 2.x, ie I excluded 1.x.
> On Mon, Mar 6, 2017 at 8:34 PM Stephen Boesch <> wrote:
>> Hi Reynold,
>> This is not necessarily convincing. Many installations are still on
>> Spark 1.x - including at the large company I work at. When moving to 2.2 -
>> whenever that might happen - it would be a reasonable expectation to also
>> move off of an old version of Scala. Of the 30% of customers shown, I
>> wonder how many are both (a) on Spark 2.x/Scala 2.10 *now* *and* (b)
>> would be unable to manage a transition to Scala 2.11/2.12 whenever the
>> move to Spark 2.2 were to happen.
>> stephenb
>> 2017-03-06 19:04 GMT-08:00 Reynold Xin <>:
>> For some reason the previous email didn't show up properly. Trying again.
>> ---------- Forwarded message ----------
>> From: *Reynold Xin*
>> Date: Mon, Mar 6, 2017 at 6:37 PM
>> Subject: Re: RFC: removing Scala 2.10
>> To: Sean Owen <>
>> Cc: dev <>
>> Thanks for sending an email. I was going to +1, but then I figured I
>> should be data driven. I took a look at the distribution of Scala versions
>> across all the clusters Databricks runs (which is a very high number across
>> a variety of tech startups, SMBs, and large enterprises), and this is the chart:
>> [image: Inline image 1]
>> Given 30% are still on Scala 2.10, I'd say we should officially deprecate
>> Scala 2.10 in Spark 2.2 and remove the support in a future release (e.g.
>> 2.3). Note that in the past we only deprecated Java 7 / Python 2.6 in 2.0,
>> and didn't do anything with Scala 2.10.
>> On Mon, Mar 6, 2017 at 1:18 AM, Sean Owen <> wrote:
>> Another call for comments on removal of Scala 2.10 support, if you
>> haven't already. See
>> I've heard several votes in support and no specific objections at this
>> point, but wanted to make another call to check for any doubts before I go
>> ahead for Spark 2.2.
