mahout-dev mailing list archives

From Andrew Palumbo <ap....@outlook.com>
Subject RE: [DISCUSS] Naming convention for multiple spark/scala combos
Date Sat, 08 Jul 2017 21:30:32 GMT
+1 if so (sbt naming, re: Pat's comment).

Also +1 on Zeppelin integration being non-trivial.





-------- Original message --------
From: Pat Ferrel <pat@occamsmachete.com>
Date: 07/07/2017 10:35 PM (GMT-08:00)
To: dev@mahout.apache.org
Cc: Holden Karau <holden.karau@gmail.com>, user@mahout.apache.org, Dmitriy Lyubimov
<dlieu.7@gmail.com>, Andrew Palumbo <apalumbo@apache.org>
Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos

IIRC these all fit sbt’s conventions?


On Jul 7, 2017, at 2:05 PM, Trevor Grant <trevor.d.grant@gmail.com> wrote:

So to tie all of this together-

org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1_6
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2_0
org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2_1

org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1_6
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2_0
org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2_1

(Will jars compiled against 2.1 dependencies run on 2.0? I assume not, but I
don't know.) (AFAIK, Mahout compiled for Spark 1.6.x tends to work with
Spark 1.6.y, but that's anecdotal.)
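From a consumer's point of view, the scheme above would be used like this — a minimal sbt sketch, assuming the proposed coordinates are published as-is (they are a proposal in this thread, not yet released artifacts):

```scala
// Hypothetical consumer-side sbt dependency under the proposed scheme.
// The %% operator appends the Scala binary-version suffix (_2.10 / _2.11)
// automatically based on scalaVersion, so only the Spark variant needs
// to be spelled out in the version string.
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.mahout" %% "mahout-spark" % "0.13.1_spark_2_1"
```

This is one reason the scheme fits sbt's conventions: the Scala suffix is handled by `%%`, leaving the Spark variant as the only thing users pick by hand.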

A non-trivial motivation here is that we would like all of these available to
tighten up the Apache Zeppelin integration, where the user could have a
number of different Spark/Scala combos going on and we want it to 'just
work' out of the box (which means a wide array of binaries available, to
Dmitriy's point).

I'm +1 on this, and as RM I will begin cutting a provisional RC, just to try
to figure out how all of this will work (it's my first time as release
master, and this is a new thing we're doing).

72 hour lazy consensus. (will probably take me 72 hours to figure out
anyway ;) )

If no objections expect an RC on Monday evening.

tg

On Fri, Jul 7, 2017 at 3:24 PM, Holden Karau <holden.karau@gmail.com> wrote:

> Trevor looped me in on this since I hadn't had a chance to subscribe to
> the list yet (on now :)).
>
> Artifact naming for cross Spark-version building isn't super standardized (and
> there are two sort of very different types of cross-building).
>
> For folks who just need to build for the 1.X and 2.X and branches
> appending _spark1 & _spark2 to the version string is indeed pretty common
> and the DL4J folks do something pretty similar as Trevor pointed out.
>
> The folks over at hammerlab have made some sbt-specific tooling to make
> this easier to do on the publishing side (see
> https://github.com/hammerlab/sbt-parent )
>
> It is true some people build Scala 2.10 artifacts for Spark 1.X series and
> 2.11 artifacts for Spark 2.X series only and use that to differentiate (I
> don't personally like this approach since it is super opaque and someone
> could upgrade their Scala version and then accidentally be using a
> different version of Spark which would likely not go very well).
>
> For folks who need to hook into internals and cross build against
> different minor versions there is much less of a consistent pattern,
> personally spark-testing-base is released as:
>
> [artifactname]_[scalaversion]:[sparkversion]_[artifact releaseversion]
>
> But this really only makes sense when you have to cross-build for lots of
> different Spark versions (which should be avoidable for Mahout).
>
> Since you are likely not depending on the internals of different point
> releases, I'd think the _spark1 / _spark2 is probably the right way (or
> _spark_1 / _spark_2 is fine too).
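>
> For reference, a dependency on spark-testing-base under that
> [artifactname]_[scalaversion]:[sparkversion]_[releaseversion] pattern
> looks roughly like the sketch below (the version numbers are
> illustrative, not a recommendation of specific releases):

```scala
// Illustrative sbt dependency following the spark-testing-base pattern:
// the version string leads with the Spark version, then the library's
// own release version. Version numbers here are examples only.
libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "2.1.0_0.6.0"
```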
>
>
> On Fri, Jul 7, 2017 at 11:43 AM, Trevor Grant <trevor.d.grant@gmail.com>
> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Andrew Palumbo <ap.dev@outlook.com>
>> Date: Fri, Jul 7, 2017 at 12:28 PM
>> Subject: Re: [DISCUSS] Naming convention for multiple spark/scala combos
>> To: "dev@mahout.apache.org" <dev@mahout.apache.org>
>>
>>
>> another option for artifact names (using jars for example here):
>>
>>
>> mahout-spark-2.11_2.10-0.13.1.jar
>> mahout-spark-2.11_2.11-0.13.1.jar
>> mahout-math-scala-2.11_2.10-0.13.1.jar
>>
>>
>> i.e. <module>-<spark version>_<scala version>-<mahout version>.jar
>>
>>
>> not exactly pretty.. I somewhat prefer Trevor's idea of the DL4J convention.
>>
>> ________________________________
>> From: Trevor Grant <trevor.d.grant@gmail.com>
>> Sent: Friday, July 7, 2017 11:57:53 AM
>> To: Mahout Dev List; user@mahout.apache.org
>> Subject: [DISCUSS] Naming convention for multiple spark/scala combos
>>
>> Hey all,
>>
>> Working on releasing 0.13.1 with multiple spark/scala combos.
>>
>> AFAIK, there is no 'standard' for multiple Spark versions (but I may be
>> wrong, I don't claim expertise here).
>>
>> One approach is to simply release binaries only for:
>> Spark-1.6 + Scala 2.10
>> Spark-2.1 + Scala 2.11
>>
>> OR
>>
>> We could do like DL4J:
>>
>> org.apache.mahout:mahout-spark_2.10:0.13.1_spark_1
>> org.apache.mahout:mahout-spark_2.11:0.13.1_spark_1
>>
>> org.apache.mahout:mahout-spark_2.10:0.13.1_spark_2
>> org.apache.mahout:mahout-spark_2.11:0.13.1_spark_2
>>
>> OR
>>
>> some other option I don't know of.
>>
>>
>
>
> --
> Cell : 425-233-8271
>
