spark-dev mailing list archives

From Konstantin Boudnik <>
Subject Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Date Tue, 25 Feb 2014 21:24:37 GMT
On Fri, Feb 21, 2014 at 11:11 AM, Patrick Wendell wrote:
> Kos - thanks for chiming in. Could you be more specific about what is
> available in maven and not in sbt for these issues? I took a look at
> the bigtop code relating to Spark. As far as I could tell [1] was the
> main point of integration with the build system (maybe there are other
> integration points)?
> >   - in order to integrate Spark well into existing Hadoop stack it was
> >     necessary to have a way to avoid transitive dependencies duplications and
> >     possible conflicts.
> >
> >     E.g. Maven assembly allows us to avoid adding _all_ Hadoop libs and later
> >     merely declare Spark package dependency on standard Bigtop Hadoop
> >     packages. And yes - Bigtop packaging means the naming and layout would be
> >     standard across all commercial Hadoop distributions that are worth
> >     mentioning: ASF Bigtop convenience binary packages, and Cloudera or
> >     Hortonworks packages. Hence, the downstream user doesn't need to spend any
> >     effort to make sure that Spark "clicks-in" properly.
> The sbt build also allows you to plug in a Hadoop version similar to
> the maven build.

I am actually talking about the ability to exclude a set of dependencies from an
assembly, similar to what happens in the dependencySet sections of a Maven
assembly descriptor. If there is comparable functionality in sbt, that would
help quite a bit.
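For reference, sbt-assembly does appear to offer a comparable mechanism. A
minimal sketch, assuming the sbt-assembly plugin is enabled (the "hadoop-"
jar-name filter below is purely illustrative, not Spark's actual build code):

```scala
// build.sbt sketch -- assumes the sbt-assembly plugin.
// Keeps Hadoop jars out of the fat assembly, analogous to an
// exclusion in a Maven assembly dependencySet. The "hadoop-"
// name prefix is an illustrative assumption.
excludedJars in assembly := {
  val classpath = (fullClasspath in assembly).value
  classpath.filter(_.data.getName.startsWith("hadoop-"))
}
```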


> >   - Maven provides a relatively easy way to deal with the jar-hell problem,
> >     although the original maven build was just Shader'ing everything into a
> >     huge lump of class files. Oftentimes ending up with classes slamming on
> >     top of each other from different transitive dependencies.
> AFAIK we are only using the shade plug-in to deal with conflict
> resolution in the assembly jar. These are dealt with in sbt via the
> sbt assembly plug-in in an identical way. Is there a difference?

I am bringing up the Shader because it is an awful hack that can't be
used in a real, controlled deployment.
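For what it's worth, on the sbt side the duplicate-class problem is usually
handled with per-path merge strategies rather than shading. A rough sketch,
assuming sbt-assembly (the specific cases below are illustrative defaults, not
Spark's build):

```scala
// build.sbt sketch -- assumes the sbt-assembly plugin.
// Duplicate entries from transitive dependencies are resolved
// explicitly per path instead of classes silently slamming on
// top of each other; these cases are illustrative.
mergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case "reference.conf"              => MergeStrategy.concat
  case _                             => MergeStrategy.first
}
```

Merge strategies resolve collisions at assembly time but do not relocate
packages the way the Maven shade plug-in can, which is one concrete difference
between the two approaches.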


> [1];a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
