spark-dev mailing list archives

From Konstantin Boudnik
Subject Re: Discussion: Consolidating Spark's build system
Date Tue, 16 Jul 2013 01:23:17 GMT
Hi Matei.

The reason I am using Maven for Bigtop packaging and not SBT is that the
former's dependency management is clean and lets me build a proper assembly
with only the relevant dependencies: e.g. no Hadoop if I don't need it, etc.

I am not attached to the packaging as it is done in the current Maven build,
because of the use of the Shader plugin: I believe flattening project
dependencies is a suboptimal way to go.

I am glad that you're calling to cease the use of classifiers. Big +1 on that!
Using alternative names or versions to reflect dependency differences is
certainly a great idea!

I, perhaps, don't know much about SBT, but I think it is trying to solve Maven's
rigidity the way Gradle did. However, the latter introduces a
well-defined DSL and integrates with Maven/Ant more transparently than SBT.

That said, I would love to stick with a more mature build system that is also
more widely accepted in the Java community. But if the people involved in the
project want to go with SBT as the build platform - that will work from the
Bigtop standpoint as long as we'd be able to get a sensible set of libraries for
further packaging (a-la

Hope it helps,

On Mon, Jul 15, 2013 at 05:41PM, Matei Zaharia wrote:
> Hi all,
> I wanted to bring up a topic that there isn't a 100% perfect solution for,
> but that's been bothering the team at Berkeley for a while: consolidating
> Spark's build system. Right now we have two build systems, Maven and SBT,
> that need to be maintained together on each change. We added Maven a while
> back to try it as an alternative to SBT and to get some better publishing
> options, like Debian packages and classifiers, but we've found that 1) SBT
> has actually been fairly stable since then (unlike the rapid release cycle
> before) and 2) classifiers don't actually seem to work for publishing
> versions of Spark with different dependencies (you need to give them
> different artifact names). More importantly though, because maintaining two
> systems is confusing, it would be good to converge to just one soon, or to
> find a better way of maintaining the builds.
> In terms of which system to go for, neither is perfect, but I think many of
> us are leaning toward SBT, because it's noticeably faster and it has less
> code to maintain. If we do this, however, I'd really like to understand the
> use cases for Maven, and make sure that either we can support them in SBT or
> we can do them externally. Can people say a bit about that? The ones I've
> thought of are the following:
> - Debian packaging -- this is certainly nice, but there are some plugins for
> SBT too so may be possible to migrate.
> - BigTop integration; I'm not sure how much this relies on Maven but Cos has
> been using it.
> - Classifiers for hadoop1 and hadoop2 -- as far as I can tell, these don't
> really work if you want to publish to Maven Central; you still need two
> artifact names because the artifacts have different dependencies. However,
> more importantly, we'd like to make Spark work with all Hadoop versions by
> using hadoop-client and a bit of reflection, similar to how projects like
> Parquet handle this.
> Are there other things I'm missing here, or other ways to handle this
> problem that I'm missing? For example, one possibility would be to keep the
> Maven build scripts in a separate repo managed by the people who want to use
> them, or to have some dedicated maintainers for them. But because this is
> often an issue, I do think it would be simpler for the project to have one
> build system in the long term. In either case though, we will keep the
> project structure compatible with Maven, so people who want to use it
> internally should be fine; I think that we've done this well and, if
> anything, we've simplified the Maven build process lately by removing Twirl.
> Anyway, as I said, I don't think any solution is perfect here, but I'm
> curious to hear your input.
> Matei
