spark-dev mailing list archives

From Evan Chan <...@ooyala.com>
Subject Re: [DISCUSS] Necessity of Maven *and* SBT Build in Spark
Date Wed, 26 Feb 2014 20:04:29 GMT
Can't Maven POMs include other ones?  So what if we remove the
artifact specs from the main POM, have them generated by sbt make-pom,
and include the generated file in the main pom.xml?  I guess I'm just
trying to figure out how much this would help (it seems at least it
would remove the issue of maintaining and translating dependencies and
exclusions).  If the burden of maintaining the plugins turns out to
be the heavier commitment, then maybe it's not worth it.
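For what it's worth, sbt does expose a hook for part of this: make-pom is backed by the `makePom` task, and the `pomExtra` setting splices extra XML into the generated POM. A rough sketch, with the plugin shown purely illustrative (not a claim about what Spark's POM actually needs):

```scala
// build.sbt sketch: extra Maven sections appended to the POM that
// `sbt make-pom` generates. sbt settings accept Scala XML literals.
pomExtra := (
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
      </plugin>
    </plugins>
  </build>
)
```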

On Wed, Feb 26, 2014 at 11:55 AM, Mark Hamstra <mark@clearstorydata.com> wrote:
> Yes, but the POM generated in that fashion is only sufficient for linking
> with Spark, not for building Spark or serving as a basis from which to
> build a customized Spark with Maven.  So, starting from SparkBuild.scala
> and generating a POM with make-pom, those who wish to build a customized
> Spark with Maven will have to figure out how to add various Maven plugins
> and other stuff to the generated POM to actually have something useful.
>  Going the other way, starting from a POM that is sufficient to build Spark
> and generating an SBT build with sbt-pom-reader, the Maven plugins in the
> POM appear to be ignored cleanly, but then the developer wishing to build
> Spark using SBT has the burden of figuring out how to add the equivalent of
> the Maven plugins in order to build the assemblies, among other things.
>  Neither way looks completely obvious to me to do programmatically.  Either
> should be do-able given sufficient development and maintenance resources,
> but that could be a pretty heavy commitment (and when Josh Suereth says
> w.r.t. sbt-pom-reader that mapping maven plugins into sbt is practically a
> failed task, I have every expectation that generating a completely
> satisfactory SBT build from a Maven build would be quite challenging.)
>
>
> On Wed, Feb 26, 2014 at 11:34 AM, Evan Chan <ev@ooyala.com> wrote:
>
>> Mark,
>>
>> No, I haven't tried this myself yet  :-p   Also I would expect that
>> sbt-pom-reader does not do assemblies at all, because assembly support
>> comes from an SBT plugin, so we would still need code to include
>> sbt-assembly.  There is also the tricky question of how to include the
>> assembly stuff into sbt-pom-reader generated projects.  So, this needs
>> much more investigation.
>>
>> My hunch is that it's easier to generate the pom from SBT (make-pom)
>> than the other way around.
>>
>> On Wed, Feb 26, 2014 at 10:54 AM, Mark Hamstra <mark@clearstorydata.com>
>> wrote:
>> > Evan,
>> >
>> > Have you actually tried to build Spark using its POM file and
>> sbt-pom-reader?
>> >  I just made a first, naive attempt, and I'm still sorting through just
>> > what this did and didn't produce.  It looks like the basic jar files are
>> at
>> > least very close to correct, and may be just fine, but that building the
>> > assembly jars failed completely.
>> >
>> > It's not completely obvious to me how to proceed with what sbt-pom-reader
>> > produces in order to build the assemblies, run the test suites, etc., so I'm
>> > wondering if you have already worked out what that requires?
>> >
>> >
>> > On Wed, Feb 26, 2014 at 9:31 AM, Evan Chan <ev@ooyala.com> wrote:
>> >
>> >> I'd like to propose the following way to move forward, based on the
>> >> comments I've seen:
>> >>
>> >> 1.  Aggressively clean up the giant dependency graph.   One ticket I
>> >> might work on if I have time is SPARK-681 which might remove the giant
>> >> fastutil dependency (~15MB by itself).
>> >>
>> >> 2.  Take an intermediate step by having only ONE source of truth
>> >> w.r.t. dependencies and versions.  This means either:
>> >>    a)  Using a maven POM as the spec for dependencies, Hadoop version,
>> >> etc.   Then, use sbt-pom-reader to import it.
>> >>    b)  Using the build.scala as the spec, and "sbt make-pom" to
>> >> generate the pom.xml for the dependencies
>> >>
>> >>     The idea is to remove the pain and errors associated with manual
>> >> translation of dependency specs from one system to another, while
>> >> still maintaining the things which are hard to translate (plugins).
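Option (a) above might look roughly like the following, assuming sbt-pom-reader's `PomBuild` trait (name taken from the sbt/sbt-pom-reader project; details vary by version):

```scala
// project/SparkBuild.scala -- sketch only. Projects and dependency
// settings are derived from pom.xml by sbt-pom-reader; anything a
// Maven plugin did (assembly, shading) must still be wired up by
// hand in sbt.
import sbt._
import com.typesafe.sbt.pom.PomBuild

object SparkBuild extends PomBuild {
  // plugin-equivalent settings (e.g. sbt-assembly) would go here
}
```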
>> >>
>> >>
>> >> On Wed, Feb 26, 2014 at 7:17 AM, Koert Kuipers <koert@tresata.com>
>> wrote:
>> >> > We maintain an in-house spark build using sbt. We have no problem
>> >> > using sbt assembly. We did add a few exclude statements for
>> >> > transitive dependencies.
>> >> >
>> >> > The main enemy of assemblies are jars that include stuff they
>> >> > shouldn't (kryo comes to mind, I think they include logback?), new
>> >> > versions of jars that change the provider/artifact without changing
>> >> > the package (asm), and incompatible new releases (protobuf). These
>> >> > break the transitive resolution process. I imagine that's true for
>> >> > any build tool.
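The exclude statements Koert describes might look like this in an sbt build (the coordinates and excluded organizations are illustrative, not Spark's actual list):

```scala
// build.sbt sketch -- pruning transitive dependencies before they
// ever reach the assembly. Artifacts shown are hypothetical examples.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" excludeAll(
  ExclusionRule(organization = "org.slf4j"),   // logging binding conflicts
  ExclusionRule(organization = "asm")          // repackaged-artifact conflicts
)
```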
>> >> >
>> >> > Besides shading I don't see anything maven can do sbt cannot, and if I
>> >> > understand it correctly shading is not done currently using the build
>> >> > tool.
>> >> >
>> >> > Since spark is primarily scala/akka based, the main developer base
>> >> > will be familiar with sbt (I think?). Switching build tools is always
>> >> > painful. I personally think it is smarter to put this burden on a
>> >> > limited number of upstream integrators than on the community. That
>> >> > said, I don't think it's a problem for us to maintain an sbt build
>> >> > in-house if spark switched to maven.
>> >> > The problem is, the complete spark dependency graph is fairly large,
>> >> > and there are a lot of conflicting versions in there, in particular
>> >> > when we bump versions of dependencies - making managing this messy
>> >> > at best.
>> >> >
>> >> > Now, I have not looked in detail at how maven manages this - it might
>> >> > just be accidental that we get a decent out-of-the-box assembled
>> >> > shaded jar (since we don't do anything special to configure it).
>> >> > With the current state of sbt in spark, it definitely is not a good
>> >> > solution: if we can enhance it (or it already is?), while keeping
>> >> > the management of the version/dependency graph manageable, I don't
>> >> > have any objections to using sbt or maven!
>> >> > Too many excluded versions, pinned versions, etc. would just make
>> >> > things unmanageable in future.
>> >> >
>> >> >
>> >> > Regards,
>> >> > Mridul
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Feb 26, 2014 at 8:56 AM, Evan Chan <ev@ooyala.com> wrote:
>> >> >> Actually you can control exactly how sbt assembly merges or resolves
>> >> >> conflicts.  I believe the default settings however lead to an order
>> >> >> which cannot be controlled.
>> >> >>
>> >> >> I do wish for a smarter fat jar plugin.
>> >> >>
>> >> >> -Evan
>> >> >> To be free is not merely to cast off one's chains, but to live in a
>> >> >> way that respects & enhances the freedom of others. (#NelsonMandela)
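The merge control Evan refers to is sbt-assembly's merge strategy setting; a sketch with 0.x-era key names (key and import locations may differ by plugin version, and the patterns are illustrative):

```scala
// build.sbt sketch -- per-path control over how duplicate entries
// from different jars are merged into the fat jar.
import sbtassembly.Plugin.AssemblyKeys._
import sbtassembly.Plugin.{MergeStrategy, PathList}

mergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard // drop manifests/signatures
  case "reference.conf"              => MergeStrategy.concat  // merge Akka configs
  case _                             => MergeStrategy.first   // otherwise first wins
}
```

Note that `MergeStrategy.first` is exactly the uncontrolled ordering complained about above; the point is that any path can be given an explicit rule instead.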
>> >> >>
>> >> >>> On Feb 25, 2014, at 6:50 PM, Mridul Muralidharan <mridul@gmail.com>
>> >> >>> wrote:
>> >> >>>
>> >> >>>> On Wed, Feb 26, 2014 at 5:31 AM, Patrick Wendell <pwendell@gmail.com>
>> >> >>>> wrote:
>> >> >>>> Evan - this is a good thing to bring up. Wrt the shader plug-in -
>> >> >>>> right now we don't actually use it for bytecode shading - we simply
>> >> >>>> use it for creating the uber jar with excludes (which sbt supports
>> >> >>>> just fine via assembly).
>> >> >>>
>> >> >>>
>> >> >>> Not really - as I mentioned initially in this thread, sbt's assembly
>> >> >>> does not take dependencies into account properly: it can overwrite
>> >> >>> newer classes with older versions.
>> >> >>> From an assembly point of view, sbt is not very good: we are yet to
>> >> >>> try it after the 2.10 shift though (and probably won't, given the
>> >> >>> mess it created last time).
>> >> >>>
>> >> >>> Regards,
>> >> >>> Mridul
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>>
>> >> >>>> I was wondering actually, do you know if it's possible to add
>> >> >>>> shaded artifacts to the *spark jar* using this plug-in (e.g. not an
>> >> >>>> uber jar)? That's something I could see being really handy in the
>> >> >>>> future.
>> >> >>>>
>> >> >>>> - Patrick
>> >> >>>>
>> >> >>>>> On Tue, Feb 25, 2014 at 3:39 PM, Evan Chan <ev@ooyala.com> wrote:
>> >> >>>>> The problem is that plugins are not equivalent.  There is AFAIK no
>> >> >>>>> equivalent to the maven shader plugin for SBT.
>> >> >>>>> There is an SBT plugin which can apparently read POM XML files
>> >> >>>>> (sbt-pom-reader).   However, it can't possibly handle plugins,
>> >> >>>>> which is still problematic.
>> >> >>>>>
>> >> >>>>>> On Tue, Feb 25, 2014 at 3:31 PM, yao <yaoshengzhe@gmail.com>
>> >> >>>>>> wrote:
>> >> >>>>>> I would prefer to keep both of them; it would be better even if
>> >> >>>>>> that means pom.xml will be generated using sbt. Some companies,
>> >> >>>>>> like my current one, have their own build infrastructures built
>> >> >>>>>> on top of maven. It is not easy to support sbt for these
>> >> >>>>>> potential spark clients. But I do agree to keep only one if there
>> >> >>>>>> is a promising way to generate correct configuration from the
>> >> >>>>>> other.
>> >> >>>>>>
>> >> >>>>>> -Shengzhe
>> >> >>>>>>
>> >> >>>>>>
>> >> >>>>>>> On Tue, Feb 25, 2014 at 3:20 PM, Evan Chan <ev@ooyala.com>
>> >> >>>>>>> wrote:
>> >> >>>>>>>
>> >> >>>>>>> The correct way to exclude dependencies in SBT is actually to
>> >> >>>>>>> declare a dependency as "provided".   I'm not familiar with
>> >> >>>>>>> Maven or its dependencySet, but provided will mark the entire
>> >> >>>>>>> dependency tree as excluded.   It is also possible to exclude
>> >> >>>>>>> jar by jar, but this is pretty error prone and messy.
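In sbt terms, the "provided" scope Evan describes is just a configuration suffix on the dependency (coordinates illustrative):

```scala
// build.sbt sketch -- hadoop-client is on the compile classpath, but
// it and its whole transitive tree stay out of the assembly jar.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0" % "provided"
```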
>> >> >>>>>>>
>> >> >>>>>>>> On Tue, Feb 25, 2014 at 2:45 PM, Koert Kuipers <koert@tresata.com>
>> >> >>>>>>>> wrote:
>> >> >>>>>>>> yes in sbt assembly you can exclude jars (although i never had
>> >> >>>>>>>> a need for this) and files in jars.
>> >> >>>>>>>>
>> >> >>>>>>>> for example i frequently remove log4j.properties, because for
>> >> >>>>>>>> whatever reason hadoop decided to include it, making it very
>> >> >>>>>>>> difficult to use our own logging config.
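The file-level removal Koert mentions can be done with sbt-assembly's file hooks; one sketch, assuming the 0.x-era `excludedFiles` key (key name and shape may differ by plugin version):

```scala
// build.sbt sketch -- drop log4j.properties from every unpacked jar
// before the fat jar is built.
import sbtassembly.Plugin.AssemblyKeys._

excludedFiles in assembly := { (bases: Seq[File]) =>
  // bases are the directories each dependency jar was extracted into
  bases.flatMap(base => (base / "log4j.properties").get)
}
```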
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>>
>> >> >>>>>>>>> On Tue, Feb 25, 2014 at 4:24 PM, Konstantin Boudnik <cos@apache.org>
>> >> >>>>>>>> wrote:
>> >> >>>>>>>>
>> >> >>>>>>>>>> On Fri, Feb 21, 2014 at 11:11AM, Patrick Wendell wrote:
>> >> >>>>>>>>>> Kos - thanks for chiming in. Could you be more specific about
>> >> >>>>>>>>>> what is available in maven and not in sbt for these issues? I
>> >> >>>>>>>>>> took a look at the bigtop code relating to Spark. As far as I
>> >> >>>>>>>>>> could tell [1] was the main point of integration with the
>> >> >>>>>>>>>> build system (maybe there are other integration points)?
>> >> >>>>>>>>>>
>> >> >>>>>>>>>>>  - in order to integrate Spark well into existing Hadoop
>> >> >>>>>>>>>>>    stack it was necessary to have a way to avoid transitive
>> >> >>>>>>>>>>>    dependencies duplications and possible conflicts.
>> >> >>>>>>>>>>>
>> >> >>>>>>>>>>>    E.g. Maven assembly allows us to avoid adding _all_
>> >> >>>>>>>>>>>    Hadoop libs and later merely declare Spark package
>> >> >>>>>>>>>>>    dependency on standard Bigtop Hadoop packages. And yes -
>> >> >>>>>>>>>>>    Bigtop packaging means the naming and layout would be
>> >> >>>>>>>>>>>    standard across all commercial Hadoop distributions that
>> >> >>>>>>>>>>>    are worth mentioning: ASF Bigtop convenience binary
>> >> >>>>>>>>>>>    packages, and Cloudera or Hortonworks packages. Hence,
>> >> >>>>>>>>>>>    the downstream user doesn't need to spend any effort to
>> >> >>>>>>>>>>>    make sure that Spark "clicks-in" properly.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> The sbt build also allows you to plug in a Hadoop version
>> >> >>>>>>>>>> similar to the maven build.
>> >> >>>>>>>>>
>> >> >>>>>>>>> I am actually talking about an ability to exclude a set of
>> >> >>>>>>>>> dependencies from an assembly, similarly to what's happening
>> >> >>>>>>>>> in dependencySet sections of
>> >> >>>>>>>>>    assembly/src/main/assembly/assembly.xml
>> >> >>>>>>>>> If there is comparable functionality in Sbt, that would help
>> >> >>>>>>>>> quite a bit, apparently.
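A rough sbt-assembly analogue of assembly.xml's dependencySet excludes would filter whole jars out of the assembly; a sketch with 0.x-era keys (the jar-name pattern is illustrative):

```scala
// build.sbt sketch -- keep every hadoop-* jar out of the fat jar,
// comparable to a <dependencySet> exclude in Maven's assembly.xml.
import sbtassembly.Plugin.AssemblyKeys._

excludedJars in assembly := {
  val cp = (fullClasspath in assembly).value
  cp.filter(_.data.getName.startsWith("hadoop-"))
}
```

The excluded jars would then be supplied at runtime by the Hadoop packages already on the cluster, which is the Bigtop scenario described above.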
>> >> >>>>>>>>>
>> >> >>>>>>>>> Cos
>> >> >>>>>>>>>
>> >> >>>>>>>>>>>  - Maven provides a relatively easy way to deal with the
>> >> >>>>>>>>>>>    jar-hell problem, although the original maven build was
>> >> >>>>>>>>>>>    just Shader'ing everything into a huge lump of class
>> >> >>>>>>>>>>>    files. Oftentimes ending up with classes slamming on top
>> >> >>>>>>>>>>>    of each other from different transitive dependencies.
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> AFAIK we are only using the shade plug-in to deal with
>> >> >>>>>>>>>> conflict resolution in the assembly jar. These are dealt
>> >> >>>>>>>>>> with in sbt via the sbt assembly plug-in in an identical
>> >> >>>>>>>>>> way. Is there a difference?
>> >> >>>>>>>>>
>> >> >>>>>>>>> I am bringing up the Shader because it is an awful hack, which
>> >> >>>>>>>>> can't be used in real controlled deployment.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Cos
>> >> >>>>>>>>>
>> >> >>>>>>>>>> [1]
>> >> >>>>>>>
>> >> >
>> >>
>> https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob;f=bigtop-packages/src/common/spark/do-component-build;h=428540e0f6aa56cd7e78eb1c831aa7fe9496a08f;hb=master
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> --
>> >> >>>>>>> --
>> >> >>>>>>> Evan Chan
>> >> >>>>>>> Staff Engineer
>> >> >>>>>>> ev@ooyala.com  |
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> --
>> >> >>>>> --
>> >> >>>>> Evan Chan
>> >> >>>>> Staff Engineer
>> >> >>>>> ev@ooyala.com  |
>> >>
>> >>
>> >>
>> >> --
>> >> --
>> >> Evan Chan
>> >> Staff Engineer
>> >> ev@ooyala.com  |
>> >>
>>
>>
>>
>> --
>> --
>> Evan Chan
>> Staff Engineer
>> ev@ooyala.com  |
>>



-- 
--
Evan Chan
Staff Engineer
ev@ooyala.com  |
