spark-dev mailing list archives

From Konstantin Boudnik <...@apache.org>
Subject Re: excluding hadoop dependencies in spark's assembly files
Date Tue, 07 Jan 2014 01:01:32 GMT
Well,

somehow I managed to send an email as Roman Shaposhnik :) I guess he got himself a
stalker ;)

Cos

On Mon, Jan 06, 2014 at 04:55PM, Roman Shaposhnik wrote:
> Alex,
> 
> I don't know if it helps or not, but some time back I made a Maven assembly to be
> able to package Spark in Bigtop. That assembly excludes all Hadoop
> dependencies, so you can simply build it using Maven instead of sbt.
> 
> Regards,
>   Cos
> 
> On Mon, Jan 06, 2014 at 02:33PM, Alex Cozzi wrote:
> > I am trying to exclude the Hadoop jar dependencies from Spark’s assembly files.
> > The reason is that, in order to work on our cluster, we need to use our own versions
> > of those files instead of the published ones. I tried defining the Hadoop dependencies
> > as “provided”, but surprisingly this causes compilation errors in the build. Just to
> > be clear, I modified the sbt build file as follows:
> > 
> >   def yarnEnabledSettings = Seq(
> >     libraryDependencies ++= Seq(
> >       // Exclude rule required for all ?
> >       "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided" excludeAll(
> >         excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >       "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion % "provided" excludeAll(
> >         excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >       "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion % "provided" excludeAll(
> >         excludeJackson, excludeNetty, excludeAsm, excludeCglib),
> >       "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion % "provided" excludeAll(
> >         excludeJackson, excludeNetty, excludeAsm, excludeCglib)
> >     )
> >   )
> > 
> > and compile as 
> > 
> >  SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_IS_NEW_HADOOP=true sbt  assembly
> > 
> > 
> > but the assembly still includes the Hadoop libraries, contrary to what the assembly
> > docs say. I managed to exclude them instead by using the non-recommended way:
> > def extraAssemblySettings() = Seq(
> >     test in assembly := {},
> >     mergeStrategy in assembly := {
> >       case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
> >       case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
> >       case "log4j.properties" => MergeStrategy.discard
> >       case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
> >       case "reference.conf" => MergeStrategy.concat
> >       case _ => MergeStrategy.first
> >     },
> >     excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
> >       cp filter {_.data.getName.contains("hadoop")}
> >     }
> > )
> > 
> > 
> > But I would like to hear whether there is interest in excluding the Hadoop jars by
> > default in the build.
> > Alex Cozzi
> > alexcozzi@gmail.com
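
A note on the excludedJars filter quoted above: it selects classpath entries
purely by file name, so any jar whose name contains "hadoop" is dropped from
the assembly, not only the four dependencies listed in yarnEnabledSettings.
The following is a minimal standalone Scala sketch of that predicate, not part
of the Spark build. The object name and the jar names are hypothetical and
chosen only for illustration.

```scala
import java.io.File

object ExcludedJarsSketch {
  // Hypothetical classpath entries; the jar names are assumptions for illustration.
  val classpath: Seq[File] = Seq(
    new File("lib/hadoop-client-2.2.0.jar"),
    new File("lib/hadoop-yarn-api-2.2.0.jar"),
    new File("lib/scala-library-2.10.3.jar"),
    new File("lib/avro-mapred-1.7.4-hadoop2.jar")
  )

  // Same test as in the quoted build: match on the jar's file name only.
  def excluded(cp: Seq[File]): Seq[File] =
    cp.filter(_.getName.contains("hadoop"))

  def main(args: Array[String]): Unit = {
    val dropped = excluded(classpath)
    val kept = classpath.diff(dropped)
    dropped.foreach(f => println(s"excluded: ${f.getName}"))
    kept.foreach(f => println(s"kept:     ${f.getName}"))
  }
}
```

With these made-up names, the avro-mapred jar is excluded along with the two
hadoop-* jars, because its file name also contains "hadoop". A stricter
predicate such as startsWith("hadoop-") might be closer to the intent,
depending on which jars the cluster actually provides.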
