spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Upgrade the scala code using the most updated Spark version
Date Tue, 28 Mar 2017 19:43:31 GMT
I personally never add the Scala version suffix (e.g. _2.11) to the dependency name but always
cross-compile. This seems to be the cleanest approach. Additionally, the Spark and Hadoop
dependencies should have scope provided, not compile. The ScalaTest version also seems to be outdated.
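
As a rough sketch of what I mean (not tested against your project): the Spark version is the one from your build file, while the Hadoop and ScalaTest versions below are my assumptions and should be checked against your cluster and the current releases.

```scala
// build.sbt fragment: a sketch, not a drop-in replacement
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (_2.11) taken from scalaVersion;
  // "provided" keeps Spark out of the assembled jar because the cluster supplies it
  "org.apache.spark" %% "spark-core"      % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-mllib"     % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-sql"       % "2.1.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "2.1.0" % "provided",
  // Hadoop is a Java library, so a single % is used; 2.7.3 is an assumed version
  "org.apache.hadoop" % "hadoop-client"   % "2.7.3" % "provided",
  // assumed newer ScalaTest release; pick whichever is current for Scala 2.11
  "org.scalatest"    %% "scalatest"       % "3.0.1" % "test"
)
```

With the Spark and Hadoop artifacts marked as provided, only the remaining libraries end up in the fat jar, which may also reduce the need for a custom merge strategy.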

I would also not use a local repo, but either use an artefact manager (e.g. Artifactory or Nexus)
or download the artefacts from the official Spark repos.

Can you publish the full source code? It is hard to assess if the merge strategy is needed.
Maybe start with a simpler build file and a small application and then add your source code.
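
Regarding the compile error quoted below: mapPartitionsWithSplit was deprecated for a long time and was removed in Spark 2.0. mapPartitionsWithIndex takes the same (partition index, iterator) function, so the change should be small. The second error (length is not a member of Any) is most likely just a consequence of the first. A minimal sketch, assuming data is an RDD[String] as the error message says; the object and method names are hypothetical since I have not seen the rest of OptUtils.scala:

```scala
import org.apache.spark.rdd.RDD

object PartitionSizes {
  // Returns one (partitionIndex, numberOfLines) pair per partition.
  def partitionSizes(data: RDD[String]): Array[(Int, Int)] =
    data.mapPartitionsWithIndex { (i, lines) =>
      // lines is an Iterator[String]; length consumes it and counts the elements
      Iterator(i -> lines.length)
    }.collect()
}
```

So yes, moving from Spark 1.x to 2.x can require small changes in the application code as well as in build.sbt.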


> On 28. Mar 2017, at 21:33, Marco Mistroni <mmistroni@gmail.com> wrote:
> 
> Hello
>  that looks to me like there's something dodgy with your Scala installation.
> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10. I suggest you change one thing at a time in your sbt build file.
> First change the Spark version, run it, and see if it works.
> Then amend the Scala version.
> 
> hth
>  marco
> 
>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.amiri@gmail.com> wrote:
>> Hello, 
>> 
>> Thank you all for your informative answers. 
>> I changed the Scala version to 2.11.8 and the Spark version to 2.1.0 in build.sbt.
>> 
>> Apart from these two (the Scala and Spark versions), I kept the rest of the build.sbt file unchanged.
>> ---------------------------------------------------------------------------
>> import AssemblyKeys._
>> 
>> assemblySettings
>> 
>> name := "proxcocoa"
>> 
>> version := "0.1"
>> 
>> scalaVersion := "2.11.8"
>> 
>> parallelExecution in Test := false
>> 
>> {
>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>   libraryDependencies ++= Seq(
>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>     "org.apache.spark" % "spark-core_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>     "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>     "org.apache.spark" % "spark-sql_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>     "org.apache.commons" % "commons-compress" % "1.7",
>>     "commons-io" % "commons-io" % "2.4",
>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>   )
>> }
>> 
>> {
>>   val defaultHadoopVersion = "1.0.4"
>>   val hadoopVersion =
>>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>> }
>> 
>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
>> 
>> resolvers ++= Seq(
>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>   "Spray" at "http://repo.spray.cc"
>> )
>> 
>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>   {
>>     case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
>>     case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
>>     case "application.conf"                              => MergeStrategy.concat
>>     case "reference.conf"                                => MergeStrategy.concat
>>     case "log4j.properties"                              => MergeStrategy.discard
>>     case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
>>     case _ => MergeStrategy.first
>>   }
>> }
>> 
>> test in assembly := {}
>> ----------------------------------------------------------------
>> 
>> When I compile the code, I get the following error:
>> 
>> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]
>> [error]     val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>> [error]                      ^
>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41: value length is not a member of Any
>> [error]       Iterator(i -> lines.length)
>> [error]                           ^
>> ----------------------------------------------------------------
>> The compiler reports the error in the application code itself. Does this mean that I need to change the main code for the different versions of Spark and Scala?
>> 
>> Thanks, 
>> Anahita
>> 
>>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč <dinko.srkoc@gmail.com> wrote:
>>> Adding to the advice given by others: Spark 2.1.0 works with Scala 2.11, so set:
>>> 
>>>   scalaVersion := "2.11.8"
>>> 
>>> When you see something like:
>>> 
>>>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>>> 
>>> that means that library `spark-core` is compiled against Scala 2.10,
>>> so you would have to change that to 2.11:
>>> 
>>>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>> 
>>> better yet, let SBT worry about libraries built against particular
>>> Scala versions:
>>> 
>>>   "org.apache.spark" %% "spark-core" % "2.1.0"
>>> 
>>> The `%%` will instruct SBT to choose the library appropriate for a
>>> version of Scala that is set in `scalaVersion`.
>>> 
>>> It may be worth mentioning that the `%%` thing works only with Scala
>>> libraries as they are compiled against a certain Scala version. Java
>>> libraries are unaffected (have nothing to do with Scala), e.g. for
>>> `slf4j` one only uses single `%`s:
>>> 
>>>   "org.slf4j" % "slf4j-api" % "1.7.2"
>>> 
>>> Cheers,
>>> Dinko
>>> 
>>> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> > check these versions
>>> >
>>> > function create_build_sbt_file {
>>> >         BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
>>> >         [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
>>> >         cat >> $BUILD_SBT_FILE << !
>>> > lazy val root = (project in file(".")).
>>> >   settings(
>>> >     name := "${APPLICATION}",
>>> >     version := "1.0",
>>> >     scalaVersion := "2.11.8",
>>> >     mainClass in Compile := Some("myPackage.${APPLICATION}")
>>> >   )
>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
>>> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
>>> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1" % "provided"
>>> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
>>> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
>>> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
>>> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
>>> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
>>> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
>>> > // META-INF discarding
>>> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>> >    {
>>> >     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>> >     case x => MergeStrategy.first
>>> >    }
>>> > }
>>> > !
>>> > }
>>> >
>>> > HTH
>>> >
>>> > Dr Mich Talebzadeh
>>> >
>>> >
>>> >
>>> > LinkedIn
>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> >
>>> >
>>> >
>>> > http://talebzadehmich.wordpress.com
>>> >
>>> >
>>> > Disclaimer: Use it at your own risk. Any and all responsibility for any
>>> > loss, damage or destruction of data or any other property which may arise
>>> > from relying on this email's technical content is explicitly disclaimed. The
>>> > author will in no case be liable for any monetary damages arising from such
>>> > loss, damage or destruction.
>>> >
>>> >
>>> >
>>> >
>>> > On 27 March 2017 at 21:45, Jörn Franke <jornfranke@gmail.com> wrote:
>>> >>
>>> >> Usually you define the dependencies to the Spark library as provided. You
>>> >> also seem to mix different Spark versions which should be avoided.
>>> >> The Hadoop library seems to be outdated and should also only be provided.
>>> >>
>>> >> The other dependencies you could assemble in a fat jar.
>>> >>
>>> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.amiri@gmail.com>
>>> >> wrote:
>>> >>
>>> >> Hi friends,
>>> >>
>>> >> I have some code written in Scala. It currently runs with Scala 2.10.4 and
>>> >> Spark 1.5.2.
>>> >>
>>> >> I would like to upgrade the code to the most recent version of Spark,
>>> >> meaning 2.1.0.
>>> >>
>>> >> Here is the build.sbt:
>>> >>
>>> >> import AssemblyKeys._
>>> >>
>>> >> assemblySettings
>>> >>
>>> >> name := "proxcocoa"
>>> >>
>>> >> version := "0.1"
>>> >>
>>> >> scalaVersion := "2.10.4"
>>> >>
>>> >> parallelExecution in Test := false
>>> >>
>>> >> {
>>> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>> >>   libraryDependencies ++= Seq(
>>> >>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>> >>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>> >>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>> >>     "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>> >>     "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>> >>     "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>> >>     "org.apache.commons" % "commons-compress" % "1.7",
>>> >>     "commons-io" % "commons-io" % "2.4",
>>> >>     "org.scalanlp" % "breeze_2.10" % "0.11.2",
>>> >>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>> >>     "com.github.scopt" %% "scopt" % "3.3.0"
>>> >>   )
>>> >> }
>>> >>
>>> >> {
>>> >>   val defaultHadoopVersion = "1.0.4"
>>> >>   val hadoopVersion =
>>> >>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>> >>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>> >> }
>>> >>
>>> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
>>> >>
>>> >> resolvers ++= Seq(
>>> >>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>> >>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>> >>   "Spray" at "http://repo.spray.cc"
>>> >> )
>>> >>
>>> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>> >>   {
>>> >>     case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
>>> >>     case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
>>> >>     case "application.conf"                              => MergeStrategy.concat
>>> >>     case "reference.conf"                                => MergeStrategy.concat
>>> >>     case "log4j.properties"                              => MergeStrategy.discard
>>> >>     case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
>>> >>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
>>> >>     case _ => MergeStrategy.first
>>> >>   }
>>> >> }
>>> >>
>>> >> test in assembly := {}
>>> >>
>>> >> -----------------------------------------------------------
>>> >> I downloaded Spark 2.1.0 and changed the Spark and Scala versions in
>>> >> build.sbt, but unfortunately the code failed to run.
>>> >>
>>> >> Does anybody know how I can upgrade the code to the most recent spark
>>> >> version by changing the build.sbt file?
>>> >>
>>> >> Or do you have any other suggestion?
>>> >>
>>> >> Thanks a lot,
>>> >> Anahita
>>> >>
>>> >
>> 
> 
