spark-user mailing list archives

From Ryan Compton <compton.r...@gmail.com>
Subject Re: best practice: write and debug Spark application in scala-ide and maven
Date Sat, 07 Jun 2014 19:16:45 GMT
Sounds like there are two questions here:

First, from the command line, if you "mvn package" and then run the
code with "java -cp target/*jar-with-dependencies.jar com.ibm.App" do
you still get the error?

Second, for quick debugging, I agree that it's a pain to wait for mvn
package to finish every time a line of code changes. To avoid this
when working on a new (buggy) file, you can add your last working
-jar-with-dependencies.jar to the spark-shell using the ADD_JARS
environment variable. Then, after making a few changes to the buggy
file, use ":load" from spark-shell to re-read just that file. This
lets you try out the new class without waiting for the whole mvn
package.
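
A minimal sketch of a file that suits this ":load" loop, assuming
Spark 1.x and a spark-shell session where "sc" is already defined. The
file name WordCountDebug.scala and the object and method names are
hypothetical; the body mirrors the word count from the original
question.

```scala
// WordCountDebug.scala (hypothetical file name), meant to be re-read with :load
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed for reduceByKey on Spark 1.x
import org.apache.spark.rdd.RDD

object WordCountDebug {
  // Split each line on single spaces and count the occurrences of each word.
  def countWords(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
}
```

From spark-shell, something like ":load /path/to/WordCountDebug.scala"
followed by
WordCountDebug.countWords(sc.textFile("hdfs://xxx:9000/wcinput/pg1184.txt")).take(10)
should exercise the new code; after editing the file, repeat the
":load".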



On Sat, Jun 7, 2014 at 3:19 AM, Gerard Maas <gerard.maas@gmail.com> wrote:
> I think that you have two options:
>
> - to run your code locally, you can use local mode by using the 'local'
> master like so:
>  new SparkConf().setMaster("local[4]")  where 4 is the number of worker
> threads (typically one per core) assigned to the local mode.
>
> - to run your code remotely you need to build the jar with dependencies and
> add it to your context.
> new SparkConf().setMaster("spark://uri").setJars(Array("/path/to/target/jar-with-dependencies.jar"))
> You will need to run maven before running your program to ensure the latest
> version of your jar is built.
>
> -regards, Gerard.
>
>
>
> On Sat, Jun 7, 2014 at 3:10 AM, Wei Tan <wtan@us.ibm.com> wrote:
>>
>> Hi,
>>
>>   I am trying to write and debug Spark applications in scala-ide and
>> maven, and my code targets a Spark instance at spark://xxx:
>>
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.apache.spark.SparkContext._ // pair-RDD implicits for reduceByKey (Spark 1.x)
>>
>> object App {
>>
>>   def main(args: Array[String]): Unit = {
>>     println("Hello World!")
>>     val sparkConf = new SparkConf().setMaster("spark://xxx:7077").setAppName("WordCount")
>>
>>     val spark = new SparkContext(sparkConf)
>>     val file = spark.textFile("hdfs://xxx:9000/wcinput/pg1184.txt")
>>     // Split lines into words, pair each word with 1, then sum per word.
>>     val counts = file.flatMap(line => line.split(" "))
>>                  .map(word => (word, 1))
>>                  .reduceByKey(_ + _)
>>     counts.saveAsTextFile("hdfs://flex05.watson.ibm.com:9000/wcoutput")
>>   }
>>
>> }
>>
>> I added spark-core and hadoop-client in maven dependency so the code
>> compiles fine.
>>
>> When I click Run in Eclipse I get this error:
>>
>> 14/06/06 20:52:18 WARN scheduler.TaskSetManager: Loss was due to
>> java.lang.ClassNotFoundException
>> java.lang.ClassNotFoundException: samples.App$$anonfun$2
>>
>> I googled this error and it seems that I need to package my code into a
>> jar file and push it to spark nodes. But since I am debugging the code, it
>> would be handy if I can quickly see results without packaging and uploading
>> jars.
>>
>> What is the best practice for writing a Spark application (like word
>> count) and debugging it quickly on a remote Spark instance?
>>
>> Thanks!
>> Wei
>>
>>
>> ---------------------------------
>> Wei Tan, PhD
>> Research Staff Member
>> IBM T. J. Watson Research Center
>> http://researcher.ibm.com/person/us-wtan
>
>
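
A minimal sketch of Gerard's two options, assuming Spark 1.x. The
object name MasterConfigSketch, the helper buildConf, and the
convention of passing the master URL as the first program argument are
all hypothetical; the jar path is the placeholder from his message.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object MasterConfigSketch {
  // Hypothetical helper: local mode when no master URL is given,
  // otherwise a remote master plus the application jar to ship to executors.
  def buildConf(remoteMaster: Option[String], appJar: String): SparkConf = {
    val base = new SparkConf().setAppName("WordCount")
    remoteMaster match {
      case Some(url) => base.setMaster(url).setJars(Seq(appJar))
      case None      => base.setMaster("local[4]")
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the first argument, if present, is a spark:// master URL.
    val conf = buildConf(args.headOption, "/path/to/target/jar-with-dependencies.jar")
    val sc = new SparkContext(conf)
    try {
      // The closure below compiles to a class that remote executors can
      // only load if the jar containing it was shipped via setJars.
      val n = sc.parallelize(1 to 100).map(_ * 2).count()
      println(s"counted $n elements on master ${sc.master}")
    } finally {
      sc.stop()
    }
  }
}
```

Run without arguments, this stays in local mode and needs no jar at
all, which is handy inside the IDE. With a spark:// URL as the
argument, the jar has to be rebuilt with "mvn package" first so the
executors see the latest classes.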
