groovy-users mailing list archives

From Jeff MAURY <jeffma...@jeffmaury.com>
Subject Re: Apache Spark & Groovy
Date Sun, 26 Jul 2015 09:57:24 GMT
So it may be an object stored in your task that is not serializable.

Jeff
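A minimal Java sketch of what the serialization stack points at, assuming plain java.io serialization (the trace shows Spark going through JavaSerializerInstance.serialize). The names `SerializableFunction`, `WordcountJob` and `KeywordFilter` are hypothetical: `SerializableFunction` stands in for org.apache.spark.api.java.function.Function, and `WordcountJob` stands in for the non-serializable GroovySparkWordcount. An anonymous class created in an instance method keeps a hidden `this$0` reference to its enclosing object (as GroovySparkWordcount$1 does in the trace), so serializing it drags the outer object along. A static nested class carries no such reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureDemo {

    // Hypothetical stand-in for org.apache.spark.api.java.function.Function,
    // which also extends java.io.Serializable.
    interface SerializableFunction<T, R> extends Serializable {
        R call(T value);
    }

    // Hypothetical stand-in for GroovySparkWordcount: NOT Serializable.
    static class WordcountJob {
        private final String keyword;

        WordcountJob(String keyword) {
            this.keyword = keyword;
        }

        // Anonymous class reading the outer field: the compiler gives it a
        // synthetic this$0 field pointing at this WordcountJob.
        SerializableFunction<String, Boolean> anonymousFilter() {
            return new SerializableFunction<String, Boolean>() {
                @Override
                public Boolean call(String line) {
                    return line.contains(keyword);
                }
            };
        }

        // Static nested class: the keyword is copied in, no outer reference.
        static class KeywordFilter implements SerializableFunction<String, Boolean> {
            private final String keyword;

            KeywordFilter(String keyword) {
                this.keyword = keyword;
            }

            @Override
            public Boolean call(String line) {
                return line.contains(keyword);
            }
        }

        SerializableFunction<String, Boolean> staticFilter() {
            return new KeywordFilter(keyword);
        }
    }

    // Runs plain Java serialization; returns null on success, otherwise the
    // exception message (for NotSerializableException, the offending class name).
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        WordcountJob job = new WordcountJob("spark");
        System.out.println("anonymous:     " + trySerialize(job.anonymousFilter()));
        System.out.println("static nested: " + trySerialize(job.staticFilter()));
    }
}
```

Run as-is, the anonymous filter fails with a NotSerializableException naming CaptureDemo$WordcountJob (the analogue of GroovySparkWordcount in the trace), while the static nested filter serializes. Carrying this over to the Groovy script, keeping the task free of any reference to the script object (for example by defining it as its own class) is a suggestion, not something confirmed in this thread.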
On 26 Jul 2015 11:42, "tog" <guillaume.alleon@gmail.com> wrote:

> Thanks Jeff for your quick answer.
>
> Yes, the tasks must be serializable, and I believe they are.
>
> My test script has two tasks doing the same job: one is a closure, the
> other is an org.apache.spark.api.java.function.Function. According to
> a small test in my script, both are serializable for Java/Groovy.
>
> I am a bit puzzled/stuck here.
>
> On 26 July 2015 at 10:34, Jeff MAURY <jeffmaury@jeffmaury.com> wrote:
>
>> Spark distributes tasks to the cluster nodes, so the task needs to be
>> serializable. It appears that your task is a Groovy closure, so you must
>> make it serializable.
>>
>> Jeff
>>
>> On Sun, Jul 26, 2015 at 11:12 AM, tog <guillaume.alleon@gmail.com> wrote:
>>
>>> Hi
>>>
>>> I am starting to play with Apache Spark using Groovy. I have a small
>>> script <https://gist.github.com/galleon/d6540327c418aa8a479f> that I
>>> use for that purpose.
>>>
>>> When the script is converted into a class and launched with java, it
>>> works fine, but it fails when run as a script.
>>>
>>> Any idea what I am doing wrong? Maybe some of you have already come
>>> across this problem.
>>>
>>> $ groovy -version
>>> Groovy Version: 2.4.3 JVM: 1.8.0_40 Vendor: Oracle Corporation OS: Mac OS X
>>>
>>> $ groovy GroovySparkWordcount.groovy
>>> class org.apache.spark.api.java.JavaRDD
>>> true
>>> true
>>> Caught: org.apache.spark.SparkException: Task not serializable
>>> org.apache.spark.SparkException: Task not serializable
>>>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
>>>     at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
>>>     at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
>>>     at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
>>>     at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:311)
>>>     at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:310)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
>>>     at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
>>>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>     at org.apache.spark.rdd.RDD.filter(RDD.scala:310)
>>>     at org.apache.spark.api.java.JavaRDD.filter(JavaRDD.scala:78)
>>>     at org.apache.spark.api.java.JavaRDD$filter$0.call(Unknown Source)
>>>     at GroovySparkWordcount.run(GroovySparkWordcount.groovy:27)
>>> Caused by: java.io.NotSerializableException: GroovySparkWordcount
>>> Serialization stack:
>>>     - object not serializable (class: GroovySparkWordcount, value: GroovySparkWordcount@57c6feea)
>>>     - field (class: GroovySparkWordcount$1, name: this$0, type: class GroovySparkWordcount)
>>>     - object (class GroovySparkWordcount$1, GroovySparkWordcount$1@3db1ce78)
>>>     - field (class: org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, name: f$1, type: interface org.apache.spark.api.java.function.Function)
>>>     - object (class org.apache.spark.api.java.JavaRDD$$anonfun$filter$1, <function1>)
>>>     at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
>>>     at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>>>     at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
>>>     at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
>>>     ... 12 more
>>>
>>
>>
>> --
>> Jeff MAURY
>>
>>
>> "Legacy code" often differs from its suggested alternative by actually
>> working and scaling.
>>  - Bjarne Stroustrup
>>
>> http://www.jeffmaury.com
>> http://riadiscuss.jeffmaury.com
>> http://www.twitter.com/jeffmaury
>>
>
>
>
> --
> PGP KeyID: 2048R/EA31CFC9  subkeys.pgp.net
>
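The two `true` lines in tog's output suggest his small test was a type-level check such as `instanceof Serializable`; the gist is not reproduced here, so that is an assumption. A type-level check only inspects the declared interfaces, whereas Spark's ClosureCleaner.ensureSerializable (top of the trace) actually serializes the task, so a field pointing at a non-serializable object only fails at that point. A minimal Java sketch contrasting the two checks; `Holder`, `Task`, `declaresSerializable` and `reallySerializable` are hypothetical names, and the second check only mimics Spark's behaviour with plain java.io serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializabilityCheck {

    // Hypothetical non-serializable object (think: the script instance).
    static class Holder {
    }

    // Hypothetical task: its type is Serializable, but one field is not.
    static class Task implements Serializable {
        private final Holder holder;

        Task(Holder holder) {
            this.holder = holder;
        }
    }

    // Type-level check: only looks at the declared interfaces.
    static boolean declaresSerializable(Object obj) {
        return obj instanceof Serializable;
    }

    // Graph-level check: actually runs Java serialization over every
    // reachable field, which is what surfaces a bad captured reference.
    static boolean reallySerializable(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Task withBadField = new Task(new Holder());
        Task withNullField = new Task(null);
        System.out.println(declaresSerializable(withBadField)); // true
        System.out.println(reallySerializable(withBadField));   // false
        System.out.println(reallySerializable(withNullField));  // true
    }
}
```

Both tasks pass the `instanceof` test, but only the one whose field is null survives real serialization, which is the same gap between "the closure is Serializable" and "Task not serializable" seen in this thread.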
